Douglas Hubbard describes a metric for measuring the value of information in "How to Measure Anything". The idea is that, given a decision that must be made about an uncertain outcome, we can calculate the expected opportunity loss (EOL) of each option: the average amount we forgo, relative to the best choice in hindsight, because our choice turned out not to be ideal. The more information we have about the outcome, the more we can reduce our expected opportunity loss. In this notebook we provide a general Python class to make this calculation, alongside some example cases.
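
One way to write this down, using notation that is ours rather than Hubbard's: let $X$ be the random outcome, $R(a, X)$ the reward of option $a$ under that outcome, and $x_1, \dots, x_N$ simulated draws of $X$. The expected opportunity loss of option $a$ and its Monte Carlo estimate can then be sketched as:

```latex
\mathrm{EOL}(a) = \mathbb{E}_X\left[\max_{a'} R(a', X) - R(a, X)\right]
\approx \frac{1}{N} \sum_{i=1}^{N} \left( \max_{a'} R(a', x_i) - R(a, x_i) \right)
```

The term inside the expectation is never negative, and it is zero exactly in the scenarios where $a$ is the best available option.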

In [1]:
import numpy as np
import pandas as pd
from tqdm import tqdm
from matplotlib import pyplot as plt


class ExpectedOpportunityLoss:
    """
    Monte Carlo simulation class that estimates expected opportunity loss from a dictionary of random value generators and a dictionary of reward functions.
    """

    def __init__(self, reward_functions, random_inputs, initial_sample=10):
        """
        :param reward_functions: {str: function}
            Deterministic reward functions, one per option. Each takes a single dictionary-like row as input and reads a subset of the keys in random_inputs.
        :param random_inputs: {str: random_generator}
            Dictionary of random number generators. These are functions that take a single integer input n and return an array of n random draws.
        :param initial_sample: int
            Currently unused.
        """

        assert isinstance(reward_functions, dict), "reward_functions must be a dictionary"
        assert isinstance(random_inputs, dict), "random_inputs must be a dictionary"
        self.reward_functions = reward_functions
        self.random_inputs = random_inputs  # TODO: Allow for dependence between the random inputs.
        self.options = list(reward_functions.keys())  # A list, so it can index DataFrame columns
        self.data = pd.DataFrame(columns=list(self.random_inputs.keys()) + self.options + ["optimal_decision"])

    def add_data_points(self, n):
        """
        Samples n new scenarios, evaluates every option on each one, and appends the results to self.data.
        """
        new_data = pd.DataFrame({key: random_input(n) for key, random_input in self.random_inputs.items()})
        for key, reward_function in self.reward_functions.items():
            new_data[key] = new_data.apply(reward_function, axis=1)
        # Best achievable reward in each scenario, across all options
        new_data["optimal_decision"] = new_data[list(self.options)].max(axis=1)
        self.data = pd.concat([self.data, new_data], ignore_index=True)

    def _generate_single_metric(self, key):
        """
        Generates the expected_loss and std_deviation for a single option
        """
        distribution = list(self.data["optimal_decision"] - self.data[key])
        return {"expected_loss": np.mean(distribution),
                "std_dev": np.std(distribution)}
        
    def generate_metrics(self):
        """
        Generates the metrics table.
        """
        self.metrics = pd.DataFrame({
            key: self._generate_single_metric(key) for key in self.options})
        return self.metrics.T.sort_values("expected_loss", ascending=True)

# Trump Endorsement
The class above performs all the needed calculations. Let's now work through a simple real-life example. Suppose I believe that Trump will be elected next year with probability 50%. I can endorse his campaign for 2 million. If I don't endorse his campaign, my reward is zero. If I endorse him and he loses, I lose the 2 million, whereas if I endorse him and he wins, I get 3 million back, for an overall profit of 1 million. What is my expected opportunity loss for each option?

In [2]:
def reward_if_endorse_trump(obs):
    if obs["TrumpWin"]==1:
        return 1 # Get 1 million overall if Trump wins
    else:
        return -2 # Lose 2 million if Trump loses

TrumpEOLAnalysis = ExpectedOpportunityLoss(
    reward_functions = {
        "Endorse Trump": reward_if_endorse_trump,
        "Don't Endorse Trump": lambda obs: 0 # Known profit of 0 if no endorsement happens
    },
    random_inputs = {
        "TrumpWin": lambda n: np.random.binomial(size=n, n=1, p=0.5) # 1 if Trump wins, 0 otherwise
    }
)

TrumpEOLAnalysis.add_data_points(10000)
TrumpEOLAnalysis.generate_metrics()

                     expected_loss   std_dev
Don't Endorse Trump         0.5009  0.499999
Endorse Trump               0.9982  0.999998
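
Because this example has only two outcomes, the simulated values can be checked against a closed-form calculation. The sketch below assumes the win probability used in the simulation code (p = 0.5); `bernoulli_eol` and its argument names are our own illustrative choices and are not part of the class above.

```python
def bernoulli_eol(p_win, reward_win, reward_lose, reward_abstain=0.0):
    """Closed-form expected opportunity loss for a two-option, two-outcome decision.

    Hypothetical helper for illustration only.
    """
    # Best achievable reward in each outcome
    best_if_win = max(reward_win, reward_abstain)
    best_if_lose = max(reward_lose, reward_abstain)
    # Probability-weighted shortfall of each option against the best reward
    eol_act = p_win * (best_if_win - reward_win) + (1 - p_win) * (best_if_lose - reward_lose)
    eol_abstain = p_win * (best_if_win - reward_abstain) + (1 - p_win) * (best_if_lose - reward_abstain)
    return {"act": eol_act, "abstain": eol_abstain}


eol = bernoulli_eol(p_win=0.5, reward_win=1, reward_lose=-2)
print(eol)  # {'act': 1.0, 'abstain': 0.5}
```

These values agree with the simulated 0.9982 and 0.5009 up to sampling noise. The smaller of the two, 0.5 million for not endorsing, is also the most it would be worth paying to learn the election result in advance.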


# Distribution Center
Now let us try a much harder example. Say we manage a distribution center that receives a random number of packages each day. The number is binomially distributed: there is a large population of people, each of whom sends a package on a given day with probability p, and for the time being we assume that different people act independently.
We can hire any number of trucks for the distribution, and our delivery capacity is the number of trucks times the capacity of each truck.
Our profit is the number of packages delivered times the profit per package, minus the cost of truck maintenance, which is charged per truck.
For practical reasons, let's say we know we will never have more than 1000 trucks.
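
To make the opportunity-loss idea concrete for this setup, here is a minimal sketch for a single hypothetical day. The package count, profit, and maintenance figures are invented for illustration, and `daily_profit` is our own helper name; only the capacity of 1000 packages per truck and the range of 5 to 14 trucks follow the notebook's code.

```python
# Hypothetical single day: the first three numbers are made up for illustration.
packages = 10_400
profit_per_package = 3.0       # euros
maintenance_per_truck = 200.0  # euros
capacity_per_truck = 1000      # packages per truck per day


def daily_profit(trucks):
    """Profit for one day given a number of trucks (illustrative helper)."""
    delivered = min(packages, trucks * capacity_per_truck)
    return delivered * profit_per_package - trucks * maintenance_per_truck


profits = {trucks: daily_profit(trucks) for trucks in range(5, 15)}
best = max(profits.values())
# Opportunity loss of each option: shortfall against the best option for this day
losses = {trucks: best - profit for trucks, profit in profits.items()}
print(losses[10], losses[11], losses[14])  # 1000.0 0.0 600.0
```

On this particular day 11 trucks would have been ideal: 10 trucks leave 400 packages undelivered, while 14 trucks pay for capacity that is never used. The expected opportunity loss averages this shortfall over many simulated days.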

In [3]:
from functools import partial

truck_capacity = 1000 # A truck can carry 1000 packages a day
options_list = list(range(5, 15)) # We already have 5 trucks and can buy up to 9 more (5 to 14 in total)

def reward_function(obs, number_of_trucks):
    """
    Revenue is the number of packages delivered (capped by total truck capacity) times the profit per package.
    Costs are the maintenance cost per truck times the number of trucks.
    """
    delivered = min(obs["number_of_packages"], number_of_trucks * truck_capacity)
    return delivered * obs["profit_per_package"] - number_of_trucks * obs["maintenance_cost_per_truck"]

DistributionCenterAnalysis = ExpectedOpportunityLoss(
    reward_functions = {
        f"{key} trucks": partial(reward_function, number_of_trucks=key) for key in options_list
    },
    random_inputs = {
        "number_of_packages" : lambda n : np.random.binomial(size=n, n=10000000, p = 0.001), # There are 10 million people, each with a 0.1% chance of sending a package that day
        "profit_per_package" : lambda n: np.random.exponential(size=n, scale=3), # Profit per package is exponentially distributed, with a mean of 3 euros
        "maintenance_cost_per_truck" : lambda n : np.random.uniform(50,500, size=n) # Maintenance costs at least 50€ per truck (labor), plus extra if replacements are needed, up to 500€
    }
)

DistributionCenterAnalysis.add_data_points(10000)
DistributionCenterAnalysis.generate_metrics()

           expected_loss       std_dev
10 trucks     128.573143    360.450212
11 trucks     287.323942     372.28878
12 trucks     562.893613    446.899219
13 trucks     838.463285    542.699076
14 trucks    1114.032956    650.391467
9 trucks     2729.005534    2968.68826
8 trucks     5450.786369   5986.762011
7 trucks     8172.567204   9017.876337
6 trucks    10894.348039  12052.196179
5 trucks    13616.128874  15087.787668
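
As a rough cross-check of the table above, the same model can be sketched without the class, using NumPy broadcasting in place of row-wise `apply`. This assumes the same three input distributions as the cell above; the variable names and the fixed seed are our own choices, so the numbers will differ slightly from the table.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility
n = 10_000
truck_capacity = 1000
truck_options = np.arange(5, 15)  # 5 to 14 trucks

# One draw of each random input per simulated day
packages = rng.binomial(n=10_000_000, p=0.001, size=n)
profit_per_package = rng.exponential(scale=3, size=n)
maintenance = rng.uniform(50, 500, size=n)

# rewards[i, j] = profit on day i when running truck_options[j] trucks
delivered = np.minimum(packages[:, None], truck_options[None, :] * truck_capacity)
rewards = delivered * profit_per_package[:, None] - truck_options[None, :] * maintenance[:, None]

# Opportunity loss against the best option on each day, then averaged over days
losses = rewards.max(axis=1, keepdims=True) - rewards
expected_loss = dict(zip(truck_options.tolist(), losses.mean(axis=0)))
best_option = min(expected_loss, key=expected_loss.get)
print(best_option, expected_loss[best_option])
```

The asymmetry in the table shows up here as well: having too few trucks forfeits the profit on every undelivered package, while having too many only costs the extra maintenance.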
