ENH: support agent-specific product availability

- New availability field in agent_data, similar to product-specific demographics.
- Multiplies exponentiated utilities.
- Typically 0s or 1s, but can be other numbers to model known probabilities of availability that differ by demographic.
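
As a rough illustration of what multiplying exponentiated utilities by availability means in a plain logit market, the sketch below uses NumPy only. The function name `availability_probabilities` and the array layout (products in rows and agents in columns for utilities, agents in rows for availability) are assumptions made for this example, not pyblp's API.

```python
import numpy as np

def availability_probabilities(utilities, availability):
    """Logit choice probabilities with agent-specific availability (illustrative only).

    utilities: J x I array of V_ijt (products in rows, agents in columns).
    availability: I x J array of a_ijt, typically 0s and 1s.
    """
    # Subtract each agent's maximum utility (floored at zero) for numerical stability.
    reduction = np.clip(utilities.max(axis=0, keepdims=True), 0, None)
    exp_utilities = np.exp(utilities - reduction) * availability.T
    # The outside good has utility zero, so after rescaling it contributes exp(-reduction).
    denominator = np.exp(-reduction) + exp_utilities.sum(axis=0, keepdims=True)
    return exp_utilities / denominator

utilities = np.zeros((2, 2))             # two products, two agents, all V = 0
availability = np.array([[1.0, 1.0],     # agent 0 can buy both products
                         [1.0, 0.0]])    # agent 1 cannot buy product 1
shares = availability_probabilities(utilities, availability)
```

With these toy inputs, the agent who cannot buy product 1 splits its probability between product 0 and the outside good only.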
jeffgortmaker · Jun 27, 2023 · 5d4a71f
1 parent 0a592fd
commit 5d4a71f

Showing 10 changed files with 150 additions and 19 deletions.
diff --git a/README.rst b/README.rst
@@ -117,6 +117,7 @@ Features
 - Multiple equation GMM
 - Demographic interactions
 - Product-specific demographics
+- Consumer-specific product availability
 - Flexible micro moments that can match statistics based on survey data
 - Support for micro moments based on second choice data
 - Support for optimal micro moments that match micro data scores

diff --git a/docs/notation.rst b/docs/notation.rst
@@ -89,6 +89,7 @@ Symbol                                                 Dimensions
 :math:`\Gamma`                                         :math:`J_t \times J_t`              Another matrix used to decompose :math:`\eta` and :math:`\zeta` in market :math:`t`
 :math:`d`                                              :math:`I_t \times D`                Observed agent characteristics called demographics in market :math:`t`
 :math:`\nu`                                            :math:`I_t \times K_2`              Unobserved agent characteristics called integration nodes in market :math:`t`
+:math:`a`                                              :math:`I_t \times J_t`              Agent-specific product availability in market :math:`t`
 :math:`w`                                              :math:`I_t \times 1`                Integration weights in market :math:`t`
 :math:`\delta`                                         :math:`N \times 1`                  Mean utility
 :math:`\mu`                                            :math:`J_t \times I_t`              Agent-specific portion of utility in market :math:`t`

diff --git a/pyblp/economies/problem.py b/pyblp/economies/problem.py
@@ -1385,8 +1385,8 @@ class Problem(ProblemEconomy):
               same ID within a market.
 
         Along with ``market_ids`` and ``agent_ids``, the names of any additional fields can typically be used as
-        variables in ``agent_formulation``. The exception is the name ``'demographics'``, which is reserved for use by
-        :class:`Agents`.
+        variables in ``agent_formulation``. Exceptions are the names ``'demographics'`` and ``'availability'``, which
+        are reserved for use by :class:`Agents`.
 
         In addition to standard demographic variables :math:`d_{it}`, it is also possible to specify product-specific
         demographics :math:`d_{ijt}`. A typical example is geographic distance of agent :math:`i` from product
@@ -1397,6 +1397,25 @@ class Problem(ProblemEconomy):
         the market, as ordered in ``product_data``. The last index should be the number of products in the largest
         market, minus one. For markets with fewer products than this maximum number, latter columns will be ignored.
 
+        Finally, by default each agent :math:`i` in market :math:`t` faces the same choice set of products
+        :math:`j`, but it is possible to specify agent-specific availability :math:`a_{ijt}` in much the same way that
+        product-specific demographics are specified. To do so, the following field can be specified:
+
+            - **availability** : (`numeric, optional`) - Agent-specific product availability, :math:`a`. Choice
+              probabilities in :eq:`probabilities` are modified according to
+
+              .. math:: s_{ijt} = \frac{a_{ijt} \exp V_{ijt}}{1 + \sum_{k \in J_t} a_{ikt} \exp V_{ikt}},
+
+              and similarly for the nested logit model and consumer surplus calculations. By default, all
+              :math:`a_{ijt} = 1`. To have a product :math:`j` be unavailable to agent :math:`i`, set
+              :math:`a_{ijt} = 0`.
+
+              Agent-specific availability is specified in the same way that product-specific demographics are specified.
+              In ``agent_data``, one can include ``'availability0'``, ``'availability1'``, ``'availability2'``, and so
+              on, where the index corresponds to the order in which products appear within market in ``product_data``.
+              The last index should be the number of products in the largest market, minus one. For markets with fewer
+              products than this maximum number, latter columns will be ignored.
+
     integration : `Integration, optional`
         :class:`Integration` configuration for how to build nodes and weights for integration over agent choice
         probabilities, which will replace any ``nodes`` and ``weights`` fields in ``agent_data``. This configuration is
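
A minimal sketch of laying out the ``availability0``, ``availability1``, ... fields described in the docstring above, using a plain dictionary of NumPy arrays. The helper name `availability_columns` and the toy data are hypothetical; only the field naming convention comes from the docstring.

```python
import numpy as np

def availability_columns(availability):
    """Split an (agents x max products) array into 'availability0', 'availability1', ... fields."""
    availability = np.asarray(availability, dtype=float)
    return {f'availability{j}': availability[:, j] for j in range(availability.shape[1])}

# Three agents in one hypothetical market whose largest choice set has two products.
agent_fields = {
    'market_ids': np.array(['t1', 't1', 't1']),
    'weights': np.full(3, 1 / 3),
}
# Column j holds a_ij for the j-th product, in the order products appear in the market.
agent_fields.update(availability_columns([[1, 1], [1, 0], [0, 1]]))
```

Such a mapping would presumably still need any other fields the model requires (for example nodes or demographics) before being used as ``agent_data``.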

diff --git a/pyblp/economies/simulation.py b/pyblp/economies/simulation.py
@@ -187,6 +187,25 @@ class Simulation(Economy):
         the market, as ordered in ``product_data``. The last index should be the number of products in the largest
         market, minus one. For markets with fewer products than this maximum number, latter columns will be ignored.
 
+        Finally, by default each agent :math:`i` in market :math:`t` faces the same choice set of products
+        :math:`j`, but it is possible to specify agent-specific availability :math:`a_{ijt}` in much the same way that
+        product-specific demographics are specified. To do so, the following field can be specified:
+
+            - **availability** : (`numeric, optional`) - Agent-specific product availability, :math:`a`. Choice
+              probabilities in :eq:`probabilities` are modified according to
+
+              .. math:: s_{ijt} = \frac{a_{ijt} \exp V_{ijt}}{1 + \sum_{k \in J_t} a_{ikt} \exp V_{ikt}},
+
+              and similarly for the nested logit model and consumer surplus calculations. By default, all
+              :math:`a_{ijt} = 1`. To have a product :math:`j` be unavailable to agent :math:`i`, set
+              :math:`a_{ijt} = 0`.
+
+              Agent-specific availability is specified in the same way that product-specific demographics are specified.
+              In ``agent_data``, one can include ``'availability0'``, ``'availability1'``, ``'availability2'``, and so
+              on, where the index corresponds to the order in which products appear within market in ``product_data``.
+              The last index should be the number of products in the largest market, minus one. For markets with fewer
+              products than this maximum number, latter columns will be ignored.
+
     integration : `Integration, optional`
         :class:`Integration` configuration for how to build nodes and weights for integration over agent choice
         probabilities, which will replace any ``nodes`` and ``weights`` fields in ``agent_data``. This configuration is
@@ -459,12 +478,13 @@ def __init__(
                 if not isinstance(integration, Integration):
                     raise ValueError("integration must be None or an Integration instance.")
                 agent_market_ids, nodes, weights = integration._build_many(products.X2.shape[1], np.unique(market_ids))
-                agent_ids = None
+                agent_ids = availability = None
             elif agent_data is not None:
                 agent_market_ids = extract_matrix(agent_data, 'market_ids')
                 agent_ids = extract_matrix(agent_data, 'agent_ids')
                 nodes = extract_matrix(agent_data, 'nodes')
                 weights = extract_matrix(agent_data, 'weights')
+                availability = extract_matrix(agent_data, 'availability')
             else:
                 raise ValueError("At least one of agent_data or integration must be specified.")
 
@@ -473,7 +493,8 @@ def __init__(
                 'market_ids': (agent_market_ids, np.object_),
                 'agent_ids': (agent_ids, np.object_),
                 'nodes': (nodes, options.dtype),
-                'weights': (weights, options.dtype)
+                'weights': (weights, options.dtype),
+                'availability': (availability, options.dtype),
             }
             if agent_formulation is not None:
                 for name in sorted(agent_formulation._names - set(agent_mapping)):

diff --git a/pyblp/markets/economy_results_market.py b/pyblp/markets/economy_results_market.py
@@ -377,6 +377,10 @@ def safely_compute_consumer_surplus(
         exp_utilities = np.exp(utilities - utility_reduction)
         scale_weights = 1
 
+        # optionally adjust for agent-specific product availability
+        if self.agents.availability.size > 0:
+            exp_utilities *= self.agents.availability.T
+
         # eliminate any products from the choice set
         if eliminate_product_ids is not None:
             for j, product_id in enumerate(self.products.product_ids[:, product_ids_index]):
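
The availability adjustment in this hunk feeds into the log-sum term behind consumer surplus. Below is a hedged sketch for the plain logit case only, leaving out nesting and the division by the marginal utility of income; `log_inclusive_value` is a hypothetical name, not a pyblp function.

```python
import numpy as np

def log_inclusive_value(utilities, availability):
    """Compute log(1 + sum_j a_ij * exp(V_ij)) for each agent in a numerically stable way.

    utilities: J x I array of V_ijt; availability: I x J array of a_ijt.
    """
    # Rescale by each agent's maximum utility (floored at zero) before exponentiating.
    reduction = np.clip(utilities.max(axis=0, keepdims=True), 0, None)
    exp_utilities = np.exp(utilities - reduction) * availability.T
    # The outside good (utility zero) contributes exp(-reduction) after rescaling.
    inclusive = np.exp(-reduction) + exp_utilities.sum(axis=0, keepdims=True)
    return (np.log(inclusive) + reduction).ravel()

# Two products with V = 0; the second agent cannot buy the second product.
values = log_inclusive_value(np.zeros((2, 2)), np.array([[1.0, 1.0], [1.0, 0.0]]))
```

An agent with a smaller available choice set ends up with a smaller log-sum, all else equal.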

diff --git a/pyblp/markets/market.py b/pyblp/markets/market.py
@@ -76,7 +76,7 @@ def __init__(
             self.products = update_matrices(self.products, products_update_mapping)
 
         # fill missing columns of integration nodes (associated with zeros in sigma) with zeros and drop extra
-        #   product-specific demographic values for product indices not in this market
+        #   product-specific demographic/agent-specific product availability values for products not in this market
         agents_update_mapping: Dict[str, Tuple[Optional[Array], Any]] = {}
         if self.agents.nodes.shape[1] != economy.K2 and not parameters.nonzero_sigma_index.all():
             nodes = np.zeros((self.agents.shape[0], economy.K2), self.agents.nodes.dtype)
@@ -85,6 +85,9 @@ def __init__(
         if len(self.agents.demographics.shape) == 3:
             demographics = self.agents.demographics[..., :self.products.size]
             agents_update_mapping['demographics'] = (demographics, demographics.dtype)
+        if self.agents.availability.size > 0:
+            availability = self.agents.availability[..., :self.products.size]
+            agents_update_mapping['availability'] = (availability, availability.dtype)
         if agents_update_mapping:
             self.agents = update_matrices(self.agents, agents_update_mapping)
 
@@ -331,8 +334,8 @@ def compute_probabilities(
             self, delta: Array = None, mu: Optional[Array] = None, linear: bool = True, safe: bool = True,
             utility_reduction: Optional[Array] = None, numerator: Optional[Array] = None,
             eliminate_outside: bool = False, eliminate_product: Optional[int] = None,
-            eliminate_product_id: Optional[Any] = None, product_ids_index: Optional[int] = None) -> (
-            Tuple[Array, Optional[Array]]):
+            eliminate_product_id: Optional[Any] = None, product_ids_index: Optional[int] = None,
+            availability: Optional[Array] = None) -> Tuple[Array, Optional[Array]]:
         """Compute choice probabilities. By default, use unchanged delta and mu values. If linear is False, delta and mu
         must be specified and already be exponentiated. If safe is True, scale the logit equation by the exponential of
         negative the maximum utility for each agent, and if utility_reduction is specified, it should be values that
@@ -384,6 +387,12 @@ def compute_probabilities(
         if eliminate_outside:
             scale = 0
 
+        # optionally adjust for agent-specific product availability
+        if availability is None and self.agents.availability.size > 0:
+            availability = self.agents.availability
+        if availability is not None:
+            exp_utilities *= availability.T
+
         # optionally eliminate a product from the choice set
         if eliminate_product is not None:
             exp_utilities[eliminate_product] = 0
@@ -1023,11 +1032,21 @@ def compute_probabilities_by_parameter_tangent(
             # compute the tangent of marginal probabilities with respect to the parameter (re-scale for robustness)
             utility_reduction = np.clip(utilities.max(axis=0, keepdims=True), 0, None)
             with np.errstate(divide='ignore', invalid='ignore'):
+                exp_utilities = np.exp(utilities - utility_reduction)
+
+                # handle agent-specific product availability
+                if self.agents.availability.size > 0:
+                    availability = self.agents.availability
+                    if agent_indices is not None:
+                        availability = availability[agent_indices]
+                    exp_utilities *= availability.T
+
                 B = marginals * (
                     A_sums * (1 - self.group_rho) -
-                    (np.log(self.groups.sum(np.exp(utilities - utility_reduction))) + utility_reduction)
+                    (np.log(self.groups.sum(exp_utilities)) + utility_reduction)
                 )
                 marginals_tangent = group_associations * B - marginals * (group_associations.T @ B)
+
             marginals_tangent[~np.isfinite(marginals_tangent)] = 0
 
         else:
@@ -1421,8 +1440,11 @@ def compute_micro_dataset_contributions(
             delta = self.delta
 
         mu = None
+        availability = None
         if agent_indices is not None:
             mu = self.mu[:, agent_indices]
+            if self.agents.availability.size > 0:
+                availability = self.agents.availability[agent_indices]
 
         # pre-compute and validate micro dataset weights, multiplying these with probabilities and using these to
         #   compute micro value denominators
@@ -1454,7 +1476,7 @@ def compute_micro_dataset_contributions(
 
             # pre-compute probabilities
             if probabilities is None:
-                probabilities, _ = self.compute_probabilities(delta, mu)
+                probabilities, _ = self.compute_probabilities(delta, mu, availability=availability)
 
             # pre-compute outside probabilities
             need_outside_probabilities = len(weights.shape) == 2 and weights.shape[1] == 1 + self.J
@@ -1485,7 +1507,7 @@ def compute_micro_dataset_contributions(
                         # re-compute probabilities if there is nesting or there was a numerical error
                         if eliminated_probabilities_j is None or not np.isfinite(eliminated_probabilities_j).all():
                             eliminated_probabilities_j, eliminated_conditionals_j = self.compute_probabilities(
-                                delta, mu, eliminate_product=j
+                                delta, mu, eliminate_product=j, availability=availability
                             )
 
                         eliminated_probabilities_list.append(eliminated_probabilities_j)
@@ -1510,7 +1532,8 @@ def compute_micro_dataset_contributions(
                         # re-compute probabilities if there is nesting or there was a numerical error
                         if eliminated_probabilities_j is None or not np.isfinite(eliminated_probabilities_j).all():
                             eliminated_probabilities_j, eliminated_conditionals_j = self.compute_probabilities(
-                                delta, mu, eliminate_product_id=product_id, product_ids_index=ids_index
+                                delta, mu, eliminate_product_id=product_id, product_ids_index=ids_index,
+                                availability=availability
                             )
 
                         eliminated_probabilities_list.append(eliminated_probabilities_j)
@@ -1602,7 +1625,7 @@ def compute_micro_dataset_contributions(
                 # re-compute probabilities if there is nesting or there was a numerical error
                 if outside_eliminated_probabilities is None or not np.isfinite(outside_eliminated_probabilities).all():
                     outside_eliminated_probabilities, outside_eliminated_conditionals = self.compute_probabilities(
-                        delta, mu, eliminate_outside=True
+                        delta, mu, eliminate_outside=True, availability=availability
                     )
 
                 if compute_jacobians: