# Problem Simulation Tutorial

In [10]:
import pyblp
import numpy as np
import pandas as pd

pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.__version__

'1.1.2'

Before configuring and solving a problem with real data, it may be a good idea to perform Monte Carlo analysis on simulated data to verify that it is possible to accurately estimate model parameters. For example, before configuring and solving the example problems in the prior tutorials, it may have been a good idea to simulate data according to the assumed models of supply and demand. During such Monte Carlo analysis, the real data would only be used to determine sample sizes and perhaps to choose reasonable true parameters.
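To make the idea concrete, here is a minimal sketch of a Monte Carlo loop. It does not use pyblp: it assumes a simple linear model estimated by ordinary least squares, and the function name `monte_carlo_ols` and all parameter values are hypothetical choices for illustration. The pattern (fix true parameters, simulate, estimate, compare) is the same one this tutorial follows with :class:`Simulation`.

```python
import numpy as np

def monte_carlo_ols(true_beta, n_obs, n_reps, seed=0):
    """Simulate y = X @ beta + noise repeatedly and return the OLS estimates."""
    rng = np.random.default_rng(seed)
    true_beta = np.asarray(true_beta, dtype=float)
    estimates = np.empty((n_reps, true_beta.size))
    for r in range(n_reps):
        # Draw a constant plus uniform regressors and a normal unobservable.
        X = np.c_[np.ones(n_obs), rng.uniform(size=(n_obs, true_beta.size - 1))]
        y = X @ true_beta + rng.normal(scale=0.5, size=n_obs)
        # Estimate the parameters as if the truth were unknown.
        estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    return estimates

true_beta = [1.0, -2.0]  # hypothetical true parameters
estimates = monte_carlo_ols(true_beta, n_obs=500, n_reps=200)
bias = estimates.mean(axis=0) - np.asarray(true_beta)
spread = estimates.std(axis=0)
print("bias:", bias)
print("std. dev.:", spread)
```

If the bias is small relative to the spread at the sample size we expect to have, that is some evidence the parameters can be recovered in practice.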

Simulations are configured with the :class:`Simulation` class, which requires many of the same inputs as :class:`Problem`. The two main differences are:

1. Variables in formulations that cannot be loaded from `product_data` or `agent_data` will be drawn from independent uniform distributions.
2. True parameters and the distribution of product unobservables are specified.

First, we'll use :func:`build_id_data` to build market and firm IDs for a model in which there are $T = 600$ markets, and in each market $t$, a total of $J_t = 4$ products produced by $F = 4$ firms.

In [11]:
id_data = pyblp.build_id_data(T=600, J=4, F=4)
id_data

rec.array([([0], [0]), ([0], [1]), ([0], [2]), ..., ([599], [1]),
           ([599], [2]), ([599], [3])],
          dtype=[('market_ids', 'O', (1,)), ('firm_ids', 'O', (1,))])
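For intuition, the ID structure above can be reproduced with plain NumPy. This is only a sketch: the rule used here to assign products to firms (in order, evenly across firms) is an assumption that happens to match the printed output, not a statement about how :func:`build_id_data` is implemented.

```python
import numpy as np

T, J, F = 600, 4, 4  # markets, products per market, firms

# One row per product-market pair: each market ID is repeated J times.
market_ids = np.repeat(np.arange(T), J)
# Assumed assignment rule: products are split across firms in order.
firm_ids = np.tile(np.arange(J), T) * F // J

print(market_ids[:5], firm_ids[:5])
print(market_ids.size)
```

The total number of rows, $N = T \times J = 2400$, is the `N` that appears in the simulation's dimensions table.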

Next, we'll create an :class:`Integration` configuration to build agent data according to a Gauss-Hermite product rule that exactly integrates polynomials of degree $2 \times 9 - 1 = 17$ or less.

In [12]:
integration = pyblp.Integration('product', 9)
integration

Configured to construct nodes and weights according to the level-9 Gauss-Hermite product rule with options {}.
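The exactness claim can be checked numerically. The sketch below uses NumPy's `hermegauss` (probabilists' Gauss-Hermite nodes and weights) rather than pyblp itself, so it illustrates the rule, not the library's internals. With 9 nodes, the 16th moment of a standard normal is integrated exactly, the 18th is not, and the two-dimensional product rule has $9^2 = 81$ nodes per market.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

level = 9
nodes, weights = hermegauss(level)
# Normalize so the weights integrate against the standard normal density.
weights = weights / np.sqrt(2 * np.pi)

def moment(power):
    """Approximate E[x**power] for a standard normal with the 9-node rule."""
    return np.sum(weights * nodes**power)

exact_16 = 2027025.0   # E[x^16] = 15!! (degree 16 <= 17, so exact)
exact_18 = 34459425.0  # E[x^18] = 17!! (degree 18 > 17, so not exact)
print(moment(16) - exact_16)
print(moment(18) - exact_18)

# Two-dimensional product rule: every pair of nodes, with multiplied weights.
grid = np.array([(a, b) for a in nodes for b in nodes])
grid_weights = np.array([wa * wb for wa in weights for wb in weights])
print(grid.shape, grid_weights.sum())
```

With 600 markets and 81 nodes per market, this gives the $I = 600 \times 81 = 48600$ agents reported in the simulation's dimensions table.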

We'll then pass these data to :class:`Simulation`. We'll use :class:`Formulation` configurations to create an $X_1$ that consists of prices and three exogenous characteristics (`x`, `satellite`, and `wired`) with no constant; an $X_2$ that consists of `satellite` and `wired`; and an $X_3$ that consists of a constant and a cost-shifter `w`.

In [13]:
simulation = pyblp.Simulation(
   product_formulations=(
       pyblp.Formulation('0 + prices + x + satellite + wired'),
       pyblp.Formulation('0 + satellite + wired'),
       pyblp.Formulation('1 + w')
   ),
   beta=[-2, 1, 4, 4],
   sigma=np.eye(2),
   gamma=[1/2, 1/4],
   product_data=id_data,
   integration=integration,
   seed=1995
)
simulation

Dimensions:
 T    N     F     I     K1    K2    K3 
---  ----  ---  -----  ----  ----  ----
600  2400   4   48600   4     2     2  

Formulations:
        Column Indices:              0        1        2        3  
-------------------------------  ---------  -----  ---------  -----
  X1: Linear Characteristics      prices      x    satellite  wired
 X2: Nonlinear Characteristics   satellite  wired                  
X3: Linear Cost Characteristics      1        w                    

Nonlinear Coefficient True Values:
 Sigma:    satellite   wired  
---------  ---------  --------
satellite  +1.0E+00           
  wired    +0.0E+00   +1.0E+00

Beta True Values:
 prices      x      satellite   wired  
--------  --------  ---------  --------
-2.0E+00  +1.0E+00  +4.0E+00   +4.0E+00

Gamma True Values:
   1         w    
--------  --------
+5.0E-01  +2.5E-01

When :class:`Simulation` is initialized, it constructs :attr:`Simulation.agent_data` and simulates :attr:`Simulation.product_data`.

The :class:`Simulation` can be further configured with other arguments that determine how product unobservables are simulated and how marginal costs are specified.

At this stage, simulated variables are not consistent with true parameters, so we still need to solve the simulation with :meth:`Simulation.replace_endogenous`. This method replaces simulated prices and market shares with values that are consistent with the true parameters. Just like :meth:`ProblemResults.compute_prices`, to do so it iterates over the $\zeta$-markup equation from :ref:`references:Morrow and Skerlos (2011)`.

In [14]:
simulation_results = simulation.replace_endogenous()
simulation_results

Simulation Results Summary:
Computation  Fixed Point  Fixed Point  Contraction  Profit Gradients  Profit Hessians  Profit Hessians
   Time       Failures    Iterations   Evaluations      Max Norm      Min Eigenvalue   Max Eigenvalue 
-----------  -----------  -----------  -----------  ----------------  ---------------  ---------------
 00:00:02         0          15713        15713         +4.5E-13         -1.5E+00         -1.2E-03    
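To give a feel for the fixed point iteration, here is a minimal sketch of the $\zeta$-markup iteration $p \leftarrow c + \zeta(p)$ for a single market. It makes a strong simplifying assumption: demand is plain logit with no random coefficients, so $\partial s / \partial p = \Lambda - \Gamma$ with $\Lambda = \text{diag}(\alpha s)$ and $\Gamma = \alpha s s'$. The function names and all numbers are hypothetical, and this is not pyblp's implementation.

```python
import numpy as np

def logit_shares(prices, xb, alpha):
    """Plain logit shares with an outside good whose utility is zero."""
    exp_u = np.exp(xb + alpha * prices)
    return exp_u / (1.0 + exp_u.sum())

def solve_zeta_prices(costs, xb, alpha, firm_ids, tol=1e-10, max_iter=1000):
    """Iterate p <- c + zeta(p) under plain logit demand in one market."""
    ownership = (firm_ids[:, None] == firm_ids[None, :]).astype(float)
    prices = costs.copy()
    for iteration in range(max_iter):
        s = logit_shares(prices, xb, alpha)
        lam_inv = 1.0 / (alpha * s)     # diagonal of Lambda^{-1}
        gamma = alpha * np.outer(s, s)  # Gamma
        # zeta = Lambda^{-1} (O * Gamma)' (p - c) - Lambda^{-1} s
        zeta = lam_inv * ((ownership * gamma).T @ (prices - costs)) - lam_inv * s
        new_prices = costs + zeta
        if np.max(np.abs(new_prices - prices)) < tol:
            return new_prices, iteration + 1
        prices = new_prices
    raise RuntimeError("fixed point iteration did not converge")

# Hypothetical market: four products, the first two owned by the same firm.
costs = np.array([1.0, 1.2, 0.9, 1.1])
xb = np.array([3.0, 3.5, 2.5, 3.0])
alpha = -2.0
firm_ids = np.array([0, 0, 1, 2])
prices, iterations = solve_zeta_prices(costs, xb, alpha, firm_ids)

# Check the first-order conditions s + (O * ds/dp)' (p - c) = 0.
s = logit_shares(prices, xb, alpha)
jacobian = np.diag(alpha * s) - alpha * np.outer(s, s)
ownership = (firm_ids[:, None] == firm_ids[None, :]).astype(float)
foc = s + (ownership * jacobian).T @ (prices - costs)
print(prices - costs, iterations, np.max(np.abs(foc)))
```

Under plain logit, products owned by the same firm end up with equal markups, which is a convenient sanity check on the sketch.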

Now, we can try to recover the true parameters by creating and solving a :class:`Problem`. 

The convenience method :meth:`SimulationResults.to_problem` constructs some basic "sums of characteristics" BLP instruments that are functions of all exogenous numerical variables in the problem. In this example, excluded demand-side instruments are the cost-shifter `w` and traditional BLP instruments constructed from `x`, `satellite`, and `wired`. Excluded supply-side instruments are constructed in the same way from the supply-side variables (here, `w`).
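As a rough illustration of what "sums of characteristics" means, the sketch below computes, for each product, the sum of a characteristic over the other products of the same firm and over rival products in the same market. The helper `sums_of_characteristics` and the toy data are hypothetical. It builds dense $N \times N$ masks, so it is only suitable for small examples.

```python
import numpy as np

def sums_of_characteristics(x, market_ids, firm_ids):
    """Return (same-firm sums, rival sums) of x over other products per market."""
    x = np.asarray(x, dtype=float)
    same_market = market_ids[:, None] == market_ids[None, :]
    same_firm = firm_ids[:, None] == firm_ids[None, :]
    not_self = ~np.eye(x.size, dtype=bool)
    # Other products sold by the same firm in the same market.
    same_firm_sum = (same_market & same_firm & not_self).astype(float) @ x
    # Products sold by rival firms in the same market.
    rival_sum = (same_market & ~same_firm).astype(float) @ x
    return same_firm_sum, rival_sum

# Toy data: two markets, with firm 0 selling two products in market 0.
market_ids = np.array([0, 0, 0, 1, 1])
firm_ids = np.array([0, 0, 1, 0, 1])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
same_firm_sum, rival_sum = sums_of_characteristics(x, market_ids, firm_ids)
print(same_firm_sum)
print(rival_sum)
```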

In [15]:
problem = simulation_results.to_problem()

In [None]:
# Rebuild the default demand-side instruments by hand to compare them with problem.products.ZD.
# Note: this cell relies on private pyblp attributes (_names, _max_J, _product_market_indices).
X1_names = sorted(simulation.product_formulations[0]._names - {'prices'})
# Check whether the number of products varies across markets.
J_variation = any(indices.size < simulation._max_J for indices in simulation._product_market_indices.values())
demand_labels_full = [f'{name}_same_firm_sum' for name in X1_names] + [f'{name}_rival_sum' for name in X1_names]
manual_demand = np.zeros((simulation.N, 0), pyblp.options.dtype)
demand_labels = []
if X1_names:
    # Build BLP instruments from the exogenous X1 variables (plus a constant if J varies).
    demand_formula = ' + '.join(['1' if J_variation else '0'] + X1_names)
    demand_raw = pyblp.build_blp_instruments(pyblp.Formulation(demand_formula), simulation.product_data)
    # Drop columns that are constant across all products.
    demand_mask = (demand_raw != demand_raw[0]).any(axis=0)
    manual_demand = demand_raw[:, demand_mask]
    demand_labels = [label for label, keep in zip(demand_labels_full, demand_mask) if keep]
manual_ZD = manual_demand
supply_shifter_labels = []
if simulation.product_formulations[2] is not None:
    # Supply shifters are X3 variables that do not also appear in X1.
    X3_names = simulation.product_formulations[2]._names - {'shares'}
    supply_shifter_names = sorted(X3_names - set(X1_names))
    if supply_shifter_names:
        supply_shifter_formula = ' + '.join(['0'] + supply_shifter_names)
        supply_shifters = pyblp.build_matrix(pyblp.Formulation(supply_shifter_formula), simulation.product_data)
        manual_ZD = np.c_[manual_ZD, supply_shifters]
        supply_shifter_labels = [f'{name}_supply_shifter' for name in supply_shifter_names]
manual_ZD = manual_ZD.astype(pyblp.options.dtype, copy=False)
instrument_labels = demand_labels + supply_shifter_labels
# Show the first rows of the instruments that to_problem() constructed.
pd.DataFrame(problem.products.ZD, columns=instrument_labels).head()

   satellite_same_firm_sum  wired_same_firm_sum  x_same_firm_sum  satellite_rival_sum  wired_rival_sum  x_rival_sum  w_supply_shifter
0                  1.33027              1.20997         1.409702             0.575855         0.752982     0.619867          0.903278
1                 1.923801             1.980612         1.885988             0.026764         0.276697     0.026336          0.132635
2                 1.502239             1.145213         1.376928             0.190481         0.785756     0.447898          0.968035
3                 1.094101             2.003948         1.815435             0.398059         0.347249     0.856036            0.1093
4                 0.810499             1.781592         2.232317             0.603672         0.328896     0.675464          0.263395


In [None]:
pd.DataFrame(manual_ZD, columns=instrument_labels).head()

ValueError: Shape of passed values is (2400, 4), indices imply (2400, 7)

In [None]:
np.allclose(manual_ZD, problem.products.ZD), np.abs(manual_ZD - problem.products.ZD).max()

We'll choose starting values that are half the true parameters so that the optimization routine has to do some work. Note that since we're jointly estimating the supply side, we need to provide an initial value for the linear coefficient on prices because this parameter cannot be concentrated out of the problem (unlike linear coefficients on exogenous characteristics).

In [15]:
results = problem.solve(
    sigma=0.5 * simulation.sigma, 
    beta=[0.5 * simulation.beta[0, 0], None, None, None],
    optimization=pyblp.Optimization('l-bfgs-b', {'gtol': 1e-5})
)
results

Problem Results Summary:
GMM   Objective    Projected    Reduced Hessian  Reduced Hessian  Clipped  Weighting Matrix  Covariance Matrix
Step    Value    Gradient Norm  Min Eigenvalue   Max Eigenvalue   Shares   Condition Number  Condition Number 
----  ---------  -------------  ---------------  ---------------  -------  ----------------  -----------------
 2    +1.4E+01     +1.3E-05        +2.1E+01         +8.5E+02         0         +1.2E+03          +4.0E+03     

Cumulative Statistics:
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:02:09       Yes          19           26         160757       500504   

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
 Sigma:    satellite     wired   
---------  ----------  ----------
satellite   +8.3E-01             
           (+2.7E-01)            
     

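The parameters seem to have been estimated reasonably well. As the comparisons below show, the estimates of $\beta$ and $\gamma$ are within about 6 percent of their true values, while the diagonal elements of $\Sigma$ are underestimated by roughly 16 to 17 percent.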

In [16]:
np.c_[simulation.beta, results.beta]

array([[-2.        , -1.93527134],
       [ 1.        ,  0.9462575 ],
       [ 4.        ,  3.90220011],
       [ 4.        ,  3.82957091]])

In [17]:
np.c_[simulation.gamma, results.gamma]

array([[0.5       , 0.4704242 ],
       [0.25      , 0.26426238]])

In [18]:
np.c_[simulation.sigma, results.sigma]

array([[1.        , 0.        , 0.82865619, 0.        ],
       [0.        , 1.        , 0.        , 0.8427668 ]])
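To put a number on "reasonably well", one option is to compute relative errors. The sketch below hard-codes the true values and estimates copied from the outputs above, so it runs without re-solving the problem. The dictionary layout is just one convenient, hypothetical way to organize the comparison.

```python
import numpy as np

# True values and estimates copied from the outputs above.
comparisons = {
    'beta': (np.array([-2.0, 1.0, 4.0, 4.0]),
             np.array([-1.93527134, 0.9462575, 3.90220011, 3.82957091])),
    'gamma': (np.array([0.5, 0.25]),
              np.array([0.4704242, 0.26426238])),
    'sigma diagonal': (np.array([1.0, 1.0]),
                       np.array([0.82865619, 0.8427668])),
}

relative_errors = {}
for name, (truth, estimate) in comparisons.items():
    # Signed error as a fraction of the magnitude of the true value.
    relative_errors[name] = (estimate - truth) / np.abs(truth)
    print(name, np.round(100 * relative_errors[name], 1))
```

A single simulated dataset only gives one draw of the estimator, so repeating the exercise over several seeds would give a better sense of bias and variance.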

In addition to checking that the configuration for a model based on actual data makes sense, the :class:`Simulation` class can also be a helpful tool for better understanding under what general conditions BLP models can be accurately estimated. Simulations are also used extensively in pyblp's test suite.