# Simulating Problems

In [1]:
import pyblp
import numpy as np

pyblp.options.digits = 3
pyblp.options.verbose = False
np.set_printoptions(precision=2, threshold=10, linewidth=100)
pyblp.__version__

'0.5.0'

Before configuring and solving a problem with real data, papers such as :ref:`references:Armstrong (2016)` recommend performing Monte Carlo analysis on simulated data to verify that model parameters can be estimated accurately. For example, before configuring and solving the automobile problem above, it may have been a good idea to simulate data according to the assumed models of supply and demand. During such Monte Carlo analysis, the real data would be used only to determine sample sizes and perhaps to choose true parameters that are within reason.

Simulations are configured with the :class:`Simulation` class, which requires many of the same inputs as the :class:`Problem` class. The two main differences are:

1. Variables in formulations that cannot be loaded from `product_data` or `agent_data` will be drawn from independent uniform distributions.
2. Both the true parameters and the distribution of unobserved product characteristics are specified.

First, we'll use :func:`build_id_data` to build market and firm IDs for a model in which there are $T = 50$ markets, and in each market $t$, $J_t = 20$ products produced by $F_t = 10$ firms.

In [2]:
id_data = pyblp.build_id_data(T=50, J=20, F=10)
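
To make the layout of these IDs concrete, here is a minimal NumPy sketch of one way to build them. The helper `sketch_id_data` is hypothetical and is not pyblp's implementation; it only assumes that products are spread evenly over firms within each market, which is consistent with the firm IDs printed later in this section.

```python
import numpy as np


def sketch_id_data(T, J, F):
    """Hypothetical helper: T markets, J products per market, spread evenly over F firms."""
    # Each market ID is repeated once per product in that market.
    market_ids = np.repeat(np.arange(T), J)
    # Within a market, product j is assigned to firm floor(j * F / J).
    firm_ids = np.tile(np.arange(J) * F // J, T)
    return market_ids, firm_ids


market_ids, firm_ids = sketch_id_data(T=50, J=20, F=10)
print(market_ids.size)  # number of product-market observations
print(firm_ids[:4])     # firm IDs of the first few products in the first market
```

With $T = 50$ and $J_t = 20$, this gives $N = 1000$ observations, and each of the 10 firms produces two products per market.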

Next, we'll configure :class:`Integration` to build agent data according to a level-`5` Gauss-Hermite product rule.

In [3]:
integration = pyblp.Integration('product', 5)
integration

Configured to construct nodes and weights according to the level-5 Gauss-Hermite product rule.
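
As a rough illustration of what a Gauss-Hermite product rule produces, the sketch below builds nodes and weights with NumPy alone. It is not pyblp's code: the function name `gauss_hermite_product` is hypothetical, and it assumes the rule integrates against a standard normal density, so the one-dimensional weights are normalized to sum to one.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gauss_hermite_product(dimensions, level):
    """Hypothetical helper: product rule for integrals against a standard normal density."""
    # One-dimensional nodes and weights for the weight function exp(-x^2 / 2).
    nodes_1d, weights_1d = hermegauss(level)
    # Normalize so that the weights sum to one, as for a probability density.
    weights_1d = weights_1d / np.sqrt(2 * np.pi)
    # The product rule takes every combination of one-dimensional nodes.
    nodes = np.array(list(itertools.product(nodes_1d, repeat=dimensions)))
    weights = np.array([np.prod(w) for w in itertools.product(weights_1d, repeat=dimensions)])
    return nodes, weights


nodes, weights = gauss_hermite_product(dimensions=1, level=5)
print(np.round(nodes[:, 0], 2))
print(np.round(weights, 2))
```

In one dimension the level-5 rule has five nodes per market. In $d$ dimensions the product rule has $5^d$ nodes, which is why product rules become expensive as the number of random coefficients grows.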

We'll then pass these data and configuration to :class:`Simulation`.

In [4]:
simulation = pyblp.Simulation(
   product_formulations=(
       pyblp.Formulation('0 + prices + x + y'),
       pyblp.Formulation('0 + y'),
       pyblp.Formulation('0 + x + z')
   ),
   beta=[-5, 1, 1],
   sigma=0.5,
   gamma=[2, 2],
   product_data=id_data,
   agent_formulation=pyblp.Formulation('0 + d'),
   pi=3,
   integration=integration,
   seed=0
)
simulation

Dimensions:
 N     T    K1    K2    K3    D    MD    MS 
----  ---  ----  ----  ----  ---  ----  ----
1000  50    3     1     2     1    7     7  

Formulations:
       Column Indices:           0       1      2  
-----------------------------  ------  -----  -----
 X1: Linear Characteristics    prices    x      y  
X2: Nonlinear Characteristics    y                 
  X3: Cost Characteristics       x       z         
       d: Demographics           d                 

Linear Parameters:
Beta:     prices          x            y     
------  -----------  -----------  -----------
         -5.00E+00    +1.00E+00    +1.00E+00 
Gamma:       x            z                  
------  -----------  -----------             
         +2.00E+00    +2.00E+00              

Nonlinear Parameters:
Sigma:       y       |   Pi:         d     
------  -----------  |  ------  -----------
  y      +5.00E-01   |    y      +3.00E+00 

When :class:`Simulation` is initialized, it constructs agent data and simulates all product data except for prices and shares.

In [5]:
simulation.product_data

rec.array([([0], [0], [0.], [0.], [8.12e-01, 7.15e-01, 1.01e-02, 1.04e+01, 7.78e+00], [5.93e-01, 7.15e-01, 4.76e-01, 1.04e+01, 6.73e+00], [0.55], [0.59], [0.81]),
           ([0], [0], [0.], [0.], [4.76e-01, 5.49e-01, 5.93e-01, 1.04e+01, 7.78e+00], [1.01e-02, 5.49e-01, 8.12e-01, 1.04e+01, 6.73e+00], [0.72], [0.01], [0.48]),
           ([0], [1], [0.], [0.], [5.23e-01, 5.45e-01, 7.09e-01, 1.05e+01, 7.20e+00], [4.76e-01, 5.45e-01, 2.51e-01, 1.05e+01, 7.25e+00], [0.6 ], [0.48], [0.52]),
           ...,
           ([49], [8], [0.], [0.], [3.24e-01, 5.15e-01, 7.21e-01, 9.93e+00, 1.01e+01], [4.80e-01, 5.15e-01, 5.46e-01, 9.93e+00, 8.93e+00], [0.94], [0.48], [0.32]),
           ([49], [9], [0.], [0.], [8.14e-01, 6.77e-01, 5.02e-01, 1.05e+01, 1.01e+01], [6.44e-01, 6.77e-01, 6.97e-01, 1.05e+01, 8.29e+00], [0.23], [0.64], [0.81]),
           ([49], [9], [0.], [0.], [6.97e-01, 2.29e-01, 6.44e-01, 1.05e+01, 1.01e+01], [5.02e-01, 2.29e-01, 8.14e-01, 1.05e+01, 8.29e+00], [0.68], [0.5 ], [0.7 ])],
  

In [6]:
simulation.agent_data

rec.array([([0], [-2.86], [0.01], [0.41]), ([0], [-1.36], [0.22], [0.63]),
           ([0], [ 0.  ], [0.53], [0.78]), ..., ([49], [ 0.  ], [0.53], [0.94]),
           ([49], [ 1.36], [0.22], [0.93]), ([49], [ 2.86], [0.01], [0.01])],
          dtype=[('market_ids', 'O', (1,)), ('nodes', '<f8', (1,)), ('weights', '<f8', (1,)), ('d', '<f8', (1,))])

The excluded instruments in :attr:`Simulation.product_data` include basic instruments computed with :func:`build_blp_instruments` that are functions of all exogenous numerical variables in the problem. In this example, excluded demand-side instruments are four columns of traditional BLP instruments based on `x` and `y`, along with the cost-shifter `z`. Excluded supply-side instruments are traditional BLP instruments based on `x` and `z`, along with `y`.
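
The idea behind traditional BLP instruments can be sketched in a few lines of NumPy. The function below is a hypothetical illustration, not :func:`build_blp_instruments` itself, and it assumes the common definition: for each characteristic, one column sums it over the same firm's other products in the market and another sums it over rival firms' products in the market. The details and column ordering in pyblp may differ.

```python
import numpy as np


def sketch_blp_instruments(X, market_ids, firm_ids):
    """Hypothetical helper: sums of characteristics over other own-firm and rival products."""
    X = np.asarray(X, dtype=float)
    market_ids = np.asarray(market_ids)
    firm_ids = np.asarray(firm_ids)
    # Pairwise indicators for products that share a market or a firm within a market.
    same_market = market_ids[:, None] == market_ids[None, :]
    same_firm = same_market & (firm_ids[:, None] == firm_ids[None, :])
    # Exclude each product itself from its own-firm sum.
    other = same_firm & ~np.eye(len(X), dtype=bool)
    rival = same_market & ~same_firm
    return np.hstack([other.astype(float) @ X, rival.astype(float) @ X])


# Toy data: two markets, two firms, two characteristics.
market_ids = np.array([0, 0, 0, 1, 1])
firm_ids = np.array([0, 0, 1, 0, 1])
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
Z = sketch_blp_instruments(X, market_ids, firm_ids)
print(Z.shape)  # (5, 4)
```

Two characteristics yield four instrument columns, which matches the count of traditional BLP instruments on each side of the example above.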

The :class:`Simulation` can be further configured with other arguments that determine how unobserved product characteristics are simulated and how marginal costs are specified.

Since prices and shares are all zeros at this stage, we still need to solve the simulation with :meth:`Simulation.solve`, which computes synthetic prices and shares. Just like :meth:`ProblemResults.compute_prices`, this method iterates over the $\zeta$-markup equation from :ref:`references:Morrow and Skerlos (2011)`.

In [7]:
simulation_results = simulation.solve()
simulation_results

Simulation Results Summary:
Computation  Fixed Point  Contraction
   Time      Iterations   Evaluations
-----------  -----------  -----------
  0:00:00        732          732    
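
To give a feel for the kind of fixed point iteration involved, here is a minimal sketch of the $\zeta$-markup iteration $p \leftarrow c + \zeta(p)$ for a single market. It makes strong simplifying assumptions that do not hold in the simulation above: demand is plain logit with no random coefficients, and the function names, parameter values, and tolerance are all hypothetical. It is not pyblp's implementation.

```python
import numpy as np


def logit_shares(prices, x, alpha):
    """Plain logit shares in one market; the outside good has utility zero."""
    exp_utilities = np.exp(x + alpha * prices)
    return exp_utilities / (1 + exp_utilities.sum())


def solve_zeta_prices(costs, x, alpha, firm_ids, tol=1e-12, max_iterations=1000):
    """Hypothetical helper: iterate p <- c + zeta(p) until prices stop changing."""
    ownership = (firm_ids[:, None] == firm_ids[None, :]).astype(float)
    prices = costs.copy()
    for iteration in range(1, max_iterations + 1):
        shares = logit_shares(prices, x, alpha)
        # For logit, ds/dp = Lambda - Gamma with Lambda = diag(alpha * s), Gamma = alpha * s s'.
        capital_lambda = alpha * shares  # diagonal of Lambda, stored as a vector
        capital_gamma = alpha * np.outer(shares, shares)
        # zeta = Lambda^{-1} (O * Gamma)' (p - c) - Lambda^{-1} s
        zeta = ((ownership * capital_gamma).T @ (prices - costs) - shares) / capital_lambda
        new_prices = costs + zeta
        if np.max(np.abs(new_prices - prices)) < tol:
            return new_prices, iteration
        prices = new_prices
    raise RuntimeError("the fixed point iteration did not converge")


# Toy market: four products, two firms (all values are made up for illustration).
alpha = -2.0
x = np.array([1.0, 0.5, 0.8, 0.2])
costs = np.array([1.0, 1.2, 0.9, 1.1])
firm_ids = np.array([0, 0, 1, 1])
prices, iterations = solve_zeta_prices(costs, x, alpha, firm_ids)

# Check the firms' first-order conditions at the solution.
shares = logit_shares(prices, x, alpha)
jacobian = np.diag(alpha * shares) - alpha * np.outer(shares, shares)
ownership = (firm_ids[:, None] == firm_ids[None, :]).astype(float)
foc = shares + (ownership * jacobian.T) @ (prices - costs)
print(iterations, np.max(np.abs(foc)))
```

In this logit special case every product of a firm ends up with the same markup, which makes the result easy to sanity check. With random coefficients the matrices $\Lambda$ and $\Gamma$ are integrals over agents, but the structure of the iteration is the same.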

Now, we can try to recover the true parameters by creating and solving a :class:`Problem`. By default, the convenience method :meth:`SimulationResults.to_problem` uses the same formulations and unobserved agent data as the simulation, so estimation is relatively easy. However, we'll choose starting values that are half the true parameters so that the optimization routine has to do some work.

In [8]:
problem = simulation_results.to_problem()
problem

Dimensions:
 N     T    K1    K2    K3    D    MD    MS 
----  ---  ----  ----  ----  ---  ----  ----
1000  50    3     1     2     1    7     7  

Formulations:
       Column Indices:           0       1      2  
-----------------------------  ------  -----  -----
 X1: Linear Characteristics    prices    x      y  
X2: Nonlinear Characteristics    y                 
  X3: Cost Characteristics       x       z         
       d: Demographics           d                 

In [9]:
problem_results = problem.solve(
   0.5 * simulation.sigma, 
   0.5 * simulation.pi,
)
problem_results

Problem Results Summary:
Cumulative  GMM   Optimization   Objective   Total Fixed Point  Total Contraction  Objective    Gradient   
Total Time  Step   Iterations   Evaluations     Iterations         Evaluations       Value    Infinity Norm
----------  ----  ------------  -----------  -----------------  -----------------  ---------  -------------
 0:00:04     2         5             8             7355               22450        +8.64E+03    +7.74E-03  

Linear Estimates (Robust SEs in Parentheses):
Beta:     prices          x            y     
------  -----------  -----------  -----------
         -4.95E+00    +8.87E-01    +9.21E-01 
        (+5.90E-02)  (+2.06E-01)  (+3.32E+00)
Gamma:       x            z                  
------  -----------  -----------             
         +1.95E+00    +2.10E+00              
        (+1.14E-01)  (+1.15E-01)             

Nonlinear Estimates (Robust SEs in Parentheses):
Sigma:       y       |   Pi:         d     
------  -----------  |  ------  --

In [10]:
simulation.beta

array([[-5.],
       [ 1.],
       [ 1.]])

In [11]:
simulation.gamma

array([[2.],
       [2.]])

In [12]:
simulation.sigma

array([[0.5]])

In [13]:
simulation.pi

array([[3.]])

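The parameters seem to have been estimated reasonably well: each linear estimate reported above is within one robust standard error of its true value, although the standard error for the coefficient on `y` is large.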

In addition to checking that the configuration for a model based on actual data makes sense, the :class:`Simulation` class can also be a helpful tool for understanding the general conditions under which BLP models can be accurately estimated. Simulations are also used extensively in pyblp's test suite.