# Micro Dataset Example

In [1]:
import numpy as np
import pyblp

pyblp.__version__

'0.13.0'

In this example, we'll build a configuration for a micro dataset. Micro datasets, typically surveys, are independent sources of information that relate individual purchases to consumer demographics. For background and notation involving micro moments, see :ref:`background:Micro Moments`.

Configuring a micro dataset does not require access to the full micro data. The configuration only specifies metadata about the dataset, which are used for estimation and for outputting information during estimation. These metadata include a unique name for the dataset indexed by $d$, the number of observations $N_d$, a function that defines survey weights $w_{dijt}$, and, if relevant, the subset of markets from which the micro data were sampled.

First, we'll define a configuration for the micro dataset used by :ref:`references:Petrin (2002)`, which is also used in the corresponding [tutorial](petrin.ipynb).

In [4]:
micro_dataset = pyblp.MicroDataset(
    name="CEX", 
    observations=29125, 
    compute_weights=lambda t, p, a: np.ones((a.size, 1 + p.size)),
)
micro_dataset

CEX: 29125 Observations in All Markets

We called the dataset "CEX", defined the number of observations in it, and also defined a lambda function for computing survey weights in a market. Since we did not specify `market_ids`, we are assuming that the underlying micro data were sampled from all markets in the product data.

The `compute_weights` function has three arguments: the current market's ID $t$, the $J_t$ :class:`Products` inside the market, and the $I_t$ :class:`Agents` inside the market. In this case, we are assuming that each product and each agent/consumer type is sampled with equal probability, so we simply return a matrix of ones of shape $I_t \times (1 + J_t)$. This sets each $w_{dijt} = 1$.
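To make the shape concrete, here is a minimal sketch that calls the same lambda outside of pyblp. The `products` and `agents` arrays are hypothetical stand-ins (plain NumPy structured arrays with made-up field names and sizes), used only because the lambda reads nothing but their `.size`; they are not how pyblp builds these objects internally.

```python
import numpy as np

# Hypothetical stand-ins for one market: only .size is used by the lambda.
products = np.zeros(3, dtype=[("prices", float)])  # J_t = 3 (assumed)
agents = np.zeros(5, dtype=[("weights", float)])   # I_t = 5 (assumed)

# Same weight function as in the CEX configuration above.
compute_weights = lambda t, p, a: np.ones((a.size, 1 + p.size))

# "market_0" is an arbitrary placeholder market ID; the lambda ignores it.
weights = compute_weights("market_0", products, agents)
print(weights.shape)  # (5, 4): I_t rows, 1 + J_t columns
```

Each row corresponds to an agent and each column to a choice, with the extra first column standing for the outside option.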

By using $1 + J_t$ instead of $J_t$, we are specifying that the micro dataset contains observations of the outside option $j = 0$. If we instead specified a matrix of shape $I_t \times J_t$, this would be the same as setting the first column equal to all zeros, so that outside choices are not sampled from.
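The equivalence described above can be illustrated with plain NumPy. This sketch only constructs the two weight matrices being compared; it does not show how pyblp handles them internally, and the sizes `I_t` and `J_t` are arbitrary values chosen for illustration.

```python
import numpy as np

I_t, J_t = 5, 3  # assumed sizes for one market

# Weights over inside goods only: shape I_t x J_t.
inside_only = np.ones((I_t, J_t))

# The equivalent I_t x (1 + J_t) matrix: a leading column of zeros means
# that the outside option j = 0 is never sampled.
with_zero_outside = np.hstack([np.zeros((I_t, 1)), inside_only])

print(with_zero_outside.shape)  # (5, 4)
print(with_zero_outside[0])     # [0. 1. 1. 1.]
```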

Lastly, returning an array with two dimensions means that this micro dataset does not contain second choices. If we were to add a third dimension, for example specifying `lambda t, p, a: np.ones((a.size, 1 + p.size, 1 + p.size))`, we would be configuring the micro dataset to also have information about second choices, including for second choices of the outside option.
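As a rough sketch of the three-dimensional case, the lambda quoted above can be evaluated on hypothetical stand-in arrays (again plain NumPy structured arrays with made-up fields and sizes, not pyblp objects). Reading `weights[i, j, k]` as the weight for agent `i`, first choice `j`, and second choice `k` is an assumption about axis order that extends the two-dimensional layout.

```python
import numpy as np

# Hypothetical stand-ins for one market: only .size is used by the lambda.
products = np.zeros(3, dtype=[("prices", float)])  # J_t = 3 (assumed)
agents = np.zeros(5, dtype=[("weights", float)])   # I_t = 5 (assumed)

# Weight function with a third axis for second choices.
compute_weights = lambda t, p, a: np.ones((a.size, 1 + p.size, 1 + p.size))

# "market_0" is an arbitrary placeholder market ID; the lambda ignores it.
weights = compute_weights("market_0", products, agents)
print(weights.shape)  # (5, 4, 4)
```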