<a href="https://colab.research.google.com/github/pmontman/tmp_choicemodels/blob/main/nb/WK_07_panel_data_mixed_logit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Panel data and the Mixed logit model


We will introduce two topics:
 * Panel Data
 * The mixed logit model

Very briefly: panel data are **data that involve repeated measures over time**. In choice modelling, this means observing the same decision makers over repeated decisions. The mixed logit model is another extension of the logit family that involves **random coefficients**: we consider that the betas vary among individuals and are randomly distributed in the population.

First of all, these are two different ideas, so they should not be confused.
Confusion might arise because the mixed logit model is traditionally applied to panel data, so the two often appear together.

---
---


# Panel data
Panel data is an econometrics term for what is also called repeated observations.
Very often, we have a sample of individuals and we track their choices through time.

Examples:
 * In the first lecture, we described an example of tracking the coffee purchases that households make in a supermarket over a period of one year (using a barcode scanner). With households being the decision-making 'individuals', we have data about the decisions (brand of coffee purchased) of each individual over time, and the individuals are identified (we have a variable that identifies which individual made the purchase).

 * We can track stock purchases in portfolio management: each day, each individual is faced with a choice among different stocks (each stock is an alternative), and the price of each stock (an attribute) changes each day. (Example from Bierlaire)

 * In a survey, we could ask the participants to choose in hypothetical choice situations, changing the attributes of the alternatives. For example, in a marketing research survey we can ask participants to choose among smartphones while varying screen size, cost, etc. In this case, we would get repeated choices for each individual.



While panel data can often be considered as a time series for each individual, it is a more general concept. For example, in a panel we could have a different number of observations for each individual (some individuals are not properly tracked through time), or the actual time dimension might not be that relevant. A more general term, coming from Statistics, is 'repeated measures'.

 Finally, the respective term when we have data for the individuals at one point in time is **cross-sectional** data.
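
As a small illustration of the difference between a panel and a cross-section, the sketch below stores a made-up coffee panel in 'long' format, one row per household and week. The field names (`household`, `week`, `brand`) and all the values are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical long-format panel: one row per (individual, time) choice.
panel = [
    {"household": 1, "week": 1, "brand": "A"},
    {"household": 1, "week": 2, "brand": "A"},
    {"household": 1, "week": 3, "brand": "B"},
    {"household": 2, "week": 1, "brand": "B"},
    {"household": 2, "week": 3, "brand": "B"},  # week 2 was not observed
    {"household": 3, "week": 2, "brand": "C"},
]

# Number of repeated observations per individual: the panel is unbalanced.
obs_per_household = Counter(row["household"] for row in panel)
print(dict(obs_per_household))

# A cross-section keeps a single point in time for the individuals.
cross_section = [row for row in panel if row["week"] == 1]
print(cross_section)
```

Note that the panel does not need the same number of rows for every household, and that taking one week turns it into cross-sectional data.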

 ---
 ---

# Types of panel data
When we have measurements over time, we can start talking about the effect of factors that change over time. This splits panel data into two broad categories:
 * Static panels
 * Dynamic panels

We introduce some new mathematical notation:

 $$U_{jit} = V_{jit} + \varepsilon_{jit}$$

with $j$ representing the alternative, $i$ the individual and $t$ the instant of time at which the decision is made. We can read this as the utility that individual $i$ receives from alternative $j$ at time $t$.
 As we have seen before, depending on what we assume about the $\varepsilon$s, we can reach different models.
 
Remember, we often assume that the $\varepsilon$s are independent across individuals and they follow the extreme value distribution, and then
* If the $\varepsilon$s are independent across alternatives we would be talking about the logit; if they are dependent across alternatives we could be talking about the nested logit.

* If the $\varepsilon$s are independent over time we are talking about static panels; if they are dependent over time we are talking about dynamic panels.

As we have mentioned before, a very important point is that how we specify a model can change its static/dynamic nature. In the case of time, **we can make a dynamic panel become static if we can capture the source of dependence over time.**

---
---

# Panel data: Static panel

When we assume that the errors for each individual are independent over time, panel data reduces to the examples that we have seen in other lectures: we treat the observations of each individual 'as if' they were coming from different individuals, and the modelling reduces to what we have been doing until now (we now have a term for that situation of one observation per individual: cross-sectional data).
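
A minimal sketch of what this means in practice: with independent errors, the log-likelihood is just the sum of the log-probabilities of every observed choice, regardless of who made it. The data, the single coefficient `beta` and the function names below are assumptions made up for the illustration, using a plain logit with one attribute.

```python
import math

def logit_probabilities(utilities):
    """Logit choice probabilities from the observed utilities V_j."""
    max_v = max(utilities)  # subtract the maximum for numerical stability
    exp_v = [math.exp(v - max_v) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def static_panel_log_likelihood(observations, beta):
    """Sum of log-probabilities, treating every row as an independent choice."""
    log_lik = 0.0
    for x_alternatives, chosen in observations:
        utilities = [beta * x for x in x_alternatives]  # V_jit = beta * X_jit
        log_lik += math.log(logit_probabilities(utilities)[chosen])
    return log_lik

# Hypothetical data: (attribute of each alternative, index of the chosen one).
# The first two rows belong to the same individual, but that is ignored here.
observations = [
    ([1.0, 2.0], 0),
    ([1.5, 1.0], 1),
    ([2.0, 2.0], 0),
]
log_lik = static_panel_log_likelihood(observations, beta=-1.0)
print(log_lik)
```

The individual identifier never enters the computation, which is exactly the 'as if they were different individuals' assumption.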

---
---

# Panel data: Dynamic panel

When there is dependence over time, we can think of two general sources for this dependence.

 * Dependence not only on the present state, but also on past states. In choice modelling, we can think of different effects:
    * Choice fatigue
    * Addiction / familiarity / habit
    * Novelty seeking
    * Learning effects: knowing more about a product after buying it, or through other means (user opinions that accumulate over time).
    * Anticipation: for example, when there is a clear pattern in the evolution of prices over time, individuals can consider it when making choices (buying a house in a trending area).

 * Effect of the individual: agent effect or 'serial correlation'. Because we have repeated observations, it becomes more meaningful to talk about effects that are specific to each individual. This is because we can estimate them, as opposed to a cross-section, in which the estimation would not be reliable.

Again, these two sources are just a way of thinking; which one we see depends on what sources of influence on the underlying process we can capture. For example, the size of the agent effect might be reduced if we can capture the variable that drives it. Imagine that gender is an important factor and we are not capturing it: then we will see that the errors for each individual are correlated, since the errors of one individual will be more similar to each other than to those of the rest of the sample (unless that individual randomly changes gender).
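
The effect of an uncaptured individual factor can be checked with a small simulation. This is only a sketch under simplifying assumptions: the omitted factor and the remaining noise are drawn from standard normal distributions (not the extreme value distribution of the logit), and all names are made up for the example.

```python
import random

random.seed(0)
n_individuals = 2000

omitted, errors_t1, errors_t2 = [], [], []
for _ in range(n_individuals):
    factor = random.gauss(0.0, 1.0)  # individual-specific factor we do not observe
    omitted.append(factor)
    # the unobserved factor ends up inside the error at every instant of time
    errors_t1.append(factor + random.gauss(0.0, 1.0))
    errors_t2.append(factor + random.gauss(0.0, 1.0))

def correlation(a, b):
    """Sample Pearson correlation between two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Errors of the same individual at two instants are clearly correlated...
corr_omitted = correlation(errors_t1, errors_t2)

# ...but not once the factor is captured by the model and removed from the error.
clean_t1 = [e - f for e, f in zip(errors_t1, omitted)]
clean_t2 = [e - f for e, f in zip(errors_t2, omitted)]
corr_captured = correlation(clean_t1, clean_t2)

print(corr_omitted, corr_captured)
```

With these variances the theoretical correlation of the errors is 0.5 when the factor is omitted and 0 when it is captured, so the simulated values should be close to those.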

---
---



# Capturing dynamics: Lagged variables

We can try to capture the dynamics of a choice process in several ways.
A common way to capture the dynamics is to think that the utility depends on variables measured at other instants of time.

 For example, the decision of whether to buy a house can depend on its current value, but also on recent past values if a trend is identified. Plainly speaking, take two houses at the same current price: one has been increasing in price for 5 years, the other has been decreasing in price for 5 years. Which one would you buy?

 Another example is to consider past choices as one of the variables: the past selection of a smartphone brand can influence a current choice, for example if buying a phone of a brand generates familiarity / aversion to change.

 All these fall under the umbrella term of 'lagged variables'.

In mathematical notation, we start from the familiar specification of the observed component of the utility as a linear combination of some variables.
 $$ V_{jit} = \beta X_{jit} $$
 Notice that the subindex $t$ indicates the values of the variables (attributes and characteristics) measured at the specific point in time $t$.

 We can introduce lagged variables in a very general way by adding another set of coefficients, $\gamma$, that multiply the variables measured at the previous instant of time:

 $$V_{jit} = \beta X_{jit} + \gamma X_{ji(t-1)}$$

 Of course, we might only select a few variables from the past, and we can introduce instants of time $(t-2), (t-3), ...$

 A particularly important lagged variable in choice modelling is past choices, which is quite special because it cannot be measured in the present: it is what we want to predict. Past choices can be introduced into the model as dummies (an indicator for each alternative) or in a more compact form, for example a binary variable indicating whether the choice changed or not, the number of times that each alternative was chosen in the past, and so on.

 **What we expect when introducing lagged variables is that the dependence across time is captured, and then we can treat the model as usual, as a static panel or cross-section.**
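
As a sketch of how lagged variables could be built from one individual's history, the function below adds the previous value of an attribute, the previous choice, and the number of times each alternative was chosen before time $t$. The field names (`price_A`, `choice`) and the data are hypothetical.

```python
def add_lagged_variables(history):
    """Add lagged variables to one individual's choice history (sorted by time)."""
    rows = []
    times_chosen = {}  # how often each alternative has been chosen so far
    previous = None
    for row in history:
        new_row = dict(row)
        # lagged attribute: the value observed at the previous instant of time
        new_row["price_A_lag1"] = previous["price_A"] if previous is not None else None
        # lagged choice: what the individual chose at the previous instant
        new_row["choice_lag1"] = previous["choice"] if previous is not None else None
        # compact summary of past choices, using only information before time t
        new_row["past_counts"] = dict(times_chosen)
        rows.append(new_row)
        times_chosen[row["choice"]] = times_chosen.get(row["choice"], 0) + 1
        previous = row
    return rows

# Hypothetical history of one individual choosing a smartphone brand.
history = [
    {"t": 1, "price_A": 500, "choice": "A"},
    {"t": 2, "price_A": 520, "choice": "B"},
    {"t": 3, "price_A": 510, "choice": "B"},
]
lagged = add_lagged_variables(history)
for row in lagged:
    print(row)
```

The first observation has no past, so its lagged values are missing; such rows would typically be dropped or handled explicitly before estimation. In a full panel the function would be applied to each individual separately, so that lags never cross from one individual to another.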

 ---
 ---

# Capturing dynamics: Agent effect

A natural idea to consider is 'choice' heterogeneity, or that each individual has its own utility function.


In panel data, because we have repeated observations, we can try to estimate the effects of each individual; roughly speaking, it would be like estimating a logit for each individual.

The notation:

$$ V_{jit} = \beta X_{jit} + \alpha_{ji} $$

With the $\alpha$ being the simplest form of agent effect, just a constant. Notice the subindices of the $\alpha$: there is no reference to the instant of time. This means that the utility has a part common to all individuals and an individual-specific part that remains constant through time.
When we introduce the agent effect, we again expect that the dependence through time is captured, and we can afterwards treat the model as if it were static (no dependence across time).
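
A small sketch of the constant agent effect: two individuals face exactly the same attributes, share the same $\beta$, and differ only in their constants $\alpha_{ji}$. All the numbers and names below are made up for the illustration.

```python
import math

def logit_probabilities(utilities):
    """Logit choice probabilities from the observed utilities V_j."""
    max_v = max(utilities)  # subtract the maximum for numerical stability
    exp_v = [math.exp(v - max_v) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def choice_probabilities(x, beta, alpha_i):
    """V_jit = beta * X_jit + alpha_ji, followed by the logit formula."""
    utilities = [beta * x_j + a_j for x_j, a_j in zip(x, alpha_i)]
    return logit_probabilities(utilities)

beta = -1.0
x = [1.0, 1.5]  # same attribute values for both individuals at this instant

# alpha[i][j]: constant of individual i for alternative j, with no time index
alpha = {"ind_1": [0.0, 1.0], "ind_2": [0.0, -1.0]}

probs = {i: choice_probabilities(x, beta, a_i) for i, a_i in alpha.items()}
for individual, p in probs.items():
    print(individual, p)
```

Because the constants do not depend on $t$, the same shift in favour of (or against) the second alternative applies to every choice situation of that individual.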


Even though we can estimate the agent effects when we have repeated measures, we might still run into small sample size problems: not enough repeated observations. There are two ways of treating the agent effects:
 * Fixed effects: The coefficients are fixed and we want to estimate them.
 * Random effects: We consider that the effects of the individuals are random, coming from a specific probability distribution. This leads us to the mixed logit model.

---
---

# Limitations of agent effects

  This seems very reasonable, but it has two problems:
 - 1) It requires a lot of data to estimate, since we have a set of coefficients per individual!
 - 2) What do we do when we want to predict the choices for individuals in a population?

 
Therefore agent effects can be important, but they are less generally applicable than the other parameters of the model. They are relevant when trying to accurately estimate **the other** parameters of the model.

---
---

# Mixed logit

The mixed logit comes from considering a logit model that has both fixed effects (the usual ones) and random effects (the ones we will introduce here). The term comes from the more general term 'mixed model' or 'mixed-effects model' in Statistics.

A fixed effects model is what we have been doing: the parameters are fixed or deterministic for all individuals in the population, and we want to estimate their values.

$$V_{jit} = \beta X_{jit}$$

A random effects model assumes that there is a beta for each individual

$$V_{jit} = \beta'_i X_{jit}$$

and what we would want to estimate are the $\beta'_i$, but these are actually unknown, and what we can do is **just estimate the distribution** of the $\beta'_i$.
For example, the mean and variance of the $\beta'_i$, assuming that they are normally distributed.
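
One common way to work with a distribution of betas is simulation: draw many individual betas from the assumed distribution, compute the logit probabilities for each draw, and average them. The sketch below does this for a single normally distributed coefficient. The function names, the number of draws and the values are assumptions for the illustration, and it only shows how probabilities are obtained for given distribution parameters, not how those parameters are estimated.

```python
import math
import random

def logit_probabilities(utilities):
    """Logit choice probabilities from the observed utilities V_j."""
    max_v = max(utilities)  # subtract the maximum for numerical stability
    exp_v = [math.exp(v - max_v) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def simulated_mixed_logit(x, beta_mean, beta_std, n_draws=5000, seed=0):
    """Average the logit probabilities over draws of a normally distributed beta."""
    rng = random.Random(seed)
    totals = [0.0] * len(x)
    for _ in range(n_draws):
        beta_draw = rng.gauss(beta_mean, beta_std)  # one possible individual beta
        probs = logit_probabilities([beta_draw * x_j for x_j in x])
        totals = [t + p for t, p in zip(totals, probs)]
    return [t / n_draws for t in totals]

x = [1.0, 2.0]  # hypothetical attribute values of two alternatives
mixed = simulated_mixed_logit(x, beta_mean=-1.0, beta_std=1.0)
plain = simulated_mixed_logit(x, beta_mean=-1.0, beta_std=0.0)
print(mixed, plain)
```

With a standard deviation of zero every draw equals the mean, and the result collapses to the usual logit with a single fixed beta.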

---
---


# Mixed logit for estimating the agent effect

The agent effect can be linked to both fixed and random effects models by considering dummy indicator variables (a variable for each individual) or individual-specific 'constants'.
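
As a hedged sketch of the random effects version, the individual-specific constant can be treated as a draw from a normal distribution with mean zero. The same draw is used for all the choices of one individual, so the probability of the whole sequence is computed for each draw and then averaged. The binary-choice setup, the names and the values are assumptions for the illustration.

```python
import math
import random

def sequence_probability(x_by_time, choices, beta, alpha):
    """Probability of one individual's sequence of binary choices for a given alpha."""
    prob = 1.0
    for (x0, x1), choice in zip(x_by_time, choices):
        v0 = beta * x0
        v1 = beta * x1 + alpha  # agent effect on alternative 1, constant over time
        p1 = 1.0 / (1.0 + math.exp(v0 - v1))
        prob *= p1 if choice == 1 else 1.0 - p1
    return prob

def simulated_sequence_probability(x_by_time, choices, beta, alpha_std,
                                   n_draws=2000, seed=0):
    """Average the sequence probability over draws of a normal agent effect."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        alpha_draw = rng.gauss(0.0, alpha_std)  # one draw per simulated individual
        total += sequence_probability(x_by_time, choices, beta, alpha_draw)
    return total / n_draws

# Hypothetical individual who picks alternative 1 three times in a row
# while both alternatives have identical attributes.
x_by_time = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
choices = [1, 1, 1]

p_no_effect = simulated_sequence_probability(x_by_time, choices, beta=-1.0, alpha_std=0.0)
p_with_effect = simulated_sequence_probability(x_by_time, choices, beta=-1.0, alpha_std=2.0)
print(p_no_effect, p_with_effect)
```

Without an agent effect, repeating the same choice three times has probability $0.5^3$; allowing for a random agent effect makes such persistent sequences more likely, which is how the model accommodates correlation across the choices of one individual.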

---
---

# Comparing fixed and random effects
The advantage of the random effects model is that we can get a better estimate
of the coefficients (the true population average) compared to the fixed effects model. This, of course, assumes that we have correctly specified the distribution of the parameters.

The difference becomes especially relevant in panel data, because we might have a different number of observations for each individual, so we can account for the excessive influence of some individuals that might be overrepresented.

---
---

# Final words

Panel data usually has larger sample sizes relative to cross-sectional data, which allows us to try more advanced models, so there is always a tradeoff in model complexity: try to capture as much effect as possible with the usual models vs. new modelling methodologies such as agent effects or mixed models. In other words, we might treat panel data as cross-sectional and try more complex models, with more variable transformations, or add more variables into the model.

For example, just by adding a constant agent effect with a sample size of 1000 individuals, we would be adding 1000 parameters to the model, so maybe we could instead try a 'static' model with more complex variable transformations. As always, it depends on the context and our interests. We could be interested in prediction, in which case we would go for the more complex model. Or we could be interested in estimating the values of some of the parameters in a simple model as well as possible (for example, isolating the effect of price in a linear model), in which case we could opt for the mixed logit approach.
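
The parameter count in this example can be written out as simple arithmetic. The number of alternatives and of shared coefficients below are hypothetical, and the random effects count assumes a normal agent effect with mean zero, so that only its standard deviation is estimated.

```python
n_individuals = 1000  # sample size from the example above
n_alternatives = 2    # hypothetical: a binary choice
n_shared = 3          # hypothetical number of coefficients shared by everyone

# Fixed agent effects: one constant per individual and non-reference alternative.
fixed_effects_params = n_shared + n_individuals * (n_alternatives - 1)

# Random agent effects (assumed normal with mean zero): only the spread of the
# distribution is estimated, one standard deviation per non-reference alternative.
random_effects_params = n_shared + (n_alternatives - 1)

print(fixed_effects_params, random_effects_params)
```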

This discussion is becoming increasingly relevant in modern times, as we get more advanced tools, much more data and the focus from interpretation shifts towards prediction.

---
---