# 02 Panel Data

![img](https://cdn.dribbble.com/users/502247/screenshots/2446485/more-_-more_2_.gif)

In [None]:
import pandas as pd

from linearmodels import PanelOLS

import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col

In [None]:
df = pd.read_csv('data/Hyp_panelperformance.csv')
df.head()

In [None]:
df.drop('panel_id', axis=1).describe().T

In [None]:
df.shape

## First Example

In [None]:
y = df['performance']

In [None]:
X = sm.add_constant(df[['clientactivity', 'overtime', 'seniorityrank', 'collaboration', 'managerrelationship', 'slackactivity', 'externalvisibility']])

In [None]:
# regress performance on the employee-activity variables only (no team dummies yet)
model = sm.OLS(y, X).fit()

# print the summary
print(model.summary())

## Second Example

In [None]:
X_2 = sm.add_constant(df.drop(['performance', 'panel_id', 'year', 'teamno5'], axis=1))

In [None]:
# regress performance on all remaining variables, including the team dummies (teamno5 is left out as the reference team)
model2 = sm.OLS(y, X_2).fit()

# print the summary
print(model2.summary())

## Panel Data

Panel data combines two dimensions: a cross-section (many units, such as individuals, teams, or firms) and a time series (the same units observed repeatedly over time).

> "Panel data, also known as longitudinal data or cross-sectional time series data in some special cases, is data that is derived from a (usually small) number of observations over time on a (usually large) number of cross-sectional units like individuals, households, firms, or governments." by Mike Moffatt at [thoughtco](https://www.thoughtco.com/panel-data-definition-in-economic-research-1147034)
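
To make the two dimensions concrete, here is a minimal sketch of a panel stored as a pandas DataFrame indexed by (entity, time). The `toy` frame and its values are made up for illustration; only the column names `panel_id`, `year`, and `performance` mirror the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical toy panel: 3 employees observed in 2 years (values are invented).
toy = pd.DataFrame({
    "panel_id": [1, 1, 2, 2, 3, 3],
    "year": [2019, 2020, 2019, 2020, 2019, 2020],
    "performance": [3.1, 3.4, 2.8, 2.9, 4.0, 4.2],
})

# Index by (entity, time), the layout panel estimators usually expect.
toy_panel = toy.set_index(["panel_id", "year"])

n_entities = toy_panel.index.get_level_values("panel_id").nunique()
n_periods = toy_panel.index.get_level_values("year").nunique()
print(n_entities, n_periods)

# Cross-sectional slice: every employee in a single year.
cross_section_2020 = toy_panel.xs(2020, level="year")

# Time-series slice: a single employee across all years.
series_emp1 = toy_panel.xs(1, level="panel_id")
```

Slicing along one index level gives either a plain cross-section or a plain time series; the panel is both at once.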

The main model is:  
$Y_{ijt} = α + βX_{ijt} + ϕ_i + θ_j + μ_t + ε_{ijt}$

Here is another way to look at it:

$Y_{👩 👯‍ ⏳} = α + β🏄_{👩 👯‍ ⏳} + 🎬_{👩} + 🎬_{👯‍} + 🎬_{⏳} + ε_{👩 👯‍ ⏳}$


- $i$ or 👩🏽‍ = individual  
- $j$ or 👯‍ = team  
- $t$ or ⏳ = year  
- $Y_{ijt}$ or $Y_{👩 👯‍ ⏳}$ = observed performance of individual i working in team j in year t  
- $α$ = constant  
- $X$ or 🏄🏽‍ = employees’ actions (e.g. client activity, slack activity, external visibility…)  
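- $ϕ_i$ = effect of individual $i$ (everything specific to that individual that does not change over time)  
- $θ_j$ = effect of team $j$ (everything specific to that team)  
- $μ_t$ = effect of year $t$ (everything that hits all individuals in that year)  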
- $ε_{ijt}$ = random noise  
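
The following is a rough sketch, on simulated data, of why these effects matter. It assumes a simplified version of the model with only individual and year effects (the team effect is left out for brevity), a true $β$ of 2, and an $X$ that is correlated with the individual effect. All names (`sim`, `slope`, `demean_two_way`) and numbers are hypothetical and not taken from the dataset. Pooled OLS ignores the effects, while the two-way "within" transformation removes them by demeaning over individuals and years, which is valid here because the simulated panel is balanced.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_people, n_years, beta = 200, 5, 2.0  # assumed sizes and true coefficient

# Balanced panel: every person is observed in every year.
person = np.repeat(np.arange(n_people), n_years)
year = np.tile(np.arange(n_years), n_people)

phi = rng.normal(0, 1, n_people)  # individual effects
mu = rng.normal(0, 1, n_years)    # year effects

# X is correlated with the individual effect, so ignoring phi biases the slope.
x = phi[person] + rng.normal(0, 1, n_people * n_years)
y = 1.0 + beta * x + phi[person] + mu[year] + rng.normal(0, 0.5, n_people * n_years)

sim = pd.DataFrame({"person": person, "year": year, "x": x, "y": y})


def slope(xv, yv):
    """OLS slope of yv on xv (with an intercept)."""
    xc = xv - xv.mean()
    yc = yv - yv.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def demean_two_way(col):
    """Remove person and year means (two-way within transformation, balanced panel)."""
    return (sim[col]
            - sim.groupby("person")[col].transform("mean")
            - sim.groupby("year")[col].transform("mean")
            + sim[col].mean())


pooled_beta = slope(sim["x"], sim["y"])
within_beta = slope(demean_two_way("x"), demean_two_way("y"))

print(f"pooled OLS slope:    {pooled_beta:.3f}")
print(f"fixed-effects slope: {within_beta:.3f}")
```

In this simulation the pooled slope drifts away from the true value because it picks up part of the individual effect, while the demeaned regression stays close to it. Fixed-effects estimators follow the same idea, although real implementations also handle unbalanced panels and standard errors.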

## Third Example

In [None]:
# make sure year is a plain integer, then index the data by (entity, time) as PanelOLS expects
df['year'] = pd.to_datetime(df['year'], format='%Y').dt.year
new_df = df.set_index(['panel_id', 'year']).copy()
print(new_df.shape)
new_df.head()

In [None]:
independent_vars = ['clientactivity', 'overtime', 'seniorityrank', 'collaboration', 'managerrelationship', 'slackactivity', 'externalvisibility', 
                    'teamno1', 'teamno2', 'teamno3', 'teamno4', 'teamno6', 'teamno7', 'teamno8', 'teamno9', 'teamno10']

In [None]:
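# two-way fixed effects: individual (entity) and year (time) effects are absorbed instead of being estimated as dummies
exog = sm.add_constant(new_df[independent_vars])
model3 = PanelOLS(dependent=new_df['performance'], exog=exog, entity_effects=True, time_effects=True)
print(model3.fit())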

In [None]:
# same specification as model3, written with the formula interface
model4 = PanelOLS.from_formula("""performance ~ 1 + clientactivity + overtime + seniorityrank + collaboration + 
                                                    managerrelationship + slackactivity + externalvisibility + teamno1 + 
                                                    teamno2 + teamno3 + teamno4 + teamno6 + teamno7 + teamno8 + teamno9 + 
                                                    teamno10 + EntityEffects + TimeEffects""", data=new_df).fit()
print(model4)