# Regression with panel data (an aside)

In many studies in strategy and OT, we use text analysis as part of econometric models with panel data.
Since we do not cover it elsewhere in the curriculum, we will take a small aside to discuss some of these models.

**Note:** I'm using Stata here, so none of this content is interactive.

This is partially adapted from the Stata `xtreg` docs, because we are covering it very quickly.
You can find more detail [here](https://www.stata.com/manuals13/xtxtreg.pdf).

# Read data

In Stata, the `use` command reads data, including from URLs.
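
As a minimal sketch: the variable names used later in this section (`idcode`, `year`, `grade`, `tenure`, and so on) match the `nlswork` dataset from the `xtreg` manual, so the example below assumes that dataset and its Stata Press URL. Substitute your own file path or URL as needed.

```stata
* Assumption: the data are the nlswork dataset used in the xtreg manual.
* The clear option drops any data already in memory before loading.
use https://www.stata-press.com/data/r13/nlswork.dta, clear

* Quick look at the variables we just loaded
describe
```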

# Setting the panel variables

To help the model commands understand the panel structure, we use the `xtset` command.
Note that `xtset` does not add year indicators to your models, so you would need to include `i.year` in a model's variable list to have Stata create and use them for you.

`xtset idcode year`

The output of `xtset` tells us about the panel variables.
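
To illustrate the point about year indicators, here is a rough sketch that declares the panel and then requests the indicators explicitly with `i.year`. It assumes the `nlswork` data and uses `ln_wage` as the outcome; neither is named in the text above, so treat both as placeholders.

```stata
* Assumption: nlswork data, with ln_wage as the outcome variable
use https://www.stata-press.com/data/r13/nlswork.dta, clear

* Declare the panel (unit) variable and the time variable
xtset idcode year

* xtset only declares the structure; year indicators must be requested
* in the model itself via factor-variable notation
xtreg ln_wage tenure i.year, fe
```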

# Using local macros for collecting variable names

A good practice with Stata is using a local macro to collect variable names.
That way, if we're running multiple models, we can keep them in sync.
It's especially helpful when we decide to add a control or other variable, and we want the change to apply to all models.

```stata
local controls ///
   grade ///
   age ///
   ttl_exp ///
   tenure


local ivs ///
    not_smsa ///
    south
```

Note that we're using Stata's line continuation sentinel, `///`.
It tells Stata to ignore the rest of the line, including the line break, and to process the next line as part of the same command.

This has two practical benefits.
First, we avoid one very long command line that is hard to read and edit.
Second, we can put a `///` in front of one of these variables, and Stata will skip it (it treats the rest of that line as a comment), allowing us to easily "turn off" a variable in our analyses.
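
As a small sketch of the second point, the block below "turns off" `age` by placing a `///` in front of it. Run it from a do-file, and run the whole block at once so the local macro is still defined when `display` executes.

```stata
* age is skipped because everything after the leading /// is a comment,
* and the command continues on the next line
local controls ///
    grade ///
    /// age ///
    ttl_exp ///
    tenure

* Show the contents of the macro; age should no longer be in the list
display "`controls'"
```

Deleting the leading `///` turns the variable back on.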

**Note:** Line continuations work only in do-files and ado-files; Stata does not accept `///` typed directly in the Command window.

# Regressions compared

Our first model is simply a pooled OLS model, which ignores the panel structure.
As we'll see below, some of these parameter estimates are a lot higher than they are when we account for the non-independence of repeated observations on the same unit.

Note the syntax for using the local macros we created earlier: we wrap the name with a backtick `` ` `` on the left (the key to the left of the number 1 on a US keyboard) and an apostrophe `'` on the right (the key to the right of the semicolon key on a US keyboard), as in `` `controls' ``.
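
The pooled OLS model and the macro syntax described above might look like the following sketch. The outcome variable `ln_wage` and the `nlswork` data are assumptions, since the text does not name either.

```stata
* Assumption: nlswork data, with ln_wage as the outcome variable
use https://www.stata-press.com/data/r13/nlswork.dta, clear

local controls grade age ttl_exp tenure
local ivs not_smsa south

* Pooled OLS: treats every person-year as an independent observation.
* Each macro is expanded in place: backtick on the left, apostrophe on the right.
regress ln_wage `controls' `ivs'
```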

The next model is a fixed effects model.
Note that grade does not vary within units, so the model drops it.
Also, note that the output splits out the within, between, and overall R-squared for us, and reports some panel stats in the header.

The output also includes an F test of the null hypothesis that all unit effects are zero, which is rejected in this case.
Note that when we use robust standard errors (as we often do), Stata suppresses that test.

After fitting the model, the command `estimates store fe` stores the model estimates under the name `fe`.
We could have named it anything, but `fe` is descriptive.
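
A minimal sketch of the fixed effects step, again assuming the `nlswork` data and `ln_wage` as the outcome (neither is named in the text):

```stata
* Assumption: nlswork data, with ln_wage as the outcome variable
use https://www.stata-press.com/data/r13/nlswork.dta, clear
xtset idcode year

local controls grade age ttl_exp tenure
local ivs not_smsa south

* Fixed effects (within) estimator
xtreg ln_wage `controls' `ivs', fe

* Save the results under the name fe for later comparison
estimates store fe
```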

The third model is a random effects model.
Note how the estimates differ when we assume the unit effects are uncorrelated with the regressors (the model output reminds us of that assumption with `corr(u_i, X) = 0 (assumed)`).
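
The random effects step can be sketched the same way. The stored name `re` is my choice for illustration, and the `nlswork` data and `ln_wage` outcome remain assumptions.

```stata
* Assumption: nlswork data, with ln_wage as the outcome variable
use https://www.stata-press.com/data/r13/nlswork.dta, clear
xtset idcode year

local controls grade age ttl_exp tenure
local ivs not_smsa south

* Random effects (GLS) estimator; unlike FE, it can estimate grade
xtreg ln_wage `controls' `ivs', re

* Save the results under the (hypothetical) name re
estimates store re
```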

# Testing whether the RE model is consistent

A Hausman test checks whether the RE estimates differ systematically from the FE estimates.
If they do not, we can use the more efficient RE model.
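
Putting the pieces together, a hedged sketch of the test might look like this. It assumes the `nlswork` data, `ln_wage` as the outcome, and the stored names `fe` and `re`. Both models are fit without robust standard errors here, because `hausman` relies on the conventional variance estimates.

```stata
* Assumption: nlswork data, with ln_wage as the outcome variable
use https://www.stata-press.com/data/r13/nlswork.dta, clear
xtset idcode year

local controls grade age ttl_exp tenure
local ivs not_smsa south

* Fit and store the consistent (FE) and efficient (RE) models
xtreg ln_wage `controls' `ivs', fe
estimates store fe
xtreg ln_wage `controls' `ivs', re
estimates store re

* Consistent estimator first, efficient estimator second
hausman fe re
```

A small p-value suggests that the RE and FE coefficients differ systematically, which argues against relying on the RE estimates.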

**Note:** Using this test assumes that a fixed-effects model would be appropriate.
If you want a time-invariant variable in the regression, it will be dropped by FE.
If you want a nearly time-invariant variable, almost all of the variance will be wiped out, but the model will still give you a parameter estimate.
Reviewers often ask for this test, and you may need to argue smartly if FE isn't appropriate for your study.