# Multiple Regression - Extended Applications

## Dummy variables

We will be using a modified data set originally obtained from
Wooldridge’ Introductory Econometrics data set on wages.

The modified data set used in this example has the following variables:

-   `wage`: average hourly earnings
-   `educ`: years of education
-   `exper`: years potential experience
-   `tenure`: years with current employer
-   `married`: =1 if married
-   `numdep`: number of dependents
-   `gender`: male or female
-   `skin`: color of the skin (white or nonwhite)

In [1]:
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("./data/wage.csv")
df.head()

Let’s construct the dummy variables for `gender` and `skin`:

In [2]:
df = pd.get_dummies(df, columns=["gender", "skin"], drop_first=False, dtype=int)
df.head()

Now, let’s run a regression using these dummy variables:

In [3]:
model = smf.ols("wage ~ educ + exper + tenure + married + numdep + gender_female + skin_nonwhite", data=df)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                   wage   R-squared:                       0.370
Model:                            OLS   Adj. R-squared:                  0.362
Method:                 Least Squares   F-statistic:                     43.54
Date:                Sun, 27 Oct 2024   Prob (F-statistic):           2.53e-48
Time:                        21:33:03   Log-Likelihood:                -1311.4
No. Observations:                 526   AIC:                             2639.
Df Residuals:                     518   BIC:                             2673.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        -1.9799      0.783     -2.530

### Analyze the results

-   Do a significance analysis of the coefficients
-   Interpret the meaning of the coefficients using the corresponding
    units
-   Analyze the Adjusted R-squared

## Polynomial regression

Let’s consider experience as having potentially **non-linear effects**.
That is, we add `exper^2` to our regression to account for this
non-linearity:

In [4]:
# add the exper-squared to the dataset (df)
df["exper^2"] = df["exper"] ** 2
df.head()

In [5]:
# specify and estimate the new model with exper^2
model2 = smf.ols("wage ~ educ + exper + exper^2 + tenure + married + numdep + gender_female + skin_nonwhite", data=df)
results2 = model2.fit()
print(results2.summary())

                            OLS Regression Results                            
Dep. Variable:                   wage   R-squared:                       0.371
Model:                            OLS   Adj. R-squared:                  0.361
Method:                 Least Squares   F-statistic:                     38.10
Date:                Sun, 27 Oct 2024   Prob (F-statistic):           1.45e-47
Time:                        21:33:03   Log-Likelihood:                -1311.2
No. Observations:                 526   AIC:                             2640.
Df Residuals:                     517   BIC:                             2679.
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        -2.0180      0.786     -2.569

### Analyze the results

-   Do a significance analysis of the coefficients
-   Interpret the meaning of the coefficients using the corresponding
    units
-   Analyze the Adjusted R-squared

**Conclusion**: given that the estimated coefficients for `exper` and
`exper^2` are not significant, we conclude that experience has no linear
effects on the hourly wage.