### Interpreting Results of Logistic Regression

In this notebook (and the accompanying quizzes), you will practice interpreting the coefficients of a logistic regression model.  The ideas from the previous video should help you work through it.

The dataset contains four variables: `admit`, `gre`, `gpa`, and `prestige`:

* `admit` is a binary variable. It indicates whether a candidate was admitted into UCLA (admit = 1) or not (admit = 0).
* `gre` is the GRE score. GRE stands for Graduate Record Examination.
* `gpa` stands for Grade Point Average.
* `prestige` is the prestige of an applicant's alma mater (the school attended before applying), with 1 being the highest (most prestigious) and 4 the lowest (least prestigious).

To start, let's read in the necessary libraries and data.

In [17]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("./admissions.csv")
df.head()

,admit,gre,gpa,prestige
0,0,380,3.61,3
1,1,660,3.67,3
2,1,800,4.0,1
3,1,640,3.19,4
4,0,520,2.93,4


There are a few different ways you might choose to work with the `prestige` column in this dataset.  Here, we want the model to allow the change from prestige 1 to prestige 2 to have a different effect on the acceptance rate than the change from prestige 3 to prestige 4, which means treating `prestige` as categorical rather than quantitative.
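
As a small illustration of what the categorical encoding looks like, here is a sketch on a made-up four-row frame (the name `toy` and its values are hypothetical and not taken from `admissions.csv`). Each prestige level gets its own 0/1 indicator column, so the model is free to estimate a separate effect per level instead of a single slope.

```python
import pandas as pd

# Hypothetical toy data, used only to show the shape of the encoding
toy = pd.DataFrame({"prestige": [1, 2, 3, 4]})

# One 0/1 indicator column per prestige level, named prest_1 ... prest_4
toy_dummies = pd.get_dummies(toy["prestige"], prefix="prest", dtype=int)

# Every row has exactly one indicator switched on
row_sums = toy_dummies.sum(axis=1)

print(toy_dummies)
print(row_sums.tolist())
```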

`1.` With this idea in place, create the dummy variables needed to treat `prestige` as a categorical variable rather than a quantitative one, then answer quiz 1 below.

In [19]:
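prestige_dummies = pd.get_dummies(df['prestige'], dtype='uint8')  # one 0/1 column per prestige level (1-4)
df[['prest1', 'prest2', 'prest3', 'prest4']] = prestige_dummies  # named copies used in the model below
df = df.join(prestige_dummies)  # also keeps the raw dummy columns labelled 1, 2, 3, 4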

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 397 entries, 0 to 396
Data columns (total 8 columns):
admit       397 non-null int64
gre         397 non-null int64
gpa         397 non-null float64
prestige    397 non-null int64
1           397 non-null uint8
2           397 non-null uint8
3           397 non-null uint8
4           397 non-null uint8
dtypes: float64(1), int64(3), uint8(4)
memory usage: 14.0 KB


In [20]:
df['prestige'].astype(str).value_counts()

2    148
3    121
4     67
1     61
Name: prestige, dtype: int64

In [21]:
df.head()

,admit,gre,gpa,prestige,prest1,prest2,prest3,prest4,1,2,3,4
0,0,380,3.61,3,0,0,1,0,0,0,1,0
1,1,660,3.67,3,0,0,1,0,0,0,1,0
2,1,800,4.0,1,1,0,0,0,1,0,0,0
3,1,640,3.19,4,0,0,0,1,0,0,0,1
4,0,520,2.93,4,0,0,0,1,0,0,0,1


`2.` Now, fit a logistic regression model to predict whether an individual is admitted using `gre`, `gpa`, and `prestige`, with prestige level `1` as the baseline.  Use the results to answer quizzes 2 and 3 below.  Don't forget to include an intercept.
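
Choosing a baseline means leaving that level's dummy column out of the model. The sketch below suggests why, using a hypothetical design matrix with two applicants per prestige level (not the admissions data): with an intercept and all four dummies, the dummy columns add up to the intercept column, so the matrix has fewer independent columns than coefficients to estimate.

```python
import numpy as np

# Hypothetical design: two applicants at each of the four prestige levels
dummies = np.repeat(np.eye(4), 2, axis=0)   # columns: prest1, prest2, prest3, prest4
intercept = np.ones((dummies.shape[0], 1))

# Intercept plus all four dummies: 5 columns
full = np.hstack([intercept, dummies])

# Intercept plus prest2..prest4 only (prest1 is the baseline): 4 columns
reduced = np.hstack([intercept, dummies[:, 1:]])

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)

print("full:   ", full.shape[1], "columns, rank", rank_full)
print("reduced:", reduced.shape[1], "columns, rank", rank_reduced)
```

With the baseline dropped, each remaining prestige coefficient is read as a difference from prestige 1.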

In [24]:
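df['intercept'] = 1  # sm.Logit does not add a constant term on its own
# prest1 is left out so that prestige 1 serves as the baseline category
logit_mod = sm.Logit(df['admit'], df[['intercept', 'gre', 'gpa', 'prest2', 'prest3', 'prest4']])
res = logit_mod.fit()
res.summary2()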

Optimization terminated successfully.
         Current function value: 0.573854
         Iterations 6


Model:,Logit,No. Iterations:,6.0
Dependent Variable:,admit,Pseudo R-squared:,0.082
Date:,2021-01-23 12:53,AIC:,467.6399
No. Observations:,397,BIC:,491.5435
Df Model:,5,Log-Likelihood:,-227.82
Df Residuals:,391,LL-Null:,-248.08
Converged:,1.0000,Scale:,1.0

,Coef.,Std.Err.,z,P>|z|,[0.025,0.975]
intercept,-3.8769,1.1425,-3.3934,0.0007,-6.1161,-1.6376
gre,0.0022,0.0011,2.0280,0.0426,0.0001,0.0044
gpa,0.7793,0.3325,2.3438,0.0191,0.1276,1.4311
prest2,-0.6801,0.3169,-2.1459,0.0319,-1.3013,-0.0589
prest3,-1.3387,0.3449,-3.8819,0.0001,-2.0146,-0.6628
prest4,-1.5534,0.4175,-3.7211,0.0002,-2.3716,-0.7352
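
Several numbers in the summary can be recovered from one another, which helps when reading the table. The sketch below uses only the rounded values printed above, so results agree with the table up to rounding. It assumes the reported pseudo R-squared is McFadden's (it appears to match) and uses 1.96 as the normal critical value for the 95% interval.

```python
import math

# Values copied from the summary output above (rounded as printed)
n_obs, n_params = 397, 6
ll_model, ll_null = -227.82, -248.08
coef_gpa, se_gpa = 0.7793, 0.3325

# Fit statistics
pseudo_r2 = 1 - ll_model / ll_null                 # McFadden's pseudo R-squared (assumed)
aic = 2 * n_params - 2 * ll_model
bic = n_params * math.log(n_obs) - 2 * ll_model

# Wald z statistic and approximate 95% confidence interval for the gpa coefficient
z_gpa = coef_gpa / se_gpa
ci_gpa = (coef_gpa - 1.96 * se_gpa, coef_gpa + 1.96 * se_gpa)

print(round(pseudo_r2, 3), round(aic, 2), round(bic, 2))
print(round(z_gpa, 3), tuple(round(bound, 3) for bound in ci_gpa))
```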


In [26]:
np.exp(res.params)

intercept    0.020716
gre          1.002221
gpa          2.180027
prest2       0.506548
prest3       0.262192
prest4       0.211525
dtype: float64
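
Each exponentiated coefficient is the multiplicative change in the odds of admission for a one-unit increase in that variable, holding the others fixed. A minimal sketch of how to read them, using the rounded coefficients from the summary table (the 100-point GRE change is an illustrative choice, not part of the exercise):

```python
import math

# Coefficients on the log-odds scale, copied from the summary table above
coef = {"gre": 0.0022, "gpa": 0.7793, "prest2": -0.6801}

# One-unit increase in gpa: the odds are multiplied by exp(coef)
odds_ratio_gpa = math.exp(coef["gpa"])

# A k-unit increase multiplies the odds by exp(k * coef), not by k * exp(coef)
odds_ratio_gre_100 = math.exp(100 * coef["gre"])

# Percent change in odds for prestige 2 relative to the prestige 1 baseline
pct_change_prest2 = (math.exp(coef["prest2"]) - 1) * 100

print(round(odds_ratio_gpa, 3))
print(round(odds_ratio_gre_100, 3))
print(round(pct_change_prest2, 1))
```

An odds ratio below 1, as for the prestige dummies, means lower odds than the baseline; it is a statement about odds, not about probabilities.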

In [27]:
1 / np.exp(res.params)  # reciprocal odds ratios, e.g. prestige 1 relative to prestige 2

intercept    48.272116
gre           0.997784
gpa           0.458710
prest2        1.974147
prest3        3.813995
prest4        4.727566
dtype: float64