<link rel='stylesheet' href='../assets/css/main.css'/>

# Multiple Logistic Regression in Python - College Admission

### Overview
Predict college admission using Multiple Logistic Regression
 
### Builds on
None

### Run time
approx. 10-20 minutes

### Notes




In [2]:
%matplotlib inline

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm  # Logit with array/DataFrame inputs lives in statsmodels.api
import matplotlib.pyplot as plt


## Step 1: College Admission Data

Let's look at the college admission data. Here we have each student's GRE test score, GPA, and rank, followed by whether the student was admitted or not.


|gre  |gpa  |rank |  admitted |
|-----|-----|-----|-----------|
|380  |3.61 | 3   |    no     |
|660  |3.67 | 1   |    yes    |
|800  |4.0  | 1   |    yes    |
|640  |3.19 | 4   |    yes    |
|520  |2.93 | 4   |    no     |
|760  |3.0  | 2   |    yes    |

## Step 1: Let's create a Pandas dataframe with the data


In [4]:
admissions = pd.read_csv("/data/college-admissions/admission-data.csv")
admissions.columns = ["admit", "gre", "gpa", "prestige"]

admissions

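|   | admit | gre | gpa  | prestige |
|---|-------|-----|------|----------|
| 0 | 0     | 380 | 3.61 | 3        |
| 1 | 1     | 660 | 3.67 | 3        |
| 2 | 1     | 800 | 4.00 | 1        |
| 3 | 0     | 640 | 3.19 | 4        |
| 4 | 0     | 520 | 2.93 | 4        |
| 5 | 1     | 760 | 3.00 | 2        |
| 6 | 0     | 560 | 2.98 | 1        |
| 7 | 0     | 400 | 3.08 | 2        |
| 8 | 0     | 540 | 3.39 | 3        |
| 9 | 1     | 700 | 3.92 | 2        |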


## Step 2: Do some summary analytics

**Do a describe() on the data**

```python
print(admissions.describe())
```

**Get the standard deviation by column**

```python
admissions.std()
```

In [None]:
# TODO: Do a Describe on the data


In [None]:
# TODO: Get Standard Deviation by column


### Crosstab

We can do a frequency table crosstab like this:

```python
print(pd.crosstab(admissions['admit'], admissions['prestige'], rownames=['admit']))
```

In [None]:
#TODO: Do a Frequency Table Crosstab


### Histogram

You can plot a histogram of every numeric column by calling `hist()` on the DataFrame, for example `admissions.hist()`.
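
As a minimal sketch of the histogram call, the block below builds a small DataFrame from the ten rows displayed earlier and plots one histogram per column. The name `sample`, the `bins` and `figsize` values, and the `Agg` backend line are assumptions made so the snippet runs on its own outside a notebook; inside this notebook you would call `admissions.hist()` directly.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The ten rows displayed earlier in this notebook, typed in by hand.
sample = pd.DataFrame({
    "gre": [380, 660, 800, 640, 520, 760, 560, 400, 540, 700],
    "gpa": [3.61, 3.67, 4.00, 3.19, 2.93, 3.00, 2.98, 3.08, 3.39, 3.92],
    "prestige": [3, 3, 1, 4, 4, 2, 1, 2, 3, 2],
})

# DataFrame.hist() draws one subplot per numeric column and
# returns an array of matplotlib Axes objects.
axes = sample.hist(bins=5, figsize=(8, 6))
plt.tight_layout()
```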

In [None]:
#TODO: Perform a histogram

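## Step 3: Create dummy variables

`prestige` is a categorical variable, so we convert it into one 0/1 indicator column per level with `pd.get_dummies`.

In [10]: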
dummy_ranks = pd.get_dummies(admissions['prestige'], prefix='prestige')
print(dummy_ranks.head())

   prestige_1  prestige_2  prestige_3  prestige_4
0           0           0           1           0
1           0           0           1           0
2           1           0           0           0
3           0           0           0           1
4           0           0           0           1


In [14]:
# Create a clean data frame for the regression.
# prestige_1 is left out so that it acts as the baseline category.
cols_to_keep = ['admit', 'gre', 'gpa']
data = admissions[cols_to_keep].join(dummy_ranks.loc[:, 'prestige_2':])
data

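|   | admit | gre | gpa  | prestige_2 | prestige_3 | prestige_4 |
|---|-------|-----|------|------------|------------|------------|
| 0 | 0     | 380 | 3.61 | 0          | 1          | 0          |
| 1 | 1     | 660 | 3.67 | 0          | 1          | 0          |
| 2 | 1     | 800 | 4.00 | 0          | 0          | 0          |
| 3 | 0     | 640 | 3.19 | 0          | 0          | 1          |
| 4 | 0     | 520 | 2.93 | 0          | 0          | 1          |
| 5 | 1     | 760 | 3.00 | 1          | 0          | 0          |
| 6 | 0     | 560 | 2.98 | 0          | 0          | 0          |
| 7 | 0     | 400 | 3.08 | 1          | 0          | 0          |
| 8 | 0     | 540 | 3.39 | 0          | 1          | 0          |
| 9 | 1     | 700 | 3.92 | 1          | 0          | 0          |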
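
The cell above keeps only `prestige_2` onward, which leaves `prestige_1` as the reference level and avoids perfectly collinear indicator columns once an intercept is added. As a hedged alternative sketch, `pd.get_dummies` can drop the first level itself through `drop_first=True`. The `prestige` Series below is typed in from the ten rows shown earlier, and `dtype=int` is an assumption added so the output is 0/1 integers on recent pandas versions.

```python
import pandas as pd

# Prestige values from the ten rows displayed earlier, typed in by hand.
prestige = pd.Series([3, 3, 1, 4, 4, 2, 1, 2, 3, 2], name="prestige")

# Manual approach used in this notebook: build every level, then slice off prestige_1.
all_levels = pd.get_dummies(prestige, prefix="prestige", dtype=int)
manual = all_levels.loc[:, "prestige_2":]

# One-call alternative: let pandas drop the first (baseline) level.
dropped = pd.get_dummies(prestige, prefix="prestige", drop_first=True, dtype=int)

print(dropped.columns.tolist())
print(manual.equals(dropped))
```

Both approaches should give the same three columns; which level serves as the baseline only changes how the coefficients are read, not the fitted probabilities.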



## Step 4: Run Logistic Regression in statsmodels

Let's run our logistic regression. To do this, we call the `Logit` class in statsmodels. `Logit` does not add a constant term on its own, so we first add an `intercept` column of ones.

In [15]:
data['intercept'] = 1.0

In [17]:
data

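|   | admit | gre | gpa  | prestige_2 | prestige_3 | prestige_4 | intercept |
|---|-------|-----|------|------------|------------|------------|-----------|
| 0 | 0     | 380 | 3.61 | 0          | 1          | 0          | 1.0       |
| 1 | 1     | 660 | 3.67 | 0          | 1          | 0          | 1.0       |
| 2 | 1     | 800 | 4.00 | 0          | 0          | 0          | 1.0       |
| 3 | 0     | 640 | 3.19 | 0          | 0          | 1          | 1.0       |
| 4 | 0     | 520 | 2.93 | 0          | 0          | 1          | 1.0       |
| 5 | 1     | 760 | 3.00 | 1          | 0          | 0          | 1.0       |
| 6 | 0     | 560 | 2.98 | 0          | 0          | 0          | 1.0       |
| 7 | 0     | 400 | 3.08 | 1          | 0          | 0          | 1.0       |
| 8 | 0     | 540 | 3.39 | 0          | 1          | 0          | 1.0       |
| 9 | 1     | 700 | 3.92 | 1          | 0          | 0          | 1.0       |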


In [18]:
result = sm.Logit(admissions['admit'], data[data.columns[1:]]).fit()
result

Optimization terminated successfully.
         Current function value: 0.408573
         Iterations 7


<statsmodels.discrete.discrete_model.BinaryResultsWrapper at 0x119b5c400>

In [19]:
result.summary()

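|                |                  |                   |           |
|----------------|------------------|-------------------|-----------|
| Dep. Variable: | admit            | No. Observations: | 100.0     |
| Model:         | Logit            | Df Residuals:     | 94.0      |
| Method:        | MLE              | Df Model:         | 5.0       |
| Date:          | Tue, 20 Feb 2018 | Pseudo R-squ.:    | 0.4021    |
| Time:          | 23:42:58         | Log-Likelihood:   | -40.857   |
| converged:     | True             | LL-Null:          | -68.331   |
|                |                  | LLR p-value:      | 1.338e-10 |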
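
The fit statistics in this summary are related to each other, and a short check makes that concrete. This sketch assumes that "Pseudo R-squ." is McFadden's pseudo R-squared (one minus the ratio of the model log-likelihood to the null log-likelihood) and that the "Current function value" printed during fitting is the negative log-likelihood averaged over observations. The variable names are illustrative, and the numbers are copied from the output in this notebook.

```python
# Values copied from the fit output and summary in this notebook.
n_obs = 100
log_likelihood = -40.857   # "Log-Likelihood"
ll_null = -68.331          # "LL-Null" (intercept-only model)

# McFadden's pseudo R-squared: 1 - LL(model) / LL(null).
pseudo_r2 = 1 - log_likelihood / ll_null

# Average negative log-likelihood per observation.
avg_neg_loglike = -log_likelihood / n_obs

print(round(pseudo_r2, 4))
print(round(avg_neg_loglike, 4))
```

Under those assumptions, the two printed values should agree with the reported pseudo R-squared and function value up to rounding.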

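|            | coef     | std err | z      | P>\|z\| | [0.025  | 0.975] |
|------------|----------|---------|--------|---------|---------|--------|
| gre        | 0.0141   | 0.003   | 4.104  | 0.000   | 0.007   | 0.021  |
| gpa        | 1.8221   | 0.888   | 2.052  | 0.040   | 0.082   | 3.563  |
| prestige_2 | 0.5516   | 0.808   | 0.683  | 0.495   | -1.032  | 2.135  |
| prestige_3 | 0.3926   | 0.854   | 0.460  | 0.646   | -1.281  | 2.066  |
| prestige_4 | -1.0491  | 0.965   | -1.087 | 0.277   | -2.940  | 0.842  |
| intercept  | -15.2991 | 3.502   | -4.369 | 0.000   | -22.162 | -8.436 |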
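
To see how these coefficients turn into a prediction, the sketch below plugs them into the logistic function by hand: the linear score is the intercept plus each coefficient times its feature, and the probability is `1 / (1 + exp(-score))`. The helper `predict_admit_probability` is a hypothetical name used only for illustration, and the coefficients are the rounded values from the table, so results will differ slightly from what `result.predict` would return.

```python
import math

# Rounded coefficients copied from the summary table.
coef = {
    "gre": 0.0141,
    "gpa": 1.8221,
    "prestige_2": 0.5516,
    "prestige_3": 0.3926,
    "prestige_4": -1.0491,
    "intercept": -15.2991,
}

def predict_admit_probability(gre, gpa, prestige):
    """Hypothetical helper: admission probability from the fitted coefficients."""
    score = coef["intercept"] + coef["gre"] * gre + coef["gpa"] * gpa
    # prestige 1 is the baseline level, so it adds nothing to the score.
    if prestige in (2, 3, 4):
        score += coef[f"prestige_{prestige}"]
    return 1.0 / (1.0 + math.exp(-score))

# Example: a student with GRE 660, GPA 3.67, prestige 3.
p = predict_admit_probability(660, 3.67, 3)
print(round(p, 3))
```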


In [21]:
result.conf_int()


|            | 0          | 1         |
|------------|------------|-----------|
| gre        | 0.007364   | 0.020828  |
| gpa        | 0.081661   | 3.562552  |
| prestige_2 | -1.032085  | 2.135275  |
| prestige_3 | -1.280697  | 2.06585   |
| prestige_4 | -2.940259  | 0.842005  |
| intercept  | -22.162085 | -8.436214 |
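
Logistic regression coefficients are on the log-odds scale, so a common next step is to exponentiate them and their confidence bounds to get odds ratios. The sketch below does this by hand with the numbers printed above; the dictionary layout and variable names are illustrative assumptions. With the fitted `result` object available, `np.exp(result.params)` and `np.exp(result.conf_int())` would be the usual route.

```python
import math

# (coefficient, lower bound, upper bound) copied from the outputs above.
estimates = {
    "gre": (0.0141, 0.007364, 0.020828),
    "gpa": (1.8221, 0.081661, 3.562552),
    "prestige_2": (0.5516, -1.032085, 2.135275),
    "prestige_3": (0.3926, -1.280697, 2.06585),
    "prestige_4": (-1.0491, -2.940259, 0.842005),
}

# exp() maps log-odds to odds ratios; an interval that contains 1.0
# corresponds to a coefficient interval that contains 0.
odds_ratios = {
    name: tuple(math.exp(value) for value in triple)
    for name, triple in estimates.items()
}

for name, (ratio, low, high) in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.3f} (95% CI {low:.3f} to {high:.3f})")
```

An odds ratio above 1 means the odds of admission rise as that variable increases, with the other variables held fixed; for the prestige indicators the comparison is against the `prestige_1` baseline.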
