### Interpreting Results of Logistic Regression

In this notebook (and its quizzes), you will practice interpreting the coefficients of a logistic regression model. The material from the previous video should help.

The dataset contains four variables: `admit`, `gre`, `gpa`, and `prestige`:

* `admit` is a binary variable indicating whether a candidate was admitted to UCLA (`admit = 1`) or not (`admit = 0`).
* `gre` is the applicant's GRE (Graduate Record Examination) score.
* `gpa` is the applicant's grade point average.
* `prestige` is the prestige of the applicant's alma mater (the school attended before applying), from 1 (most prestigious) to 4 (least prestigious).

To start, let's read in the necessary libraries and data.

In [22]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("./admissions.csv")
df.head()

| Unnamed: 0 | admit | gre | gpa | prestige |
|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 3 |
| 1 | 1 | 660 | 3.67 | 3 |
| 2 | 1 | 800 | 4.0 | 1 |
| 3 | 1 | 640 | 3.19 | 4 |
| 4 | 0 | 520 | 2.93 | 4 |


There are a few different ways to work with the `prestige` column. Here we want the move from prestige 1 to prestige 2 to be able to change the acceptance rate by a different amount than the move from prestige 3 to prestige 4, so `prestige` should be treated as categorical rather than quantitative.

`1.` With that idea in place, create the dummy variables needed to treat `prestige` as a categorical variable, then answer quiz 1 below.

In [23]:
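# dtype=int keeps the dummies as 0/1 integers (newer pandas versions return booleans by default)
df = df.join(pd.get_dummies(df['prestige'], prefix='pre', dtype=int))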
df.head()

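| Unnamed: 0 | admit | gre | gpa | prestige | pre_1 | pre_2 | pre_3 | pre_4 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 3 | 0 | 0 | 1 | 0 |
| 1 | 1 | 660 | 3.67 | 3 | 0 | 0 | 1 | 0 |
| 2 | 1 | 800 | 4.0 | 1 | 1 | 0 | 0 | 0 |
| 3 | 1 | 640 | 3.19 | 4 | 0 | 0 | 0 | 1 |
| 4 | 0 | 520 | 2.93 | 4 | 0 | 0 | 0 | 1 |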
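
To make the role of the baseline concrete, here is a minimal pure-Python sketch of the same dummy coding applied to a few hypothetical prestige values. The helper `dummy_code` is an illustration only and is not part of pandas; it mirrors what the dummy columns look like once the baseline column is left out of the model.

```python
def dummy_code(values, levels, baseline):
    """Return one dict per value with a 0/1 indicator for every non-baseline level."""
    kept = [lvl for lvl in levels if lvl != baseline]
    return [{f"pre_{lvl}": int(v == lvl) for lvl in kept} for v in values]

# Hypothetical prestige values, one per applicant.
prestige = [3, 1, 4, 2]
rows = dummy_code(prestige, levels=[1, 2, 3, 4], baseline=1)

for value, row in zip(prestige, rows):
    print(value, row)
```

An applicant at the baseline level has zeros in every dummy column, so each dummy's coefficient measures a shift relative to prestige 1. Because every other level gets its own column, each level can shift the outcome by a different amount instead of sharing one slope.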


In [25]:
# count the applicants at each prestige level to find the most common one
df['pre_1'].sum(), df['pre_2'].sum(), df['pre_3'].sum(), df['pre_4'].sum()

(61, 148, 121, 67)

`2.` Now fit a logistic regression model that predicts whether an individual is admitted using `gre`, `gpa`, and `prestige`, with prestige level `1` as the baseline. Don't forget to include an intercept, and use the `.summary2()` method to get the summary results. Use the results to answer quizzes 2 and 3 below.

In [26]:
# fit logistic regression model 
df['intercept'] = 1 
log_md = sm.Logit(df['admit'], df[['intercept', 'gre', 'gpa', 'pre_2', 'pre_3', 'pre_4']])
results = log_md.fit()
results.summary2()

Optimization terminated successfully.
         Current function value: 0.573854
         Iterations 6


|  |  |  |  |
|---|---|---|---|
| Model: | Logit | No. Iterations: | 6.0 |
| Dependent Variable: | admit | Pseudo R-squared: | 0.082 |
| Date: | 2021-12-21 03:53 | AIC: | 467.6399 |
| No. Observations: | 397 | BIC: | 491.5435 |
| Df Model: | 5 | Log-Likelihood: | -227.82 |
| Df Residuals: | 391 | LL-Null: | -248.08 |
| Converged: | 1.0000 | Scale: | 1.0 |

|  | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | -3.8769 | 1.1425 | -3.3934 | 0.0007 | -6.1161 | -1.6376 |
| gre | 0.0022 | 0.0011 | 2.0280 | 0.0426 | 0.0001 | 0.0044 |
| gpa | 0.7793 | 0.3325 | 2.3438 | 0.0191 | 0.1276 | 1.4311 |
| pre_2 | -0.6801 | 0.3169 | -2.1459 | 0.0319 | -1.3013 | -0.0589 |
| pre_3 | -1.3387 | 0.3449 | -3.8819 | 0.0001 | -2.0146 | -0.6628 |
| pre_4 | -1.5534 | 0.4175 | -3.7211 | 0.0002 | -2.3716 | -0.7352 |
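
The coefficients in this table are on the log-odds scale. As a rough sketch of how they combine, the snippet below plugs the rounded estimates from the table into the logistic function for a hypothetical applicant. The helper name `admit_probability` and the applicant's GRE and GPA values are assumptions made for illustration, and because the coefficients are rounded, the resulting probabilities are approximate.

```python
import math

# Rounded coefficient estimates copied from the summary table.
coefs = {
    "intercept": -3.8769,
    "gre": 0.0022,
    "gpa": 0.7793,
    "pre_2": -0.6801,
    "pre_3": -1.3387,
    "pre_4": -1.5534,
}

def admit_probability(gre, gpa, prestige):
    """Predicted admission probability from the fitted log-odds (hypothetical helper)."""
    log_odds = coefs["intercept"] + coefs["gre"] * gre + coefs["gpa"] * gpa
    if prestige != 1:  # prestige 1 is the baseline, so it adds nothing
        log_odds += coefs[f"pre_{prestige}"]
    # The logistic function maps log-odds to a probability between 0 and 1.
    return 1 / (1 + math.exp(-log_odds))

# Same hypothetical GRE and GPA, differing only in prestige level.
p1 = admit_probability(gre=600, gpa=3.5, prestige=1)
p4 = admit_probability(gre=600, gpa=3.5, prestige=4)
print(round(p1, 3), round(p4, 3))
```

The two applicants differ only in prestige, so the ratio of their odds equals the exponential of the `pre_4` coefficient regardless of the GRE and GPA values chosen.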


In [34]:
# exponentiate the coefficient for 'pre_4' (odds ratio relative to the baseline, 'pre_1')
np.exp(-1.5534) # 0.21152755611983615
# take the reciprocal to express the baseline's odds relative to 'pre_4'
1/np.exp(-1.5534)

4.7275164443987272

In [38]:
# exponentiate the coefficient for 'pre_3' (odds ratio relative to the baseline, 'pre_1')
np.exp(-1.3387) # 0.26218628930498067
# take the reciprocal to express the baseline's odds relative to 'pre_3'
1/np.exp(-1.3387)

3.81408197450317

In [36]:
# exponentiate the coefficient for 'pre_2' (odds ratio relative to the baseline, 'pre_1')
np.exp(-0.6801) # 0.50656633319935351
# take the reciprocal to express the baseline's odds relative to 'pre_2'
1/np.exp(-0.6801)

1.9740751298733885

In [40]:
# exponentiate the coefficient for 'gpa' (odds ratio for a one-point increase in GPA)
np.exp(0.7793)

2.1799457692483717
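
The hand calculations above can be collected in one place. The sketch below exponentiates each rounded coefficient from the summary table to get an odds ratio, and also rescales `gre` to a 100-point change, since a one-point change in GRE is very small. The variable names are illustrative, and the values are only as precise as the rounded coefficients.

```python
import math

# Rounded coefficient estimates copied from the summary table.
coefs = {"gre": 0.0022, "gpa": 0.7793, "pre_2": -0.6801, "pre_3": -1.3387, "pre_4": -1.5534}

# exp(coefficient) is the multiplicative change in the odds of admission
# for a one-unit increase in that variable, holding the others constant.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

# For the prestige dummies the ratio is below 1, so the reciprocal says how many
# times higher the odds are for the baseline (prestige 1) than for that level.
baseline_vs_level = {name: 1 / odds_ratios[name] for name in ("pre_2", "pre_3", "pre_4")}

# A 100-point GRE increase multiplies the odds by exp(100 * coefficient).
gre_per_100 = math.exp(100 * coefs["gre"])

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.4f}")
print(baseline_vs_level, round(gre_per_100, 4))
```

Read this way, each one-point increase in GPA multiplies the odds of admission by about 2.18, and an applicant from a prestige 1 school has about 4.73 times the odds of admission of an applicant from a prestige 4 school, holding the other variables constant.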