### Sample notebook of logistic regression for Cake data 
An example procedure for logistic regression analysis of the cake data.  

WedSatSun: 1 if the day is a Wednesday, Saturday, or Sunday; 0 otherwise.  
HighTemp: the day's high temperature.  
Sales: 1 if the specific cake was sold; 0 otherwise.

#### Import libraries  

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

#### Parameters  

In [None]:
csv_in = 'cake.csv'

#### Read CSV file

In [None]:
df = pd.read_csv(csv_in, delimiter=',', skiprows=0, header=0)
print(df.shape)
df.info()  # info() prints its report itself and returns None
display(df.head())

#### Separate explanatory variables and objective variable  
Split the data into explanatory variables (X) and the objective variable (y).  

In [None]:
y = df['Sales']
X = df[['WedSatSun','HighTemp']]
print('X:', X.shape)
print('y:', y.shape)

#### Logistic regression (standardized)

In [None]:
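# NOTE: preprocessing.scale() returns an ndarray, not a DataFrame,
# so the column names are restored by rebuilding a DataFrame.
X_scaled = preprocessing.scale(X)
dfX_scaled = pd.DataFrame(X_scaled, columns=X.columns)
# A very large C makes the default L2 regularization negligible.
model = LogisticRegression(C=1000000)
results = model.fit(dfX_scaled, y)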
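
As a rough illustration of what the scaling step does, the sketch below checks that `preprocessing.scale` gives the same result as subtracting each column's mean and dividing by its population standard deviation (`ddof=0`). The array `X_toy` is a made-up example, not the cake data.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical toy values for illustration only (not taken from cake.csv)
X_toy = np.array([[1, 23.0],
                  [0, 28.0],
                  [1, 30.0],
                  [0, 25.0]])

# Column-wise standardization done by scikit-learn
X_scaled_toy = preprocessing.scale(X_toy)

# The same standardization written out by hand (np.std uses ddof=0 by default)
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(X_scaled_toy, X_manual))
print(X_scaled_toy.mean(axis=0))  # approximately 0 for each column
print(X_scaled_toy.std(axis=0))   # approximately 1 for each column
```

Because every column ends up on the same scale, the magnitudes of the coefficients fitted on the scaled data can be compared with each other.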

#### Regression coefficients (standardized)  

In [None]:
print(results.intercept_[0])
print(results.coef_[0])

**Contribution to the objective variable (by magnitude of the standardized coefficients):**
**HighTemp (positive) > WedSatSun (positive)**

#### Logistic regression  

In [None]:
model = LogisticRegression(C=1000000)
results = model.fit(X, y)

#### Regression coefficients  

In [None]:
print(results.intercept_[0])
print(results.coef_[0])

#### Obtained model:
$\ln\left(\frac{p}{1-p}\right) = 2.443 \times \mathrm{WedSatSun} + 0.545 \times \mathrm{HighTemp} - 15.204$, where $p = P(\mathrm{Sales} = 1)$
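
To turn the log-odds from the model above into a probability, apply the logistic function to the right-hand side. The sketch below does this by hand with the rounded coefficients shown above; the helper name `sales_probability` is an illustrative assumption, and the two inputs are arbitrary example days.

```python
import math

# Rounded coefficients copied from the obtained model above
B_WEDSATSUN = 2.443
B_HIGHTEMP = 0.545
INTERCEPT = -15.204


def sales_probability(wed_sat_sun, high_temp):
    """Return P(Sales = 1) by applying the logistic function to the log-odds."""
    log_odds = B_WEDSATSUN * wed_sat_sun + B_HIGHTEMP * high_temp + INTERCEPT
    return 1.0 / (1.0 + math.exp(-log_odds))


# Example days: (WedSatSun=1, HighTemp=23) and (WedSatSun=0, HighTemp=28)
print(round(sales_probability(1, 23), 3))
print(round(sales_probability(0, 28), 3))
```

Since the coefficients are rounded to three decimals, these probabilities may differ slightly from what `predict_proba` returns for the fitted model.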

#### Ratio of correct answers  

In [None]:
n_data = y.shape[0]
# predict() already returns class labels (0 or 1), so no 0.5 threshold is needed
n_correct_pred = (results.predict(X) == y).sum()
print(n_correct_pred / n_data)
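
The ratio of correct answers is the mean of the element-wise comparison between predicted and true labels. The sketch below shows this on made-up labels (hypothetical values, not the cake data); in scikit-learn, `model.score(X, y)` and `sklearn.metrics.accuracy_score` compute the same quantity for a classifier.

```python
import numpy as np

# Hypothetical true labels and predictions, for illustration only
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])

# Fraction of samples whose predicted label equals the true label
accuracy = (y_pred == y_true).mean()
print(accuracy)
```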

#### Odds ratio
The odds ratio of a variable is exp(regression coefficient). It is the factor by which the odds of Sales = 1, $p/(1-p)$, are multiplied when that variable increases by 1  
while the other variables are held fixed.  

In [None]:
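# exp(intercept) is the baseline odds when all variables are 0, not an odds ratio
print(np.exp(results.intercept_[0]))
# exp(coefficient) is the odds ratio of each explanatory variable
print(np.exp(results.coef_[0]))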
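
As a worked example, the sketch below exponentiates the rounded coefficients from the obtained model (2.443 for WedSatSun, 0.545 for HighTemp). It also shows that a rise of several units multiplies the odds by the odds ratio raised to that power; the 5-unit step is an arbitrary choice for illustration.

```python
import math

# Rounded coefficients copied from the obtained model
B_WEDSATSUN = 2.443
B_HIGHTEMP = 0.545

# Odds ratio for a 1-unit increase in each variable
or_wedsatsun = math.exp(B_WEDSATSUN)
or_hightemp = math.exp(B_HIGHTEMP)

# Odds ratio for a 5-unit increase in HighTemp: exp(5 * b) == exp(b) ** 5
or_hightemp_5 = math.exp(5 * B_HIGHTEMP)

print(round(or_wedsatsun, 2))
print(round(or_hightemp, 2))
print(round(or_hightemp_5, 2))
```

Note that these factors apply to the odds, not to the probability itself, so an odds ratio of about 1.7 does not mean the probability of a sale becomes 1.7 times larger.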

#### Do prediction using the obtained model      
Use the obtained model to make predictions.

In [None]:
dfX_test = pd.DataFrame([[1, 23],
                         [0, 28],
                        ],
                        columns=X.columns)  # example
print('X for prediction:')
display(dfX_test)
print('Predicted y:')
y_test = results.predict_proba(dfX_test)  # shape: (n_samples, n_classes)
print(y_test[:, 1])  # probability of class 1 for each row
print(y_test[:, 1] >= 0.5)
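
`predict_proba` returns one row per sample and one column per class, with columns ordered as in `model.classes_`. The sketch below uses a made-up array of that shape (hypothetical values, not model output) to show the difference between selecting the class-1 column and selecting a single row.

```python
import numpy as np

# Made-up array shaped like predict_proba output: (n_samples, n_classes)
proba = np.array([[0.7, 0.3],
                  [0.2, 0.8],
                  [0.4, 0.6]])

p_class1 = proba[:, 1]   # class-1 probability for every sample
second_row = proba[1]    # both class probabilities for the sample at index 1

# Threshold the class-1 probabilities at 0.5 to get 0/1 labels
labels = (p_class1 >= 0.5).astype(int)

print(p_class1)
print(second_row)
print(labels)
```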