### Sample notebook: Poisson regression on home run data  
A step-by-step example of Poisson regression analysis of home run data.  

Official: number of home runs in the official 4-game competition  
Practice: average number of home runs over 4 practice games  
Stadium: stadium where the competition is held (A or B)  

#### Import libraries

In [None]:
import pandas as pd
import statsmodels.api as sm

#### Parameters  

In [None]:
csv_in = 'homerun.csv'

#### Read CSV file  

In [None]:
df = pd.read_csv(csv_in, delimiter=',', skiprows=0, header=0)
print(df.shape)
df.info()  # info() prints its report itself and returns None
display(df.head())

#### Create dummy variables  

In [None]:
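# dtype=int yields 0/1 columns; recent pandas defaults to bool, which statsmodels may reject
df_dumm = pd.get_dummies(df, drop_first=True, dtype=int)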
display(df_dumm.head())
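
To show why the stadium column ends up named `Stadium_B`, here is a minimal sketch on a made-up toy frame (the values are illustrative, not taken from `homerun.csv`). With `drop_first=True`, the first category (`A`) becomes the baseline and only the indicator for `B` is kept. `dtype=int` is passed so that the indicator is stored as 0/1 integers.

```python
import pandas as pd

# Toy data for illustration only (hypothetical values).
toy = pd.DataFrame({'Practice': [10.0, 12.5, 9.0],
                    'Stadium': ['A', 'B', 'A']})

# Numeric columns pass through unchanged; 'Stadium' is replaced by
# an indicator column for every category except the first one ('A').
toy_dumm = pd.get_dummies(toy, drop_first=True, dtype=int)
print(toy_dumm)
```

A row with `Stadium_B` equal to 0 therefore means stadium A, and 1 means stadium B.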

#### Separate explanatory variables and response variable  

In [None]:
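X = df_dumm[['Practice', 'Stadium_B']]  # explanatory variables
y = df_dumm['Official']                 # response variable (count)
# Prepend a 'const' column of 1s so that the model includes an intercept
X_c = sm.add_constant(X)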

#### Poisson regression  

In [None]:
model = sm.GLM(y, X_c, family=sm.families.Poisson())
results = model.fit()
print(results.summary())

#### Regression coefficients  

In [None]:
print(results.params)

#### Fitted model:
$\ln(\mathrm{E}[\mathrm{Official}]) = 0.080 \times \mathrm{Practice} + (-0.032) \times \mathrm{Stadium\_B} + 1.263$
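
Because the model uses a log link, the expected count is obtained by exponentiating the right-hand side of the equation above. The following is a minimal sketch using the rounded coefficients from that equation, so its output may differ slightly from `results.predict()`. The helper name `expected_homeruns` is hypothetical.

```python
import math

# Rounded coefficients copied from the model equation above.
COEF_CONST = 1.263
COEF_PRACTICE = 0.080
COEF_STADIUM_B = -0.032


def expected_homeruns(practice, stadium_b):
    """Return exp(linear predictor), i.e. the expected Official count."""
    linear_predictor = (COEF_CONST
                        + COEF_PRACTICE * practice
                        + COEF_STADIUM_B * stadium_b)
    return math.exp(linear_predictor)


# Stadium A (stadium_b=0) with a practice average of 10.5 home runs.
print(expected_homeruns(10.5, 0))

# Multiplicative change in the expected count per extra practice home run.
print(math.exp(COEF_PRACTICE))
```

Each coefficient acts multiplicatively on the expected count: a one-unit increase in `Practice` scales it by `exp(0.080)`, and playing in stadium B scales it by `exp(-0.032)`.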

#### Predict with the fitted model  
Note: each row of the prediction data must start with a 1 for the constant term.  
(You can also add it with sm.add_constant().)  

In [None]:
dfX_test = pd.DataFrame([[1, 10.5, 0],
                         [1, 11.5, 1],
                        ],
                        columns=results.params.index)  # example
print('X for prediction:')
display(dfX_test)
y_test = results.predict(dfX_test)
print('Predicted y:')
print(y_test)