# Interpreting the features of the bike dataset
https://christophm.github.io/interpretable-ml-book/limo.html

## Meaning of the data

- cnt: number of users, casual plus registered. This is the regression target.
- season: spring, summer, fall or winter.
- holiday
- year: 2011 or 2012.
- dateday: number of days since 2011/01/01. This feature is used to capture the trend over time.
- workingday: working day or weekend.
- weather:
    - 1 clear, few clouds, partly cloudy, cloudy
    - 2 mist + clouds, mist + broken clouds, mist + few clouds, mist
    - 3 light snow, light rain + thunderstorm + scattered clouds, light rain + scattered clouds
    - 4 heavy rain + ice pellets + thunderstorm + mist, snow + mist
- temp: temperature in degrees Celsius, normalized by dividing by 41.
- atemp: feels-like temperature, normalized by dividing by 50.
- hum: humidity (0 to 100), normalized by dividing by 100.
- Wind speed: km/hour, normalized by dividing by 67.
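
As a quick illustration of the normalization described in the list above, the sketch below converts normalized values back to their original units. The `SCALE` dictionary and the `denormalize` helper are hypothetical names introduced here, and the scale factors are simply the divisors stated in the list.

```python
# Divisors taken from the feature list above (assumed to be the only scaling applied).
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}


def denormalize(name, value):
    """Convert a normalized feature value back to its original unit."""
    return value * SCALE[name]


temp_celsius = denormalize("temp", 0.5)        # normalized 0.5 -> degrees Celsius
humidity_percent = denormalize("hum", 0.5)     # normalized 0.5 -> percent
print(temp_celsius, humidity_percent)
```

The same factors are what the coefficients of a fitted linear model need to be divided by in order to read them per original unit.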

In [None]:
import os
import copy
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import sklearn
from sklearn import linear_model
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import OneHotEncoder
from pandas import Series, DataFrame
import pandas as pd
import seaborn as sns;sns.set()
%matplotlib inline
os.getcwd()

In [None]:
data_dir = '../../data/bike'
day_file = 'day.csv'
hour_file = 'hour.csv'

In [None]:
df_day = pd.read_csv(os.path.join(data_dir, day_file))
# head() shows only the first few rows.
df_day.head()

In [None]:
# Reshape the data from here on
# Store the target variable in Y
Y = df_day['cnt']

# Drop dteday and yr, and rename instant to days (strictly speaking, the date should be converted to elapsed days)
del df_day['dteday']
del df_day['yr']

df_day = df_day.rename(columns={'instant': 'days'})

# One-hot encode the season
encoder = OneHotEncoder()
enced = encoder.fit_transform(df_day.season.values.reshape(-1, 1))
df_season = pd.DataFrame(index=df_day.season.index, columns=['season-SPRING', 'season-SUMMER', 'season-FALL', "season-WINTER"], data=enced.toarray())
df_day = pd.concat([df_day, df_season], axis=1)
del df_day['season']

# One-hot encode the weather
encoder = OneHotEncoder()
enced = encoder.fit_transform(df_day.weathersit.values.reshape(-1, 1))
df_weather = pd.DataFrame(index=df_day.weathersit.index, columns=['weather-CLEAR', 'weather-MYST', 'season-SNOW'], data=enced.toarray())
df_day = pd.concat([df_day, df_weather], axis=1)
del df_day['weathersit']


# Remove unused features
del df_day['casual']
del df_day['registered']
del df_day['cnt']
del df_day['mnth']
del df_day['weekday']
del df_day['atemp']

df_day.head()
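
The one-hot steps above build the indicator columns by hand with `OneHotEncoder`. As an alternative sketch (not the approach used in this notebook), `pd.get_dummies` produces the same kind of columns in one call. The `toy` DataFrame below is made-up data standing in for `df_day`, and the generated column names (`season_1`, ...) differ from the hand-written names used above.

```python
import pandas as pd

# Toy stand-in for df_day; the values are invented for illustration only.
toy = pd.DataFrame({"season": [1, 2, 3, 4, 1],
                    "temp": [0.2, 0.5, 0.7, 0.3, 0.1]})

# One indicator column per distinct season value, stored as floats.
season_dummies = pd.get_dummies(toy["season"], prefix="season", dtype=float)

# Replace the original categorical column with its indicator columns.
toy = pd.concat([toy.drop(columns="season"), season_dummies], axis=1)
print(toy.columns.tolist())
```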

In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(df_day, Y)

In [None]:
#print("weight: ", model.coef_)
print("intercept: ", model.intercept_)
# Feature names and the scale factors used for normalization
parms = [('days', 1), ('holiday', 1), ('workingday', 1),
         ('temp', 41.0), ('hum', 100.0), ('windspeed', 67.0), 
         ('season-SPRING', 1), ('season-SUMMER',1),  ('season-FALL', 1),  ('season-WINTER', 1),
         ('weather-CLEAR', 1),  ('weather-MYST', 1), ('season-SNOW', 1)
]

for p, w in zip(parms, model.coef_):
    name, scale = p
    print("%s: %f" % (name, w / scale))
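
The loop above divides each weight by the scale factor of its feature. The sketch below, on synthetic data and with NumPy least squares rather than the fitted `model`, checks why this works: a weight learned on a feature divided by 41 equals 41 times the weight learned on the raw feature. `fit_slope` is a hypothetical helper and the data are randomly generated.

```python
import numpy as np

rng = np.random.default_rng(0)
temp_celsius = rng.uniform(0.0, 35.0, size=200)   # synthetic raw temperatures
cnt = 100.0 + 120.0 * temp_celsius + rng.normal(0.0, 50.0, size=200)


def fit_slope(x, y):
    """Ordinary least squares with an intercept; returns the slope only."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


slope_raw = fit_slope(temp_celsius, cnt)            # effect per degree Celsius
slope_norm = fit_slope(temp_celsius / 41.0, cnt)    # effect per normalized unit
print(slope_raw, slope_norm / 41.0)                 # the two values should agree
```

Dividing by the scale factor therefore expresses the weight as the change in `cnt` per original unit, for example per degree Celsius.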



Reference for computing standard errors and p-values of the regression coefficients:
https://www.quora.com/In-scikit-learn-how-can-you-obtain-the-standard-errors-of-regression-coefficients
```python
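from scipy import stats

def p_vals_per_coef(pred, true, coefs, X):
    dof = X.shape[0] - X.shape[1]  # n samples minus p columns of X
    # Residual variance estimate: sum of squared errors divided by dof.
    mse = np.sum((np.asarray(true) - np.asarray(pred)) ** 2) / float(dof)
    # Standard errors: square root of the diagonal of mse * (X^T X)^-1.
    standard_error = np.sqrt(np.diagonal(mse * np.linalg.inv(np.dot(X.T, X))))
    t_stats = coefs / standard_error
    # Two-sided p-values from the t distribution.
    p_vals = 2 * (1 - stats.t.cdf(np.abs(t_stats), dof))
    return p_vals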
```
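
The same calculation can be tried end to end on synthetic data. The sketch below is only an illustration under stated assumptions: the data are randomly generated, the design matrix carries an explicit intercept column, and the fit uses `np.linalg.lstsq` instead of the scikit-learn model from this notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x_signal = rng.normal(size=n)   # feature that truly drives y
x_noise = rng.normal(size=n)    # feature unrelated to y
y = 3.0 + 2.0 * x_signal + rng.normal(size=n)

# Design matrix with an explicit intercept column.
X = np.column_stack([np.ones(n), x_signal, x_noise])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance, standard errors, t statistics and two-sided p-values.
dof = n - X.shape[1]
residual_var = np.sum((y - X @ coefs) ** 2) / dof
std_err = np.sqrt(np.diagonal(residual_var * np.linalg.inv(X.T @ X)))
t_stats = coefs / std_err
p_vals = 2 * stats.t.sf(np.abs(t_stats), dof)
print(p_vals)
```

Note that this formula needs `X.T @ X` to be invertible. A design that contains a full set of one-hot columns for more than one categorical feature is collinear, so one column per category group would have to be dropped before applying it.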