<a href="https://colab.research.google.com/github/monda00/horse-race-notebook/blob/master/predict_show_neural_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

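# Prediction with a Neural Network

This notebook goes all the way from building the training data to discussing the predictions.

- Simple approach: train on one row (one horse) at a time
- Pad every race to the same number of rows and train on one whole race at a time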
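The second approach assumes every race occupies a fixed number of rows. The notebook reads an already padded `train_nn.csv` (407,340 rows = 22,630 races × 18, of which only 253,636 rows hold horse data), and the padding step itself is not shown. A minimal sketch of how such padding could be done with pandas, assuming a maximum of 18 runners; `pad_races` is a hypothetical helper, not the author's code:

```python
import pandas as pd

MAX_HORSES = 18  # assumed maximum field size, matching reshape([-1, 18, 61]) later on


def pad_races(df: pd.DataFrame, race_col: str = 'race_id', n: int = MAX_HORSES) -> pd.DataFrame:
    """Pad every race to exactly n rows (hypothetical helper).

    Padding rows keep the race_id and are NaN in every other column.
    Races keep their original order (groupby with sort=False).
    """
    blocks = []
    for race_id, group in df.groupby(race_col, sort=False):
        missing = n - len(group)
        if missing < 0:
            raise ValueError(f'race {race_id} has {len(group)} rows, more than {n}')
        if missing:
            pad = pd.DataFrame({race_col: [race_id] * missing})
            group = pd.concat([group, pad], ignore_index=True)
        blocks.append(group)
    return pd.concat(blocks, ignore_index=True)


# Usage: two toy races with 2 and 1 horses, padded to 3 rows each
races = pd.DataFrame({'race_id': [1, 1, 2], 'horse_number': [1.0, 2.0, 1.0]})
padded = pad_races(races, n=3)
print(padded)
```

Padding rows keep their `race_id` and are NaN elsewhere, which matches the all-zero rows at the tail of `df` after `fillna(0)`.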

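Features

|Category|Feature|
|---|---|
|Horse info|Horse number|
| |Frame (gate) number|
| |Age|
| |Sex|
| |Weight (current)|
| |Weight (change from previous race)|
| |Weight carried|
|Race-day info|Racecourse|
| |Number of runners|
| |Course distance|
| |Direction (left/right-handed)|
| |Course type (dirt/turf/jump)|
| |Weather|
| |Track condition|
| |Start-time slot|
| |Time of year|
|Same horse's past races (×5 races)|Odds|
| |Popularity rank|
| |Finishing position|
| |Time (seconds)|
| |Days since the previous race|
| |Course distance|
| |Course type (dirt/turf/jump)|
| |Weather|
| |Track condition|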

# Overview

- Loading libraries and data
- Preprocessing
- Training
- Prediction
- Discussion

# Loading libraries and data

In [1]:
import numpy as np
import pandas as pd
import datetime
from tqdm import tqdm
import collections

from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import concatenate
from tensorflow.keras.models import Model

from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

In [2]:
DATA_PATH = '/content/drive/My Drive/data/horse-race/'

In [3]:
df = pd.read_csv(DATA_PATH + 'train_nn.csv')

In [4]:
df.head()

Unnamed: 0,race_id,horse_number,frame_number,age,gen,weight,weight_diff,burden_weight,place,race_horse_number,distance,clockwise,field_type,field_condition,weather,time_hour,season,one_before_odd,one_before_popular,one_before_rank,one_before_time,one_before_elapsed_day,one_before_distance,one_before_field_type,one_before_field_condition,one_before_weather,two_before_odd,two_before_popular,two_before_rank,two_before_time,two_before_elapsed_day,two_before_distance,two_before_field_type,two_before_field_condition,two_before_weather,three_before_odd,three_before_popular,three_before_rank,three_before_time,three_before_elapsed_day,three_before_distance,three_before_field_type,three_before_field_condition,three_before_weather,four_before_odd,four_before_popular,four_before_rank,four_before_time,four_before_elapsed_day,four_before_distance,four_before_field_type,four_before_field_condition,four_before_weather,five_before_odd,five_before_popular,five_before_rank,five_before_time,five_before_elapsed_day,five_before_distance,five_before_field_type,five_before_field_condition,five_before_weather,show,date
0,201945010102,1.0,1.0,7.0,牝,448,0,54.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,2019/1/1
1,201945010102,2.0,2.0,7.0,牡,464,7,56.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,1.0,2019/1/1
2,201945010102,3.0,3.0,7.0,牝,464,4,54.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,1.0,2019/1/1
3,201945010102,4.0,4.0,6.0,牡,449,7,55.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,2019/1/1
4,201945010102,5.0,5.0,7.0,牡,502,1,56.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,1.0,2019/1/1


# Preprocessing

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 407340 entries, 0 to 407339
Data columns (total 64 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   race_id                       407340 non-null  int64  
 1   horse_number                  253636 non-null  float64
 2   frame_number                  253636 non-null  float64
 3   age                           253636 non-null  float64
 4   gen                           253636 non-null  object 
 5   weight                        253636 non-null  object 
 6   weight_diff                   253636 non-null  object 
 7   burden_weight                 253636 non-null  float64
 8   place                         253636 non-null  object 
 9   race_horse_number             253636 non-null  float64
 10  distance                      253636 non-null  float64
 11  clockwise                     253636 non-null  object 
 12  field_type                    253636 non-nul

## Dropping the date

The date is only needed for sorting, so the column is dropped.

In [6]:
df = df.drop('date', axis=1)
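The sort itself is not shown in this notebook; `train_nn.csv` is presumably already in date order. If it had to be redone, note that `date` is a string such as `2019/1/1`, so a plain string sort would place `2019/1/10` before `2019/1/2`. A small sketch on made-up rows that parses the dates first and sorts stably so that a race's horses stay together (the tie-break on `race_id` is an assumption):

```python
import pandas as pd

# Made-up rows; dates use the same 'YYYY/M/D' string format as train_nn.csv
demo = pd.DataFrame({
    'race_id': [30, 30, 10, 20],
    'horse_number': [1.0, 2.0, 1.0, 1.0],
    'date': ['2019/1/10', '2019/1/10', '2019/1/2', '2019/1/9'],
})

print(sorted(demo['date'].unique()))  # string order: ['2019/1/10', '2019/1/2', '2019/1/9']

demo_sorted = (
    demo.assign(_date=pd.to_datetime(demo['date'], format='%Y/%m/%d'))
        .sort_values(['_date', 'race_id'], kind='stable')  # stable: horse order within a race is kept
        .drop(columns=['_date', 'date'])                   # date is no longer needed after sorting
        .reset_index(drop=True)
)
print(demo_sorted['race_id'].tolist())  # [10, 20, 30, 30]
```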

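## Missing values

Fill them with 0. The total below (9,529,648) equals the 153,704 padding rows (407,340 - 253,636) times 62 columns (every column except `race_id`), so the missing values appear to come entirely from the padding rows.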

In [7]:
df.isnull().sum().sum()

9529648

In [8]:
df = df.fillna(0)

Quite a few horses may have no past-race information at all.

23,752 rows have no past-race information.
Numeric 0 and string 0 values are now mixed.

In [9]:
past_race_columns_base = ['odd', 'popular', 'rank', 'time', 'elapsed_day', 'distance', 'field_type', 'field_condition', 'weather']
past_race_num = ['one', 'two', 'three', 'four', 'five']
past_race_columns = []
for n in past_race_num:
  for c in past_race_columns_base:
    past_race_columns.append('{}_before_{}'.format(n, c))
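How the 23,752 figure was computed is not shown. One way to count horses without past-race data, sketched on a toy frame: after `fillna(0)` such a horse has 0 in all 45 past-race columns, but so do the padding rows, so those have to be excluded. This assumes padding rows can be recognized by `horse_number == 0`; `count_without_history` is a hypothetical name:

```python
import pandas as pd

past_race_columns_base = ['odd', 'popular', 'rank', 'time', 'elapsed_day',
                          'distance', 'field_type', 'field_condition', 'weather']
past_race_num = ['one', 'two', 'three', 'four', 'five']
# Same 45 column names as the loop above, written as a comprehension
past_race_columns = [f'{n}_before_{c}' for n in past_race_num for c in past_race_columns_base]


def count_without_history(df: pd.DataFrame) -> int:
    """Real horses (horse_number != 0) whose past-race columns are all 0 after fillna(0)."""
    is_real = df['horse_number'] != 0
    no_history = (df[past_race_columns] == 0).all(axis=1)
    return int((is_real & no_history).sum())


# Toy frame: a horse with no history, a horse with one past race, and a padding row
demo = pd.DataFrame({c: [0, 0, 0] for c in past_race_columns})
demo['horse_number'] = [1.0, 2.0, 0.0]
demo['one_before_odd'] = [0.0, 3.5, 0.0]
demo['one_before_weather'] = [0, '晴', 0]  # numeric 0 mixed with strings, as after fillna(0)
print(count_without_history(demo))  # 1
```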

## weight

Some rows contain `計不` (weight could not be measured).

In [10]:
len(df[df['weight'] == '計不'])

44

In [11]:
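# Zero only the weight columns; df[mask] = 0 would also overwrite race_id and the show label
df.loc[df['weight'] == '計不', ['weight', 'weight_diff']] = 0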

In [12]:
df['weight'] = df['weight'].astype('int64')
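A caveat about the `計不` step: boolean indexing of the form `df[mask] = 0` assigns to every column of the selected rows, so `race_id` and the `show` label get overwritten too. The toy example below (made-up values) contrasts that with a column-only cleanup using `pd.to_numeric(errors='coerce')`, a swapped-in technique that turns any non-numeric entry into NaN, then 0:

```python
import pandas as pd

demo = pd.DataFrame({
    'race_id': [1, 1],
    'weight': ['448', '計不'],  # read as strings because of the '計不' entries
    'show': [1.0, 1.0],
})

# df[mask] = 0 zeroes the whole row, including race_id and the label
whole_row = demo.copy()
whole_row[whole_row['weight'] == '計不'] = 0
print(whole_row)

# Column-only cleanup: non-numeric weights become NaN, then 0, then int64
cleaned = demo.copy()
cleaned['weight'] = pd.to_numeric(cleaned['weight'], errors='coerce').fillna(0).astype('int64')
print(cleaned)
```

The same `pd.to_numeric` pattern could also be applied to `weight_diff` if it contains non-numeric entries.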

## weight diff

Convert the column to an integer type.

In [13]:
df['weight_diff'] = df['weight_diff'].astype('int64')

## Label Encoding

In [14]:
categorical_cols = ['gen', 'place', 'clockwise', 'field_type', 'field_condition', 'weather', 'season']
categorical_cols_past_base = ['field_type', 'field_condition', 'weather']

In [15]:
categorical_cols_past = []
for n in past_race_num:
  for c in categorical_cols_past_base:
    categorical_cols_past.append('{}_before_{}'.format(n, c))

In [16]:
categorical_cols = categorical_cols + categorical_cols_past

In [17]:
df[categorical_cols] = df[categorical_cols].astype(str)

In [18]:
le_list = list()

In [19]:
for c in categorical_cols:
  le = LabelEncoder()
  le.fit(df[c])
  le_list.append(le)
  df[c] = le.transform(df[c])

In [22]:
np.save(DATA_PATH + 'le_list.npy', le_list)
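`np.save` stores this list of `LabelEncoder` objects by pickling it into an object array, so reading it back requires `allow_pickle=True`. New data also has to go through `astype(str)` before `transform`, and a value the encoder never saw raises `ValueError`. A round-trip sketch with two toy encoders (the category values are illustrative, and a temporary directory stands in for `DATA_PATH`):

```python
import os
import tempfile

import numpy as np
from sklearn.preprocessing import LabelEncoder

# Two toy encoders standing in for le_list (one per categorical column)
le_list = [LabelEncoder().fit(['牝', '牡', 'セ']),
           LabelEncoder().fit(['良', '稍', '重', '不'])]

path = os.path.join(tempfile.mkdtemp(), 'le_list.npy')
np.save(path, np.array(le_list, dtype=object))  # the encoders are pickled into the .npy file

loaded = list(np.load(path, allow_pickle=True))  # the default allow_pickle=False would raise
print(loaded[1].transform(['良', '重']))

try:
    loaded[1].transform(['雪'])  # a value not seen during fit
except ValueError as err:
    print('unseen label:', err)
```

`pickle` or `joblib.dump` would work just as well for storing the encoders.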

In [None]:
df

Unnamed: 0,race_id,horse_number,frame_number,age,gen,weight,weight_diff,burden_weight,place,race_horse_number,distance,clockwise,field_type,field_condition,weather,time_hour,season,one_before_odd,one_before_popular,one_before_rank,one_before_time,one_before_elapsed_day,one_before_distance,one_before_field_type,one_before_field_condition,one_before_weather,two_before_odd,two_before_popular,two_before_rank,two_before_time,two_before_elapsed_day,two_before_distance,two_before_field_type,two_before_field_condition,two_before_weather,three_before_odd,three_before_popular,three_before_rank,three_before_time,three_before_elapsed_day,three_before_distance,three_before_field_type,three_before_field_condition,three_before_weather,four_before_odd,four_before_popular,four_before_rank,four_before_time,four_before_elapsed_day,four_before_distance,four_before_field_type,four_before_field_condition,four_before_weather,five_before_odd,five_before_popular,five_before_rank,five_before_time,five_before_elapsed_day,five_before_distance,five_before_field_type,five_before_field_condition,five_before_weather,show
0,201945010102,1.0,1.0,7.0,2,448,0,54.0,11,10.0,1400.0,4,1,3,3,11.0,4,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0
1,201945010102,2.0,2.0,7.0,3,464,7,56.0,11,10.0,1400.0,4,1,3,3,11.0,4,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,1.0
2,201945010102,3.0,3.0,7.0,2,464,4,54.0,11,10.0,1400.0,4,1,3,3,11.0,4,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,1.0
3,201945010102,4.0,4.0,6.0,3,449,7,55.0,11,10.0,1400.0,4,1,3,3,11.0,4,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0
4,201945010102,5.0,5.0,7.0,3,502,1,56.0,11,10.0,1400.0,4,1,3,3,11.0,4,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1,1,1,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
407335,202048060911,0.0,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0
407336,202048060911,0.0,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0
407337,202048060911,0.0,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0
407338,202048060911,0.0,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0


# Training

## Splitting into training and validation data

In [None]:
train_df, test_df = train_test_split(df, test_size=54000, shuffle=False)

In [None]:
test_df

Unnamed: 0,race_id,horse_number,frame_number,age,gen,weight,weight_diff,burden_weight,place,race_horse_number,distance,clockwise,field_type,field_condition,weather,time_hour,season,one_before_odd,one_before_popular,one_before_rank,one_before_time,one_before_elapsed_day,one_before_distance,one_before_field_type,one_before_field_condition,one_before_weather,two_before_odd,two_before_popular,two_before_rank,two_before_time,two_before_elapsed_day,two_before_distance,two_before_field_type,two_before_field_condition,two_before_weather,three_before_odd,three_before_popular,three_before_rank,three_before_time,three_before_elapsed_day,three_before_distance,three_before_field_type,three_before_field_condition,three_before_weather,four_before_odd,four_before_popular,four_before_rank,four_before_time,four_before_elapsed_day,four_before_distance,four_before_field_type,four_before_field_condition,four_before_weather,five_before_odd,five_before_popular,five_before_rank,five_before_time,five_before_elapsed_day,five_before_distance,five_before_field_type,five_before_field_condition,five_before_weather,show
353340,202050042307,1.0,1.0,8.0,3,489,1,56.0,7,8.0,1400.0,1,1,3,3,16.0,2,187.3,11.0,12.0,792.0,29.0,1400.0,1,3,3,88.2,10.0,12.0,182.0,44.0,1230.0,2,2,6,131.0,9.0,9.0,891.0,59.0,1400.0,2,3,4,408.2,12.0,11.0,791.0,71.0,1400.0,2,4,5,120.7,10.0,9.0,399.0,85.0,1500.0,2,5,4,0.0
353341,202050042307,2.0,2.0,4.0,2,473,7,54.0,7,8.0,1400.0,1,1,3,3,16.0,2,7.3,3.0,2.0,887.0,21.0,1400.0,1,4,3,2.2,1.0,2.0,89.0,42.0,1400.0,2,5,4,2.0,2.0,4.0,92.0,79.0,1400.0,2,4,4,1.5,1.0,1.0,590.0,64.0,1400.0,2,3,4,2.6,2.0,2.0,515.0,249.0,1800.0,2,4,4,1.0
353342,202050042307,3.0,3.0,7.0,3,493,-4,56.0,7,8.0,1400.0,1,1,3,3,16.0,2,9.5,4.0,1.0,290.0,29.0,1400.0,1,3,3,18.3,5.0,1.0,887.0,43.0,1400.0,2,2,5,10.6,5.0,2.0,692.0,78.0,1400.0,2,4,4,3.4,2.0,4.0,179.0,64.0,1230.0,2,3,4,38.6,7.0,9.0,989.0,111.0,1400.0,2,5,4,0.0
353343,202050042307,4.0,4.0,6.0,2,418,7,54.0,7,8.0,1400.0,1,1,3,3,16.0,2,85.7,9.0,8.0,788.0,21.0,1400.0,1,4,3,15.5,4.0,9.0,890.0,41.0,1400.0,2,3,4,11.9,4.0,1.0,389.0,69.0,1400.0,2,5,5,38.1,7.0,6.0,590.0,91.0,1400.0,2,5,5,23.0,8.0,7.0,680.0,210.0,1230.0,2,4,4,1.0
353344,202050042307,5.0,5.0,4.0,3,487,0,56.0,7,8.0,1400.0,1,1,3,3,16.0,2,3.7,2.0,5.0,978.0,23.0,1230.0,1,2,4,9.7,2.0,2.0,989.0,41.0,1400.0,2,3,4,31.2,5.0,2.0,491.0,209.0,1400.0,2,4,5,16.1,4.0,4.0,92.0,225.0,1400.0,2,4,6,16.4,4.0,7.0,614.0,245.0,1700.0,2,4,4,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
407335,202048060911,0.0,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0
407336,202048060911,0.0,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0
407337,202048060911,0.0,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0
407338,202048060911,0.0,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0.0


In [None]:
X_train = train_df.drop(['race_id', 'show'], axis=1).values
y_train = train_df['show'].values
X_test = test_df.drop(['race_id', 'show'], axis=1).values
y_test = test_df['show'].values  

In [None]:
train_race_id_counter = collections.Counter(list(train_df['race_id'].values))
test_race_id_counter = collections.Counter(list(test_df['race_id'].values))
train_query = list(train_race_id_counter.values())
test_query = list(test_race_id_counter.values())

## Normalization

Apply plain standardization.

Because nulls were filled with 0, the feature scales are probably distorted.
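One way to keep the zero-filled padding rows from skewing the statistics, sketched in NumPy as an alternative to the plain `StandardScaler` used here: compute the mean and standard deviation from real rows only and keep padding rows at exactly 0. This assumes a padding row can be recognized by a 0 in the first column (`horse_number` in `X_train`); the function names are hypothetical, and real horses with zero-filled past-race data are not handled:

```python
import numpy as np


def fit_masked_scaler(X: np.ndarray, real_mask: np.ndarray):
    """Per-column mean/std computed from real (non-padding) rows only."""
    real = X[real_mask]
    mean = real.mean(axis=0)
    std = real.std(axis=0)   # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0      # constant columns: avoid division by zero
    return mean, std


def transform_masked(X: np.ndarray, real_mask: np.ndarray, mean, std) -> np.ndarray:
    """Standardize every row, then reset padding rows to exactly 0."""
    Z = (X - mean) / std
    Z[~real_mask] = 0.0
    return Z


# Toy data: one feature, two real horses and two padding rows
X = np.array([[480.0], [500.0], [0.0], [0.0]])
real_mask = X[:, 0] != 0
print(X.mean(axis=0), X.std(axis=0))  # pulled toward 0 by the padding rows
mean, std = fit_masked_scaler(X, real_mask)
print(mean, std)                       # mean 490, std 10
print(transform_masked(X, real_mask, mean, std).ravel())
```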

In [None]:
sc = StandardScaler()
sc.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [None]:
X_train_sc = sc.transform(X_train)
X_test_sc = sc.transform(X_test)

In [None]:
print('train X', X_train_sc.shape)
print('train y', y_train.shape)
print('test X', X_test_sc.shape)
print('test y', y_test.shape)

train X (353340, 61)
train y (353340,)
test X (54000, 61)
test y (54000,)


In [None]:
X_train_sc = X_train_sc.reshape([-1, 18, 61])
y_train = y_train.reshape([-1, 18])
X_test_sc = X_test_sc.reshape([-1, 18, 61])
y_test = y_test.reshape([-1, 18])
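`reshape([-1, 18, 61])` only groups horses correctly if every race occupies exactly 18 consecutive rows. A quick check that could be run on `train_df['race_id'].values` and `test_df['race_id'].values` (a sketch; `check_race_blocks` is a hypothetical helper):

```python
import numpy as np


def check_race_blocks(race_ids: np.ndarray, n: int = 18) -> bool:
    """True if race_ids splits into consecutive blocks of n rows, one race per block."""
    race_ids = np.asarray(race_ids)
    if len(race_ids) % n != 0:
        return False
    blocks = race_ids.reshape(-1, n)
    one_race_per_block = (blocks == blocks[:, :1]).all(axis=1).all()
    no_race_split = len(np.unique(blocks[:, 0])) == len(blocks)  # a race must not span two blocks
    return bool(one_race_per_block and no_race_split)


print(check_race_blocks(np.array([7, 7, 7, 9, 9, 9]), n=3))  # True
print(check_race_blocks(np.array([7, 7, 9, 9, 9, 9]), n=3))  # False
```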

In [None]:
print('train X', X_train_sc.shape)
print('train y', y_train.shape)
print('test X', X_test_sc.shape)
print('test y', y_test.shape)

train X (19630, 18, 61)
train y (19630, 18)
test X (3000, 18, 61)
test y (3000, 18)


### Reshaping for the multi-input model

In [None]:
X_train_sc_reshape = []
X_test_sc_reshape = []
for i in range(18):
  X_train_sc_reshape.append(X_train_sc[:, i, :])
  X_test_sc_reshape.append(X_test_sc[:, i, :])
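The race model has 18 separate `Input` layers, so Keras expects a list of 18 arrays of shape `(number of races, 61)`, ordered like the inputs `horse0` to `horse17`. The loop above builds exactly that. An equivalent one-liner, and a dict keyed by input name (which Keras functional models should also accept), are sketched here on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 18, 61))  # 5 races x 18 horses x 61 features

per_horse = [X[:, i, :] for i in range(X.shape[1])]     # same as the loop above
one_liner = list(np.transpose(X, (1, 0, 2)))            # horse axis first, then split
by_name = {f'horse{i}': X[:, i, :] for i in range(18)}  # keys match Input(name='horse{i}')

print(len(per_horse), per_horse[0].shape)  # 18 (5, 61)
```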

## Defining the models

In [None]:
def define_simple_model(input_shape):
  inp = Input(shape=input_shape)
  x = Dense(300, activation='relu')(inp)
  x = Dropout(0.2)(x)
  x = Dense(150, activation='relu')(x)
  x = Dropout(0.2)(x)
  x = Dense(50, activation='relu')(x)
  x = Dropout(0.2)(x)
  x = Dense(1, activation='sigmoid')(x)

  model = Model(inp, x)
  model.compile(optimizer='adam', loss='mean_squared_error')

  return model

In [None]:
def define_race_model(input_shape):
  horse_layers = []
  inp_layers = []
  for i in range(18):
    inp = Input(shape=input_shape, name='horse{}'.format(i))
    inp_layers.append(inp)
    x = Dense(128, activation='relu')(inp)
    x = Dropout(0.2)(x)
    x = Dense(32, activation='relu')(x)
    x = Dropout(0.2)(x)
    horse_layers.append(x)

  concatenated = concatenate(horse_layers)
  x = Dense(128, activation='relu')(concatenated)
  x = Dropout(0.2)(x)
  x = Dense(64, activation='relu')(x)
  x = Dropout(0.2)(x)
  output = Dense(18, activation='sigmoid')(x)

  model = Model(inp_layers, output)
  model.compile(optimizer='adam', loss='mean_squared_error')

  return model

## Training

In [None]:
model = define_race_model((X_train_sc.shape[2],))

In [None]:
model.summary()

Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
horse0 (InputLayer)             [(None, 61)]         0                                            
__________________________________________________________________________________________________
horse1 (InputLayer)             [(None, 61)]         0                                            
__________________________________________________________________________________________________
horse2 (InputLayer)             [(None, 61)]         0                                            
__________________________________________________________________________________________________
horse3 (InputLayer)             [(None, 61)]         0                                            
_______________________________________________________________________________________
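The summary above is cut off. Because `define_race_model` creates new `Dense` layers inside the loop, each of the 18 branches has its own weights, and the total can be worked out by hand: a `Dense` layer has `inputs × units` weights plus `units` biases, while `Input` and `Dropout` layers add nothing. A small arithmetic sketch (`model.count_params()` should report the same total):

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out


N_HORSES, N_FEATURES = 18, 61

branch = dense_params(N_FEATURES, 128) + dense_params(128, 32)  # one per-horse branch
head = (dense_params(N_HORSES * 32, 128)                        # after concatenating 18 x 32
        + dense_params(128, 64)
        + dense_params(64, N_HORSES))                           # 18 sigmoid outputs
total = N_HORSES * branch + head

print(branch, head, total)  # 12064 83282 300434
```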

In [None]:
epoch = 400
batch = 4

In [None]:
model.fit(X_train_sc_reshape, y_train, epochs=epoch, batch_size=batch)

Epoch 1/400
...
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7f1cc50b9940>

In [None]:
model.save_weights(DATA_PATH + 'param.hdf5')

In [None]:
model.save(DATA_PATH + 'multi_input_model.h5')

# Prediction

In [None]:
pred = model.predict(X_test_sc_reshape)

For each race, compute how often the horse with the highest predicted value actually finished in the top three (`show == 1`).
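For the per-race model this metric reduces to a single fancy-indexing step in NumPy. A vectorized sketch (the function name is hypothetical) that should agree with `calc_prob_race`, checked on toy arrays:

```python
import numpy as np


def top1_show_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of races whose top-scored horse has show == 1.

    y_true, y_pred: arrays of shape (number of races, horses per race).
    """
    picked = y_pred.argmax(axis=1)                      # top-scored horse in each race
    hits = y_true[np.arange(len(y_true)), picked] == 1  # did it finish in the top 3?
    return float(hits.mean())


y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[0.9, 0.2, 0.1], [0.8, 0.3, 0.1]])
print(top1_show_rate(y_true, y_pred))  # 0.5
```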

In [None]:
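def calc_prob(predict):
  """
  Score for the row-by-row model (not the per-race model).
  `predict` must be aligned row for row with test_df.
  """
  stack_q = 0
  correct = 0
  for query in test_query:
    # argmax within the race, offset by stack_q to get the row position in test_df
    ind = stack_q + np.argmax(predict[stack_q:stack_q+query])
    stack_q += query
    if test_df.iloc[ind]['show'] == 1:
      correct += 1

  print('score is', correct / len(test_query))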
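For row-level predictions (the simple model), a pandas alternative that groups by `race_id` instead of walking through `test_query` offsets, so it does not depend on rows being ordered race by race. A sketch with a hypothetical function name; `scores` would be the flattened output of `model.predict` (e.g. via `.ravel()`):

```python
import pandas as pd


def top1_show_rate_rows(race_ids, show, scores) -> float:
    """Pick the top-scored row within each race_id and return the share with show == 1."""
    frame = pd.DataFrame({'race_id': race_ids, 'show': show, 'score': scores})
    top_idx = frame.groupby('race_id', sort=False)['score'].idxmax()  # row label of each race's pick
    return float((frame.loc[top_idx, 'show'] == 1).mean())


# Toy data: race 1 picks a horse that placed, race 2 does not
print(top1_show_rate_rows([1, 1, 1, 2, 2], [0, 1, 1, 0, 1], [0.2, 0.7, 0.1, 0.9, 0.4]))  # 0.5
```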

In [None]:
def calc_prob_race(true, pred):
  """
  Score for the per-race model.
  true, pred: arrays of shape (number of races, 18)
  """
  correct = 0
  for i in range(len(true)):
    ind = np.argmax(pred[i])  # horse with the highest predicted value in race i
    if true[i][ind] == 1:
      correct += 1

  print('score is', correct / len(true))

In [None]:
calc_prob_race(y_test, pred)

score is 0.4776666666666667


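# Discussion

## Simple neural network model

- 0.6406

## Multi-input neural network

### Sigmoid output layer, trained as regression

- 0.4756

### Softmax output layer, trained as classification

- 0.436

#### Adding layers on the multi-input side

- 0.4246

#### Fewer layers, simpler model

200 epochs

- 0.2836

The loss did not decrease.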