<a href="https://colab.research.google.com/github/monda00/horse-race-notebook/blob/master/predict_show_neural_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prediction with a neural network

This notebook covers the whole process, from building the training data to making predictions and discussing the results.

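Features

|Category |Feature |
|---|---|
|Horse information |Horse number |
| |Frame number |
| |Age |
| |Sex |
| |Weight (current) |
| |Weight (change from previous race) |
| |Weight carried (burden weight) |
|Race-day information |Racecourse |
| |Number of runners |
| |Course distance |
| |Direction of turn (left/right-handed) |
| |Course type (dirt/turf/steeplechase) |
| |Weather |
| |Track condition |
| |Start time slot |
| |Season |
|Past races of the same horse (×5 races) |Odds |
| |Popularity (betting rank) |
| |Finishing position |
| |Time (s) |
| |Days since previous race |
| |Course distance |
| |Course type (dirt/turf/steeplechase) |
| |Weather |
| |Track condition |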
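As a cross-check, the table can be mapped onto the column names used in `train_nn.csv`. The column names are copied from that file's header; the grouping into `horse_columns`, `race_day_columns` and `past_race_columns` is an illustrative assumption, not code from the original notebook. The sketch confirms that 7 horse features, 9 race-day features, 9 × 5 past-race features and `race_id` add up to the 62 columns the dataset reports.

```python
# Column names copied from the train_nn.csv header; the grouping is illustrative.
horse_columns = [
    'horse_number', 'frame_number', 'age', 'gen',         # gen = sex
    'weight', 'weight_diff', 'burden_weight',
]
race_day_columns = [
    'place', 'race_horse_number', 'distance', 'clockwise',  # clockwise = direction of turn
    'field_type', 'field_condition', 'weather', 'time_hour', 'season',
]
# The nine attributes recorded for each of the five previous races.
past_race_columns_base = [
    'odd', 'popular', 'rank', 'time', 'elapsed_day',
    'distance', 'field_type', 'field_condition', 'weather',
]
past_race_num = ['one', 'two', 'three', 'four', 'five']
past_race_columns = [
    f'{n}_before_{c}' for n in past_race_num for c in past_race_columns_base
]

# race_id identifies the race; it is an identifier, not a feature.
all_columns = ['race_id'] + horse_columns + race_day_columns + past_race_columns

print(len(horse_columns), len(race_day_columns), len(past_race_columns))
print(len(all_columns))
```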

# Overview

- Loading libraries and data
- Preprocessing
- Training
- Prediction
- Discussion


# Loading libraries and data

```python
import numpy as np
import pandas as pd

# Used later to encode categorical features and to split the data.
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
```
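`train_test_split` shuffles individual rows, so horses from the same race can end up on both sides of the split. Because every row belongs to a race identified by `race_id`, a group-aware split may be a safer choice for evaluation. The sketch below swaps in scikit-learn's `GroupShuffleSplit`; this is an alternative, not what the original notebook does, and the toy frame is made up.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 4 races (race_id) with 3 horses each; the values are made up.
toy = pd.DataFrame({
    'race_id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'horse_number': [1, 2, 3] * 4,
})

# Hold out 25% of the races (not 25% of the rows).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(toy, groups=toy['race_id']))
train_df, test_df = toy.iloc[train_idx], toy.iloc[test_idx]

# Every race lands entirely in one of the two sets.
print(sorted(set(train_df['race_id'])), sorted(set(test_df['race_id'])))
```

With 4 races and `test_size=0.25`, one whole race (3 rows) goes to the test set.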

```python
# Data folder on Google Drive (assumes Drive is mounted at /content/drive in Colab).
DATA_PATH = '/content/drive/My Drive/data/horse-race/'
```

```python
df = pd.read_csv(DATA_PATH + 'train_nn.csv')
```



```python
df.head()
```

```text
,race_id,horse_number,frame_number,age,gen,weight,weight_diff,burden_weight,place,race_horse_number,distance,clockwise,field_type,field_condition,weather,time_hour,season,one_before_odd,one_before_popular,one_before_rank,one_before_time,one_before_elapsed_day,one_before_distance,one_before_field_type,one_before_field_condition,one_before_weather,two_before_odd,two_before_popular,two_before_rank,two_before_time,two_before_elapsed_day,two_before_distance,two_before_field_type,two_before_field_condition,two_before_weather,three_before_odd,three_before_popular,three_before_rank,three_before_time,three_before_elapsed_day,three_before_distance,three_before_field_type,three_before_field_condition,three_before_weather,four_before_odd,four_before_popular,four_before_rank,four_before_time,four_before_elapsed_day,four_before_distance,four_before_field_type,four_before_field_condition,four_before_weather,five_before_odd,five_before_popular,five_before_rank,five_before_time,five_before_elapsed_day,five_before_distance,five_before_field_type,five_before_field_condition,five_before_weather
0,201945010102,3,3.0,7,牝,464,4,54.0,川崎,10,1400,左,ダ,良,晴,11,winter,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0
1,201945010102,5,5.0,7,牡,502,1,56.0,川崎,10,1400,左,ダ,良,晴,11,winter,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0
2,201945010102,2,2.0,7,牡,464,7,56.0,川崎,10,1400,左,ダ,良,晴,11,winter,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0
3,201945010102,8,7.0,7,牝,399,3,54.0,川崎,10,1400,左,ダ,良,晴,11,winter,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0
4,201945010102,10,8.0,4,牝,452,32,54.0,川崎,10,1400,左,ダ,良,晴,11,winter,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0
```


# Preprocessing

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 253636 entries, 0 to 253635
Data columns (total 62 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   race_id                       253636 non-null  int64  
 1   horse_number                  253636 non-null  int64  
 2   frame_number                  253636 non-null  float64
 3   age                           253636 non-null  int64  
 4   gen                           253636 non-null  object 
 5   weight                        253636 non-null  object 
 6   weight_diff                   253636 non-null  object 
 7   burden_weight                 253636 non-null  float64
 8   place                         253636 non-null  object 
 9   race_horse_number             253636 non-null  int64  
 10  distance                      253636 non-null  int64  
 11  clockwise                     253636 non-null  object 
 12  field_type                    253636 non-nul
```
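`weight` and `weight_diff` are reported as `object` even though they hold numbers, which suggests that some rows contain non-numeric text; the output above does not show what those values are. A minimal sketch, assuming it is acceptable to turn such entries into NaN, uses `pd.to_numeric(errors='coerce')` to convert the columns and to count how many values failed to parse. The sample values (including the `'--'` marker) are hypothetical.

```python
import pandas as pd

# Hypothetical sample: numeric strings plus one non-numeric marker ('--').
sample = pd.DataFrame({
    'weight': ['464', '502', '--', '452'],
    'weight_diff': ['4', '-2', '--', '32'],
})

for col in ['weight', 'weight_diff']:
    converted = pd.to_numeric(sample[col], errors='coerce')  # unparsable -> NaN
    # Values that became NaN only because of the conversion.
    n_bad = converted.isna().sum() - sample[col].isna().sum()
    print(col, 'non-numeric entries:', n_bad)
    sample[col] = converted

print(sample.dtypes)  # both columns are now float64
```

On the real data, the printed counts would show whether the non-numeric entries are rare enough to drop or impute.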

There are no missing values (no NaN).

```python
df.isnull().sum().sum()
```

```text
0
```

However, quite a few horses, such as those running their first race, may have no past-race information at all.

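23,752 rows have no past-race information. These gaps are filled with 0 instead of NaN, which is why `isnull()` reports nothing: numeric columns hold the number 0, while text columns hold the string `'0'`.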

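```python
# Attributes stored for each previous race, and labels for the five previous races.
past_race_columns_base = ['odd', 'popular', 'rank', 'time', 'elapsed_day', 'distance', 'field_type', 'field_condition', 'weather']
past_race_num = ['one', 'two', 'three', 'four', 'five']

# Build names such as 'one_before_odd', ..., 'five_before_weather' (45 columns).
past_race_columns = []
for n in past_race_num:
    for c in past_race_columns_base:
        past_race_columns.append('{}_before_{}'.format(n, c))
```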
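Because an absent previous race is stored as 0 rather than NaN, the number of previous races available for each horse can be recovered from the `*_before_odd` columns. The sketch below assumes, as the notebook's own counts suggest, that `n_before_odd == 0` means the n-th previous race is absent; the `add_past_race_count` helper and the `past_race_count` column are hypothetical names.

```python
import pandas as pd

past_race_num = ['one', 'two', 'three', 'four', 'five']

def add_past_race_count(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a hypothetical 'past_race_count' column (0 to 5).

    Assumes a previous race is missing exactly when its odds are 0,
    following the 0-placeholder encoding used in this dataset.
    """
    odd_columns = ['{}_before_odd'.format(n) for n in past_race_num]
    out = df.copy()
    out['past_race_count'] = (out[odd_columns] != 0).sum(axis=1)
    return out

# Toy rows: a debut horse, a horse with 2 previous races, one with all 5.
toy = pd.DataFrame({
    'one_before_odd':   [0.0, 3.5, 12.1],
    'two_before_odd':   [0.0, 8.0, 4.4],
    'three_before_odd': [0.0, 0.0, 2.0],
    'four_before_odd':  [0.0, 0.0, 6.3],
    'five_before_odd':  [0.0, 0.0, 9.9],
})
print(add_past_race_count(toy)['past_race_count'].tolist())  # [0, 2, 5]
```

If the assumption holds, `(add_past_race_count(df)['past_race_count'] == 0).sum()` on the real data should match the 23,752 rows counted for `one_before_odd == 0`.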

```python
# Rows whose previous race has odds stored as the number 0.
len(df[df['one_before_odd'] == 0])
```

```text
23752
```

```python
# Rows whose previous race has weather stored as the string '0'.
len(df[df['one_before_weather'] == '0'])
```

```text
23752
```
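The two counts match even though one check compares with `0` and the other with `'0'`: `read_csv` keeps a text column as strings, so its placeholder is the string `'0'`, and comparing it with the wrong type silently finds nothing. A minimal sketch with a made-up two-row CSV reproduces this and then maps both placeholders to NaN so that `isnull()` can see them. Treating every 0 in these columns as missing is an assumption; it is only safe for columns where 0 cannot be a real value (odds, rank, time).

```python
import io

import numpy as np
import pandas as pd

# Made-up two-row CSV: row 0 is a horse with no previous race.
csv_text = (
    'one_before_odd,one_before_weather\n'
    '0.0,0\n'
    '3.5,晴\n'
)
toy = pd.read_csv(io.StringIO(csv_text))

# The numeric column holds 0.0; the text column holds the string '0'.
print((toy['one_before_odd'] == 0).sum())        # 1
print((toy['one_before_weather'] == 0).sum())    # 0: the int 0 never equals '0'
print((toy['one_before_weather'] == '0').sum())  # 1

# Replace both placeholders with NaN so missing past races become visible.
cleaned = toy.replace([0, '0'], np.nan)
print(cleaned.isnull().sum().tolist())  # [1, 1]
```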

## Label encoding
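`LabelEncoder` (imported at the top) sorts the distinct values it sees, so a column that mixes the int `0` with strings can raise a `TypeError` during `fit`. A hedged sketch, assuming each categorical column gets its own encoder and that the placeholder should simply become one more category, casts every value to `str` first. The column list and toy values are illustrative. Note that scikit-learn documents `LabelEncoder` for target labels; `OrdinalEncoder` is the feature-oriented counterpart.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame; mixing 0 and '0' mimics a past-race column read with mixed types.
toy = pd.DataFrame({
    'gen': ['牝', '牡', '牡', '牝'],
    'one_before_weather': ['晴', 0, '0', '曇'],
})

encoders = {}  # keep each fitted encoder to transform test data the same way later
for col in ['gen', 'one_before_weather']:
    le = LabelEncoder()
    # astype(str) turns the int 0 and the string '0' into the same category '0'.
    toy[col] = le.fit_transform(toy[col].astype(str))
    encoders[col] = le

print(toy['one_before_weather'].tolist())
print(list(encoders['one_before_weather'].classes_))
```

Since digits sort before the Japanese characters, the placeholder `'0'` always receives code 0 here.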