<a href="https://colab.research.google.com/github/monda00/horse-race-notebook/blob/master/make_neural_network_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating Data for the Neural Network

Create the base training data.

# Overview

- Loading libraries and data
- Creating the data

## References

- [データ収集からディープラーニングまで全て行って競馬の予測をしてみた](https://qiita.com/kami634/items/55e49dad76396d808bf5#%E5%8F%96%E5%BE%97%E3%81%97%E3%81%9Furl%E3%82%92%E3%82%82%E3%81%A8%E3%81%ABhtml%E3%82%92%E5%BE%97%E3%82%8B)
- [競馬の予測をガチでやってみた](http://stockedge.hatenablog.com/entry/2016/01/03/103428)
- [ディープラーニングさえあれば、競馬で回収率100%を超えられる](https://qiita.com/yossymura/items/334a8f3ef85bff081913)
- [競馬予想AIを作る 〜ニューラルネットワークによる相対評価データセットの取り扱い例〜](https://cocon-corporation.com/cocontoco/horseraceprediction_ai/)

# Loading Libraries and Data

In [1]:
import numpy as np
import pandas as pd
import re
import collections
import datetime
from tqdm import tqdm

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

In [2]:
DATA_PATH = '/content/drive/My Drive/data/horse-race/'

In [3]:
df = pd.read_csv(DATA_PATH + 'train_raw.csv')
# race_date is a 'YYYY/M/D' string, so this sort is lexicographic, not strictly chronological.
df = df.sort_values(by=['race_date', 'race_id', 'rank'])
df.reset_index(inplace=True, drop=True)

In [None]:
df

Unnamed: 0,agari,age,frame_number,horse_number,horse_weight,jockey,name,popular,race_date,race_id,race_name,rank,time,weight,win,show,clockwise,distance,field_condition,field_type,place,race_round,start_time,weather
0,39.5,牝7,3.0,3,464(+4),藤本現暉,リコーアペルタ,2.0,2019/1/1,201945010102,C3七　八,1,1:32.5,54.0,3.6,1,左,1400,良,ダ,川崎,2R,11:50,晴
1,39.8,牡7,5.0,5,502(+1),加藤和博,ミラクルツッキー,1.0,2019/1/1,201945010102,C3七　八,2,1:32.5,56.0,2.0,1,左,1400,良,ダ,川崎,2R,11:50,晴
2,40.3,牡7,2.0,2,464(+7),瀧川寿希,ロジレガシー,3.0,2019/1/1,201945010102,C3七　八,3,1:32.8,56.0,5.9,1,左,1400,良,ダ,川崎,2R,11:50,晴
3,40.1,牝7,7.0,8,399(+3),岡村健司,プチプチ,8.0,2019/1/1,201945010102,C3七　八,4,1:33.5,54.0,22.1,0,左,1400,良,ダ,川崎,2R,11:50,晴
4,41.1,牝4,8.0,10,452(+32),伊藤裕人,スエヒロドラ,4.0,2019/1/1,201945010102,C3七　八,5,1:33.8,54.0,10.3,0,左,1400,良,ダ,川崎,2R,11:50,晴
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
253631,40.2,牝5,8.0,10,435(-3),藤原良一,パートカラー,9.0,2020/6/9,202048060911,おおぐま座特別,6,1:45.3,54.0,66.4,0,右,1600,良,ダ,名古屋,11R,17:00,晴
253632,41.7,牝6,1.0,1,509(0),浅野皓大,フラワーイレブン,4.0,2020/6/9,202048060911,おおぐま座特別,7,1:45.5,51.0,8.2,0,右,1600,良,ダ,名古屋,11R,17:00,晴
253633,40.2,牡5,4.0,4,485(-5),加藤聡一,テイエムヨハネス,3.0,2020/6/9,202048060911,おおぐま座特別,8,1:45.6,56.0,6.2,0,右,1600,良,ダ,名古屋,11R,17:00,晴
253634,43.0,牝4,8.0,9,403(-2),友森翔太,メモリーバリケード,2.0,2020/6/9,202048060911,おおぐま座特別,9,1:46.9,55.0,3.2,0,右,1600,良,ダ,名古屋,11R,17:00,晴


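# Data Creation

Reshape the raw data into the features below.

Horse weight and its difference from the previous race become known around the Thursday just before the race.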

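|Category |Item |
|---|---|
|Horse information |Horse number |
| |Frame number |
| |Age |
| |Sex |
| |Weight (current) |
| |Weight (difference from previous race) |
| |Burden weight |
|Race-day information |Racecourse |
| |Number of runners |
| |Course distance |
| |Course direction |
| |Course type (ダ dirt / 芝 turf / 障 jump) |
| |Weather |
| |Track condition |
| |Start time slot |
| |Season |
|Past races of the same horse (last 5 races) |Odds |
| |Popularity rank |
| |Finishing position |
| |Time (seconds) |
| |Days since that race |
| |Course distance |
| |Course type (ダ dirt / 芝 turf / 障 jump) |
| |Weather |
| |Track condition |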
## Creating Columns

In [None]:
id_column = ['race_id']
horse_columns = ['horse_number', 'frame_number', 'age', 'gen', 'weight', 'weight_diff', 'burden_weight']
race_columns = ['place', 'race_horse_number', 'distance', 'clockwise', 'field_type', 'field_condition', 'weather', 'time_hour', 'season']
past_race_columns_base = ['odd', 'popular', 'rank', 'time', 'elapsed_day', 'distance', 'field_type', 'field_condition', 'weather']

Create the columns for the past five races.

In [None]:
past_race_num = ['one', 'two', 'three', 'four', 'five']

In [None]:
past_race_columns = []
for n in past_race_num:
  for c in past_race_columns_base:
    past_race_columns.append('{}_before_{}'.format(n, c))

In [None]:
columns = id_column + horse_columns + race_columns + past_race_columns
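
The nested loop that builds the past-race column names can also be written as a single list comprehension. A minimal sketch with shortened lists (the two-element lists are illustrative, not the notebook's full lists):

```python
# Shortened stand-ins for the notebook's lists.
past_race_num = ['one', 'two']
past_race_columns_base = ['odd', 'rank']

# Outer loop over the race offset, inner loop over the feature name,
# which is the same order the notebook's nested for-loops produce.
past_race_columns = [
    f'{n}_before_{c}'
    for n in past_race_num
    for c in past_race_columns_base
]
```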

## Creating the New DataFrame

In [None]:
train_df = pd.DataFrame(columns=columns)

In [None]:
input_columns = ['race_id', 'horse_number', 'frame_number', 'place', 'distance', 'clockwise', 'field_type', 'field_condition', 'weather']
train_df[input_columns] = df[input_columns]

### Age and Sex

In [None]:
df['age'].value_counts()

牝3     37654
牡3     32885
牝4     31008
牡4     27611
牝5     20429
牡5     20394
牡6     14000
牝6     10809
牡7      9747
牝2      8603
牡2      7819
牡8      5950
牝7      4582
セ4      3235
セ5      3181
牡9      2793
セ6      2424
牝8      1782
セ3      1765
セ7      1750
牡10     1151
セ8      1105
牝9       753
セ9       590
牡11      435
セ10      267
牝10      250
牝11      168
セ2       152
牝12       94
セ11       69
牡12       51
セ13       29
セ12       27
牝15       26
牡13       20
牝16       14
セ14       11
牝13        2
牡14        1
Name: age, dtype: int64

In [None]:
gen = []
age = []
for i in range(len(df)):
  age_v = df.iloc[i]['age']
  gen.append(re.search(r'(.)(\d{1,2})', age_v).group(1))
  age.append(re.search(r'(.)(\d{1,2})', age_v).group(2))

In [None]:
train_df['age'] = age
train_df['gen'] = gen
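
The row-by-row loop above can probably be replaced by one vectorized call to `Series.str.extract`. A minimal sketch on made-up values in the same `<sex><age>` format; unlike the notebook, it stores the age as an integer rather than a string:

```python
import pandas as pd

# Made-up sample in the same "<sex><age>" format as df['age'].
sample = pd.DataFrame({'age': ['牝7', '牡10', 'セ4']})

# One regex pass: the first character is the sex, the digits are the age.
parts = sample['age'].str.extract(r'^(?P<gen>.)(?P<age>\d{1,2})$')
parts['age'] = parts['age'].astype(int)
```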

### Burden Weight

In [None]:
train_df['burden_weight'] = df['weight']

### Time Slot

In [None]:
time_hour = []
for i in range(len(df)):
  start_time = df.iloc[i]['start_time']
  time_hour.append(int(re.search(r'(.*):(.*)', start_time).group(1)))

In [None]:
train_df['time_hour'] = time_hour
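
The hour can also be taken without a Python-level loop by splitting the string on `:`. A minimal sketch on made-up start times, assuming every value has the `H:MM` form:

```python
import pandas as pd

# Made-up start times in the same format as df['start_time'].
sample = pd.Series(['11:50', '17:00', '9:05'])

# Keep only the part before the colon and convert it to an integer hour.
time_hour = sample.str.split(':').str[0].astype(int)
```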

### Season

In [None]:
season = []
for i in range(len(df)):
  race_date = df.iloc[i]['race_date']
  race_month = int(re.search(r'\/.+?\/', race_date).group().replace('/', ''))
  if 3 <= race_month <= 5:
    season.append('spring')
  elif 6 <= race_month <= 8:
    season.append('summer')
  elif 9 <= race_month <= 11:
    season.append('autumn')
  else:
    season.append('winter')

In [None]:
train_df['season'] = season
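
The month-to-season rule can be expressed as a lookup table and applied with `Series.map`. A minimal sketch on made-up dates; `MONTH_TO_SEASON` is a hypothetical helper name, not something the notebook defines:

```python
import pandas as pd

# Hypothetical lookup table: month number -> season label.
MONTH_TO_SEASON = {
    3: 'spring', 4: 'spring', 5: 'spring',
    6: 'summer', 7: 'summer', 8: 'summer',
    9: 'autumn', 10: 'autumn', 11: 'autumn',
    12: 'winter', 1: 'winter', 2: 'winter',
}

# Made-up dates in the same 'YYYY/M/D' format as df['race_date'].
sample = pd.Series(['2019/1/1', '2020/6/9', '2019/4/15', '2019/10/3'])

# The month is the middle field of the slash-separated date.
months = sample.str.split('/').str[1].astype(int)
season = months.map(MONTH_TO_SEASON)
```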

### Number of Runners

In [None]:
race_horse_number_counter = list(collections.Counter(list(df['race_id'].values)).values())

In [None]:
race_horse_number = []
for n in race_horse_number_counter:
  for _ in range(n):
    race_horse_number.append(n)

In [None]:
train_df['race_horse_number'] = race_horse_number
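
The `Counter` approach above relies on all rows of a race being contiguous and in the same order as the counter's keys. A sketch of an order-independent alternative using `groupby(...).transform`, on made-up race IDs:

```python
import pandas as pd

# Made-up race IDs; rows of the same race do not need to be adjacent.
sample = pd.DataFrame({'race_id': [101, 102, 101, 101, 102]})

# transform('count') broadcasts each group's row count back to every row.
sample['race_horse_number'] = sample.groupby('race_id')['race_id'].transform('count')
```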

### Weight and Weight Change

In [None]:
weight = []
weight_diff = []
for i in range(len(df)):
  horse_weight = df.iloc[i]['horse_weight']
  if horse_weight == '計不':
    weight.append('計不')
    weight_diff.append('計不')
  else:
    weight.append(int(re.search(r'(.*)(\(.*?\))', horse_weight).group(1)))
    weight_diff.append(re.search(r'(.*)(\(.*?\))', horse_weight).group(2).replace('(', '').replace(')', ''))

In [None]:
train_df['weight'] = weight
train_df['weight_diff'] = weight_diff
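
Weight and weight change can be split in one pass with `Series.str.extract`. A minimal sketch on made-up values; note that it turns unmeasured entries (`計不`) into NaN, whereas the notebook keeps the string `計不`:

```python
import pandas as pd

# Made-up values in the same format as df['horse_weight'].
sample = pd.Series(['464(+4)', '435(-3)', '509(0)', '計不'])

# Group 0 is the weight, group 1 the signed change; non-matching rows become NaN.
parts = sample.str.extract(r'^(\d+)\(([+-]?\d+)\)$')
weight = parts[0].astype(float)
weight_diff = parts[1].astype(float)
```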

### Past Race Data

In [None]:
train_df[past_race_columns] = 0

191300

In [None]:
train_df = pd.read_csv(DATA_PATH + 'train_nn.csv')



In [None]:
# Resume at row 191400; earlier rows come from the train_nn.csv checkpoint loaded above.
for i in tqdm(range(191400, len(df))):
  horse_name = df.iloc[i]['name']
  race_date = datetime.datetime.strptime(df.iloc[i]['race_date'], "%Y/%m/%d")
  past_num = 0

  horse_df = df.iloc[:i].query('name == "{}"'.format(horse_name))
  for j in range(len(horse_df)-1, -1, -1):
    race_date_before = datetime.datetime.strptime(horse_df.iloc[j]['race_date'], "%Y/%m/%d")
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'odd')] = horse_df.iloc[j]['win']
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'popular')] = horse_df.iloc[j]['popular']
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'rank')] = horse_df.iloc[j]['rank']
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'time')] = horse_df.iloc[j]['time']
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'elapsed_day')] = abs(race_date - race_date_before).days
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'distance')] = horse_df.iloc[j]['distance']
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'field_type')] = horse_df.iloc[j]['field_type']
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'field_condition')] = horse_df.iloc[j]['field_condition']
    train_df.loc[i, '{}_before_{}'.format(past_race_num[past_num], 'weather')] = horse_df.iloc[j]['weather']
    past_num += 1
    
    if past_num >= 5:
      break

100%|██████████| 62236/62236 [2:26:39<00:00,  7.07it/s]
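
The per-row `query` above rescans the frame for every horse. A sketch of a faster alternative uses `groupby(...).shift(k)`, which looks `k` races back within each horse. It assumes rows are sorted chronologically on parsed dates and that a horse is identified by its name. The sample data and the two-element `past_race_num` are made up, and missing history comes out as NaN instead of 0:

```python
import pandas as pd

# Made-up results for two hypothetical horses, in chronological order.
sample = pd.DataFrame({
    'name': ['A', 'B', 'A', 'A', 'B'],
    'race_date': pd.to_datetime(['2019-01-01', '2019-01-01', '2019-02-01',
                                 '2019-03-01', '2019-03-01']),
    'rank': [1, 2, 3, 4, 5],
})

past_race_num = ['one', 'two']  # the notebook goes up to 'five'
for k, n in enumerate(past_race_num, start=1):
    # shift(k) within each horse returns the value from k races earlier.
    sample[f'{n}_before_rank'] = sample.groupby('name')['rank'].shift(k)
    prev_date = sample.groupby('name')['race_date'].shift(k)
    sample[f'{n}_before_elapsed_day'] = (sample['race_date'] - prev_date).dt.days
```

Calling `fillna(0)` on the new columns would reproduce the notebook's 0 placeholder.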


In [None]:
train_df.to_csv(DATA_PATH + 'train_nn.csv', index=False)

In [None]:
train_df.tail()

Unnamed: 0,race_id,horse_number,frame_number,age,gen,weight,weight_diff,burden_weight,place,race_horse_number,distance,clockwise,field_type,field_condition,weather,time_hour,season,one_before_odd,one_before_popular,one_before_rank,one_before_time,one_before_elapsed_day,one_before_distance,one_before_field_type,one_before_field_condition,one_before_weather,two_before_odd,two_before_popular,two_before_rank,two_before_time,two_before_elapsed_day,two_before_distance,two_before_field_type,two_before_field_condition,two_before_weather,three_before_odd,three_before_popular,three_before_rank,three_before_time,three_before_elapsed_day,three_before_distance,three_before_field_type,three_before_field_condition,three_before_weather,four_before_odd,four_before_popular,four_before_rank,four_before_time,four_before_elapsed_day,four_before_distance,four_before_field_type,four_before_field_condition,four_before_weather,five_before_odd,five_before_popular,five_before_rank,five_before_time,five_before_elapsed_day,five_before_distance,five_before_field_type,five_before_field_condition,five_before_weather
253631,202048060911,10,8.0,5,牝,435,-3,54.0,名古屋,10,1600,右,ダ,良,晴,17,summer,27.8,6.0,1,1:30.0,11,1400,ダ,良,晴,187.2,14.0,13,1:46.4,61,1600,ダ,良,曇,122.6,13.0,13,1:15.8,40,1200,ダ,稍,晴,279.6,13.0,14,1:47.2,83,1600,ダ,稍,晴,204.5,11.0,12,1:16.2,123,1200,ダ,良,晴
253632,202048060911,1,1.0,6,牝,509,0,51.0,名古屋,10,1600,右,ダ,良,晴,17,summer,4.0,3.0,1,1:42.9,25,1600,ダ,良,曇,17.0,5.0,6,1:43.8,321,1600,ダ,良,晴,5.7,2.0,9,1:32.5,348,1400,ダ,重,雨,1.8,1.0,1,1:29.4,364,1400,ダ,重,曇,5.5,3.0,1,1:45.0,404,1600,ダ,重,晴
253633,202048060911,4,4.0,5,牡,485,-5,56.0,名古屋,10,1600,右,ダ,良,晴,17,summer,1.8,1.0,1,1:29.9,11,1400,ダ,良,晴,49.9,14.0,16,1:15.3,122,1200,ダ,良,晴,22.4,8.0,10,1:55.4,374,1800,ダ,良,晴,14.6,7.0,5,1:53.2,394,1800,ダ,良,晴,6.8,4.0,8,1:53.7,416,1800,ダ,良,晴
253634,202048060911,9,8.0,4,牝,403,-2,55.0,名古屋,10,1600,右,ダ,良,晴,17,summer,3.4,2.0,1,1:28.1,34,1400,ダ,良,曇,5.8,3.0,1,1:57.3,14,1800,ダ,良,曇,2.8,2.0,3,1:30.5,46,1400,ダ,良,晴,1.8,1.0,3,1:31.0,60,1400,ダ,良,晴,2.5,1.0,4,1:30.8,88,1400,ダ,稍,曇
253635,202048060911,8,7.0,5,牝,424,-5,50.0,名古屋,10,1600,右,ダ,良,晴,17,summer,2.4,1.0,8,1:46.3,35,1600,ダ,良,晴,4.9,3.0,1,1:44.7,12,1600,ダ,良,晴,5.2,4.0,5,1:45.7,26,1600,ダ,良,晴,1.6,1.0,1,1:43.7,307,1600,ダ,良,晴,2.1,1.0,1,1:45.2,294,1600,ダ,良,曇


# Time Conversion

Convert the past-race times to seconds.

In [None]:
for i in tqdm(range(len(train_df))):
  for n in past_race_num:
    col = '{}_before_time'.format(n)
    value = train_df.loc[i, col]
    # 0 (or '0' after a CSV round trip) marks a missing past race.
    if value != '0' and value != 0:
      dt = datetime.datetime.strptime(value, '%M:%S.%f')
      # %f yields microseconds, so pass it as microseconds rather than milliseconds.
      train_df.loc[i, col] = datetime.timedelta(minutes=dt.minute, seconds=dt.second, microseconds=dt.microsecond).total_seconds()

100%|██████████| 253636/253636 [2:10:14<00:00, 32.46it/s]


In [None]:
train_df.tail()

Unnamed: 0,race_id,horse_number,frame_number,age,gen,weight,weight_diff,burden_weight,place,race_horse_number,distance,clockwise,field_type,field_condition,weather,time_hour,season,one_before_odd,one_before_popular,one_before_rank,one_before_time,one_before_elapsed_day,one_before_distance,one_before_field_type,one_before_field_condition,one_before_weather,two_before_odd,two_before_popular,two_before_rank,two_before_time,two_before_elapsed_day,two_before_distance,two_before_field_type,two_before_field_condition,two_before_weather,three_before_odd,three_before_popular,three_before_rank,three_before_time,three_before_elapsed_day,three_before_distance,three_before_field_type,three_before_field_condition,three_before_weather,four_before_odd,four_before_popular,four_before_rank,four_before_time,four_before_elapsed_day,four_before_distance,four_before_field_type,four_before_field_condition,four_before_weather,five_before_odd,five_before_popular,five_before_rank,five_before_time,five_before_elapsed_day,five_before_distance,five_before_field_type,five_before_field_condition,five_before_weather
253631,202048060911,10,8.0,5,牝,435,-3,54.0,名古屋,10,1600,右,ダ,良,晴,17,summer,27.8,6.0,1,90,11,1400,ダ,良,晴,187.2,14.0,13,106.4,61,1600,ダ,良,曇,122.6,13.0,13,75.8,40,1200,ダ,稍,晴,279.6,13.0,14,107.2,83,1600,ダ,稍,晴,204.5,11.0,12,76.2,123,1200,ダ,良,晴
253632,202048060911,1,1.0,6,牝,509,0,51.0,名古屋,10,1600,右,ダ,良,晴,17,summer,4.0,3.0,1,102.9,25,1600,ダ,良,曇,17.0,5.0,6,103.8,321,1600,ダ,良,晴,5.7,2.0,9,92.5,348,1400,ダ,重,雨,1.8,1.0,1,89.4,364,1400,ダ,重,曇,5.5,3.0,1,105,404,1600,ダ,重,晴
253633,202048060911,4,4.0,5,牡,485,-5,56.0,名古屋,10,1600,右,ダ,良,晴,17,summer,1.8,1.0,1,89.9,11,1400,ダ,良,晴,49.9,14.0,16,75.3,122,1200,ダ,良,晴,22.4,8.0,10,115.4,374,1800,ダ,良,晴,14.6,7.0,5,113.2,394,1800,ダ,良,晴,6.8,4.0,8,113.7,416,1800,ダ,良,晴
253634,202048060911,9,8.0,4,牝,403,-2,55.0,名古屋,10,1600,右,ダ,良,晴,17,summer,3.4,2.0,1,88.1,34,1400,ダ,良,曇,5.8,3.0,1,117.3,14,1800,ダ,良,曇,2.8,2.0,3,90.5,46,1400,ダ,良,晴,1.8,1.0,3,91,60,1400,ダ,良,晴,2.5,1.0,4,90.8,88,1400,ダ,稍,曇
253635,202048060911,8,7.0,5,牝,424,-5,50.0,名古屋,10,1600,右,ダ,良,晴,17,summer,2.4,1.0,8,106.3,35,1600,ダ,良,晴,4.9,3.0,1,104.7,12,1600,ダ,良,晴,5.2,4.0,5,105.7,26,1600,ダ,良,晴,1.6,1.0,1,103.7,307,1600,ダ,良,晴,2.1,1.0,1,105.2,294,1600,ダ,良,曇
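
A time string of the form `M:SS.f` can also be converted without `strptime` by splitting on the colon. A minimal sketch on made-up values, assuming every entry is a time string (the notebook's 0 placeholders would need to be masked out first):

```python
import pandas as pd

# Made-up times in the same "M:SS.f" format as the *_before_time columns.
sample = pd.Series(['1:32.5', '1:46.4', '0:59.9'])

# Minutes are the part before the colon, seconds (with fraction) the part after.
parts = sample.str.split(':', expand=True)
seconds = parts[0].astype(int) * 60 + parts[1].astype(float)
```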


## Padding Each Race to 18 Rows

So that the neural network can process a whole race at once, give every race the same number of rows.

In [4]:
train_df = pd.read_csv(DATA_PATH + 'train_nn.csv')



Add a date feature and use it for sorting.

1. For each race ID, add as many rows as are missing
2. Sort by race date, race ID, and horse number

In [None]:
train_df['date'] = df['race_date']

In [None]:
train_df['race_horse_number'].value_counts()

10    48690
12    43968
9     33840
11    31911
16    30592
8     18824
14    16366
15    12360
13    10465
18     5022
17     1598
Name: race_horse_number, dtype: int64

In [None]:
race_id_li = list(train_df['race_id'].values)
date_li = list(train_df['date'].values)

In [None]:
len(train_df['race_id'].unique())

22630

In [None]:
for race_id in tqdm(train_df['race_id'].unique()):
  race_horse_num = int(train_df[train_df['race_id']==race_id]['race_horse_number'].unique()[0])
  date = train_df[train_df['race_id']==race_id]['date'].unique()[0]
  race_id_li.extend([race_id]*(18-race_horse_num))
  date_li.extend([date]*(18-race_horse_num))

100%|██████████| 22630/22630 [00:45<00:00, 495.38it/s]


In [None]:
len(race_id_li)

407340

In [None]:
n_rows = len(race_id_li)  # 18 rows per race after padding
train_df_ex = pd.DataFrame()
for c in train_df.columns:
  li = list(train_df[c].values)
  if len(li) < n_rows:
    # Pad every feature column with NaN for the added rows.
    li.extend([np.nan]*(n_rows-len(li)))
  train_df_ex[c] = li

train_df_ex['race_id'] = race_id_li
train_df_ex['date'] = date_li

In [None]:
train_df_ex = train_df_ex.sort_values(by=['date', 'race_id', 'horse_number'])
train_df_ex.reset_index(inplace=True, drop=True)
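
The padding and sorting steps can be sketched more compactly with `groupby` and `pd.concat`. The sample frame and the `MAX_RUNNERS` constant are illustrative assumptions, and the dates are parsed so that the sort is chronological:

```python
import pandas as pd

MAX_RUNNERS = 18  # rows per race after padding

# Made-up races with 2 and 3 runners.
sample = pd.DataFrame({
    'race_id': [1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2019-01-01', '2019-01-01',
                            '2019-01-02', '2019-01-02', '2019-01-02']),
    'horse_number': [1, 2, 1, 2, 3],
})

# One row per race with its date and runner count.
per_race = sample.groupby('race_id').agg(
    date=('date', 'first'), n=('horse_number', 'count')).reset_index()

# Repeat each race's (race_id, date) once for every missing runner.
missing = MAX_RUNNERS - per_race['n']
padding = per_race.loc[per_race.index.repeat(missing), ['race_id', 'date']]

# Padding rows have NaN features, so they sort after the real horses of a race.
padded = pd.concat([sample, padding], ignore_index=True)
padded = padded.sort_values(['date', 'race_id', 'horse_number']).reset_index(drop=True)
```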

In [None]:
train_df_ex.head(50)

Unnamed: 0,race_id,horse_number,frame_number,age,gen,weight,weight_diff,burden_weight,place,race_horse_number,distance,clockwise,field_type,field_condition,weather,time_hour,season,one_before_odd,one_before_popular,one_before_rank,one_before_time,one_before_elapsed_day,one_before_distance,one_before_field_type,one_before_field_condition,one_before_weather,two_before_odd,two_before_popular,two_before_rank,two_before_time,two_before_elapsed_day,two_before_distance,two_before_field_type,two_before_field_condition,two_before_weather,three_before_odd,three_before_popular,three_before_rank,three_before_time,three_before_elapsed_day,three_before_distance,three_before_field_type,three_before_field_condition,three_before_weather,four_before_odd,four_before_popular,four_before_rank,four_before_time,four_before_elapsed_day,four_before_distance,four_before_field_type,four_before_field_condition,four_before_weather,five_before_odd,five_before_popular,five_before_rank,five_before_time,five_before_elapsed_day,five_before_distance,five_before_field_type,five_before_field_condition,five_before_weather,show,date
0,201945010102,1.0,1.0,7.0,牝,448.0,0.0,54.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019/1/1
1,201945010102,2.0,2.0,7.0,牡,464.0,7.0,56.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2019/1/1
2,201945010102,3.0,3.0,7.0,牝,464.0,4.0,54.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2019/1/1
3,201945010102,4.0,4.0,6.0,牡,449.0,7.0,55.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019/1/1
4,201945010102,5.0,5.0,7.0,牡,502.0,1.0,56.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2019/1/1
5,201945010102,6.0,6.0,6.0,牡,445.0,-1.0,56.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019/1/1
6,201945010102,7.0,7.0,4.0,牡,481.0,-14.0,56.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019/1/1
7,201945010102,8.0,7.0,7.0,牝,399.0,3.0,54.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019/1/1
8,201945010102,9.0,8.0,8.0,牝,392.0,-2.0,51.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019/1/1
9,201945010102,10.0,8.0,4.0,牝,452.0,32.0,54.0,川崎,10.0,1400.0,左,ダ,良,晴,11.0,winter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019/1/1


In [None]:
train_df_ex.to_csv(DATA_PATH + 'train_nn.csv', index=False)

## Adding the Ground-Truth Labels

In [None]:
train_df['show'] = df['show']

In [18]:
train_df['rank'] = 0

In [29]:
for i in tqdm(range(len(train_df))):
  rank = df.loc[(df['race_id']==train_df.loc[i, 'race_id']) & (df['horse_number']==train_df.loc[i, 'horse_number']), 'rank'].values
  # Padding rows have no match in df; their rank stays 0.
  if len(rank) > 0:
    train_df.loc[i, 'rank'] = rank[0]

100%|██████████| 407340/407340 [27:31<00:00, 246.59it/s]
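
The per-row lookup above can likely be replaced by a single left join on `(race_id, horse_number)`. A minimal sketch, assuming that pair is unique in the raw data; `raw` and `features` are made-up stand-ins for `df` and `train_df`:

```python
import pandas as pd

# Made-up stand-ins for df (raw results) and train_df (padded features).
raw = pd.DataFrame({
    'race_id': [1, 1, 2],
    'horse_number': [1.0, 2.0, 1.0],
    'rank': [2, 1, 1],
})
features = pd.DataFrame({
    'race_id': [1, 1, 1, 2],
    'horse_number': [1.0, 2.0, None, 1.0],  # None marks a padding row
})

# Left join keeps every feature row; padding rows find no match and get rank 0.
labeled = features.merge(raw, on=['race_id', 'horse_number'], how='left')
labeled['rank'] = labeled['rank'].fillna(0).astype(int)
```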


In [31]:
train_df.to_csv(DATA_PATH + 'train_gb.csv', index=False)