How much does experience count for in the world of horse racing?

[Benter][1] found that "number of past races" was one of the more significant factors in his handicapping model, and contributed greatly to the overall accuracy of his model's predictions.

This quick-and-dirty study explores that claim using data from horse racing in Hong Kong, where Benter ran his extremely successful betting operation.

  [1]: https://www.scribd.com/doc/166556276/Benter

## First, load the data

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set some pandas display options (full option names avoid ambiguous-pattern errors in recent pandas)
pd.set_option('display.max_columns', 30)
pd.set_option('display.max_rows', 20)

# Read the horse run data
df_runs = pd.read_csv('../input/runs.csv')
df_runs.head()

In [None]:
# We'll also need to get the race data
df_races = pd.read_csv('../input/races.csv', parse_dates=['date']).set_index('race_id')
df_races.head()

## Search for previous runs of each horse

In [None]:
# Group horse runs by horse
df_horse_runs = df_runs.groupby('horse_id')

In [None]:
# Find the number of previous runs for a horse
def number_of_previous(horse_id, race_date):
    this_horse_runs = df_horse_runs.get_group(horse_id)
    return len(this_horse_runs[this_horse_runs['date'] < race_date])

In [None]:
# Look up each run's race date by mapping race_id onto the races table
df_runs['date'] = df_runs['race_id'].map(df_races['date'])
df_runs['no_previous'] = df_runs.apply(lambda run: number_of_previous(run['horse_id'], run['date']), axis=1)
df_runs[['race_id', 'date', 'horse_no', 'horse_id', 'no_previous']].iloc[1000:1005]
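
The row-wise `apply` above rescans a horse's full history for every run, which can be slow on large tables. A possible vectorized alternative is sketched below on a small made-up table. The toy data and the helper name `count_previous_runs` are illustrative assumptions, not part of the dataset. It ranks each run's date within its horse: with `method='min'`, the rank minus one equals the number of that horse's runs on strictly earlier dates, matching the `<` comparison used in `number_of_previous`.

```python
import pandas as pd

def count_previous_runs(runs_df):
    """Return, per row, how many runs the same horse had on strictly earlier dates."""
    # rank(method='min') is 1 + the number of strictly earlier dates within the horse's group
    ranks = runs_df.groupby('horse_id')['date'].rank(method='min')
    return (ranks - 1).astype(int)

# Hypothetical toy data standing in for df_runs
toy_runs = pd.DataFrame({
    'horse_id': [1, 2, 1, 1, 2],
    'date': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-02-01',
                            '2020-03-01', '2020-02-15']),
})
toy_runs['no_previous'] = count_previous_runs(toy_runs)
print(toy_runs)
```

On the real data this would presumably be called as `df_runs['no_previous'] = count_previous_runs(df_runs)` once the `date` column exists. It assumes every run has a valid date.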

## Analysis

Now that we know the number of previous runs of each horse, let's see how this is related to the number of wins.

In [None]:
runs_vs_wins = df_runs.groupby('no_previous')['won'].sum()
runs_vs_wins.plot()

At first sight, the above would seem to validate our hypothesis, albeit in a negative way. Experience does indeed count... **against** horses, it seems!

The problem with this is that we have only looked at the total number of wins in each experience group. We also need to consider the number of runs in each group, and it turns out that far fewer runs are made by experienced horses than by inexperienced ones.

In [None]:
experience = df_runs.groupby('no_previous').size()
experience.plot()

Looking at the strike rate (wins divided by runs) in each experience group produces a more accurate picture.

In [None]:
# Mean of the 0/1 'won' flag is the strike rate at each experience level
strike_rate_vs_runs = df_runs.groupby('no_previous')['won'].mean()
strike_rate_vs_runs.plot()

From the above, we can now conclude the following:

 - Strike rate only increases until the horse has had approximately 10 races, then gradually tapers off.

 - Few horses remain in service after 40 races. Those that do are most likely the cream of the crop, which would explain the outlier spikes of much higher strike rates above this level of experience.
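
One way to judge whether the spikes among very experienced horses are real or partly noise from small groups is to attach a sample size and a rough standard error to each strike rate. The sketch below does this on a made-up table. The toy data and the name `strike_rate_table` are illustrative assumptions. It uses the usual binomial approximation `sqrt(p * (1 - p) / n)`, which is only a rough guide when `n` is small.

```python
import numpy as np
import pandas as pd

def strike_rate_table(runs_df):
    """Strike rate, group size and approximate standard error per experience level."""
    table = runs_df.groupby('no_previous')['won'].agg(runs='size', strike_rate='mean')
    # Binomial standard error of a proportion; it grows as the group gets smaller
    table['std_err'] = np.sqrt(
        table['strike_rate'] * (1 - table['strike_rate']) / table['runs']
    )
    return table

# Hypothetical toy data: many runs at low experience, very few at high experience
toy = pd.DataFrame({
    'no_previous': [0, 0, 0, 0, 1, 1, 1, 1, 50, 50],
    'won':         [0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
})
table = strike_rate_table(toy)
print(table)
```

Applied to the real data, this would be called as `strike_rate_table(df_runs)`. Experience levels with a large `std_err` should be read with caution.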

## Conclusion

For horses, there is no simple linear relationship between experience and success. While some will enjoy great success well into the twilight of their careers, most will likely fade away and retire early.