# first attempt

hypothesis: a player's last 16 games should correlate with his future performance.

possible question to ask: given a probability distribution built from a player's last 16 games, his next game falls into one of that distribution's buckets. what probability did the distribution assign to that bucket?

what is this attempting to solve, though? if players land in higher-probability buckets more often than in lower-probability ones, that would support my hypothesis.
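A minimal sketch of that question, assuming a plain empirical frequency stands in for whatever distribution is actually fitted (the notebook's own `lombardi.stats` helpers may smooth the data differently). The yardage numbers and both function names are made up for illustration.

```python
import numpy as np

def bucket_limits(yards, bucket_size):
    """Return the [low, high) bucket that contains `yards`."""
    low = yards - yards % bucket_size
    return low, low + bucket_size

def empirical_bucket_probability(sample, low, high):
    """Fraction of games in `sample` with low <= value < high."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean((sample >= low) & (sample < high)))

# Hypothetical rushing yards for the 16 prior games (not real data).
last_16 = [45, 112, 80, 130, 95, 60, 101, 75, 149, 33, 88, 120, 71, 54, 107, 92]
next_game = 115  # hypothetical result of the following game

low, high = bucket_limits(next_game, 50)
p = empirical_bucket_probability(last_16, low, high)
print(low, high, p)  # 100 150 0.375
```

Here 6 of the 16 prior games landed in the 100-150 bucket, so the sample gave the realized outcome a probability of 0.375.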

it's actually kind of difficult to even frame the question, since we're not predicting anything. what if we compared cumulative versus 16-game performance and tried to determine which is the better indicator of future performance?

## comparing two models?

event: DeMarco Murray rushing for 100-150 yards

- model A predicted a 25% chance
- model B predicted a 20% chance

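if Murray did rush for 100-150 yards, model A was the better predictor in this case, since it gave the outcome a higher probability. one event alone can't show that it's generally the better model.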

For each game, does model A or model B predict the values better?
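One way to aggregate the per-game comparison is the mean log score: average the log of the probability each model gave to the bucket that actually occurred. This is a sketch under assumptions, not the method used in this notebook; the probabilities are invented and `mean_log_score` is a hypothetical helper.

```python
import math

def mean_log_score(probabilities, floor=1e-6):
    """Average log probability assigned to the outcomes that occurred.

    `floor` keeps a zero probability from producing log(0).
    """
    return sum(math.log(max(p, floor)) for p in probabilities) / len(probabilities)

# Hypothetical probabilities each model gave to the realized bucket, per game.
model_a = [0.25, 0.10, 0.30, 0.05]
model_b = [0.20, 0.15, 0.20, 0.10]

# Simple head-to-head count: who gave the realized bucket more probability?
wins_a = sum(a > b for a, b in zip(model_a, model_b))
wins_b = sum(b > a for a, b in zip(model_a, model_b))

score_a = mean_log_score(model_a)
score_b = mean_log_score(model_b)
better = 'A' if score_a > score_b else 'B'
print(wins_a, wins_b, better)  # 2 2 B
```

In this made-up example the head-to-head count is tied at 2-2, but the log score prefers model B because model A was badly wrong in one game. Counting wins and scoring probabilities can disagree, so it is worth deciding which one the comparison should use.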

Then, can we run this for a range of window sizes (the previous *x* games) to figure out how many games we should be looking at?
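A rough sketch of such a sweep, assuming synthetic data and a plain empirical frequency in place of the fitted distribution. The score used here (mean probability given to the realized bucket) is a crude choice made for illustration; a log score would be a reasonable alternative. All names are hypothetical.

```python
import numpy as np

def bucket_probability(sample, low, high):
    """Fraction of games in `sample` with low <= value < high."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean((sample >= low) & (sample < high)))

def mean_realized_probability(performance, window, bucket_size, start):
    """Average probability a trailing `window`-game sample gave to the
    bucket each game actually landed in, scored from game `start` onward."""
    probs = []
    for week in range(start, len(performance)):
        yards = performance[week]
        low = yards - yards % bucket_size
        sample = performance[week - window:week]
        probs.append(bucket_probability(sample, low, low + bucket_size))
    return float(np.mean(probs))

# Synthetic per-game yardage, only to make the sketch runnable.
rng = np.random.RandomState(0)
performance = list(rng.randint(0, 150, size=60))

windows = [4, 8, 16]
start = max(windows)  # score every window on the same set of games
scores = {w: mean_realized_probability(performance, w, 25, start) for w in windows}
best = max(scores, key=scores.get)
print(scores, best)
```

Starting every window at the same game keeps the comparison fair: a shorter window would otherwise be scored on early games that the longer windows never see.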

Then, how fine-grained can we make the buckets?

Then, are there certain weights we can apply to past performance that will improve the model?
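One candidate weighting is exponential decay, where each older game counts a fixed fraction less than the one after it. The sketch below assumes that scheme and a hypothetical `decay` parameter; nothing in this notebook says which weights, if any, actually help.

```python
import numpy as np

def weighted_bucket_probability(sample, low, high, decay=0.9):
    """Bucket probability with exponentially decaying weights.

    `sample` is ordered oldest to newest. The newest game has weight 1,
    the one before it `decay`, then `decay**2`, and so on.
    """
    sample = np.asarray(sample, dtype=float)
    ages = np.arange(len(sample))[::-1]  # 0 for the most recent game
    weights = decay ** ages
    in_bucket = (sample >= low) & (sample < high)
    return float(np.sum(weights * in_bucket) / np.sum(weights))

# Same four hypothetical games, with the 120-yard game last vs. first.
recent_big = weighted_bucket_probability([40, 40, 40, 120], 100, 150, decay=0.5)
old_big = weighted_bucket_probability([120, 40, 40, 40], 100, 150, decay=0.5)
print(recent_big, old_big)
```

With `decay=1.0` every game is weighted equally and the result reduces to the plain empirical frequency, which gives a convenient baseline to compare against.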

Then, bake in other parameters like expected value and variance?

In [71]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import lombardi

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [72]:
player_data = lombardi.data.demo_players()

In [None]:
start_idx = 16  # games required before the first evaluation; also the running-window length
bucket_sizes = {
    'qb': [25, 50],
    'rb': [10, 25, 50],
    'wr': [10, 25, 50]
}

results = []
for position, data in player_data.items():

    metric = data['metric']
    # evaluation grid for the CDF; kept under its own name so the inner loop never rebinds it
    grid = np.linspace(0, lombardi.data.metric_range(metric), 1000)
    for player in data['players']:

        player_df = lombardi.data.player_df(player, metric)
        performance = list(player_df[metric])

        for bucket_size in bucket_sizes[position]:
        #for bucket_size in [50]:
            print('{} {} {}'.format(position, bucket_size, player))

            for week in range(start_idx, len(performance)):

                try:
                    # bucket [y1, y2) that this week's actual value fell into
                    yards = performance[week]
                    y1 = yards - yards % bucket_size
                    y2 = y1 + bucket_size

                    # two candidate samples: every game so far vs. the last start_idx games
                    samples = {
                        'cumulative': performance[0:week],
                        'running': performance[week-start_idx:week]
                    }

                    for sample_type, sample_data in samples.items():

                        # probability this sample assigned to the realized bucket
                        x, cdf = lombardi.stats.cdf(grid, sample_data)
                        p_yards = lombardi.stats.probability_bucket(x, cdf, y1, y2)

                        # save results
                        results.append({
                            'week': week,
                            'sample': sample_type,
                            'prob_yds': p_yards,
                            'player': player,
                            'metric': metric,
                            'bucket_size': bucket_size,
                            'position': position,
                            'yards': yards,
                        })
                except Exception as e:
                    # log the failure and move on to the next week
                    print(e)

qb 25 Tom Brady
qb 50 Tom Brady
qb 25 Cam Newton
qb 50 Cam Newton
qb 25 Aaron Rodgers
qb 50 Aaron Rodgers
qb 25 Russell Wilson
qb 50 Russell Wilson
qb 25 Drew Brees
qb

In [78]:
df = pd.DataFrame(results)

In [80]:
df.to_sql('bucket_results', lombardi.data.write_conn, index=False, if_exists='append')
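Once the results are in a DataFrame, the cumulative-versus-running question could be summarized with a groupby. The sketch below uses a tiny made-up frame with the same column names as `results`, so the numbers say nothing about the real data.

```python
import pandas as pd

# Made-up rows using the same columns as the results above.
demo = pd.DataFrame([
    {'position': 'rb', 'bucket_size': 25, 'sample': 'cumulative', 'prob_yds': 0.10},
    {'position': 'rb', 'bucket_size': 25, 'sample': 'running', 'prob_yds': 0.25},
    {'position': 'rb', 'bucket_size': 25, 'sample': 'cumulative', 'prob_yds': 0.30},
    {'position': 'rb', 'bucket_size': 25, 'sample': 'running', 'prob_yds': 0.35},
])

# Mean probability given to the realized bucket, one column per sample type.
summary = (
    demo.groupby(['position', 'bucket_size', 'sample'])['prob_yds']
    .mean()
    .unstack('sample')
)
summary['running_minus_cumulative'] = summary['running'] - summary['cumulative']
print(summary)
```

A positive `running_minus_cumulative` would mean the running sample gave more probability, on average, to the buckets that actually occurred for that position and bucket size.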