# Recap and Goal Setting

In the last notebook, we found that our method works best when we first filter out bad players using the model that is best at identifying low-scoring players (Gradient Boosting), and then choose players with a different model.

After filtering, several models performed well at choosing high-scoring players. We want to see if we can improve those models even further, so that the lineup builder takes less time and the lineups we end up with are much more likely to be high-scoring.

Currently, out of 100 lineups built, only about 10-20% (depending on the model used) are actually "in the money" (ITM), meaning they would see a return on the investment. In the long run, this is probably enough to break even, and possibly even to be profitable. But a better percentage would shorten the time to a positive ROI, increase the ROI, or both.
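
The ITM percentage can be measured with a few lines. This is a minimal sketch, not the notebook's own evaluation code: `cash_line` is a hypothetical payout threshold (it varies by contest), the example totals are made up, and `lineup_total` assumes lineups are dicts of player dicts with an `act_pts` key, as built by the `Lineup` class in this notebook.

```python
def lineup_total(lineup):
    """Sum the actual points of every player in one lineup dict."""
    return sum(player['act_pts'] for player in lineup.values())

def itm_rate(lineup_scores, cash_line):
    """Fraction of lineups whose total score meets or beats the cash line."""
    if not lineup_scores:
        return 0.0
    cashed = sum(1 for score in lineup_scores if score >= cash_line)
    return cashed / len(lineup_scores)

# Made-up totals for five lineups and a hypothetical cash line of 150 points
example_scores = [162.4, 131.0, 149.9, 150.0, 118.7]
example_rate = itm_rate(example_scores, cash_line=150)
print(f"ITM rate: {example_rate:.0%}")
```

With real lineups, the scores would come from something like `[lineup_total(l) for l in lineup.top_lineups]`.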

The other shortcoming right now: it takes about 70 minutes to generate these lineups. If a player is ruled out at the last minute, it wouldn't be feasible to re-run the algorithm in its current state with that player dropped, so speeding it up is essential. I could probably just substitute a different predicted player in the position, but that subjects me to the emotions and other dubious pitfalls of picking players by hand.

So for this round, we have 2 goals:

1. Reduce the time to build optimal lineups
2. Increase the percentage of ITM lineups

The plan is to use grid search to tune the models so that we filter players better and, after filtering, choose players better. Together, those should 1) make the pool smaller (which speeds up lineup building) and 2) make the pool more representative of high-scoring players (more ITM lineups).
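
The filter-then-pick idea can be sketched on synthetic data. Everything here is an illustrative assumption rather than the notebook's pipeline: the two features, the `cutoff` of 10 points, and the choice of AdaBoost as the second model are stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the player table: two made-up features and a score
n = 300
features = pd.DataFrame({
    "salary": rng.uniform(3000, 9000, n),
    "avg_points": rng.uniform(0, 25, n),
})
score = 0.8 * features["avg_points"] + 0.001 * features["salary"] + rng.normal(0, 3, n)

train, pool = features.iloc[:200], features.iloc[200:]
y_train = score.iloc[:200]

# Stage 1: the filtering model drops players it predicts below a cutoff
filter_model = GradientBoostingRegressor(random_state=0).fit(train, y_train)
cutoff = 10  # hypothetical threshold, not a value from the notebook
keep = filter_model.predict(pool) >= cutoff
filtered_pool = pool[keep]

# Stage 2: a different model ranks only the surviving players
picker = AdaBoostRegressor(random_state=0).fit(train, y_train)
ranked = (filtered_pool
          .assign(pred=picker.predict(filtered_pool))
          .sort_values("pred", ascending=False))

print(f"Pool size before filtering: {len(pool)}, after: {len(filtered_pool)}")
```

A smaller `filtered_pool` means fewer candidates for the lineup builder to shuffle through, which is where the speed-up would come from.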

## Import Libraries

In [1]:
from collections import defaultdict
from datetime import datetime
import random
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = None # to remove some warnings
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler, OrdinalEncoder 
from sklearn.svm import SVR
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")

from xgboost import XGBRegressor

## Copied class from last notebook

In [2]:
class Lineup:
    """ 
    takes the results of the model prediction (dataframe 
    with attached predictions) and builds out a few lineups 
    """
    def __init__(self, df, def_df, verbose=False):
        self.verbose = verbose
        self.df = df
        self.def_df = def_df[:15]
        self.current_salary = 100*1000
        self.no_duplicates = False
        self.top_lineups = []
        self.qbs = []
        self.rbs = []
        self.wrs = []
        self.tes = []
        self.flex = []
        self.defs = []
    
    def find_top_10(self, position):
        arr = []
        end_of_range = len(self.df.loc[self.df['Pos']==position])
        if position == 'Flex':
            position_df = self.df.loc[(self.df['Pos']=='RB')|(self.df['Pos']=='TE')|(self.df['Pos']=='WR')]
            end_of_range = (len(self.df.loc[self.df['Pos']=='RB'])+
                            len(self.df.loc[self.df['Pos']=='WR'])+
                            len(self.df.loc[self.df['Pos']=='TE']))
        elif position == 'Def':
            end_of_range = len(self.def_df)
            position_df = self.def_df
            position_df = position_df.sort_values(by='pred', ascending=False)
        else:
            position_df = self.df.loc[self.df['Pos']==position]
        
        # print(position_df)
        for row in range(0,end_of_range):
            player = {
                'name': position_df.iloc[row]['Name'],
                'h/a': position_df.iloc[row]['h/a'],
                'pos': position_df.iloc[row]['Pos'],
                'salary': position_df.iloc[row]['DK salary'],
                'pred_points': position_df.iloc[row]['pred'],
                'act_pts':position_df.iloc[row]['actual_score']
            }
            if len(arr) < end_of_range:
                arr.append(player)
            else: 
                break
        return arr
    
    def get_players(self):
        top_10_qbs = self.find_top_10(position='QB')
        top_10_rbs = self.find_top_10(position='RB')
        top_10_wrs = self.find_top_10(position='WR')
        top_10_tes = self.find_top_10(position='TE')
        top_10_flex = self.find_top_10(position='Flex')
        top_10_defs = self.find_top_10(position='Def')
        return top_10_qbs, top_10_rbs, top_10_wrs, top_10_tes, top_10_flex, top_10_defs
    
    def check_salary(self, lineup):
        current_salary = 0
        for keys in lineup.keys():
            current_salary += lineup[keys]['salary']
        return current_salary
    
    def reduce_salary(self, lineup):
        while self.current_salary > 50*1000:
            position_df = self.df
            greatest_salary = 0
            pos = 'none'
            pos_to_change = 'none'
            for key in lineup.keys():
                if lineup[key]['salary'] > greatest_salary:
                    greatest_salary = lineup[key]['salary']
                    pos = lineup[key]['pos'] # RB, TE, Def, etc.
                    pos_to_change = key # RB1 or WR2 or something like that
            if pos_to_change == 'Def':
                position_df = self.def_df  # use the instance's defenses, not a global
            elif pos_to_change == 'Flex':
                position_df = self.df.loc[(self.df['Pos']=='RB')|(self.df['Pos']=='TE')|(self.df['Pos']=='WR')]
            else:
                pass
    #             print(position_df)    
            new_player = (position_df.loc[(position_df.Pos == pos)&(position_df['DK salary'] < greatest_salary)]).sort_values(by='DK salary', ascending=False).head(1)
            player = {
                'name': new_player['Name'].values[0],
                'h/a': new_player['h/a'].values[0],
                'pos': new_player['Pos'].values[0],
                'salary': new_player['DK salary'].values[0],
                'pred_points': new_player['pred'].values[0],
                'act_pts':new_player['actual_score'].values[0]
            }
    #         print(player)    
            lineup[pos_to_change] = player
    #         print(lineup)
            self.current_salary = self.check_salary(lineup)
        return lineup
    
    def check_duplicates(self, lineup):
        rb1_name = lineup['RB1']['name']
        rb2_name = lineup['RB2']['name']
        flex_name = lineup['Flex']['name']
        wr1_name = lineup['WR1']['name']
        wr2_name = lineup['WR2']['name']
        wr3_name = lineup['WR3']['name']
        te_name = lineup['TE']['name']
        names = [flex_name, rb1_name, rb2_name, wr1_name, wr2_name, wr3_name, te_name]
        while len(names) > 1:
            if names[0] in names[1:]:  # compare against ALL remaining names, including the last
                return False
            else:
                names.pop(0)   
        return True
    
    def shuffle_players(self):
        lineup = {
            'QB': self.qbs[random.randrange(len(self.df.loc[self.df['Pos']=='QB']))],
            'RB1': self.rbs[random.randrange(len(self.df.loc[self.df['Pos']=='RB']))],
            'RB2': self.rbs[random.randrange(len(self.df.loc[self.df['Pos']=='RB']))],
            'WR1': self.wrs[random.randrange(len(self.df.loc[self.df['Pos']=='WR']))],
            'WR2': self.wrs[random.randrange(len(self.df.loc[self.df['Pos']=='WR']))],
            'WR3': self.wrs[random.randrange(len(self.df.loc[self.df['Pos']=='WR']))],
            'TE': self.tes[random.randrange(len(self.df.loc[self.df['Pos']=='TE']))],
            'Flex': self.flex[random.randrange(len(self.df.loc[self.df['Pos']=='RB'])+
                                               len(self.df.loc[self.df['Pos']=='WR'])+
                                               len(self.df.loc[self.df['Pos']=='TE']))],
            'Def': self.defs[random.randrange(len(self.def_df))]
        }
        return lineup
    
    def build_lineup(self):
        # in theory, because of the legwork done by the algorithm,
        # any lineup should be good as long as it abides by the
        # constraints of DraftKings' team structures. So for
        # now, this will just give us the lineups that fit within
        # the salary cap and team requirements
        
        self.current_salary = 100*1000
        self.no_duplicates = False
        self.qbs, self.rbs, self.wrs, self.tes, self.flex, self.defs = self.get_players()
        lineup = self.shuffle_players()
        
        while True:
            if self.verbose:
                print('======================')
                print(f"Salary: {self.current_salary}")
                print(f"No Duplicates: {self.no_duplicates}")
                print('======================')
            self.no_duplicates = self.check_duplicates(lineup)
            self.current_salary = self.check_salary(lineup)
            # fix duplicates first
            if self.no_duplicates == False:
                lineup = self.shuffle_players()
            # check salary, making sure it's between 45k and 50k
            if self.current_salary > 50*1000:
                try:
                    lineup = self.reduce_salary(lineup)
                except Exception:  # e.g. no cheaper replacement found; still allows Ctrl-C
                    lineup = self.shuffle_players()
            self.no_duplicates = self.check_duplicates(lineup)
            self.current_salary = self.check_salary(lineup)
            
            if (self.current_salary <= 50*1000 
#             and self.current_salary >= 45*1000 
            and self.no_duplicates):
                # if everything looks good, end the 
                # loop and append the lineup
                break
                
        
        self.top_lineups.append(lineup)
        if len(self.top_lineups) % 5 == 0:
            print(f"Added lineup. Total lineups: {len(self.top_lineups)}")
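
The duplicate check in `check_duplicates` can also be written as a set comparison. The sketch below is an alternative formulation, not the class's code: `has_no_duplicates` and the player names are hypothetical, and it assumes the same lineup dict layout (`lineup[slot]['name']`) used above.

```python
def has_no_duplicates(lineup, slots=("RB1", "RB2", "WR1", "WR2", "WR3", "TE", "Flex")):
    """Return True when no player name appears in more than one skill slot."""
    names = [lineup[slot]["name"] for slot in slots]
    # A set drops repeats, so the lengths differ only when a name is duplicated
    return len(set(names)) == len(names)

# Made-up lineup in which the Flex slot repeats the TE
dup_lineup = {
    "RB1": {"name": "Player A"}, "RB2": {"name": "Player B"},
    "WR1": {"name": "Player C"}, "WR2": {"name": "Player D"},
    "WR3": {"name": "Player E"}, "TE": {"name": "Player F"},
    "Flex": {"name": "Player F"},
}
ok_lineup = dict(dup_lineup, Flex={"name": "Player G"})

print(has_no_duplicates(dup_lineup), has_no_duplicates(ok_lineup))
```

Because every name is compared with every other name, a Flex/TE repeat is caught regardless of the order in which the slots are listed.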
    


## Get and process data

In [3]:
# Helper Functions

def get_weekly_data(week, year):
    """ get player data for designated week """
    file_path = f"./csv's/{year}/year-{year}-week-{week}-DK-player_data.csv"
    df = pd.read_csv(file_path)
    return df

def get_ytd_season_data(year, current_week):
    """ get data for current season up to most recent week """
    df = get_weekly_data(1,year)
    for week in range(2,current_week+1):
        try:
            df = pd.concat([df, get_weekly_data(week, year)], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
        except:
            print("No data for week: "+str(week))
    df = df.drop(['Unnamed: 0', 'Year'], axis=1)
    return df

def get_season_data(year, drop_year=True):
    """ get entire season of data """
    df = get_weekly_data(1,year)
    for week in range(2,17):
        try:
            df = pd.concat([df, get_weekly_data(week, year)], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
        except:
            print("No data for week: "+str(week))
    if drop_year:
        df = df.drop(['Unnamed: 0', 'Year'], axis=1)
    else:
        df = df.drop(['Unnamed: 0'], axis=1)
    return df

def scale_features(sc_salary, sc_points, sc_pts_ald, X_train, X_test, first_time=False):
    """ scales data for training """
    if first_time:
        X_train['DK salary'] = sc_salary.fit_transform(X_train['DK salary'].values.reshape(-1,1))
#         X_train['Oppt_pts_allowed_lw'] = sc_pts_ald.fit_transform(X_train['Oppt_pts_allowed_lw'].values.reshape(-1,1))
    X_test['DK salary'] = sc_salary.transform(X_test['DK salary'].values.reshape(-1,1))
#     X_test['Oppt_pts_allowed_lw'] = sc_pts_ald.transform(X_test['Oppt_pts_allowed_lw'].values.reshape(-1,1))
    return X_train, X_test

def unscale_features(sc_salary, sc_points, sc_pts_ald, X_train, X_test):
    """ used to change features back so that human readable information can be used to assess
    lineups and player information and performance"""
    X_train['DK salary'] = sc_salary.inverse_transform(X_train['DK salary'].values.reshape(-1,1))
#     X_train['Oppt_pts_allowed_lw'] = sc_pts_ald.inverse_transform(X_train['Oppt_pts_allowed_lw'].values.reshape(-1,1))
    X_test['DK salary'] = sc_salary.inverse_transform(X_test['DK salary'].values.reshape(-1,1))
#     X_test['avg_points'] = sc_points.inverse_transform(X_test['avg_points'].values.reshape(-1,1))
#     X_test['Oppt_pts_allowed_lw'] = sc_pts_ald.inverse_transform(X_test['Oppt_pts_allowed_lw'].values.reshape(-1,1))
    return X_train, X_test

def handle_nulls(df):
    # players that have nulls for any of the columns are 
    # extremely likely to be under performing or going into a bye.
    # the one caveat is that some are possibly coming off a bye.
    # to handle this later, probably will drop them, save those
    # as a variable, and then re-merge after getting rid of the other
    # null values.
    df = df.dropna()
    return df

def eval_model(df):
    df['score_ratio'] = round(df['actual_score'] / df['pred'],4)
    return df

def remove_outliers_btwn_ij(df, i=-1, j=5):
    s = df.loc[(df.score_ratio > i) & (df.score_ratio < j)]
    return s, i, j

def get_RMSE(y_true, y_pred):
    MSE = mean_squared_error(y_true, y_pred)
    RMSE = np.sqrt(MSE)
    return RMSE

def summarize_df(df, o_u_thresh=15):
    df = eval_model(df)
    RMSE = get_RMSE(df['actual_score'], df['pred'])
    print(f"Total entries analyzed: {len(df)}")
    s, i, j = remove_outliers_btwn_ij(df)
    print(f"Total entries after outliers removed: {len(s)}. Left boundary: {i}x Right Boundary: {j}x")
    correct_preds_over_thresh = s[(s.pred >= o_u_thresh)&(s.actual_score>=o_u_thresh)]
    correct_preds_under_thresh = s[(s.pred <= o_u_thresh)&(s.actual_score<=o_u_thresh)]
    incorrect_preds_under_thresh = s[(s.pred <= o_u_thresh)&(s.actual_score>=o_u_thresh)]
    incorrect_preds_over_thresh = s[(s.pred >= o_u_thresh)&(s.actual_score<=o_u_thresh)]
    print(f"Correct predictions of over {o_u_thresh} pts: {len(correct_preds_over_thresh)}. Percent: {round(len(correct_preds_over_thresh)/len(s)*100,2)}") # True Positive
    print(f"Correct predictions of under {o_u_thresh} pts: {len(correct_preds_under_thresh)}. Percent: {round(len(correct_preds_under_thresh)/len(s)*100,2)}") # True Negative
    print(f"Incorrect predictions of over {o_u_thresh} pts: {len(incorrect_preds_over_thresh)}. Percent: {round(len(incorrect_preds_over_thresh)/len(s)*100,2)}") # False Positive
    print(f"Incorrect predictions of under {o_u_thresh} pts: {len(incorrect_preds_under_thresh)}. Percent: {round(len(incorrect_preds_under_thresh)/len(s)*100,2)}") # False Negative
    print(f"RMSE: {RMSE}")
    print("Ignore following metrics for filtered DF:")
    print(f"Total percent correct over {o_u_thresh}: {round(len(correct_preds_over_thresh)/len(s)*100,2)-round(len(incorrect_preds_over_thresh)/len(s)*100,2)}")
    print(f"Total percent correct under {o_u_thresh}: {round(len(correct_preds_under_thresh)/len(s)*100,2)-round(len(incorrect_preds_under_thresh)/len(s)*100,2)}")
    

In [4]:
season = 2020
week = 6
next_week = week + 1
dataset = get_season_data(season)
df = handle_nulls(dataset)
def_df = df.loc[df.Pos == 'Def']
def_df['fantasy_points_allowed_lw'] = 0
df['Oppt_pts_allowed_lw'] = 0
def_teams = [x for x in def_df['Team'].unique()]

for wk in range(1,17):  # `wk` keeps the loop from overwriting `week` set above
    for team in def_teams:
        try:
            # DK points scored against `team` in week `wk`
            offense_df1 = df.loc[(df['Oppt']==team)&(df['Week']==wk)]
            sum_ = offense_df1['DK points'].sum()
            # attach that total to the following week's rows as a "last week" feature
            def_df.loc[(def_df['Team']==team)&(def_df['Week']==wk+1), 'fantasy_points_allowed_lw'] = sum_
            df.loc[(df['Oppt']==team)&(df['Week']==wk+1), 'Oppt_pts_allowed_lw'] = sum_
        except Exception:
            print("couldn't append data")
df = df[df.Week != 1] 
X = df.drop(labels='DK points', axis=1)
y = df['DK points']
X2 = pd.get_dummies(X)

## Grid searches

### Algorithms to Grid Search: Ridge, Gradient Boosting, AdaBoost, Random Forest, XGBoost, & Support Vector Regression (rbf and sigmoid kernels)

In [5]:
grid_search_names = ['Ridge', 'Gradient Boosting', 'AdaBoost', 'Random Forest', 'XGBoost', 'SVR']
grid_searches = []

In [6]:
%%time
gamma_range = [1e-7,1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1e0,1e1]
c_range = [1e-3,1e-2,1e-1,1e0,1e1,1e2,1e3,1e4,1e5]
svr_param_grid = {
    'kernel' : ('rbf', 'sigmoid'),
    'C' : c_range,
    'gamma' : gamma_range
}
svr = SVR()
svr_cv = GridSearchCV(svr,svr_param_grid,cv=3,n_jobs=3)
svr_cv.fit(X2,y)
grid_searches.append({'name': 'SVR', 'search': svr_cv})

Wall time: 1h 15min 39s
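
To see where the time goes, it helps to count how many models the search has to fit. The sketch below uses scikit-learn's `ParameterGrid` on the same SVR grid as above; `n_candidates` and `n_fits` are names introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

gamma_range = [1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1]
c_range = [1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4, 1e5]
svr_param_grid = {
    'kernel': ('rbf', 'sigmoid'),
    'C': c_range,
    'gamma': gamma_range,
}

# Every combination of kernel, C and gamma is one candidate
n_candidates = len(ParameterGrid(svr_param_grid))
cv_folds = 3
# Each candidate is fit once per fold (plus one final refit on all the data)
n_fits = n_candidates * cv_folds
print(f"{n_candidates} candidates x {cv_folds} folds = {n_fits} fits")
```

Trimming values that never win, or sampling the grid with `RandomizedSearchCV`, are two common ways to cut this count if the search needs to be re-run.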


In [7]:
%%time
ridge_param_grid = {
    'alpha': [1,0.1,0.01,0.001,0.0001,0] , 
    "fit_intercept": [True, False], 
    "solver": ['svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
}
ridge = Ridge()
ridge_cv = GridSearchCV(ridge,ridge_param_grid,cv=3,n_jobs=3)
ridge_cv.fit(X2,y)
grid_searches.append({'name': 'Ridge', 'search': ridge_cv})

TypeError: append() takes exactly one argument (0 given)

In [8]:
%%time
gb_param_grid = {
#     'n_estimators':[100,300,500], # default of 100 was consistenly the best for several tests
    'learning_rate': [0.1,0.05,0.02],
    'max_depth':[3,4,5,6], 
    'min_samples_leaf':[1,2,3], 
    'max_features':('auto', 'sqrt', 'log2')
}
gb = GradientBoostingRegressor()
gb_cv = GridSearchCV(gb,gb_param_grid,cv=3,n_jobs=3)
gb_cv.fit(X2,y)
grid_searches.append({'name': 'Gradient Boost', 'search': gb_cv})

Wall time: 5min 51s


In [9]:
%%time
ab_param_grid = {
    'n_estimators':[100,500,1000], 
    'learning_rate': [0.1,0.05,0.02],
    'loss': ('linear', 'square', 'exponential')
}
ab = AdaBoostRegressor()
ab_cv = GridSearchCV(ab,ab_param_grid,cv=3,n_jobs=3)
ab_cv.fit(X2,y)
grid_searches.append({'name': 'AdaBoost', 'search': ab_cv})

Wall time: 53min 39s


In [10]:
%%time
rf_param_grid = {
    'n_estimators':[100,300,500,1000], 
    'max_depth':[3,4,5,6], 
    'min_samples_leaf':[1,2,3],  
    'max_features':('auto', 'sqrt', 'log2')
}
rf = RandomForestRegressor()
rf_cv = GridSearchCV(rf,rf_param_grid,cv=3,n_jobs=3)
rf_cv.fit(X2,y)
grid_searches.append({'name': 'Random Forest', 'search': rf_cv})

Wall time: 23min 42s


In [11]:
%%time
xgb_param_grid = {
    'n_estimators':[100,500,1000], 
    'learning_rate': [0.1,0.05,0.02],
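    # NOTE: 'loss' was carried over from the AdaBoost grid and is not an XGBRegressor
    # parameter; XGBoost warns that it may be unused, so it only triples the number of fits
    'loss': ('linear', 'square', 'exponential')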
}
xgb = XGBRegressor()
xgb_cv = GridSearchCV(xgb,xgb_param_grid,cv=3,n_jobs=3)
xgb_cv.fit(X2,y)
grid_searches.append({'name': 'XGBoost', 'search': xgb_cv})

Parameters: { loss } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Wall time: 38min 53s


In [12]:
for s in grid_searches:
    print(f"Grid Search: {s['name']}")
    print("Best Score: " + str(s['search'].best_score_))
    print("Best Parameters: " + str(s['search'].best_params_))
    print("======================================================")

Grid Search: SVR
Best Score: 0.3667398920100726
Best Parameters: {'C': 100000.0, 'gamma': 1e-07, 'kernel': 'rbf'}
Grid Search: Ridge
Best Score: 0.4064692035835569
Best Parameters: {'alpha': 1, 'fit_intercept': True, 'solver': 'svd'}
Grid Search: Gradient Boost
Best Score: 0.4224293893546227
Best Parameters: {'learning_rate': 0.05, 'max_depth': 3, 'max_features': 'auto', 'min_samples_leaf': 2}
Grid Search: AdaBoost
Best Score: 0.401264002451922
Best Parameters: {'learning_rate': 0.02, 'loss': 'exponential', 'n_estimators': 100}
Grid Search: Random Forest
Best Score: 0.4323875239294619
Best Parameters: {'max_depth': 6, 'max_features': 'auto', 'min_samples_leaf': 3, 'n_estimators': 300}
Grid Search: XGBoost
Best Score: 0.4118071249462032
Best Parameters: {'learning_rate': 0.05, 'loss': 'linear', 'n_estimators': 100}


## CV metrics from the regression notebook: 

| Model name | R2 | New R2 |
|---|---|---|
| ridge_reg | 0.41466767899168466 | 0.4064692035835569 |
| svr1_reg (linear kernel) | -0.6560443491554244 | n/a |
| svr2_reg (rbf kernel) | 0.3436414633993596 | 0.3667398920100726 |
| random_forest_reg | 0.37988667548813754 | 0.4323875239294619 |
| ada_boost_reg | -0.14535859319297745 | 0.401264002451922 |
| gradient_boost_reg | 0.41595907588432846 | 0.4224293893546227 |
| xgb_reg | 0.3757455851440273 | 0.4118071249462032 |

So all of the R2 scores got better except one: Ridge regression. In theory that means it will do worse, but the difference between the scores is minimal, so I'd still like to see how it performs.
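
The "new R2" values are each search's `best_score_`. When no `scoring` argument is given, `GridSearchCV` uses the estimator's own `score` method, which for scikit-learn regressors is R2. A minimal sketch on synthetic data (the dataset and the small `alpha` grid are illustrative assumptions) shows the default next to an RMSE-based search:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem standing in for the player data
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
param_grid = {"alpha": [0.01, 0.1, 1.0]}

# Default scoring: the regressor's .score(), i.e. R^2 averaged over the folds
r2_search = GridSearchCV(Ridge(), param_grid, cv=3).fit(X_demo, y_demo)

# Explicit scoring: negated RMSE, so that "greater is better" still holds
rmse_search = GridSearchCV(
    Ridge(), param_grid, cv=3, scoring="neg_root_mean_squared_error"
).fit(X_demo, y_demo)

print(f"Best mean CV R^2:  {r2_search.best_score_:.3f}")
print(f"Best mean CV RMSE: {-rmse_search.best_score_:.3f}")
```

Searching on RMSE directly would make the tuning metric match the RMSE used to compare models later in this notebook, though the two metrics will often select similar parameters.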

AdaBoost saw a HUGE improvement, which is interesting, since it was already the best model for picking players after filtering.

Next, we'll try just using one model to pick players and see the new results.

## Re-train new, tuned models

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X2, y, test_size = 0.2, random_state = 0)

In [28]:
# look up each search's best parameters by the name it was stored under
best_params = {s['name']: s['search'].best_params_ for s in grid_searches}

ridge_reg = Ridge(**best_params['Ridge'])
svr_reg = SVR(**best_params['SVR'])
rf_reg = RandomForestRegressor(**best_params['Random Forest'])
ab_reg = AdaBoostRegressor(**best_params['AdaBoost'])
gb_reg = GradientBoostingRegressor(**best_params['Gradient Boost'])
xgb_reg = XGBRegressor(**best_params['XGBoost'])

In [31]:
ridge_reg.fit(X_train, y_train)
svr_reg.fit(X_train, y_train)
rf_reg.fit(X_train, y_train)
ab_reg.fit(X_train, y_train)
gb_reg.fit(X_train, y_train)
xgb_reg.fit(X_train, y_train)

Parameters: { loss } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.05, loss='linear', max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=8, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

In [32]:
y_pred1 = ridge_reg.predict(X_test)
y_pred2 = svr_reg.predict(X_test)
y_pred3 = rf_reg.predict(X_test)
y_pred4 = ab_reg.predict(X_test)
y_pred5 = gb_reg.predict(X_test)
y_pred6 = xgb_reg.predict(X_test)

In [33]:
errors = [get_RMSE(y_test, y_pred1), 
          get_RMSE(y_test, y_pred2), 
          get_RMSE(y_test, y_pred3), 
          get_RMSE(y_test, y_pred4), 
          get_RMSE(y_test, y_pred5), 
          get_RMSE(y_test, y_pred6)]
errors

[6.635322748793196,
 6.9486001509233075,
 6.6075320548750796,
 6.720340025829452,
 6.592370052275926,
 6.59784825206268]
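
The list above is easier to read with model names attached. A small sketch, assuming the order of `errors` follows the order of the predictions (Ridge, SVR, Random Forest, AdaBoost, Gradient Boosting, XGBoost), with the values copied from the output above:

```python
import pandas as pd

# Test-set RMSE per model, lowest (best) first
rmse_by_model = pd.Series({
    "Ridge": 6.635322748793196,
    "SVR": 6.9486001509233075,
    "Random Forest": 6.6075320548750796,
    "AdaBoost": 6.720340025829452,
    "Gradient Boosting": 6.592370052275926,
    "XGBoost": 6.59784825206268,
}).sort_values()

print(rmse_by_model.round(3))
```

On this split the models land within about 0.36 RMSE of each other, with Gradient Boosting lowest and SVR highest.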

## Rebuild Lineup Data Frame

In [14]:
lineup = Lineup(df_for_lineups, def_df)

NameError: name 'df_for_lineups' is not defined

In [None]:
%%time
# this step takes a while
for _ in range(100):
    lineup.build_lineup()