# Improve Your Score by Betting
I'll be taking some calculated risks to improve my score. A great model might get a bronze medal, but when it comes to getting a competitive score, especially in sports prediction problems, a calculated risk can be worth the reward. The aim is to strategically bet on the outcome of a few games.


---
# If you use this notebook, do leave an upvote!
### To use this notebook, simply upload your submission file and choose one of the strategies mentioned below

# Strategies:
---


## 1. Moderate Risk - Moderate Reward
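Betting on matchups with an almost sure outcome: a 1 to 4 seed against the mirrored 13 to 16 seed (1 vs 16, 2 vs 15, 3 vs 14, 4 vs 13). For the historical record of upsets, see:
#### https://en.wikipedia.org/wiki/NCAA_Division_I_Women%27s_Basketball_Tournament_upsets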


## 2. Better Safe than Sorry
Always keep the predicted probabilities between 0.05 and 0.95. This caps the log loss of any single game if an unforeseen (black swan) result occurs.
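
To see why the cap matters, here is a small worked example of the per-game log loss. It is only a sketch: the helper `log_loss_single` is a hypothetical name, and the `1e-15` clipping is an assumption about how the metric treats predictions of exactly 0 or 1.

```python
import numpy as np

def log_loss_single(pred, outcome, eps=1e-15):
    """Log loss for one game; pred is the predicted probability that outcome == 1."""
    # Assumed metric behavior: clip predictions so log(0) never occurs
    p = np.clip(pred, eps, 1 - eps)
    return -(outcome * np.log(p) + (1 - outcome) * np.log(1 - p))

# A fully confident pick that turns out wrong (the true outcome is 0)
unclipped_miss = log_loss_single(1.0, 0)   # roughly 34.5
# The same miss after capping the prediction at 0.95
clipped_miss = log_loss_single(0.95, 0)    # roughly 3.0
# The price of the cap when the confident pick is right
clipped_hit = log_loss_single(0.95, 1)     # roughly 0.05

print(unclipped_miss, clipped_miss, clipped_hit)
```

Under these assumptions, a single confident miss costs more than ten times as much without the cap, while the cap costs only about 0.05 per correct confident pick.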

## 3. High Risk High Reward
Using the 2 submissions to take a chance on the 3 most closely matched games (model prediction around 0.5). One submission always predicts the team with the lower seed, and the other submission always predicts the team with the higher seed.
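
This strategy is not implemented in this notebook, so the following is only a rough sketch of one way it could look. It assumes a frame with `ID`, `Pred`, `T1_seed` and `T2_seed` columns, reads "lower seed" as the numerically lower seed number, and skips games between equally seeded teams. The function name `high_risk_pair` and the toy data are hypothetical.

```python
import pandas as pd

def high_risk_pair(sub, n_games=3):
    """Return two submissions that make opposite all-in picks on the closest games."""
    # Only games between differently seeded teams have a favorite by seed
    candidates = sub[sub.T1_seed != sub.T2_seed]
    # Keep the games whose prediction is closest to 0.5
    closest = (candidates.Pred - 0.5).abs().nsmallest(n_games).index
    # 1.0 where T1 has the numerically lower seed, 0.0 otherwise
    t1_lower = (sub.loc[closest, 'T1_seed'] < sub.loc[closest, 'T2_seed']).astype(float)
    lower_sub, higher_sub = sub.copy(), sub.copy()
    lower_sub.loc[closest, 'Pred'] = t1_lower          # lower seed number always wins
    higher_sub.loc[closest, 'Pred'] = 1.0 - t1_lower   # higher seed number always wins
    return lower_sub[['ID', 'Pred']], higher_sub[['ID', 'Pred']]

# Toy data with made-up IDs, predictions and seeds
toy = pd.DataFrame({
    'ID': ['2021_3101_3104', '2021_3102_3103', '2021_3105_3106'],
    'Pred': [0.99, 0.52, 0.47],
    'T1_seed': [1, 8, 10],
    'T2_seed': [16, 9, 7],
})
lower_sub, higher_sub = high_risk_pair(toy, n_games=2)
print(lower_sub)
print(higher_sub)
```

Only the two near coin-flip rows change in the toy example; the confident 0.99 prediction is left alone in both outputs.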

## 4. Higher Risk Higher Reward (2019 3rd Place)
Instead of using the two submissions to flip one game (1 to 0), we can flip the 32 most closely matched games (0.64 to 0.36). When a game is unpredictable (model prediction is around 0.5), we can pick the team with the lower ID to win with 64% probability. In the second submission, we predict that the other team wins with 64% probability. Assuming that team IDs have nothing to do with winning, this is just random guessing. But if you get lucky, you can win big.
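
This strategy is also not implemented in this notebook. A minimal sketch of the idea, assuming that `Pred` is the probability that the first (lower ID) team in `ID` wins and that "most closely matched" means closest to 0.5 among the rows of the submission; `higher_risk_pair` and the toy data are hypothetical, not the 2019 solution's actual code.

```python
import numpy as np
import pandas as pd

def higher_risk_pair(sub, n_games=32, confidence=0.64):
    """Return two submissions that lean opposite ways on the n_games closest games."""
    # Rows whose prediction is closest to a coin flip
    closest = (sub.Pred - 0.5).abs().nsmallest(n_games).index
    first, second = sub.copy(), sub.copy()
    first.loc[closest, 'Pred'] = confidence          # lower-ID team favored
    second.loc[closest, 'Pred'] = 1.0 - confidence   # higher-ID team favored
    return first[['ID', 'Pred']], second[['ID', 'Pred']]

# Toy data with made-up IDs and predictions
toy = pd.DataFrame({
    'ID': ['2021_3101_3102', '2021_3101_3103', '2021_3102_3103', '2021_3103_3104'],
    'Pred': [0.51, 0.95, 0.45, 0.20],
})
first, second = higher_risk_pair(toy, n_games=2)
print(first)
print(second)
```

With `n_games=2`, only the 0.51 and 0.45 rows are moved to 0.64 in one file and 0.36 in the other; every other prediction is identical in both files.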

In [None]:
from dataclasses import dataclass 
import pandas as pd 
import numpy as np


In [None]:
# Paste the path to your submission file here
your_submission_file_path = '../input/ncaaw-prediction-stage-2/submission.csv'

submission = pd.read_csv(your_submission_file_path)
submission.head()

In [None]:
MODERATE_RISK_MODERATE_REWARD = 'MODERATE_RISK_MODERATE_REWARD'
BETTER_SAFE_THAN_SORRY = 'BETTER_SAFE_THAN_SORRY'
HIGH_RISK_HIGH_REWARD = 'HIGH_RISK_HIGH_REWARD'
HIGHER_RISK_HIGHER_REWARD = 'HIGHER_RISK_HIGHER_REWARD'

STRATEGY = MODERATE_RISK_MODERATE_REWARD  # Set this to one of the four constants above

## Moderate Risk - Moderate Reward

In [None]:
tourney_results = pd.read_csv('../input/ncaaw-march-mania-2021/WDataFiles_Stage2/WNCAATourneyDetailedResults.csv')
seeds = pd.read_csv('../input/ncaaw-march-mania-2021/WDataFiles_Stage2/WNCAATourneySeeds.csv')
regular_results = pd.read_csv('../input/ncaaw-march-mania-2021/WDataFiles_Stage2/WRegularSeasonDetailedResults.csv')

def prepare_data(df):
    # Mirrored copy with the losing team's columns first, so each game appears from both sides
    dfswap = df[['Season', 'DayNum', 'LTeamID', 'LScore', 'WTeamID', 'WScore', 'WLoc', 'NumOT', 
    'LFGM', 'LFGA', 'LFGM3', 'LFGA3', 'LFTM', 'LFTA', 'LOR', 'LDR', 'LAst', 'LTO', 'LStl', 'LBlk', 'LPF', 
    'WFGM', 'WFGA', 'WFGM3', 'WFGA3', 'WFTM', 'WFTA', 'WOR', 'WDR', 'WAst', 'WTO', 'WStl', 'WBlk', 'WPF']].copy()

    # Flip home/away in the mirrored rows; both masks are taken from the unmodified df
    dfswap.loc[df['WLoc'] == 'H', 'WLoc'] = 'A'
    dfswap.loc[df['WLoc'] == 'A', 'WLoc'] = 'H'
    # Rename WLoc first so the W/L prefix replacement below leaves it alone
    df = df.rename(columns={'WLoc': 'location'})
    dfswap = dfswap.rename(columns={'WLoc': 'location'})

    # Winner becomes T1 in the original rows and T2 in the mirrored rows
    df.columns = [x.replace('W','T1_').replace('L','T2_') for x in list(df.columns)]
    dfswap.columns = [x.replace('L','T1_').replace('W','T2_') for x in list(dfswap.columns)]

    output = pd.concat([df, dfswap]).reset_index(drop=True)
    # Encode location from T1's point of view: neutral 0, home 1, away -1
    output.loc[output.location=='N','location'] = '0'
    output.loc[output.location=='H','location'] = '1'
    output.loc[output.location=='A','location'] = '-1'
    output.location = output.location.astype(int)
    
    output['PointDiff'] = output['T1_Score'] - output['T2_Score']
    
    return output
regular_data = prepare_data(regular_results)
tourney_data = prepare_data(tourney_results)


In [None]:
seeds['seed'] = seeds['Seed'].apply(lambda x: int(x[1:3]))
seeds.head()
seeds_T1 = seeds[['Season','TeamID','seed']].copy()
seeds_T2 = seeds[['Season','TeamID','seed']].copy()
seeds_T1.columns = ['Season','T1_TeamID','T1_seed']
seeds_T2.columns = ['Season','T2_TeamID','T2_seed']
tourney_data = pd.merge(tourney_data, seeds_T1, on = ['Season', 'T1_TeamID'], how = 'left')
tourney_data = pd.merge(tourney_data, seeds_T2, on = ['Season', 'T2_TeamID'], how = 'left')

## Looking for upsets

In [None]:
pd.concat(
    [tourney_data[(tourney_data.T1_seed==1) & (tourney_data.T2_seed==16) & (tourney_data.T1_Score < tourney_data.T2_Score)],
     tourney_data[(tourney_data.T1_seed==2) & (tourney_data.T2_seed==15) & (tourney_data.T1_Score < tourney_data.T2_Score)],
     tourney_data[(tourney_data.T1_seed==3) & (tourney_data.T2_seed==14) & (tourney_data.T1_Score < tourney_data.T2_Score)],
     tourney_data[(tourney_data.T1_seed==4) & (tourney_data.T2_seed==13) & (tourney_data.T1_Score < tourney_data.T2_Score)],
     tourney_data[(tourney_data.T1_seed==16) & (tourney_data.T2_seed==1) & (tourney_data.T1_Score > tourney_data.T2_Score)],
     tourney_data[(tourney_data.T1_seed==15) & (tourney_data.T2_seed==2) & (tourney_data.T1_Score > tourney_data.T2_Score)],
     tourney_data[(tourney_data.T1_seed==14) & (tourney_data.T2_seed==3) & (tourney_data.T1_Score > tourney_data.T2_Score)],
     tourney_data[(tourney_data.T1_seed==13) & (tourney_data.T2_seed==4) & (tourney_data.T1_Score > tourney_data.T2_Score)]]
)   



In [None]:
seeds = seeds.set_index(['Season', 'TeamID'])
seeds

In [None]:
def get_seed(row): 
    # Numeric seed for a (season, team) pair; -1 marks a team without a seed
    season, team_id = row.values
    if (season, team_id) in seeds.index: 
        return seeds.loc[(season, team_id), 'seed']
    return -1

In [None]:
# Submission IDs have the form SSSS_T1ID_T2ID
submission['Season'] = submission.ID.apply(lambda id_: int(id_[:4]))
submission['T1'] = submission.ID.apply(lambda id_: int(id_[5:9]))
submission['T2'] = submission.ID.apply(lambda id_: int(id_[10:]))
# get_seed returns a scalar, so apply yields one seed per row
submission['T1_seed'] = submission[['Season', 'T1']].apply(get_seed, axis=1).values
submission['T2_seed'] = submission[['Season', 'T2']].apply(get_seed, axis=1).values
submission

In [None]:
# T1 is the heavy favorite by seed (1 vs 16, 2 vs 15, 3 vs 14, 4 vs 13)
always_one = (
    ((submission.T1_seed==1) & (submission.T2_seed==16))
    | ((submission.T1_seed==2) & (submission.T2_seed==15))
    | ((submission.T1_seed==3) & (submission.T2_seed==14))
    | ((submission.T1_seed==4) & (submission.T2_seed==13))
)
# T2 is the heavy favorite by seed (same pairings, mirrored)
always_zero = (
    ((submission.T1_seed==16) & (submission.T2_seed==1))
    | ((submission.T1_seed==15) & (submission.T2_seed==2))
    | ((submission.T1_seed==14) & (submission.T2_seed==3))
    | ((submission.T1_seed==13) & (submission.T2_seed==4))
)
always_one = always_one.values
always_zero = always_zero.values
preds = submission.Pred.values
moderate_risk_preds = np.where(always_one, 1, preds)
moderate_risk_preds = np.where(always_zero, 0, moderate_risk_preds)
moderate_risk_preds

In [None]:
if STRATEGY == MODERATE_RISK_MODERATE_REWARD: 
    submission.Pred = moderate_risk_preds
    moderate_risk_submission = submission[['ID', 'Pred']]
    moderate_risk_submission.to_csv('moderate_risk_submission.csv', index=False)

In [None]:
if STRATEGY == HIGHER_RISK_HIGHER_REWARD or STRATEGY == HIGH_RISK_HIGH_REWARD: 
    print('I have not implemented the high risk strategies yet')
    print('Give this notebook an upvote to motivate me to publish these strategies in the future')

## Better Safe than Sorry

In [None]:
preds = submission.Pred.values
# Clip both tails so no prediction falls below 0.05 or above 0.95
safe_preds = np.clip(preds, 0.05, 0.95)
if STRATEGY == BETTER_SAFE_THAN_SORRY: 
    submission.Pred = safe_preds
    # Write only the ID and Pred columns, without the DataFrame index
    submission[['ID', 'Pred']].to_csv('safe_predictions.csv', index=False)

---
# Remember to comment down below and give this notebook an upvote if your score improved!