### **Purpose:** America's best player, Shane Van Boening, has a reputation for playing poorly in the Mosconi Cup. This analysis digs deeper to find which factors matter most in predicting his success.

In [243]:
import pickle
import pandas as pd

In [244]:
# Load the cleaned match data for all years (the with-block closes the file)
with open('pkl/allyears_clean_locs', 'rb') as f:
    df = pickle.load(f)

In [245]:
# Keep only matches Shane played in; copy() avoids SettingWithCopyWarning below
df = df[df['American_player'].str.contains('Shane Van Boening')].copy()

In [246]:
# Record results from the American side: a European loss is an American win
df['America_won'] = ~df['Europe_won']
df['America_lost'] = df['Europe_won']
df = df[['Format', 'European_player', 'European_score', 'American_score',
         'American_player', 'America_won', 'America_lost']]

In [247]:
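def stats(df):
    """Summarize matches played, win-loss record and win percentage."""
    # Summing the boolean column counts the wins. Unpacking value_counts()
    # instead depends on its ordering and fails when every match went one way.
    win = int(df['America_won'].sum())
    loss = len(df) - win
    mp = win + loss
    wl = 'Win-loss: {}-{}'.format(win, loss)
    pc = round(win / mp * 100, 0)
    return ('Matches Played: {}'.format(mp), wl, 'pct: {}'.format(pc))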

### Let's take a quick look at Shane's overall record, then at his singles, doubles, and team records.

In [248]:
stats(df)

('Matches Played: 60', 'Win-loss: 25-35', 'pct: 42.0')

In [249]:
sing = df[df['Format']=='Singles']
dub = df[df['Format']=='Doubles']
team = df[df['Format']=='Teams']
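Instead of slicing by format and calling `stats` three times, one `groupby` can produce the whole breakdown. A minimal sketch on a made-up frame that only has the two columns it needs (`Format`, `America_won`):

```python
import pandas as pd

# Made-up results standing in for Shane's real match rows
df = pd.DataFrame({
    'Format': ['Singles', 'Singles', 'Doubles', 'Doubles', 'Doubles', 'Teams'],
    'America_won': [True, False, True, True, False, False],
})

# One row per format: matches played, wins, losses and win percentage
summary = df.groupby('Format')['America_won'].agg(played='size', wins='sum')
summary['losses'] = summary['played'] - summary['wins']
summary['pct'] = (summary['wins'] / summary['played'] * 100).round(0)
print(summary)
```

Named aggregation (`played='size', wins='sum'`) keeps the column names readable, and summing booleans counts the wins directly.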

In [250]:
stats(sing)

('Matches Played: 25', 'Win-loss: 10-15', 'pct: 40.0')

In [251]:
stats(dub)

('Matches Played: 25', 'Win-loss: 11-14', 'pct: 44.0')

In [252]:
stats(team)

('Matches Played: 10', 'Win-loss: 4-6', 'pct: 40.0')

### So he plays slightly better in doubles (44%) than in singles or team matches (40%). Let's see whether he plays better with certain partners.

In [253]:
dub = dub[['European_player', 'American_player', 'America_won', 'America_lost']].copy()
# The partner is whatever remains after removing Shane's name and the '&' separator
dub['Partner'] = (dub['American_player']
                  .str.replace('Shane Van Boening', '', regex=False)
                  .str.replace('&', '', regex=False)
                  .str.strip())

In [254]:
dub.sort_values('Partner')

| Year | Location | Match | European_player | American_player | America_won | America_lost | Partner |
|---|---|---|---|---|---|---|---|
| 2015 | Las Vegas, Nevada, USA | 7 | Darren Appleton Karl Boyes | Shane Van Boening Corey Deuel | False | True | Corey Deuel |
| 2014 | Tower Circus, Blackpool, England | 9 | Darren Appleton & Mark Gray | Shane Van Boening & Corey Deuel | True | False | Corey Deuel |
| 2010 | Bethnal Green, London, England | 13 | Mika Immonen Nick van den Berg | Shane Van Boening Corey Deuel | False | True | Corey Deuel |
| 2012 | Bethnal Green, London, England | 3 | Darren Appleton Nick Ekonomopoulos | Dennis Hatch Shane Van Boening | True | False | Dennis Hatch |
| 2010 | Bethnal Green, London, England | 10 | Darren Appleton Nick van den Berg | Shane Van Boening Dennis Hatch | True | False | Dennis Hatch |
| 2013 | Las Vegas, Nevada, USA | 7 | Darren Appleton Mika Immonen | Earl Strickland Shane Van Boening | True | False | Earl Strickland |
| 2007 | Las Vegas, Nevada, USA | 5 | Daryl Peach Ralf Souquet | Earl Strickland Shane Van Boening | False | True | Earl Strickland |
| 2014 | Tower Circus, Blackpool, England | 7 | Darren Appleton & Nikos Ekonomopoulos | Shane Van Boening & John Schmidt | False | True | John Schmidt |
| 2009 | Las Vegas, Nevada, USA | 5 | Ralf Souquet Niels Feijen | Shane Van Boening Johnny Archer | True | False | Johnny Archer |
| 2011 | Las Vegas, Nevada, USA | 5 | Nick van den Berg Niels Feijen | Shane Van Boening Johnny Archer | True | False | Johnny Archer |


In [255]:
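# Select the result columns explicitly so string columns are not summed
dub.groupby('Partner')[['America_won', 'America_lost']].sum()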

| Partner | America_won | America_lost |
|---|---|---|
| Corey Deuel | 1.0 | 2.0 |
| Dennis Hatch | 2.0 | 0.0 |
| Earl Strickland | 1.0 | 1.0 |
| John Schmidt | 0.0 | 1.0 |
| Johnny Archer | 4.0 | 1.0 |
| Justin Bergman | 0.0 | 1.0 |
| Mike Dechaine | 1.0 | 2.0 |
| Rodney Morris | 1.0 | 4.0 |
| Skyler Woodward | 1.0 | 1.0 |
| Óscar Domínguez | 0.0 | 1.0 |
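Several partners appear only once or twice, so a record like 2-0 rests on very little data. A minimal sketch, using a made-up frame and a hypothetical `MIN_MATCHES` cutoff, adds a win percentage and keeps only partners with enough matches to say something:

```python
import pandas as pd

# Made-up doubles rows with the Partner column already extracted
dub = pd.DataFrame({
    'Partner': ['Johnny Archer', 'Johnny Archer', 'Johnny Archer',
                'Rodney Morris', 'Rodney Morris', 'John Schmidt'],
    'America_won': [True, True, False, False, False, False],
})

record = dub.groupby('Partner')['America_won'].agg(played='size', wins='sum')
record['pct'] = (record['wins'] / record['played'] * 100).round(0)

# Hypothetical cutoff: a one-off partnership says little either way
MIN_MATCHES = 2
reliable = record[record['played'] >= MIN_MATCHES].sort_values('pct', ascending=False)
print(reliable)
```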


### Looks like Shane plays best with Johnny Archer (4-1) and Dennis Hatch (2-0), and poorly with Rodney Morris (1-4).

More questions to answer:
1. Does Shane play better or worse against certain opponents?
2. Has Shane's performance changed over the years?

In [256]:
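# Select the result columns explicitly so string and score columns are not summed
sing.groupby('European_player')[['America_won', 'America_lost']].sum()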

| European_player | America_won | America_lost |
|---|---|---|
| Chris Melling | 0.0 | 1.0 |
| Darren Appleton | 1.0 | 6.0 |
| Daryl Peach | 1.0 | 0.0 |
| Karl Boyes | 1.0 | 0.0 |
| Konstantin Stepanov | 1.0 | 0.0 |
| Mika Immonen | 1.0 | 2.0 |
| Nick Ekonomopoulos | 0.0 | 1.0 |
| Nick van den Berg | 3.0 | 0.0 |
| Niels Feijen | 0.0 | 1.0 |
| Nikos Ekonomopoulos | 0.0 | 1.0 |
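The opponent table lists both "Nick Ekonomopoulos" and "Nikos Ekonomopoulos" (the doubles rows use both spellings too). If these are the same player, which is an assumption here, Shane's combined singles record against him is 0-2 rather than two separate 0-1 entries. One way to merge name variants before grouping, with a hypothetical `ALIASES` map and made-up rows:

```python
import pandas as pd

# Made-up singles rows
sing = pd.DataFrame({
    'European_player': ['Darren Appleton', 'Nick Ekonomopoulos',
                        'Nikos Ekonomopoulos', 'Nick van den Berg'],
    'America_won': [False, False, False, True],
    'America_lost': [True, True, True, False],
})

# Hypothetical alias map: assumes both spellings refer to the same player
ALIASES = {'Nick Ekonomopoulos': 'Nikos Ekonomopoulos'}
sing['European_player'] = sing['European_player'].replace(ALIASES)

result = sing.groupby('European_player')[['America_won', 'America_lost']].sum()
print(result)
```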


## Apparently, Shane does very poorly against Darren Appleton (1-6) and well against Nick van den Berg (3-0).
## Let's see what Shane's record was in each year.

In [257]:
years = list(range(2007, 2017))
yearly = []
for year in years:
    results = df.loc[year]['America_won']
    # Summing booleans counts the wins; unlike unpacking value_counts(),
    # this also works for 2016, when he had zero wins
    win = int(results.sum())
    yearly.append((win, len(results) - win))

In [258]:
yr = dict(zip(years, yearly))

In [259]:
yr

{2007: (4, 2),
 2008: (1, 5),
 2009: (4, 2),
 2010: (2, 4),
 2011: (3, 4),
 2012: (4, 2),
 2013: (2, 2),
 2014: (1, 6),
 2015: (4, 3),
 2016: (0, 5)}
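The per-year loop can also be written as a single `groupby` on the `Year` index level, which handles a zero-win year like 2016 without any special case. A minimal sketch on a made-up frame indexed like the real one, (Year, Location, Match):

```python
import pandas as pd

# Made-up rows indexed like the real data: (Year, Location, Match)
idx = pd.MultiIndex.from_tuples(
    [(2014, 'Tower Circus, Blackpool, England', 3),
     (2014, 'Tower Circus, Blackpool, England', 4),
     (2015, 'Las Vegas, Nevada, USA', 1),
     (2015, 'Las Vegas, Nevada, USA', 7)],
    names=['Year', 'Location', 'Match'])
df = pd.DataFrame({'America_won': [False, False, True, False]}, index=idx)

# Group on the Year level; a year with no wins simply gets wins == 0
by_year = df.groupby(level='Year')['America_won'].agg(wins='sum', played='size')
by_year['losses'] = by_year['played'] - by_year['wins']

# Same {year: (wins, losses)} shape as yr
yr = {int(r.Index): (int(r.wins), int(r.losses)) for r in by_year.itertuples()}
print(yr)
```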

In [260]:
df.loc[[2008,2014,2016]]

| Year | Location | Match | Format | European_player | European_score | American_score | American_player | America_won | America_lost |
|---|---|---|---|---|---|---|---|---|---|
| 2008 | St. Julian's, Malta | 0 | Teams | Niels Feijen, Tony Drago, Mika Immonen, Ralf S... | 5 | 3 | Earl Strickland, Jeremy Jones, Rodney Morris, ... | False | True |
| 2008 | St. Julian's, Malta | 1 | Doubles | Niels Feijen Ralf Souquet | 5 | 3 | Shane Van Boening Rodney Morris | False | True |
| 2008 | St. Julian's, Malta | 4 | Doubles | Niels Feijen Ralf Souquet | 5 | 3 | Shane Van Boening Rodney Morris | False | True |
| 2008 | St. Julian's, Malta | 9 | Singles | Tony Drago | 2 | 5 | Shane Van Boening | True | False |
| 2008 | St. Julian's, Malta | 12 | Singles | Ralf Souquet | 5 | 2 | Shane Van Boening | False | True |
| 2008 | St. Julian's, Malta | 15 | Singles | Mika Immonen | 5 | 3 | Shane Van Boening | False | True |
| 2014 | Tower Circus, Blackpool, England | 0 | Teams | Karl Boyes, Niels Feijen, Nikos Ekonomopoulos,... | 5 | 1 | Corey Deuel, Shane Van Boening, Justin Bergman... | False | True |
| 2014 | Tower Circus, Blackpool, England | 3 | Doubles | Niels Feijen & Nikos Ekonomopoulos | 5 | 1 | Shane Van Boening & Justin Bergman | False | True |
| 2014 | Tower Circus, Blackpool, England | 4 | Singles | Darren Appleton | 5 | 3 | Shane Van Boening | False | True |
| 2014 | Tower Circus, Blackpool, England | 7 | Doubles | Darren Appleton & Nikos Ekonomopoulos | 5 | 2 | Shane Van Boening & John Schmidt | False | True |


### So his worst years were 2008 (1-5), 2014 (1-6) and 2016 (0-5), all of them away events (Malta in 2008, England in 2014 and 2016).

## Shane is regarded as a formidable player on US soil, but is also known for underperforming overseas. Let's see if the location is a major factor in the quality of his performance.

## He played in 10 Mosconi Cups in total: 5 at home (all in Las Vegas) and 5 away (4 in England, 1 in Malta).

In [261]:
home = df.loc[(slice(None),'Las Vegas, Nevada, USA'),:]

In [262]:
home['America_won'].value_counts(sort=False)

False    13
True     17
Name: America_won, dtype: int64

In [263]:
#even years are all away matches
away = df.loc[[2008,2010,2012,2014,2016]]
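Rather than hard-coding the even years, the home/away split could be derived from the `Location` index level. A sketch on made-up rows, assuming every US venue string ends in "USA" the way "Las Vegas, Nevada, USA" does:

```python
import pandas as pd

# Made-up rows indexed like the real data: (Year, Location, Match)
idx = pd.MultiIndex.from_tuples(
    [(2013, 'Las Vegas, Nevada, USA', 7),
     (2014, 'Tower Circus, Blackpool, England', 7),
     (2015, 'Las Vegas, Nevada, USA', 7)],
    names=['Year', 'Location', 'Match'])
df = pd.DataFrame({'America_won': [True, False, False]}, index=idx)

# Assumption: a match is "home" for the US team when the venue is in the USA
is_home = df.index.get_level_values('Location').str.endswith('USA')
home, away = df[is_home], df[~is_home]
print(len(home), len(away))
```

This keeps working if the schedule ever stops alternating strictly between the US and Europe.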

In [264]:
away['America_won'].value_counts(sort=False)

False    22
True      8
Name: America_won, dtype: int64

In [265]:
stats(home)

('Matches Played: 30', 'Win-loss: 17-13', 'pct: 57.0')

In [266]:
stats(away)

('Matches Played: 30', 'Win-loss: 8-22', 'pct: 27.0')

# 57% for home matches vs 27% for away matches is a 30-point gap, a *huge* difference!
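As a rough check that the home/away gap is not just noise, a two-sided Fisher exact test can be run on the 2x2 table of venue (home/away) by result (win/loss). A minimal standard-library sketch, assuming the 60 matches are independent, which is only approximately true since doubles and team matches share partners and opponents:

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))


# Rows: home (17 wins, 13 losses) and away (8 wins, 22 losses)
p = fisher_exact_two_sided(17, 13, 8, 22)
print(round(p, 3))
```

For these counts the p-value comes out around 0.035: below the usual 0.05 threshold, though not overwhelming given only 30 matches on each side. `scipy.stats.fisher_exact` should give the same two-sided value if SciPy is available.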

### Let's drill down further and split out his singles and doubles records for home and away matches.

In [267]:
hsing = stats(home[home['Format']=='Singles'])

In [268]:
hdub = stats(home[home['Format']=='Doubles'])

In [269]:
asing = stats(away[away['Format']=='Singles'])

In [270]:
adub = stats(away[away['Format']=='Doubles'])

In [271]:
# print() shows every line; bare expressions only display the last one by default
for label, result in [('Home Singles', hsing), ('Home Doubles', hdub),
                      ('Away Singles', asing), ('Away Doubles', adub)]:
    print('{}: {}'.format(label, result))

Home Singles: ('Matches Played: 13', 'Win-loss: 7-6', 'pct: 54.0')
Home Doubles: ('Matches Played: 12', 'Win-loss: 7-5', 'pct: 58.0')
Away Singles: ('Matches Played: 12', 'Win-loss: 3-9', 'pct: 25.0')
Away Doubles: ('Matches Played: 13', 'Win-loss: 4-9', 'pct: 31.0')

## Not much variation between formats, but he does slightly better in doubles both at home (58% vs 54%) and away (31% vs 25%). Location matters far more than format.
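The four `stats` calls can be collapsed into one venue-by-format table of win percentages. A sketch on made-up rows, where `Venue` is a hypothetical column holding the home/away label (in the real data it could be derived from the `Location` index level):

```python
import pandas as pd

# Made-up rows; 'Venue' is a hypothetical Home/Away label column
df = pd.DataFrame({
    'Venue': ['Home', 'Home', 'Away', 'Away', 'Away'],
    'Format': ['Singles', 'Doubles', 'Singles', 'Singles', 'Doubles'],
    'America_won': [True, False, False, True, False],
})

# Mean of a boolean column is the win fraction; unstack puts formats in columns
pct = (df.groupby(['Venue', 'Format'])['America_won'].mean() * 100).unstack()
print(pct)
```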