### **Purpose:** America's best player, Shane Van Boening, has a reputation for playing poorly in the Mosconi Cup. This analysis digs deeper to see which factors best predict his success.

In [1]:
import pickle
import pandas as pd

In [2]:
path = '//DREW/Users/andrew/Desktop/mosconi/pkl/allyears_clean_locs'
# Load the cleaned match data; the context manager closes the file afterwards.
with open(path, 'rb') as f:
    dframe = pickle.load(f)

In [3]:
player = 'Shane Van Boening'  # the player whose matches we analyze
df = dframe[dframe['American_player'].str.contains(player)].copy()

In [4]:
# A European loss is an American win, so express the result from the American side.
df['America_won'] = ~df['Europe_won']
df['America_lost'] = df['Europe_won']
df = df[['Format', 'European_player', 'European_score', 'American_score',
       'American_player', 'America_won', 'America_lost']]

In [5]:
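def stats(df):
    # Sum the result columns directly: unpacking value_counts() depends on
    # row order and breaks when a subset contains only wins or only losses.
    win = int(df['America_won'].sum())
    loss = int(df['America_lost'].sum())
    mp = win + loss
    wl = 'Win-loss: {}-{}'.format(win, loss)
    pc = round(win / mp * 100, 0)
    return ('Matches Played: {}'.format(mp), wl, 'pct: {}'.format(pc))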

### Let's take a quick look at Shane's overall stats, then break them down into singles, doubles, and team matches.

In [6]:
stats(df)

('Matches Played: 60', 'Win-loss: 25-35', 'pct: 42.0')

In [7]:
sing = df[df['Format']=='Singles']
dub = df[df['Format']=='Doubles']
team = df[df['Format']=='Teams']

In [8]:
stats(sing)

('Matches Played: 25', 'Win-loss: 10-15', 'pct: 40.0')

In [9]:
stats(dub)

('Matches Played: 25', 'Win-loss: 11-14', 'pct: 44.0')

In [10]:
stats(team)

('Matches Played: 10', 'Win-loss: 4-6', 'pct: 40.0')

### So he plays slightly better in doubles matches. Let's see if he plays better with certain partners.

In [11]:
dub = dub[['European_player','American_player','America_won','America_lost']].copy()
# Strip Shane's name and the '&' separator so only the partner's name remains.
dub['Partner'] = dub['American_player'].str.replace('Shane Van Boening', '', regex=False)
dub['Partner'] = dub['Partner'].str.replace('&', '', regex=False)
dub['Partner'] = dub['Partner'].str.strip()

In [13]:
dub.groupby('Partner')[['America_won','America_lost']].sum()

| Partner | America_won | America_lost |
|---|---|---|
| Corey Deuel | 1.0 | 2.0 |
| Dennis Hatch | 2.0 | 0.0 |
| Earl Strickland | 1.0 | 1.0 |
| John Schmidt | 0.0 | 1.0 |
| Johnny Archer | 4.0 | 1.0 |
| Justin Bergman | 0.0 | 1.0 |
| Mike Dechaine | 1.0 | 2.0 |
| Rodney Morris | 1.0 | 4.0 |
| Skyler Woodward | 1.0 | 1.0 |
| Óscar Domínguez | 0.0 | 1.0 |


### Looks like Shane plays best with Johnny Archer (4-1) and Dennis Hatch (2-0) and worst with Rodney Morris (1-4), though each pairing covers only a handful of matches.
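Raw win and loss counts are easier to compare as percentages, ideally with very small samples filtered out. The following is a minimal sketch using the counts from the partner table above; the `min_played` cutoff of 3 is a hypothetical choice made for illustration, not something the original notebook uses.

```python
import pandas as pd

# Doubles records copied from the partner table above.
records = pd.DataFrame(
    {'America_won': [1, 2, 1, 0, 4, 0, 1, 1, 1, 0],
     'America_lost': [2, 0, 1, 1, 1, 1, 2, 4, 1, 1]},
    index=pd.Index(['Corey Deuel', 'Dennis Hatch', 'Earl Strickland',
                    'John Schmidt', 'Johnny Archer', 'Justin Bergman',
                    'Mike Dechaine', 'Rodney Morris', 'Skyler Woodward',
                    'Óscar Domínguez'], name='Partner'))

records['played'] = records['America_won'] + records['America_lost']
records['win_pct'] = (records['America_won'] / records['played'] * 100).round(0)

# ASSUMPTION: min_played is an arbitrary cutoff chosen for this sketch.
min_played = 3
frequent = records[records['played'] >= min_played]
frequent = frequent.sort_values('win_pct', ascending=False)
print(frequent)
```

With this cutoff the 2-0 record with Dennis Hatch drops out, which shows how much the ranking depends on pairings with only one or two matches.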

More questions to answer:
1. Does Shane play better or worse against certain opponents?
2. Has Shane's performance changed over the years?

In [14]:
sing.groupby('European_player')[['America_won','America_lost']].sum()

| European_player | America_won | America_lost |
|---|---|---|
| Chris Melling | 0.0 | 1.0 |
| Darren Appleton | 1.0 | 6.0 |
| Daryl Peach | 1.0 | 0.0 |
| Karl Boyes | 1.0 | 0.0 |
| Konstantin Stepanov | 1.0 | 0.0 |
| Mika Immonen | 1.0 | 2.0 |
| Nick Ekonomopoulos | 0.0 | 1.0 |
| Nick van den Berg | 3.0 | 0.0 |
| Niels Feijen | 0.0 | 1.0 |
| Nikos Ekonomopoulos | 0.0 | 1.0 |
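The table lists both "Nick Ekonomopoulos" and "Nikos Ekonomopoulos", which look like two spellings of one player. The sketch below shows one way to merge such entries before grouping. It runs on a small toy frame, and the `aliases` mapping is an assumption that the two names refer to the same person; it should be checked against the source data before being applied.

```python
import pandas as pd

# Toy frame containing the two spellings seen in the table above.
singles = pd.DataFrame({
    'European_player': ['Nick Ekonomopoulos', 'Nikos Ekonomopoulos',
                        'Darren Appleton'],
    'America_won': [False, False, True],
    'America_lost': [True, True, False],
})

# ASSUMPTION: both spellings refer to the same player.
aliases = {'Nick Ekonomopoulos': 'Nikos Ekonomopoulos'}
singles['European_player'] = singles['European_player'].replace(aliases)

by_opponent = singles.groupby('European_player')[['America_won', 'America_lost']].sum()
print(by_opponent)
```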


## Apparently, Shane does really poorly against Darren Appleton (1-6) and does well against Nick van den Berg (3-0).
## Let's see what Shane's record was for each year.

In [16]:
# The first index level holds the year; collect the distinct years in ascending order.
years = sorted(df.index.get_level_values(0).unique())

In [17]:
yearly=[]
for year in years:
    win = df.loc[year]['America_won'].sum()
    loss = df.loc[year]['America_lost'].sum()
    yearly.append((win,loss))

In [18]:
yr= {}
for k,v in zip(years,yearly):
    yr[k]=v

In [19]:
yr

{2007: (4, 2),
 2008: (1, 5),
 2009: (4, 2),
 2010: (2, 4),
 2011: (3, 4),
 2012: (4, 2),
 2013: (2, 2),
 2014: (1, 6),
 2015: (4, 3),
 2016: (0, 5)}
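The per-year loop above can also be written as a single `groupby` on the first index level. This is a minimal sketch on toy data; it assumes, as the `df.loc[year]` calls suggest, that the frame has a two-level index whose first level is the year. The level names `year` and `match` are hypothetical.

```python
import pandas as pd

# Toy data shaped like the notebook's frame: (year, match number) index.
index = pd.MultiIndex.from_tuples(
    [(2007, 1), (2007, 2), (2008, 1), (2008, 2), (2008, 3)],
    names=['year', 'match'])  # hypothetical level names
toy = pd.DataFrame({'America_won': [True, False, False, False, True],
                    'America_lost': [False, True, True, True, False]},
                   index=index)

# Summing booleans counts the True values, giving wins and losses per year.
yearly = toy.groupby(level=0)[['America_won', 'America_lost']].sum()
played = yearly['America_won'] + yearly['America_lost']
yearly['win_pct'] = (yearly['America_won'] / played * 100).round(0)
print(yearly)
```

The result is a small table indexed by year, already sorted, which avoids building the `years`, `yearly`, and `yr` objects separately.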

### So his worst years were 2008 (1-5), 2014 (1-6), and 2016 (0-5), all of which were 'away' events (Malta in 2008, England in 2014 and 2016). Let's dig a bit deeper to see how big a factor location is in Shane's performance.

## Shane is regarded as a formidable player on US soil but is also known for underperforming overseas. Let's see whether location is a major factor in his performance.

## He played 10 years in total: 5 at home (all in Las Vegas) and 5 away (4 in England, 1 in Malta).

In [31]:
with open('pkl/dloc', 'rb') as f:
    locs = pickle.load(f)

In [32]:
# Home/away is defined from the American side; swap the two lists for European players.
hm=[]
aw=[]
for year in years:
    if 'USA' in locs[year]:
        hm.append(year)
    else:
        aw.append(year)

away= df.loc[aw]
home= df.loc[hm]

In [33]:
stats(home)

('Matches Played: 30', 'Win-loss: 17-13', 'pct: 57.0')

In [34]:
stats(away)

('Matches Played: 30', 'Win-loss: 8-22', 'pct: 27.0')

# 57% at home vs 27% away is a 30-point gap over 30 matches each, which is a *striking* difference!
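Whether a gap like this is statistically significant can be checked rather than asserted. The following is a minimal sketch, assuming Fisher's exact test is a reasonable choice for a 2x2 table this small. The helper `fisher_exact_two_sided` is written here for illustration and is not part of the original notebook; it uses only the standard library.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k first-column counts in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= observed * (1 + 1e-9))

# Home record 17-13 and away record 8-22, from the outputs above.
p_value = fisher_exact_two_sided(17, 13, 8, 22)
print('p-value: {:.3f}'.format(p_value))
```

A p-value below the conventional 0.05 threshold would support the claim that location matters; with 30 matches on each side, the result should still be read with some caution.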

### Let's drill down a bit further to parse out his singles and doubles matches for both home and away 

In [25]:
hsing = stats(home[home['Format']=='Singles'])

In [26]:
hdub = stats(home[home['Format']=='Doubles'])

In [27]:
asing = stats(away[away['Format']=='Singles'])

In [28]:
adub = stats(away[away['Format']=='Doubles'])

In [29]:
'Home Singles: {}'.format(hsing)
'Home Doubles: {}'.format(hdub)
'Away Singles: {}'.format(asing)
'Away Doubles: {}'.format(adub)

"Home Singles: ('Matches Played: 13', 'Win-loss: 7-6', 'pct: 54.0')"

"Home Doubles: ('Matches Played: 12', 'Win-loss: 7-5', 'pct: 58.0')"

"Away Singles: ('Matches Played: 12', 'Win-loss: 3-9', 'pct: 25.0')"

"Away Doubles: ('Matches Played: 13', 'Win-loss: 4-9', 'pct: 31.0')"

## Format makes little difference, though he does slightly better in doubles both at home and away.
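To compare the two effects directly, the four records above can be turned into percentage-point gaps. This is a small sketch using only the win-loss counts already reported; the names `location_gap` and `format_gap` are introduced here for illustration.

```python
# Win-loss records copied from the four outputs above.
records = {
    ('Home', 'Singles'): (7, 6),
    ('Home', 'Doubles'): (7, 5),
    ('Away', 'Singles'): (3, 9),
    ('Away', 'Doubles'): (4, 9),
}

def win_pct(win, loss):
    return win / (win + loss) * 100

pcts = {key: win_pct(*rec) for key, rec in records.items()}

# Home minus away, within each format.
location_gap = {fmt: pcts[('Home', fmt)] - pcts[('Away', fmt)]
                for fmt in ('Singles', 'Doubles')}
# Doubles minus singles, within each location.
format_gap = {loc: pcts[(loc, 'Doubles')] - pcts[(loc, 'Singles')]
              for loc in ('Home', 'Away')}

for fmt, gap in location_gap.items():
    print('{}: home vs away gap = {:.1f} points'.format(fmt, gap))
for loc, gap in format_gap.items():
    print('{}: doubles vs singles gap = {:.1f} points'.format(loc, gap))
```

Putting the gaps side by side makes it easy to see which factor, location or format, moves the win percentage more.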