### **Purpose:** America's best player, Earl Strickland, has been known to play poorly in the Mosconi Cup. This analysis digs deeper to see which factors are most important in predicting his success.

In [1]:
import pickle
import pandas as pd

In [2]:
df = pickle.load(open('pkl/allyears_clean_locs','rb'))

In [3]:
df = df[df['American_player'].str.contains('Earl Strickland')]

In [4]:
df['Europe_lost']= ~df['Europe_won']
df = df[['Format', 'European_player', 'European_score', 'American_score',
       'American_player', 'Europe_lost','Europe_won']]
df.columns=['Format', 'European_player', 'European_score', 'American_score',
       'American_player', 'America_won', 'America_lost']

In [5]:
def stats(df):
    # value_counts(sort=False) is unpacked as (losses, wins); this needs both outcomes to be present
    loss,win = df['America_won'].value_counts(sort=False)
    mp= win+loss
    wl = 'Win-loss: {}-{}'.format(win,loss)
    pct = win/(win+loss)*100
    pc = round(pct,0)
    return('Matches Played: {}'.format(mp),wl,'pct: {}'.format(pc))

### Let's have a quick look at Earl's overall stats, then break them down into singles, doubles, and team matches.

In [6]:
stats(df)

('Matches Played: 69', 'Win-loss: 43-26', 'pct: 62.0')

In [41]:
sing = df[df['Format']=='Singles']
dub = df[df['Format']=='Doubles']
team = df[df['Format']=='Teams']
triples = df[df['Format']=='Triples']

In [8]:
stats(sing)

('Matches Played: 26', 'Win-loss: 14-12', 'pct: 54.0')

In [9]:
stats(dub)

('Matches Played: 36', 'Win-loss: 24-12', 'pct: 67.0')

In [10]:
stats(team)

('Matches Played: 5', 'Win-loss: 3-2', 'pct: 60.0')

In [42]:
stats(triples)

ValueError: not enough values to unpack (expected 2, got 1)
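
The `ValueError` arises because `value_counts` returns only one count when every match in the subset has the same outcome, so the result cannot be unpacked into `loss, win`. The totals above leave 2 matches (69 minus 26, 36 and 5) and 2 wins (43 minus 14, 24 and 3) unaccounted for, which suggests the triples record is 2-0. Below is a minimal sketch of a counting helper that tolerates a missing outcome. `record` is a hypothetical name, not part of the original notebook, and it accepts any iterable of booleans such as `triples['America_won']`.

```python
def record(results):
    """Return (wins, losses, pct) for an iterable of booleans (True = win)."""
    outcomes = [bool(r) for r in results]
    wins = sum(outcomes)
    losses = len(outcomes) - wins
    # pct is None for an empty subset to avoid dividing by zero
    pct = round(100 * wins / len(outcomes), 1) if outcomes else None
    return wins, losses, pct

# Works even when one outcome never occurs
print(record([True, True]))
print(record([True, False, True, True]))
```

Counting the `True` values directly avoids relying on how many entries `value_counts` returns or on their order.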

### So he plays better in doubles matches (67%) than in singles (54%). Let's see if he plays better with certain partners.

In [11]:
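# work on a copy so that adding the Partner column does not trigger a SettingWithCopyWarning
dub = dub[['European_player','American_player','America_won','America_lost']].copy()
# whatever remains after removing Earl's name (and any '&') is the partner
dub['Partner']=dub['American_player'].str.replace('Earl Strickland','')
dub['Partner']=dub['Partner'].str.replace('&','')
dub['Partner']=dub['Partner'].str.strip()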

In [26]:
dub.sort_values('Partner')

Year,Location,Match,European_player,American_player,America_won,America_lost,Partner
1996,"Dagenham, London, England",27,Steve Davis Ronnie O'Sullivan,Earl Strickland C.J. Wiley,True,False,C.J. Wiley
2007,"Las Vegas, Nevada, USA",3,Tony Drago Konstantin Stepanov,Earl Strickland Corey Deuel,True,False,Corey Deuel
2000,"Bethnal Green, London, England",4,Steve Davis Mika Immonen,Earl Strickland Corey Deuel,False,True,Corey Deuel
2000,"Bethnal Green, London, England",2,Steve Davis Steve Knight,Earl Strickland Corey Deuel,True,False,Corey Deuel
2013,"Las Vegas, Nevada, USA",10,Darren Appleton Ralf Souquet,Earl Strickland Dennis Hatch,False,True,Dennis Hatch
1998,"Bethnal Green, London, England",3,Mika Immonen Ralf Souquet,James Rempe Earl Strickland,False,True,James Rempe
1997,"Bethnal Green, London, England",2,Oliver Ortmann Steve Davis,Earl Strickland James Rempe,True,False,James Rempe
1999,"Bethnal Green, London, England",1,Oliver Ortmann Alex Lely,Earl Strickland James Rempe,True,False,James Rempe
1999,"Bethnal Green, London, England",4,Steve Knight Mika Immonen,Earl Strickland James Rempe,False,True,James Rempe
1998,"Bethnal Green, London, England",1,Oliver Ortmann Steve Knight,James Rempe Earl Strickland,False,True,James Rempe


In [27]:
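# sum only the boolean result columns; the name columns are not meaningful to add up
dub.groupby('Partner')[['America_won','America_lost']].sum()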

Partner,America_won,America_lost
C.J. Wiley,1.0,0.0
Corey Deuel,2.0,1.0
Dennis Hatch,0.0,1.0
James Rempe,4.0,3.0
Jeremy Jones,5.0,2.0
Johnny Archer,2.0,0.0
Kim Davenport,1.0,0.0
Rodney Morris,7.0,4.0
Shane Van Boening,1.0,1.0
Tony Robles,1.0,0.0


### Looks like he plays best with Jeremy Jones (5-2) and Johnny Archer (2-0). Rodney Morris is his most frequent partner (7-4), and his only losing record is with Dennis Hatch (0-1).

More questions to answer:
1. Does Earl play better or worse against certain opponents?
2. Has Earl's performance changed over the years?

In [22]:
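# sum only the boolean result columns, then order opponents by number of wins
sing.groupby('European_player')[['America_won','America_lost']].sum().sort_values('America_won')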

European_player,America_won,America_lost
Karl Boyes,0.0,1.0
Oliver Ortmann,0.0,1.0
Oliver Ortmann,0.0,1.0
Ronnie O'Sullivan,0.0,1.0
Steve Davis,0.0,2.0
Daryl Peach,1.0,0.0
Mika Immonen,1.0,1.0
Mika Immonen,1.0,0.0
Nick van den Berg,1.0,1.0
Niels Feijen,1.0,0.0
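
Oliver Ortmann and Mika Immonen each appear twice in the output above, which suggests their name strings differ in some way that is not visible. One possible cause, assumed here rather than confirmed, is stray or repeated whitespace. Below is a minimal sketch of normalizing names before grouping. `normalize_name` is a hypothetical helper, and the sample names with extra spaces are made up for illustration.

```python
from collections import Counter

def normalize_name(name):
    """Collapse runs of whitespace and trim the ends of a player name."""
    return " ".join(name.split())

# Illustrative inputs: the same player spelled with different spacing
raw = ["Oliver Ortmann", "Oliver Ortmann ", "Mika  Immonen", "Mika Immonen"]
counts = Counter(normalize_name(n) for n in raw)
print(counts)
```

In the notebook this could be applied with `sing['European_player'].map(normalize_name)` before the `groupby`, so that each opponent ends up in a single row.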


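## In the rows shown, he has no singles win against Steve Davis (0-2), Oliver Ortmann, Karl Boyes or Ronnie O'Sullivan, while Nick van den Berg is an even 1-1. Most opponents appear only once or twice, so these records are small samples.
## Let's see what his record was for each year.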

In [16]:
# collect the distinct years from the first level of the index, in chronological order
years = sorted(set(df.index.get_level_values(0)))

In [19]:
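yearly=[]
err=[]
for year in years:
    try:
        loss, win = df.loc[year]['America_won'].value_counts(sort=False)
    except ValueError:
        # value_counts returns a single count when every match that year had the same result
        print('error in year {}'.format(year))
        err.append(year)
        # count directly so the previous year's values are not appended again
        win = int(df.loc[year]['America_won'].sum())
        loss = int(df.loc[year]['America_lost'].sum())
    yearly.append((win,loss))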

error in year 1997
error in year 2001
error in year 2005


In [28]:
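from IPython.display import display

# a bare expression inside a loop is not rendered, so display each year's matches explicitly
for year in err:
    display(df.loc[year])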

Location,Match,Format,European_player,European_score,American_score,American_player,America_won,America_lost
"Bethnal Green, London, England",2,Doubles,Oliver Ortmann Steve Davis,1,2,Earl Strickland James Rempe,True,False
"Bethnal Green, London, England",4,Doubles,Steve Davis Ronnie O'Sullivan,1,2,Earl Strickland James Rempe,True,False
"Bethnal Green, London, England",8,Singles,Ralf Souquet,1,2,Earl Strickland,True,False
"Bethnal Green, London, England",9,Doubles,Tommy Donlon Oliver Ortmann,1,2,Earl Strickland James Rempe,True,False
"Bethnal Green, London, England",17,Singles,Ralf Souquet,0,2,Earl Strickland,True,False


Location,Match,Format,European_player,European_score,American_score,American_player,America_won,America_lost
"Bethnal Green, London, England",2,Doubles,Steve Davis Ralf Souquet,2,5,Earl Strickland Jeremy Jones,True,False
"Bethnal Green, London, England",3,Doubles,Steve Davis Ralf Souquet,2,5,Earl Strickland Jeremy Jones,True,False
"Bethnal Green, London, England",10,Singles,Niels Feijen,2,5,Earl Strickland,True,False


Location,Match,Format,European_player,European_score,American_score,American_player,America_won,America_lost
"Las Vegas, Nevada, USA",4,Doubles,Mika Immonen Marcus Chamat,4,5,Earl Strickland Rodney Morris,True,False
"Las Vegas, Nevada, USA",9,Doubles,Alex Lely Raj Hundal,2,5,Earl Strickland Rodney Morris,True,False
"Las Vegas, Nevada, USA",12,Doubles,Marcus Chamat Mika Immonen,2,5,Rodney Morris Earl Strickland,True,False
"Las Vegas, Nevada, USA",13,Singles,Marcus Chamat,2,5,Earl Strickland,True,False


In [29]:
yr = dict(zip(years, yearly))

In [30]:
# manually insert 1997, 2001 and 2005 (he had zero losses in those years, so value_counts returned a single count that could not be unpacked into two values)
yr[1997]=(5,0)
yr[2001]=(3,0)
yr[2005]=(4,0)

In [32]:
yr

{1996: (2, 3),
 1997: (5, 0),
 1998: (3, 2),
 1999: (2, 2),
 2000: (4, 1),
 2001: (3, 0),
 2002: (2, 3),
 2003: (3, 1),
 2004: (3, 2),
 2005: (4, 0),
 2006: (5, 3),
 2007: (4, 2),
 2008: (2, 3),
 2013: (1, 4)}
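
The per-year tally can also be built without unpacking `value_counts`, which sidesteps the errors for unbeaten years. Below is a minimal sketch, assuming the data is available as `(year, won)` pairs, for example from `zip(df.index.get_level_values(0), df['America_won'])`. `yearly_record` is a hypothetical helper, and the sample is a partial one based on the 1997, 2001 and 2013 rows shown earlier.

```python
from collections import defaultdict

def yearly_record(pairs):
    """Map each year to a (wins, losses) tuple from (year, won) pairs."""
    tally = defaultdict(lambda: [0, 0])
    for year, won in pairs:
        # index 0 counts wins, index 1 counts losses
        tally[year][0 if won else 1] += 1
    return {year: tuple(counts) for year, counts in sorted(tally.items())}

# Partial sample: five wins in 1997, three in 2001, one loss in 2013
sample = [(1997, True)] * 5 + [(2001, True)] * 3 + [(2013, False)]
print(yearly_record(sample))
```

Because every year starts from `[0, 0]`, a year with no losses simply keeps a zero in the second slot instead of raising an error.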

### So his worst year was 2013 (1-4), followed by 1996, 2002 and 2008 (2-3 each), while he went unbeaten in 1997 (5-0), 2001 (3-0) and 2005 (4-0). His worst year, 2013, was a home event in Las Vegas. Let's dig a bit deeper to see how big a factor location is for Earl's performance.

## Players are often regarded as stronger on home soil than overseas. Let's see if location is a major factor in the quality of Earl's performance.

## Played 14 years total (1996-2008 and 2013); the home events were all in Las Vegas

In [33]:
home = df.loc[(slice(None),'Las Vegas, Nevada, USA'),:]

In [34]:
home['America_won'].value_counts(sort=False)

False     7
True     12
Name: America_won, dtype: int64

In [39]:
locs = pickle.load(open('pkl/dloc','rb'))

In [35]:
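# the listed even years were all away events; only 2008 appears in Earl's data, so this selects that single year
away = df[df.index.get_level_values(0).isin([2008,2010,2012,2014,2016])]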

In [36]:
away['America_won'].value_counts(sort=False)

False    3
True     2
Name: America_won, dtype: int64

In [37]:
stats(home)

('Matches Played: 19', 'Win-loss: 12-7', 'pct: 63.0')

In [38]:
stats(away)

('Matches Played: 5', 'Win-loss: 2-3', 'pct: 40.0')

# 63% for home games vs 40% for away games looks like a big difference, but the away sample is only 5 matches from a single year (2008), so it is far too small to call significant.
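
Whether a home/away gap is significant can be checked rather than asserted. Below is a minimal sketch of a two-sided Fisher's exact test on the counts printed above (home 12-7, away 2-3), written with only the standard library. `fisher_exact_two_sided` is a hypothetical helper, not something the original notebook used. The second call assumes that every match outside Las Vegas counts as away, which gives 31-19 by subtracting the home record from the overall 43-26.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(k):
        # hypergeometric probability of k wins in the first row
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # sum every table that is no more likely than the observed one
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))

p_2008_only = fisher_exact_two_sided(12, 7, 2, 3)    # home vs the 5 away matches shown
p_all_away = fisher_exact_two_sided(12, 7, 31, 19)   # home vs the assumed full away record
print(round(p_2008_only, 3), round(p_all_away, 3))
```

Both p-values come out far above 0.05, so neither split supports calling the home/away gap significant.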

### Let's drill down a bit further to parse out his singles and doubles matches for both home and away 

In [None]:
hsing = stats(home[home['Format']=='Singles'])

In [None]:
hdub = stats(home[home['Format']=='Doubles'])

In [None]:
asing = stats(away[away['Format']=='Singles'])

In [None]:
adub = stats(away[away['Format']=='Doubles'])

In [None]:
# print each line; a cell only renders its last bare expression
print('Home Singles: {}'.format(hsing))
print('Home Doubles: {}'.format(hdub))
print('Away Singles: {}'.format(asing))
print('Away Doubles: {}'.format(adub))

## Not much variation, but he does better in doubles in both cases.