# Analysis 400

## Purpose 
In this notebook we begin the analysis for research question 2, "What factors impact the success of a top player?". We do this by analysing the two dataframes created in Prep 300 and Prep 400: a winners table containing the top 10 male and top 10 female players, and a losers table containing players with high losing percentages.

## Datasets 
For the analysis in this notebook we use the two dataframes created in Prep 300 and Prep 400.


In [1]:
# Importing relevant libraries
import os
import sys
import hashlib
import numpy as np
import pandas as pd
from datetime import datetime
    
%matplotlib inline

## Reading the dataframes created in Prep 300 and Prep 400

In [2]:
RQ2_winners = pd.read_csv("../data/winners_df", low_memory=False)

In [3]:
RQ2_losers = pd.read_csv("../data/losers_df", low_memory=False)

## RQ2: What factors impact the success of a top player?

#### Winning and losing percentage

In [4]:
RQ2_men_winners = RQ2_winners.head(10)
RQ2_men_winners['winning_perc'].mean()

0.7150882570194683

In [5]:
RQ2_men_losers = RQ2_losers.head(10)
RQ2_men_losers['losing_perc'].mean()

0.5226684704226198

To be a top player, a male player needs to win about 71% of his matches on average.
<br> The losing players lose about 52% of their matches, so they win about 48%. Male winners therefore win about 24 percentage points more of their matches than the losing players.

In [6]:
RQ2_women_winners = RQ2_winners.tail(10)
RQ2_women_winners['winning_perc'].mean()

0.6888076046100492

In [7]:
RQ2_women_losers = RQ2_losers.tail(10)
RQ2_women_losers['losing_perc'].mean()

0.5215599108169405

To be a top player, a female player needs to win about 69% of her matches on average.
<br> The losing players lose about 52% of their matches, so they win about 48%. Female winners therefore win about 21 percentage points more of their matches than the losing players.
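
The comparison between the two groups can be checked with a short calculation. Since `losing_perc` is the share of matches lost, it has to be converted to a win rate (`1 - losing_perc`) before it is compared with `winning_perc`. The sketch below plugs in the four means printed above; the dictionary and function names are illustrative and do not come from the notebook.

```python
# Means copied from the outputs of cells In [4] to In [7] above.
mean_winning_perc = {"men": 0.7150882570194683, "women": 0.6888076046100492}
mean_losing_perc = {"men": 0.5226684704226198, "women": 0.5215599108169405}


def win_rate_gap(winning_perc, losing_perc):
    """Winners' win rate minus the losers' win rate (1 - losing_perc)."""
    return winning_perc - (1 - losing_perc)


# Gap per group, expressed as a fraction (multiply by 100 for percentage points).
gaps = {
    group: win_rate_gap(mean_winning_perc[group], mean_losing_perc[group])
    for group in mean_winning_perc
}

for group, gap in gaps.items():
    print(f"{group}: {gap * 100:.1f} percentage points")
```

This is a gap in percentage points between two group means, not a relative increase.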

#### Aces

An ace is a legal serve that the receiver fails to touch, so the server wins the point outright.

In [8]:
RQ2_men_winners['w_ace'].mean()

6.1235248683042105

In [9]:
RQ2_men_losers['l_ace'].mean()

3.707908641320755

Male losing players average about 40% fewer aces per match than the winning players (3.7 versus 6.1).

In [10]:
RQ2_women_winners['w_ace'].mean()

3.636743198948993

In [11]:
RQ2_women_losers['l_ace'].mean()

1.438534529454418

Female winning players average about 2.5 times as many aces per match as the losing players (3.6 versus 1.4).

#### Double faults

A double fault is when a player misses both of their serves, so the point is awarded to the opponent.

In [12]:
RQ2_men_winners['w_df'].mean()

2.1361606598291165

In [13]:
RQ2_men_losers['l_df'].mean()

2.9526963810247855

Male losers average about 0.8 more double faults per match than the winning players (2.95 versus 2.14).

In [14]:
RQ2_women_winners['w_df'].mean()

2.9849177797458943

In [15]:
RQ2_women_losers['l_df'].mean()

3.9379689416365244

Female losing players average about 3.9 double faults per match, roughly one more than the winning players (3.0). Winning a game takes at least four points, so giving away that many points on serve may cost the player a game.

#### Break points faced

A break point is a point at which the receiving player is one point away from winning the game. The `bpFaced` columns count the break points a player faced on their own serve.
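
To make the definition concrete, here is a minimal sketch of a break-point check. It assumes standard advantage scoring in a non-tiebreak game and takes raw point counts (0, 1, 2, 3 for 0, 15, 30, 40). The function `is_break_point` is hypothetical and is not part of the notebook or the dataset.

```python
def is_break_point(server_points: int, receiver_points: int) -> bool:
    """Return True if the receiver wins the game by winning the next point.

    Assumes the game is still in progress and uses standard advantage
    scoring: the receiver needs at least three points and must be ahead.
    """
    return receiver_points >= 3 and receiver_points > server_points


print(is_break_point(2, 3))  # 30-40
print(is_break_point(3, 3))  # deuce
print(is_break_point(3, 4))  # advantage receiver
```

Under these assumptions, 30-40 and advantage receiver are break points, while deuce is not.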

In [16]:
RQ2_men_winners['w_bpFaced'].mean()

4.5157412727649415

In [17]:
RQ2_men_losers['l_bpFaced'].mean()

8.580205810486003

In [18]:
RQ2_women_winners['w_bpFaced'].mean()

5.6694329294469625

In [19]:
RQ2_women_losers['l_bpFaced'].mean()

10.129056518635831

Interestingly, both the losing men and the losing women face nearly twice as many break points per match as the winning players (8.6 versus 4.5 for men, 10.1 versus 5.7 for women).
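
The per-match means reported in this section can be gathered into one table to compare the groups side by side. The sketch below copies the values from the cell outputs above rather than recomputing them; the `summary` dataframe and its `losers_to_winners` column are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Per-match means copied from the outputs of cells In [8] to In [19] above.
summary = pd.DataFrame(
    {
        "winners": [
            6.1235248683042105, 3.636743198948993,    # aces: men, women
            2.1361606598291165, 2.9849177797458943,   # double faults: men, women
            4.5157412727649415, 5.6694329294469625,   # break points faced: men, women
        ],
        "losers": [
            3.707908641320755, 1.438534529454418,
            2.9526963810247855, 3.9379689416365244,
            8.580205810486003, 10.129056518635831,
        ],
    },
    index=[
        "ace (men)", "ace (women)",
        "df (men)", "df (women)",
        "bpFaced (men)", "bpFaced (women)",
    ],
)

# Ratio above 1 means the losing players record more of that statistic per match.
summary["losers_to_winners"] = summary["losers"] / summary["winners"]
print(summary.round(2))
```

A ratio is only one way to compare the groups; absolute differences per match may be easier to interpret for double faults.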

## Sorting the dataframes by winning and losing percentage

Roger Federer and Serena Williams unsurprisingly top the list with the highest winning percentages, and they have broadly similar match statistics, as seen in the dataframes below.

In [20]:
RQ2_sorted_winning_perc = RQ2_winners.sort_values('winning_perc', ascending=False).reset_index(drop=True)
RQ2_sorted_winning_perc.head()

Unnamed: 0,winner_name,matches_won,matches_lost,total_matches,winning_perc,w_ace,w_df,w_svpt,w_1stIn,w_1stWon,w_2ndWon,w_SvGms,w_bpSaved,w_bpFaced
0,Roger Federer,847,137,984,0.860772,7.775434,1.496278,73.7134,46.308933,36.566998,16.404467,12.477667,2.470223,3.42928
1,Serena Williams,475,88,563,0.843694,7.139535,2.489583,59.764858,36.369509,27.573643,12.033592,8.666667,2.855297,4.372093
2,Rafael Nadal,712,141,853,0.834701,3.057018,1.394737,70.70614,48.704678,35.869883,13.017544,11.69883,3.038012,4.30848
3,Novak Djokovic,612,141,753,0.812749,5.762565,2.124783,75.376083,48.790295,36.60312,15.126516,12.343154,3.15078,4.592721
4,Maria Sharapova,519,135,654,0.793578,3.890306,4.933673,65.359694,41.946429,29.522959,11.227041,,3.57398,5.772959


In [21]:
RQ2_sorted_losing_perc = RQ2_losers.sort_values('losing_perc', ascending=True).reset_index(drop=True)
RQ2_sorted_losing_perc.head()

Unnamed: 0,loser_name,matches_lost,matches_won,total_matches,losing_perc,l_ace,l_df,l_svpt,l_1stIn,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced
0,Anabel Medina Garrigues,264,295,559,0.472272,1.101911,1.917197,72.140127,46.031847,25.751592,10.89172,,4.866242,10.070064
1,Gisela Dulko,183,204,387,0.472868,1.307692,4.324786,68.470085,41.376068,23.08547,11.042735,,4.803419,9.923077
2,Alize Cornet,176,189,365,0.482192,1.463087,4.744966,71.832215,42.362416,23.95302,11.463087,,5.342282,10.845638
3,Igor Andreev,229,236,465,0.492473,1.411215,2.724299,80.042056,48.962617,30.200935,14.691589,11.85514,5.158879,9.317757
4,Florian Mayer,211,217,428,0.492991,3.291457,2.045226,72.61809,43.929648,28.020101,12.79397,11.115578,4.050251,8.050251


## Sorting the dataframes by aces 

In [22]:
RQ2_sorted_w_ace = RQ2_winners.sort_values('w_ace', ascending=False).reset_index(drop=True)
RQ2_sorted_w_ace.head()

Unnamed: 0,winner_name,matches_won,matches_lost,total_matches,winning_perc,w_ace,w_df,w_svpt,w_1stIn,w_1stWon,w_2ndWon,w_SvGms,w_bpSaved,w_bpFaced
0,Andy Roddick,512,171,683,0.749634,12.588358,1.659044,72.862786,48.399168,39.432432,14.56341,12.503119,2.087318,2.794179
1,Tomas Berdych,485,263,748,0.648396,8.473451,2.20354,73.090708,42.988938,34.495575,16.893805,12.050885,2.942478,4.017699
2,Roger Federer,847,137,984,0.860772,7.775434,1.496278,73.7134,46.308933,36.566998,16.404467,12.477667,2.470223,3.42928
3,Serena Williams,475,88,563,0.843694,7.139535,2.489583,59.764858,36.369509,27.573643,12.033592,8.666667,2.855297,4.372093
4,Andy Murray,487,154,641,0.75975,7.134199,2.313853,75.510823,43.852814,33.679654,17.199134,12.287879,3.229437,4.911255


In [23]:
RQ2_sorted_l_ace = RQ2_losers.sort_values('l_ace', ascending=True).reset_index(drop=True)
RQ2_sorted_l_ace.head()

Unnamed: 0,loser_name,matches_lost,matches_won,total_matches,losing_perc,l_ace,l_df,l_svpt,l_1stIn,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced
0,Iveta Benesova,222,174,396,0.560606,0.4,3.8,66.0,36.4,21.4,12.2,,3.8,8.4
1,Eleni Daniilidou,180,159,339,0.530973,0.89899,6.232323,68.535354,45.858586,25.868687,7.979798,,4.626263,9.888889
2,Anabel Medina Garrigues,264,295,559,0.472272,1.101911,1.917197,72.140127,46.031847,25.751592,10.89172,,4.866242,10.070064
3,Klara Koukalova,258,257,515,0.500971,1.117647,4.176471,66.411765,37.235294,20.411765,11.058824,,5.0,10.882353
4,Gisela Dulko,183,204,387,0.472868,1.307692,4.324786,68.470085,41.376068,23.08547,11.042735,,4.803419,9.923077
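
The sort-then-`head()` pattern used in these cells can also be written with `DataFrame.nlargest` and `DataFrame.nsmallest`. The sketch below uses a small sample of rows copied from the winners tables above; `winners_sample` is an illustrative name, and the full `RQ2_winners` dataframe would be used in practice.

```python
import pandas as pd

# A few rows copied from the winners tables shown above.
winners_sample = pd.DataFrame(
    {
        "winner_name": ["Roger Federer", "Rafael Nadal", "Andy Roddick", "Tomas Berdych"],
        "w_ace": [7.775434, 3.057018, 12.588358, 8.473451],
    }
)

# Pattern used in the notebook: sort, then keep the first rows.
by_sort = winners_sample.sort_values("w_ace", ascending=False).head(2)

# Alternative: ask directly for the rows with the largest or smallest values.
by_nlargest = winners_sample.nlargest(2, "w_ace")
by_nsmallest = winners_sample.nsmallest(1, "w_ace")

print(by_nlargest)
print(by_nsmallest)
```

Both approaches select the same rows here. Unlike the notebook cells, this sketch keeps the original index instead of calling `reset_index(drop=True)`.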


# Pressure Points
## Sorting the dataframes by break points saved

In [24]:
RQ2_sorted_w_bpSaved = RQ2_winners.sort_values('w_bpSaved', ascending=False).reset_index(drop=True)
RQ2_sorted_w_bpSaved.head()

Unnamed: 0,winner_name,matches_won,matches_lost,total_matches,winning_perc,w_ace,w_df,w_svpt,w_1stIn,w_1stWon,w_2ndWon,w_SvGms,w_bpSaved,w_bpFaced
0,Caroline Wozniacki,415,167,582,0.713058,2.051136,1.778409,65.548295,45.238636,30.113636,10.460227,,3.798295,5.994318
1,Agnieszka Radwanska,401,174,575,0.697391,2.191045,1.171687,64.576119,42.659701,28.465672,11.18806,,3.773134,6.095522
2,Jelena Jankovic,506,278,784,0.645408,2.554622,3.222535,66.327731,43.745098,29.596639,11.02521,9.0,3.743662,6.16338
3,Flavia Pennetta,412,255,667,0.617691,3.548507,2.973881,64.708955,36.171642,25.30597,14.623134,,3.682836,5.925373
4,Tommy Robredo,429,270,699,0.613734,4.353081,2.13981,78.277251,51.094787,37.260664,15.116114,12.492891,3.64218,5.414692


In [25]:
RQ2_sorted_l_bpSaved = RQ2_losers.sort_values('l_bpSaved', ascending=True).reset_index(drop=True)
RQ2_sorted_l_bpSaved.head()

Unnamed: 0,loser_name,matches_lost,matches_won,total_matches,losing_perc,l_ace,l_df,l_svpt,l_1stIn,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced
0,Iveta Benesova,222,174,396,0.560606,0.4,3.8,66.0,36.4,21.4,12.2,,3.8,8.4
1,Florian Mayer,211,217,428,0.492991,3.291457,2.045226,72.61809,43.929648,28.020101,12.79397,11.115578,4.050251,8.050251
2,Paul Henri Mathieu,244,223,467,0.522484,4.021097,3.362869,78.341772,42.696203,28.78481,17.168776,11.881857,4.472574,8.067511
3,Albert Montanes,237,210,447,0.530201,2.729958,3.679325,74.177215,43.85654,28.122363,13.603376,11.270042,4.57384,8.658228
4,Janko Tipsarevic,218,180,398,0.547739,5.916256,2.812808,79.871921,44.758621,31.064039,16.35468,12.054187,4.596059,8.108374
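
Sorting by the raw `bpSaved` average tends to favour players who face many break points in the first place. One possible normalisation, offered here as an assumption rather than as part of the original analysis, is a save rate: break points saved divided by break points faced. The sketch below uses three rows copied from the winners tables above; the `bp_save_rate` column is hypothetical.

```python
import pandas as pd

# Rows copied from the winners tables shown above.
sample = pd.DataFrame(
    {
        "winner_name": ["Roger Federer", "Caroline Wozniacki", "Tommy Robredo"],
        "w_bpSaved": [2.470223, 3.798295, 3.64218],
        "w_bpFaced": [3.42928, 5.994318, 5.414692],
    }
).set_index("winner_name")

# Hypothetical derived column: a ratio of per-match averages,
# which is not the same as a per-match average of ratios.
sample["bp_save_rate"] = sample["w_bpSaved"] / sample["w_bpFaced"]

print(sample.sort_values("bp_save_rate", ascending=False).round(3))
```

On this sample the ordering changes: Federer saves fewer break points per match than Wozniacki or Robredo but has the highest save rate of the three.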


## Sorting the dataframes by break points faced

In [26]:
RQ2_sorted_w_bpFaced = RQ2_winners.sort_values('w_bpFaced', ascending=False).reset_index(drop=True)
RQ2_sorted_w_bpFaced.head()

Unnamed: 0,winner_name,matches_won,matches_lost,total_matches,winning_perc,w_ace,w_df,w_svpt,w_1stIn,w_1stWon,w_2ndWon,w_SvGms,w_bpSaved,w_bpFaced
0,Jelena Jankovic,506,278,784,0.645408,2.554622,3.222535,66.327731,43.745098,29.596639,11.02521,9.0,3.743662,6.16338
1,Agnieszka Radwanska,401,174,575,0.697391,2.191045,1.171687,64.576119,42.659701,28.465672,11.18806,,3.773134,6.095522
2,Caroline Wozniacki,415,167,582,0.713058,2.051136,1.778409,65.548295,45.238636,30.113636,10.460227,,3.798295,5.994318
3,Flavia Pennetta,412,255,667,0.617691,3.548507,2.973881,64.708955,36.171642,25.30597,14.623134,,3.682836,5.925373
4,Svetlana Kuznetsova,447,221,668,0.669162,3.906061,2.3769,68.218182,41.809091,28.915152,13.793939,,3.536364,5.781818


In [27]:
RQ2_sorted_l_bpFaced = RQ2_losers.sort_values('l_bpFaced', ascending=True).reset_index(drop=True)
RQ2_sorted_l_bpFaced.head()

Unnamed: 0,loser_name,matches_lost,matches_won,total_matches,losing_perc,l_ace,l_df,l_svpt,l_1stIn,l_1stWon,l_2ndWon,l_SvGms,l_bpSaved,l_bpFaced
0,Florian Mayer,211,217,428,0.492991,3.291457,2.045226,72.61809,43.929648,28.020101,12.79397,11.115578,4.050251,8.050251
1,Paul Henri Mathieu,244,223,467,0.522484,4.021097,3.362869,78.341772,42.696203,28.78481,17.168776,11.881857,4.472574,8.067511
2,Janko Tipsarevic,218,180,398,0.547739,5.916256,2.812808,79.871921,44.758621,31.064039,16.35468,12.054187,4.596059,8.108374
3,Victor Hanescu,241,197,438,0.550228,4.276018,1.80543,81.642534,56.402715,37.022624,11.638009,12.425339,4.628959,8.244344
4,Iveta Benesova,222,174,396,0.560606,0.4,3.8,66.0,36.4,21.4,12.2,,3.8,8.4
