![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fmisterhay%2FMath-20-2&branch=master&subPath=example-project.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Statistical Research Project Example

#### by Flor Nightgale

For this project we used secondary data about the [Premier League (soccer)](https://www.premierleague.com/tables), downloaded from ESPN's 2019-2020 league table and scoring statistics pages.

## Team Statistics

In [1]:
import pandas as pd
data = pd.read_html('https://www.espn.com/soccer/table/_/league/eng.1')
soccer = data[0].join(data[1]) # join the two data tables together
soccer

Unnamed: 0,2019-2020,GP,W,D,L,F,A,GD,P
0,1LIVLiverpool,37,31,3,3,82,32,50,96
1,2MNCManchester City,37,25,3,9,97,35,62,78
2,3MANManchester United,37,17,12,8,64,36,28,63
3,4CHEChelsea,37,19,6,12,67,54,13,63
4,5LEILeicester City,37,18,8,11,67,39,28,62
5,6WOLVWolverhampton Wanderers,37,15,14,8,51,38,13,59
6,7TOTTottenham Hotspur,37,16,10,11,60,46,14,58
7,8SHUSheffield United,37,14,12,11,38,36,2,54
8,9BURBurnley,37,15,9,13,42,48,-6,54
9,10ARSArsenal,37,13,14,10,53,46,7,53


The team names got combined with their ranks and abbreviations, but that's fine for now.
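
If the combined values ever need to be separated, one possible approach is a regular expression with `str.extract`. The sketch below uses a few rows copied from the table above. The pattern and the new column names (`Rank`, `Abbreviation`, `Team`) are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# A few rows copied from the table above; the column name matches the scraped header.
sample = pd.DataFrame({
    '2019-2020': ['1LIVLiverpool', '2MNCManchester City', '6WOLVWolverhampton Wanderers'],
    'P': [96, 78, 59],
})

# Assumed pattern: rank digits, an all-caps abbreviation, then a team name that
# starts with a capital letter followed by a lowercase letter.
pattern = r'^(?P<Rank>\d+)(?P<Abbreviation>[A-Z]+)(?P<Team>[A-Z][a-z].*)$'

parts = sample['2019-2020'].str.extract(pattern)   # one column per named group
parts['Rank'] = parts['Rank'].astype(int)          # extracted values start as text
cleaned = parts.join(sample.drop(columns='2019-2020'))
print(cleaned)
```

A name that begins with several capital letters (for example "AFC Bournemouth") would not match this assumed pattern and would need special handling.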

Columns in the data set are:
* GP: Games Played
* W: Wins
* D: Draws
* L: Losses
* F: Goals For
* A: Goals Against
* GD: Goal Difference
* P: Points
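
As a quick sanity check on these definitions, the sketch below recomputes three of the columns for the first three rows of the table. It assumes the usual league scoring of 3 points per win and 1 point per draw, which the notebook itself does not state.

```python
import pandas as pd

# First three rows of the standings table, typed in by hand.
rows = pd.DataFrame({
    'GP': [37, 37, 37],
    'W': [31, 25, 17],
    'D': [3, 3, 12],
    'L': [3, 9, 8],
    'F': [82, 97, 64],
    'A': [32, 35, 36],
    'GD': [50, 62, 28],
    'P': [96, 78, 63],
})

# Games played should equal wins + draws + losses.
games_ok = (rows['W'] + rows['D'] + rows['L'] == rows['GP']).all()
# Goal difference should equal goals for minus goals against.
gd_ok = (rows['F'] - rows['A'] == rows['GD']).all()
# Assumed scoring rule: 3 points per win, 1 point per draw.
points_ok = (3 * rows['W'] + rows['D'] == rows['P']).all()

print(games_ok, gd_ok, points_ok)
```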

## Player Statistics

We are also going to look at individual player statistics for scoring and assists. We'll download both and then look first at the `scorers` data table.

In [2]:
stats = pd.read_html('https://www.espn.com/soccer/stats/_/league/ENG.1/view/scoring')
scorers = stats[0]
assists = stats[1]
scorers

Unnamed: 0,RK,Name,Team,P,G
0,1.0,Jamie Vardy,Leicester City,34,23
1,2.0,Danny Ings,Southampton,37,21
2,3.0,Pierre-Emerick Aubameyang,Arsenal,35,20
3,4.0,Mohamed Salah,Liverpool,33,19
4,,Raheem Sterling,Manchester City,32,19
5,6.0,Raúl Jiménez,Wolverhampton Wanderers,37,17
6,,Sadio Mané,Liverpool,34,17
7,,Anthony Martial,Manchester United,31,17
8,,Marcus Rashford,Manchester United,30,17
9,,Harry Kane,Tottenham Hotspur,28,17


Columns in the two player tables:
* RK: Ranking
* P: Games played
* G: Goals scored (`scorers` table only)
* A: Assists (`assists` table only)

The `RK` column has quite a few missing (`NaN`) values. A missing rank means that player is tied with the player above them, so we can use `ffill()` ("forward fill") to replace each missing value with the last valid value above it.

In [3]:
scorers = scorers.ffill()
assists = assists.ffill()
scorers

Unnamed: 0,RK,Name,Team,P,G
0,1.0,Jamie Vardy,Leicester City,34,23
1,2.0,Danny Ings,Southampton,37,21
2,3.0,Pierre-Emerick Aubameyang,Arsenal,35,20
3,4.0,Mohamed Salah,Liverpool,33,19
4,4.0,Raheem Sterling,Manchester City,32,19
5,6.0,Raúl Jiménez,Wolverhampton Wanderers,37,17
6,6.0,Sadio Mané,Liverpool,34,17
7,6.0,Anthony Martial,Manchester United,31,17
8,6.0,Marcus Rashford,Manchester United,30,17
9,6.0,Harry Kane,Tottenham Hotspur,28,17
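
One way to double-check the forward-filled ranks is to recompute them from the goal counts. The sketch below types in the `RK` and `G` values shown above and compares the filled ranks with `rank(method='min', ascending=False)`, which gives tied players the lowest rank in their group. The variable names are made up for this check.

```python
import pandas as pd

# Goals and original ranks copied from the scorers table (None marks a missing rank).
goals = pd.Series([23, 21, 20, 19, 19, 17, 17, 17, 17, 17])
rk = pd.Series([1, 2, 3, 4, None, 6, None, None, None, None])

filled = rk.ffill()                                      # carry the last rank down
recomputed = goals.rank(method='min', ascending=False)   # ties share the best rank

print((filled == recomputed).all())
```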


In [4]:
assists

Unnamed: 0,RK,Name,Team,P,A
0,1.0,Kevin De Bruyne,Manchester City,34,19
1,2.0,Trent Alexander-Arnold,Liverpool,37,13
2,3.0,Andy Robertson,Liverpool,35,11
3,4.0,Mohamed Salah,Liverpool,33,10
4,4.0,Son Heung-Min,Tottenham Hotspur,29,10
5,4.0,David Silva,Manchester City,26,10
6,7.0,Adama Traoré,Wolverhampton Wanderers,36,9
7,7.0,Riyad Mahrez,Manchester City,32,9
8,9.0,Harvey Barnes,Leicester City,35,8
9,10.0,Roberto Firmino,Liverpool,37,7


## Research Question

**Does having more top-scoring or top-assisting players on a team correlate with that team having a higher standing?**

To answer this question, 
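
One possible starting point, sketched here under several assumptions, is to count how many times each team appears in the two top-ten player lists and compare that count with the team's points. The sketch uses only the ten rows of each table shown above, typed in by hand, and the `TopPlayers` column name is made up for illustration. A player who appears in both lists is counted twice.

```python
import pandas as pd

# Team names and points from the ten standings rows shown above.
standings = pd.DataFrame({
    'Team': ['Liverpool', 'Manchester City', 'Manchester United', 'Chelsea',
             'Leicester City', 'Wolverhampton Wanderers', 'Tottenham Hotspur',
             'Sheffield United', 'Burnley', 'Arsenal'],
    'P': [96, 78, 63, 63, 62, 59, 58, 54, 54, 53],
})

# Team of each player in the top-ten scorers and assists tables shown above.
scorer_teams = pd.Series([
    'Leicester City', 'Southampton', 'Arsenal', 'Liverpool', 'Manchester City',
    'Wolverhampton Wanderers', 'Liverpool', 'Manchester United',
    'Manchester United', 'Tottenham Hotspur'])
assist_teams = pd.Series([
    'Manchester City', 'Liverpool', 'Liverpool', 'Liverpool', 'Tottenham Hotspur',
    'Manchester City', 'Wolverhampton Wanderers', 'Manchester City',
    'Leicester City', 'Liverpool'])

# Count appearances per team across both lists; teams with none get 0.
top_counts = pd.concat([scorer_teams, assist_teams]).value_counts()
standings['TopPlayers'] = standings['Team'].map(top_counts).fillna(0).astype(int)

# Pearson correlation between the count of top players and points.
r = standings['TopPlayers'].corr(standings['P'])
print(standings)
print(round(r, 2))
```

With only ten teams this is a rough indication at best, and a correlation on its own would not show that top players cause a higher standing.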

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)