# Homework 03 Part B: Pandas

In this homework assignment, you will import English Premier League (EPL) data from a news website into a pandas dataframe. You will then make some changes to the dataframe and do some data analysis and modelling.

First, run the code below to get the latest EPL table from the BBC Sport website. The data comes from a live webpage, so the table you see may differ depending on when you run the code. That is OK!

If you have any trouble accessing the data from the webpage, let me know.

Note: The following code also does a bit of data wrangling to make the pandas table cleaner.
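As an illustration of the kind of clean-up involved, here is a minimal sketch on made-up data. It assumes the scraped team names arrive with the league position fused to the front (for example `'1Arsenal'`), and it uses a regular expression rather than the loop in the homework cell, so treat it as one possible approach and not the method used below.

```python
import pandas as pd

# Hypothetical raw names, assuming each one is prefixed with its
# league position (this is an assumption about the scraped format).
raw = pd.Series(['1Arsenal', '2Liverpool', '10Chelsea'])

# Remove any run of digits at the very start of each string.
clean = raw.str.replace(r'^\d+', '', regex=True)

print(clean.tolist())  # ['Arsenal', 'Liverpool', 'Chelsea']
```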

In [None]:
# Import requests for making HTTP requests, pandas for dataframes,
# and numpy for array operations
import requests as r
import pandas as pd
import numpy as np

# Create a header that says the request is coming from a browser-like agent.
# This reduces the chance of the website blocking our request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
}

# Make an http request to get the webpage with the following url
url = "https://www.bbc.com/sport/football/premier-league/table"
page = r.get(url, headers=headers)
if page.status_code != 200:
    print("Stopping - couldn't get web page.")
else:
    print("Web request status code:", page.status_code)
    # read_html() will find all the tables in the webpage and put them in a list.
    tables = pd.read_html(page.content)
    # In this case, there is only one table, so the length of the list should be 1.
    print("Tables found on webpage:", len(tables))
    EPL = tables[0]
    print("Preparing dataframe of EPL table.")
    # Now do a bit of data cleaning/wrangling to simplify the dataframe
    EPL.insert(loc=0, value=0, column='Position')
    EPL.drop(columns=['Form, Last 6 games, Oldest first'], inplace=True)
    EPL.insert(loc=1, value='', column='Team New')
    rows = EPL.shape[0]
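    for x in range(rows):
        # Fill in the league position (1-based)
        EPL.iloc[x, 0] = x + 1
        # Drop the leading characters from the raw team name:
        # one character for the first 9 rows, two after that
        raw_name = EPL.iloc[x, 2]
        if x < 9:
            EPL.iloc[x, 1] = raw_name[1:]
        else:
            EPL.iloc[x, 1] = raw_name[2:]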
    EPL.drop(columns=['Team'], inplace=True)
    EPL.rename(columns={"Team New":"Team"}, inplace=True)
    pd.set_option('display.precision', 1)

## Task 1
The code above should have grabbed data from the BBC website and then stored it in a pandas dataframe called `EPL`.

Use the `head()` method to view the first few rows of the `EPL` dataframe.
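For reference, here is a small sketch of how `head()` behaves, using a made-up dataframe called `df` (not the EPL data). With no argument it returns the first 5 rows; you can also pass a number of rows.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'City': ['Oslo', 'Lima', 'Cairo', 'Perth', 'Quito', 'Hanoi', 'Turin'],
    'Score': [7, 3, 9, 4, 6, 8, 5],
})

first_rows = df.head()   # first 5 rows by default
first_two = df.head(2)   # first 2 rows

print(first_rows)
```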

In [None]:
# Type your code here


## Task 2

Use the `shape` property to get the dimensions of the dataframe.
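As a quick sketch on a made-up dataframe `df`: `shape` is a property, not a method, so it is written without parentheses. It gives a tuple of (number of rows, number of columns).

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'City': ['Oslo', 'Lima', 'Cairo'],
    'Score': [7, 3, 9],
})

# No parentheses: shape is a property.
n_rows, n_cols = df.shape

print(df.shape)  # (3, 2)
```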

In [None]:
# Type your code here


## Task 3

Use the `rename()` method to change the column title `'Drawn'` to `'Tied'`. (In British English, a 'draw' is what American English calls a 'tie'.) Remember to include the argument `inplace=True` so that the existing dataframe is updated. Without it, `rename()` returns a modified copy and leaves the original unchanged.

The format for renaming columns in a dataframe called `df` is:
```
df.rename(columns={'Old Column Name':'New Column Name'}, inplace=True)
```

Then use the `head()` method again to check that the change has been made.
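The effect of `inplace=True` can be seen in this short sketch on a made-up dataframe `df` (the column names here are only for illustration).

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({'Colour': ['red', 'blue'], 'Count': [3, 5]})

# Without inplace=True, rename() returns a new dataframe
# and df itself keeps its old column names.
renamed = df.rename(columns={'Colour': 'Color'})
print(list(df.columns))  # ['Colour', 'Count']

# With inplace=True, df itself is updated.
df.rename(columns={'Colour': 'Color'}, inplace=True)
print(list(df.columns))  # ['Color', 'Count']
```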

In [None]:
# Type your code here


## Task 4

Insert a new column called `'Win %'` between the `'Lost'` column and the `'Goals For'` column. Initially, every row in this column can have the value `0`. The code to do this is:
```
EPL.insert(loc=6, column='Win %', value=0)
```
This inserts a new column at column position 6 (positions are counted from 0) with the title `'Win %'` and the value 0 in every row.

Also insert a column immediately after that (location 7) called `'Tie %'` with a default value of 0. Finally, insert a column after that one (location 8) called `'Loss %'` with a default value of 0.

Use the `head()` method to confirm you've added the columns in the right place. 
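To see how `loc` positions work, here is a minimal sketch on a made-up dataframe `df` with columns `A`, `B` and `C`. Each insert pushes the existing columns at that position and beyond to the right.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Positions are counted from 0, so loc=1 places the new column
# between 'A' and 'B'.
df.insert(loc=1, column='X', value=0)

# loc=2 now sits immediately after 'X'.
df.insert(loc=2, column='Y', value=0)

print(list(df.columns))  # ['A', 'X', 'Y', 'B', 'C']
```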

In [None]:
# Type your code here


## Task 5

Now let's create the win percentage values. The win percentage is the number of games won divided by the number of games played, multiplied by 100. Pandas applies the calculation to every row at once, so we can do this with a single line:
```
EPL['Win %'] = EPL['Won'] / EPL['Played'] * 100
```

In addition to updating the 'Win %' column, you need to update the values in the 'Tie %' and 'Loss %' columns. Once you have done this, each of the 'Win %', 'Tie %' and 'Loss %' columns should have a correct percentage value reflecting the percentage of games each team has won, tied or lost.

Then use `head()` to check your work. 
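The same kind of column arithmetic is sketched below on a made-up dataframe `df` (the column names `Attempts`, `Hits` and `Hit %` are invented for this example). The division and multiplication are applied row by row.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({'Attempts': [10, 8, 4], 'Hits': [5, 2, 4]})

# Each row's 'Hits' is divided by that row's 'Attempts',
# then multiplied by 100 to give a percentage.
df['Hit %'] = df['Hits'] / df['Attempts'] * 100

print(df['Hit %'].tolist())  # [50.0, 25.0, 100.0]
```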

In [None]:
# Type your code here


## Task 6

Now that we have win, tie and loss percentages for each team, we can create a simple model to predict the likely outcome for various matchups. 

The function below does this. You don't need to understand the details of the code. Your task is simply to use it to make a couple of predictions.

Run the code in the cell below first to declare the function `predict()`. Then use the following cell to make two more predictions based on our model, using different teams and the function. An example has been provided for you. 
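If you are curious how the model combines the percentages, the sketch below works through the same averaging by hand. The two sets of percentages are made up and do not belong to any real team: a team's predicted win chance is the average of its own win % and its opponent's loss %, and the tie chance is the average of the two tie percentages.

```python
# Made-up percentages for two hypothetical teams, A and B.
a = {'win': 60.0, 'tie': 20.0, 'loss': 20.0}
b = {'win': 30.0, 'tie': 30.0, 'loss': 40.0}

# Average A's win % with B's loss %, and vice versa.
a_win = (a['win'] + b['loss']) / 2
b_win = (a['loss'] + b['win']) / 2

# Average the two tie percentages.
tie = (a['tie'] + b['tie']) / 2

print(a_win, b_win, tie)  # 50.0 25.0 25.0
```

Because each team's three percentages add up to 100, the three predicted values also add up to 100.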

In [None]:
# This function takes two team names (as strings) and prints out a prediction for a
# matchup between these two teams, based on each team's win/loss/tie percentages.

def predict(team1_name, team2_name):
    # Get index of row for each team
    i = np.where(EPL['Team']==team1_name)[0][0]
    j = np.where(EPL['Team']==team2_name)[0][0]
    team1_winpc = (EPL.loc[i, 'Win %']+EPL.loc[j, 'Loss %']) / 2
    team2_winpc = (EPL.loc[i, 'Loss %']+EPL.loc[j, 'Win %']) / 2
    tie_pc = (EPL.loc[i, 'Tie %']+EPL.loc[j, 'Tie %']) / 2
    print(team1_name+" vs "+team2_name + " prediction:\n")
    print(f"\t{team1_name} win %: {team1_winpc:.1f}%")
    print(f"\t{team2_name} win %: {team2_winpc:.1f}%")
    print(f"\tTie %: {tie_pc:.1f}%")
    mostlikely = max(team1_winpc, team2_winpc, tie_pc)
    if team1_winpc == mostlikely:
        print("\tMost likely outcome: " + team1_name + " win")
    elif team2_winpc == mostlikely:
        print("\tMost likely outcome: " + team2_name + " win")
    else:
        print("\tMost likely outcome: Tie")
    print("")

In [None]:
# Add TWO more function calls to the one below with different teams to see the predicted outcome.

predict('Arsenal', 'Manchester United')

# Two more teams - type your code here


# End of Homework 03 Part B

Remember to download your work as a `.ipynb` file and upload it to D2L, using the filename `Homework_03B.ipynb`.