<details><summary style="display:list-item; font-size:16px; color:blue;">Jupyter Help</summary>
    
Having trouble testing your work? Double-check that you have followed the steps below to write, run, save, and test your code!
    
[Click here for a walkthrough GIF of the steps below](https://static-assets.codecademy.com/Courses/ds-python/jupyter-help.gif)

Run all initial cells to import libraries and datasets. Then follow these steps for each question:
    
1. Add your solution to the cell with `## YOUR SOLUTION HERE ##`.
2. Run the cell by selecting the `Run` button or the `Shift`+`Enter` keys.
3. Save your work by selecting the `Save` button, the `command`+`s` keys (Mac), or the `control`+`s` keys (Windows).
4. Select the `Test Work` button at the bottom left to test your work.

![Screenshot of the buttons at the top of a Jupyter Notebook. The Run and Save buttons are highlighted](https://static-assets.codecademy.com/Paths/ds-python/jupyter-buttons.png)

</details>

#### Setup

Start by running the cell below to import `pandas` and `numpy` and read in the `results.csv` dataset.

```python
# import libraries
import pandas as pd
import numpy as np

# read in the dataset and preview the first five rows
results = pd.read_csv('results.csv')
results.head()
```

| | date | year | home_team | away_team | home_score | away_score | total_goals | win_margin | tournament | city | country | neutral |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-04 | 2000 | Egypt | Togo | 2 | 1 | 3 | 1 | Friendly | Aswan | Egypt | False |
| 1 | 2000-01-07 | 2000 | Tunisia | Togo | 7 | 0 | 7 | 7 | Friendly | Tunis | Tunisia | False |
| 2 | 2000-01-08 | 2000 | Trinidad and Tobago | Canada | 0 | 0 | 0 | 0 | Friendly | Port of Spain | Trinidad and Tobago | False |
| 3 | 2000-01-09 | 2000 | Burkina Faso | Gabon | 1 | 1 | 2 | 0 | Friendly | Ouagadougou | Burkina Faso | False |
| 4 | 2000-01-09 | 2000 | Guatemala | Armenia | 1 | 1 | 2 | 0 | Friendly | Los Angeles | United States | True |


#### 1. Split

We've already added code to create two new columns:
- `home_win` is `True` whenever the home team won
- `away_win` is `True` whenever the away team won

We have also filtered the DataFrame so we're only analyzing non-neutral games (i.e., games where one team is genuinely at home and the other genuinely away).

Create two DataFrames from `results`:

- `home`, containing the `home_team` and `home_win` columns
- `away`, containing the `away_team` and `away_win` columns
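The Boolean columns, the `~` filter, and the column selection can be tried on a small slice of the data first. The sketch below rebuilds the five rows from the `results.head()` output above (keeping only the columns this step needs) instead of reading `results.csv`. It assumes, as those rows suggest, that `win_margin` is `home_score` minus `away_score`.

```python
import pandas as pd

# The five rows from results.head() above, reduced to the columns this step uses
results = pd.DataFrame({
    'home_team': ['Egypt', 'Tunisia', 'Trinidad and Tobago', 'Burkina Faso', 'Guatemala'],
    'away_team': ['Togo', 'Togo', 'Canada', 'Gabon', 'Armenia'],
    'win_margin': [1, 7, 0, 0, 0],
    'neutral': [False, False, False, False, True],
})

# Assumption: win_margin = home_score - away_score, so its sign says who won
results['home_win'] = results['win_margin'] > 0
results['away_win'] = results['win_margin'] < 0

# ~ flips each Boolean, so this keeps only the rows where neutral is False
results = results[~results['neutral']]

# Split: one DataFrame per side of the match
home = results[['home_team', 'home_win']]
away = results[['away_team', 'away_win']]

print(home)
print(away)
```

The neutral-venue match (Guatemala vs. Armenia in Los Angeles) is dropped, and among the remaining rows `home_win` is `True` only for Egypt and Tunisia.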

```python
# create the new Boolean columns
results['home_win'] = results['win_margin'] > 0
results['away_win'] = results['win_margin'] < 0

# filter to non-neutral games
results = results[~results['neutral']]

## YOUR SOLUTION HERE ##
home = results[['home_team', 'home_win']]
away = results[['away_team', 'away_win']]

# show home
home
```

| | home_team | home_win |
|---|---|---|
| 0 | Egypt | True |
| 1 | Tunisia | True |
| 2 | Trinidad and Tobago | False |
| 3 | Burkina Faso | False |
| 5 | Ivory Coast | True |
| ... | ... | ... |
| 17086 | Thailand | False |
| 17087 | Vietnam | True |
| 17089 | Singapore | True |
| 17092 | Iraq | True |


#### 2. Apply

With each piece, we want to calculate the corresponding win percentage (stored as a fraction between 0 and 1).

With `home`:

- group by `home_team`
- compute the mean of `home_win` (the mean of a Boolean column is the fraction of `True` values!)
- reset the index
- assign the result to `home_win_percent`

With `away`:

- group by `away_team`
- compute the mean of `away_win` (the mean of a Boolean column is the fraction of `True` values!)
- reset the index
- assign the result to `away_win_percent`
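Why does the mean of a Boolean column give a win fraction? `pandas` treats `True` as 1 and `False` as 0, so the mean is (number of wins) / (number of games). The sketch below uses hypothetical toy data (Team A and Team B are made up, not taken from `results.csv`) to show this, then runs the same group-by, mean, and reset-index chain used in the solution.

```python
import pandas as pd

# Hypothetical toy data (not from results.csv): Team A won 2 of its 3 home
# games, and Team B lost its only home game
home = pd.DataFrame({
    'home_team': ['Team A', 'Team A', 'Team A', 'Team B'],
    'home_win': [True, False, True, False],
})

# True counts as 1 and False as 0: 2 wins out of 4 games overall
overall = home['home_win'].mean()
print(overall)  # 0.5

# Split by team, apply the mean to each group, and turn the team index back into a column
home_win_percent = home.groupby('home_team').agg({'home_win': 'mean'}).reset_index()
print(home_win_percent)
```

Team A's value is 2/3 (about 0.667) and Team B's is 0.0. Multiplying by 100 would turn these fractions into percentages.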

```python
## YOUR SOLUTION HERE ##
home_win_percent = home.groupby('home_team').agg({'home_win': 'mean'}).reset_index()

away_win_percent = away.groupby('away_team').agg({'away_win': 'mean'}).reset_index()

# show sample output
home_win_percent
```

| | home_team | home_win |
|---|---|---|
| 0 | Abkhazia | 0.000000 |
| 1 | Afghanistan | 0.500000 |
| 2 | Albania | 0.434343 |
| 3 | Algeria | 0.660000 |
| 4 | Andalusia | 0.750000 |
| ... | ... | ... |
| 233 | Ynys Môn | 1.000000 |
| 234 | Yorkshire | 0.750000 |
| 235 | Zambia | 0.486842 |
| 236 | Zanzibar | 0.333333 |


#### 3. Combine

Merge `home_win_percent` on the left with `away_win_percent` on the right, matching `home_team` to `away_team`. Use an `outer` join so that teams that played only home games or only away games are still included. Assign the merged DataFrame to the variable `comparison`.
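To see why `how='outer'` matters, here is a minimal sketch with hypothetical win fractions (Teams A to D are made up, not taken from `results.csv`): Team C appears only in the home table and Team D only in the away table. An inner join keeps just the teams found in both tables, while an outer join keeps every team and fills the missing side with `NaN`.

```python
import pandas as pd

# Hypothetical per-team win fractions (not from results.csv)
home_win_percent = pd.DataFrame({
    'home_team': ['Team A', 'Team B', 'Team C'],  # Team C never played away
    'home_win': [0.75, 0.5, 1.0],
})
away_win_percent = pd.DataFrame({
    'away_team': ['Team A', 'Team B', 'Team D'],  # Team D never played at home
    'away_win': [0.25, 0.5, 0.0],
})

# Outer join: the union of teams from both sides; a missing side becomes NaN
comparison = pd.merge(
    left=home_win_percent,
    right=away_win_percent,
    left_on='home_team',
    right_on='away_team',
    how='outer',
)

# Inner join, for contrast: only teams present in both tables survive
inner = pd.merge(
    left=home_win_percent,
    right=away_win_percent,
    left_on='home_team',
    right_on='away_team',
    how='inner',
)

print(comparison)
print(len(comparison), len(inner))  # 4 2
```

Because the key columns have different names, both `home_team` and `away_team` are kept in the result, which is also why the real `comparison` table shows each team name twice.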

```python
## YOUR SOLUTION HERE ##
comparison = pd.merge(
    left=home_win_percent,
    right=away_win_percent,
    left_on=['home_team'],
    right_on=['away_team'],
    how='outer')

comparison.head()
```

| | home_team | home_win | away_team | away_win |
|---|---|---|---|---|
| 0 | Abkhazia | 0.0 | Abkhazia | 0.0 |
| 1 | Afghanistan | 0.5 | Afghanistan | 0.071429 |
| 2 | Albania | 0.434343 | Albania | 0.222222 |
| 3 | Algeria | 0.66 | Algeria | 0.283582 |
| 4 | Andalusia | 0.75 | Andalusia | 1.0 |


#### 4. Analysis

Create a new column `diff` in `comparison` that is equal to `home_win` minus `away_win`.

We've already included a call to `.describe()`. What do you notice? Does this suggest a potential answer to the data question?

```python
comparison['diff'] = comparison['home_win'] - comparison['away_win']

# show output
comparison['diff'].describe()
```

```
count    230.000000
mean       0.232303
std        0.165313
min       -0.400000
25%        0.140719
50%        0.228230
75%        0.329301
max        1.000000
Name: diff, dtype: float64
```
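Why is `count` 230 when `home_win_percent` alone already has 237 rows? Because `comparison` came from an outer join, a team that played only home or only away games has a `NaN` win fraction on one side; subtraction carries that `NaN` into `diff`, and `.describe()` skips missing values. The sketch below shows this on a hypothetical four-team table (made up, not taken from `results.csv`).

```python
import numpy as np
import pandas as pd

# Hypothetical merged table (not from results.csv): Team C has no away games
# and Team D has no home games, so each is missing one win fraction
comparison = pd.DataFrame({
    'home_team': ['Team A', 'Team B', 'Team C', np.nan],
    'home_win': [0.75, 0.5, 1.0, np.nan],
    'away_team': ['Team A', 'Team B', np.nan, 'Team D'],
    'away_win': [0.25, 0.5, np.nan, 0.0],
})

# Subtraction propagates NaN, so diff is only defined for teams with both values
comparison['diff'] = comparison['home_win'] - comparison['away_win']

# describe() ignores NaN: count and mean cover Team A and Team B only
summary = comparison['diff'].describe()
print(len(comparison))   # 4
print(summary['count'])  # 2.0
print(summary['mean'])   # 0.25
```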

<details>
    <summary style="display:list-item; font-size:16px; color:blue;"><i>What did we learn from split-apply-combine (SAC)? Toggle to check!</i></summary>

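There are 230 teams in our dataset that played both home and away games (so we could compute the difference).
    
The average difference is `.23`, which means that, on average, these teams' home win percentage was 23 percentage points higher than their away win percentage! The median (about `.23`) and the 25th percentile (about `.14`) are also well above zero, so this gap shows up across most teams rather than being driven by a few outliers.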

</details>