# Phase 1 Code Challenge
This code challenge is designed to test your understanding of the Phase 1 material. It covers:

- Pandas
- Data Visualization
- Exploring Statistical Data
- Python Data Structures

*Read the instructions carefully.* Your code will need to meet detailed specifications to pass automated tests.

## Code Tests

We have provided some code tests that you can run to check that your work meets the item specifications. Passing these tests does not guarantee that your answer is correct, because there are additional hidden tests. However, if any test fails, your code does not meet the specification and needs changes. To find the issue, read the comments in the code test cells, the error message you receive, and the item instructions.
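
To illustrate how an `assert`-based test behaves, here is a minimal sketch. The variable `answer` and the expected values are hypothetical and are not part of the challenge; the `try`/`except` is there only to show the failure without stopping execution.

```python
# Hypothetical example: `answer` and the expected values are made up for illustration.
answer = [1, 2, 3]

# A passing check: the condition is true, so nothing happens.
assert type(answer) == list

# A failing check raises AssertionError with the optional message.
failure_message = ''
try:
    assert len(answer) == 4, 'answer should contain 4 items'
    test_passed = True
except AssertionError as error:
    test_passed = False
    failure_message = str(error)

print(test_passed, failure_message)
```

In the test cells of this notebook the `assert` statements are not wrapped in `try`/`except`, so a failing test stops the cell and shows the error message.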

---
## Part 1: Pandas [Suggested Time: 15 minutes]
---
In this part, you will preprocess a dataset from the video game [FIFA19](https://www.kaggle.com/karangadiya/fifa19), which contains data from the players' real-life careers.

In [None]:
# Run this cell

import pandas as pd
import numpy as np
from numbers import Number
import warnings
warnings.filterwarnings('ignore')

### 1.1) Read `fifa.csv` into a pandas DataFrame named `df`

Use pandas to create a new DataFrame, called `df`, from the file `fifa.csv`, which is in the same folder as this notebook.

Hint: Use the string `'./fifa.csv'` as the file reference.

**Starter Code**

    df = 

In [None]:
### BEGIN SOLUTION

df = pd.read_csv('./fifa.csv')

### END SOLUTION

In [None]:
# This test confirms that you have created a DataFrame named df

assert type(df) == pd.DataFrame

### BEGIN HIDDEN TESTS

_test_df = pd.read_csv('./fifa.csv')
assert df.equals(_test_df)

### END HIDDEN TESTS

### 1.2) Convert the `'Release Clause'` values from Euros to dollars

The `'Release Clause'` variable contains prices denominated in Euros. Use the exchange rate `1 Euro = 1.2 Dollars` to convert the prices to dollars. 
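
As a small illustration of the arithmetic, the sketch below applies the exchange rate to a toy DataFrame. The prices are made up and are not taken from `fifa.csv`; the names `toy` and `EUR_TO_USD` are assumptions introduced for this example.

```python
import pandas as pd

# Toy data: these prices are made up and are not taken from fifa.csv.
toy = pd.DataFrame({'Release Clause': [10.0, 25.0, None]})

EUR_TO_USD = 1.2  # exchange rate given in the instructions

# Multiplying a column by a scalar applies the operation to every row;
# missing values stay missing (NaN).
toy['Release Clause'] = toy['Release Clause'] * EUR_TO_USD

print(toy)
```

Note that running a conversion like this more than once multiplies the values again, so the cell should be executed only once per fresh load of the data.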

In [None]:
### BEGIN SOLUTION

df['Release Clause'] = df['Release Clause'] * 1.2

### END SOLUTION

In [None]:
# Ignore this cell

### BEGIN HIDDEN TESTS

_test_df_1p2 = _test_df['Release Clause'] * 1.2
assert df['Release Clause'].equals(_test_df_1p2)

### END HIDDEN TESTS

### 1.3) Drop rows from `df` with missing values for the `'Release Clause'` feature.
    
Make sure that `df` remains the name of the dataset with the dropped rows.
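
The sketch below shows, on made-up toy data, how `dropna` with the `subset` argument removes only the rows that are missing a value in the named column. The DataFrame `toy` and its contents are assumptions for illustration only.

```python
import pandas as pd

# Toy data: values are made up for illustration.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Release Clause': [12.0, None, 30.0],
    'Club': ['X', 'Y', None],
})

# Only rows with a missing 'Release Clause' are dropped.
# Row 'C' is kept even though its 'Club' value is missing.
cleaned = toy.dropna(subset=['Release Clause'])

print(len(toy), len(cleaned))
print(list(cleaned['Name']))
```

Because `dropna` returns a new DataFrame by default, the result has to be assigned back to a name for the change to persist.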

In [None]:
### BEGIN SOLUTION

df = df.dropna(subset=['Release Clause'])

### END SOLUTION

In [None]:
# This test confirms that your dataset has the correct number of observations after dropping

assert df['Release Clause'].shape[0] == 16643

### BEGIN HIDDEN TESTS

assert df['Release Clause'].isna().sum() == 0

### END HIDDEN TESTS

### 1.4) Create a list `top_10_countries` containing the names of the 10 countries with the most players (using the `'Nationality'` column).

Hint: Your answer should include England, Germany, Spain, France, and Argentina.
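
One possible way to find the most frequent values in a column is `value_counts()`, which sorts counts in descending order. The sketch below uses a made-up toy Series (not the FIFA data), and the names `nationalities`, `counts`, and `top_2` are hypothetical.

```python
import pandas as pd

# Toy data: made-up nationalities, not taken from fifa.csv.
nationalities = pd.Series(['Peru', 'Chile', 'Peru', 'Kenya', 'Peru', 'Chile'])

# value_counts() returns counts sorted from most to least frequent,
# with the distinct values as the index.
counts = nationalities.value_counts()

# The first two index labels are the two most frequent values.
top_2 = list(counts.index[:2])

print(top_2)  # ['Peru', 'Chile']
```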

**Starter Code**

    top_10_countries = 

In [None]:
### BEGIN SOLUTION

top_10_countries = df['Nationality'].value_counts()[0:10]
top_10_countries = list(top_10_countries.index)

### END SOLUTION

In [None]:
# This test confirms that you have created a list named top_10_countries

assert type(top_10_countries) == list

# This test confirms that top_10_countries contains England, Germany, Spain, France, and Argentina

assert set(['England', 'Germany', 'Spain', 'France', 'Argentina']).issubset(set(top_10_countries))

### BEGIN HIDDEN TESTS

_test_top_10 = ['England', 'Germany', 'Spain', 'France', 'Argentina', 'Brazil', 'Italy', 'Colombia', 'Japan', 'Netherlands']
assert set(top_10_countries) == set(_test_top_10)

### END HIDDEN TESTS

## Part 2: Data Visualization [Suggested Time: 20 minutes]
This part uses the same FIFA dataset, and asks you to plot data using `matplotlib`.

In [None]:
# Run this cell

import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline

### 2.1) Create a matplotlib figure `player_count_figure` containing a labeled bar chart with the number of players from England, Germany, Spain, France, and Argentina

Use the strings provided below (`bar_chart_title`, `bar_chart_count_label`, and `bar_chart_series_label`) to title and label your bar chart. 

Hint: These are the five countries with the most players, so you may be able to adapt some of the code you used for question 1.4. If you were unable to complete 1.4, use the following values:

```
Country Name  | Num Players
============  | ===========
England       | 1000
Germany       | 900
Spain         | 800
France        | 700
Argentina     | 600
```
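
As a rough sketch of how the fallback values above could be turned into a labeled bar chart, the example below builds the figure from a plain dictionary. The names `fallback_counts`, `sketch_figure`, and `sketch_ax` are assumptions chosen so the sketch does not overwrite `player_count_figure`; it is one possible approach, not the required solution.

```python
import matplotlib.pyplot as plt

# Fallback values copied from the table above.
fallback_counts = {
    'England': 1000,
    'Germany': 900,
    'Spain': 800,
    'France': 700,
    'Argentina': 600,
}

sketch_figure, sketch_ax = plt.subplots(figsize=(10, 6))

# One bar per country: category names on the x-axis, counts as bar heights.
sketch_ax.bar(list(fallback_counts.keys()), list(fallback_counts.values()))

sketch_ax.set_title('5 Countries with the Most Players')
sketch_ax.set_xlabel('Nationality')
sketch_ax.set_ylabel('Number of Players')
```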

**Starter Code**

    player_count_figure, ax = plt.subplots(figsize=(10, 6))

In [None]:
bar_chart_countries = ['England', 'Germany', 'Spain', 'France', 'Argentina']

bar_chart_title = '5 Countries with the Most Players'
bar_chart_count_label = 'Number of Players'
bar_chart_series_label = 'Nationality'

### BEGIN SOLUTION

top_5_countries = df['Nationality'].value_counts()[0:5]

player_count_figure, ax = plt.subplots(figsize=(10, 6))

ax.set_title(bar_chart_title)
ax.set_ylabel(bar_chart_count_label)
ax.set_xlabel(bar_chart_series_label)

labels = list(top_5_countries.index)
values = list(top_5_countries.values)
ax.bar(labels, values)

### END SOLUTION

In [None]:
# This test confirms that you have created a figure named player_count_figure

assert type(player_count_figure) == plt.Figure

# This test confirms that the figure contains exactly one axis

assert len(player_count_figure.axes) == 1

### BEGIN HIDDEN TESTS

# Check that there are 5 bars with appropriate heights
_test_heights = [x.get_height() for x in player_count_figure.axes[0].findobj() if (type(x) == matplotlib.patches.Rectangle)][:-1]
assert len(_test_heights) == 5
assert set(_test_heights) in ({1662, 1198, 1072, 937, 914}, {1475, 1151, 974, 853, 833}, {1000, 900, 800, 700, 600})
    
# Check that the 5 countries are included
_test_countries = set([x.get_text() for x in player_count_figure.axes[0].findobj() if (type(x) == matplotlib.text.Text)])
assert {'England', 'Germany', 'Spain', 'France', 'Argentina'}.issubset(_test_countries)
    
### END HIDDEN TESTS

In [None]:
# These tests confirm that the figure has a title and axis labels 

assert player_count_figure.axes[0].get_title() != ''
assert player_count_figure.axes[0].get_ylabel() != ''
assert player_count_figure.axes[0].get_xlabel() != ''

### BEGIN HIDDEN TESTS

# Check labeling

assert player_count_figure.axes[0].get_title() == bar_chart_title
assert player_count_figure.axes[0].get_ylabel() == bar_chart_count_label
assert player_count_figure.axes[0].get_xlabel() == bar_chart_series_label

### END HIDDEN TESTS

### 2.2) Create a matplotlib figure `tackle_figure` containing a labeled scatter plot visualizing the relationship between `StandingTackle` (on X axis) and `SlidingTackle` (on Y axis)

Use the strings provided below (`scatter_plot_title`, `standing_tackle_label`, and `sliding_tackle_label`) to title and label your scatter plot. 

**Starter Code**

    tackle_figure, ax = plt.subplots(figsize=(10, 6))

In [None]:
scatter_plot_title = 'Relationship Between Standing Tackles and Sliding Tackles'
standing_tackle_label = 'Standing Tackles'
sliding_tackle_label = 'Sliding Tackles'

### BEGIN SOLUTION

tackle_figure, ax = plt.subplots(figsize=(10, 6))

ax.set_title(scatter_plot_title)
ax.set_xlabel(standing_tackle_label)
ax.set_ylabel(sliding_tackle_label)

# StandingTackle goes on the x-axis and SlidingTackle on the y-axis, as specified
ax.scatter(df['StandingTackle'].values, df['SlidingTackle'].values)

### END SOLUTION

In [None]:
# This test confirms that you have created a figure named tackle_figure

assert type(tackle_figure) == plt.Figure

# This test confirms that the figure contains exactly one axis

assert len(tackle_figure.axes) == 1

### BEGIN HIDDEN TESTS

# Check that it's a scatter plot with at least 16643 points
_test_data_shape = tackle_figure.axes[0].findobj()[0].get_offsets().data.shape
assert _test_data_shape[0] >= 16643
assert _test_data_shape[1] == 2

### END HIDDEN TESTS

## Part 3: Exploring Statistical Data [Suggested Time: 20 minutes]
This part does some exploratory analysis using the same FIFA dataset.

### 3.1) Create numeric variables `mean_age` and `median_age` containing the mean and median player ages (respectively).

**Starter Code**

    mean_age = 
    median_age = 

In [None]:
### BEGIN SOLUTION

mean_age = df['Age'].mean()
median_age = df['Age'].median()

### END SOLUTION

In [None]:
# These tests confirm that you have created numeric variables named mean_age and median_age

assert isinstance(mean_age, Number)
assert isinstance(median_age, Number)

### BEGIN HIDDEN TESTS

_test_df_no_nas = _test_df.dropna(subset=['Release Clause'])

_test_mean = round(mean_age,2)
_test_median = round(median_age,2)

assert (_test_mean == round(_test_df['Age'].mean(),2)) or (_test_mean == round(_test_df_no_nas['Age'].mean(),2))
assert (_test_median == round(_test_df['Age'].median(),2)) or (_test_median == round(_test_df_no_nas['Age'].median(),2))

### END HIDDEN TESTS

### 3.2) Create variables `oldest_argentine_name` (a string) and `oldest_argentine_age` (a number) containing the name and age (respectively) of the oldest player with Argentina nationality.

**Starter Code**

    oldest_argentine_name = 
    oldest_argentine_age = 

In [None]:
### BEGIN SOLUTION

argentines = df.loc[df['Nationality'] == 'Argentina']

oldest_argentine = argentines.loc[argentines['Age'].idxmax(), ['Name', 'Age']]

# Select by label; integer positions on a labeled Series are deprecated in recent pandas
oldest_argentine_name = oldest_argentine['Name']
oldest_argentine_age = oldest_argentine['Age']

### END SOLUTION

In [None]:
# This test confirms that you have created a string variable named oldest_argentine_name

assert type(oldest_argentine_name) == str

# This test confirms that you have created a numeric variable named oldest_argentine_age

assert isinstance(oldest_argentine_age, Number)

### BEGIN HIDDEN TESTS

assert oldest_argentine_name == 'C. Muñoz'
assert int(oldest_argentine_age) == 41

### END HIDDEN TESTS

## Part 4: Python Data Structures [Suggested Time: 20 minutes]

Below is a dictionary `players` with information about soccer players. The keys are player names and the values are dictionaries containing each player's age, nationality, and a list of teams they have played for.
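
To make the nested structure concrete, here is a minimal sketch that uses a made-up dictionary with the same shape. The names `toy_players`, `b_teams`, and `ages`, and all the entries, are hypothetical and are not part of the challenge data.

```python
# Hypothetical toy data with the same shape as `players`; the entries are made up.
toy_players = {
    'A. Example': {'age': 30, 'nationality': 'Peru', 'teams': ['Team X']},
    'B. Sample': {'age': 24, 'nationality': 'Kenya', 'teams': ['Team X', 'Team Y']},
}

# Chained keys reach into the inner dictionary.
b_teams = toy_players['B. Sample']['teams']

# .items() yields (key, value) pairs, so each inner dictionary is available as `info`.
ages = [(name, info['age']) for name, info in toy_players.items()]

print(b_teams)  # ['Team X', 'Team Y']
print(ages)     # [('A. Example', 30), ('B. Sample', 24)]
```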

In [None]:
# Run this cell

players = {
    'L. Messi': {
        'age': 31,
        'nationality': 'Argentina',
        'teams': ['Barcelona']
    },
    'Cristiano Ronaldo': {
        'age': 33,
        'nationality': 'Portugal',
        'teams': ['Juventus', 'Real Madrid', 'Manchester United']
    },
    'Neymar Jr': {
        'age': 26,
        'nationality': 'Brazil',
        'teams': ['Santos', 'Barcelona', 'Paris Saint-Germain']
    },
    'De Gea': {
        'age': 27,
        'nationality': 'Spain',
        'teams': ['Atletico Madrid', 'Manchester United']
    },
    'K. De Bruyne': {
        'age': 27,
        'nationality': 'Belgium',
        'teams': ['Chelsea', 'Manchester City']
    }
}

### 4.1) Create a list `player_names` of all the player names in dictionary `players`. 

**Starter Code**

    player_names = 

In [None]:
### BEGIN SOLUTION

player_names = list(players.keys())

### END SOLUTION

In [None]:
# This test confirms that you have created a list named player_names

assert type(player_names) == list

### BEGIN HIDDEN TESTS

assert set(player_names) == set(list(players.keys()))

### END HIDDEN TESTS

### 4.2) Create a list of tuples `player_nationalities` containing each player's name along with their nationality.

**Starter Code**

    player_nationalities = 

In [None]:
### BEGIN SOLUTION

player_nationalities = [(name, players[name]['nationality']) for name in player_names]

### END SOLUTION

In [None]:
# This test confirms that you have created a list named player_nationalities

assert type(player_nationalities) == list

### BEGIN HIDDEN TESTS

assert player_nationalities == [(name, players[name]['nationality']) for name in player_names]

### END HIDDEN TESTS

### 4.3) Define a function `get_players_on_team()` that returns a list of names of all the players who have played on a given team.

Your function should take two arguments:

- A dictionary of player information
- A string containing a team name (for which you are trying to find the player names)
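
The core of this task is a membership test (`in`) against each player's list of teams. The sketch below shows that idea with a list comprehension on made-up data; `toy_roster` and the helper `names_with_team` are hypothetical names used for illustration, not the function the challenge asks for.

```python
# Hypothetical toy data; the names and teams are made up for illustration.
toy_roster = {
    'A. Example': {'teams': ['Team X']},
    'B. Sample': {'teams': ['Team X', 'Team Y']},
    'C. Placeholder': {'teams': ['Team Z']},
}

def names_with_team(player_dict, team_name):
    """Return the names whose 'teams' list contains team_name."""
    return [name for name, info in player_dict.items() if team_name in info['teams']]

on_team_x = names_with_team(toy_roster, 'Team X')
on_team_q = names_with_team(toy_roster, 'Team Q')

print(on_team_x)  # ['A. Example', 'B. Sample']
print(on_team_q)  # []
```

A team that nobody has played for yields an empty list rather than an error, and the names come back in the dictionary's insertion order.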

**Starter Code**

    def get_players_on_team(player_dict, team_name):
        player_list = []
    
        return player_list

In [None]:
### BEGIN SOLUTION

def get_players_on_team(player_dict, team_name):
    player_list = []
    for player in player_dict:
        if team_name in player_dict[player]['teams']:
            player_list.append(player)
    return player_list

### END SOLUTION

In [None]:
# This test confirms that get_players_on_team() returns the right names for Manchester United

manchester_united_players = ['Cristiano Ronaldo', 'De Gea']
players_on_manchester_united = get_players_on_team(players, 'Manchester United')

assert players_on_manchester_united == manchester_united_players

### BEGIN HIDDEN TESTS

_test_barcelona_members = get_players_on_team(players, 'Barcelona')
assert _test_barcelona_members == ['L. Messi', 'Neymar Jr']

### END HIDDEN TESTS