# Module 1 Code Challenge

This code challenge is designed to test your understanding of the Module 1 material. It covers:

- Pandas
- Data Visualization
- Exploring Statistical Data
- Python Data Structures

_Read the instructions carefully._ You will be asked both to write code and to respond to a few short answer questions.

### Note on the short answer questions

For the short answer questions _please use your own words_. The expectation is that you have **not** copied and pasted from an external source, even if you consult another source to help craft your response. While the short answer questions are not necessarily assessed on grammatical correctness or sentence structure, you should do your best to communicate your ideas clearly.

---
## Part 1: Pandas [Suggested Time: 15 minutes]
---

In this section you will preprocess a dataset for the video game [FIFA19](https://www.kaggle.com/karangadiya/fifa19). The dataset contains both in-game data and information about the players' real-life careers.

In [1]:
# Run this cell without changes

import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

### 1.1) Read the CSV file into a pandas DataFrame

The data you'll be working with is in a file called `'./data/fifa.csv'`. Use your knowledge of pandas to create a new DataFrame, called `df`, using the data from this CSV file. 

Check the contents of the first few rows of your DataFrame, then show the size of the DataFrame. 
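
As a minimal sketch of the same read, inspect, and size workflow, the example below loads a small hypothetical CSV from an in-memory string instead of `'./data/fifa.csv'`. The three rows, their column names, and the `sample_df` variable are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for './data/fifa.csv'
csv_text = """Name,Age,Nationality
Player A,31,Argentina
Player B,33,Portugal
Player C,26,Brazil
"""

# pd.read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head(2))   # first two rows
print(sample_df.shape)     # (number of rows, number of columns)
```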

In [2]:
# Replace None with appropriate code
df = pd.read_csv('./data/fifa.csv')

In [3]:
# Code here to check the first few rows of the DataFrame
df.head()

Unnamed: 0,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,Club Logo,...,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release Clause
0,158023,L. Messi,31,https://cdn.sofifa.org/players/4/19/158023.png,Argentina,https://cdn.sofifa.org/flags/52.png,94,94,FC Barcelona,https://cdn.sofifa.org/teams/2/light/241.png,...,96.0,33.0,28.0,26.0,6.0,11.0,15.0,14.0,8.0,226500.0
1,20801,Cristiano Ronaldo,33,https://cdn.sofifa.org/players/4/19/20801.png,Portugal,https://cdn.sofifa.org/flags/38.png,94,94,Juventus,https://cdn.sofifa.org/teams/2/light/45.png,...,95.0,28.0,31.0,23.0,7.0,11.0,15.0,14.0,11.0,127100.0
2,190871,Neymar Jr,26,https://cdn.sofifa.org/players/4/19/190871.png,Brazil,https://cdn.sofifa.org/flags/54.png,92,93,Paris Saint-Germain,https://cdn.sofifa.org/teams/2/light/73.png,...,94.0,27.0,24.0,33.0,9.0,9.0,15.0,15.0,11.0,228100.0
3,193080,De Gea,27,https://cdn.sofifa.org/players/4/19/193080.png,Spain,https://cdn.sofifa.org/flags/45.png,91,93,Manchester United,https://cdn.sofifa.org/teams/2/light/11.png,...,68.0,15.0,21.0,13.0,90.0,85.0,87.0,88.0,94.0,138600.0
4,192985,K. De Bruyne,27,https://cdn.sofifa.org/players/4/19/192985.png,Belgium,https://cdn.sofifa.org/flags/7.png,91,92,Manchester City,https://cdn.sofifa.org/teams/2/light/10.png,...,88.0,68.0,58.0,51.0,15.0,13.0,5.0,10.0,13.0,196400.0


In [4]:
# Code here to see the size of the DataFrame
df.shape

(18207, 88)

In [28]:
# Combine element-wise conditions with & (and) or | (or) rather than Python's
# `and`/`or`, which cannot evaluate a whole Series as a single True/False
df[(df['Release Clause'] != 'none') & (df['Release Clause'] != 'not given')]

### 1.2) Drop rows with missing values for `'Release Clause'`
    
Drop rows for which "Release Clause" is none or not given. This is part of a soccer player's contract dealing with being bought out by another team. After you have dropped them, see how many rows are remaining.
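
One possible approach, sketched on hypothetical toy data, assumes that missing release clauses are stored as `NaN` rather than as text. In that case `dropna` with the `subset` argument removes only the rows where that one column is missing. The `toy` DataFrame and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing release clause
toy = pd.DataFrame({
    'Name': ['Player A', 'Player B', 'Player C', 'Player D'],
    'Release Clause': [226500.0, np.nan, 228100.0, np.nan],
})

# Keep only rows where 'Release Clause' is present, and assign the result
# back so the change persists
toy = toy.dropna(subset=['Release Clause'])

print(toy.shape)   # (2, 2): two rows remain
```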

In [29]:
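# Code here to drop rows with missing values for 'Release Clause'
# Missing values in this column are NaN, so comparing against the string
# 'none' removes nothing; dropna(subset=...) filters on that column only.
# dropna returns a new DataFrame; assign it back to df to make the drop persist
df.dropna(subset=['Release Clause'])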

Unnamed: 0,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,Club Logo,...,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release Clause
0,158023,L. Messi,31,https://cdn.sofifa.org/players/4/19/158023.png,Argentina,https://cdn.sofifa.org/flags/52.png,94,94,FC Barcelona,https://cdn.sofifa.org/teams/2/light/241.png,...,96.0,33.0,28.0,26.0,6.0,11.0,15.0,14.0,8.0,226500.0
1,20801,Cristiano Ronaldo,33,https://cdn.sofifa.org/players/4/19/20801.png,Portugal,https://cdn.sofifa.org/flags/38.png,94,94,Juventus,https://cdn.sofifa.org/teams/2/light/45.png,...,95.0,28.0,31.0,23.0,7.0,11.0,15.0,14.0,11.0,127100.0
2,190871,Neymar Jr,26,https://cdn.sofifa.org/players/4/19/190871.png,Brazil,https://cdn.sofifa.org/flags/54.png,92,93,Paris Saint-Germain,https://cdn.sofifa.org/teams/2/light/73.png,...,94.0,27.0,24.0,33.0,9.0,9.0,15.0,15.0,11.0,228100.0
3,193080,De Gea,27,https://cdn.sofifa.org/players/4/19/193080.png,Spain,https://cdn.sofifa.org/flags/45.png,91,93,Manchester United,https://cdn.sofifa.org/teams/2/light/11.png,...,68.0,15.0,21.0,13.0,90.0,85.0,87.0,88.0,94.0,138600.0
4,192985,K. De Bruyne,27,https://cdn.sofifa.org/players/4/19/192985.png,Belgium,https://cdn.sofifa.org/flags/7.png,91,92,Manchester City,https://cdn.sofifa.org/teams/2/light/10.png,...,88.0,68.0,58.0,51.0,15.0,13.0,5.0,10.0,13.0,196400.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18202,238813,J. Lundstram,19,https://cdn.sofifa.org/players/4/19/238813.png,England,https://cdn.sofifa.org/flags/14.png,47,65,Crewe Alexandra,https://cdn.sofifa.org/teams/2/light/121.png,...,45.0,40.0,48.0,47.0,10.0,13.0,7.0,8.0,9.0,143000.0
18203,243165,N. Christoffersson,19,https://cdn.sofifa.org/players/4/19/243165.png,Sweden,https://cdn.sofifa.org/flags/46.png,47,63,Trelleborgs FF,https://cdn.sofifa.org/teams/2/light/703.png,...,42.0,22.0,15.0,19.0,10.0,9.0,9.0,5.0,12.0,113000.0
18204,241638,B. Worman,16,https://cdn.sofifa.org/players/4/19/241638.png,England,https://cdn.sofifa.org/flags/14.png,47,67,Cambridge United,https://cdn.sofifa.org/teams/2/light/1944.png,...,41.0,32.0,13.0,11.0,6.0,5.0,10.0,6.0,13.0,165000.0
18205,246268,D. Walker-Rice,17,https://cdn.sofifa.org/players/4/19/246268.png,England,https://cdn.sofifa.org/flags/14.png,47,66,Tranmere Rovers,https://cdn.sofifa.org/teams/2/light/15048.png,...,46.0,20.0,25.0,27.0,14.0,6.0,14.0,8.0,9.0,143000.0


In [20]:
# Code here to check how many rows are left 
df.shape

(18207, 88)

### 1.3) Convert the `'Release Clause'` Price from Euros to Dollars

Now that there are no missing values, we can change the values in the `'Release Clause'` column from Euro to Dollar amounts.

Assume the current exchange rate is `1 Euro = 1.2 Dollars`
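
A small sketch of the conversion on made-up values: multiplying a column by a scalar applies the rate to every row, and assigning the result back to the column stores it. The `toy` DataFrame and the `EUR_TO_USD` constant name are assumptions for illustration.

```python
import pandas as pd

EUR_TO_USD = 1.2   # exchange rate given in the prompt

# Hypothetical release clauses in euros
toy = pd.DataFrame({'Release Clause': [100.0, 250.0, 1000.0]})

# Multiplying a Series by a scalar applies to every element; assigning the
# result back to the column stores the converted values
toy['Release Clause'] = toy['Release Clause'] * EUR_TO_USD

print(toy['Release Clause'].tolist())
```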

In [25]:
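# Code here to convert the column of euros to dollars
# This expression only displays the converted values; assign it back to
# df['Release Clause'] to store them in the DataFrame
df['Release Clause'] * 1.2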

0        271800.0
1        152520.0
2        273720.0
3        166320.0
4        235680.0
           ...   
18202    171600.0
18203    135600.0
18204    198000.0
18205    171600.0
18206    198000.0
Name: Release Clause, Length: 18207, dtype: float64

---
## Part 2: Data Visualization [Suggested Time: 20 minutes]
---

Continuing to use the same FIFA dataset, plot data using whichever plotting library you are most comfortable with.

In [26]:
# Run this cell without changes

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### 2.1) Find the top 10 countries with the most players (using the `'Nationality'` column). Create a bar chart showing the number of players from those 10 countries.

Don't forget to add a **title** and **x axis label** to your charts.

If you are unable to find the top 10 countries but want the chance to demonstrate your plotting skills, use the following dummy data to create a bar chart: 

```
Country Name  | Num Players
============  | ===========
Country A     | 100
Country B     | 60
Country C     | 125
Country D     | 89
```
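
As a rough sketch, the dummy data above could be plotted like this with matplotlib. Sorting the countries by count first is a presentation choice, not a requirement of the task, and the variable names are hypothetical.

```python
import matplotlib.pyplot as plt

# Dummy data from the table above
num_players = {
    'Country A': 100,
    'Country B': 60,
    'Country C': 125,
    'Country D': 89,
}

# Sort countries from most to fewest players
ordered = sorted(num_players.items(), key=lambda item: item[1], reverse=True)
countries = [country for country, _ in ordered]
counts = [count for _, count in ordered]

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(countries, counts)
ax.set_title('Number of Players by Country')
ax.set_xlabel('Country')
ax.set_ylabel('Number of Players')
```

In a notebook with `%matplotlib inline`, the figure renders below the cell once it finishes running.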

In [43]:
# Code here to get the top 10 countries with the most players
df2 = df.groupby('Nationality').count()
df2.sort_values('ID').tail(10)

Unnamed: 0_level_0,ID,Name,Age,Photo,Flag,Overall,Potential,Club,Club Logo,Value,...,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release Clause
Nationality,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Netherlands,453,453,453,453,453,453,453,453,453,453,...,452,452,452,452,452,452,452,452,452,426
Japan,478,478,478,478,478,478,478,478,478,478,...,478,478,478,478,478,478,478,478,478,455
Colombia,618,618,618,618,618,618,618,618,618,618,...,616,616,616,616,616,616,616,616,616,570
Italy,702,702,702,702,702,702,702,702,702,702,...,699,699,699,699,699,699,699,699,699,579
Brazil,827,827,827,827,827,827,827,827,827,827,...,825,825,825,825,825,825,825,825,825,788
France,914,914,914,914,914,914,914,914,914,914,...,911,911,911,911,911,911,911,911,911,853
Argentina,937,937,937,937,937,937,937,936,937,937,...,936,936,936,936,936,936,936,936,936,833
Spain,1072,1072,1072,1072,1072,1072,1072,1072,1072,1072,...,1071,1071,1071,1071,1071,1071,1071,1071,1071,974
Germany,1198,1198,1198,1198,1198,1198,1198,1198,1198,1198,...,1195,1195,1195,1195,1195,1195,1195,1195,1195,1151
England,1662,1662,1662,1662,1662,1662,1662,1662,1662,1662,...,1657,1657,1657,1657,1657,1657,1657,1657,1657,1475


In [49]:
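# Code here to plot a bar chart.  A recommended figsize is (10, 6)
# 'Nationality' is the index of df2 after the groupby, not a column,
# so take the country names from the index of the top 10 counts
top_10 = df2['ID'].sort_values(ascending=False).head(10)

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(top_10.index, top_10.values)
ax.set_title('Top 10 Countries with Most Players')
ax.set_xlabel('Country')
ax.set_ylabel('Number of Players')

plt.show()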

### 2.2) Describe the relationship between `StandingTackle` and `SlidingTackle`, as shown in the scatter plot produced below.

In [None]:
# Run this cell without changes

fig, ax = plt.subplots()

ax.set_title('Standing Tackle vs. Sliding Tackle')
ax.set_xlabel('Standing Tackle')
ax.set_ylabel('Sliding Tackle')

x = df['StandingTackle']
y = df['SlidingTackle']

ax.scatter(x, y)

Please describe in words the relationship between these two features.
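
One way to back up a written description with a number is the Pearson correlation coefficient, which pandas exposes as `Series.corr`. The sketch below uses invented ratings for five hypothetical players, so the printed value says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical ratings for five players
toy = pd.DataFrame({
    'StandingTackle': [20, 35, 50, 65, 80],
    'SlidingTackle': [18, 33, 52, 60, 79],
})

# Pearson correlation: close to +1 for a strong positive linear relationship,
# close to -1 for a strong negative one, near 0 for no linear relationship
r = toy['StandingTackle'].corr(toy['SlidingTackle'])
print(round(r, 3))
```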

In [None]:
# Your written answer here

---
## Part 3: Exploring Statistical Data [Suggested Time: 20 minutes]
---

### 3.1) What are the mean age and the median age for the players in this dataset?

In [50]:
# Code here to find the mean age and median age
df['Age'].mean()

25.122205745043114

In [51]:
df['Age'].median()

25.0

In your own words, how are the mean and median related to each other and what do these values tell us about the distribution of the column `'Age'`? 

# Your written answer here
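They both measure the "center" of a dataset. The mean is the sum of all the players' ages divided by the number of players. The median is the middle value: if we sorted the players by age, half would be older than the median and half would be younger. The mean is affected by outliers, whereas the median is far less sensitive to them. Here the mean (about 25.12) is only slightly higher than the median (25.0), which suggests the distribution of `'Age'` is roughly symmetric, with a slight right skew from a small number of older players.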
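
The effect of an outlier on each measure can be illustrated with a short sketch using Python's built-in `statistics` module. The ages below are made up for the example.

```python
from statistics import mean, median

ages = [20, 22, 24, 26, 28]
print(mean(ages), median(ages))   # 24 24

# Adding one much older player pulls the mean up more than the median
ages_with_outlier = ages + [45]
print(mean(ages_with_outlier), median(ages_with_outlier))   # 27.5 25.0
```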

### 3.2) Who is the oldest player from Argentina and how old is he?
Use the `Nationality` column.
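
A possible pattern, sketched on hypothetical players rather than the real data: filter to one country, then use `idxmax` to find the index label of the oldest row and `loc` to retrieve it. If several players share the maximum age, `idxmax` returns only the first one.

```python
import pandas as pd

# Hypothetical players, not taken from the dataset
toy = pd.DataFrame({
    'Name': ['Player A', 'Player B', 'Player C', 'Player D'],
    'Age': [31, 40, 27, 36],
    'Nationality': ['Argentina', 'Spain', 'Argentina', 'Argentina'],
})

# Filter to one country, then look up the row holding the maximum age
argentina = toy[toy['Nationality'] == 'Argentina']
oldest = argentina.loc[argentina['Age'].idxmax()]

print(oldest['Name'], oldest['Age'])   # Player D 36
```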

In [66]:
# Code here to find the oldest player in Argentina
df3 = df[df['Nationality'] == 'Argentina']
df3.sort_values('Age')


Unnamed: 0,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,Club Logo,...,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release Clause
17177,242074,R. Gómez,16,https://cdn.sofifa.org/players/4/19/242074.png,Argentina,https://cdn.sofifa.org/flags/52.png,55,76,Belgrano de Córdoba,https://cdn.sofifa.org/teams/2/light/111022.png,...,46.0,26.0,35.0,32.0,7.0,13.0,14.0,13.0,6.0,466000.0
15441,243519,V. Burgoa,17,https://cdn.sofifa.org/players/4/19/243519.png,Argentina,https://cdn.sofifa.org/flags/52.png,59,74,Godoy Cruz,https://cdn.sofifa.org/teams/2/light/111706.png,...,54.0,62.0,56.0,60.0,13.0,6.0,14.0,9.0,13.0,623000.0
16038,242548,A. Manzur,17,https://cdn.sofifa.org/players/4/19/242548.png,Argentina,https://cdn.sofifa.org/flags/52.png,58,75,Godoy Cruz,https://cdn.sofifa.org/teams/2/light/111706.png,...,43.0,60.0,54.0,59.0,13.0,7.0,9.0,7.0,13.0,519000.0
10057,246045,P. De la Vega,17,https://cdn.sofifa.org/players/4/19/246045.png,Argentina,https://cdn.sofifa.org/flags/52.png,65,82,Club Atlético Lanús,https://cdn.sofifa.org/teams/2/light/110395.png,...,65.0,26.0,29.0,31.0,8.0,9.0,15.0,8.0,9.0,2300.0
17355,237125,V. Barbero,17,https://cdn.sofifa.org/players/4/19/237125.png,Argentina,https://cdn.sofifa.org/flags/52.png,54,73,Belgrano de Córdoba,https://cdn.sofifa.org/teams/2/light/111022.png,...,52.0,16.0,25.0,25.0,14.0,7.0,11.0,7.0,9.0,291000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5040,142670,R. Braña,39,https://cdn.sofifa.org/players/4/19/142670.png,Argentina,https://cdn.sofifa.org/flags/52.png,70,70,Estudiantes de La Plata,https://cdn.sofifa.org/teams/2/light/101083.png,...,75.0,71.0,74.0,69.0,15.0,8.0,9.0,8.0,14.0,180000.0
2821,232543,S. Bertoli,40,https://cdn.sofifa.org/players/4/19/232543.png,Argentina,https://cdn.sofifa.org/flags/52.png,73,73,Patronato,https://cdn.sofifa.org/teams/2/light/110581.png,...,44.0,12.0,13.0,11.0,76.0,73.0,78.0,67.0,71.0,392000.0
4187,156483,C. Lucchetti,40,https://cdn.sofifa.org/players/4/19/156483.png,Argentina,https://cdn.sofifa.org/flags/52.png,71,71,Atlético Tucumán,https://cdn.sofifa.org/teams/2/light/111708.png,...,41.0,21.0,22.0,13.0,71.0,68.0,75.0,64.0,75.0,240000.0
1294,14907,A. Bizzarri,40,https://cdn.sofifa.org/players/4/19/14907.png,Argentina,https://cdn.sofifa.org/flags/52.png,76,76,Foggia,https://cdn.sofifa.org/teams/2/light/110911.png,...,60.0,11.0,12.0,11.0,76.0,74.0,66.0,82.0,76.0,840000.0


In [65]:
df3['Age'].max()

41

In [59]:
# Your written answer here
The oldest player from Argentina is C. Munoz, who is 41 years old.

---
## Part 4: Python Data Structures [Suggested Time: 20 min]
---

In this final section, we will work with various Python data types and try to accomplish certain tasks using some fundamental data structures in Python, rather than using Pandas DataFrames. Below, we've defined a dictionary with soccer player names as keys for nested dictionaries containing information about each player's age, nationality, and a list of teams they have played for.

In [67]:
# Run this cell without changes

players = {
    'L. Messi': {
        'age': 31,
        'nationality': 'Argentina',
        'teams': ['Barcelona']
    },
    'Cristiano Ronaldo': {
        'age': 33,
        'nationality': 'Portugal',
        'teams': ['Juventus', 'Real Madrid', 'Manchester United']
    },
    'Neymar Jr': {
        'age': 26,
        'nationality': 'Brazil',
        'teams': ['Santos', 'Barcelona', 'Paris Saint-Germain']
    },
    'De Gea': {
        'age': 27,
        'nationality': 'Spain',
        'teams': ['Atletico Madrid', 'Manchester United']
    },
    'K. De Bruyne': {
        'age': 27,
        'nationality': 'Belgium',
        'teams': ['Chelsea', 'Manchester City']
    }
}

### 4.1) Create a `list` of all the keys in the `players` dictionary. Store the list of player names in a variable called `player_names` to use in the next question.

Use [Python's documentation on dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) for help if needed. 
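
The difference between a keys view and a list can be seen in a short sketch. `dict.keys()` returns a view that tracks later changes to the dictionary, while `list(...)` takes an independent snapshot. The `squad` dictionary is a hypothetical stand-in for `players`.

```python
squad = {'Player A': 31, 'Player B': 33}   # hypothetical data

keys_view = squad.keys()        # dict_keys view, tied to the dictionary
names = list(squad.keys())      # independent list; list(squad) is equivalent

squad['Player C'] = 26          # add a key after both were created

print(list(keys_view))   # ['Player A', 'Player B', 'Player C']
print(names)             # ['Player A', 'Player B']
```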

In [71]:
# Replace None with appropriate code to get the list of all player names

# players.keys() returns a dict_keys view; wrap it in list() to get a list
player_names = list(players.keys())
player_names

['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'De Gea', 'K. De Bruyne']

In [72]:
# Run this cell without changes to check your answer

print(player_names)

['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'De Gea', 'K. De Bruyne']


### 4.2) Great! Now that we have the names of all players, let's use that information to create a `list` of `tuples` containing each player's name along with their nationality. Store the list in a variable called `player_nationalities`.

In [73]:
# Replace None with appropriate code to generate a list of tuples such that 
# the first element is a player's name and the second is their nationality 
# Ex: [('L. Messi', 'Argentina'), ('Cristiano Ronaldo', 'Portugal'), ...]

# Look up each name in the players dictionary to get its nationality
player_nationalities = [(name, players[name]['nationality']) for name in player_names]

In [None]:
# Run this cell without changes to check your answer

print(player_nationalities)

### 4.3) Define a function called `get_players_on_team()` that returns a `list` of the names of all the players who have played on a given team.

Your function should take two arguments: 

- a dictionary of player information
- the team name (as a `string`) you are trying to find the players for 

**Be sure that your function has a `return` statement.**
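
One way such a function could look, as a sketch rather than the only valid answer: loop over the dictionary's items and keep each name whose `'teams'` list contains the requested team. The parameter names and the `sample_players` data are hypothetical; the sample mirrors the shape of the `players` dictionary.

```python
def get_players_on_team(player_info, team_name):
    """Return the names of all players whose 'teams' list contains team_name."""
    return [
        name
        for name, info in player_info.items()
        if team_name in info['teams']
    ]


# Hypothetical data in the same shape as the `players` dictionary
sample_players = {
    'Player A': {'age': 31, 'nationality': 'Argentina', 'teams': ['Team X']},
    'Player B': {'age': 33, 'nationality': 'Portugal', 'teams': ['Team Y', 'Team X']},
    'Player C': {'age': 26, 'nationality': 'Brazil', 'teams': ['Team Z']},
}

print(get_players_on_team(sample_players, 'Team X'))   # ['Player A', 'Player B']
```

A team that nobody has played for yields an empty list, which keeps the return type consistent for callers.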

In [None]:
# Code here to define your get_players_on_team() function 


In [None]:
# Run this cell without changes to check your answer

players_on_manchester_united = get_players_on_team(players, 'Manchester United')
print(players_on_manchester_united)