## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course 
Author: Arina Lopukhova (@erynn). Edited by [Yury Kashnitskiy](https://yorko.github.io) (@yorko) and Vadim Shestopalov (@vchulski). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose.

The dataset has the following features:

- __ID__ - Unique number for each athlete
- __Name__ - Athlete's name
- __Sex__ - M or F
- __Age__ - Integer
- __Height__ - In centimeters
- __Weight__ - In kilograms
- __Team__ - Team name
- __NOC__ - National Olympic Committee 3-letter code
- __Games__ - Year and season
- __Year__ - Integer
- __Season__ - Summer or Winter
- __City__ - Host city
- __Sport__ - Sport
- __Event__ - Event
- __Medal__ - Gold, Silver, Bronze, or NA

In [1]:
import pandas as pd

In [2]:
# Change the path to the dataset file if needed. 
PATH = 'athlete_events.csv'

In [3]:
data = pd.read_csv(PATH)
data.head()

| | ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A Dijiang | M | 24.0 | 180.0 | 80.0 | China | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball | Basketball Men's Basketball | NaN |
| 1 | 2 | A Lamusi | M | 23.0 | 170.0 | 60.0 | China | CHN | 2012 Summer | 2012 | Summer | London | Judo | Judo Men's Extra-Lightweight | NaN |
| 2 | 3 | Gunnar Nielsen Aaby | M | 24.0 | NaN | NaN | Denmark | DEN | 1920 Summer | 1920 | Summer | Antwerpen | Football | Football Men's Football | NaN |
| 3 | 4 | Edgar Lindenau Aabye | M | 34.0 | NaN | NaN | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris | Tug-Of-War | Tug-Of-War Men's Tug-Of-War | Gold |
| 4 | 5 | Christine Jacoba Aaftink | F | 21.0 | 185.0 | 82.0 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 500 metres | NaN |
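Several columns in this preview contain missing values, and `Medal` is documented as "Gold, Silver, Bronze, or NA". The sketch below, which uses a made-up two-row sample rather than the real file, shows how `pd.read_csv` treats both the literal `NA` and empty fields as missing by default. The sample rows and athlete names are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical two-row sample in a similar column layout (not real data).
sample_csv = io.StringIO(
    "ID,Name,Sex,Age,Height,Weight,Medal\n"
    "1,Athlete A,M,24,180,80,NA\n"
    "2,Athlete B,F,21,,,Gold\n"
)
sample = pd.read_csv(sample_csv)

# "NA" and empty fields are both parsed as missing values by default.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Running `data.isna().sum()` on the full dataset gives the same kind of per-column summary before answering the questions.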


__1. How old were the youngest male and female participants of the 1992 Olympics?__


- 16 and 15
- 14 and 13 
- 13 and 11
- <b>11 and 12</b>

In [4]:
male = data[(data.Year == 1992) & (data.Sex == 'M')].Age.min()
female = data[(data.Year == 1992) & (data.Sex == 'F')].Age.min()
male, female

(11.0, 12.0)
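The two filtered `min()` calls can also be expressed as a single `groupby`. This is only a sketch on made-up rows (the ages below are not taken from the dataset); it relies on `min()` skipping missing ages by default.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Year": [1992, 1992, 1992, 1992, 1996],
    "Sex":  ["M", "M", "F", "F", "F"],
    "Age":  [11.0, 30.0, 12.0, np.nan, 10.0],
})

# Youngest participant per sex in 1992; NaN ages are ignored by min().
youngest = toy[toy.Year == 1992].groupby("Sex").Age.min()
print(youngest)
```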

__2. What was the percentage of male basketball players among all the male participants of the 2012 Olympics? Round the answer to one decimal place.__

*Hint:* drop duplicate athletes where necessary to count each athlete just once. This applies to other questions too. 

- 0.2
- 1.5 
- <b>2.5</b>
- 7.7

In [6]:
count = data[(data.Year == 2012) & (data.Sex == 'M')].drop_duplicates('Name').ID.count()
basketball_count = data[(data.Year == 2012) & (data.Sex == 'M') & (data.Sport == 'Basketball')].drop_duplicates('Name').ID.count()

percent = round(basketball_count * 100/count, 1)
percent

2.5
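The solution above deduplicates on `Name`. A possible alternative, shown here as a sketch on made-up rows, is to deduplicate on `ID`, which the feature list describes as a unique number for each athlete. The two approaches differ only if distinct athletes share a name; whether that happens in the real data is not checked here.

```python
import pandas as pd

# Made-up rows: IDs 1 and 2 are different athletes who share a name,
# and the athlete with ID 1 appears twice.
toy = pd.DataFrame({
    "ID":    [1, 1, 2, 3],
    "Name":  ["Alex Kim", "Alex Kim", "Alex Kim", "Sam Lee"],
    "Sport": ["Basketball", "Basketball", "Judo", "Judo"],
})

by_name = toy.drop_duplicates("Name").shape[0]  # merges both "Alex Kim" athletes
by_id = toy.drop_duplicates("ID").shape[0]      # keeps them apart
print(by_name, by_id)
```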

__3. What are the mean and standard deviation of height for female tennis players who participated in the 2000 Olympics? Round the answer to one decimal place.__

- <b>171.8 and 6.5</b>
- 179.4 and 10
- 180.7 and 6.7
- 182.4 and 9.1 

In [17]:
tennis = data[(data.Year == 2000) & (data.Sex == 'F') & (data.Sport == 'Tennis')].drop_duplicates('Name')
deviation = tennis.Height.describe()[['mean','std']]
round(deviation, 1)

mean    171.8
std       6.6
Name: Height, dtype: float64
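The output above shows a standard deviation of 6.6, while the highlighted option lists 6.5. One setting that can move the last digit is the degrees-of-freedom correction: pandas computes the sample standard deviation (`ddof=1`) by default. The sketch below uses made-up heights to show the difference; whether this accounts for the gap here is not confirmed by the source.

```python
import pandas as pd

heights = pd.Series([165.0, 170.0, 175.0, 180.0])  # made-up values

sample_std = heights.std()            # pandas default, ddof=1
population_std = heights.std(ddof=0)  # divides by n instead of n - 1
print(round(sample_std, 3), round(population_std, 3))
```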

__4. Find the heaviest athlete among the 2006 Olympics participants. Which sport did he or she compete in?__


- Judo
- Bobsleigh 
- <b>Skeleton</b>
- Boxing

In [56]:
# Restrict to the 2006 Games first, then keep the row(s) with the maximum weight.
games_2006 = data[data.Year == 2006]
games_2006[games_2006.Weight == games_2006.Weight.max()].drop_duplicates('Name').Sport

8102    Skeleton
Name: Sport, dtype: object
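Another way to find the row with the maximum weight is `idxmax`, which returns the index label of the first maximum and skips missing values. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Name":   ["A", "B", "C"],
    "Year":   [2006, 2006, 2010],
    "Weight": [95.0, 120.0, 150.0],
    "Sport":  ["Bobsleigh", "Skeleton", "Judo"],
})

games = toy[toy.Year == 2006]
# idxmax gives the index label of the heaviest athlete within the filtered rows.
heaviest_sport = games.loc[games.Weight.idxmax(), "Sport"]
print(heaviest_sport)
```

Unlike the equality filter, `idxmax` returns a single row even when several athletes tie for the maximum.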

__5. How many times did John Aalberg participate in the Olympics held in different years?__


- 0
- 1 
- <b>2</b>
- 3 

In [50]:
data[(data.Name == 'John Aalberg')].drop_duplicates('Year').ID.count()

2

__6. How many gold medals in tennis did the Switzerland team win at the 2008 Olympics?__


- 0
- 1 
- <b>2</b>
- 3 

In [57]:
data[(data.Medal == 'Gold') & (data.Sport == 'Tennis') & (data.Team == 'Switzerland') & (data.Year == 2008)].ID.count()

2
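Note that this count is per row, and the dataset has one row per athlete and event, so each member of a medal-winning pair or team contributes a row. The sketch below, on made-up rows with hypothetical player names, contrasts counting rows with counting distinct events.

```python
import pandas as pd

# Made-up rows: two players share one doubles gold, a third wins nothing.
toy = pd.DataFrame({
    "Name":  ["Player A", "Player B", "Player C"],
    "Event": ["Tennis Men's Doubles", "Tennis Men's Doubles", "Tennis Men's Singles"],
    "Medal": ["Gold", "Gold", None],
})

gold = toy[toy.Medal == "Gold"]
medal_rows = gold.shape[0]          # one row per medal-winning athlete
gold_events = gold.Event.nunique()  # one entry per event won
print(medal_rows, gold_events)
```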

__7. Is it true that Spain won fewer medals than Italy at the 2016 Olympics? Do not consider NaN values in the _Medal_ column.__


- <b>Yes</b>
- No

In [80]:
spain = data[(data.Year == 2016) & (data.Team == 'Spain')].Medal.count()
italy = data[(data.Year == 2016) & (data.Team == 'Italy')].Medal.count()
spain < italy


True
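The comparison works because `count()` ignores NaN, so only rows with an actual medal are counted. A small sketch on made-up rows shows the difference between `count()` and `size()`:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Team":  ["Spain", "Spain", "Italy", "Italy", "Italy"],
    "Medal": ["Gold", np.nan, "Silver", "Bronze", np.nan],
})

medals_per_team = toy.groupby("Team").Medal.count()  # NaN medals excluded
rows_per_team = toy.groupby("Team").size()           # every row included
print(medals_per_team)
print(rows_per_team)
```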

__8. What are the least and most common age groups among the participants of the 2008 Olympics?__


- <b>[45-55] and [25-35) correspondingly</b>
- [45-55] and [15-25) correspondingly
- [35-45) and [25-35) correspondingly
- [45-55] and [35-45) correspondingly

In [100]:
# Which age groups are the most and the least common among the 2008 Olympics participants?
g1 = data[(data.Year == 2008) & (15 <= data.Age) & (data.Age < 25)].ID.count()
g2 = data[(data.Year == 2008) & (25 <= data.Age) & (data.Age < 35)].ID.count()
g3 = data[(data.Year == 2008) & (35 <= data.Age) & (data.Age < 45)].ID.count()
g4 = data[(data.Year == 2008) & (45 <= data.Age) & (data.Age <= 55)].ID.count()

print('[15-25) :', g1)
print('[25-35) :', g2)
print('[35-45) :', g3)
print('[45-55] :', g4)

[15-25) : 6294
[25-35) : 6367
[35-45) : 790
[45-55] : 119
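The four manual filters can also be written with `pd.cut`. This is a sketch on made-up ages, not the author's method. It assumes whole-number ages, so an upper edge of 56 with `right=False` stands in for the closed interval [45-55].

```python
import pandas as pd

ages = pd.Series([16.0, 24.0, 25.0, 30.0, 44.0, 55.0])  # made-up ages

# right=False gives left-closed bins [15, 25), [25, 35), [35, 45), [45, 56).
# The edge 56 is an assumption that mimics [45-55] for whole-number ages.
groups = pd.cut(
    ages,
    bins=[15, 25, 35, 45, 56],
    right=False,
    labels=["[15-25)", "[25-35)", "[35-45)", "[45-55]"],
)
counts = groups.value_counts(sort=False)
print(counts)
```

Ages outside the bins, as well as missing ages, become NaN and are left out of the counts, which matches the behavior of the boolean filters above.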


__9. Is it true that there were Summer Olympics held in Atlanta? Is it true that there were Winter Olympics held in Squaw Valley?__


- <b>Yes, Yes</b>
- Yes, No
- No, Yes 
- No, No 

In [86]:
sum_Atlanta = data[(data.City == 'Atlanta') & (data.Season == 'Summer')].ID.count() 
win_SV = data[(data.City == 'Squaw Valley') & (data.Season == 'Winter')].ID.count() 
sum_Atlanta > 0, win_SV > 0 

(True, True)
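An existence check like this can also be written with `.any()` on the boolean mask, which avoids counting rows. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "City":   ["Atlanta", "Squaw Valley", "Atlanta"],
    "Season": ["Summer", "Winter", "Summer"],
})

# True if at least one row matches both conditions.
atlanta_summer = ((toy.City == "Atlanta") & (toy.Season == "Summer")).any()
atlanta_winter = ((toy.City == "Atlanta") & (toy.Season == "Winter")).any()
print(atlanta_summer, atlanta_winter)
```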

__10. What is the absolute difference between the number of unique sports at the 1986 Olympics and 2002 Olympics?__


- 3 
- 10
- <b>15</b>
- 27 

In [81]:
# No Olympics were held in 1986, so that selection is empty and contributes 0 unique sports.
abs(data[data.Year == 2002].Sport.nunique() - data[data.Year == 1986].Sport.nunique())

15
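The result equals the number of unique sports in 2002 alone, because filtering on a year with no Games yields an empty selection and `nunique()` of an empty Series is 0. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Year":  [2002, 2002, 2002],
    "Sport": ["Biathlon", "Curling", "Biathlon"],
})

sports_2002 = toy[toy.Year == 2002].Sport.nunique()
sports_1986 = toy[toy.Year == 1986].Sport.nunique()  # empty selection
print(sports_2002, sports_1986, abs(sports_2002 - sports_1986))
```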

That's it! Now go and do 30 push-ups! :)