## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course 
Author: Arina Lopukhova (@erynn). Edited by [Yury Kashnitskiy](https://yorko.github.io) (@yorko) and Vadim Shestopalov (@vchulski). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose.

The dataset has the following features:

- __ID__ - Unique number for each athlete
- __Name__ - Athlete's name
- __Sex__ - M or F
- __Age__ - Integer
- __Height__ - In centimeters
- __Weight__ - In kilograms
- __Team__ - Team name
- __NOC__ - National Olympic Committee 3-letter code
- __Games__ - Year and season
- __Year__ - Integer
- __Season__ - Summer or Winter
- __City__ - Host city
- __Sport__ - Sport
- __Event__ - Event
- __Medal__ - Gold, Silver, Bronze, or NA

In [4]:
import pandas as pd
import numpy as np

In [5]:
# Change the path to the dataset file if needed. 
PATH = 'athlete_events.csv'

In [6]:
data = pd.read_csv(PATH)
data.head()

| | ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A Dijiang | M | 24.0 | 180.0 | 80.0 | China | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball | Basketball Men's Basketball | NaN |
| 1 | 2 | A Lamusi | M | 23.0 | 170.0 | 60.0 | China | CHN | 2012 Summer | 2012 | Summer | London | Judo | Judo Men's Extra-Lightweight | NaN |
| 2 | 3 | Gunnar Nielsen Aaby | M | 24.0 | NaN | NaN | Denmark | DEN | 1920 Summer | 1920 | Summer | Antwerpen | Football | Football Men's Football | NaN |
| 3 | 4 | Edgar Lindenau Aabye | M | 34.0 | NaN | NaN | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris | Tug-Of-War | Tug-Of-War Men's Tug-Of-War | Gold |
| 4 | 5 | Christine Jacoba Aaftink | F | 21.0 | 185.0 | 82.0 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 500 metres | NaN |
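The feature list says _Medal_ is NA for athletes without a medal. By default, pandas' `read_csv` treats both the string `NA` and empty fields as missing values, so such rows arrive as NaN; that is why the answers below have to skip NaN in _Medal_. A minimal check on a made-up three-row CSV (the rows and names are invented for illustration):

```python
import io

import pandas as pd

# Made-up CSV with the two "no medal" spellings: an explicit NA and an empty field
csv_text = """ID,Name,Medal
1,Athlete A,Gold
2,Athlete B,NA
3,Athlete C,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Both "NA" and the empty field are parsed as NaN with default settings
print(df["Medal"].isna().sum())  # 2
# Series.count() counts only non-missing values, i.e. the medal rows
print(df["Medal"].count())       # 1
```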


__1. How old were the youngest male and female participants of the 1992 Olympics?__


- 16 and 15
- 14 and 13 
- 13 and 11
- 11 and 12

In [37]:
male_set = data[(data["Sex"] == "M") & (data["Year"] == 1992)]
female_set = data[(data["Sex"] == "F") & (data["Year"] == 1992)]
# Series.min() skips missing ages (NaN); the builtin min() does not
print(male_set["Age"].min())
print(female_set["Age"].min())

11.0
12.0
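Python's builtin `min()` and pandas' `Series.min()` treat missing ages differently. The builtin compares items with `<`, and every comparison involving NaN is False, so a NaN in the first position is never replaced; `Series.min()` skips NaN by default. A toy illustration with made-up ages:

```python
import math

import pandas as pd

ages = pd.Series([float("nan"), 14.0, 11.0])  # made-up ages, first one missing

# Builtin min(): the leading NaN is never "less than" anything, so it survives
builtin_min = min(ages)
# Series.min(): NaN values are skipped (skipna=True is the default)
pandas_min = ages.min()

print(builtin_min, pandas_min)  # nan 11.0
```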


__2. What was the percentage of male basketball players among all the male participants of the 2012 Olympics? Round the answer to one decimal place.__

*Hint:* drop duplicate athletes where necessary to count each athlete just once. This applies to other questions too. 

- 0.2
- 1.5 
- 2.5
- 7.7

In [38]:
men_2012 = data[(data["Sex"] == "M") & (data["Year"] == 2012)]
basketball_2012 = men_2012[men_2012["Sport"] == "Basketball"]
# Count each athlete once by ID (names are not guaranteed to be unique)
res = round(basketball_2012["ID"].nunique() / men_2012["ID"].nunique() * 100, 1)

print(res)


2.5
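The feature list describes _ID_ as the unique number for each athlete, whereas different people can share a _Name_, so `nunique()` on _ID_ is the safer way to count athletes. Whether namesakes actually occur in the rows used here is not checked. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows: IDs 1 and 2 are different people who share a name,
# and athlete 1 has two rows
df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Name": ["Athlete A", "Athlete A", "Athlete A", "Athlete B"],
    "Sport": ["Basketball", "Basketball", "Judo", "Judo"],
})

print(df["Name"].nunique())  # 2: the namesakes are merged
print(df["ID"].nunique())    # 3: one per athlete

# Share of basketball players among all athletes, each ID counted once
basketball = df.loc[df["Sport"] == "Basketball", "ID"].nunique()
share = round(basketball / df["ID"].nunique() * 100, 1)
print(share)  # 33.3
```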


__3. What are the mean and standard deviation of height for female tennis players who participated in the 2000 Olympics? Round the answer to one decimal place.__

- 171.8 and 6.5
- 179.4 and 10
- 180.7 and 6.7
- 182.4 and 9.1 

In [40]:
tennis_set = data[(data["Sex"] == "F") & (data["Year"] == 2000) & (data["Sport"] == "Tennis")]
# Count each player once, then ignore missing heights
heights = tennis_set.drop_duplicates(subset="ID")["Height"].dropna()
print(round(heights.mean(), 1))
# Sample standard deviation (ddof=1, i.e. divide by n - 1)
print(round(heights.std(), 1))

171.8
6.6
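The printed 6.6 and the listed option 6.5 differ only in the last digit. One setting that can shift a rounded standard deviation is the degrees-of-freedom convention: pandas' `Series.std()` defaults to the sample formula (`ddof=1`), while `numpy.std()` defaults to the population formula (`ddof=0`). Whether that explains the gap here is not checked; the sketch below only shows the two defaults on made-up heights.

```python
import numpy as np
import pandas as pd

heights = pd.Series([170.0, 175.0, 180.0])  # made-up heights in cm

# pandas: sample standard deviation, divide by n - 1 (ddof=1)
print(heights.std())                        # 5.0
# NumPy: population standard deviation, divide by n (ddof=0)
print(np.std(heights.to_numpy()))           # about 4.08
# Passing ddof=1 to NumPy reproduces the pandas value
print(np.std(heights.to_numpy(), ddof=1))   # 5.0
```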


__4. Find the heaviest athlete among 2006 Olympics participants. What sport did he or she do?__


- Judo
- Bobsleigh 
- Skeleton
- Boxing

In [57]:
heavy_set = data[data["Year"] == 2006]
# Sort by weight (missing weights go last) and take the sport of the top row
heavy_set = heavy_set.sort_values(by="Weight", ascending=False)
print(heavy_set["Sport"].iloc[0])


Skeleton
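Sorting the whole frame just to read one row works, but `Series.idxmax()` finds the label of the largest weight in a single pass and skips missing weights. If several athletes tie for the maximum, it returns the first one, so ties would need a separate check. A sketch on made-up rows (the sport names are placeholders):

```python
import pandas as pd

# Made-up rows; one athlete has no recorded weight
df = pd.DataFrame({
    "Weight": [90.0, float("nan"), 120.0, 100.0],
    "Sport": ["Sport A", "Sport B", "Sport C", "Sport D"],
})

# idxmax() ignores NaN and returns the index label of the first maximum
heaviest = df["Weight"].idxmax()
print(df.loc[heaviest, "Sport"])  # Sport C
```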


__5. How many times did John Aalberg participate in the Olympics held in different years?__


- 0
- 1 
- 2
- 3 

In [61]:
john_set = data[data["Name"] == "John Aalberg"]
# The question asks about different years, so count distinct Year values
print(john_set["Year"].nunique())

2
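Until 1992 the Summer and Winter Games were held in the same year, so counting distinct _Games_ and counting distinct _Year_ values can give different answers. A sketch with a made-up participation record for one hypothetical athlete:

```python
import pandas as pd

# Made-up participation record for one hypothetical athlete
rows = pd.DataFrame({
    "Games": ["1992 Summer", "1992 Winter", "1994 Winter"],
    "Year": [1992, 1992, 1994],
})

print(rows["Games"].nunique())  # 3 distinct Games
print(rows["Year"].nunique())   # 2 distinct years
```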


__6. How many gold medals in tennis did the Switzerland team win at the 2008 Olympics?__


- 0
- 1 
- 2
- 3 

In [64]:
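goldTennis_set = data[(data["Team"] == "Switzerland") & (data["Year"] == 2008)
                      & (data["Sport"] == "Tennis") & (data["Medal"] == "Gold")]
# Each row is one athlete in one event, so a doubles gold contributes one row per player
print(len(goldTennis_set))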

2
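Each row in the dataset is one athlete in one event, so a team event such as doubles produces one gold row per player, and counting rows counts medal-winning athletes rather than medals. One way to count medals, assuming a team wins at most one medal of a given color in each event at a single Games, is to deduplicate on (_Event_, _Medal_). A sketch on made-up rows:

```python
import pandas as pd

# Made-up results for one team at one Games: a doubles gold (one row per
# player) plus a singles gold
team = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player A"],
    "Event": ["Tennis Men's Doubles", "Tennis Men's Doubles", "Tennis Men's Singles"],
    "Medal": ["Gold", "Gold", "Gold"],
})

print(len(team))  # 3 athlete-medal rows
# Assumption: each (Event, Medal) pair corresponds to exactly one medal
print(len(team.drop_duplicates(subset=["Event", "Medal"])))  # 2 medals
```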


__7. Is it true that Spain won fewer medals than Italy at the 2016 Olympics? Do not consider NaN values in _Medal_ column.__ 


- Yes
- No

In [66]:
spain_2016 = data[(data["Team"] == "Spain") & (data["Year"] == 2016)]
italy_2016 = data[(data["Team"] == "Italy") & (data["Year"] == 2016)]
# Series.count() skips NaN, i.e. rows without a medal
spain_medals = spain_2016["Medal"].count()
italy_medals = italy_2016["Medal"].count()
print(spain_medals)
print(italy_medals)
print("Yes" if spain_medals < italy_medals else "No")

43
70
Yes
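Instead of filtering each team separately, a `groupby` can count non-missing medals for every team at once. Like the per-team filter, this counts athlete-medal rows, not team-event medals. A sketch on made-up rows with placeholder team names:

```python
import pandas as pd

# Made-up rows; None/NaN means the athlete did not win a medal
df = pd.DataFrame({
    "Team": ["Team A", "Team A", "Team B", "Team B", "Team B"],
    "Medal": ["Gold", None, "Silver", "Bronze", None],
})

# count() ignores missing values, so this is the number of medal rows per team
medals = df.groupby("Team")["Medal"].count()
print(medals["Team A"], medals["Team B"])  # 1 2
print("Yes" if medals["Team A"] < medals["Team B"] else "No")  # Yes
```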


__8. What are the most and least common age groups among the participants of the 2008 Olympics?__


- [45-55] and [25-35) correspondingly
- [45-55] and [15-25) correspondingly
- [35-45) and [25-35) correspondingly
- [45-55] and [35-45) correspondingly

In [81]:
data_set = data[data["Year"]==2008]
data_set = data_set.drop_duplicates(subset="Name")

print('[15-25):', len(data_set[(data_set['Age'] >= 15) & (data_set['Age'] < 25)]))
print('[25-35):', len(data_set[(data_set['Age'] >= 25) & (data_set['Age'] < 35)]))
print('[35-45):', len(data_set[(data_set['Age'] >= 35) & (data_set['Age'] < 45)]))
print('[45-55]:', len(data_set[(data_set['Age'] >= 45) & (data_set['Age'] <= 55)]))
print('[45-55] and [25-35) correspondingly')

[15-25): 4776
[25-35): 5373
[35-45): 630
[45-55]: 78
[45-55] and [25-35) correspondingly
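The four masks above can also be expressed as one binning step with `pd.cut`. With `right=False` the bins are half-open `[a, b)`; since _Age_ is an integer, a last bin of `[45, 56)` covers the same ages as the closed interval `[45, 55]`. A sketch on made-up ages:

```python
import pandas as pd

ages = pd.Series([15, 24, 25, 34, 35, 45, 55, 60])  # made-up integer ages

# Half-open bins [15, 25), [25, 35), [35, 45), [45, 56); 60 falls outside
groups = pd.cut(ages, bins=[15, 25, 35, 45, 56], right=False)
# sort=False keeps the bins in their natural order; out-of-range ages are dropped
counts = groups.value_counts(sort=False)
print(counts.tolist())  # [2, 2, 1, 2]
```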


__9. Is it true that there were Summer Olympics held in Atlanta? Is it true that there were Winter Olympics held in Squaw Valley?__


- Yes, Yes
- Yes, No
- No, Yes 
- No, No 

In [88]:
print(((data["Season"] == "Summer") & (data["City"] == "Atlanta")).any())
print(((data["Season"] == "Winter") & (data["City"] == "Squaw Valley")).any())

True
True
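An existence check does not need the filtered frame or its length: calling `.any()` on the boolean mask answers it directly. A sketch on made-up rows; the helper name `held` is hypothetical:

```python
import pandas as pd

# Made-up rows standing in for the full dataset
df = pd.DataFrame({
    "Season": ["Summer", "Winter", "Summer"],
    "City": ["City X", "City Y", "City Y"],
})


def held(frame: pd.DataFrame, season: str, city: str) -> bool:
    """Hypothetical helper: True if any row matches both season and city."""
    mask = (frame["Season"] == season) & (frame["City"] == city)
    return bool(mask.any())


print(held(df, "Summer", "City X"), held(df, "Winter", "City X"))  # True False
```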


__10. What is the absolute difference between the number of unique sports at the 1986 Olympics and 2002 Olympics?__


- 3 
- 10
- 15
- 27 

In [7]:
# No Olympic Games were held in 1986, so the first count is 0
sports_1986 = data.loc[data["Year"] == 1986, "Sport"].nunique()
sports_2002 = data.loc[data["Year"] == 2002, "Sport"].nunique()
print(abs(sports_2002 - sports_1986))

15
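No Olympic Games took place in 1986 (the Games were held in 1984 and 1988), so filtering on that year yields no rows, `nunique()` returns 0, and the difference equals the number of sports in 2002. Grouping once by year makes the per-year counts explicit, and `.get(year, 0)` covers a year with no rows. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows: there is deliberately no data for 1986
df = pd.DataFrame({
    "Year": [1984, 1984, 1988, 2002, 2002, 2002],
    "Sport": ["Luge", "Biathlon", "Luge", "Luge", "Curling", "Curling"],
})

sports_per_year = df.groupby("Year")["Sport"].nunique()
# .get() with a default of 0 handles a year that is absent from the index
diff = abs(sports_per_year.get(2002, 0) - sports_per_year.get(1986, 0))
print(diff)  # 2
```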


That's it! Now go and do 30 push-ups! :)