# Lab Tasks

In this notebook we will analyse a dataset related to the London 2012 Olympics by using the Pandas library. In the dataset, each row represents a different country described by a number of features:

- *ISO:* Unique short ISO country code
- *Country*: Full country name
- *Gold*: Number of gold medals won by the country in 2012
- *Silver*: Number of silver medals won by the country in 2012
- *Bronze*: Number of bronze medals won by the country in 2012
- *Population*: 2011 population for the country, from the World Bank

## Task 1

Load the CSV file "olympics2012.csv" into a Pandas DataFrame, where each row is indexed by its ISO country code. Check the number of rows and the column names in the DataFrame.

In [28]:
import pandas as pd
import csv
olympic_medal_data = pd.read_csv("data/olympics2012.csv", index_col = "ISO")
olympic_medal_data.shape

(85, 5)

Display the first 15 rows of the data.

In [29]:
olympic_medal_data.head(15)

Unnamed: 0_level_0,Country,Gold,Silver,Bronze,Population
ISO,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
AFG,Afghanistan,0,0,1,35320445
ARG,Argentina,1,1,2,40764561
ARM,Armenia,0,1,2,3100236
AUS,Australia,7,16,12,22620600
AZE,Azerbaijan,2,2,6,9168000
BEL,Belgium,0,1,2,11008000
BGR,Bulgaria,0,1,1,7476000
BHR,Bahrain,0,0,1,1323535
BHS,Bahamas,1,0,0,347176
BLR,Belarus,3,5,5,9473000


Show the top 10 countries with the highest number of gold medals at the 2012 Olympics.

In [8]:
olympic_medal_data.sort_values(by = "Gold", ascending = False).head(5)

Unnamed: 0_level_0,ISO,Gold,Silver,Bronze,Population
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
United States,USA,46,29,29,311591917
China,CHN,38,27,22,1344130000
Great Britain,GBR,29,17,19,60924000
Russian Federation,RUS,24,25,33,141930000
"Korea, Rep.",KOR,13,8,7,49779000


## Task 2

Create a new column in the DataFrame called "Total" which indicates the total number of medals won by each country. Show the top 10 countries with the highest number of total medals.

In [31]:
olympic_medal_data["Total"] = olympic_medal_data["Gold"] + olympic_medal_data["Silver"] + olympic_medal_data["Bronze"]

In [32]:
olympic_medal_data.sort_values(by = "Total", ascending = False).head(10)

Unnamed: 0_level_0,Country,Gold,Silver,Bronze,Population,Total
ISO,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
USA,United States,46,29,29,311591917,104
CHN,China,38,27,22,1344130000,87
RUS,Russian Federation,24,25,33,141930000,82
GBR,Great Britain,29,17,19,60924000,65
DEU,Germany,11,19,14,81726000,44
JPN,Japan,7,14,17,127817277,38
AUS,Australia,7,16,12,22620600,35
FRA,France,11,11,12,65436552,34
ITA,Italy,8,9,11,60770000,28
KOR,"Korea, Rep.",13,8,7,49779000,28


Display the subset of countries which:
1. Won 20 or more gold medals
2. Won 25 or more total medals
3. Won only bronze medals

In [13]:
olympic_medal_data.loc[(olympic_medal_data["Gold"] > 20) | (olympic_medal_data["Total"] > 25) | ((olympic_medal_data["Bronze"] > 0) & (olympic_medal_data["Silver"] == 0) & (olympic_medal_data["Gold"] == 0))]

Unnamed: 0_level_0,ISO,Gold,Silver,Bronze,Population,Total
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Afghanistan,AFG,0,0,1,35320445,1
Australia,AUS,7,16,12,22620600,35
Bahrain,BHR,0,0,1,1323535,1
China,CHN,38,27,22,1344130000,87
Germany,DEU,11,19,14,81726000,44
France,FRA,11,11,12,65436552,34
Great Britain,GBR,29,17,19,60924000,65
Greece,GRC,0,0,2,11304000,2
Hong Kong,HKG,0,0,1,7071600,1
Italy,ITA,8,9,11,60770000,28


## Task 3

Create a new column called "WeightedTotal", which computes a weighted total for the number of medals won by each country, with weights: Gold=3, Silver=2, Bronze=1. 

Display the top 10 countries according to this score.

In [33]:
olympic_medal_data["WeightedTotal"] = (olympic_medal_data["Gold"] * 3) + (olympic_medal_data["Silver"] * 2) + olympic_medal_data["Bronze"] 
olympic_medal_data.sort_values("WeightedTotal", ascending=False).head(10)

Unnamed: 0_level_0,Country,Gold,Silver,Bronze,Population,Total,WeightedTotal
ISO,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
USA,United States,46,29,29,311591917,104,225
CHN,China,38,27,22,1344130000,87,190
RUS,Russian Federation,24,25,33,141930000,82,155
GBR,Great Britain,29,17,19,60924000,65,140
DEU,Germany,11,19,14,81726000,44,85
FRA,France,11,11,12,65436552,34,67
JPN,Japan,7,14,17,127817277,38,66
AUS,Australia,7,16,12,22620600,35,65
KOR,"Korea, Rep.",13,8,7,49779000,28,62
ITA,Italy,8,9,11,60770000,28,53


## Task 4

Create a new column "TotalPerPop" which is calculated as the total number of medals won by a country per million population.

Display the top 20 countries according to this score.

In [36]:
# convert our population values to millions
normalised_pop = olympic_medal_data["Population"]/1000000

olympic_medal_data["TotalPerPop"] = olympic_medal_data["Total"] / normalised_pop

olympic_medal_data.sort_values("TotalPerPop", ascending=False).head(20)

Unnamed: 0_level_0,Country,Gold,Silver,Bronze,Population,Total,WeightedTotal,TotalPerPop
ISO,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
GRD,Grenada,1,0,0,104890,1,3,9.533797
JAM,Jamaica,4,4,4,2709300,12,24,4.429188
TTO,Trinidad and Tobago,1,0,3,1346350,4,6,2.970996
NZL,New Zealand,5,3,5,4405200,13,26,2.951058
BHS,Bahamas,1,0,0,347176,1,3,2.880383
SVN,Slovenia,1,1,2,2052000,4,7,1.949318
MNG,Mongolia,0,2,3,2800114,5,7,1.785642
HUN,Hungary,8,4,5,9971000,17,37,1.704944
DNK,Denmark,2,4,3,5574000,9,17,1.614639
MNE,Montenegro,0,1,0,632261,1,2,1.581625
