In [1]:
import pandas as pd

# Olympics Data

In this assignment, we'll be working with [this data set](https://raw.githubusercontent.com/michaelschung/bc-data-processing/master/datasets/olympics-clean.csv), which contains Wikipedia data summarizing the medals that various countries have won at the Olympics.

Go ahead and click the link, and take a look at the file. You'll notice that the first line is a list of column names, and each line that follows holds the data for a particular country, in alphabetical order.

## 1) Reading the CSV

In the code cell below, read the CSV file into a DataFrame called `olympics`. Make sure that the column of country names gets used as the index.

In [2]:
olympics_filename = 'olympics.csv'

# Import the CSV file into a DataFrame with the name specified in the instructions above
olympics = pd.read_csv(olympics_filename, index_col=0)
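
To see what `index_col=0` does in isolation, here is a minimal sketch that reads a three-row excerpt from an in-memory string instead of the real file. The variable names `sample_csv` and `sample` are hypothetical, and the empty first header field is an assumption about how the actual CSV labels the country column.

```python
import io
import pandas as pd

# Three rows copied from the data set. The empty first header field is an
# assumption about how the real file labels the country column.
sample_csv = io.StringIO(
    ",#Summer,sGold,sSilver,sBronze,sTotal\n"
    "Afghanistan (AFG),13,0,0,2,2\n"
    "Algeria (ALG),12,5,2,8,15\n"
    "Argentina (ARG),23,18,24,28,70\n"
)

# index_col=0 makes the first column the row labels rather than a data column
sample = pd.read_csv(sample_csv, index_col=0)

print(sample.index.tolist())
# ['Afghanistan (AFG)', 'Algeria (ALG)', 'Argentina (ARG)']

# With country names as the index, rows can be looked up by label
print(sample.loc["Algeria (ALG)", "sGold"])  # 5
```

Without `index_col=0`, the country names would be an ordinary column and the rows would be labeled 0, 1, 2 instead.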

## 2) Previewing the DataFrame

Display the first 15 rows of the `olympics` DataFrame.

In [5]:
# Display the first 15 rows of olympics
olympics.head(15)

Unnamed: 0,#Summer,sGold,sSilver,sBronze,sTotal,#Winter,wGold,wSilver,wBronze,wTotal,#Games,Gold,Silver,Bronze,Total
Afghanistan (AFG),13,0,0,2,2,0,0,0,0,0,13,0,0,2,2
Algeria (ALG),12,5,2,8,15,3,0,0,0,0,15,5,2,8,15
Argentina (ARG),23,18,24,28,70,18,0,0,0,0,41,18,24,28,70
Armenia (ARM),5,1,2,9,12,6,0,0,0,0,11,1,2,9,12
Australasia (ANZ) [ANZ],2,3,4,5,12,0,0,0,0,0,2,3,4,5,12
Australia (AUS) [AUS] [Z],25,139,152,177,468,18,5,3,4,12,43,144,155,181,480
Austria (AUT),26,18,33,35,86,22,59,78,81,218,48,77,111,116,304
Azerbaijan (AZE),5,6,5,15,26,5,0,0,0,0,10,6,5,15,26
Bahamas (BAH),15,5,2,5,12,0,0,0,0,0,15,5,2,5,12
Bahrain (BRN),8,0,0,1,1,0,0,0,0,0,8,0,0,1,1


Look carefully at this table, and take the time to understand it before moving on to querying.

Each row consists of the following pieces of information. Corresponding column names are in **bold**.

- `[Index]`: Country name and abbreviation
- Summer statistics
  - **`#Summer`**: Number of Summer Olympic Games competed in
  - **`sGold`**: Gold medals won in the summer
  - **`sSilver`**: Silver medals won in the summer
  - **`sBronze`**: Bronze medals won in the summer
  - **`sTotal`**: Total medals won in the summer (sum of previous 3 columns)
- Winter statistics
  - **`#Winter`**: Number of Winter Olympic Games competed in
  - **`wGold`**: Gold medals won in the winter
  - **`wSilver`**: Silver medals won in the winter
  - **`wBronze`**: Bronze medals won in the winter
  - **`wTotal`**: Total medals won in the winter (sum of previous 3 columns)
- Combined statistics
  - **`#Games`**: Total number of Olympic Games competed in (sum of **`#Summer`** and **`#Winter`**)
  - **`Gold`**: Total gold medals won (sum of **`sGold`** and **`wGold`**)
  - **`Silver`**: Total silver medals won (sum of **`sSilver`** and **`wSilver`**)
  - **`Bronze`**: Total bronze medals won (sum of **`sBronze`** and **`wBronze`**)
  - **`Total`**: Total medals won (sum of **`sTotal`** and **`wTotal`**)
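
The relationships in the list above can be checked directly. The following is a small sketch, not part of the assignment, that rebuilds three rows from the preview table by hand (the name `rows` is hypothetical) and confirms that the total columns equal the sums described.

```python
import pandas as pd

# Three rows copied from the preview table (only the columns needed here)
rows = pd.DataFrame(
    {
        "sGold": [0, 5, 139],
        "sSilver": [0, 2, 152],
        "sBronze": [2, 8, 177],
        "sTotal": [2, 15, 468],
        "wGold": [0, 0, 5],
        "wSilver": [0, 0, 3],
        "wBronze": [0, 0, 4],
        "wTotal": [0, 0, 12],
        "Gold": [0, 5, 144],
        "Total": [2, 15, 480],
    },
    index=["Afghanistan (AFG)", "Algeria (ALG)", "Australia (AUS) [AUS] [Z]"],
)

# Each comparison yields one True/False value per country
summer_ok = rows["sGold"] + rows["sSilver"] + rows["sBronze"] == rows["sTotal"]
winter_ok = rows["wGold"] + rows["wSilver"] + rows["wBronze"] == rows["wTotal"]
gold_ok = rows["sGold"] + rows["wGold"] == rows["Gold"]
total_ok = rows["sTotal"] + rows["wTotal"] == rows["Total"]

# True only if every check passes for every row
print((summer_ok & winter_ok & gold_ok & total_ok).all())  # True
```

The same comparisons could be run on the full `olympics` DataFrame to spot any row where the totals do not add up.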

## 3) Basic Queries

### a) Summer Gold

Write a query that selects all countries that have won at least one gold medal in the Summer Games.

In [6]:
# Write your code for 3a here
olympics[olympics['sGold'] > 0]

Unnamed: 0,#Summer,sGold,sSilver,sBronze,sTotal,#Winter,wGold,wSilver,wBronze,wTotal,#Games,Gold,Silver,Bronze,Total
Algeria (ALG),12,5,2,8,15,3,0,0,0,0,15,5,2,8,15
Argentina (ARG),23,18,24,28,70,18,0,0,0,0,41,18,24,28,70
Armenia (ARM),5,1,2,9,12,6,0,0,0,0,11,1,2,9,12
Australasia (ANZ) [ANZ],2,3,4,5,12,0,0,0,0,0,2,3,4,5,12
Australia (AUS) [AUS] [Z],25,139,152,177,468,18,5,3,4,12,43,144,155,181,480
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Venezuela (VEN),17,2,2,8,12,4,0,0,0,0,21,2,2,8,12
Yugoslavia (YUG) [YUG],16,26,29,28,83,14,0,3,1,4,30,26,32,29,87
Zimbabwe (ZIM) [ZIM],12,3,4,1,8,1,0,0,0,0,13,3,4,1,8
Mixed team (ZZX) [ZZX],3,8,5,4,17,0,0,0,0,0,3,8,5,4,17
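
A query like the one in 3a works in two steps: the comparison produces a Series of `True`/`False` values (a boolean mask), and indexing the DataFrame with that mask keeps only the `True` rows. Here is a minimal sketch on four rows taken from the preview table; `sample` and `mask` are hypothetical names.

```python
import pandas as pd

# Summer gold counts for four countries from the preview table
sample = pd.DataFrame(
    {"sGold": [0, 5, 18, 0]},
    index=["Afghanistan (AFG)", "Algeria (ALG)", "Argentina (ARG)", "Bahrain (BRN)"],
)

# Step 1: the comparison is applied to every row and returns a boolean Series
mask = sample["sGold"] > 0
print(mask.tolist())  # [False, True, True, False]

# Step 2: indexing with the mask keeps only the rows where it is True
winners = sample[mask]
print(winners.index.tolist())  # ['Algeria (ALG)', 'Argentina (ARG)']
```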


### b) Precise Winter

Write a query that selects all countries that have won exactly 6 silver medals in the Winter Games.

In [7]:
# Write your code for 3b here
olympics[olympics['wSilver'] == 6]

Unnamed: 0,#Summer,sGold,sSilver,sBronze,sTotal,#Winter,wGold,wSilver,wBronze,wTotal,#Games,Gold,Silver,Bronze,Total
Croatia (CRO),6,6,7,10,23,7,4,6,1,11,13,10,13,11,34
United Team of Germany (EUA) [EUA],3,28,54,36,118,3,8,6,5,19,6,36,60,41,137
Unified Team (EUN) [EUN],1,45,38,29,112,1,9,6,8,23,2,54,44,37,135


### c) High Scorers

Write a query that selects all countries that have won at least 20 medals total (across both the Summer and Winter Games).

In [8]:
# Write your code for 3c here
olympics[olympics['Total'] >= 20]

Unnamed: 0,#Summer,sGold,sSilver,sBronze,sTotal,#Winter,wGold,wSilver,wBronze,wTotal,#Games,Gold,Silver,Bronze,Total
Argentina (ARG),23,18,24,28,70,18,0,0,0,0,41,18,24,28,70
Australia (AUS) [AUS] [Z],25,139,152,177,468,18,5,3,4,12,43,144,155,181,480
Austria (AUT),26,18,33,35,86,22,59,78,81,218,48,77,111,116,304
Azerbaijan (AZE),5,6,5,15,26,5,0,0,0,0,10,6,5,15,26
Belarus (BLR),5,12,24,39,75,6,6,4,5,15,11,18,28,44,90
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Ukraine (UKR),5,33,27,55,115,6,2,1,4,7,11,35,28,59,122
United States (USA) [P] [Q] [R] [Z],26,976,757,666,2399,22,96,102,84,282,48,1072,859,750,2681
Uzbekistan (UZB),5,5,5,10,20,6,1,0,0,1,11,6,5,10,21
Yugoslavia (YUG) [YUG],16,26,29,28,83,14,0,3,1,4,30,26,32,29,87


## 4) Combined Queries

### a) Double Trouble

Write a query that selects all countries that have won gold in both the Summer and Winter Games.

In [9]:
# Write your code for 4a here
sGold = olympics['sGold'] > 0
wGold = olympics['wGold'] > 0
olympics[sGold & wGold]

Unnamed: 0,#Summer,sGold,sSilver,sBronze,sTotal,#Winter,wGold,wSilver,wBronze,wTotal,#Games,Gold,Silver,Bronze,Total
Australia (AUS) [AUS] [Z],25,139,152,177,468,18,5,3,4,12,43,144,155,181,480
Austria (AUT),26,18,33,35,86,22,59,78,81,218,48,77,111,116,304
Belarus (BLR),5,12,24,39,75,6,6,4,5,15,11,18,28,44,90
Belgium (BEL),25,37,52,53,142,20,1,1,3,5,45,38,53,56,147
Bulgaria (BUL) [H],19,51,85,78,214,19,1,2,3,6,38,52,87,81,220
Canada (CAN),25,59,99,121,279,22,62,56,52,170,47,121,155,173,449
China (CHN) [CHN],9,201,146,126,473,10,12,22,19,53,19,213,168,145,526
Croatia (CRO),6,6,7,10,23,7,4,6,1,11,13,10,13,11,34
Czech Republic (CZE) [CZE],5,14,15,15,44,6,7,9,8,24,11,21,24,23,68
Czechoslovakia (TCH) [TCH],16,49,49,45,143,16,2,8,15,25,32,51,57,60,168
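
Two masks are combined element by element with `&` (and) or `|` (or). Python's `and` keyword does not work here: it asks for a single truth value from a whole Series and raises a `ValueError`. The sketch below uses four rows from this data set and hypothetical names (`sample`, `both`). When the comparisons are written inline, each one needs its own parentheses because `&` binds more tightly than `>`.

```python
import pandas as pd

# Summer and winter gold counts for four countries from the data set
sample = pd.DataFrame(
    {"sGold": [0, 5, 139, 0], "wGold": [0, 0, 5, 2]},
    index=[
        "Afghanistan (AFG)",
        "Algeria (ALG)",
        "Australia (AUS) [AUS] [Z]",
        "Liechtenstein (LIE)",
    ],
)

# Parentheses are required around each comparison when using & inline
both = (sample["sGold"] > 0) & (sample["wGold"] > 0)
print(sample[both].index.tolist())  # ['Australia (AUS) [AUS] [Z]']

# | keeps rows where at least one of the conditions holds
either = (sample["sGold"] > 0) | (sample["wGold"] > 0)
print(sample[either].index.tolist())
# ['Algeria (ALG)', 'Australia (AUS) [AUS] [Z]', 'Liechtenstein (LIE)']
```

Storing each mask in a variable first, as in the answer to 4a, avoids the parentheses issue entirely.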


### b) Only Winter

Write a query that selects all countries that have won at least one gold medal in the Winter Games but none in the Summer Games.

In [10]:
# Write your code for 4b here
wGold = olympics['wGold'] > 0
noSum = olympics['sGold'] == 0
olympics[wGold & noSum]

Unnamed: 0,#Summer,sGold,sSilver,sBronze,sTotal,#Winter,wGold,wSilver,wBronze,wTotal,#Games,Gold,Silver,Bronze,Total
Liechtenstein (LIE),16,0,0,0,0,18,2,2,5,9,34,2,2,5,9
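
The "none in the Summer" condition can also be written by negating an existing mask with `~` rather than comparing against 0. This is an alternative sketch, not the approach used in the cell above; `sample` and the mask names are hypothetical. Because medal counts are never negative, both forms select the same rows.

```python
import pandas as pd

# Summer and winter gold counts for three countries from the data set
sample = pd.DataFrame(
    {"sGold": [5, 139, 0], "wGold": [0, 5, 2]},
    index=["Algeria (ALG)", "Australia (AUS) [AUS] [Z]", "Liechtenstein (LIE)"],
)

has_summer_gold = sample["sGold"] > 0
has_winter_gold = sample["wGold"] > 0

# ~ flips every True to False and vice versa
winter_only = has_winter_gold & ~has_summer_gold
print(sample[winter_only].index.tolist())  # ['Liechtenstein (LIE)']

# Same result as the explicit comparison against 0
same_as_explicit = winter_only.equals(has_winter_gold & (sample["sGold"] == 0))
print(same_as_explicit)  # True
```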


### c) Strangely Specific

Write a query that selects all countries that fulfill the following requirements:

- Has won at least one medal (of any kind) in both Summer and Winter
- Has won a total of at least 30 medals
- Has won at least 10 times as many medals in the Summer as in the Winter

In [11]:
# Write your code for 4c here
req_1 = (olympics['sTotal'] > 0) & (olympics['wTotal'] > 0)
req_2 = olympics['Total'] >= 30
req_3 = olympics['sTotal'] >= olympics['wTotal'] * 10

olympics[req_1 & req_2 & req_3]

Unnamed: 0,#Summer,sGold,sSilver,sBronze,sTotal,#Winter,wGold,wSilver,wBronze,wTotal,#Games,Gold,Silver,Bronze,Total
Australia (AUS) [AUS] [Z],25,139,152,177,468,18,5,3,4,12,43,144,155,181,480
Belgium (BEL),25,37,52,53,142,20,1,1,3,5,45,38,53,56,147
Bulgaria (BUL) [H],19,51,85,78,214,19,1,2,3,6,38,52,87,81,220
Denmark (DEN) [Z],26,43,68,68,179,13,0,1,0,1,39,43,69,68,180
Great Britain (GBR) [GBR] [Z],27,236,272,272,780,22,10,4,12,26,49,246,276,284,806
Hungary (HUN),25,167,144,165,476,22,0,2,4,6,47,167,146,169,482
North Korea (PRK),9,14,12,21,47,8,0,1,1,2,17,14,13,22,49
New Zealand (NZL) [NZL],22,42,18,39,99,15,0,1,0,1,37,42,19,39,100
Poland (POL),20,64,82,125,271,22,6,7,7,20,42,70,89,132,291
Romania (ROU),20,88,94,119,301,20,0,0,1,1,40,88,94,120,302
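
For comparison, the three requirements can also be expressed as a single string with `DataFrame.query`. This is an alternative sketch rather than the method used above, shown on five rows taken from this data set; `sample`, `by_mask`, and `by_query` are hypothetical names.

```python
import pandas as pd

# Medal totals for five countries from the data set
sample = pd.DataFrame(
    {
        "sTotal": [2, 468, 86, 23, 47],
        "wTotal": [0, 12, 218, 11, 2],
        "Total": [2, 480, 304, 34, 49],
    },
    index=[
        "Afghanistan (AFG)",
        "Australia (AUS) [AUS] [Z]",
        "Austria (AUT)",
        "Croatia (CRO)",
        "North Korea (PRK)",
    ],
)

# Mask-based version, mirroring the three requirements
req_1 = (sample["sTotal"] > 0) & (sample["wTotal"] > 0)
req_2 = sample["Total"] >= 30
req_3 = sample["sTotal"] >= sample["wTotal"] * 10
by_mask = sample[req_1 & req_2 & req_3]

# query() version: column names are used directly inside the string
by_query = sample.query(
    "sTotal > 0 and wTotal > 0 and Total >= 30 and sTotal >= 10 * wTotal"
)

print(by_query.index.tolist())
# ['Australia (AUS) [AUS] [Z]', 'North Korea (PRK)']
print(by_mask.equals(by_query))  # True
```

Separate named masks are easier to test one requirement at a time, while `query` keeps a long condition on one readable line.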
