# The Case

It's your first day at a Data Science advisory firm, and your boss asks you to produce the official Summer Olympic Games Medal Tables for all Editions from 1896 to 2012.

All you have is a raw dataset of over 31,000 medals (summer.csv) and the official Medal Tables for the Editions 1996 and 1976 from Wikipedia (wik_1996.csv, wik_1976.csv). Use the two official Medal Tables as a reference to check whether your code produces the correct output!

Your goal is to minimize the divergence between your aggregated Medal Tables and the official Medal Tables. Suppose the official number of Gold Medals for the United States in the Edition 1996 is 44 and your code produces 46. That is an absolute divergence of 2.

Calculate the total absolute divergence for the Editions 1996 and 1976 (the "Score"), summed over every country and every medal type. The optimal Score is 0!
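The Score can be sketched as below. This is only an illustration under assumptions: the helper name `divergence_score` is hypothetical, both tables are assumed to be indexed by country code with `Gold`, `Silver` and `Bronze` columns, and a country missing from one table is counted as having zero medals there. The toy frames use the USA figures quoted above.

```python
import pandas as pd

MEDALS = ["Gold", "Silver", "Bronze"]

def divergence_score(mine: pd.DataFrame, official: pd.DataFrame) -> int:
    """Sum of |mine - official| over every country and medal type.

    Assumes both frames are indexed by country code. A country that is
    missing from one table is treated as having zero medals there.
    """
    countries = mine.index.union(official.index)
    a = mine.reindex(countries, fill_value=0)[MEDALS]
    b = official.reindex(countries, fill_value=0)[MEDALS]
    return int((a - b).abs().to_numpy().sum())

# Toy tables: official USA 1996 row versus a result with 46 golds.
official = pd.DataFrame({"Gold": [44], "Silver": [32], "Bronze": [25]}, index=["USA"])
mine = pd.DataFrame({"Gold": [46], "Silver": [32], "Bronze": [25]}, index=["USA"])

score = divergence_score(mine, official)
print(score)  # 2
```

Treating missing countries as zero means a country that appears in only one table still adds to the Score instead of being silently dropped.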

In [13]:
import pandas as pd
summer = pd.read_csv("summer.csv")
summer.head(10)

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver
5,1896,Athens,Aquatics,Swimming,"CHOROPHAS, Efstathios",GRE,Men,1200M Freestyle,Bronze
6,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,1200M Freestyle,Gold
7,1896,Athens,Aquatics,Swimming,"ANDREOU, Joannis",GRE,Men,1200M Freestyle,Silver
8,1896,Athens,Aquatics,Swimming,"CHOROPHAS, Efstathios",GRE,Men,400M Freestyle,Bronze
9,1896,Athens,Aquatics,Swimming,"NEUMANN, Paul",AUT,Men,400M Freestyle,Gold


In [14]:
summer_1976 = summer[summer.Year == 1976]
wik_1976 = pd.read_csv("wik_1976.csv")
wik_1976.head()

Unnamed: 0,Rank,NOC,Gold,Silver,Bronze,Total
0,1,Soviet Union (URS),49,41,35,125
1,2,East Germany (GDR),40,25,25,90
2,3,United States (USA),34,35,25,94
3,4,West Germany (FRG),10,12,17,39
4,5,Japan (JPN),9,6,10,25


In [15]:
summer_1996 = summer[summer.Year == 1996]
wik_1996 = pd.read_csv("wik_1996.csv")
wik_1996.head()

Unnamed: 0,Rank,Nation,Gold,Silver,Bronze,Total
0,1,United States (USA)*,44,32,25,101
1,2,Russia (RUS),26,21,16,63
2,3,Germany (GER),20,18,27,65
3,4,China (CHN),16,22,12,50
4,5,France (FRA),15,7,15,37
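The official tables label countries as `Name (CODE)`, while summer.csv only carries the three-letter code, so the code has to be pulled out before the tables can be compared. A minimal sketch, assuming every label contains exactly one parenthesised three-letter code (the new `Code` column is an illustrative name; the 1976 file calls the label column `NOC` rather than `Nation`):

```python
import pandas as pd

# Rows copied from the wik_1996 preview above.
wik = pd.DataFrame({
    "Nation": ["United States (USA)*", "Russia (RUS)", "Germany (GER)"],
    "Gold": [44, 26, 20],
})

# Capture the three capital letters inside the parentheses. Anything after
# the closing parenthesis (such as the trailing "*") is ignored.
wik["Code"] = wik["Nation"].str.extract(r"\(([A-Z]{3})\)", expand=False)

print(wik[["Code", "Gold"]])
```

Rows whose label does not match the pattern come back as NaN, which makes them easy to spot with `wik[wik.Code.isnull()]`.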


In [16]:
summer.Year.unique()

array([1896, 1900, 1904, 1908, 1912, 1920, 1924, 1928, 1932, 1936, 1948,
       1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992,
       1996, 2000, 2004, 2008, 2012], dtype=int64)
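The gaps in this list can be found without reading it by eye. A small sketch, assuming the Games follow a strict four-year cycle starting in 1896:

```python
# Years copied from the summer.Year.unique() output above.
years = [1896, 1900, 1904, 1908, 1912, 1920, 1924, 1928, 1932, 1936, 1948,
         1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992,
         1996, 2000, 2004, 2008, 2012]

expected = set(range(1896, 2013, 4))       # every fourth year, 1896 to 2012
missing = sorted(expected - set(years))    # editions absent from the data
unexpected = sorted(set(years) - expected) # years that are off the cycle

print(missing)     # [1916, 1940, 1944]
print(unexpected)  # []
```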

#### Info from Google re: missing games:
"The 1916 Summer Olympics were cancelled due to the onset of WWI; both Summer Olympics of 1940 and 1944 were cancelled due to WWII. Some summer events were held by the IOC in celebration of its Jubilee in Lausanne, despite the war that cancelled the 1944 Summer Olympics, at the Jubilee Celebrations of the IOC."

In [17]:
summer.City.unique()

array(['Athens', 'Paris', 'St Louis', 'London', 'Stockholm', 'Antwerp',
       'Amsterdam', 'Los Angeles', 'Berlin', 'Helsinki',
       'Melbourne / Stockholm', 'Rome', 'Tokyo', 'Mexico', 'Munich',
       'Montreal', 'Moscow', 'Seoul', 'Barcelona', 'Atlanta', 'Sydney',
       'Beijing'], dtype=object)

#### Info from Google re: Melbourne / Stockholm
"The 1956 Summer Olympics (officially known as the Games of the XVI Olympiad) were an international multi-sport event held in Melbourne, Victoria, Australia, from 22 November to 8 December 1956, with the exception of the equestrian events, which were held in Stockholm, Sweden, in June 1956."

In [19]:
sports = summer.Sport.unique()

In [22]:
sports.sort()
sports

array(['Aquatics', 'Archery', 'Athletics', 'Badminton', 'Baseball',
       'Basketball', 'Basque Pelota', 'Boxing', 'Canoe', 'Canoe / Kayak',
       'Cricket', 'Croquet', 'Cycling', 'Equestrian', 'Fencing',
       'Football', 'Golf', 'Gymnastics', 'Handball', 'Hockey',
       'Ice Hockey', 'Jeu de paume', 'Judo', 'Lacrosse',
       'Modern Pentathlon', 'Polo', 'Rackets', 'Roque', 'Rowing', 'Rugby',
       'Sailing', 'Shooting', 'Skating', 'Softball', 'Table Tennis',
       'Taekwondo', 'Tennis', 'Triathlon', 'Tug of War', 'Volleyball',
       'Water Motorsports', 'Weightlifting', 'Wrestling'], dtype=object)

In [26]:
summer[summer.Sport.str.contains("Canoe")].groupby(by = ["Sport", "Discipline"]).count()

Sport,Discipline,Year,City,Athlete,Country,Gender,Event,Medal
Canoe,Canoe Slalom,15,15,15,15,15,15,15
Canoe,Canoe Sprint,66,66,66,66,66,66,66
Canoe / Kayak,Canoe / Kayak F,912,912,912,912,912,912,912
Canoe / Kayak,Canoe / Kayak S,90,90,90,90,90,90,90


In [28]:
disciplines = summer.Discipline.unique()
disciplines.sort()
disciplines

array(['Archery', 'Artistic G.', 'Athletics', 'BMX', 'Badminton',
       'Baseball', 'Basketball', 'Basque Pelota', 'Beach Volleyball',
       'Beach volley.', 'Boxing', 'Canoe / Kayak F', 'Canoe / Kayak S',
       'Canoe Slalom', 'Canoe Sprint', 'Cricket', 'Croquet',
       'Cycling BMX', 'Cycling Road', 'Cycling Track', 'Diving',
       'Dressage', 'Eventing', 'Fencing', 'Figure skating', 'Football',
       'Golf', 'Gymnastics Artistic', 'Gymnastics Rhythmic', 'Handball',
       'Hockey', 'Ice Hockey', 'Jeu de Paume', 'Judo', 'Jumping',
       'Lacrosse', 'Marathon swimming', 'Modern Pentath.',
       'Modern Pentathlon', 'Mountain Bike', 'Polo', 'Rackets',
       'Rhythmic G.', 'Roque', 'Rowing', 'Rugby', 'Sailing', 'Shooting',
       'Softball', 'Swimming', 'Synchronized S.', 'Synchronized Swimming',
       'Table Tennis', 'Taekwondo', 'Tennis', 'Trampoline', 'Triathlon',
       'Tug of War', 'Vaulting', 'Volleyball', 'Water Motorspor',
       'Water Polo', 'Water polo', 'Weightlif

In [30]:
# Beach Volleyball // Beach volley.
# Gymnastics Artistic // Artistic G.
# Modern Pentathlon // Modern Pentath.
# Gymnastics Rhythmic // Rhythmic G. 
# Synchronized Swimming // Synchronized S. 
# Water Polo // Water polo
# Wrestling Freestyle // Wrestling Free.

# Map each abbreviated label to its full spelling. The target for
# "Wrestling Free." must be spelled "Wrestling Freestyle"; a misspelled target
# would create a new, separate discipline instead of merging the two.
summer.replace(to_replace = ["Beach volley.", "Artistic G.", "Modern Pentath.", "Rhythmic G.", "Synchronized S.", "Water polo", "Wrestling Free."],
               value = ["Beach Volleyball", "Gymnastics Artistic", "Modern Pentathlon", "Gymnastics Rhythmic", "Synchronized Swimming", "Water Polo", "Wrestling Freestyle"], inplace = True)

In [31]:
disciplines = summer.Discipline.unique()
disciplines.sort()
disciplines

array(['Archery', 'Athletics', 'BMX', 'Badminton', 'Baseball',
       'Basketball', 'Basque Pelota', 'Beach Volleyball', 'Boxing',
       'Canoe / Kayak F', 'Canoe / Kayak S', 'Canoe Slalom',
       'Canoe Sprint', 'Cricket', 'Croquet', 'Cycling BMX',
       'Cycling Road', 'Cycling Track', 'Diving', 'Dressage', 'Eventing',
       'Fencing', 'Figure skating', 'Football', 'Golf',
       'Gymnastics Artistic', 'Gymnastics Rhythmic', 'Handball', 'Hockey',
       'Ice Hockey', 'Jeu de Paume', 'Judo', 'Jumping', 'Lacrosse',
       'Marathon swimming', 'Modern Pentathlon', 'Mountain Bike', 'Polo',
       'Rackets', 'Roque', 'Rowing', 'Rugby', 'Sailing', 'Shooting',
       'Softball', 'Swimming', 'Synchronized Swimming', 'Table Tennis',
       'Taekwondo', 'Tennis', 'Trampoline', 'Triathlon', 'Tug of War',
       'Vaulting', 'Volleyball', 'Water Motorspor', 'Water Polo',
       'Weightlifting', 'Wrestling Freestyle', 'Wrestling Gre-R'],
      dtype=object)
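A cleanup like this can also be confined to one column and checked afterwards. The sketch below is an alternative, not the approach used in the cells above: it works on a toy frame, `discipline_map` is a hypothetical name, and only two of the label pairs are shown.

```python
import pandas as pd

# Toy frame; the labels are taken from the discipline list above.
df = pd.DataFrame({"Discipline": ["Water polo", "Water Polo", "Artistic G.", "Rowing"]})

# Hypothetical mapping: abbreviated label -> full label.
discipline_map = {
    "Water polo": "Water Polo",
    "Artistic G.": "Gymnastics Artistic",
}

# Series.replace only touches the Discipline column, so identical strings
# in other columns (for example Event) stay untouched.
df["Discipline"] = df["Discipline"].replace(discipline_map)

# Sanity check: none of the abbreviated labels should survive.
leftover = set(df["Discipline"]) & set(discipline_map)
print(sorted(df["Discipline"].unique()), leftover)
```

Keeping the pairs in a dict puts each old label next to its replacement, which makes a misspelled target easier to notice than in two parallel lists.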

In [43]:
summer.Country.unique()

array(['HUN', 'AUT', 'GRE', 'USA', 'GER', 'GBR', 'FRA', 'AUS', 'DEN',
       'SUI', 'ZZX', 'NED', 'BEL', 'IND', 'CAN', 'BOH', 'SWE', 'NOR',
       'ESP', 'ITA', 'CUB', 'ANZ', 'RSA', 'FIN', 'RU1', 'EST', 'TCH',
       'NZL', 'BRA', 'JPN', 'LUX', 'ARG', 'POL', 'POR', 'URU', 'YUG',
       'ROU', 'HAI', 'EGY', 'PHI', 'IRL', 'CHI', 'LAT', 'MEX', 'TUR',
       'PAN', 'JAM', 'SRI', 'KOR', 'PUR', 'PER', 'IRI', 'TRI', 'URS',
       'VEN', 'BUL', 'LIB', 'EUA', 'ISL', 'PAK', 'BAH', 'BWI', 'TPE',
       'ETH', 'MAR', 'GHA', 'IRQ', 'SIN', 'TUN', 'KEN', 'NGR', 'GDR',
       'FRG', 'UGA', 'CMR', 'MGL', 'PRK', 'COL', 'NIG', 'THA', 'BER',
       'TAN', 'GUY', 'ZIM', 'CHN', 'CIV', 'ZAM', 'DOM', 'ALG', 'SYR',
       'SUR', 'CRC', 'INA', 'SEN', 'DJI', 'AHO', 'ISV', 'EUN', 'NAM',
       'QAT', 'LTU', 'MAS', 'CRO', 'ISR', 'SLO', 'IOP', 'RUS', 'UKR',
       'ECU', 'BDI', 'MOZ', 'CZE', 'BLR', 'TGA', 'KAZ', 'UZB', 'SVK',
       'MDA', 'GEO', 'HKG', 'ARM', 'AZE', 'BAR', 'KSA', 'KGZ', 'KUW',
       'VIE', 'MKD',

In [None]:
# Values in the list above that need a closer look:
# nan (rows with a missing country)
# RU1

In [37]:
summer[summer.Country.isnull()]

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
29603,2012,London,Athletics,Athletics,Pending,,Women,1500M,Gold
31072,2012,London,Weightlifting,Weightlifting,Pending,,Women,63KG,Gold
31091,2012,London,Weightlifting,Weightlifting,Pending,,Men,94KG,Silver
31110,2012,London,Wrestling,Wrestling Freestyle,"KUDUKHOV, Besik",,Men,Wf 60 KG,Silver


#### Info from Google re: Besik Kudukhov
"Representing Russia:
Olympic Games:
Bronze medal – third place, 2008 Beijing, 55 kg;
Silver medal – second place, 2012 London, 60 kg"

In [50]:
summer.loc[31110, "Country"] = "RUS"

In [51]:
summer[summer.Country.isnull()]

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
29603,2012,London,Athletics,Athletics,Pending,,Women,1500M,Gold
31072,2012,London,Weightlifting,Weightlifting,Pending,,Women,63KG,Gold
31091,2012,London,Weightlifting,Weightlifting,Pending,,Men,94KG,Silver


#### Info from Google re: doping at 2012 Olympics
"Three athletes have been disqualified from the London 2012 Olympics for doping, the International Olympic Committee (IOC) has confirmed. Weightlifting bronze medallist Valentin Hristov tested positive for the anabolic steroid oral turinabol."

#### Info from Wikipedia on actual winners for the above 3 categories
"Men's 94kg: Gold - Saeid Mohammadpour IRN; Silver - Kim Min-jae KOR; Bronze - Tomasz Zieliński POL;
Women's 63kg: Gold - Christine Girard CAN; Silver - Milka Maneva BGR; Bronze - Luz Acosta MEX;
Women's 1500m: Gold - Maryam Yusuf Jamal BHR; Silver - Tatyana Tomashova RUS; Bronze - Abeba Aregawi ETH;"

In [71]:
summer.loc[31090:31092, ["Athlete", "Country"]]

Unnamed: 0,Athlete,Country
31090,"MOHAMMADPOURKARKARAGH, Saeid",IRI
31091,Pending,
31092,"KIM, Minjae",KOR


In [72]:
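# Women's 1500M: rows 29603-29605 are assumed to be Gold, Silver, Bronze.
summer.loc[29603, "Athlete"] = "Maryam Yusuf Jamal"
summer.loc[29604, "Athlete"] = "Tatyana Tomashova"
summer.loc[29605, "Athlete"] = "Abeba Aregawi"
# The Country column holds IOC codes (GER, NED, SUI, ...), so Bahrain is
# entered as BRN rather than the ISO code BHR from the note above.
summer.loc[29603, "Country"] = "BRN"
summer.loc[29604, "Country"] = "RUS"
summer.loc[29605, "Country"] = "ETH"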

In [73]:
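# Women's 63KG: rows 31072-31074 are assumed to be Gold, Silver, Bronze.
summer.loc[31072, "Athlete"] = "Christine Girard"
summer.loc[31073, "Athlete"] = "Milka Maneva"
summer.loc[31074, "Athlete"] = "Luz Acosta"
summer.loc[31072, "Country"] = "CAN"
# The dataset already lists Bulgaria under the IOC code BUL, so BUL is used
# here instead of the ISO code BGR to keep its medals in one group.
summer.loc[31073, "Country"] = "BUL"
summer.loc[31074, "Country"] = "MEX"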

In [74]:
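# Men's 94KG: rows 31090-31092 (inspected above) are Gold, Silver, Bronze.
summer.loc[31090, "Athlete"] = "Saeid Mohammadpour"
summer.loc[31091, "Athlete"] = "Kim Min-jae"
summer.loc[31092, "Athlete"] = "Tomasz Zieliński"
# Row 31090 was already recorded as IRI (the IOC code for Iran); keep IRI
# rather than the ISO code IRN so Iran is not split into two countries.
summer.loc[31090, "Country"] = "IRI"
summer.loc[31091, "Country"] = "KOR"
summer.loc[31092, "Country"] = "POL"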

In [75]:
summer[summer.Country.isnull()]

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
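An empty result only shows that no country is missing; it does not show that the codes typed in by hand match the ones the dataset already uses. A minimal sketch of such a check, on toy data: `known_codes` and `patched` are hypothetical names, and in the notebook `known_codes` would be `set(summer.Country.dropna().unique())` captured before any manual edits.

```python
import pandas as pd

# Toy data: codes present before patching, and hypothetical patched rows.
known_codes = {"IRI", "KOR", "POL", "BUL", "CAN", "MEX", "RUS", "ETH"}
patched = pd.DataFrame({"Country": ["CAN", "BGR", "MEX"]})

# Any code that was never seen before the edits deserves a second look.
unknown = sorted(set(patched["Country"].dropna()) - known_codes)
print(unknown)  # ['BGR']
```

A flagged code is not necessarily wrong, since a country may be winning its first medal, but it is a cheap way to catch an ISO-style code such as BGR slipping in next to the IOC code BUL.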


In [76]:
summer[summer.Country == "RU1"]

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
1852,1908,London,Skating,Figure skating,"PANIN, Nikolay",RU1,Men,Special Figures,Gold
1927,1908,London,Wrestling,Wrestling Gre-R,"ORLOFF, Nikolaï",RU1,Men,- 66.6KG (Lightweight),Silver
1930,1908,London,Wrestling,Wrestling Gre-R,"PETROFF, Aleksander",RU1,Men,+ 93KG (Super Heavyweight),Silver
2535,1912,Stockholm,Rowing,Rowing,"KUSIK, Mikhaïl Maksimilian",RU1,Men,Single Sculls (1X),Bronze
2538,1912,Stockholm,Sailing,Sailing,"Beloselsky-Belozersky, Esper Konstantinovich",RU1,Men,10M,Bronze
2539,1912,Stockholm,Sailing,Sailing,"BRASCHE, Ernest",RU1,Men,10M,Bronze
2540,1912,Stockholm,Sailing,Sailing,"LINDBLOM, Karl",RU1,Men,10M,Bronze
2541,1912,Stockholm,Sailing,Sailing,"PUSCHNITSKY, Nikolaï",RU1,Men,10M,Bronze
2542,1912,Stockholm,Sailing,Sailing,"RODIONOV, Aleksandr",RU1,Men,10M,Bronze
2543,1912,Stockholm,Sailing,Sailing,"SCHOMAKER, Iossif",RU1,Men,10M,Bronze


#### From Wikipedia
"Nikolai Panin - Russian Empire"
"Martin Klein - Russian Empire"
"Following the 1917 Revolution, four socialist republics were established on the territory of the former empire: the Russian and Transcaucasian Soviet Federated Socialist Republics and the Ukrainian and Belorussian Soviet Socialist Republics. On December 30, 1922, these constituent republics established the U.S.S.R."

In [77]:
summer[summer.Year == 1920].Country.unique()

array(['USA', 'SWE', 'DEN', 'GBR', 'BEL', 'AUS', 'CAN', 'FIN', 'FRA',
       'NED', 'ITA', 'RSA', 'NOR', 'EST', 'ESP', 'TCH', 'SUI', 'NZL',
       'BRA', 'GRE', 'JPN', 'LUX'], dtype=object)

#### Summary of Russia's National Medal History
Up to 1912: Medals recorded as RU1 (Russian Empire)<br>
1916: No Olympics<br>
1920: No Russian medals (From Google: "The 1920 Olympics were awarded to Antwerp in hopes of bringing a spirit of renewal to Belgium, which had been devastated during World War I. The defeated countries—Germany, Austria, Hungary, Bulgaria, and Turkey—were not invited. The new Soviet Union chose not to attend.")<br>
1952 to 1988: Medals recorded as URS (Soviet Union)<br>
1992: Medals recorded as EUN (Unified Team of former Soviet republics)<br>
1996 onward: Medals recorded as RUS (Russian Federation)
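A timeline like this can be checked against the data by listing the first and last year in which each code appears. The sketch below uses toy rows whose years are illustrative only; running the same `groupby` on `summer` gives the actual spans.

```python
import pandas as pd

# Toy rows: the codes exist in the dataset, the years are illustrative.
df = pd.DataFrame({
    "Year":    [1908, 1912, 1952, 1988, 1992, 1996, 2012],
    "Country": ["RU1", "RU1", "URS", "URS", "EUN", "RUS", "RUS"],
})

codes = ["RU1", "URS", "EUN", "RUS"]
span = (df[df["Country"].isin(codes)]
        .groupby("Country")["Year"]
        .agg(["min", "max"])   # first and last medal year per code
        .sort_values("min"))
print(span)
```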

In [78]:
summer.Gender.unique()

array(['Men', 'Women'], dtype=object)

In [79]:
summer.Event.unique()

array(['100M Freestyle', '100M Freestyle For Sailors', '1200M Freestyle',
       '400M Freestyle', '100M', '110M Hurdles', '1500M', '400M', '800M',
       'Discus Throw', 'High Jump', 'Long Jump', 'Marathon', 'Pole Vault',
       'Shot Put', 'Triple Jump', 'Individual Road Race', '100KM', '10KM',
       '12-Hour Race', '1KM Time Trial', 'Sprint Indivual',
       'Foil Individual', 'Foil, Masters', 'Sabre Individual',
       'Horizontal Bar', 'Parallel Bars', 'Pommel Horse', 'Rings',
       'Rope Climbing', 'Team, Horizontal Bar', 'Team, Parallel Bars',
       'Vault', '25M Army Pistol', '25M Rapid Fire Pistol (60 Shots)',
       '50M Pistol (60 Shots)', 'Army Rifle, 200M', 'Army Rifle, 300M',
       'Doubles', 'Singles', 'Heavyweight - One Hand Lift',
       'Heavyweight - Two Hand Lift', 'Open Event', '1500M Freestyle',
       '200M Backstroke', '200M Freestyle', '200M Obstacle Event',
       '200M Team Swimming', '4000M Freestyle', 'Underwater Swimming',
       'Water Polo', 'Au Chap

In [80]:
summer.Medal.unique()

array(['Gold', 'Silver', 'Bronze'], dtype=object)