# Comprehensive Project Challenge

## Summer Olympic Games, Medal Tables 1896-2012

![Image](\Olympics2.jpg)

### Welcome to the final coding challenge!

This challenge requires you to __apply__ and __combine__ many concepts and methods that you have learned in this course.

It is frequently used in job application processes / assessment centers in Data Science to check the candidates´ abilities to work with, manipulate and aggregate data. Also experienced Data Scientists have difficulties to solve the problem. And the reason for this is not that the challenge is coding-wise extremely complex. But it requires a combination of <br>
- __solid coding skills__, <br><br>
and, even more important <br><br>
   
- the ability to __interpret__ and __understand__ the __underlying data__ and to __incorporate inputs__ from subject matter __experts__ (in this case Sports experts) <br> <br>
__"Thinking in Data Structures!"__ -> requires some practise but also some talent. 

### The Case

It´s your first day in a Data Science advisory firm and your boss asks you to produce the __official Summer Olympic Games Medal Tables for all Editions from 1896 to 2012__. <br><br>
All you can use is a dataset with raw data containing over 31,000 medals (__summer.csv__) and the official Medal Tables for the Editions 1996 and 1976 from Wikipedia. (__wik_1996.csv__, __wik_1976.csv__). Use the two official Medal Tables as a __reference__ to check whether your code produces the correct output! <br><br>
Your goal is to __minimize the divergence__ between your aggregated Medal Tables and the official Medal Tables. Let´s assume that the official number of Gold Medals for the United States in the Edition 1996 is 44 and your code produces 46. This is an absolute divergence of 2. <br> <br>
__Calculate the total absolute divergence for the Editions 1996 and 1976 (the "Score")!__ The __optimal Score is 0__! 

In [3]:
import pandas as pd

In [4]:
summer = pd.read_csv("summer.csv")

In [5]:
summer76 = summer.loc[summer.Year == 1976]

In [6]:
summer76.head()

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
13900,1976,Montreal,Aquatics,Diving,"ALEINIK, Vladimir",URS,Men,10M Platform,Bronze
13901,1976,Montreal,Aquatics,Diving,"DIBIASI, Klaus",ITA,Men,10M Platform,Gold
13902,1976,Montreal,Aquatics,Diving,"LOUGANIS, Gregory",USA,Men,10M Platform,Silver
13903,1976,Montreal,Aquatics,Diving,"WILSON, Deborah Keplar",USA,Women,10M Platform,Bronze
13904,1976,Montreal,Aquatics,Diving,"VAYTSEKHOVSKAYA, Elena",URS,Women,10M Platform,Gold


In [7]:
mt76 = summer76.groupby(["Country", "Medal"]).Medal.count().unstack(fill_value = 0)
mt76.head()

Medal,Bronze,Gold,Silver
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
AUS,8,0,16
AUT,1,0,0
BEL,6,0,3
BER,1,0,0
BRA,3,0,0


In [8]:
mt76 = mt76.sort_values(["Gold", "Silver", "Bronze"], ascending = False)[["Gold", "Silver", "Bronze"]]
mt76.head(10)

Medal,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
URS,113,93,79
GDR,99,51,42
USA,63,56,36
JPN,25,6,10
FRG,21,24,30
POL,18,29,26
NZL,17,1,9
HUN,14,6,35
SWE,9,1,0
BUL,8,13,18


In [147]:
pd.read_csv("wik_1976.csv")

Unnamed: 0,Rank,NOC,Gold,Silver,Bronze,Total
0,1,Soviet Union (URS),49,41,35,125
1,2,East Germany (GDR),40,25,25,90
2,3,United States (USA),34,35,25,94
3,4,West Germany (FRG),10,12,17,39
4,5,Japan (JPN),9,6,10,25
5,6,Poland (POL),7,6,13,26
6,7,Bulgaria (BUL),6,9,7,22
7,8,Cuba (CUB),6,4,3,13
8,9,Romania (ROU),4,9,14,27
9,10,Hungary (HUN),4,5,13,22


 __Far away from the target!!!__

#### Fortunately, you could manage to get some useful information from Sports experts: <br>

Medals awarded in __Team Events__ (one medal for each member of the team) only count as __one Medal__. For example, the Basketball Team of the United States won the Gold Medal in the Edition 2012. In total __12 Basketball Athletes__ from the United States were awarded with a Gold Medal. For the official Medal Table 2012, this only counts as __one Gold Medal__ for the United States!<br> <br>
All Events with __5 or less than 5 medals__ shall be deemed __Singles Events__. All Events with __more than 5 medals__ shall be deemed __Team Events__. It frequently happens that 2 or 3 Athletes share the Bronze medal. Therefore, in total 4 or 5 medals are awarded in these Singles Events. All of these medals count for the official Medal Table! It also happens in Team Events that two Teams share the Bronze medal. Also in this case, in total 4 medals count for the official Medal Table (1 Gold, 1 Silver, 2 Bronze).
<br><br>
To identify all unique Events, the __Event Gender matters__! There are __Men__ Events, __Women__ Events and __Mixed__ Events. Assume that the following medals have been awarded in __Mixed Events__:
- the Event is marked with "__mixed__" or "__pairs__"
- all "__Equestrian__" Events
- all "__Sailing__" Events __before 1988__ (until and including 1984)
- the following medals (index labels) were awarded in __Badminton mixed Double Events__: [21773, 21782, 21776, 21785, 21770, 21779, 23703, 23712, 23706, 23715, 23709, 23700, 25720, 25729, 25723, 25732, 25726, 25717, 27727, 27736, 27730, 27739, 27724, 27733, 29784, 29785, 29786, 29787, 29788, 29789]

In [148]:
import pandas as pd
import numpy as np

In [149]:
summer = pd.read_csv("summer.csv")
wik_1996 = pd.read_csv("wik_1996.csv")
wik_1976 = pd.read_csv("wik_1976.csv")

Inspect the three datasets and align wik_1996 and wik_1976 to the summer dataset! You will need this later when comparing your results with the official Medal Tables! 

In [150]:
summer.head()

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver


In [151]:
summer.tail()

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
31160,2012,London,Wrestling,Wrestling Freestyle,"JANIKOWSKI, Damian",POL,Men,Wg 84 KG,Bronze
31161,2012,London,Wrestling,Wrestling Freestyle,"REZAEI, Ghasem Gholamreza",IRI,Men,Wg 96 KG,Gold
31162,2012,London,Wrestling,Wrestling Freestyle,"TOTROV, Rustam",RUS,Men,Wg 96 KG,Silver
31163,2012,London,Wrestling,Wrestling Freestyle,"ALEKSANYAN, Artur",ARM,Men,Wg 96 KG,Bronze
31164,2012,London,Wrestling,Wrestling Freestyle,"LIDBERG, Jimmy",SWE,Men,Wg 96 KG,Bronze


In [152]:
summer[summer.isna().any(axis = 1)]

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
29603,2012,London,Athletics,Athletics,Pending,,Women,1500M,Gold
31072,2012,London,Weightlifting,Weightlifting,Pending,,Women,63KG,Gold
31091,2012,London,Weightlifting,Weightlifting,Pending,,Men,94KG,Silver
31110,2012,London,Wrestling,Wrestling Freestyle,"KUDUKHOV, Besik",,Men,Wf 60 KG,Silver


In [153]:
summer.dropna(inplace=True)

In [154]:
wik_1976.head()

Unnamed: 0,Rank,NOC,Gold,Silver,Bronze,Total
0,1,Soviet Union (URS),49,41,35,125
1,2,East Germany (GDR),40,25,25,90
2,3,United States (USA),34,35,25,94
3,4,West Germany (FRG),10,12,17,39
4,5,Japan (JPN),9,6,10,25


In [155]:
wik_1996.head()

Unnamed: 0,Rank,Nation,Gold,Silver,Bronze,Total
0,1,United States (USA)*,44,32,25,101
1,2,Russia (RUS),26,21,16,63
2,3,Germany (GER),20,18,27,65
3,4,China (CHN),16,22,12,50
4,5,France (FRA),15,7,15,37


In [156]:
wik_1976.NOC.str.split("(", expand= True).iloc[:, 1].str.replace(")", "").str.replace("*", "")

0         URS
1         GDR
2         USA
3         FRG
4         JPN
5         POL
6         BUL
7         CUB
8         ROU
9         HUN
10        FIN
11        SWE
12        GBR
13        ITA
14        FRA
15        YUG
16        TCH
17        NZL
18        KOR
19        SUI
20        JAM
21        PRK
22        NOR
23        DEN
24        MEX
25        TRI
26        CAN
27        BEL
28        NED
29        POR
30        ESP
31        AUS
32        IRI
33        MGL
34        VEN
35        BRA
36        AUT
37        BER
38        PAK
39        PUR
40        THA
41    41 NOCs
Name: 1, dtype: object

In [157]:
wik_1976["Country"] = wik_1976.NOC.str.split("(", expand= True).iloc[:, 1].str.replace(")", "").str.replace("*", "")

In [158]:
wik_1976 = wik_1976.drop(columns =["Rank", "NOC", "Total"]).set_index("Country")

In [159]:
wik_1976.head()

Unnamed: 0_level_0,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
URS,49,41,35
GDR,40,25,25
USA,34,35,25
FRG,10,12,17
JPN,9,6,10


In [160]:
wik_1996["Country"] = wik_1996.Nation.str.split("(", expand= True).iloc[:, 1].str.replace(")", "").str.replace("*", "")

In [161]:
wik_1996 = wik_1996.drop(columns =["Rank", "Nation", "Total"]).set_index("Country")

In [162]:
wik_1996.head()

Unnamed: 0_level_0,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
USA,44,32,25
RUS,26,21,16
GER,20,18,27
CHN,16,22,12
FRA,15,7,15


### Step 2: Creating the Column Event_Gender

In a first step, we need to determine for each row / medal, whether the medal was awarded in a Men, Women or Mixed Event. <br>
The default assumption: The values in the new column Event_Gender are the same as in the column Gender (this is the Gender of the respective Athlete). Then, we need to identify Mixed Gender Events (inputs from the experts!).

In [163]:
summer["Event_Gender"] = summer.Gender

In [164]:
summer.head()

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal,Event_Gender
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold,Men
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver,Men
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze,Men
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold,Men
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver,Men


1. The Event Column contains the string "mixed"

In [165]:
summer.Event.str.lower().str.contains("mixed").sum()

38

2. The Event Column contains the string "pairs"

In [166]:
summer.Event.str.lower().str.contains("pairs").sum()

12

3. All "Equestrian" Events have been mixed Events.

In [167]:
summer.Sport.str.lower().str.contains("equestrian").sum()

939

4. All "Sailing" Events before 1988 have been mixed Events.

In [168]:
((summer.Sport.str.lower().str.contains("sailing")) & (summer.Year < 1988)).sum()

755

In [169]:
mask1 = summer.Event.str.lower().str.contains("mixed")
mask2 = summer.Event.str.lower().str.contains("pairs")
mask3 = summer.Sport.str.lower().str.contains("equestrian")
mask4 = ((summer.Sport.str.lower().str.contains("sailing")) & (summer.Year < 1988))

In [170]:
summer.loc[mask1 | mask2 | mask3| mask4, "Event_Gender"] = "X"

In [171]:
summer.Event_Gender.value_counts()

Men      21227
Women     8190
X         1744
Name: Event_Gender, dtype: int64

5. The following medals (index labels) were awarded in Badminton mixed Double Events:

In [172]:
badm_mixed = [21773, 21782, 21776,21785, 21770, 21779,23703,23712,23706, 23715,23709,23700,25720,25729,25723,25732,25726,
              25717,27727,27736, 27730,27739,27724,27733, 29784, 29785,29786,29787,29788,29789]

In [173]:
summer.loc[badm_mixed, "Event_Gender"] = "X"

In [174]:
summer.Event_Gender.value_counts()

Men      21212
Women     8175
X         1774
Name: Event_Gender, dtype: int64

### Step 3: Identify all unique Events and count the amount of medals in each Event (new column Event_Medals)

Hint: The Columns "Year", "Sport", "Discipline", "Event", "Event_Gender" are relevant to group the summer DataFrame into unique events.

In [175]:
summer["Event_Medals"] = summer.groupby(["Year", "Sport", "Discipline", "Event", "Event_Gender"]).Medal.transform("count")

In [176]:
summer.head()

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal,Event_Gender,Event_Medals
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold,Men,2
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver,Men,2
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze,Men,3
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold,Men,3
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver,Men,3


In [177]:
summer.Event_Medals.value_counts().sort_index()

1         5
2        54
3      9228
4      1684
5        50
6      1968
7        14
8       152
9      1107
10      120
11      143
12     3000
13      130
14      238
15     1245
16      272
17      374
18     1224
19      152
20      240
21      105
22       66
24      384
25       25
26       52
27     1080
28       56
29       58
30       60
31       31
       ... 
39      429
40      160
41      123
42      462
43       86
44      132
45      495
46       92
47      141
48      912
49       49
50      250
51      204
53       53
54      324
56      112
57       57
59       59
60      120
61      122
66      132
67       67
70       70
71       71
72      144
73       73
74       74
76       76
82       82
116     116
Name: Event_Medals, Length: 66, dtype: int64

In [178]:
summer.loc[summer.Event_Medals == 5]

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal,Event_Gender,Event_Medals
1273,1908,London,Athletics,Athletics,"ARCHIBALD, Edward Blake",CAN,Men,Pole Vault,Bronze,Men,5
1274,1908,London,Athletics,Athletics,"JACOBS, Charles Sherman",USA,Men,Pole Vault,Bronze,Men,5
1275,1908,London,Athletics,Athletics,"SÖDERSTRÖM, Bruno",SWE,Men,Pole Vault,Bronze,Men,5
1276,1908,London,Athletics,Athletics,"COOKE, Edward Tiffin",USA,Men,Pole Vault,Gold,Men,5
1277,1908,London,Athletics,Athletics,"GILBERT, Alfred Carleten",USA,Men,Pole Vault,Gold,Men,5
7770,1948,London,Gymnastics,Artistic G.,"MOGYOROSI-KLENCS, Janos",HUN,Men,Vault,Bronze,Men,5
7771,1948,London,Gymnastics,Artistic G.,"PATAKI, Ferenc",HUN,Men,Vault,Bronze,Men,5
7772,1948,London,Gymnastics,Artistic G.,"SOTORNIK, Leo",TCH,Men,Vault,Bronze,Men,5
7773,1948,London,Gymnastics,Artistic G.,"AALTONEN, Paavo Johannes",FIN,Men,Vault,Gold,Men,5
7774,1948,London,Gymnastics,Artistic G.,"ROVE, Olavi Antero",FIN,Men,Vault,Silver,Men,5


### Step 4: Identifying Team Events

All medals / rows, that were awarded in Events with more than 5 medals, shall be deemed Team Event Medals. (new column "Team")

In [179]:
summer["Team"] = pd.Series(np.where(summer.Event_Medals > 5, "Yes", "No"))

In [180]:
summer.head()

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal,Event_Gender,Event_Medals,Team
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold,Men,2,No
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver,Men,2,No
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze,Men,3,No
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold,Men,3,No
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver,Men,3,No


In [181]:
summer.Team.value_counts()

Yes    20140
No     11017
Name: Team, dtype: int64

### Step 5: Removing Duplicated Medals in Team Events 

The subset for determining Duplicates shall be formed by the Columns "Year", "Sport", "Discipline", "Country", "Event", "Event_Gender", "Medal". Keep one Medal!

In [182]:
summer.reset_index(inplace=True)

In [183]:
summer.head()

Unnamed: 0,index,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal,Event_Gender,Event_Medals,Team
0,0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold,Men,2,No
1,1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver,Men,2,No
2,2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze,Men,3,No
3,3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold,Men,3,No
4,4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver,Men,3,No


In [184]:
singles = summer.loc[summer.Team == "No"].copy()
singles.shape

(11017, 13)

In [185]:
team = summer.loc[summer.Team == "Yes"].copy()
team.shape

(20140, 13)

In [186]:
team.drop_duplicates(subset = ["Year", "Sport", "Discipline", "Country", "Event", "Event_Gender", "Medal"], inplace = True)

In [187]:
team.shape

(3701, 13)

In [188]:
pd.concat([singles, team]).shape

(14718, 13)

In [189]:
summer_new = pd.concat([singles, team])

In [190]:
summer_new.set_index("index", inplace= True)

In [191]:
summer_new.head()

Unnamed: 0_level_0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal,Event_Gender,Event_Medals,Team
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold,Men,2,No
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver,Men,2,No
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze,Men,3,No
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold,Men,3,No
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver,Men,3,No


In [192]:
summer_new.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14718 entries, 0 to 31036
Data columns (total 12 columns):
Year            14718 non-null int64
City            14718 non-null object
Sport           14718 non-null object
Discipline      14718 non-null object
Athlete         14718 non-null object
Country         14718 non-null object
Gender          14718 non-null object
Event           14718 non-null object
Medal           14718 non-null object
Event_Gender    14718 non-null object
Event_Medals    14718 non-null int64
Team            14718 non-null object
dtypes: int64(2), object(10)
memory usage: 1.5+ MB


### Step 6: Creating the official Medal Table for all Editions

In [193]:
medal_tables = summer_new.groupby(["Year", "Country", "Medal"]).Medal.count().unstack(fill_value = 0)[["Gold", "Silver", "Bronze"]]

In [194]:
medal_tables

Unnamed: 0_level_0,Medal,Gold,Silver,Bronze
Year,Country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1896,AUS,2,0,0
1896,AUT,2,1,2
1896,DEN,1,2,3
1896,FRA,5,4,2
1896,GBR,2,3,2
1896,GER,6,5,2
1896,GRE,10,17,19
1896,HUN,2,1,3
1896,SUI,1,2,0
1896,USA,11,7,2


### Step 7: Comparison with Wikipedia Medal Tables

In [195]:
medal_tables.head()

Unnamed: 0_level_0,Medal,Gold,Silver,Bronze
Year,Country,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1896,AUS,2,0,0
1896,AUT,2,1,2
1896,DEN,1,2,3
1896,FRA,5,4,2
1896,GBR,2,3,2


In [196]:
agg_1976 = medal_tables.loc[1976].sort_values(["Gold", "Silver", "Bronze"], ascending = False).copy()

In [197]:
agg_1976.head()

Medal,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
URS,49,41,35
GDR,40,25,25
USA,34,35,25
FRG,10,12,17
JPN,9,6,10


In [198]:
wik_1976.head()

Unnamed: 0_level_0,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
URS,49,41,35
GDR,40,25,25
USA,34,35,25
FRG,10,12,17
JPN,9,6,10


In [217]:
div_76 = agg_1976.sub(wik_1976).abs().dropna()

In [218]:
div_76

Medal,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
AUS,0.0,0.0,0.0
AUT,0.0,0.0,0.0
BEL,0.0,0.0,0.0
BER,0.0,0.0,0.0
BRA,0.0,0.0,0.0
BUL,0.0,0.0,0.0
CAN,0.0,0.0,0.0
CUB,0.0,0.0,0.0
DEN,0.0,0.0,0.0
ESP,0.0,0.0,0.0


In [201]:
score_76 = div_76.sum().sum()
score_76

0.0

In [202]:
agg_1996 = medal_tables.loc[1996].sort_values(["Gold", "Silver", "Bronze"], ascending = False).copy()

In [203]:
agg_1996.head()

Medal,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
USA,44,32,25
RUS,26,21,16
GER,20,18,27
CHN,16,22,12
FRA,15,7,15


In [204]:
wik_1996.head()

Unnamed: 0_level_0,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
USA,44,32,25
RUS,26,21,16
GER,20,18,27
CHN,16,22,12
FRA,15,7,15


In [205]:
div_96 = agg_1996.sub(wik_1996).abs().dropna()

In [206]:
div_96

Medal,Gold,Silver,Bronze
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ALG,0.0,0.0,0.0
ARG,0.0,0.0,0.0
ARM,0.0,0.0,0.0
AUS,0.0,0.0,0.0
AUT,0.0,0.0,0.0
AZE,0.0,0.0,0.0
BAH,0.0,0.0,0.0
BDI,0.0,0.0,0.0
BEL,0.0,0.0,0.0
BLR,0.0,0.0,0.0


In [207]:
score_96 = div_96.sum().sum()
score_96

0.0

In [208]:
print(score_76, score_96)

0.0 0.0
