In [479]:
import pandas as pd 
import numpy as np
#import matplotlib
#import matplotlib.pyplot as plt
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot 
#%matplotlib inline 
#import seaborn as sns


In [480]:
from IPython.display import HTML

HTML('''<script>
code_show = true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

## Assumptions and acknowledgements 

#### What constitutes 'Europe'?
For this analysis it is important to define 'Europe'. One option is to treat Europe as mostly the current EU countries plus certain extensions, e.g. Switzerland, Norway, the Balkans and Ukraine; in other words, countries whose entire territory lies within the European continent. Alternatively, Europe can be understood in the wider geographical sense, in which case Russia, Turkey, Georgia, Azerbaijan and Kazakhstan should also be included, as parts of their territories lie on the European continent. I will limit this analysis to the narrower definition and focus only on countries whose territory lies fully within the continent. 

#### Data inconsistencies in handling Yugoslavian and Czechoslovakian data
This issue is explored in more depth in the Appendix. Here I would just like to acknowledge the following:
- Czechoslovakia used to be one country (the modern day Czech Republic and Slovakia). In the kaggle dataset, medals won by Czechoslovakia (NOC code: TCH) appear to be attributed entirely to the Czech Republic and none to Slovakia. 
- Similarly, following the gradual breakup of Yugoslavia into separate countries (Serbia, Slovenia, Croatia, Bosnia and Herzegovina, Montenegro, Kosovo), the kaggle dataset attributes all Yugoslavian medals to Serbia and none to the other countries, even though the athletes may have ethnically belonged to another of the Yugoslavian nations. 

These data issues have to be kept in mind when interpreting results for all the countries involved. 
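
One way to see how historical teams are folded into modern countries is to list the NOC codes behind each country label. The sketch below is only an illustration on toy rows; the column names `NOC`, `Country` and `Medal` are assumptions about the extended data file, and the rows are made up.

```python
import pandas as pd

# Toy rows standing in for the real dataset (made up for illustration).
# Column names 'NOC', 'Country' and 'Medal' are assumptions.
toy = pd.DataFrame({
    'NOC':     ['TCH', 'TCH', 'CZE', 'SVK', 'YUG', 'SRB'],
    'Country': ['Czech Republic', 'Czech Republic', 'Czech Republic',
                'Slovakia', 'Serbia', 'Serbia'],
    'Medal':   ['Gold', None, 'Silver', None, 'Bronze', 'Gold'],
})

# For every country label, list the NOC codes folded into it and
# how many medal rows each code contributes (count() skips missing medals).
noc_map = (toy.groupby(['Country', 'NOC'])['Medal']
              .count()
              .rename('Medals')
              .reset_index())
print(noc_map)
```

Run against the real data, a table like this would show at a glance which historical codes (such as TCH) end up under which modern country.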

## Roadmap 
In this notebook I will delve deeper into the following topics:
- European Olympic events frequency and location
- European teams: size / summer vs winter participation / gender composition
- 'Best' European Countries and Sports they excel in 
- Relationship between European athletes' body metrics and Olympic success 

In [482]:
# Load world data with added host continent, host country, athlete continent and athlete sub-continent 
# for detail refer to Appendix 
world = pd.read_csv('all_data/extended_data.csv')
world.drop('Unnamed: 0',axis=1,inplace=True)

## European Olympic events frequency and location
- How many countries and cities inside and outside of Europe have hosted? Which hosted more than once?  

In [508]:
# How many Olympic Games in total?
num_olymp = world.Games.nunique()
# How many times did a European country host the Olympics?
num_eur_olymp = world[world.Host_Continent=='Europe'].Games.nunique()
print('Europe hosted {} of all {} Olympic games held in modern history; this is {}%'\
      .format(num_eur_olymp, num_olymp, int(100*num_eur_olymp/num_olymp)))

Europe hosted 29 of all 51 Olympic games held in modern history; this is 56%


In [489]:
print("Let's have a look at Cities which hosted multiple Olympic Games:")
sth = world.groupby('City').apply(lambda x:x['Year'].unique()).to_frame().reset_index()
sth.columns=['City','Years']
sth['Count']=[len(c) for c in sth['Years']]
sth.sort_values('Count',ascending=False).head(10)

Let's have a look at Cities which hosted multiple Olympic Games:


Unnamed: 0,City,Years,Count
3,Athens,"[2004, 1906, 1896]",3
17,London,"[2012, 1948, 1908]",3
30,Sankt Moritz,"[1928, 1948]",2
26,Paris,"[1900, 1924]",2
37,Stockholm,"[1912, 1956]",2
18,Los Angeles,"[1932, 1984]",2
15,Lake Placid,"[1980, 1932]",2
14,Innsbruck,"[1964, 1976]",2
0,Albertville,[1992],1
27,Rio de Janeiro,[2016],1


__Note:__ 
- Only two cities, Athens and London, have hosted the Olympics 3 times. Both are in Europe. 
- Out of all places that hosted the Olympics more than once, only Los Angeles and Lake Placid are not in Europe.
- Having hosted 56% of all Olympic Games (29 of 51, as computed above), Europe clearly outshines the rest of the world. This figure excludes Russia, which would have added another Olympiad (Sochi). 
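
The hosting share can be cross-checked for every continent at once. The following is a minimal sketch on made-up rows, assuming the same `Games` and `Host_Continent` columns as in `world`; note that `round` keeps one decimal, whereas the `int(...)` used in the cell above truncates.

```python
import pandas as pd

# Toy data (made up): one row per athlete entry, as in `world`
toy = pd.DataFrame({
    'Games': ['1896 Summer', '1896 Summer', '1932 Winter',
              '2004 Summer', '2016 Summer'],
    'Host_Continent': ['Europe', 'Europe', 'North America',
                       'Europe', 'South America'],
})

# Number of distinct Games hosted per continent
games_per_continent = toy.groupby('Host_Continent')['Games'].nunique()

# Share of all distinct Games, in percent, rounded rather than truncated
share = (100 * games_per_continent / toy['Games'].nunique()).round(1)
print(share.sort_values(ascending=False))
```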

In [490]:
# make data for plot 
tmp =  world[world.Host_Continent=='Europe'].groupby(['Host_Country'])['Games'].nunique()

In [491]:
init_notebook_mode(connected=True)

trace = [go.Choropleth(locations = tmp.index,
                      locationmode='country names', 
                      z = tmp,
                      colorscale = 'Greens',                      
                      reversescale = True,
                      marker = dict( line = dict(color = 'gray',width = 0.5)),
                      colorbar = dict(title = '#Olympics hosted'),         
                      geo = 'geo2')]
layout = go.Layout(
    title = 'European Olympic Hosts', 
    geo = dict( scope = 'europe',projection = dict(type = 'mercator')),
    geo2 = dict(scope = 'europe',
        showframe = False,
        showland = True,
        landcolor = "rgb(229, 229, 229)",
        showcountries = False),
        legend = dict(traceorder = 'reversed'))

fig = dict(data = trace, layout = layout)
iplot(fig,filename='world_Choropleth')

In terms of the number of Olympic Games held, France, having hosted five, emerges as the leader in Europe and is overshadowed only by the USA. Unlike other European countries, such as the UK and Greece, France hosted the Games across multiple locations: Chamonix, Albertville, Grenoble and Paris (twice). Germany, Norway and Italy also hosted their Games across multiple locations.

In [492]:
s = world.groupby(['Host_Country'])['Games'].nunique().sort_values(ascending=False)
print('All countries who hosted more than one Olympic Game:')
print(s[s>=2])

All countries who hosted more than one Olympic Game:
Host_Country
USA            8
France         5
Japan          3
Canada         3
Germany        3
Greece         3
UK             3
Austria        2
Italy          2
Norway         2
Sweden         2
Switzerland    2
Australia      2
Name: Games, dtype: int64


## European teams 

- How big are they? How has this changed over time? <br>
    Clearly, the overall trend is for the number of athletes to increase over time, so looking at team evolution over time is perhaps not so interesting. Instead I will look at the total European team sizes and the average for Winter / Summer Olympics since 2006 (the last three Games of each season). 
- What is the ratio of men to women in the different European teams? 
    Although this is a mental leap, could this be a proxy for which European societies are the more egalitarian?
    Again, bear in mind that the overall trend is increased female participation with time.  

In [495]:
# make a df of european teams 
team_europe = world[world.Continent == 'Europe']#.groupby(['Country',])

In [496]:
# group by Year and Country. Count number of athletes 
sth = team_europe.groupby(['Year', 'Country'])['Season'].value_counts()
df4plot = pd.DataFrame(data={'Num_Athletes': sth.values}, index = sth.index).reset_index()

In [560]:
top_10_num_athletes = df4plot.drop('Season',axis=1).query('Year >=2006').groupby('Country')['Num_Athletes'].\
sum().sort_values(ascending=False)[:10]

In [None]:
# split up into recent yrs by season, for ease of calculation and plotting
df_Summer = df4plot.query('Season == "Summer" & Year >=2006')
df_Winter = df4plot.query('Season == "Winter" & Year >=2006')
df_Summer = df_Summer.groupby(['Country'])['Num_Athletes'].mean().sort_values(ascending=False)
df_Winter =  df_Winter.groupby(['Country'])['Num_Athletes'].mean()

In [561]:
print('Top 10 total European athletes participation since 2006:\n{}'.format(top_10_num_athletes))

Top 10 total athletes participation since 2006:
Country
Germany        2397
Italy          1965
France         1942
UK             1792
Ukraine        1208
Poland         1207
Spain          1159
Netherlands    1044
Sweden         1011
Switzerland     984
Name: Num_Athletes, dtype: int64


In [542]:
init_notebook_mode(connected=True)

traceS = go.Bar(x = df_Summer.index, y = round(df_Summer,1) ,name="Summer Games",
                marker=dict(color='rgb(255,180,0)',opacity=1,))  # rgb channels must be in 0-255

traceW = go.Bar(x = df_Winter.index ,y = round(df_Winter,1),name="Winter Games",
                marker=dict(color='rgb(20,200,255)'))

# the x axis holds countries (sorted by average Summer team size), not years
layout = dict(title = 'Average Number of European Athletes in Olympic Games since 2006',height=600, width=1300, 
          xaxis = dict(title = 'Country', showticklabels=True), 
          yaxis = dict(title = 'Number of athletes'),
          legend=dict(x=0.45,y=1, traceorder='normal',font=dict(color='#000'), bgcolor='#E2E2E2'))

fig = dict(data= [traceS, traceW], layout=layout)
iplot(fig, filename='Number_athletes_olympiads')

__Observe:__
- From the bar output above it can be seen that European countries which historically have won the most medals and hosted the most Olympic Games are also among those that send the most athletes to compete in the Games. The leaders in total numbers of athletes sent since 2006 are Germany, Italy, France and the UK. Somewhat unexpectedly, Ukraine and Poland follow in 5th and 6th place, just ahead of Spain. Both countries have had limited Olympic success in terms of the total number of medals won, which suggests that Poland and Ukraine are keen participants trying hard to make a name for themselves in the Olympic world. 
- Germany is the leader in athlete participation numbers in both Summer and Winter Games.
- Substantial differences exist between European countries in the number of athletes they send to Summer and Winter Games. Looking at the bar chart, sorted by Summer Games participation, the UK emerges strongly, as does Spain, followed by many other countries with a similar summer:winter participation ratio. These countries are much more involved in Summer sport disciplines, possibly because of a lack of winter sport tradition. 
- On the other hand, some European countries dominate the Winter Olympics in terms of participation: Switzerland, Norway, Austria, Finland, Slovakia, Slovenia, Latvia and Estonia all send more athletes to the Winter Games than to the Summer ones. Again, this is possibly dictated by a stronger winter sport tradition in these countries, as well as by a geographical location that gives access to training terrain and/or winter conditions.
- When it comes to the ratio of Summer:Winter Games participation, the Czech Republic and Sweden emerge as the most 'balanced' European countries, sending approximately equal numbers of athletes to both types of Games.  


- It should also be noted that the number of athletes sent to the Olympics appears to track the population size of a given country. For instance, Germany has the largest population of the countries defined as 'Europe' for this exercise, followed by France, UK, Italy, Spain, Ukraine, Poland, Romania, Netherlands, Belgium etc. [1] The ranking by team size is similar, which points to a positive relationship between Olympic participation and population size in Europe, although no correlation is computed here. 
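
The Summer:Winter balance discussed in the observations can be made explicit with a ratio per country. This is a hedged sketch: the two Series stand in for `df_Summer` and `df_Winter`, the country labels and numbers are placeholders rather than real results, and the log-based `Imbalance` measure is my own choice, not something computed in the notebook.

```python
import numpy as np
import pandas as pd

# Placeholder averages (made up); in the notebook these would be
# df_Summer and df_Winter, indexed by country.
summer = pd.Series({'A': 400.0, 'B': 50.0, 'C': 120.0}, name='Summer')
winter = pd.Series({'A': 50.0, 'B': 100.0, 'C': 110.0}, name='Winter')

ratio = pd.concat([summer, winter], axis=1)
ratio['Summer_per_Winter'] = ratio['Summer'] / ratio['Winter']

# Distance from a 1:1 ratio on a log scale, so that 2:1 and 1:2
# count as equally unbalanced.
ratio['Imbalance'] = np.log(ratio['Summer_per_Winter']).abs()

print(ratio.sort_values('Imbalance'))
```

Sorting by `Imbalance` puts the most 'balanced' teams first, while `Summer_per_Winter` below 1 flags countries that send more athletes to the Winter Games.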

<br> Note: all above observations relate to recent history, i.e. 2006 onwards. 

#### Male vs Female Athletes in Europe 

In [659]:
# group by Country. Count number of athletes and include Sex
sth = team_europe.groupby(['Country'])['Sex'].value_counts()
df4plot = pd.DataFrame(data={'Num_Athletes': sth.values}, index = sth.index).reset_index()
# Make DF with ratio of Male to Female european Athletes
df_M = df4plot.query('Sex == "M"').copy()
df_F = df4plot.query('Sex == "F"').copy()
df_M.rename(columns={'Num_Athletes':'Num_Athletes_M'},inplace=True)
df_F.rename(columns={'Num_Athletes':'Num_Athletes_F'},inplace=True)
df_M.drop('Sex',axis=1,inplace=True)
df_F.drop('Sex',axis=1,inplace=True)
df = df_M.merge(df_F, on='Country' )
df['M:F_ratio']= df.Num_Athletes_M / df.Num_Athletes_F

In [686]:
trace = go.Bar(x = df.Country , y = df['M:F_ratio'],name="M:F_ratio",
                    marker=dict( colorscale="Rainbow"),  opacity=0.7)
layout = dict(title = ' Female vs Male European Athletes in the Olympic Games',height=600, width=1000, 
          xaxis = dict(title =  'Country', showticklabels=True,  tickangle = 45, tickfont=dict(size=10)), 
          yaxis = dict(title =  'Number of Males per 1 Female' ),
          legend=dict(x=0.45,y=1, traceorder='normal',font=dict(color='#000'), bgcolor='#E2E2E2'))

fig = dict(data= [trace], layout=layout)
iplot(fig, filename='M:F_ratio_athletes_olympiads')

__Observe:__ 
- Luxembourg, Monaco and San Marino stand out as the European countries with the most male athletes for every female athlete. It should be noted that these countries are all very small and send few athletes to the Olympics, so their ratios may be elevated simply by chance. Other, much larger countries, such as Portugal, Finland, Belgium and Serbia, are all above the 4:1 ratio. These countries send many more athletes to the Olympics, so overall they are the ones that have historically sent the fewest women. This result is unfair on modern day Serbia, since this data category also includes all Yugoslavian countries of the past, when women's participation was generally lower. 
- Kosovo emerges as the country with the lowest M:F ratio, but this is also because it is the 'youngest' European country; unlike other countries in the plot, it has no past statistics to push the ratio higher. Additionally, Kosovo sends very few athletes. Other countries with the lowest M:F ratios are all relative 'newcomers' that started participating in the Games after 1992, by which time women were much more welcome to participate, so the statistics for these countries are naturally favourable. These countries are Ukraine, Albania, Belarus, Slovakia, Slovenia, Montenegro, and Bosnia and Herzegovina.
- How about countries which have participated since the early 1900s and have a low M:F ratio? There are a few, among them Romania, the Netherlands, Germany and Poland. These nations may be the more egalitarian ones, at least judging by their choice of Olympians.
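
Because newcomers only have recent Games in their history, a fairer comparison would split the ratio by era. Below is a minimal sketch on made-up rows, assuming the `Country`, `Year` and `Sex` columns of `team_europe`; the 1992 cut-off follows the observation above and the `Era` labels are my own.

```python
import pandas as pd

# Toy athlete entries (made up for illustration)
toy = pd.DataFrame({
    'Country': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Year':    [1936, 1936, 1936, 2008, 2008, 2008, 2008, 2008],
    'Sex':     ['M', 'M', 'F', 'M', 'F', 'M', 'M', 'F'],
})

# Label each entry by era so long-standing and new teams are compared
# over the same period.
toy['Era'] = toy['Year'].apply(
    lambda y: 'from 1992' if y >= 1992 else 'before 1992')

# Count men and women per country and era, then take the ratio.
# A country/era with no women would give an infinite ratio.
counts = (toy.groupby(['Country', 'Era', 'Sex'])
             .size()
             .unstack('Sex', fill_value=0))
counts['M:F_ratio'] = counts['M'] / counts['F']
print(counts)
```

In this toy case country X looks male-dominated before 1992 but is balanced afterwards, which is the kind of effect the like-for-like comparison is meant to expose.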

## 'Best' European Countries by Medal Count and Sport


###  Medal count 
From notebook two, looking at the top countries by number of medals won, the top European countries emerged as: 
Germany, UK, France, Italy, Sweden, Hungary, Netherlands, Norway, Finland. Thus, 9 European countries made it into the top 15 in the world; that is 60%! Which other European countries also do well? The plot of Europe's top 15 by medal count is included below to help answer this question. 

One should also note that there are five European countries which have never won a medal: Bosnia and Herzegovina, Albania, Andorra, San Marino and Malta. Remarkably, Kosovo, the 'youngest' European country, has won one medal, in Judo.



In [610]:
# create df with a medal count by type of medal and country
medals = pd.DataFrame(team_europe.query('Year>2000').groupby('Medal')['Country'].value_counts())\
                .rename(columns={'Country':'Medal_Count'}).reset_index() 
    
# make pivot table with country as rows and medal type as columns, fill missing values with zero (no medals)
medals_piv = medals.pivot(index='Country',columns='Medal').fillna(0)

# add total medals column and total overall (number of athletes participating)
medals_piv['Total_Medals'] = medals_piv.Medal_Count.Bronze + medals_piv.Medal_Count.Silver \
                            + medals_piv.Medal_Count.Gold
medals_piv['Total_Athletes'] = medals_piv.Medal_Count.sum(axis=1)

# add column for total medals won as a % of total athlete participants 
medals_piv['%medalists'] = round(100* medals_piv['Total_Medals']/medals_piv['Total_Athletes'],1)

# sort data frame in descending order using the total medals column
medals_piv = medals_piv.sort_values(by='Total_Medals',ascending=False)

In [611]:
top15 = medals_piv.head(15)  # x and y must cover the same 15 countries

traceG = go.Bar(x = top15.index, y = top15.Medal_Count.Gold.values, name="Gold",
               marker=dict( color='rgb(212,175,55)', opacity=0.7, reversescale = True))

traceS = go.Bar(x = top15.index, y = top15.Medal_Count.Silver.values, name="Silver",
               marker=dict( color='rgb(192,192,192)', opacity=1, reversescale = True))

traceB = go.Bar(x = top15.index, y = top15.Medal_Count.Bronze.values, name="Bronze",
               marker=dict( color='rgb(128,0,0)', opacity=0.5, reversescale = True))

trace_perc = go.Scatter(x= top15.index , y = top15['%medalists'].values,name="%_Medalists",  
                        yaxis='y2',marker=dict(color="Blue"),mode = "markers")

layout = go.Layout(title='Top 15 European Countries by Medals Won; in absolute and % terms', 
                   yaxis = dict(title = 'Number of Medals'),
                   yaxis2 = dict(title='% Medals won out of all attempts', anchor='x',overlaying='y',side='right'),
                   legend=dict( x=0.82,y=1, traceorder='normal',font=dict(color='#000'), bgcolor='#F2E2E2',
                               bordercolor='#FFFFFF',borderwidth=2), 
                   barmode='group')

fig = go.Figure(data = [traceG,traceS,traceB,trace_perc], layout = layout)
iplot(fig, filename = "medal")  

###  Sport disciplines in which European nations excel
Look at the total number of medals won by country and sport to understand where different European countries compete and where they are strongest.

In [580]:
# create df with a medal count by type of medal and country
medals = pd.DataFrame(team_europe.groupby(['Country','Sport','Season'])['Medal'].value_counts())\
                .rename(columns={'Medal':'Medal_Count'})
    
# make pivot table, fill missing values with zero (no medals)
medals_piv = medals.pivot_table(values='Medal_Count',index=['Country','Sport','Season'],columns='Medal').fillna(0)

# add total medals column (Bronze + Silver + Gold)
medals_piv['Total_Medals'] = medals_piv.Bronze + medals_piv.Silver  + medals_piv.Gold

In [None]:
C_S = medals_piv.reset_index().copy()
Summer = C_S.query('Season == "Summer"')
Winter = C_S.query('Season == "Winter"')

In [521]:
from plotly import tools

trace_Summer = go.Heatmap(z = Summer.Total_Medals, y = list(Summer.Sport) , x = list(Summer.Country), name = 'Summer Games',
                   colorscale = 'Viridis', reversescale = True, colorbar = dict(title = '#Total Medals',x = 0.43))

trace_Winter = go.Heatmap(z = Winter.Total_Medals, y = list(Winter.Sport) , x = list(Winter.Country), name = 'Winter Games',
                   colorscale = 'Viridis', reversescale = True, colorbar = dict(title = '#Total Medals'))

fig = tools.make_subplots(rows=1, cols=2, print_grid=False, subplot_titles=('Summer Games', 'Winter Games'))

fig.append_trace(trace_Summer, 1, 1)
fig.append_trace(trace_Winter, 1, 2)

fig['layout']['xaxis1'].update(title='Country',  domain=[0.025, 0.425], tickangle = 45, tickfont=dict(size=9))
fig['layout']['xaxis2'].update(title='Country', domain=[0.585, 1], tickangle = 45, tickfont=dict(size=8))

fig['layout']['yaxis1'].update(title='Sport', tickangle = 345, tickfont=dict(size=8))
fig['layout']['yaxis2'].update(title='Sport',tickangle = 345, tickfont=dict(size=9))
                          
fig['layout'].update(height=800, width=1600, 
                     title='European Medals by Country and Sport')

iplot(fig, filename='euro_heatmap')

__Observe:__

Summer Olympics: 
- At first glance, it can be seen that some countries only participate in a few sports, while others cover a wide range of disciplines. Some countries are 'medium' performers across the board and have no sports that 'stand out', while others are strong in a few to many disciplines. 
- Germany is the clear leader in terms of the number of medals won across a variety of disciplines. Overall, it has beaten other European countries in 18 out of the 63 Sports. Germany is the strongest at Rowing, Swimming, Athletics and many more (see the plot).
- To name a few other observations that stand out: the UK is best at Cycling and Sailing. Italy excels in Fencing, followed closely by France. Sweden excels in Ice Hockey, while the Netherlands is the strongest in Europe in Hockey. Hungary dominates Water Polo.

Winter Olympics: 
- As expected, at first glance Germany emerges as the strongest. Italy and the UK are strong on participation; they have won medals in all but one Winter Olympic discipline. 
- Ice Hockey sees fierce competition among 3 European countries, but Sweden takes the lead with 217 medals.
- The Netherlands excels at speed skating, which is considered one of its national sports. 
- Norway is the best at Cross-country skiing, while Austria dominates Alpine skiing. 

Conclusion: <br>
Looking at the top countries by medal count (Germany, UK, France, Italy, Netherlands etc.) and the sports they excel in, these countries have few strongest disciplines in common. Perhaps this is why they are all at the top: each competes and wins in its own area of expertise.

__Note:__ Countries which have been participating in the Olympics the longest are favoured by the above analysis, as it looks at total numbers of medals throughout history.
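
One possible way to reduce the advantage of long-standing participants is to divide medal totals by the number of Games a country attended. The sketch below uses made-up rows and assumes the `Country`, `Games` and `Medal` columns of `team_europe`; as in the rest of the notebook, medals are counted per athlete row, so team medals count once per team member.

```python
import pandas as pd

# Toy athlete entries (made up); Medal is missing when no medal was won
toy = pd.DataFrame({
    'Country': ['Old', 'Old', 'Old', 'Old', 'New', 'New'],
    'Games':   ['1920 Summer', '1960 Summer', '2000 Summer', '2016 Summer',
                '2000 Summer', '2016 Summer'],
    'Medal':   ['Gold', None, 'Silver', 'Bronze', 'Gold', 'Silver'],
})

# Medal rows and distinct Games attended per country
per_country = toy.groupby('Country').agg(
    Total_Medals=('Medal', 'count'),      # count() skips missing medals
    Games_Attended=('Games', 'nunique'),
)

# Average number of medal rows per Games attended
per_country['Medals_per_Games'] = (
    per_country['Total_Medals'] / per_country['Games_Attended'])
print(per_country.sort_values('Medals_per_Games', ascending=False))
```

Here 'Old' has more medals in total, yet 'New' wins more per Games attended, which is the distortion the note warns about.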

In [524]:
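# best country in Europe at each of 63 unique Olympic sports:
# sort by medal count first so that keep='first' retains the top country per sport
best = C_S.sort_values('Total_Medals', ascending=False)\
          .drop_duplicates(subset='Sport', keep='first').query('Total_Medals>0') # 16 unique countries 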
 
print('Throughout history, {} European Countries won the maximum amount of medals in all the {} Olympic Sport disciplines.\
\nThese Countries are listed below, together with the count of sports they excelled in:'.\
      format(best.Country.nunique(), best.Sport.nunique()))
best.Country.value_counts() 

Throughout history, 16 European Countries won the maximum amount of medals in all the 63 Olympic Sport disciplines.
These Countries are listed below, together with the count of sports they exceled in:


Germany        18
UK             16
France          5
Sweden          4
Italy           3
Austria         2
Norway          2
Netherlands     2
Switzerland     2
Hungary         2
Spain           2
Finland         1
Belarus         1
Belgium         1
Serbia          1
Denmark         1
Name: Country, dtype: int64

In [477]:
assert best.Country.value_counts().sum() == best.Sport.nunique() # check each sport has exactly one best country

In [533]:
best4plot = best[best['Country'] == 'Germany']
trace0 = go.Bar(
    x= best4plot.Sport,
    y= best4plot.Total_Medals,
    marker=dict(color='rgb(158,202,225)',line=dict(color='rgb(8,48,107)',width=1.5)),
    opacity=0.6)

layout = go.Layout(title='Germany, Sports with Highest Medal Count in Europe',
                   xaxis = dict(title = 'Sport'),
                   yaxis = dict(title = 'Number of Medals'),)

fig = go.Figure(data=[trace0], layout=layout)
iplot(fig, filename='german_medals')

## European athletes' body metrics and Olympic success 

In [600]:
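# select several columns with a list; indexing a groupby with a bare tuple fails in recent pandas
df1 = team_europe.groupby(['Country'])[['Height', 'Weight']].agg('mean').dropna().reset_index()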
df2 = team_europe.groupby(['Country'])['ID'].count().reset_index() 
df4plot = df1.merge(df2)

In [604]:
hover_text = []
for index, row in df4plot.iterrows():
    hover_text.append(( 'Country: {}<br>'+
                        'Mean Height: {}<br>'+
                        'Mean Weight: {}<br>'+
                        'Number of athletes: {}<br>').format(row['Country'],
                                                            round(row['Height'],2),
                                                            round(row['Weight'],2), 
                                                            row['ID']))
df4plot['hover_text'] = hover_text

In [605]:
data = []
for country in df4plot['Country']:
    ds = df4plot[df4plot['Country']==country]
    trace = go.Scatter(x = ds['Height'],y = ds['Weight'], name = country,  text = ds['hover_text'],
                       marker = dict(symbol='circle',
                                   sizemode ='area',
                                   sizeref = 7,
                                   size = ds['ID'],))
    data.append(trace)
                         
layout = go.Layout(title='Athletes average body metrics by country',
                   xaxis=dict(title ='Height (cm)'),
                   yaxis=dict(title ='Weight (kg)'),
                   showlegend=False,)

fig = dict(data = data, layout = layout)

iplot(fig, filename='Ave_body_metrics_by_country')

Most European countries participate in a wide variety of sports, which require different body characteristics, hence little can be said for the countries in the middle of the graph, which are simply 'average'. However, the countries located on the left and right extremes of the plot are more interesting: are these results due to certain nations having certain body characteristics, or are they dictated by the types of sport these countries excel in? 

Going back to the discoveries made in the previous notebooks, let's recall the general rule of thumb for many of the Olympic disciplines: 'the taller and lighter the better'. This is the case in Swimming, Canoeing, Rowing, Water Polo, Basketball and Volleyball, to name a few. At the other extreme, in some sports the combination of shortness and lightness is important: Gymnastics, Diving and Figure Skating all favour the 'smaller and lighter' body types, while wrestlers tend to be short but heavy. 

__Fun fact:__ The world's 10 tallest nations are all in Europe! [2] These are: <br>
1) Netherlands - 1.838m, 2) Montenegro - 1.832m, 3) Denmark - 1.826m, 4) Norway - 1.824m, 5) Serbia - 1.82m, 6) Germany - 1.81m, 7) Croatia - 1.805m, 8) Czech Republic - 1.8031m, 9) Slovenia - 1.803m, 10) Luxembourg - 1.799m. 
Therefore, it comes as no surprise that many of the above listed nations are 'above average' on the graph. Lithuania, Iceland, Latvia and Slovakia do not make it to the top 10, but their populations are all relatively tall (1.77m and above).
On a global scale, no European country makes it to the top 10 shortest nations. Populations of countries on the left side of the graph, such as Romania, Bulgaria, Ukraine, Andorra etc., all have an average height of about 1.75m, just like many other European countries [2]. Thus it is likely that countries in the left corner choose their athletes to target Olympic excellence in certain sports, while countries on the right side use their natural endowment to excel at Olympic disciplines which favour the tall. For instance, Serbia does well in Handball, Basketball and Water Polo, all of which favour the tall. Sweden, Finland and the Czech Republic dominate Ice Hockey, another discipline rewarding height.  

Closer look at Countries on the Left:<br>
Romanian and Bulgarian athletes are on average the smallest and lightest of the countries considered. Remarkably, Bulgaria scores well in Volleyball and Basketball, where height is known to be a significant contributor to success. It is also strong in Gymnastics and Wrestling, which tend to favour shorter athletes. Romania is very strong in Gymnastics, which again is in line with body metrics. Romania is also strong in Rowing, Canoeing and Handball, which all favour the tall. These results are a mixed bag: while it is true that Romania and Bulgaria do very well in Gymnastics, Wrestling and Boxing, so do Norway and Sweden. Therefore it cannot be said with certainty that countries on the left side of the graph target sports for the 'shorter and/or heavier'. 
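
To separate national stature from the mix of sports a country enters, average height can be compared within each sport rather than overall. This is only a sketch on made-up rows, assuming the `Country`, `Sport` and `Height` columns of `team_europe`.

```python
import pandas as pd

# Toy athlete entries (made up): both countries have athletes of the same
# height in each sport, but enter the sports in different proportions.
toy = pd.DataFrame({
    'Country': ['P', 'P', 'P', 'Q', 'Q', 'Q'],
    'Sport':   ['Gymnastics', 'Gymnastics', 'Basketball',
                'Gymnastics', 'Basketball', 'Basketball'],
    'Height':  [160.0, 162.0, 200.0, 161.0, 199.0, 201.0],
})

# Overall mean height per country mixes stature with sport composition
overall = toy.groupby('Country')['Height'].mean()

# Mean height per sport and country removes the composition effect
by_sport = (toy.groupby(['Sport', 'Country'])['Height']
               .mean()
               .unstack('Country'))

print(overall)
print(by_sport)
```

In the toy data country Q looks 13 cm taller overall, yet within each sport the two countries are identical, so the whole gap comes from the choice of sports.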

## Conclusion 

This analysis could go on forever. As with any good piece of investigative data science, it is sometimes hard to know when to stop. Thus, below I include some questions which I would still like to answer. To summarise the key findings from this analysis:
- Europe has hosted 56% of all Olympic Games. Unlike the rest of the world, many European countries and cities have hosted more than once.  
- Germany, UK, France and Italy are leaders in both the number of athletes sent to the Olympics and overall medals won. These countries excel in a variety of sports; together they are the best in Europe, in terms of absolute medal count, at 42 out of the 63 Olympic disciplines.
- Europe is home to all of the world's 10 tallest nations. Since being tall and light has been shown to be an advantage in many Olympic disciplines, these countries can make use of their natural endowment when selecting athletes. 


#### Further work:
- Patriotic look at Poland and Ukraine:
These two countries have been sending many athletes to the Olympics since 2006. They have participated across a wide range of sports with limited success. Ukraine only started participating in the Olympics in 1994. Poland has been intermittently present since 1912. However, due to its complex history it was not able to excel like some of the European countries which dominated this analysis. I would like to have a closer look at Polish history painted through the Olympic data. I would also like to compare Poland, Ukraine and other countries which became democratic in the early 1990s on a like for like basis with their other European counterparts (i.e. taking into account only history after 1992).
- Is there a relationship between being a host country and the number of medals won?
- I would also like to look at possible links between economic development, population size and the number of medals won.  

Sources:

- [1] European Population data https://en.wikipedia.org/wiki/List_of_European_countries_by_population
- [2] The Telegraph *Mapped: The world's tallest (and shortest) countries*: https://www.telegraph.co.uk/travel/maps-and-graphics/the-tallest-and-shortest-countries-in-the-world/