### Import libraries
First we import the modules we need.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
from scipy.stats import variation
%matplotlib inline
import seaborn as sns
sns.set()

### Variables
Let us create some variables that will save us typing. I know which ones I need since this is not the first version of the notebook :).

In [None]:
listOfMedals = ['Gold','Silver','Bronze']
listOfMedalsPointsSum = ['Gold','Silver','Bronze','Medal_pts','Medals']
groupingOfMedals = ['Year','Season','Team']

### Import the data
Import the data about athletes taking part in events. An event can be a basketball tournament or the 400m run. The dataset covers the Games up to the year 2016.

In [None]:
#athletes = pd.read_csv("athlete_events.csv")
athletes = pd.read_csv("../input/120-years-of-olympic-history-athletes-and-results/athlete_events.csv")
athletes.head(10)
#import os
#print(os.listdir("../input/")) # there will be subfolder

### Create dummy features for different medals
To make later manipulation easier we turn the text field Medal (Gold, Bronze, Silver, NaN) into separate features. Since not all athletes won medals, "no medal" is a real possibility, and get_dummies represents it as a row with 0 in all three medal columns. Thus we don't have to remove any of the dummy features.

In [None]:
athletes = pd.concat([athletes, pd.get_dummies(athletes['Medal'])], axis = 1)
athletes = athletes.drop('Medal', axis = 1)
athletes.head()
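
To see why no dummy column has to be dropped, here is a small sketch on made-up rows (the names and medals below are invented, not taken from the dataset). By default get_dummies creates no column for NaN, so an athlete without a medal simply gets zeros everywhere.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for athlete_events.csv
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Medal": ["Gold", np.nan, "Bronze", np.nan],
})

# NaN gets no column of its own unless dummy_na=True is passed
dummies = pd.get_dummies(toy["Medal"], dtype=int)
toy = pd.concat([toy.drop(columns="Medal"), dummies], axis=1)
print(toy)
```

Athletes B and D end up with 0 in every medal column, which is exactly the "no medal" case.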

### Let's see what the dataset looks like
We check describe() and info().

In [None]:
athletes.describe()

In [None]:
athletes.info()

### Remove data
So many things changed in the 1990s that it is hard to follow achievements and compare results from before and after. Thus, we keep only the Games from 1994 onwards.

In [None]:
# .copy() so that later assignments to athletes1994 do not raise SettingWithCopyWarning
athletes1994 = athletes[athletes['Year']>1993].copy()

In [None]:
athletes1994.head()

### Group athletes into teams
We are not interested in the performance of individual athletes but of teams (e.g. the China basketball team). Therefore we group athletes from the same country into teams by grouping on year, season, team and event, and we sum the medals. If we left it at that, medals in team sports would be counted once per player. We actually want the number of medals per event, so we change every medal count greater than 0 to 1. After that we reset the index to turn the result back into a regular dataframe.
Since I know I will do this more than once, I'll make a function out of it (I know because I've already done this :) )

In [None]:
# Written as a function because it is used several times in this notebook.
def prepareMedals(basicData, listOfMedals):
    # Sum medals per team and event, then cap each medal type at 1 per event
    medalsDF = basicData.groupby(['Year','Season','Team','Event'])[listOfMedals].sum()
    for m in listOfMedals:
        medalsDF.loc[medalsDF[m] > 0, m] = 1
    medalsDF.reset_index(inplace = True )
    return medalsDF

In [None]:
medals = prepareMedals(athletes1994, listOfMedals)
medals.head(25)
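
A quick sketch of why the capping matters, on made-up rows (three invented players of one basketball team, not real dataset rows). I use clip(upper=1) here, which for non-negative counts should give the same result as setting every value above 0 to 1.

```python
import pandas as pd

# Hypothetical rows: three players of one basketball team, each with a gold medal
toy = pd.DataFrame({
    "Year": [2016] * 3,
    "Season": ["Summer"] * 3,
    "Team": ["United States"] * 3,
    "Event": ["Basketball Men's Basketball"] * 3,
    "Gold": [1, 1, 1],
    "Silver": [0, 0, 0],
    "Bronze": [0, 0, 0],
})
medal_cols = ["Gold", "Silver", "Bronze"]

per_event = toy.groupby(["Year", "Season", "Team", "Event"])[medal_cols].sum()
raw_gold = int(per_event["Gold"].iloc[0])       # one gold per player
capped = per_event.clip(upper=1).reset_index()  # one gold per event
print(raw_gold, int(capped["Gold"].iloc[0]))
```

Without the cap the team would be credited with three golds for a single tournament.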

### Group again
Now we group again. This time there are no more athletes and no more events, only teams. This dataframe shows how many medals each team won at each Games.

In [None]:
medalsTeams = medals.groupby(groupingOfMedals)[listOfMedals].sum()
print(medalsTeams.head(5))

### Some of the teams have more than one version.
E.g. there is Austria, but also Austria-1 and Austria-2. These three should be merged. We do this by first creating a list of the teams with "-" in their name. We create this list from the filtered dataframe athletes1994.

In [None]:
# this is list of all the "duplicated" teams (Like "Austria-1")
the_list = athletes1994['Team'][athletes1994['Team'].str.contains("-")].unique() 
display(the_list)

### Remove the last 2 characters from the team names.
Now we take the list and look up every team from it in the dataframe of athletes. For those rows we remove the last 2 characters of the team name.

In [None]:
for i in the_list:
    # we go back to initial list athletes and remove last 2 chars if the name of the team is in the_list.
    athletes1994.loc[athletes1994['Team']==i,'Team']=i[:-2]
for i in the_list:
    # this is actually optional since this is the first list with old records.
    athletes.loc[athletes['Team']==i,'Team']=i[:-2]
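
One thing to watch: the loop above shortens every team name that contains a hyphen, so a name where the hyphen is part of the country name would lose its last two characters too. It is worth checking the_list for such names. A stricter alternative, sketched on invented sample names (not the author's method), removes only a trailing "-digits" suffix:

```python
import pandas as pd

# Hypothetical team names; "Guinea-Bissau" has a hyphen that belongs to the name
teams = pd.Series(["Austria", "Austria-1", "Austria-2", "Guinea-Bissau"])

# Remove only a trailing "-<digits>" suffix and leave other hyphens alone
cleaned = teams.str.replace(r"-\d+$", "", regex=True)
print(cleaned.tolist())
```

On the real data the same call could be applied to the Team column directly, which would also replace the explicit loop.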

In [None]:
# this is repeated code, just for checking if we merged the teams.
medals = prepareMedals(athletes1994, listOfMedals)
medalsTeams = medals.groupby(groupingOfMedals)[listOfMedals].sum()
medalsTeams.reset_index(inplace = True)
print(medalsTeams.head(20))

In [None]:
medals.to_csv('medalje.csv')
medalsTeams.to_csv('medalsTeam.csv')

### Want to see who the best is
We add the sum of medals and a weighted sum. We need these to say who the best is :). The weighted sum is simple: multiply the number of gold medals by 3 and silver by 2, then add the bronze medals.

In [None]:
medalsTeams['Medal_pts'] = (3*medalsTeams['Gold'])+(2*medalsTeams['Silver'])+medalsTeams['Bronze']
medalsTeams['Medals'] = medalsTeams['Gold']+medalsTeams['Silver']+medalsTeams['Bronze']
medalsTeams.head(10)

### Now we sum up based on teams 
We sum up all the medals that a team won between 1994 and 2016.

In [None]:
medalsTeamsTotals = medalsTeams.groupby(['Team'])[listOfMedalsPointsSum].sum()
medalsTeamsTotals.reset_index(inplace = True)
medalsTeamsTotals.head(10)

In [None]:
f, (ax1, ax2, ax3, ax4, ax5) = plt.subplots(5, 1, figsize=(15, 20), sharex=False)

#The first graph is who has the most medal points (weighted sum of medals)
sns.barplot(data=medalsTeamsTotals.sort_values(by='Medal_pts', ascending = False).head(10), x='Team', y='Medal_pts', palette="rocket", ax=ax1)
ax1.axhline(0, color="k", clip_on=True)
ax1.set_ylabel("Medal points")
#The second graph is who has the most medals 
sns.barplot(data=medalsTeamsTotals.sort_values(by='Medals', ascending = False).head(10), x='Team', y='Medals', palette="rocket", ax=ax2)
ax2.axhline(0, color="k", clip_on=True)
ax2.set_ylabel("Medals")
#The third graph is who has the most gold medals 
sns.barplot(data=medalsTeamsTotals.sort_values(by='Gold', ascending = False).head(10), x='Team', y='Gold', palette="rocket", ax=ax3)
ax3.axhline(0, color="k", clip_on=True)
ax3.set_ylabel("Gold")
#The fourth graph is who has the most silver medals 
sns.barplot(data=medalsTeamsTotals.sort_values(by='Silver', ascending = False).head(10), x='Team', y='Silver', palette="rocket", ax=ax4)
ax4.axhline(0, color="k", clip_on=True)
ax4.set_ylabel("Silver")
#The fifth graph is who has the most bronze medals 
sns.barplot(data=medalsTeamsTotals.sort_values(by='Bronze', ascending = False).head(10), x='Team', y='Bronze', palette="rocket", ax=ax5)
ax5.axhline(0, color="k", clip_on=True)
ax5.set_ylabel("Bronze")


### So the United States is the best?
Well, based on the weighted sum of medals and on the number of gold medals, the United States is the best.

### What year was the best?
OK, so we know it is the United States that is THE best. But which year was the best for each country?

First we create a list of countries, then we loop through the list and pick each country's best year based on the weighted sum of medals.

In [None]:
listOfCountries = medalsTeams['Team'].unique()
bestyears = pd.DataFrame(columns=['Team','Year','Medal_pts'])
for country in listOfCountries:
    temp = medalsTeams.loc[medalsTeams['Team']==country].sort_values(by='Medal_pts', ascending = False).head(1)[['Team','Year','Medal_pts']]
    frames = [bestyears, temp]
    bestyears = pd.concat(frames)
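
The loop works, but the same "best row per team" can also be picked without a loop. Below is a sketch with groupby and idxmax on made-up numbers (the medal points are invented). It is an alternative, not what the notebook uses, and for ties it keeps the first occurrence, which may differ from what the sort in the loop keeps.

```python
import pandas as pd

# Hypothetical medal points per team and year
toy = pd.DataFrame({
    "Team": ["Croatia", "Croatia", "Norway", "Norway"],
    "Year": [2012, 2016, 1994, 1998],
    "Medal_pts": [14, 23, 52, 49],
})

# idxmax returns, per team, the row label of the highest Medal_pts
best_rows = toy.groupby("Team")["Medal_pts"].idxmax()
best = toy.loc[best_rows, ["Team", "Year", "Medal_pts"]].reset_index(drop=True)
print(best)
```

A side effect is that Medal_pts keeps its numeric dtype, so no conversion from object is needed afterwards.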

### We do not want to show countries that have too few medals.
That is not because we do not want to show them, but because the graph gets crowded. So we remove those that had 5 or fewer medal points in their best year. (My Croatia is still in the list :) ). Then we draw a bubble plot where the size of the bubbles is defined by the weighted sum of medals.

In [None]:
bestyears = bestyears.loc[bestyears['Medal_pts']>5].copy()
# Medal_pts ended up with dtype object, so we convert it to float.
# (np.float was removed from NumPy; the builtin float does the same job.)
bestyears['Medal_pts'] = bestyears['Medal_pts'].astype(float)
g, (ax1) = plt.subplots(1, 1, figsize=(20, 5))
sns.scatterplot(data = bestyears, x = 'Team', y = 'Year', size ='Medal_pts', sizes=(5,1000) , hue ='Medal_pts' ,palette="coolwarm", ax=ax1)
ax1.axhline(0, color="k", clip_on=True)
ax1.set(ylim=(1992, 2018))
ax1.set_ylabel("Year")
for item in ax1.get_xticklabels():
    item.set_rotation(90)

### Is number of medals enough?
Probably not. We need to see how many events there were at each Games to know the maximum number of medals that could be won. Right?

In [None]:
EventsPerGames=pd.DataFrame(athletes1994.groupby(['Year','Season'])['Event'].nunique())
EventsPerGames.columns=['Events']
EventsPerGames.reset_index(inplace = True)
g, (ax1) = plt.subplots(1, 1, figsize=(20, 5))
sns.lineplot(data=EventsPerGames,x='Year',y='Events', hue='Season', ax=ax1)

### We cannot group all games since there is a difference between Winter and Summer games
Since there is a big difference in the number of events between the Summer and Winter Olympic Games, we need to look at the "best years" per season.

### First we do the winter games and then repeat everything.
But first let us create functions that will do it all for us :). The new function could replace the code we used to find the best years regardless of season (winter or summer).

In [None]:
#dataFrame: the data to search; passed in so that other dataframes can be used too
#season: Winter / Summer / All
#minimal: rows with Medal_pts not above "minimal" are dropped before the best year is picked
def returnDFOfBestYears(dataFrame, season = 'All', minimal = 5, value = 'Medal_pts', columns = ['Team','Year','Medal_pts'], sort_ascending = False):
    dataFrame=dataFrame[dataFrame['Medal_pts']>minimal]
    #Make sure season is with capital first letter
    season = season[0].upper() + season.lower()[1:]
    # define columns that will be used in all cases (a bit less typing)
    if season == 'All':
        season = dataFrame['Season'].unique()
    else:
        season = [season]
    #create empty dataframe
    bestyearsTemp = pd.DataFrame(columns=columns)
    #list of countries already exists
    for country in listOfCountries:
        temp = dataFrame.loc[(dataFrame['Team']==country) & (dataFrame['Season'].isin(season))].sort_values(by=value, ascending = sort_ascending).head(1)[columns]
        bestyearsTemp = pd.concat([bestyearsTemp, temp])
    #the column ends up with dtype object, so we convert it to float
    #(np.float was removed from NumPy; the builtin float does the same job)
    bestyearsTemp[value] = bestyearsTemp[value].astype(float)
    return bestyearsTemp

#dataFrame: what we draw
#season: doubles as the palette name, since I like it when Winter Games use the winter palette ;)
#value: the column to plot, since we might draw graphs on different values :)
#textDivisor: the labels show value / textDivisor to avoid big numbers (as in the case of GDP)
def drawGraphBestYears(dataFrame, season = 'all', value = 'Medal_pts', textDivisor = 1):
    season = season.lower()
    if season == 'all':
        season = 'coolwarm'
    g, (ax1) = plt.subplots(1, 1, figsize=(20, 8))
    # color palette for the graph "winter" :)
    sns.scatterplot(
                    data = dataFrame
                    , x = 'Team'
                    , y = 'Year'
                    , size = value
                    , sizes=(5,750)
                    , hue = value
                    , alpha=0.7
                    , edgecolors="red"
                    , palette=season
                    , legend = False
                    , ax=ax1)
    ax1.axhline(0, color="k", clip_on=True)
    # lets define max and min of y axis based on actual data, and set ticks on actual data
    yearsTicks = dataFrame['Year'].unique().tolist()
    yearsTicks.sort()
    ax1.set(ylim=(yearsTicks[0]-1, yearsTicks[-1]+1),yticks = (yearsTicks))
    ax1.set_ylabel("Year")
    ax1.set_title("When a team had the best results in " + value, pad = 25, loc = "left" )
    i = 0
    prevYear = 0
    for index, row in dataFrame.iterrows():
        #to avoid overlapping of data labels
        if prevYear == row['Year']:
            addition = 2.5
        else:
            addition = 1.5
        #withdash was removed from matplotlib's Axes.text, so it is no longer passed
        ax1.text(i, row['Year']+addition, round(row[value]/textDivisor,2), color='black')
        i += 1
        prevYear = row['Year']
    for item in ax1.get_xticklabels():
        item.set_rotation(90)

### Now the graph
The following code could be written differently, but to make it easier to follow I kept it in two lines (I could also create a new function that calls those two, or even make drawGraphBestYears call returnDFOfBestYears).

In [None]:
bestyearsW = returnDFOfBestYears(medalsTeams, "winter")
drawGraphBestYears(bestyearsW, 'winter')

### Now summer!

In [None]:
bestyearsS = returnDFOfBestYears(medalsTeams, "summer")
drawGraphBestYears(bestyearsS, 'summer')

### Looking at the graphs we see one interesting thing:
There are far fewer countries with more than 5 medal points in their best year at the Winter Olympics than at the Summer Olympics. That is normal, since there are fewer events. It would be interesting to see which countries are on one list, the other, or both. This requires a slightly different approach. First we create two separate dataframes from the lists of teams with more than 5 medal points in summer and in winter. Then we add a column 'Game' to both, with value 1 for summer and 2 for winter. Then we concatenate the two dataframes and sum the 'Game' column per team. In the new dataframe a 'Game' value of 1 means the team passed the threshold only in summer, 2 means only in winter, and 3 means both. Finally we draw the graph with the y tick labels changed to reflect this logic.

In [None]:
# First create a dataframe with two columns: Team - list of teams, and Game - value 1 for summer games
summerCountries = pd.DataFrame(columns=['Team','Game'])
summerCountries['Team'] = bestyearsS['Team']
summerCountries['Game'] = 1
# Then create the same dataframe for winter games, with Game - value 2
winterCountries = pd.DataFrame(columns=['Team','Game'])
winterCountries['Team'] = bestyearsW['Team']
winterCountries['Game'] = 2
#now we concatenate the two dataframes
wsCountries = pd.concat([winterCountries, summerCountries])
#and now we group per team and sum the values of Game
#If a team had more than 5 medal points only in summer, Game is 1; only in winter,
#Game is 2; and if it had more than 5 medal points in both summer and winter,
#Game is 3.
wsCountries = wsCountries.groupby(['Team'])['Game'].sum().to_frame()
wsCountries.reset_index(inplace = True)
g, (ax1) = plt.subplots(1, 1, figsize=(20, 7))
sns.scatterplot(data = wsCountries, x = 'Team', y = 'Game', s=1000, ax=ax1, cmap="Blues", alpha=0.4, edgecolors="gray", linewidth=2)
ax1.axhline(0.6, color="k", clip_on=True)
ax1.set(ylim=(0.6, 3.4),yticks = (1,2,3), yticklabels = ('Only summer','Only winter', 'Both'))
ax1.set_ylabel("Olympic season")
for item in ax1.get_xticklabels():
    item.set_rotation(90)
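
The 1 + 2 = 3 trick above works because each team appears at most once per list. The same three-way split can also be read off an outer merge with indicator=True. Here is a sketch on made-up team lists (the membership below is invented, not the notebook's result):

```python
import pandas as pd

# Hypothetical lists of teams above the threshold in each season
summer = pd.DataFrame({"Team": ["China", "Norway", "United States"]})
winter = pd.DataFrame({"Team": ["Austria", "Norway", "United States"]})

# indicator=True adds a _merge column: left_only, right_only or both
both = summer.merge(winter, on="Team", how="outer", indicator=True)
labels = {"left_only": "Only summer", "right_only": "Only winter", "both": "Both"}
both["Games"] = both["_merge"].astype(str).map(labels)
print(both[["Team", "Games"]])
```

The labels come out as text, so no numeric encoding has to be translated back on the y axis.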


### So the top list of best countries should also be broken down.
What we saw in the data and graphs is that we cannot just sum medals per team, for two reasons:
1. Summer Olympics have more events and thus more medals
2. Not all teams perform at the same level at the Winter and Summer Olympics

Thus we break the top list down!

In [None]:
medalsTeamsTotalsW = medalsTeams.loc[medalsTeams['Season']=='Winter'].groupby(['Team'])[listOfMedalsPointsSum].sum()
medalsTeamsTotalsW.reset_index(inplace = True)
print(medalsTeamsTotalsW.head(10))
medalsTeamsTotalsS = medalsTeams.loc[medalsTeams['Season']=='Summer'].groupby(['Team'])[listOfMedalsPointsSum].sum()
medalsTeamsTotalsS.reset_index(inplace = True)
print(medalsTeamsTotalsS.head(10))

### Now we draw graphs for summer and winter
To avoid writing each graph separately we use a for loop. We loop through the rows and columns defined in the variables at the beginning. The rest is Python at its best.

In [None]:
rows = 5
columns = 2
f, (ax1) = plt.subplots(rows, columns, figsize=(15, 30), sharex=False)

for x in range(rows):
    for y in range(columns):
        i = -0.25
        # dictionary containing options for labels of y axis - based on row number
        ylabelChoices = {0: ("Medal points"), 1: ("Medals"), 2: ("Gold"), 3: ("Silver"), 4: ("Bronze")}
        # dictionary containing options for sorting data - based on row number
        sortChoices = {0: ("Medal_pts"), 1: ("Medals"), 2: ("Gold"), 3: ("Silver"), 4: ("Bronze")}
        # dictionary of palettes which reflect the season - based on column number
        palletChoices = {0: ("winter"), 1: ("summer")}
        # dictionary of variables from which we get winter / summer data - based on column number
        sourceChoices = {0: (medalsTeamsTotalsW), 1: (medalsTeamsTotalsS)}
        #the data for this graph
        dataFrame = sourceChoices.get(y,'').sort_values(by=sortChoices.get(x, ''), ascending = False).head(10)
        sns.barplot(
            data = dataFrame
            , x='Team'
            , y=sortChoices.get(x, '')
            , palette=palletChoices.get(y,'')
            , ax=ax1[x,y])
        ax1[x,y].axhline(0, color="k", clip_on=True)
        ax1[x,y].set_ylabel(ylabelChoices.get(x, '') + " - " + palletChoices.get(y,''))
        
        #writing the values
        for index, row in dataFrame.iterrows():
            #withdash was removed from matplotlib's Axes.text, so it is no longer passed
            ax1[x,y].text(i,round(row[sortChoices.get(x, '')],0) , round(row[sortChoices.get(x, '')],0), color='black')
            i += 1
        for item in ax1[x,y].get_xticklabels():
            item.set_rotation(30)


### So the United States is not so mighty!
Especially if we look at the number of gold medals in the winter graphs, the United States is not even in the Top 3. So the might of the US comes from the Summer Olympic Games. Let's see where this might comes from, in other words in which events the Top 5 countries of each season win most of their medals. The criterion for the Top 5 is medal points. For winter the top 5 countries are:
1. Germany
2. United States
3. Norway
4. Russia
5. Canada

For summer season top 5 countries are:
1. United States
2. China
3. Russia 
4. Germany
5. Great Britain

We will start from dataframe "medals".

In [None]:
#List of top 5 countries for summer
summerCountries = medalsTeamsTotalsS.sort_values(by = 'Medal_pts', ascending = False)['Team'].head(5)
#List of top 5 countries for winter
winterCountries = medalsTeamsTotalsW.sort_values(by = 'Medal_pts', ascending = False)['Team'].head(5)

#empty dictionaries for summer and winter
summerTop5CountriesMedals = {}
winterTop5CountriesMedals = {}

#countries - list of countries to prepare data for (renamed from list to avoid shadowing the builtin)
#season - 'Summer' or 'Winter'
def returnTop5CountriesMedals (countries, season):
    tempDict = {} # empty temporary dictionary
    for c in countries:
        # now we fill dictionary for country "c" with data
        tempDict[c] = medals.loc[(medals['Season']==season) & 
                                 (medals['Team']==c) & 
                                 ((medals['Gold'] > 0)|
                                  (medals['Silver'] > 0)|
                                  (medals['Bronze'] > 0))][['Team','Event','Gold','Silver','Bronze']]
        #group the data and sum
        tempDict[c] = tempDict[c].groupby(['Team','Event'])[listOfMedals].sum() #list of medals is variable
        tempDict[c].reset_index(inplace = True)
        #calculate Medal points based on medals in the data
        tempDict[c]['Medal_pts'] = (3*tempDict[c]['Gold'])+(2*tempDict[c]['Silver'])+tempDict[c]['Bronze']
    return tempDict

summerTop5CountriesMedals = returnTop5CountriesMedals (summerCountries, 'Summer')
winterTop5CountriesMedals = returnTop5CountriesMedals (winterCountries, 'Winter')

### Let's see the list
Let's see, for each country and each season, the top 10 events in which the team won the most medal points.

In [None]:
def printingTop5Countries(medalsByCountry, heading):
    # parameter renamed from dict so that it does not shadow the builtin
    print(heading)
    for c in medalsByCountry:
        print(c)
        display(medalsByCountry[c].sort_values(by='Medal_pts', ascending = False).head(10)[['Event','Gold','Silver','Bronze', 'Medal_pts']])
        print('\n\r')
    
printingTop5Countries(summerTop5CountriesMedals,'Top 10 events of top 5 countries for Summer Olympic Games')
printingTop5Countries(winterTop5CountriesMedals,'Top 10 events of top 5 countries for Winter Olympic Games')

### Now the graphs
Next we draw a graph for each top 5 country of a season, showing its top 10 events by medal points won.

In [None]:
def drawGraphTop5Countries(medalsByCountry, heading):
    # "detect" the season from the heading and pick the palette based on it
    # (parameter renamed from dict so that it does not shadow the builtin)
    if 'summer' in heading.lower():
        palette = 'summer'
    else:
        palette = 'winter'
    rows = 5
    columns = 1
    f, (ax1) = plt.subplots(rows, columns, figsize=(15, 40), sharex = True)
    axIndex = 0
    print(heading)
    for c in medalsByCountry:
        data = medalsByCountry[c].sort_values(by='Medal_pts', ascending = False).head(10)
        sns.barplot(data=data
                    , y='Event'
                    , x='Medal_pts'
                    , palette=palette
                    , ax=ax1[axIndex])
        ax1[axIndex].axvline(0, color="k", clip_on=True)
        ax1[axIndex].set_xlabel('Medal_pts')
        ax1[axIndex].set_ylabel('Events')
        ax1[axIndex].set_title(c)
        #writing the values (withdash was removed from matplotlib's Axes.text)
        i = 0.1
        for index, row in data.iterrows():
            ax1[axIndex].text(row['Medal_pts'],i , row['Medal_pts'], color='black')
            i += 1
        axIndex += 1

In [None]:
drawGraphTop5Countries(summerTop5CountriesMedals,'Top 10 events of top 5 countries for Summer Olympic Games')

In [None]:
drawGraphTop5Countries(winterTop5CountriesMedals,'Top 10 events of top 5 countries for Winter Olympic Games')

### What have we learned?
For me two things are surprising:
1. In the winter games, Germany's event with the most medal points has almost twice as many points as that of the United States (the second country)
2. The top three China events (events with the most medal points) have more points than the top United States event, yet the United States has more points overall

It seems that the United States' success does not come from dominating a few events. Is it maybe the breadth of the team and the number of events that bring success to the United States? In that sense it would be good to compare the number of medals with the number of participants.

In [None]:
# create dataframe with unique participants grouped by year, season and team
participants = pd.DataFrame(athletes1994.groupby(['Year','Season','Team'])['ID'].nunique())
participants.columns = ['UniqueParticipants']
participants.reset_index(inplace=True)
# create dataframe with the number of sports the team competes in, grouped by year, season and team
sports = pd.DataFrame(athletes1994.groupby(['Year','Season','Team'])['Sport'].nunique())
sports.columns = ['ParticipatingOnSports']
sports.reset_index(inplace=True)
# create dataframe with the number of events the team competes in, grouped by year, season and team
# just a reminder: in the context of this dataset, running 100m is an event and athletics is a sport
events = pd.DataFrame(athletes1994.groupby(['Year','Season','Team'])['Event'].nunique())
events.columns = ['ParticipatingOnEvents']
events.reset_index(inplace=True)
#now we merge it with dataframe we worked with (until now that is)
medalsTeamsParticipants = medalsTeams.merge(participants, on=['Year','Season','Team']).merge(sports,  on=['Year','Season','Team']).merge(events, on=['Year','Team','Season'])
# we can check / test if number of participants was appended properly
display(participants.loc[participants['Team']=='Croatia'])
display(sports.loc[sports['Team']=='Croatia'])
display(events.loc[events['Team']=='Croatia'])
display(medalsTeamsParticipants.loc[medalsTeamsParticipants['Team']=='Croatia'])

### Let's define new features
The new features give us the number of medal points per participant, per sport and per event. For example, the first one is Medal_pts / UniqueParticipants. The numbers will usually be below 1.

The reality is that some sports (like basketball) require more participants than others (skiing). Therefore medal points per event sounds like a great indicator of success: how much of my target did I achieve?

In [None]:
medalsTeamsParticipants['MedalPtsPerPart'] = medalsTeamsParticipants['Medal_pts'] /medalsTeamsParticipants['UniqueParticipants']
medalsTeamsParticipants['MedalPtsPerSport'] = medalsTeamsParticipants['Medal_pts'] /medalsTeamsParticipants['ParticipatingOnSports']
medalsTeamsParticipants['MedalPtsPerEvent'] = medalsTeamsParticipants['Medal_pts'] /medalsTeamsParticipants['ParticipatingOnEvents']
#Again just for test - Croatia
display(medalsTeamsParticipants.loc[medalsTeamsParticipants['Team']=='Croatia'])

### The idea!
The idea is to see beyond the absolute number of medals or medal points. This is why I added the features medal points per participant (MedalPtsPerPart), medal points per sport (MedalPtsPerSport) and medal points per event (MedalPtsPerEvent). The new features could tell us a different story. Right?

So, we are going to use the previously written functions returnDFOfBestYears and drawGraphBestYears (I guess now you see why I put them in functions ;) ).

### First we are going to do summer
The first season we cover is summer. We draw the best year for each country based on medal points per event. It should be a good view since it shows how many points a team won vs. how many points it could have won. With a twist.

In [None]:
bestyearsS = returnDFOfBestYears(medalsTeamsParticipants, "summer",5,'MedalPtsPerEvent', ['Team', 'Year', 'MedalPtsPerEvent'])
drawGraphBestYears(bestyearsS, 'summer', 'MedalPtsPerEvent')

### Now winter
Next we do winter.

In [None]:
bestyearsW = returnDFOfBestYears(medalsTeamsParticipants, "winter",5,'MedalPtsPerEvent', ['Team', 'Year', 'MedalPtsPerEvent'])
drawGraphBestYears(bestyearsW, 'winter', 'MedalPtsPerEvent')

### Interesting!
When we look at medal points per event, some other countries are the stars! The Netherlands in the Winter Olympics, and Ethiopia in the Summer Olympics.
To be honest, this view is not absolutely fair. The reason is that in individual sports like running and skiing, several athletes from the same team can take part in one event. This means that a country taking part in a basketball tournament can get 0, 1, 2 or 3 points from that event. A country taking part in the 100m run with multiple (capable) athletes can get 0, 1, 2 or 3, but also 4, 5 or 6 points. And this is the twist.
Maybe a fairer approach would be to show team and individual sports separately. But that is an idea for some other time :).
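
To make the twist concrete, here is a tiny sketch with two invented per-event rows, already capped at one medal of each colour as prepareMedals does. The event names and results are made up for illustration.

```python
import pandas as pd

# Hypothetical per-event rows for one team, each medal type capped at 1
events = pd.DataFrame({
    "Event": ["Basketball Men's Basketball", "Athletics Men's 100 metres"],
    "Gold": [1, 1],
    "Silver": [0, 1],
    "Bronze": [0, 1],
})

# Same weighting as in the notebook: gold 3, silver 2, bronze 1
events["Medal_pts"] = 3 * events["Gold"] + 2 * events["Silver"] + events["Bronze"]
print(events[["Event", "Medal_pts"]])
```

A team event tops out at 3 points, while a clean sweep in an individual event gives 6, so points per event can exceed what looks like the maximum.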

### Let us show ALL
Let's see how the number of medal points changes through the years. We start with winter. But first, we prepare functions :).

In [None]:
def resultsForAllYearsMin5Pts(copyOfmedalsTeams, season):
    #Make sure that season has only 1st character capital
    season = season[0].upper() + season.lower()[1:]
    #Get only records of the season, and if season is "All" then just skip filtering
    if season!='All':
        copyOfmedalsTeams = copyOfmedalsTeams.loc[(copyOfmedalsTeams['Season']==season)]
    #Temporary list (np.array) of countries that had more than 5 medal points in at least one year.
    #We take only unique values since some countries have more than one such row.
    tempListOfCountries = np.unique(copyOfmedalsTeams.loc[copyOfmedalsTeams['Medal_pts'] > 5, ['Team']].values)
    #Put the list in a dataframe
    numericListOfCountries = pd.DataFrame(tempListOfCountries)
    numericListOfCountries.columns = ['Team']
    #Turn a dataframe index to column
    numericListOfCountries['Id'] = numericListOfCountries.index
    #Merge two dataframes 
    copyOfmedalsTeams = copyOfmedalsTeams.loc[copyOfmedalsTeams['Team'].isin(tempListOfCountries)]
    copyOfmedalsTeams = copyOfmedalsTeams.merge(numericListOfCountries, on = 'Team')
    return copyOfmedalsTeams


In [None]:
def drawAllYearsMin5Pts(theData):
    plt.figure(figsize=(15, 15))
    numberOfCountries = theData['Team'].nunique()
    #one random dash pattern per country so that lines with similar colors can be told apart
    dashes = []
    for i in range(numberOfCountries):
        dashes.append((random.randint(1,7),random.randint(1,7)))

    #recent seaborn versions accept x and y only as keyword arguments
    ax = sns.lineplot(data=theData, x='Year', y='Medal_pts', hue='Team'
                        , style='Team'
                        , dashes=dashes)
    ax.set_xticks(theData['Year'].unique())
    ax.set_ylabel('Medal points')

### First we do winter!
This time to show off in just one line :)

In [None]:
drawAllYearsMin5Pts(resultsForAllYearsMin5Pts(medalsTeams, 'winter')) #note that winter is written with lower first letter :)

### And now summer

In [None]:
drawAllYearsMin5Pts(resultsForAllYearsMin5Pts(medalsTeams, 'summer')) #note that summer is written with lower first letter :)

### Graphs are messy
But we learned something: for some countries there are big variations between years. Let's look only at China and its results at the Summer Olympic Games.

In [None]:
tempData = resultsForAllYearsMin5Pts(medalsTeams, 'summer')
drawAllYearsMin5Pts(tempData.loc[tempData['Team']=='China'])

Or Canada in winter.

In [None]:
tempData = resultsForAllYearsMin5Pts(medalsTeams, 'winter')
drawAllYearsMin5Pts(tempData.loc[tempData['Team']=='Canada'])

### Coefficient of variation to the rescue
The reality is that we cannot use the variance or standard deviation alone to see who has the most stable results. We need to look at the standard deviation in relation to the average value, and that is the coefficient of variation. We could compute it manually, but we will use the scipy function variation.
As usual we create functions to calculate the coefficient and then to draw it. The functions are there because we do the same thing twice: for the winter and for the summer games.
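
For reference, this is what the manual version could look like: standard deviation divided by the mean. The helper coeff_var and both series are invented for illustration. As far as I know scipy's variation uses the population standard deviation (ddof=0) by default, which is also what NumPy's std does, so the two should agree.

```python
import numpy as np

def coeff_var(values):
    """Population standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()

steady = [50, 52, 48, 50]   # made-up medal points of a consistent team
patchy = [0, 0, 0, 12]      # made-up team with medals in one year only

print(round(coeff_var(steady), 3), round(coeff_var(patchy), 3))
```

The steady series lands close to 0, while the series with several zero years ends up above 1, because the zeros pull the mean down while the spread stays large.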

In [None]:
#dataFrame - data we operate upon
def calculateCoeffVar(dataFrame):
    #collect one row per country, then build the dataframe in one go
    #(DataFrame.append was removed in pandas 2.0)
    rows = []
    countries = dataFrame['Team'].unique()
    for c in countries:
        data = dataFrame.loc[dataFrame['Team']==c]['Medal_pts']
        coef = variation(data)
        rows.append({'Team':c,'Coeff':coef})
    coeffVarDF = pd.DataFrame(rows, columns=['Team','Coeff'])
    return coeffVarDF

#dataFrame = data we draw
#palette = palette for the graph
def drawCoeffVar(dataFrame, palette):
    #definition of maxim which is used for modification of position of data labels
    #and in definition of limits
    maxim = dataFrame.sort_values(by = 'Coeff', ascending = False)['Coeff'].head(1)
    maxim = round((maxim.values[0] * 1.1),2)
    palette = palette.lower()
    g, (ax1) = plt.subplots(1, 1, figsize=(20, 7))
    sns.barplot(data = dataFrame, x = 'Team', y = 'Coeff', palette=palette, ax=ax1)
    ax1.axhline(0, color="k", clip_on=True)
    ax1.set(ylim=(0, maxim))
    ax1.set_ylabel("Coefficient of variation")
    #writing the data labels
    i = -0.25
    prevValue = 0
    for index, row in dataFrame.iterrows():
        #shift the label up if it is close to the previous one, to avoid text written over text
        value = round(row['Coeff'],2)
        if abs(1 - prevValue / value)<0.1:
            position = value + (maxim * 0.05)
        else:
            position = value
        prevValue = position
        #withdash was removed from matplotlib's Axes.text, so it is no longer passed
        ax1.text(i,position, value, color='black')
        i += 1
    for item in ax1.get_xticklabels():
        item.set_rotation(90)

### Anyway, winter first!

In [None]:
drawCoeffVar(calculateCoeffVar(resultsForAllYearsMin5Pts(medalsTeams, 'winter')) ,'winter')

### Now summer!

In [None]:
drawCoeffVar(calculateCoeffVar(resultsForAllYearsMin5Pts(medalsTeams, 'summer')),'summer')

### Conclusion about coefficient of variation - sort of (1/2)
Unlike in the other graphs, here lower is better. So, in the Winter Olympic Games the most reliable countries are Germany and Norway. In the Summer Olympic Games the most reliable are the United States and France.

Let’s print France in summer and Norway in winter.

In [None]:
tempData = resultsForAllYearsMin5Pts(medalsTeams, 'winter')
print(tempData.loc[tempData['Team']=='Norway'])

In [None]:
tempData = resultsForAllYearsMin5Pts(medalsTeams, 'summer')
print(tempData.loc[tempData['Team']=='France'])

### Conclusion about coefficient of variation - sort of (2/2)
The other point is that some countries have a coefficient of variation above 1. The reason is that those countries had 0 medal points in some years, which lowers the average while the spread of results stays large. Let's print Egypt in summer and Estonia in winter.

In [None]:
tempData = resultsForAllYearsMin5Pts(medalsTeams, 'winter')
print(tempData.loc[tempData['Team']=='Estonia'])

In [None]:
tempData = resultsForAllYearsMin5Pts(medalsTeams, 'summer')
print(tempData.loc[tempData['Team']=='Egypt'])
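
To see why zeros push the coefficient above 1, here is a minimal sketch with made-up medal points (the two lists are hypothetical, not taken from the dataset). It assumes the coefficient is the population standard deviation divided by the mean, which is what `scipy.stats.variation` computes by default.

```python
import numpy as np

def coefficient_of_variation(values):
    # population standard deviation (ddof=0) divided by the mean
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()

# hypothetical medal points over six games
steady_team = [10, 12, 11, 9, 10, 11]   # similar result every time
patchy_team = [0, 0, 9, 0, 3, 0]        # mostly zeros with one good year

cv_steady = coefficient_of_variation(steady_team)
cv_patchy = coefficient_of_variation(patchy_team)
print(round(cv_steady, 2), round(cv_patchy, 2))
```

The steady team ends up far below 1, while the team with many zero years ends up above 1, because its mean is small compared to the spread of its results.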

### Let's move to financial data
The only financial data we have is GDP per country. We want to keep GDP only for the years when games took place and remove the rest. First we import the data.

In [None]:
#gdpdata = pd.read_csv('GDP by Country.csv', skiprows = range(0,4))
gdpdata = pd.read_csv('../input/gdp-world-bank-data/GDP by Country.csv', skiprows = range(0,4))

### It is all the GDP data
We can check the format, shape and content of the data.

In [None]:
gdpdata.head()

### For which years we need GDP
It is the list of unique years from our dataset with athletes and events. Right!

In [None]:
years = np.sort(athletes1994['Year'].unique())
print("This is the list:")
print(str(years).split())

### Now we want a list of strings that we will use to get only the needed columns
By needed I mean the years when Olympic Games took place. We also need the country name. Together these are the column names.

In [None]:
years_string=str(years)[1:-1].split()
cntrname = 'Country Name'
years_string.insert(0, cntrname)
print(years_string)

### Let's select only relevant columns
We create new dataframe with relevant columns and all the rows.

In [None]:
gdpdata_work = gdpdata[years_string]
gdpdata_work.head()

### Is "unpivoting" a real word?
Don't know. But we will use melt to unpivot the dataframe. This makes merging with the other datasets easier.

In [None]:
melted_gdp = gdpdata_work.melt(id_vars=['Country Name'], value_vars=years_string[1:], var_name='Years')

### Now give the columns the right names
They are 'Team', 'Year' and 'GDP'.

In [None]:
melted_gdp.columns=['Team','Year','GDP']

### Let's see how the dataframe looks now
Well only the head of the dataframe, but you get the point.

In [None]:
melted_gdp.head()
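
If melt is new to you, here is a small sketch of the same steps on a toy table. The countries and numbers are invented for illustration; only the column layout mimics the GDP file.

```python
import pandas as pd

# toy "wide" table: one row per country, one column per year (made-up values)
wide_gdp = pd.DataFrame({
    'Country Name': ['Aland', 'Borduria'],
    '1994': [1.0, 2.0],
    '1996': [1.5, 2.5],
})

# unpivot: every (country, year) pair becomes its own row
long_gdp = wide_gdp.melt(
    id_vars=['Country Name'],
    value_vars=['1994', '1996'],
    var_name='Year',
    value_name='GDP',
)
long_gdp = long_gdp.rename(columns={'Country Name': 'Team'})
long_gdp['Year'] = long_gdp['Year'].astype(int)
print(long_gdp)
```

Two countries and two years give four rows, each with 'Team', 'Year' and 'GDP'. Passing `value_name` directly is just an alternative to renaming all columns afterwards.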

### We want to detect which teams/countries from the list of Olympic participants don't exist in the GDP list
There is a chance that a country is not listed under the same name in both dataframes, or that some countries are listed in one dataframe but not in the other. Let's check it. First we get both lists.

In [None]:
#get both lists - countries from both dataframes
olympic_countries = athletes1994['Team'].unique()
gdp_countries = gdpdata_work['Country Name'].unique()

### Let us compare the lists
For the comparison we use the setdiff1d function of numpy.

In [None]:
#compare both lists. Result is the list of countries that competed at Olympic games and don't have GDP data.
np.setdiff1d(olympic_countries,gdp_countries)

### And now vice versa
Let's see the countries that have GDP data but didn't compete at Olympic games.

In [None]:
#compare both lists. Result is the list of countries that have GDP data and didn't compete at Olympic games.
np.setdiff1d(gdp_countries,olympic_countries)
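
As a quick illustration of what setdiff1d returns, here is a sketch with three hand-picked names per list (a tiny subset, not the real lists). The function returns the sorted unique values that are in the first array but not in the second.

```python
import numpy as np

# small hand-made samples of the two naming styles
olympic_sample = np.array(['Russia', 'Norway', 'Bahamas'])
gdp_sample = np.array(['Russian Federation', 'Norway', 'Bahamas, The'])

# names used by the Olympic data that the GDP data does not know
only_olympic = np.setdiff1d(olympic_sample, gdp_sample)
# names used by the GDP data that the Olympic data does not know
only_gdp = np.setdiff1d(gdp_sample, olympic_sample)

print(only_olympic)
print(only_gdp)
```

Norway disappears from both results because it is spelled the same in both samples. The other two countries show up in both results, which is a hint that they only need renaming.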

### Looking at the lists we could adapt the following names:

Bahamas -> Bahamas, The

Cape Verde -> Cabo Verde

Congo (Brazzaville)	-> Congo, Rep.

Russia	-> Russian Federation

Saint Vincent and the Grenadines	-> St. Vincent and the Grenadines

Venezuela	-> Venezuela, RB

Congo (Kinshasa)	-> Congo, Dem. Rep.

Federated States of Micronesia	-> Micronesia, Fed. Sts.

Gambia	-> Gambia, The

Guinea Bissau	-> Guinea-Bissau

Iran	-> Iran, Islamic Rep.

Saint Kitts and Nevis	-> St. Kitts and Nevis

Slovakia	-> Slovak Republic

Syria	-> Syrian Arab Republic

Hong Kong	-> Hong Kong SAR, China

Kyrgyzstan	-> Kyrgyz, Republic

Macedonia	-> Macedonia, FYR

North Korea	-> Korea, Dem. People’s Rep.

Saint Lucia	-> St. Lucia

Timor Leste	-> Timor-Leste

Brunei	-> Brunei Darussalam

Egypt	-> Egypt, Arab Rep.

Great Britain	-> United Kingdom (Yes, I know this is not the same, but it is the closest match)

South Korea	-> Korea, Rep.

United States Virgin Islands	-> Virgin Islands (U.S.)

Yemen	-> Yemen, Rep.


### To make sure we do not mess up we add "backup" column 'Country'
We just copy data from column 'Team' into new column 'Country'.

In [None]:
melted_gdp['Country']=melted_gdp['Team']
melted_gdp.head()

### Now we create a dictionary with translations
The key of the dictionary is the existing name and the value is the new name, aligned with the teams' names in the "Athletes" dataframe.

In [None]:
#the dictionary with changes
change = {
'Bahamas, The':'Bahamas', 
'Cabo Verde':'Cape Verde',
'Congo, Rep.':'Congo (Brazzaville)',
'Russian Federation':'Russia',
'St. Vincent and the Grenadines':'Saint Vincent and the Grenadines',
'Venezuela, RB':'Venezuela',
'Congo, Dem. Rep.':'Congo (Kinshasa)',
'Micronesia, Fed. Sts.':'Federated States of Micronesia',
'Gambia, The':'Gambia',
'Guinea-Bissau':'Guinea Bissau',
'Iran, Islamic Rep.':'Iran',
'St. Kitts and Nevis':'Saint Kitts and Nevis',
'Slovak Republic':'Slovakia',
'Syrian Arab Republic':'Syria',
'Hong Kong SAR, China':'Hong Kong',
'Kyrgyz, Republic':'Kyrgyzstan',
'Macedonia, FYR':'Macedonia',
'Korea, Dem. People’s Rep.':'North Korea',
'St. Lucia':'Saint Lucia',
'Timor-Leste':'Timor Leste',
'Brunei Darussalam':'Brunei',
'Egypt, Arab Rep.':'Egypt',
'United Kingdom':'Great Britain',
'Korea, Rep.':'South Korea',
'Virgin Islands (U.S.)':'United States Virgin Islands',
'Yemen, Rep.':'Yemen'}

# Short for loop for the change
for key, value in change.items():
    print("Changing \"{}\" into \"{}\"".format(key, value))
    melted_gdp.loc[melted_gdp['Team']==key,'Team']=value
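
The same renaming can also be done without a loop. Below is a minimal sketch on a toy dataframe with two entries from the dictionary (the GDP numbers are invented). It uses `Series.replace`, which leaves names that are not in the dictionary untouched.

```python
import pandas as pd

# two entries taken from the translation dictionary above
change_sample = {'Russian Federation': 'Russia', 'Korea, Rep.': 'South Korea'}

# toy GDP table with made-up values
gdp_sample = pd.DataFrame({
    'Team': ['Russian Federation', 'Norway', 'Korea, Rep.'],
    'GDP': [1.0, 2.0, 3.0],
})

# keep the original name as a backup, then translate the 'Team' column
gdp_sample['Country'] = gdp_sample['Team']
gdp_sample['Team'] = gdp_sample['Team'].replace(change_sample)
print(gdp_sample)
```

The loop has the advantage of printing every change, so it is easier to follow. The one-liner is handy once the dictionary is trusted.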

### After changing the GDP dataframe we compare the dataframes again
Just to check how well we did, let's see the current status of the country names.

In [None]:
#Olympic countries didn't change
#olympic_countries = medalsTeamsParticipants['Team'].unique()
gdp_countries = melted_gdp['Team'].unique()

In [None]:
np.setdiff1d(olympic_countries,gdp_countries)

### The situation is better now, although some of these countries should be in the list of GDPs
For example, Serbia and Montenegro split up a bit later. We should either change the period we cover or accept that some data will be missing.
Anyway, we move on. We convert the year values in the GDP dataframe to int.

In [None]:
melted_gdp['Year']=melted_gdp['Year'].apply(int)

### Now the merge!
We merge the list of medals and teams with the GDP values. The merge is done on the features 'Team' and 'Year'. Since it is an inner merge, teams without matching GDP data are dropped.

In [None]:
total_data = pd.merge(medalsTeamsParticipants, melted_gdp, on=['Team','Year'])

### This is how the data looks now
We have all the previously added data, like the number of events in which a country participated, and we added GDP.

In [None]:
total_data.head()
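
Because `pd.merge` does an inner join by default, rows without a partner silently disappear. Here is a small sketch, on invented toy data, of how the `indicator` parameter can show which teams would be lost. The dataframes and values are hypothetical stand-ins for the real ones.

```python
import pandas as pd

# toy stand-ins for the medals table and the GDP table (made-up values)
medals_sample = pd.DataFrame({
    'Team': ['Norway', 'Serbia and Montenegro'],
    'Year': [1996, 1996],
    'Medal_pts': [10, 3],
})
gdp_sample = pd.DataFrame({
    'Team': ['Norway', 'Croatia'],
    'Year': [1996, 1996],
    'GDP': [1.6e11, 2.3e10],
})

# default inner merge: only rows present in both tables survive
inner = pd.merge(medals_sample, gdp_sample, on=['Team', 'Year'])

# left merge with indicator: rows marked 'left_only' had no GDP data
check = pd.merge(medals_sample, gdp_sample, on=['Team', 'Year'],
                 how='left', indicator=True)
dropped_teams = check.loc[check['_merge'] == 'left_only', 'Team'].tolist()
print(len(inner), dropped_teams)
```

Running such a check before the real merge makes it visible how many team and year combinations fall out of the analysis.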

### Country field is not needed anymore
So we drop it.

In [None]:
total_data.drop(['Country'], axis = 1, inplace = True)

### Let's get the population
We get the population of countries from a CSV file and check the head of the dataframe.

In [None]:
#population = pd.read_csv('country_population.csv')
population = pd.read_csv('../input/world-bank-data-1960-to-2016/country_population.csv')
population.head()

### The good thing is that we already have the list of columns that we need
We used it to get the features (columns) with the right years from the GDP dataframe. Now we use it again.

In [None]:
pop_work = population[years_string]

### Now we repeat the process
1. We check if there are countries in the list of medals and events that have no population data
2. We melt the population data
3. We merge the population data with our "total" data

In [None]:
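olympic_countries = total_data['Team'].unique()
#World Bank names are translated with the same dictionary that was used for GDP
pop_countries = pop_work['Country Name'].replace(change).unique()
print ("If the list is empty then all countries in total data have population data")
#compare against the population countries (not the GDP countries)
print("And the list is: ",np.setdiff1d(olympic_countries,pop_countries))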

### Bingo!
We are done with countries :).

Now we melt / unpivot the dataframe.

In [None]:
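melted_pop = pop_work.melt(id_vars=['Country Name'], value_vars=years_string[1:], var_name='Years')
melted_pop.columns=['Team','Year','Population']
#align World Bank country names with Olympic team names, as we did for GDP
melted_pop['Team']=melted_pop['Team'].replace(change)
melted_pop['Year']=melted_pop['Year'].apply(int)
melted_pop.head()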

### Merge ...
... and finalize the dataframe.

In [None]:
total_data = pd.merge(total_data, melted_pop, on=['Team','Year'])

### This is what the masterpiece data looks like
Let's check what we got!

In [None]:
total_data[total_data['Team']=='Croatia']

### We add a few more calculated features
These features should help us judge how sporty a country is.

In [None]:
total_data['GDPPercapita']=total_data['GDP']/total_data['Population']
total_data['MedalsPer100kcapita']=total_data['Medals']/(total_data['Population']/100000)
total_data['MedalPointsPer100kcapita']=total_data['Medal_pts']/(total_data['Population']/100000)
total_data['GDPPerMedal']=(total_data['GDP']/total_data['Medals'])#.apply(lambda x: '{:,.2f}'.format(float(x))) 
total_data['GDPPerMedalPoints']=(total_data['GDP']/(total_data['Medal_pts']))#.apply(lambda x: '{:,.2f}'.format(float(x))) 
total_data['ParticipantsPer100kPop']=total_data['UniqueParticipants']/(total_data['Population']/100000)
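
One thing to keep in mind with these ratios: a team with 0 medals or 0 medal points makes the divisor zero. pandas does not raise an error in that case but returns `inf`. The sketch below shows this on two invented teams and one possible way to handle it (replacing `inf` with `NaN`); whether that is the right choice for this analysis is an assumption, not something the notebook does.

```python
import numpy as np
import pandas as pd

# two made-up teams: team B won no medal points
sample = pd.DataFrame({
    'Team': ['A', 'B'],
    'GDP': [1e9, 2e9],
    'Population': [1_000_000, 4_000_000],
    'Medal_pts': [5, 0],
})

sample['MedalPointsPer100kcapita'] = sample['Medal_pts'] / (sample['Population'] / 100000)
sample['GDPPerMedalPoints'] = sample['GDP'] / sample['Medal_pts']

# division by zero produced inf for team B; turn it into a missing value
cleaned = sample['GDPPerMedalPoints'].replace([np.inf, -np.inf], np.nan)
print(sample[['Team', 'MedalPointsPer100kcapita', 'GDPPerMedalPoints']])
print(cleaned)
```

With `NaN` instead of `inf`, aggregations such as `mean()` skip the team instead of returning `inf`.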

### One more check
Let's see what the dataframe looks like.

In [None]:
total_data[total_data['Team']=='United States']

### To avoid messing up the masterpiece we copy the dataframe
This way we can always roll back to our "final" data.

In [None]:
working_data = total_data.copy()
working_data.info()

### The first thing to check is whether wealth influences results at Olympic games
Let's start with two previously created functions: returnDFOfBestYears & drawGraphBestYears. The idea is to see when each team had its best year (meaning the lowest GDP per medal point), and roughly how good it was.

In [None]:
bestyearsS = returnDFOfBestYears(working_data, "summer",10,'GDPPerMedalPoints', ['Team', 'Year', 'GDPPerMedalPoints'], True)
drawGraphBestYears(bestyearsS, 'summer', 'GDPPerMedalPoints', 1000000000)

In [None]:
bestyearsW = returnDFOfBestYears(working_data, "winter",5,'GDPPerMedalPoints', ['Team', 'Year', 'GDPPerMedalPoints'], True)
drawGraphBestYears(bestyearsW, 'winter', 'GDPPerMedalPoints', 1000000000)

### Compared to the medal points graphs we see a difference
On the medal points graphs the best years for countries are the later years (closer to 2016), whereas on the graph "Billions of GDP per medal point" the best years are the earlier years. This can be explained by the growth of GDP over the years. For example, let's check GDP for Bulgaria (the lowest number of USD billions per medal point in summer games: 0.3) and Norway (the lowest number of USD billions per medal point in winter games: 2.59) and compare it with the number of medals.

Being as lazy as I am, I first write a function. The function is parameterized so it can be reused later.

In [None]:
def drawComparison(
    country
    , palette
    , firstData
    , firstDataLabel
    , secondData
    , secondDataLabel
    , firstDataDivide = 1
    , secondDataDivide = 1):
    #country = country[0].upper() + country.lower()[1:]
    tempData = resultsForAllYearsMin5Pts(working_data, 'All')
    dataFrame = tempData.loc[tempData['Team']==country]
    palette = palette.lower()
    g, ax1 = plt.subplots(figsize=(20, 7))
    #x and y are passed as keywords because newer seaborn versions require it
    sns.barplot(
        x = dataFrame['Year'].apply(lambda x: str(x)), 
        y = dataFrame[firstData].apply(lambda x: x / firstDataDivide), 
        ax=ax1, palette = palette)
    ax1.axhline(0, color="k", clip_on=True)
    #second y axis that shares the x axis with the bar plot
    ax2 = ax1.twinx()
    sns.lineplot(
        x = dataFrame['Year'].apply(lambda x: str(x)), 
        y = dataFrame[secondData].apply(lambda x: x / secondDataDivide), 
        linewidth = 4, marker = 's', markersize = 12,
        ax=ax2)
    ax2.grid(False)
    ax1.set_title('Comparison of '+ firstDataLabel +' and '+ secondDataLabel +' through years for ' + country)
    ax1.set_ylabel(firstDataLabel)
    ax2.set_ylabel(secondDataLabel)   


### Now we show comparison for Bulgaria!

In [None]:
drawComparison(
    "Bulgaria",
    "Rocket", 
    "Medal_pts", 
    "Medal points", 
    "GDP", 
    "GDP (billion USD)", 
    1, 
    1000000000)

### Now we show comparison for Norway!

In [None]:
drawComparison(
    "Norway",
    "seismic", 
    "Medal_pts", 
    "Medal points", 
    "GDP", 
    "GDP (billion USD)", 
    1, 
    1000000000)

### Conclusion about GDP per medal
It is clear why Bulgaria got such a low GDP per medal point in its best years: they won relatively many medals at the summer games and, at the same time, their GDP was very low. Interestingly, their GDP grew from 2000 onward while their number of medal points decreased. They are not going to repeat their success from 1996 any time soon. Norway, on the other hand, had relatively stable results in winter games, but their GDP grew too. So, they will not be able to repeat their success from 1994.

Anyway, the result stands: Bulgaria had the best result in summer games and Norway in winter games. So, are there new champions?
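
The effect of growing GDP on this metric can be shown with a bit of arithmetic. The numbers below are hypothetical (they are not Bulgaria's or Norway's real figures): a team wins the same medal points at every games while its GDP grows.

```python
# hypothetical team: constant medal points, growing GDP (in USD)
medal_points = 20
gdp_by_year = {1996: 10e9, 2006: 30e9, 2016: 50e9}

# billions of USD of GDP per medal point for every year
gdp_per_point_bn = {
    year: gdp / medal_points / 1e9
    for year, gdp in gdp_by_year.items()
}

# the "best" year is the one with the lowest GDP per medal point
best_year = min(gdp_per_point_bn, key=gdp_per_point_bn.get)
print(gdp_per_point_bn, best_year)
```

Even though the sporting result never changes, the earliest year wins simply because GDP was lowest then. That is worth remembering when reading the "best years" graphs for this metric.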

### Let's see how countries' results relate to population!
We already added two features to the data set: ParticipantsPer100kPop (the number of participants at a games per 100,000 inhabitants of the country) & MedalPointsPer100kcapita (how many medal points a country got per 100,000 inhabitants).

For this we will reuse the existing functions: returnDFOfBestYears & drawGraphBestYears.

### How sporty is a country?
The idea behind this is to check how many Olympians a country has per capita. Or, to get a nicer number, per 100,000 inhabitants. This number should show us the dedication of a country. Having said that, there is a limited number of events, and therefore a limited number of Olympians a country can send. This point of view might therefore suit small countries more than large ones. Nevertheless, we will check it. First we do summer.

### Olympians per capita for Summer Olympic games.

In [None]:
theSportiestSummer = returnDFOfBestYears(working_data, 
                                         "summer",
                                         5,
                                         'ParticipantsPer100kPop', 
                                         ['Team', 'Year', 'ParticipantsPer100kPop'], 
                                         False)
drawGraphBestYears(theSportiestSummer, 'summer', 'ParticipantsPer100kPop')

### And now for Winter Olympic Games

In [None]:
theSportiestWinter = returnDFOfBestYears(working_data, 
                                         "winter",
                                         5,
                                         'ParticipantsPer100kPop', 
                                         ['Team', 'Year', 'ParticipantsPer100kPop'], 
                                         False)
drawGraphBestYears(theSportiestWinter, 'winter', 'ParticipantsPer100kPop')

### Now we know the sportiest countries
They are Slovenia and Latvia for Winter Games, and New Zealand for Summer Games. Now I want to see a couple more details: what the averages of ParticipantsPer100kPop are in the best years, what the ParticipantsPer100kPop results are over the years for the mentioned countries, and how the number of Olympians and inhabitants changed over the years for these three countries.

In [None]:
print("Looking only at the best years the average number of participants in summer per 100,000 inhabitants of a country is: ", round(theSportiestSummer.ParticipantsPer100kPop.mean(),2), " with standard deviation: ", round(theSportiestSummer.ParticipantsPer100kPop.std(),2))
print("Looking only at the best years the average number of participants in winter per 100,000 inhabitants of a country is: ", round(theSportiestWinter.ParticipantsPer100kPop.mean(),2), " with standard deviation: ", round(theSportiestWinter.ParticipantsPer100kPop.std(),2))

### Next we check results for the three countries:

In [None]:
theSportiestCountries = ['Latvia','Slovenia', 'New Zealand']
for c in theSportiestCountries:
    print("ParticipantsPer100kPop over years for ",c)
    tempResult = working_data.loc[working_data['Team'] == c][['Year','Season','ParticipantsPer100kPop']]
    display(tempResult)
    print('The average values are:')
    print('For summer: ', tempResult.loc[tempResult['Season'] == 'Summer']['ParticipantsPer100kPop'].mean())
    print('For winter: ', tempResult.loc[tempResult['Season'] == 'Winter']['ParticipantsPer100kPop'].mean())
    print()
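
The two per-season averages can also be computed in one step with groupby. Here is a minimal sketch on a toy frame with invented values; the column names follow the ones used above.

```python
import pandas as pd

# toy results for one team (made-up values)
toy_result = pd.DataFrame({
    'Year': [1996, 1998, 2000, 2002],
    'Season': ['Summer', 'Winter', 'Summer', 'Winter'],
    'ParticipantsPer100kPop': [2.0, 1.0, 4.0, 3.0],
})

# one average per season instead of filtering twice
season_means = toy_result.groupby('Season')['ParticipantsPer100kPop'].mean()
print(season_means)
```

The result is a Series indexed by season, so `season_means['Summer']` and `season_means['Winter']` give the same numbers as the two filtered means.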


### Averages, averages...
On average New Zealand has almost 4 times more summer games participants per capita than the average of the best years of all teams. Slovenia has 3 times more for summer and 2 times more for winter. Finally, Latvia has 2 times more for winter.

Now let's check how the numbers changed over the years.

In [None]:
for c in theSportiestCountries:
    drawComparison(
        c,
        "seismic", 
        "UniqueParticipants", 
        "Unique participants", 
        "Population", 
        "Population (in millions)", 
        1, 
        1000000)

### So for Latvia it is getting easier
Since the population of Latvia is dropping, they will probably get even better in number of participants per capita.

### We saw which countries are the sportiest
Not sure if that is even an English word, but I guess you know what I mean. The number of participants shows us how sporty a country is. Now, one last detail is left for analysis: how sporty AND successful countries are. For this, we will compare population with medal points. The good thing is that we have most of the functionality and data ready.

First we do summer!

In [None]:
bestyearsS = returnDFOfBestYears(working_data, "summer",5,'MedalPointsPer100kcapita', ['Team', 'Year', 'MedalPointsPer100kcapita'], False)
drawGraphBestYears(bestyearsS, 'summer', 'MedalPointsPer100kcapita', 1)

### ...and now winter!

In [None]:
bestyearsW = returnDFOfBestYears(working_data, "Winter",5,'MedalPointsPer100kcapita', ['Team', 'Year', 'MedalPointsPer100kcapita'], False)
drawGraphBestYears(bestyearsW, 'Winter', 'MedalPointsPer100kcapita', 1)

### Let's check the actual data
Let's see how the number of medal points per 100,000 population changes for these two countries.

In [None]:
theSportiestCountries = ['Jamaica', 'Norway']
for c in theSportiestCountries:
    print("MedalPointsPer100kcapita over years for ",c)
    tempResult = working_data.loc[working_data['Team'] == c][['Year','Season','MedalPointsPer100kcapita','Medal_pts', 'Population']]
    display(tempResult)
    print('The average values are:')
    print('For summer: ', tempResult.loc[tempResult['Season'] == 'Summer']['MedalPointsPer100kcapita'].mean())
    print('For winter: ', tempResult.loc[tempResult['Season'] == 'Winter']['MedalPointsPer100kcapita'].mean())
    print()

### In Summer Olympic Games the best is Jamaica and in Winter Norway
Jamaica has on average 0.64 medal points in summer games per 100,000 population. With this result they are the best. Their best score is 0.9, almost one medal point per 100,000 population.
Norway, on the other hand, wins more than one medal point per 100,000 population in Winter Games. Their best score is 1.16 medal points.

### Let's summarize:
1. If we look at the number of medals (medal points), then Germany is the most successful in Winter Games and the United States in Summer Games
2. When we look at the parameter which shows how many points a team got per event, then the best are the Netherlands in the Winter Olympics and Ethiopia in the Summer Olympics
3. If we look at the stability of results, then France is the most stable in Summer Games and Norway in Winter Games
4. If we look at the number of medal points in relation to GDP, the best country in Summer Games is Bulgaria and in Winter Games it is Norway
5. If we look at the number of participants compared to the population of a country, then the best are Slovenia and Latvia for Winter Games, and New Zealand for Summer Games
6. Finally, if we look at the number of medal points per 100,000 of population, then the best teams are Jamaica in Summer Games and Norway in Winter Games