# 2017 Gymnastics World Championships - Men's Results Dataset

### 1.1 Exploring How Many Medals Each Country Won

In [1]:
'''
2017 Artistic Gymnastics World Championships - Men's Results Dataset 
Number of Medals for Each Country

Create a dataframe that holds each medal-winning country and the number of gold, silver, and bronze medals it won.
'''

from pandas import DataFrame, Series

def create_dataframe():
    countries = ['China', 'Croatia', 'Great Britain', 'Greece', 'Israel', 'Japan', 'Korea', 'Netherlands', 'Russian Fed.',
                'Ukraine', 'United States']
    gold = [2, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0]
    silver = [1, 0, 0, 0, 1, 0, 0, 1, 2, 2, 0]
    bronze = [2, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
    
    medal_counts_df = DataFrame({'country_name': countries, 'gold': gold, 'silver': silver, 'bronze': bronze},
                               columns = ['country_name', 'gold', 'silver', 'bronze'])
    return medal_counts_df
print(create_dataframe())

In this dataset, I am interested not only in the men's all-around results from the 2017 World Championships in gymnastics, but also in how the men performed in the individual apparatus events. This gives a better picture of which countries were consistent and of how the individual events translate into all-around scores. In gymnastics, the all-around is similar in nature to the decathlon: every routine is also a medal event on its own, but the all-around requires each gymnast to perform all six routines for a combined score.

In international competitions, gymnasts first compete in a qualification round, where they try to reach the medal rounds in the all-around and in the individual apparatus events. The gymnasts who compete in the all-around and place among the top 24 (in the Olympics, for instance, they must also be among the top two from their national team) qualify for the all-around final, while the qualification scores on each apparatus determine who reaches each event final. So, in the finals, 24 gymnasts contend for the all-around medals and 8 for the medals on each apparatus. There was no team competition at these World Championships, so I was interested to see how the countries did as teams in medals and placings.
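The qualification rule described above (top scorers advance, with a cap per country) can be sketched as a small pandas function. This is a simplified illustration, not the official FIG procedure: it ignores ties and reserves, and it assumes a hypothetical table with `name`, `country`, and `score` columns. The sample rows are made up.

```python
import pandas as pd


def select_finalists(qual: pd.DataFrame, n_spots: int, per_country_cap: int) -> pd.DataFrame:
    """Return the top `n_spots` qualifiers, allowing at most `per_country_cap` per country.

    Simplified sketch: ignores ties, reserves, and other tie-break rules.
    """
    ranked = qual.sort_values("score", ascending=False)
    # Number each gymnast within their own country, best score first (0, 1, 2, ...)
    place_in_country = ranked.groupby("country").cumcount()
    eligible = ranked[place_in_country < per_country_cap]
    return eligible.head(n_spots)


# Made-up qualification scores for illustration only
qual = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "country": ["X", "X", "X", "Y", "Z"],
    "score": [88.0, 87.5, 87.0, 86.0, 85.0],
})

# With 3 spots and a cap of 2, gymnast C is skipped because country X is already full
print(select_finalists(qual, n_spots=3, per_country_cap=2))
```

For the Olympic-style all-around rule mentioned above, the call would be `select_finalists(qual, n_spots=24, per_country_cap=2)`.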

### 1.2 Getting Average Medal Count for Countries who Won One Medal or More

In [3]:
'''
Get Average Medal Count
'''
import numpy as np
from pandas import DataFrame, Series

def create_average_count():
    countries = ['China', 'Croatia', 'Great Britain', 'Greece', 'Israel', 'Japan', 'Korea', 'Netherlands', 'Russian Fed.',
                'Ukraine', 'United States']
    gold = [2, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0]
    silver = [1, 0, 0, 0, 1, 0, 0, 1, 2, 2, 0]
    bronze = [2, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
    
    medal_count = {'country_name': countries,
                  'gold': Series(gold),
                  'silver': Series(silver),
                  'bronze': Series(bronze)}
    medal_count_df = DataFrame(medal_count)
    
    average_medal_count = medal_count_df[['gold', 'silver', 'bronze']].apply(np.mean)
    return average_medal_count
print(create_average_count())

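There are 11 countries that won at least one medal. Together they won 7 gold, 7 silver, and 7 bronze medals, consistent with one medal of each color in each of the seven medal events (the all-around plus the six apparatus finals). That gives an average of 7/11 ≈ 0.64 medals of each color per country.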
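The totals and per-country averages can be checked directly with pandas' built-in `sum()` and `mean()`, which is a little simpler than `apply(np.mean)`. This sketch rebuilds the medal table from the cell above:

```python
import pandas as pd

# Medal table from the cells above (countries with at least one medal)
medals = pd.DataFrame({
    'country_name': ['China', 'Croatia', 'Great Britain', 'Greece', 'Israel', 'Japan',
                     'Korea', 'Netherlands', 'Russian Fed.', 'Ukraine', 'United States'],
    'gold':   [2, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0],
    'silver': [1, 0, 0, 0, 1, 0, 0, 1, 2, 2, 0],
    'bronze': [2, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1],
})

colors = ['gold', 'silver', 'bronze']
totals = medals[colors].sum()     # total medals of each color across all countries
averages = medals[colors].mean()  # average per medal-winning country
print(totals)
print(averages.round(3))
```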

### 1.3 Getting Average Bronze Medal Count for Countries who Won at least One Gold Medal

In [4]:
'''
Get Average Bronze Medal Count for Countries who Won At Least One Gold
'''
import numpy as np
from pandas import DataFrame, Series

def create_average_count():
    countries = ['China', 'Croatia', 'Great Britain', 'Greece', 'Israel', 'Japan', 'Korea', 'Netherlands', 'Russian Fed.',
                'Ukraine', 'United States']
    gold = [2, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0]
    silver = [1, 0, 0, 0, 1, 0, 0, 1, 2, 2, 0]
    bronze = [2, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
    
    medal_count = {'country_name': countries,
                  'gold': Series(gold),
                  'silver': Series(silver),
                  'bronze': Series(bronze)}
    medal_count_df = DataFrame(medal_count)
    average_bronze_at_least_one_gold = np.mean(medal_count_df.bronze[medal_count_df.gold > 0])
    return average_bronze_at_least_one_gold
print(create_average_count())

There were five nations that won at least one gold. I wanted to see whether they won anything else, to understand whether any team "dominated" the competition. Together, these five countries won 3 bronze medals, an average of 0.6 per country.

### 1.4 Getting Average Silver Medal Count for Countries who Won at least One Gold Medal

In [5]:
'''
Get Average Silver Medal Count for Countries who Won At Least One Gold
'''
import numpy as np
from pandas import DataFrame, Series

def create_average_count():
    countries = ['China', 'Croatia', 'Great Britain', 'Greece', 'Israel', 'Japan', 'Korea', 'Netherlands', 'Russian Fed.',
                'Ukraine', 'United States']
    gold = [2, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0]
    silver = [1, 0, 0, 0, 1, 0, 0, 1, 2, 2, 0]
    bronze = [2, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
    
    medal_count = {'country_name': countries,
                  'gold': Series(gold),
                  'silver': Series(silver),
                  'bronze': Series(bronze)}
    medal_count_df = DataFrame(medal_count)
    average_silver_at_least_one_gold = np.mean(medal_count_df.silver[medal_count_df.gold > 0])
    return average_silver_at_least_one_gold
print(create_average_count())

Among the five countries that won at least one gold medal, only one also won a silver (an average of 0.2 silvers per country). It was China!

### 1.5 Getting Average Gold Medal Count for Countries who Won at least One Gold Medal

In [6]:
'''
Get Average Gold Medal Count for Countries who Won At Least One Gold
'''
import numpy as np
from pandas import DataFrame, Series

def create_average_count():
    countries = ['China', 'Croatia', 'Great Britain', 'Greece', 'Israel', 'Japan', 'Korea', 'Netherlands', 'Russian Fed.',
                'Ukraine', 'United States']
    gold = [2, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0]
    silver = [1, 0, 0, 0, 1, 0, 0, 1, 2, 2, 0]
    bronze = [2, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
    
    medal_count = {'country_name': countries,
                  'gold': Series(gold),
                  'silver': Series(silver),
                  'bronze': Series(bronze)}
    medal_count_df = DataFrame(medal_count)
    average_gold_at_least_one_gold = np.mean(medal_count_df.gold[medal_count_df.gold > 0])
    return average_gold_at_least_one_gold
print(create_average_count())

Among the five countries that won at least one gold medal, China and Japan won two gold medals apiece! On average, these five countries won 1.4 golds each.
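Sections 1.3 to 1.5 repeat the same filter three times, once per medal color. A single helper can compute all three averages in one pass. `mean_medals_for_gold_winners` is a hypothetical name used only for this sketch:

```python
import pandas as pd

# Medal table from the cells above (countries with at least one medal)
medals = pd.DataFrame({
    'country_name': ['China', 'Croatia', 'Great Britain', 'Greece', 'Israel', 'Japan',
                     'Korea', 'Netherlands', 'Russian Fed.', 'Ukraine', 'United States'],
    'gold':   [2, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0],
    'silver': [1, 0, 0, 0, 1, 0, 0, 1, 2, 2, 0],
    'bronze': [2, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1],
})


def mean_medals_for_gold_winners(df: pd.DataFrame) -> pd.Series:
    """Average gold, silver, and bronze counts over countries with at least one gold."""
    gold_winners = df[df['gold'] > 0]
    return gold_winners[['gold', 'silver', 'bronze']].mean()


result = mean_medals_for_gold_winners(medals)
print(result)  # gold 1.4, silver 0.2, bronze 0.6
```

This reproduces the three values from the separate cells (1.4 golds, 0.2 silvers, and 0.6 bronzes per gold-winning country), which correspond to 7, 1, and 3 medals across the five countries.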

### 1.6 Get Number of Placement Points for Each Country

Because the top 8 gymnasts in each event receive prize money from the sport's governing body,
I decided to compare how countries would have performed if I counted not just the medals
but also the placements. A gold earns 10 points, a silver 8, and a bronze 6, and the next five
finishers (4th through 8th) get 5, 4, 3, 2, and 1 points, respectively. This also gives an idea of which countries performed well consistently.
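As a worked example of this scoring rule, the sketch below totals the points for one country from its count of finishes in each position. The China and Cuba counts are taken from the placement table in the next cell; `placement_points` is a hypothetical helper, not part of the original notebook.

```python
# Points for 1st through 8th place, following the scheme described above
POINTS = [10, 8, 6, 5, 4, 3, 2, 1]


def placement_points(finishes):
    """finishes[i] is how many times a country finished in position i + 1."""
    if len(finishes) != len(POINTS):
        raise ValueError("expected one count for each of the 8 positions")
    return sum(count * pts for count, pts in zip(finishes, POINTS))


china = [2, 1, 2, 0, 1, 1, 0, 0]  # 2 gold, 1 silver, 2 bronze, one 5th, one 6th
cuba = [0, 0, 0, 1, 1, 0, 2, 0]   # one 4th, one 5th, two 7ths
print(placement_points(china))  # 47 = 2*10 + 8 + 2*6 + 4 + 3
print(placement_points(cuba))   # 13 = 5 + 4 + 2*2
```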

In [8]:
'''
Get Number of Placement Points for Each Country.
'''
import numpy as np
from pandas import DataFrame, Series

def create_placement_points_count():
    countries = ['Armenia', 'Brazil', 'Chile', 'China', 'Croatia', 'Cuba', 'France', 'Great Britain', 'Greece', 'Guatemala', 
                 'Israel', 'Japan', 'Korea', 'Netherlands', 'Germany', 'Russian Fed.', 'Ukraine', 'United States', 
                 'Turkey', 'Romania', 'Switzerland', 'Slovenia']
    
    gold = [0, 0, 0, 2, 1, 0, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    silver = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 2, 2, 0, 0, 0, 0, 0]
    bronze = [0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
    fourth = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0]
    fifth = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
    sixth = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0]
    seventh = [0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0]
    eighth = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1]
    
    placement_counts = {'country_name': Series(countries),
                       'gold': Series(gold),
                       'silver': Series(silver),
                       'bronze': Series(bronze),
                       'fourth': Series(fourth),
                       'fifth': Series(fifth),
                       'sixth': Series(sixth),
                       'seventh': Series(seventh),
                       'eighth': Series(eighth)}
    placement_counts_df = DataFrame(placement_counts)
    
    placement_scores = placement_counts_df[['gold', 'silver', 'bronze', 'fourth', 'fifth', 'sixth', 'seventh', 'eighth']]
    # assign points to each top-eight placing or medal and sum them for each country
    points = np.dot(placement_scores, [10, 8, 6, 5, 4, 3, 2, 1])
    # create the worlds points dataframe that holds the points scored by each country
    worlds_points = {'country_name': Series(countries),
                     'points': Series(points)}
    worlds_points_df = DataFrame(worlds_points)
    print(worlds_points_df)
create_placement_points_count()

There were 22 countries that placed at least one gymnast in the top 8 of the all-around or an individual apparatus event. Unsurprisingly, some countries had top-8 placings but no medal. I wanted to explore whether a country with only placements and no medals could out-point every medal-winning country and win the points title. As expected, this was not the case: among the countries that had top-8 placements but no medals, Cuba led with 13 points.
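To check this claim directly, the placement table can be weighted, ranked, and filtered to countries without a medal. This sketch rebuilds the table from the cell above with country names as the index and multiplies each column by its point value (pandas aligns the weights `Series` with the column labels):

```python
import pandas as pd

countries = ['Armenia', 'Brazil', 'Chile', 'China', 'Croatia', 'Cuba', 'France',
             'Great Britain', 'Greece', 'Guatemala', 'Israel', 'Japan', 'Korea',
             'Netherlands', 'Germany', 'Russian Fed.', 'Ukraine', 'United States',
             'Turkey', 'Romania', 'Switzerland', 'Slovenia']

# Placement counts copied from the cell above (one row per country)
placings = pd.DataFrame({
    'gold':    [0, 0, 0, 2, 1, 0, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'silver':  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 2, 2, 0, 0, 0, 0, 0],
    'bronze':  [0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0],
    'fourth':  [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    'fifth':   [1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    'sixth':   [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0],
    'seventh': [0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0],
    'eighth':  [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1],
}, index=countries)

# One weight per column; multiplication aligns the Series index with the column labels
weights = pd.Series([10, 8, 6, 5, 4, 3, 2, 1], index=placings.columns)
totals = (placings * weights).sum(axis=1)
ranking = totals.sort_values(ascending=False)

# Countries whose medal columns are all zero
no_medal = placings[['gold', 'silver', 'bronze']].sum(axis=1) == 0
best_without_medal = totals[no_medal].idxmax()

print(ranking.head(8))
print(best_without_medal, totals[best_without_medal])
```

With this data, China tops the table with 47 points and Cuba is the best country without a medal at 13 points. Cuba's total does, however, exceed those of some medal-winning countries, such as Croatia and Greece (10 points each).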

### 2.1 Exploring Data from All-Around Men's Results Dataset 

First, I print the first five rows of the dataset to see what the data looks like.

In [9]:
# print the first five rows of the dataset
import csv
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
%matplotlib inline
gymnastics_df = pd.read_csv('../input/World_Champs_Men\'s_All-Around.csv')
gymnastics_df.head(5)

The first five rows all belong to the all-around champion of the competition: each row is one of his routines, showing the apparatus and its corresponding scores.

### 2.2 Get the Unique Apparatus List

Each individual medal event that is also part of the all-around appears as a separate value in the 'Apparatus' column. Here, I get the list of the unique events that make up the all-around.

In [10]:
# get the unique apparatus list
apparatus_list = gymnastics_df.Apparatus.unique().tolist()
apparatus_list

### 2.3 Getting the Average Score for Each Apparatus

In [11]:
# get average score for each apparatus
# keep only the apparatus name and the total score of each routine
total_apparatus_score_df = gymnastics_df[['Apparatus', 'Total']].copy()

# average the total score across all gymnasts for each apparatus
mean_apparatus_score = total_apparatus_score_df.groupby(['Apparatus'])['Total'].mean()
print("Mean Apparatus Scores")
mean_apparatus_score

In [21]:
mean_apparatus_score.plot(title='Mean Score per Apparatus')
plt.show()

### 2.4 Get Apparatus Scores for Top 8 All-Around Finishers

Because the output shows six rows per gymnast (one per apparatus) and the rows are ordered by all-around rank, I used the head() function to view the first 48 rows (8 gymnasts × 6 apparatus). These are the top 8 gymnasts of the all-around event.

In [12]:
'''
Printing the first 48 rows of the dataset.
'''
name_rank_apparatus_df = gymnastics_df[['Name', 'Apparatus', 'Rank']].copy()
name_rank_apparatus_df.head(48)
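Taking the first 48 rows relies on the file being sorted by all-around rank with exactly six rows per gymnast. A sketch that filters on the `Rank` column instead does not depend on row order. It assumes `Rank` holds each gymnast's numeric all-around rank on every one of their rows; the toy rows below are made up, and `top_n_all_around` is a hypothetical helper:

```python
import pandas as pd


def top_n_all_around(df: pd.DataFrame, n: int = 8) -> pd.DataFrame:
    """Return every row for gymnasts whose all-around rank is n or better."""
    top = df[df['Rank'] <= n]
    return top.sort_values(['Rank', 'Apparatus'])


# Made-up rows, deliberately out of rank order
toy = pd.DataFrame({
    'Name': ['Gymnast A', 'Gymnast A', 'Gymnast B', 'Gymnast B', 'Gymnast C', 'Gymnast C'],
    'Apparatus': ['Floor', 'Rings', 'Floor', 'Rings', 'Floor', 'Rings'],
    'Rank': [2, 2, 1, 1, 9, 9],
})

top = top_n_all_around(toy, n=8)
print(top)
# Sanity check: each selected gymnast should appear once per apparatus
print(top.groupby('Name').size())
```

On the real data, `top_n_all_around(gymnastics_df)` should return 48 rows if there is no tie for 8th place.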

### 2.5 Difficulty vs Execution

Each routine's 'Total' score combines a Difficulty score ('Diff') and an Execution score ('Exec'). Here, I want to see how difficulty and execution contributed to the gymnasts' higher scores. Is there a relationship between the two? Do gymnasts have a 'favorite' event that they specialize in?

In [13]:
diff_vs_exec_df = gymnastics_df[['Diff', 'Exec', 'Apparatus', 'Rank', 'Name']].copy()
diff_vs_exec_df.drop_duplicates()
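One way to quantify the relationship between difficulty and execution is the Pearson correlation between `Diff` and `Exec` within each apparatus. This sketch assumes the column names used above; the toy rows are made up to show one apparatus with a perfectly positive relationship and one with a perfectly negative one. Correlation only captures a linear association, and with few gymnasts per apparatus it is noisy.

```python
import pandas as pd


def diff_exec_correlation(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation between Diff and Exec scores, computed per apparatus."""
    return df.groupby('Apparatus')[['Diff', 'Exec']].apply(
        lambda g: g['Diff'].corr(g['Exec'])
    )


# Made-up scores for illustration only
toy = pd.DataFrame({
    'Apparatus': ['Floor'] * 3 + ['Rings'] * 3,
    'Diff': [5.0, 5.5, 6.0, 5.0, 5.5, 6.0],
    'Exec': [8.0, 8.3, 8.6, 8.6, 8.3, 8.0],
})

print(diff_exec_correlation(toy))
```

Applied to `gymnastics_df`, a value near 1 on an apparatus would mean that gymnasts with harder routines there also tended to execute them better.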

### 2.6 Get Maximum Difficulty Score for each Apparatus

In this dataset, 'Diff' is each gymnast's difficulty score, so the highest values show which gymnasts performed the most difficult routines. Because difficulty is added to execution, a well-executed difficult routine earns more points than an equally well-executed routine that is less difficult.
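A worked example makes the point: at the same execution score of 8.5, a routine with a 6.0 difficulty totals 14.5, while one with a 5.0 difficulty totals 13.5. The sketch below also flags routines whose `Total` is not exactly `Diff + Exec`. Under the FIG code a final score is difficulty plus execution minus any penalties, so a gap would most likely be a penalty. The toy rows are made up, and `score_gaps` is a hypothetical helper:

```python
import pandas as pd


def score_gaps(df: pd.DataFrame, tol: float = 1e-6) -> pd.DataFrame:
    """Return rows where Total differs from Diff + Exec, with the gap in a new column."""
    gap = df['Total'] - (df['Diff'] + df['Exec'])
    return df.assign(gap=gap)[gap.abs() > tol]


# Made-up routines: the first two differ only in difficulty
toy = pd.DataFrame({
    'Name': ['Gymnast A', 'Gymnast B', 'Gymnast C'],
    'Diff': [6.0, 5.0, 5.5],
    'Exec': [8.5, 8.5, 8.0],
    'Total': [14.5, 13.5, 13.2],  # Gymnast C's total includes a 0.3 deduction
})

print(toy['Diff'] + toy['Exec'])
print(score_gaps(toy))
```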

In [14]:
# get maximum difficulty score for each apparatus
# keep only the apparatus name and the difficulty score of each routine
diff_apparatus_score_df = gymnastics_df[['Apparatus', 'Diff']].copy()

# highest difficulty score recorded on each apparatus
max_apparatus_diff_score = diff_apparatus_score_df.groupby(['Apparatus'])['Diff'].max()
print("Maximum Apparatus Difficulty Scores")
max_apparatus_diff_score

### 2.7 Get Minimum Difficulty Score for each Apparatus

Next, I look at the lowest difficulty score on each apparatus to explore the trend at the other end of the field, among the lower all-around scorers.

In [15]:
# get minimum difficulty score for each apparatus
min_apparatus_diff_score = diff_apparatus_score_df.groupby(['Apparatus'])['Diff'].min()
print("Minimum Apparatus Difficulty Scores")
min_apparatus_diff_score

### 2.8 Get the Mean Difficulty Score

Next, I compute the mean difficulty score for each apparatus.

In [16]:
# get mean difficulty score for each apparatus
mean_apparatus_diff_score = diff_apparatus_score_df.groupby(['Apparatus'])['Diff'].mean()
print("Mean Apparatus Difficulty Scores")
mean_apparatus_diff_score

### 2.9 Execution Scores: Same Procedure as Difficulty Scores

In [17]:
# get mean execution score for each apparatus
# keep only the apparatus name and the execution score of each routine
exec_apparatus_score_df = gymnastics_df[['Apparatus', 'Exec']].copy()

mean_apparatus_exec_score = exec_apparatus_score_df.groupby(['Apparatus'])['Exec'].mean()
print("Average Apparatus Execution Scores")
mean_apparatus_exec_score

In [18]:
# get maximum execution score for each apparatus
max_apparatus_exec_score = exec_apparatus_score_df.groupby(['Apparatus'])['Exec'].max()
print("Maximum Apparatus Execution Scores")
max_apparatus_exec_score

In [19]:
# get minimum execution score for each apparatus
min_apparatus_exec_score = exec_apparatus_score_df.groupby(['Apparatus'])['Exec'].min()
print("Minimum Apparatus Execution Scores")
min_apparatus_exec_score
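The mean, maximum, and minimum tables in sections 2.6 through 2.9 can also be produced with a single `groupby(...).agg(...)` call, which returns one row per apparatus and a two-level column index (score type, statistic). The toy rows are made up and `apparatus_summary` is a hypothetical helper; on the real data it would be called as `apparatus_summary(gymnastics_df)`.

```python
import pandas as pd


def apparatus_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, maximum, and minimum of Diff, Exec, and Total for each apparatus."""
    return df.groupby('Apparatus')[['Diff', 'Exec', 'Total']].agg(['mean', 'max', 'min'])


# Made-up scores for illustration only
toy = pd.DataFrame({
    'Apparatus': ['Floor', 'Floor', 'Rings', 'Rings'],
    'Diff': [5.0, 6.0, 5.5, 6.5],
    'Exec': [8.0, 9.0, 8.5, 8.7],
    'Total': [13.0, 15.0, 14.0, 15.2],
})

summary = apparatus_summary(toy)
print(summary)
# A single column, e.g. the highest difficulty per apparatus
print(summary[('Diff', 'max')])
```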

### 3.1 Execution Scores Vs. Difficulty Scores

I wanted to see the relationship between the execution and difficulty scores. In these plots, the execution scores are in blue (on top) and the difficulty scores are in orange (at the bottom).

In [14]:
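mean_apparatus_exec_score.plot(title="Mean Execution Score Vs. Mean Difficulty Score", label='Mean Execution')
mean_apparatus_diff_score.plot(label='Mean Difficulty')
plt.legend()
plt.show()

In [15]:
max_apparatus_exec_score.plot(title="Maximum Execution Score Vs. Maximum Difficulty Score", label='Max Execution')
max_apparatus_diff_score.plot(label='Max Difficulty')
plt.legend()
plt.show()

In [16]:
min_apparatus_exec_score.plot(title="Minimum Execution Score Vs. Minimum Difficulty Score", label='Min Execution')
min_apparatus_diff_score.plot(label='Min Difficulty')
plt.legend()
plt.show()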

Based on these graphs, the highest-scoring routines combined high difficulty with high execution scores. Although some routines with low difficulty scores earned high execution scores (Zachary Clay scored 9.3000 on vault with a difficulty score of 4.8), they did not rank highly. There were also gymnasts who scored particularly well on one event and lower on the others. For example, Ferhat Arican was outside the top 12 on every apparatus except the parallel bars, where he had high difficulty and execution scores and also reached the top 8 in the individual event.
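The Ferhat Arican pattern (a strong result on one apparatus, weaker results elsewhere) can be searched for systematically by ranking every routine within its apparatus and counting, per gymnast, how many events fall inside and outside given thresholds. This is a sketch under assumptions: the ranks are computed only among the routines in this dataset (not the official event rankings), `find_specialists` is a hypothetical helper, and the toy rows are made up. The real-data call would use the thresholds from the text, `good_rank=8` and `weak_rank=12`.

```python
import pandas as pd


def find_specialists(df: pd.DataFrame, good_rank: int = 8, weak_rank: int = 12) -> list:
    """Names of gymnasts ranked within `good_rank` on exactly one apparatus
    and worse than `weak_rank` on all the others."""
    # Rank each routine's Total within its apparatus (1 = highest score)
    ranks = df.groupby('Apparatus')['Total'].rank(ascending=False, method='min')
    flagged = df.assign(strong=ranks <= good_rank, weak=ranks > weak_rank)
    per_gymnast = flagged.groupby('Name').agg(
        n_strong=('strong', 'sum'),
        n_weak=('weak', 'sum'),
        n_events=('Apparatus', 'count'),
    )
    is_specialist = (per_gymnast['n_strong'] == 1) & (
        per_gymnast['n_weak'] == per_gymnast['n_events'] - 1
    )
    return is_specialist[is_specialist].index.tolist()


# Made-up scores: Gymnast X is best on parallel bars and last on floor
toy = pd.DataFrame({
    'Name': ['Gymnast X', 'Gymnast Y', 'Gymnast Z'] * 2,
    'Apparatus': ['Parallel Bars'] * 3 + ['Floor'] * 3,
    'Total': [15.0, 14.0, 13.0, 12.0, 14.5, 14.0],
})

# Small thresholds because the toy field has only three gymnasts
print(find_specialists(toy, good_rank=1, weak_rank=2))
```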