### Data analysis of the Rio de Janeiro Olympics 2016
In this project, I analyze data from the Rio de Janeiro Olympics using seaborn, matplotlib and plotly.

In [1]:
import pandas as pd # for reading data
import seaborn as sns # for plotting
import plotly.express as px # for plotting
import matplotlib.pyplot as plt # for plotting

ModuleNotFoundError: No module named 'plotly'

First, I read the data using pandas

In [None]:
df_olympic = pd.read_csv('../input/120-years-of-olympic-history-athletes-and-results/athlete_events.csv')

Then I filter the data to keep only the 2016 Games and two indoor (hall) sports

In [None]:
sports_hall = ['Volleyball' , 'Basketball']
basket_volley = df_olympic[(df_olympic['Year'] == 2016) & df_olympic['Sport'].isin(sports_hall)]

An athlete's build matters: height and weight should be in proportion to each other.

I want to see the relationship between the heights and weights of players in team sports.

So I use relplot from seaborn (which draws a scatter plot by default) to display the heights and weights of players in these team ball sports.

The following figure suggests a strong positive correlation between height and weight in these sports: taller players tend to be heavier.

In [None]:
sns.relplot(x="Height", y="Weight", hue="Sport" , data=basket_volley , s = 80)
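To put a number on the relationship seen in the scatter plot, one option is to compute the Pearson correlation between height and weight for each sport. The following is a minimal sketch: the helper name `height_weight_corr` and the toy rows are made up for illustration and are not taken from the Olympic dataset.

```python
import pandas as pd

def height_weight_corr(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation between Height and Weight for each sport."""
    # Keep only rows where both measurements are present
    clean = df.dropna(subset=["Height", "Weight"])
    return pd.Series(
        {sport: grp["Height"].corr(grp["Weight"])
         for sport, grp in clean.groupby("Sport")}
    )

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Sport": ["Basketball"] * 3 + ["Volleyball"] * 3,
    "Height": [180, 190, 200, 170, 180, 190],
    "Weight": [75, 85, 95, 60, 80, 70],
})
corr_by_sport = height_weight_corr(toy)
print(corr_by_sport)
```

In the notebook, the same helper could be called as `height_weight_corr(basket_volley)`; values close to 1 would support the reading of the figure.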

The previous graph contains so many points that they overlap and pile up on top of each other.

To solve this, I use an alternative to the scatter plot in which a range containing many points is drawn in a darker color and a range containing few points in a lighter color.
The alternative is the hexbin plot from seaborn, which uses hexagons to represent groups of nearby data points.

1- Darker bins indicate a larger number of points in that range of x and y

2- Lighter bins indicate fewer points

The histograms on the top and right axes show the distribution of each variable

In [None]:
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.jointplot(x=basket_volley.Height, y=basket_volley.Weight, kind="hex", color="#4CB391")

I want to use a box plot to see some statistical characteristics (such as the minimum, the maximum and the median) of the heights of the players on each participating basketball team.

So I filter the data to include only basketball

In [None]:
basketball = basket_volley.query("Sport == 'Basketball'")
basketball.head()

### Can we guess the positions of the players from their height using this boxplot?

To some extent, yes.

The plot shows that the shortest player is on the Japanese national team, at about 163 centimeters.
This player is most likely the team's point guard (playmaker), a role usually filled by the shortest players.

The tallest player is on the Chinese national team, at about 218 centimeters.
This player is most likely a center.



In [None]:
ax = sns.boxplot(x='Team', y='Height', data=basketball)
# Rotate the team names so they do not overlap
ax.tick_params(axis='x', rotation=90)
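Reading the shortest and tallest players off a box plot is approximate, so it can help to look the rows up directly. The sketch below uses `idxmin` and `idxmax` on the `Height` column; the helper name `height_extremes` and the toy players are hypothetical and only there to make the example runnable.

```python
import pandas as pd

def height_extremes(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of the shortest and the tallest player."""
    known = df.dropna(subset=["Height"])  # ignore players without a height
    rows = [known["Height"].idxmin(), known["Height"].idxmax()]
    return known.loc[rows, ["Name", "Team", "Height"]]

# Hypothetical players for illustration only
toy = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["X", "Y", "X", "Y"],
    "Height": [190, 165, 210, None],
})
extremes = height_extremes(toy)
print(extremes)
```

Applied to the notebook's data, `height_extremes(basketball)` would return the two rows behind the lowest and highest points of the plot.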

I want to do the same thing with volleyball

In [None]:
volleyball = basket_volley.query("Sport == 'Volleyball'")
volleyball.head()

As we did for basketball, we will use the boxplot to guess the positions of some volleyball players.

One of the Argentine players is about 164 centimeters tall, the shortest on that national team. This player is most likely the libero (middle back), who usually wears a jersey in a contrasting color.

The tallest player is on the Egyptian national team, at about 212 centimeters.
This player is most likely a middle blocker (sometimes known as the middle hitter), whose main role is to be the first line of defense against the opposing team's hits.



In [None]:
ax = sns.boxplot(x='Team', y='Height', data=volleyball)
# Rotate the team names so they do not overlap
ax.tick_params(axis='x', rotation=90)

Now I filter the data to keep only the athletes who won a medal at the 2016 Olympics

In [None]:
Medals = ['Gold', 'Bronze', 'Silver']
winners = df_olympic[(df_olympic['Year'] == 2016) & df_olympic['Medal'].isin(Medals)]
winners.head()

### Analyze by describing data

In [None]:
winners.describe()

1- The average age of the medal winners is 26.32 years

2- The oldest medal winner is 58 years old

3- The youngest medal winner is 15 years old

4- The average height is around 178.37 cm

5- The tallest medal winner is around 215 cm

6- The shortest medal winner is around 140 cm

7- The average weight is around 73.96 kg
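The figures listed above can also be pulled out of the table in one step rather than read by eye. This is a minimal sketch using `DataFrame.agg`; the helper name `numeric_summary` and the toy values are assumptions for illustration, not numbers from the dataset.

```python
import pandas as pd

def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, minimum and maximum of Age, Height and Weight."""
    return df[["Age", "Height", "Weight"]].agg(["mean", "min", "max"]).round(2)

# Made-up values for illustration only
toy = pd.DataFrame({
    "Age": [20, 30, 40],
    "Height": [160, 180, 200],
    "Weight": [50, 70, 90],
})
summary = numeric_summary(toy)
print(summary)
```

In the notebook this would be called as `numeric_summary(winners)`.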

In [None]:
winners.describe(include=object)

1- The table shows 98 distinct teams among the medal winners

2- The medal winners come from 34 sports

3- The name that appears most often among the medal winners is Michael Fred Phelps

4- Bronze is the most frequent medal, with 703 entries (one entry per medal-winning athlete)
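Each row in this dataset is one athlete in one event, so a team event contributes one row per squad member and the counts above are athlete-level. If the goal is to count medals per event instead, one rough approach is to drop duplicate combinations of country, event and medal. The sketch below assumes the dataset's `NOC`, `Event` and `Medal` columns; the helper name `medals_per_event` and the toy rows are hypothetical, and the result is only an approximation (for example, it would merge two bronze medals won by the same country in the same event).

```python
import pandas as pd

def medals_per_event(df: pd.DataFrame) -> pd.Series:
    """Count medals once per (NOC, Event, Medal) instead of once per athlete."""
    medal_rows = df.dropna(subset=["Medal"])
    per_event = medal_rows.drop_duplicates(subset=["NOC", "Event", "Medal"])
    return per_event["Medal"].value_counts()

# Hypothetical rows: a three-player gold team, one solo gold, a two-player silver team
toy = pd.DataFrame({
    "NOC": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Event": ["Team event"] * 3 + ["Solo event"] + ["Team event"] * 2,
    "Medal": ["Gold", "Gold", "Gold", "Gold", "Silver", "Silver"],
})
event_counts = medals_per_event(toy)
print(event_counts)
```

On the toy rows the athlete-level count would be four golds and two silvers, while the per-event count is two golds and one silver.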

In the next cell, I use a pie chart to show the percentage of male and female medal winners at the Rio Olympics

In [None]:
mycolors = ["#FF865E", "#FEE440"]
winners['Sex'].value_counts().plot(kind="pie", autopct="%.2f" , shadow = True , colors = mycolors)
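The percentages drawn on the pie chart can also be obtained as plain numbers with `value_counts(normalize=True)`. A minimal sketch, using made-up values in place of the `Sex` column:

```python
import pandas as pd

# Made-up values for illustration only
toy_sex = pd.Series(["M", "M", "M", "F"], name="Sex")

# normalize=True returns fractions; multiply by 100 for percentages
shares = (toy_sex.value_counts(normalize=True) * 100).round(2)
print(shares)
```

In the notebook the same expression would start from `winners['Sex']`.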

Let's count how many medals of each type (gold, silver, bronze) were awarded

In [None]:
Medals = winners['Medal'].value_counts()

Following the same idea, I use a donut chart to show the percentage of each medal type (gold, silver, bronze) across all sports

In [None]:
explode = (0.05, 0.05, 0.05)
# Take the labels from the Series index so that each wedge matches its count
# (value_counts sorts by frequency, not in a fixed Gold/Bronze/Silver order)
plt.pie(Medals, labels=Medals.index, autopct="%.2f", explode=explode, pctdistance=0.85)
centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()

# Add a white circle in the centre to turn the pie into a donut
fig.gca().add_artist(centre_circle)

# Add the legend and the title of the chart
plt.legend(title='Medals', bbox_to_anchor=(0.75, 1.15), ncol=3, loc='lower left')
plt.title('Percentage of each medal awarded at the Rio Olympics', loc='center')

# Display the chart
plt.show()

The next cell counts how many medal entries each country has.

From this list I will take only the five countries with the most medals

In [None]:
winners['Team'].value_counts()
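Because `value_counts` sorts from most to least frequent, the five leading countries can be taken from its index instead of being typed by hand. This is a small sketch; the helper name `top_values` and the toy teams are hypothetical.

```python
import pandas as pd

def top_values(df: pd.DataFrame, column: str, n: int = 5) -> list:
    """Return the n most frequent values of a column, most frequent first."""
    return df[column].value_counts().head(n).index.tolist()

# Made-up teams for illustration only
toy = pd.DataFrame({"Team": ["A", "A", "A", "B", "B", "C"]})
top_two = top_values(toy, "Team", n=2)
print(top_two)
```

In the notebook, `top_values(winners, "Team")` would give a list that can be passed straight to `isin`.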

I want to know how many gold, silver and bronze medals each of these five countries won

In [None]:
country_winner = winners[winners.Team.isin(["United States", "Germany" , "Great Britain" , "Russia" , "China"])]

The cell above filters the data to these five countries. The next cell groups the result into a table that counts how many medals of each type each of the five countries won

In [None]:
five_country = country_winner.groupby(['Team'])['Medal'].value_counts().unstack().fillna(0)
five_country
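An alternative way to build the same kind of team-by-medal table is `pd.crosstab`, which counts the combinations directly and fills missing ones with 0. A minimal sketch on made-up rows (the team names are placeholders):

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Team": ["A", "A", "A", "B"],
    "Medal": ["Gold", "Gold", "Silver", "Bronze"],
})

# Rows are teams, columns are medal types, cells are counts
medal_table = pd.crosstab(toy["Team"], toy["Medal"])
print(medal_table)
```

In the notebook the equivalent call would be `pd.crosstab(country_winner['Team'], country_winner['Medal'])`.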

I use a stacked bar chart to plot the previous table

In [None]:
stacked_fig = five_country.plot(kind='bar', stacked=True , color=['#17D7A0' , '#FF5403' , '#8E05C2'])

Let's look at the age distribution of the medal winners from these five countries

In [None]:
# distplot is deprecated in recent seaborn versions; histplot draws the same histogram without a KDE curve
sns.histplot(country_winner.Age)

Next, using the table that contains only the five countries with the most medals, I count the medals won in each sport

In [None]:
sports = country_winner['Sport'].value_counts()
sports

I filter the table again to keep the five sports with the most medals

and plot the counts with seaborn

In [None]:
sport = ['Swimming' , 'Athletics' , 'Rowing' , 'Gymnastics' , 'Hockey']
sports_medals = country_winner[country_winner['Sport'].isin(sport)]
# Recent seaborn versions require x to be passed as a keyword argument
sns.catplot(x="Sport", data=sports_medals, aspect=1.5, kind="count", color="b")

I want to compare the five countries with the most medals across these five sports to see which country excelled in each

In [None]:
g = sns.FacetGrid(sports_medals, col="Sport", height=4,aspect=.5,hue='NOC',palette='magma_r')
g.map(sns.countplot, "NOC", order = sports_medals['NOC'].value_counts().index)

I will repeat the previous graph, but this time the comparison will be between Male and Female

In [None]:
s = sns.FacetGrid(sports_medals, col="Sport", height=4,aspect=.5,hue='Sex',palette='magma_r')
s.map(sns.countplot, "Sex", order = sports_medals['Sex'].value_counts().index)

Finally, I draw a world map showing the countries that won Olympic medals (gold, silver, bronze). You can drag the map, and you can switch between the medal types with the slider at the bottom of the map

In [None]:
fig = px.choropleth(winners,
                    locations="NOC",
                    color='Medal',
                    hover_name="Team",
                    animation_frame = "Medal",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()
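One thing to be aware of: by default `px.choropleth` interprets `locations` as ISO 3166-1 alpha-3 country codes, while the `NOC` column holds Olympic committee codes, and the two differ for some countries (Germany is `GER` as an NOC but `DEU` in ISO). Countries whose codes differ may therefore be missing from the map. The sketch below shows one way to translate codes before plotting. The mapping is deliberately partial and only lists a few pairs, and the names `NOC_TO_ISO3` and `noc_to_iso3` are hypothetical.

```python
import pandas as pd

# Partial, illustrative mapping from NOC codes to ISO alpha-3 codes;
# a complete analysis would need the full list of differing codes
NOC_TO_ISO3 = {
    "GER": "DEU",  # Germany
    "NED": "NLD",  # Netherlands
    "SUI": "CHE",  # Switzerland
    "DEN": "DNK",  # Denmark
    "GRE": "GRC",  # Greece
    "CRO": "HRV",  # Croatia
    "RSA": "ZAF",  # South Africa
}

def noc_to_iso3(codes: pd.Series) -> pd.Series:
    """Replace known NOC codes with ISO codes; leave all other codes unchanged."""
    return codes.replace(NOC_TO_ISO3)

iso_codes = noc_to_iso3(pd.Series(["USA", "GER", "NED"]))
print(iso_codes.tolist())
```

The translated codes could be stored in a new column and passed to `locations` in place of `"NOC"`.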