<h1 style="color:aqua">
Women's International Football Starter EDA
</h1>

![Image](https://media.giphy.com/media/pdAiipxDMCHni/giphy.gif)

<strong><span style="color:red">If you like my work, please don't forget to upvote this notebook!</span></strong>

<strong><span style="color:blue"> If you don't, at least leave a comment on what I should do to improve it!</span></strong>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

import plotly
import plotly.express as px
import plotly.graph_objs as go
import plotly.figure_factory as ff

warnings.simplefilter("ignore")
plt.style.use("classic")

In [None]:
# Read in the data
data = pd.read_csv("../input/womens-international-football-results/results.csv")
data.head()

# EDA
Let's start the exploratory data analysis by going through the columns and studying each one in turn.

## Check Null Values
First, let's check for null values in the dataset.

In [None]:
data.isna().sum()

The data has no null values, so we can proceed with the visualizations.
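The null check above can be extended into a small per-column summary that also reports percentages, which is handy when a dataset does have gaps. This is a minimal sketch: the helper name `missing_summary` and the toy frame are assumptions for illustration, not part of the original notebook or the real `results.csv`.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst column first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Toy frame for illustration only; it is not the real dataset.
toy = pd.DataFrame({
    "home_team": ["A", "B", None, "C"],
    "home_score": [1, 2, 0, 3],
})

summary = missing_summary(toy)
print(summary)
```

On the real data, `missing_summary(data)` should show zeros everywhere if the check above holds.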

## What are the most frequent Home Teams?
We will start by looking at the teams that appear most often as the home side. Each row of the dataset is a match, so the counts below are numbers of matches, not players.

In [None]:
# Count matches per home team and keep the 15 most frequent
counts = data['home_team'].value_counts().head(15)
names = counts.index.tolist()
values = counts.tolist()

trace = go.Bar(x = names,
            y = values,
            marker = dict(color = 'rgba(255, 0, 0, 0.5)',
                         line=dict(color='rgb(0,0,1)',width=1.5)),
            text = names)

fig = go.Figure(data = [trace])
fig.update_layout(title_text='Top-15 Home Teams by Number of Matches')
fig.show()

## What are the most frequent Away Teams?
Similarly, we look at the teams that appear most often as the away side.

In [None]:
# Count matches per away team and keep the 15 most frequent
counts = data['away_team'].value_counts().head(15)
names = counts.index.tolist()
values = counts.tolist()

trace = go.Bar(x = names,
            y = values,
            marker = dict(color = 'rgba(0, 255, 0, 0.5)',
                         line=dict(color='rgb(0,0,50)',width=1.5)),
            text = names)

fig = go.Figure(data = [trace])
fig.update_layout(title_text='Top-15 Away Teams by Number of Matches')
fig.show()
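The home and away counts can also be combined into one ranking of total appearances per team. The sketch below is one possible way to do it. The helper name `total_appearances` and the toy frame are hypothetical, and only the `home_team` and `away_team` column names come from the notebook.

```python
import pandas as pd


def total_appearances(df, home_col="home_team", away_col="away_team"):
    """Count matches per team regardless of venue (hypothetical helper)."""
    home = df[home_col].value_counts()
    away = df[away_col].value_counts()
    # fill_value=0 keeps teams that appear on only one side of the fixture list
    total = home.add(away, fill_value=0).astype(int)
    return total.sort_values(ascending=False)


# Toy fixtures for illustration only; they are not the real dataset.
toy = pd.DataFrame({
    "home_team": ["A", "B", "A"],
    "away_team": ["B", "A", "D"],
})

appearances = total_appearances(toy)
print(appearances)
```

With the real data, `total_appearances(data).head(15)` could feed the same bar chart code used above.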

## What is the Distribution of Home Scores?
Let's look at the distribution of home scores across all matches.

In [None]:
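# sns.distplot is deprecated in recent seaborn releases; histplot with kde=True is the closest equivalent
sns.histplot(data['home_score'], bins=15, stat="density", kde=True)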
plt.xlabel("Home Score")
plt.ylabel("Density")
plt.title(f"Home Score Distribution [ \u03BC: {data['home_score'].mean():.2f} ]")
plt.show()

In [None]:
# Also the plotly figure
fig = ff.create_distplot(
    hist_data=[data['home_score'].tolist()],
    group_labels=['Home Score'],
    colors=['#ff00e1'],
    show_hist=False,
    show_rug=False,
)

fig.layout.update({'title':f"Home Score Distribution<br>[Average Score: {data['home_score'].mean():.2f} ]"})

fig.show()

## What is the Distribution of Away Scores?
Let's look at the distribution of away scores across all matches.

In [None]:
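# sns.distplot is deprecated in recent seaborn releases; histplot with kde=True is the closest equivalent
sns.histplot(data['away_score'], bins=15, stat="density", kde=True, color='red')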
plt.xlabel("Away Score")
plt.ylabel("Density")
plt.title(f"Away Score Distribution [ \u03BC: {data['away_score'].mean():.2f} ]")
plt.show()

In [None]:
# Also the plotly figure
fig = ff.create_distplot(
    hist_data=[data['away_score'].tolist()],
    group_labels=['Away Score'],
    colors=['#00BFFF'],
    show_hist=False,
    show_rug=False,
)

fig.layout.update({'title':f"Away Score Distribution<br>[Average Score: {data['away_score'].mean():.2f} ]"})

fig.show()

## Comparison of Home and Away Scores
Let's overlay the two distributions to compare home scores and away scores.

In [None]:
# Overlay both score distributions in a single plotly figure
fig = ff.create_distplot(
    hist_data=[data['away_score'].tolist(), data['home_score'].tolist()],
    group_labels=['Away Score', 'Home Score'],
    colors=['#00008B', '#DC143C'],
    show_hist=False,
    show_rug=False,
)
total_avg = (data['away_score'].mean() + data['home_score'].mean()) / 2

fig.layout.update({'title':f"Complete Score Distribution<br>[Average Score: {total_avg:.2f} ]"})

fig.show()

We can see that the away-score curve starts much higher than the home-score curve, which suggests that low away scores are more frequent than low home scores.
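The visual comparison can be backed with a few numbers. The following sketch puts the mean, median, maximum, and share of zero-goal results for both columns side by side. The helper name `compare_scores` and the toy values are assumptions for illustration, and only the `home_score` and `away_score` column names come from the notebook.

```python
import pandas as pd


def compare_scores(df, home_col="home_score", away_col="away_score"):
    """Summarise home and away scores side by side (hypothetical helper)."""
    cols = [home_col, away_col]
    # One row per score column, one column per statistic
    stats = df[cols].agg(["mean", "median", "max"]).T
    # Fraction of matches in which that side scored no goals
    stats["share_zero"] = (df[cols] == 0).mean()
    return stats


# Toy scores for illustration only; they are not the real dataset.
toy = pd.DataFrame({
    "home_score": [2, 0, 3, 1],
    "away_score": [0, 0, 1, 1],
})

stats = compare_scores(toy)
print(stats)
```

A higher `share_zero` for `away_score` on the real data would be consistent with the away curve starting higher at the low end.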

In [None]:
data.head()

## What are the most popular Tournaments?
Let's look at how many matches were played in each tournament.

In [None]:
# Count matches per tournament, most frequent first
counts = data['tournament'].value_counts()
names = counts.index.tolist()
values = counts.tolist()

trace = go.Bar(x = names,
             y = values,
             marker = dict(color = 'rgba(0, 0, 255, 0.5)',
                         line=dict(color='rgb(0,0,50)',width=1.5)),
             text = names)

fig = go.Figure(data = [trace])
fig.update_layout(title_text='All Tournaments by Number of Matches')
fig.show()

One important observation is that `UEFA Euro qualification` has more than twice as many matches as the second most popular tournament (`Algarve Cup`).
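A claim like this can be checked numerically instead of being read off the chart. Below is a minimal sketch that divides the largest category count by the second largest. The helper name `top_two_ratio` and the toy labels are hypothetical and do not reflect the real tournament counts.

```python
import pandas as pd


def top_two_ratio(series: pd.Series) -> float:
    """Ratio of the most frequent value's count to the runner-up's count."""
    counts = series.value_counts()  # sorted in descending order by default
    if len(counts) < 2:
        raise ValueError("need at least two distinct values")
    return counts.iloc[0] / counts.iloc[1]


# Toy labels for illustration only; they are not the real dataset.
toy = pd.Series(["Qualifier"] * 5 + ["Cup"] * 2 + ["Friendly"])

ratio = top_two_ratio(toy)
print(ratio)
```

On the real data, `top_two_ratio(data['tournament'])` returning a value above 2 would confirm the observation.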