# StarCraft 2 Pro Players Comparison EDA

![Starcraft 2 Pro Players compare EDA](https://game4u.co.za/wp-content/uploads/SCLOFTVbanner.jpg)

## Table Of Contents:
* [Goal](#first-bullet)
* [Load and clean data](#second-bullet)
* [Historical score](#third-bullet)
* [Opposing race](#fourth-bullet)
* [Each other](#fifth-bullet)
* [TODO](#sixth-bullet)

## Goal <a class="anchor" id="first-bullet"></a>

The goal of this analysis is to compare the main indicators of two pro players and to see whether they allow us to conclude which player is likely to win or has the advantage.

## Load and clean data <a class="anchor" id="second-bullet"></a>

Let's load all the libraries we need and check the data for empty values.

In [None]:
# For autoreloading modules
%load_ext autoreload
%autoreload 2
# For notebook plotting
%matplotlib inline

# Standard libraries
import os
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from pdpbox import pdp
from plotnine import *
from pandas_summary import DataFrameSummary
from IPython.display import display
from datetime import datetime

In [None]:
KAGGLE_DIR = '../input/'
data = pd.read_csv(KAGGLE_DIR + 'sc2-matches-history.csv')

Display the first five and the last five rows.

In [None]:
print('First 5 rows: ')
display(data.head())

print('Last 5 rows: ')
display(data.tail())

In [None]:
data.describe()

Now we see that the `player_2` column contains an empty value. Let's look at it and delete it.

In [None]:
display(data[data['player_2'].isnull()])

In [None]:
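# Drop the row with a missing player_2 (shown above) without hard-coding its index label
data.dropna(subset=['player_2'], inplace=True)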

Check it again.

In [None]:
display(data[data['player_2'].isnull()])
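
The check for empty values can also be done for all columns at once. The following is a minimal sketch on a hypothetical toy frame (the `toy` data is made up for illustration; only the column names come from the dataset above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real match history
toy = pd.DataFrame({
    'player_1': ['Bly', 'Kas', 'Bly'],
    'player_2': ['Kas', np.nan, 'Kas'],
})

# Number of missing values in every column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Keep only the rows where player_2 is present
cleaned = toy.dropna(subset=['player_2'])
print(len(cleaned))
```

Counting with `isnull().sum()` shows which columns need attention before any rows are removed.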

For comparison, we take two Ukrainian pro players, [Bly](https://liquipedia.net/starcraft2/Bly) and [Kas](https://liquipedia.net/starcraft2/Kas), and build a data frame from their matches.

![Bly vs Kas](https://i.ytimg.com/vi/qQWacD67slk/maxresdefault.jpg)


In [None]:
all_data = data[(data['player_1']=='Bly') | (data['player_1']=='Kas')]

display(all_data.head(10))

display(all_data.tail(10))

We are interested only in LotV (Legacy of the Void) games, since that is the expansion matches are currently played on.

In [None]:
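# .copy() gives an independent frame, so adding or converting columns later
# does not trigger SettingWithCopyWarning
all_data = all_data[all_data['addon']=='LotV'].copy()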

Check the data types of our columns. The `match_date` column must hold real dates rather than strings, because we need it to analyze the latest matches.

In [None]:
all_data.dtypes

In [None]:
all_data['match_date'] = pd.to_datetime(all_data['match_date'], dayfirst=False)

In [None]:
all_data.dtypes
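
To illustrate why the conversion matters, here is a small sketch on hypothetical date strings. The month/day/year format used here is an assumption for the example, not necessarily the format of the real file:

```python
import pandas as pd

# Hypothetical date strings in month/day/year order
toy = pd.DataFrame({'match_date': ['09/19/2018', '01/05/2019']})

# Before the conversion the column holds plain strings,
# so comparisons would be alphabetical rather than chronological
print(toy['match_date'].dtype)

# Parse with an explicit format to avoid day/month ambiguity
toy['match_date'] = pd.to_datetime(toy['match_date'], format='%m/%d/%Y')
print(toy['match_date'].dtype)

# Chronological operations now work as expected
latest = toy['match_date'].max()
print(latest)
```

With string dates, `'09/19/2018'` would sort after `'01/05/2019'`; after parsing, the 2019 match is correctly the latest one.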

## Historical score <a class="anchor" id="third-bullet"></a>

Let's look at how well the players have performed historically in LotV.

In [None]:
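# Count every value of every column per player; columns become (variable, value) pairs
data_df = (all_data.melt('player_1')
       .groupby(['player_1','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1))  # axis must be a keyword in pandas 2.x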

In [None]:
data_df['player_1_match_status'].plot(kind='bar', stacked=True)

**Conclusion:**

Bly has played far more LotV matches than Kas. However, the win rate is not clear from this graph, so let's look at it directly.

In [None]:
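# Same counting as above, grouped by match result, then each column is
# converted to percentages of its own total
all_state_pcts = (all_data.melt('player_1_match_status')
       .groupby(['player_1_match_status','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1)).apply(lambda x:
                                                 100 * x / float(x.sum()))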

In [None]:
all_state_pcts = all_state_pcts['player_1'].transpose()

In [None]:
all_state_pcts = all_state_pcts[['[winner]','[loser]']]

In [None]:
all_state_pcts.plot(kind='bar', stacked=True)

**Conclusion:**

Bly has played twice as many LotV games as Kas, but the two players' win rates are very close.
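
The same two numbers (games played and win rate) can also be obtained more directly. This is only a sketch on hypothetical toy data, using `pd.crosstab` instead of the melt/unstack chain from the cells above:

```python
import pandas as pd

# Hypothetical results; the status labels follow the dataset's '[winner]' / '[loser]' convention
toy = pd.DataFrame({
    'player_1': ['Bly', 'Bly', 'Bly', 'Bly', 'Kas', 'Kas'],
    'player_1_match_status': ['[winner]', '[loser]', '[winner]',
                              '[winner]', '[winner]', '[loser]'],
})

# Number of games per player
games = toy['player_1'].value_counts()

# Row-normalized cross table: share of wins and losses per player, in percent
win_pct = pd.crosstab(toy['player_1'],
                      toy['player_1_match_status'],
                      normalize='index') * 100
win_pct = win_pct[['[winner]', '[loser]']]

print(games)
print(win_pct)
```

Calling `win_pct.plot(kind='bar', stacked=True)` on this table should give the same kind of stacked bar chart as above.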

Additionally, let's look at the last 5 games to judge the current form of both players.

In [None]:
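# Sort newest first so that head(5) takes each player's five most recent games
data_df_5 = (all_data.sort_values('match_date', ascending=False)
       .groupby('player_1').head(5).melt('player_1')
       .groupby(['player_1','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1))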

In [None]:
data_df_5['player_1_match_status'].plot(kind='bar', stacked=True)

**Conclusion:**

In the last 5 games, Kas did not lose, while Bly has two defeats. We can assume that Kas is in better form.
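
Selecting "the last N games" depends on the row order, so it helps to make the sorting explicit. Below is a minimal sketch on hypothetical toy data; `last_n_games` is an illustrative helper, not part of the original notebook:

```python
import pandas as pd

# Hypothetical matches, deliberately not in chronological order
toy = pd.DataFrame({
    'player_1': ['Bly', 'Kas', 'Bly', 'Kas', 'Bly'],
    'match_date': pd.to_datetime(['2019-01-01', '2019-03-01', '2019-02-01',
                                  '2019-01-15', '2019-04-01']),
    'player_1_match_status': ['[loser]', '[winner]', '[winner]',
                              '[loser]', '[winner]'],
})

def last_n_games(matches, n):
    """Return the n most recent rows for every player_1."""
    return (matches.sort_values('match_date', ascending=False)
                   .groupby('player_1')
                   .head(n))

recent = last_n_games(toy, 2)

# Wins and losses per player within the recent games
recent_counts = (recent.groupby('player_1')['player_1_match_status']
                       .value_counts()
                       .unstack(fill_value=0))
print(recent_counts)
```

Without the sort, `head(n)` would simply return the first rows as they appear in the file, which are the most recent games only if the file happens to be ordered newest first.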

## Opposing race <a class="anchor" id="fourth-bullet"></a>

In StarCraft 2, a player often performs particularly well against one specific race. Let's check how both players do against their opponent's race.

Let's start with Bly, looking at his games against Terran opponents.

In [None]:
bly_opposing_race = all_data[(all_data['player_1']=='Bly') & (all_data['player_2_race']=='T')]

display(bly_opposing_race.head(10))

In [None]:
# Percentage of wins and losses for Bly against Terran
bly_pcts = (bly_opposing_race.melt('player_1_match_status')
       .groupby(['player_1_match_status','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1)).apply(lambda x:
                                                 100 * x / float(x.sum()))

In [None]:
bly_pcts['player_1'].plot.pie(y='Bly', autopct='%1.1f%%',figsize=(7, 7))

Now look at just the last 5 games.

In [None]:
# Sort newest first so that head(5) takes the five most recent games
bly_pcts_5 = (bly_opposing_race.sort_values('match_date', ascending=False)
       .groupby('player_1').head(5).melt('player_1_match_status')
       .groupby(['player_1_match_status','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1)).apply(lambda x:
                                                 100 * x / float(x.sum()))

In [None]:
bly_pcts_5['player_1'].plot.pie(y='Bly', autopct='%1.1f%%',figsize=(7, 7))

Now let's see the same thing with Kas.

In [None]:
kas_opposing_race = all_data[(all_data['player_1']=='Kas') & (all_data['player_2_race']=='Z')]

display(kas_opposing_race.head(10))

In [None]:
# Percentage of wins and losses for Kas against Zerg
kas_pcts = (kas_opposing_race.melt('player_1_match_status')
       .groupby(['player_1_match_status','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1)).apply(lambda x:
                                                 100 * x / float(x.sum()))

In [None]:
kas_pcts['player_1'].plot.pie(y='Kas', autopct='%1.1f%%',figsize=(7, 7))

In [None]:
# Sort newest first so that head(5) takes the five most recent games
kas_pcts_5 = (kas_opposing_race.sort_values('match_date', ascending=False)
       .groupby('player_1').head(5).melt('player_1_match_status')
       .groupby(['player_1_match_status','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1)).apply(lambda x:
                                                 100 * x / float(x.sum()))

In [None]:
kas_pcts_5['player_1'].plot.pie(y='Kas', autopct='%1.1f%%',figsize=(7, 7))

**Conclusion:**

The win rate over the last 5 games is the same for both players. It is noticeable that Bly plays well against Terran and has a high win rate.
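
Instead of filtering one race at a time, the win rate against every opposing race can be computed in one pass. This is a sketch on hypothetical toy data and is not the method used in the cells above:

```python
import pandas as pd

# Hypothetical matches; race codes T, Z, P as in the player_2_race column
toy = pd.DataFrame({
    'player_1': ['Bly', 'Bly', 'Bly', 'Kas', 'Kas', 'Kas'],
    'player_2_race': ['T', 'T', 'P', 'Z', 'Z', 'T'],
    'player_1_match_status': ['[winner]', '[winner]', '[loser]',
                              '[winner]', '[loser]', '[winner]'],
})

# Boolean win indicator: its mean is the win rate
won = toy['player_1_match_status'].eq('[winner]')

# Rows: player, columns: opposing race, values: win rate in percent
win_rate_by_race = (won.groupby([toy['player_1'], toy['player_2_race']])
                       .mean()
                       .mul(100)
                       .unstack())
print(win_rate_by_race)
```

Combinations that never occur in the data (for example Bly against Zerg in this toy frame) show up as `NaN` rather than 0, which keeps "no games" distinct from "no wins".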

## Each other <a class="anchor" id="fifth-bullet"></a>

A very important indicator is the history of games between the two players. Perhaps one of them is more comfortable playing against the other.

We'll look at Bly's statistics; Kas's statistics are simply the same data inverted.

In [None]:
against_data = all_data[(all_data['player_1']=='Bly') & (all_data['player_2']=='Kas')]

In [None]:
# Count wins and losses in the head-to-head games
df = (against_data.melt('player_1_match_status')
       .groupby(['player_1_match_status','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1))

In [None]:
# Share of wins and losses in percent; same values as the per-group apply,
# without calling float() on a Series
state_pcts = 100 * df['player_1'] / df['player_1'].sum()

In [None]:
state_pcts.plot.pie(y='Bly', autopct='%1.1f%%',figsize=(7, 7))

In [None]:
# Same counts for the five most recent head-to-head games
df_5 = (against_data.nlargest(5, 'match_date').melt('player_1_match_status')
       .groupby(['player_1_match_status','variable'])['value']
       .value_counts()
       .unstack([1,2], fill_value=0)
       .rename_axis((None, None), axis=1))

In [None]:
# Share of wins and losses in percent for the last five games
state_pcts_5 = 100 * df_5['player_1'] / df_5['player_1'].sum()

In [None]:
state_pcts_5.plot.pie(y='Bly', autopct='%1.1f%%', figsize=(7, 7))

**Conclusion:**

Bly is clearly stronger than his opponent, both historically and in the last 5 matches.
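
The statement that Kas's statistics are the inverse of Bly's can be checked directly. The sketch below assumes that every match appears twice in the data, once from each player's perspective. The toy rows, the `PlayerX` name and the `head_to_head` helper are all hypothetical:

```python
import pandas as pd

# Hypothetical mirrored records: three Bly vs Kas games stored from both sides,
# plus one unrelated game
toy = pd.DataFrame({
    'player_1': ['Bly', 'Bly', 'Bly', 'Kas', 'Kas', 'Kas', 'Bly'],
    'player_2': ['Kas', 'Kas', 'Kas', 'Bly', 'Bly', 'Bly', 'PlayerX'],
    'player_1_match_status': ['[winner]', '[winner]', '[loser]',
                              '[loser]', '[loser]', '[winner]', '[winner]'],
})

def head_to_head(matches, player, opponent):
    """Win/loss shares (in percent) of `player` in games against `opponent`."""
    mask = (matches['player_1'] == player) & (matches['player_2'] == opponent)
    return (matches.loc[mask, 'player_1_match_status']
                   .value_counts(normalize=True)
                   .mul(100))

bly_vs_kas = head_to_head(toy, 'Bly', 'Kas')
kas_vs_bly = head_to_head(toy, 'Kas', 'Bly')
print(bly_vs_kas)
print(kas_vs_bly)
```

If the mirroring assumption holds, Bly's win share equals Kas's loss share, so analyzing one side is enough.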

### Final conclusion
As can be seen from the analysis, Bly is a more experienced LotV player than Kas. Both historically and in their recent games, Bly has beaten his opponent more often. Bly also has a good win rate against his opponent's race. As a result, in a match between Bly and Kas, I favor Bly.

## TODO <a class="anchor" id="sixth-bullet"></a>

1. Generalize the analysis so that it can show this information for any two players.
1. Take the match score into account in the analysis.
1. Add a patch feature. Game balance depends heavily on the patch and on how the players adapt to it.
1. Build a forecasting model.