### **Hi everyone,**
#### *We will analyze the Twitch game data.*
#### *We'll do feature engineering.*
#### *We will analyze the correlation between some of the variables.*
#### *We will run some statistical tests.*
#### *Then we'll forecast future viewership for some games.*
#### **Let's go! Let's start!**
#### **Note**: This notebook is for beginners.

![](http://cdn.arstechnica.net/wp-content/uploads/2020/11/getty-twitch-800x533.jpg)

### **What are our variables?**
##### Rank : Rank in the month (1 - 200)
##### Game : Name of the game or category
##### Month : Month in question
##### Year : Year in question
##### Hours_watched : Hours watched on twitch
##### Hours_Streamed : Hours streamed on twitch
##### Peak_viewers : Maximum viewers at one instant
##### Peak_channels : Maximum channels at one instant
##### Streamers : Number of streamers who streamed the game
##### Avg_viewers : Average viewers
##### Avg_channels : Average channels
##### Avg_viewer_ratio : Average viewer ratio

In [None]:
## Let's import the libraries
import pandas as pd
import numpy as np
import seaborn as sns
from pandas.api.types import CategoricalDtype  ## For ordered categorical columns
from scipy.stats import shapiro  ## Shapiro-Wilk normality test
from scipy import stats          ## For stats.spearmanr
from datetime import datetime    ## For building the Date column
pd.set_option('display.float_format', lambda x: '%.3f' % x)   ## Show floats with 3 decimals
import warnings
warnings.filterwarnings('ignore')
from fbprophet import Prophet    ## Forecasting model (newer releases are published as `prophet`)

In [None]:
twitchdata = pd.read_csv("../input/evolution-of-top-games-on-twitch/Twitch_game_data.csv")
df = twitchdata.copy()

#### First, let's get to know the dataset

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.shape

In [None]:
df.dtypes

In [None]:
df.isnull().sum()

#### Since only one value is missing, I drop that row from the dataset

In [None]:
df.dropna(inplace = True)
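
Before dropping, it can help to look at which row is actually missing data. Here is a minimal sketch on a toy frame; the `toy` values are invented for illustration and are not taken from the Twitch dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Twitch data (values are made up).
toy = pd.DataFrame({
    "Game": ["League of Legends", np.nan, "Dota 2"],
    "Avg_viewers": [100000, 500, 60000],
})

# Rows that contain at least one missing value
missing_rows = toy[toy.isnull().any(axis=1)]
print(missing_rows)

# Drop them into a new frame instead of mutating the original
clean = toy.dropna().reset_index(drop=True)
print(clean.shape)
```

Returning a new frame keeps the raw data around, which is handy if you later want to check what was removed.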

In [None]:
df.Year = df.Year.astype(CategoricalDtype(ordered = True))
df.Month = df.Month.astype(CategoricalDtype(ordered = True))

In [None]:
df.describe().T

In [None]:
df["Game"].nunique()   ## Number of distinct games

In [None]:
df["Year"].value_counts()

In [None]:
df.select_dtypes("number").corr()   ## Correlate numeric columns only; recent pandas raises on text columns

#### Now I will visualize the relationships between the variables

In [None]:
sns.scatterplot(x = "Avg_viewers", y = "Streamers", 
                hue = "Year", 
                data = df);

In [None]:
sns.jointplot(x = "Streamers" , y = "Avg_viewers" ,
              color = "g" , 
              kind = "reg" ,
              data = df);

In [None]:
sns.jointplot(x = "Peak_viewers", y = "Peak_channels", 
              color = "g" , 
              kind = "reg" ,
              data = df);

#### Let's test the correlation between variables statistically
#### Normality tests (Shapiro-Wilk):

In [None]:
## Run the Shapiro-Wilk test on each variable; the null hypothesis is normality
test_statistics, pvalue = shapiro(df["Streamers"])
print('Test Statistics for Streamers = %.4f, p-value = %.4f' % (test_statistics, pvalue))
test_statistics, pvalue = shapiro(df["Avg_viewers"])
print('Test Statistics for Avg_viewers = %.4f, p-value = %.4f' % (test_statistics, pvalue))
test_statistics, pvalue = shapiro(df["Peak_viewers"])
print('Test Statistics for Peak_viewers = %.4f, p-value = %.4f' % (test_statistics, pvalue))
test_statistics, pvalue = shapiro(df["Peak_channels"])
print('Test Statistics for Peak_channels = %.4f, p-value = %.4f' % (test_statistics, pvalue))

#### The p-values print as 0.0000, which is below 0.05, so we reject normality for all four variables. We'll therefore use Spearman's nonparametric rank correlation
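
To make the decision rule concrete, here is a small sketch on synthetic samples (the data and the 0.05 threshold `alpha` are assumptions for illustration, not values from the Twitch dataset). Note that SciPy's documentation warns that for samples larger than 5000 the Shapiro-Wilk p-value may be inaccurate, so on a large dataset it is worth reading the result together with the plots.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # heavily right-skewed sample
normal = rng.normal(loc=0.0, scale=1.0, size=500)       # sample drawn from a normal

alpha = 0.05  # assumed significance level
for name, sample in [("skewed", skewed), ("normal", normal)]:
    stat, p = shapiro(sample)
    verdict = "reject normality" if p < alpha else "no evidence against normality"
    print(f"{name}: W = {stat:.4f}, p = {p:.4f} -> {verdict}")
```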

In [None]:
test_statistics, pvalue = stats.spearmanr(df["Streamers"],df["Avg_viewers"])

print('Corr Value = %.4f, p-value = %.4f' % (test_statistics, pvalue))

In [None]:
test_statistics, pvalue = stats.spearmanr(df["Peak_viewers"],df["Peak_channels"])

print('Corr Value = %.4f, p-value = %.4f' % (test_statistics, pvalue))

#### Decision: Both correlations are statistically significant
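
Why Spearman rather than Pearson? Spearman works on ranks, so it captures any monotonic relationship, not only a linear one. A minimal sketch with made-up data:

```python
import numpy as np
from scipy import stats

x = np.arange(1, 21, dtype=float)
y = x ** 3   # monotonic, but strongly nonlinear

rho, p_spearman = stats.spearmanr(x, y)
r, p_pearson = stats.pearsonr(x, y)

# Spearman sees a perfect monotonic relationship; Pearson is lower
# because the relationship is not a straight line.
print(f"Spearman rho = {rho:.4f}")
print(f"Pearson r    = {r:.4f}")
```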

#### Now let's prepare our variables for the forecasting model

In [None]:
df['Date'] = df.apply(lambda row: datetime.strptime(f"{int(row.Year)}-{int(row.Month)}", '%Y-%m'),axis=1)
df.drop(['Month','Year'], axis='columns', inplace = True)
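
The row-by-row `apply` above works, but pandas can also assemble dates from year/month/day columns in one vectorized call. A sketch on a toy frame (the `toy` values are invented; the day is assumed to be the first of the month):

```python
import pandas as pd

toy = pd.DataFrame({"Year": [2016, 2016, 2017], "Month": [1, 12, 3]})

# pd.to_datetime accepts a frame with columns named year, month and day
toy["Date"] = pd.to_datetime(
    pd.DataFrame({
        "year": toy["Year"].astype(int),
        "month": toy["Month"].astype(int),
        "day": 1,
    })
)
print(toy)
```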

In [None]:
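## .copy() gives independent frames, so the in-place edits below do not touch df
dflol = df[df["Game"] == "League of Legends"].copy()
dfdota2 = df[df["Game"] == "Dota 2"].copy()
dfcsgo = df[df["Game"] == "Counter-Strike: Global Offensive"].copy()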

#### I chose 3 games to forecast: LoL, CS:GO and Dota 2
#### To forecast Avg_viewers, I drop the other variables and keep only Avg_viewers and Date

In [None]:
x = ["Rank","Game","Hours_watched","Hours_Streamed","Peak_viewers","Peak_channels","Streamers","Avg_channels","Avg_viewer_ratio"]
dflol.drop(x, axis = 1 , inplace = True)
dfdota2.drop(x, axis = 1 , inplace = True)
dfcsgo.drop(x, axis = 1 , inplace = True)

#### The columns must be named "y" (the value) and "ds" (the date), because Prophet requires these names.

In [None]:
dflol.columns = ["y" , "ds"]
dfdota2.columns = ["y" , "ds"]
dfcsgo.columns = ["y" , "ds"]
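
Assigning `columns` positionally relies on the column order being exactly value first, date second. A slightly safer variant, sketched on a toy frame with invented values, selects and renames the columns by name:

```python
import pandas as pd

toy = pd.DataFrame({
    "Game": ["Dota 2", "Dota 2"],
    "Avg_viewers": [60000, 65000],
    "Date": pd.to_datetime(["2016-01-01", "2016-02-01"]),
})

# Keep only the two columns Prophet needs and rename them explicitly
prophet_df = (
    toy.loc[toy["Game"] == "Dota 2", ["Date", "Avg_viewers"]]
       .rename(columns={"Date": "ds", "Avg_viewers": "y"})
       .reset_index(drop=True)
)
print(prophet_df)
```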

In [None]:
dflol.head()

In [None]:
dfdota2.head()

In [None]:
dfcsgo.head()

#### I will create forecast models for LoL, CS:GO and Dota 2
#### Let's start with the forecast model for LoL

In [None]:
m = Prophet(interval_width = 0.95 , daily_seasonality = True)
model1 = m.fit(dflol)

In [None]:
future = m.make_future_dataframe(periods = 16 , freq = "MS")   ## "MS" = month start, matching the dates in ds
forecastlol = m.predict(future)
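
The future frame is simply the historical dates followed by 16 extra monthly steps. When the history is stamped on the first of each month, a month-start frequency keeps the new dates aligned with it. The sketch below uses only pandas to show which dates that produces; it is an illustration with a made-up six-month history, not Prophet's actual implementation.

```python
import pandas as pd

# Made-up history: six monthly observations stamped on the 1st of each month
history = pd.date_range("2016-01-01", periods=6, freq="MS")
last = history.max()

# 16 month-start steps after the last observed month
future_ms = pd.date_range(start=last, periods=16 + 1, freq="MS")[1:]
print(future_ms[0], future_ms[-1])
```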

#### The forecasts for future months are in the yhat column

In [None]:
forecastlol[["ds" , "yhat"]].iloc[66:80]

#### Now, let's look at the plots

In [None]:
plotlol = m.plot(forecastlol)

In [None]:
plottlol = m.plot_components(forecastlol)

#### Forecasting for Dota 2

In [None]:
n = Prophet(interval_width = 0.95 , daily_seasonality = True)
model2 = n.fit(dfdota2)

In [None]:
forecastdota2 = n.predict(future)

In [None]:
forecastdota2[["ds" , "yhat"]].iloc[66:79]

In [None]:
plotdota2 = n.plot(forecastdota2)

In [None]:
plottdota2 = n.plot_components(forecastdota2)

#### Forecasting for CS:GO

In [None]:
b = Prophet(interval_width = 0.95 , daily_seasonality = True)
model3 = b.fit(dfcsgo)

In [None]:
forecastcsgo = b.predict(future)

In [None]:
forecastcsgo[["ds" , "yhat"]].iloc[66:80]

In [None]:
plotcsgo = b.plot(forecastcsgo)

In [None]:
plottcsgo = b.plot_components(forecastcsgo)

### **Upvote and Comment if you liked my notebook :)**