## Our Team
DS105 SuperStars

This presentation will show you examples of what you can do with Quarto and [Reveal.js](https://revealjs.com), including:

-   Presenting code and LaTeX equations
-   Including computations in slide output
-   Image, video, and iframe backgrounds
-   Fancy transitions and animations
-   Printing to PDF

...and much more



## Initial Data Correlation {.smaller transition="slide"}

::: panel-tabset
### Plot 

Code used to plot the correlation matrix of the initial DataFrame:

::: columns

::: {.column width="40%"}
``` python 
import matplotlib.pyplot as plt
import seaborn as sns

# plot the correlation matrix
plt.figure(figsize=(12,10), dpi= 80)
sns.heatmap(data.corr(), xticklabels=data.corr().columns, 
    yticklabels=data.corr().columns, 
    cmap='RdYlGn', center=0,
    annot=True, fmt=".2f")

# decoration
plt.title('Correlogram of Steam Games data', fontsize=22)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()
```
:::

::: {.column width="60%"}
<!-- 12A0366C:Linear_regression.ipynb#fig-correlation-matrix |  | echo:false,warning:false,asis:true,eval:false -->
:::

:::



### Data
|    |   appid | name                             | developer                        | publisher             |   score_rank |   positive |   negative |   userscore | owners                     |   average_forever |   average_2weeks |   median_forever |   median_2weeks |   price |   initialprice |   discount |     ccu |
|---:|--------:|:---------------------------------|:---------------------------------|:----------------------|-------------:|-----------:|-----------:|------------:|:---------------------------|------------------:|-----------------:|-----------------:|----------------:|--------:|---------------:|-----------:|--------:|
|  0 |     570 | Dota 2                           | Valve                            | Valve                 |          nan |    1611153 |     338953 |           0 | 200,000,000 .. 500,000,000 |             37228 |             1393 |              853 |             711 |       0 |              0 |          0 |  532699 |
|  1 |     730 | Counter-Strike: Global Offensive | Valve, Hidden Path Entertainment | Valve                 |          nan |    6207621 |     811918 |           0 | 50,000,000 .. 100,000,000  |             30347 |              822 |             5979 |             331 |       0 |              0 |          0 | 1010721 |
|  2 | 1172470 | Apex Legends                     | Respawn Entertainment            | Electronic Arts       |          nan |     513660 |     105734 |           0 | 50,000,000 .. 100,000,000  |              8491 |              917 |              834 |             498 |       0 |              0 |          0 |  363786 |
|  3 |  578080 | PUBG: BATTLEGROUNDS              | KRAFTON, Inc.                    | KRAFTON, Inc.         |          nan |    1231039 |     925364 |           0 | 50,000,000 .. 100,000,000  |             22551 |              734 |             5940 |             230 |       0 |              0 |          0 |  328139 |
|  4 | 1063730 | New World                        | Amazon Games                     | Amazon Games          |          nan |     176376 |      75957 |           0 | 50,000,000 .. 100,000,000  |              9175 |              723 |             3226 |             454 |    1999 |           3999 |         50 |   17498 |
|  5 |     440 | Team Fortress 2                  | Valve                            | Valve                 |          nan |     883078 |      58714 |           0 | 50,000,000 .. 100,000,000  |              8909 |             1214 |              421 |             113 |       0 |              0 |          0 |  100175 |
|  6 |  271590 | Grand Theft Auto V               | Rockstar North                   | Rockstar Games        |          nan |    1301322 |     218427 |           0 | 50,000,000 .. 100,000,000  |             13540 |              589 |             6105 |             143 |    2998 |           2998 |          0 |  109847 |
|  7 | 1599340 | Lost Ark                         | Smilegate RPG                    | Amazon Games          |          nan |     136970 |      53607 |           0 | 20,000,000 .. 50,000,000   |              3768 |             1303 |              744 |             836 |       0 |              0 |          0 |   85067 |
|  8 |     550 | Left 4 Dead 2                    | Valve                            | Valve                 |          nan |     702272 |      17857 |           0 | 20,000,000 .. 50,000,000   |              2309 |              320 |              520 |             113 |      99 |            999 |         90 |   44785 |
|  9 |  304930 | Unturned                         | Smartly Dressed Games            | Smartly Dressed Games |          nan |     464351 |      43104 |           0 | 20,000,000 .. 50,000,000   |              6654 |             5845 |              293 |            2420 |       0 |              0 |          0 |   51599 |

The columns are:
<!-- 12A0366C:Linear_regression.ipynb#initial-column-list |  | echo:false,warning:false,asis:true,eval:false -->
:::


## Transforming the Data {transition="slide" auto-animate="true"}
``` python
# clean 
df_genres = data.copy()
df_genres = df_genres.drop(['score_rank','initialprice','discount','userscore'],axis=1)
df_genres['appid'] = df_genres['appid'].astype(int)

```

## From str to int : owners {auto-animate="true"}
``` python
# clean 
df_genres = data.copy()
df_genres = df_genres.drop(['score_rank','initialprice','discount','userscore'],axis=1)
df_genres['appid'] = df_genres['appid'].astype(int)

# get the average of the owners
owners = pd.DataFrame(df_genres['owners'].str.replace(',','').str.split(' .. ').tolist(),columns = ['min','max'])
df_genres['owners']= owners.astype(int).sum(axis=1)/2
df_genres['owners'] = df_genres['owners'].astype(int)
df_genres.rename(columns={'owners':'avg_owners'},inplace=True)
```
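
As a rough illustration of the conversion above, the sketch below applies the same idea to a single string: strip the thousands separators, split the range on `' .. '`, and take the midpoint. The helper name `owners_midpoint` is hypothetical and is not part of the notebook; it only mirrors the logic for one value.

``` python
def owners_midpoint(owners_range: str) -> int:
    """Return the midpoint of a range such as '20,000,000 .. 50,000,000'."""
    # Remove the thousands separators, then split on the literal ' .. ' delimiter
    low, high = (int(part) for part in owners_range.replace(",", "").split(" .. "))
    # Integer midpoint, matching the sum / 2 followed by astype(int) used above
    return (low + high) // 2


print(owners_midpoint("20,000,000 .. 50,000,000"))  # 35000000
```

Note that Python's `str.split` treats `' .. '` as a literal string, whereas pandas' `Series.str.split` may interpret a multi-character pattern as a regular expression, in which case each `.` matches any character. The owner strings in this dataset split the same way in both cases.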
## Tweaking and Making Useful {auto-animate="true"}


``` python
# clean 
df_genres = data.copy()
df_genres = df_genres.drop(['score_rank','initialprice','discount','userscore'],axis=1)
df_genres['appid'] = df_genres['appid'].astype(int)

# get the average of the owners
owners = pd.DataFrame(df_genres['owners'].str.replace(',','').str.split(' .. ').tolist(),columns = ['min','max'])
df_genres['owners']= owners.astype(int).sum(axis=1)/2
df_genres['owners'] = df_genres['owners'].astype(int)
df_genres.rename(columns={'owners':'avg_owners'},inplace=True)

# ratio of positive reviews over total reviews
df_genres['positive_ratio'] = df_genres['positive']/(df_genres['positive']+df_genres['negative'])

# REMARK: this ratio is severely skewed when the number of reviews is low.

# drop the rows with 100 or fewer reviews or with zero average playtime (average_forever)
df_genres = df_genres[((df_genres['positive']+df_genres['negative'])>100) & (df_genres['average_forever']>0)]
# only 12k rows left
```
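
To make the remark about skew concrete, here is a minimal sketch on made-up data. The two rows and the `toy` and `total_reviews` names are assumptions for illustration only and do not come from the Steam dataset. A game with three reviews can reach a perfect ratio that says little about its quality, which is why rows with few reviews are filtered out.

``` python
import pandas as pd

# Hypothetical toy data, not taken from the Steam dataset
toy = pd.DataFrame({
    "name": ["tiny_game", "big_game"],
    "positive": [3, 9000],
    "negative": [0, 1000],
    "average_forever": [12, 340],
})

# Same ratio as above: positive reviews over total reviews
toy["total_reviews"] = toy["positive"] + toy["negative"]
toy["positive_ratio"] = toy["positive"] / toy["total_reviews"]

# tiny_game scores 1.0 from only 3 reviews, big_game scores 0.9 from 10,000
print(toy[["name", "total_reviews", "positive_ratio"]])

# Same filter as above: keep rows with more than 100 reviews and non-zero average_forever
filtered = toy[(toy["total_reviews"] > 100) & (toy["average_forever"] > 0)]
print(filtered["name"].tolist())  # ['big_game']
```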

## Initial Dataset {.smaller transition="slide"}

<!-- 12A0366C:Linear_regression.ipynb#modified-initial-data |  | echo:false,warning:false,asis:true,eval:false -->