# Exploratory Analysis of Steam Games - ETL Model 1

This exploratory analysis covers the data captured from the Steam website through the web scraping process carried out in the project: [reference link to the Steam ETL project]. The goal is to run an initial investigation of the captured data and to look for insights about the games and the business domain involved; if necessary, new ETL steps may be carried out as a result. Once the analysis is complete, the results will be communicated through dashboards and reports.

In [1]:
# Importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as mlp
import psycopg2
import ast
from sqlalchemy import create_engine, MetaData 

# Local helper module
from my_utils import EDA

In [2]:
# Function definitions:
def transform_multipleID(df, column):
    """Split each comma-separated ID string in `column` into a list of ints; None stays None."""
    data = df[column].apply(lambda x: None if x is None else [int(i) for i in x.split(',')])
    return data
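The `tagid_steam` values shown in the data preview carry square brackets (for example `[1663,1774,3859]`). If they are stored as strings in that form, `int('[1663')` would raise a `ValueError` inside `transform_multipleID`. The sketch below is one possible way to parse both the bracketed and the plain comma-separated form with `ast.literal_eval`. The helper name `parse_tag_ids` is hypothetical and is not part of the original notebook.

```python
import ast


def parse_tag_ids(raw):
    """Parse '[1663,1774]' or '1663,1774' into a list of ints (hypothetical helper)."""
    # Anything that is not a string (None, NaN) is treated as missing.
    if not isinstance(raw, str):
        return None
    text = raw.strip()
    if not text:
        return None
    # Add brackets when they are missing so both formats share one parser.
    if not text.startswith("["):
        text = f"[{text}]"
    return [int(i) for i in ast.literal_eval(text)]


print(parse_tag_ids("[1663,1774,3859]"))
print(parse_tag_ids("493,122"))
print(parse_tag_ids(None))
```

With pandas, this could be applied as `df[column].apply(parse_tag_ids)`, assuming the column holds strings or missing values.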

In [3]:
# Creating the engine and connecting to the database:
engine = create_engine('postgresql://docker:docker@localhost/etl-steam')

# Listing the tables present in the database:
SCHEMA_NAME = 'modelo2'
# Passing the engine to reflect() instead of MetaData(bind=...) also works on SQLAlchemy 2.x
metadata = MetaData(schema=SCHEMA_NAME)
metadata.reflect(bind=engine)
tables = metadata.tables.keys()
tables

dict_keys(['modelo2.exemplo_tab', 'modelo2.info', 'modelo2.prices', 'modelo2.links', 'modelo2.reviews'])

In [4]:
# Iterating over the tables and loading each one into a pandas DataFrame
df_dict = {}
for table in tables:
    df_dict[table] = pd.read_sql(f'select * from {table}', engine)
    
print(f'Tabelas:{df_dict.keys()}')

Tabelas:dict_keys(['modelo2.exemplo_tab', 'modelo2.info', 'modelo2.prices', 'modelo2.links', 'modelo2.reviews'])


In [5]:
# Creating new DataFrame objects as copies of the original data for transformations/changes
df_info = df_dict[SCHEMA_NAME+'.info'].copy()
df_prices = df_dict[SCHEMA_NAME+'.prices'].copy()
df_reviews = df_dict[SCHEMA_NAME+'.reviews'].copy()

* The links table will not be used in this analysis because it carries no relevant information; it only stores the link to each game's page for future implementations.

## Building a single relational table to group the information:

In [6]:
# Creating a single table by joining on the shared steam_id column:
data = df_info.merge(df_prices, on='steam_id').merge(df_reviews, on='steam_id')

# Dropping the index_x and index_y columns created by the merge
data.drop(columns=['index_x','index_y'], inplace=True)
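The sketch below uses small made-up frames (not the real tables) to illustrate two effects of chaining `merge` on `steam_id`. First, a non-key column present on both sides, such as `index`, gets the `_x`/`_y` suffixes, which is where `index_x` and `index_y` come from. Second, repeated keys on both sides multiply rows. That second effect is only one possible explanation for the duplicate rows seen later, and the source does not confirm it.

```python
import pandas as pd

# Toy stand-ins for the info, prices and reviews tables (values are illustrative).
info = pd.DataFrame({"index": [0, 1], "steam_id": ["730", "570"], "title": ["A", "B"]})
prices = pd.DataFrame({"index": [0, 1], "steam_id": ["730", "570"], "price_real": ["76.49", "0.0"]})
reviews = pd.DataFrame({"index": [0, 1], "steam_id": ["730", "570"], "total_reviews": ["10", "20"]})

merged = info.merge(prices, on="steam_id").merge(reviews, on="steam_id")
# The first merge renames the clashing 'index' columns to index_x / index_y;
# the 'index' column from reviews no longer clashes and keeps its name.
print(list(merged.columns))

# Repeated keys on both sides produce one output row per matching pair.
dup_left = pd.DataFrame({"steam_id": ["730", "730"], "title": ["A", "A"]})
dup_right = pd.DataFrame({"steam_id": ["730", "730"], "price_real": ["1", "1"]})
multiplied = dup_left.merge(dup_right, on="steam_id")
print(len(multiplied))
```

If each `steam_id` is expected to appear once per table, passing `validate="one_to_one"` to `merge` makes pandas raise an error instead of silently multiplying rows.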

In [7]:
data.head()

| | steam_id | title | tagid_steam | release_date | price_real | discount | data_view | index | total_reviews | percent_positive_reviews |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 730 | Counter-Strike: Global Offensive | [1663,1774,3859,3878,19,5711,5055] | 21 Aug, 2012 | 76.49 | | 20-07-2023 | 0 | 7366358 | 88 |
| 1 | 1086940 | Baldur's Gate 3 | [493,122,6426,4747,1742,4474,1684] | 6 Oct, 2020 | 199.99 | | 20-07-2023 | 1 | 60101 | 88 |
| 2 | 671860 | BattleBit Remastered | [1663,1774,3859,5363,128,4168,1775] | 15 Jun, 2023 | 49.0 | | 20-07-2023 | 2 | 57862 | 90 |
| 3 | 271590 | Grand Theft Auto V | [1695,19,3859,6378,1100687,1697,3839] | 13 Apr, 2015 | 0.0 | | 20-07-2023 | 3 | 1452384 | 86 |
| 4 | 1174180 | Red Dead Redemption 2 | [1695,1742,1647,21,19,3859,4175] | 5 Dec, 2019 | 98.96 | | 20-07-2023 | 4 | 391401 | 90 |


In [8]:
# Checking a summary of the data:
EDA.summary_dataframes(data)

Quantidade total de registros:334316. 


<class 'pandas.core.frame.DataFrame'>
Int64Index: 334316 entries, 0 to 334315
Data columns (total 10 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   steam_id                  333316 non-null  object
 1   title                     334316 non-null  object
 2   tagid_steam               306612 non-null  object
 3   release_date              334316 non-null  object
 4   price_real                334316 non-null  object
 5   discount                  0 non-null       object
 6   data_view                 334316 non-null  object
 7   index                     334316 non-null  int64 
 8   total_reviews             95143 non-null   object
 9   percent_positive_reviews  95143 non-null   object
dtypes: int64(1), object(9)
memory usage: 28.1+ MB
None


 Total Valores nulos:
                           Total Values null    %_weight
steam_id                               1000    0.29

In [9]:
# Removing duplicate rows
data.drop_duplicates(inplace=True)
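As a minimal sketch on made-up data, the block below counts fully duplicated rows before dropping them. It also shows that `drop_duplicates` keeps the original index labels, which is why the summary printed after deduplication still reports an index running from 0 to 334315. The `reset_index` call is an optional extra step, not something the original notebook does.

```python
import pandas as pd

# Toy frame with one fully duplicated row (values are illustrative).
df = pd.DataFrame({"steam_id": ["730", "730", "570"], "title": ["A", "A", "B"]})

# duplicated() marks every repeat of an earlier row; the first occurrence is kept.
n_dup = int(df.duplicated().sum())
deduped = df.drop_duplicates()

print(n_dup)
print(list(deduped.index))  # original labels survive, so gaps can appear

# Optional: renumber the rows from zero after deduplication.
renumbered = deduped.reset_index(drop=True)
print(list(renumbered.index))
```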

In [11]:
EDA.summary_dataframes(data)

Quantidade total de registros:143015. 


<class 'pandas.core.frame.DataFrame'>
Int64Index: 143015 entries, 0 to 334315
Data columns (total 10 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   steam_id                  143007 non-null  object
 1   title                     143015 non-null  object
 2   tagid_steam               135046 non-null  object
 3   release_date              143015 non-null  object
 4   price_real                143015 non-null  object
 5   discount                  0 non-null       object
 6   data_view                 143015 non-null  object
 7   index                     143015 non-null  int64 
 8   total_reviews             57987 non-null   object
 9   percent_positive_reviews  57987 non-null   object
dtypes: int64(1), object(9)
memory usage: 12.0+ MB
None


 Total Valores nulos:
                           Total Values null    %_weight
steam_id                                  8    0.00
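The summary lists every column except `index` as `object`. One possible next step, sketched here on made-up rows that mimic the preview, is to convert the date and numeric columns to proper dtypes. The date format string `%d %b, %Y` is an assumption based on values such as `21 Aug, 2012`. Using `errors="coerce"` turns anything that does not parse into a missing value instead of raising.

```python
import pandas as pd

# Toy rows mimicking the string-typed columns (values are illustrative).
sample = pd.DataFrame({
    "release_date": ["21 Aug, 2012", "6 Oct, 2020", "not a date"],
    "price_real": ["76.49", "199.99", "0.0"],
    "total_reviews": ["7366358", None, "57862"],
})

typed = sample.assign(
    # Assumed format: day, abbreviated English month, comma, four-digit year.
    release_date=pd.to_datetime(sample["release_date"], format="%d %b, %Y", errors="coerce"),
    price_real=pd.to_numeric(sample["price_real"], errors="coerce"),
    total_reviews=pd.to_numeric(sample["total_reviews"], errors="coerce"),
)

print(typed.dtypes)
```

Values that fail to parse end up as `NaT` or `NaN`, so comparing the null counts before and after the conversion is a quick way to see how many rows did not match the assumed format.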

In [15]:
data.head(10)

| | steam_id | title | tagid_steam | release_date | price_real | discount | data_view | index | total_reviews | percent_positive_reviews |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 730 | Counter-Strike: Global Offensive | [1663,1774,3859,3878,19,5711,5055] | 21 Aug, 2012 | 76.49 | | 20-07-2023 | 0 | 7366358 | 88 |
| 1 | 1086940 | Baldur's Gate 3 | [493,122,6426,4747,1742,4474,1684] | 6 Oct, 2020 | 199.99 | | 20-07-2023 | 1 | 60101 | 88 |
| 2 | 671860 | BattleBit Remastered | [1663,1774,3859,5363,128,4168,1775] | 15 Jun, 2023 | 49.0 | | 20-07-2023 | 2 | 57862 | 90 |
| 3 | 271590 | Grand Theft Auto V | [1695,19,3859,6378,1100687,1697,3839] | 13 Apr, 2015 | 0.0 | | 20-07-2023 | 3 | 1452384 | 86 |
| 4 | 1174180 | Red Dead Redemption 2 | [1695,1742,1647,21,19,3859,4175] | 5 Dec, 2019 | 98.96 | | 20-07-2023 | 4 | 391401 | 90 |
| 5 | 1938090 | Call of Duty®: Modern Warfare® II | [1663,3859,19,1774,4182,4168,3839] | 27 Oct, 2022 | 299.9 | | 20-07-2023 | 5 | 404844 | 60 |
| 6 | 1364780 | Street Fighter™ 6 | [4736,1743,19,1773,4747,21,3859] | 1 Jun, 2023 | 249.0 | | 20-07-2023 | 6 | 11798 | 89 |
| 7 | 2108330 | F1® 23 | [21978,699,1644,701,3859,1100687,4182] | 15 Jun, 2023 | 287.4 | | 20-07-2023 | 7 | 3143 | 86 |
| 8 | 1868140 | DAVE THE DIVER | [3964,597,21,599,122,15564,1654] | 28 Jun, 2023 | 59.99 | | 20-07-2023 | 8 | 39132 | 97 |
| 9 | 1599340 | Lost Ark | [1754,113,4231,1646,3859,122,128] | 11 Feb, 2022 | 0.0 | | 20-07-2023 | 9 | 194010 | 71 |


## Dev Functions