# Data Engineering MVP



## Pedro Augusto Boller


## 1 Objective

**Overall goal of the MVP:** \
The goal of this MVP is to build a data pipeline for Steam (a digital video game platform) that lets the user explore the games available on the platform and draw useful insights from them.

**Questions to be answered:** \
1. Which are the five best-rated games in the "Roguelike" category released in the last 12 months? \
2. Which is the most popular game that includes the word "cyberpunk" in its description or tags? \
3. Which three science-fiction-themed strategy games have the highest total number of hours played? \
4. What is the average completion percentage of the scariest horror games available on Steam? \
5. Which are the five most-played games that are exactly 1 GB in size? \
6. Which are the three most expensive simulation games with virtual reality support? \
7. Which game has the longest soundtrack in terms of the number of music tracks available? \
8. Which games have the largest gap between critic ratings and player ratings? \
9. Which are the oldest real-time strategy games that still have an active community of online players? \
10. Which are the most popular indie games released by developers based in South America?


## 2 Details

## 2.1 Finding the data

Initially, the idea was to use an API to fetch the data and generate a table, which would then be loaded into the Azure database for pipeline creation, cleaning, and analysis. However, generating the data this way would have required additional learning, and there would not have been enough time to finish the work by the deadline.

It was therefore decided to use a dataset from Kaggle, published by a user who had collected it through the API. According to the author, the data was extracted in May 2019 and contains most of the games released up to that date.

The data is already cleaned and processed, but each column will be analyzed to check the quality of the data.

The Kaggle page where the data was obtained is linked below:

https://www.kaggle.com/datasets/nikdavis/steam-store-games


## 2.2 Collection

The data was downloaded from Kaggle and loaded into Azure, Microsoft's cloud platform. Databricks was the tool used to carry out this work.

## 2.3 Modeling

A flat data model will be used, because the data fits the data lake concept better (it mixes numeric values, dates, text, and dictionary-like fields).

Initially there are three tables: "steam", which holds most of the data about the games; "steam_description_data", which holds the description data for each game; and "steam_requirements_data", which holds the minimum and recommended requirements for each game.

Each of the tables can be seen below:

In [0]:
%sql
SELECT * FROM steam
LIMIT 10

appid,name,release_date,english,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,owners,price
10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-Player;Valve Anti-Cheat enabled,Action,Action;FPS;Multiplayer,0,124534,3339,17612,317,10000000-20000000,7.19
20,Team Fortress Classic,1999-04-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-Player;Valve Anti-Cheat enabled,Action,Action;FPS;Multiplayer,0,3318,633,277,62,5000000-10000000,3.99
30,Day of Defeat,2003-05-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,0,3416,398,187,34,5000000-10000000,3.99
40,Deathmatch Classic,2001-06-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-Player;Valve Anti-Cheat enabled,Action,Action;FPS;Multiplayer,0,1273,267,258,184,5000000-10000000,3.99
50,Half-Life: Opposing Force,1999-11-01,1,Gearbox Software,Valve,windows;mac;linux,0,Single-player;Multi-player;Valve Anti-Cheat enabled,Action,FPS;Action;Sci-fi,0,5250,288,624,415,5000000-10000000,3.99
60,Ricochet,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Valve Anti-Cheat enabled,Action,Action;FPS;Multiplayer,0,2758,684,175,10,5000000-10000000,3.99
70,Half-Life,1998-11-08,1,Valve,Valve,windows;mac;linux,0,Single-player;Multi-player;Online Multi-Player;Steam Cloud;Valve Anti-Cheat enabled,Action,FPS;Classic;Action,0,27755,1100,1300,83,5000000-10000000,7.19
80,Counter-Strike: Condition Zero,2004-03-01,1,Valve,Valve,windows;mac;linux,0,Single-player;Multi-player;Valve Anti-Cheat enabled,Action,Action;FPS;Multiplayer,0,12120,1439,427,43,10000000-20000000,7.19
130,Half-Life: Blue Shift,2001-06-01,1,Gearbox Software,Valve,windows;mac;linux,0,Single-player,Action,FPS;Action;Sci-fi,0,3822,420,361,205,5000000-10000000,3.99
220,Half-Life 2,2004-11-16,1,Valve,Valve,windows;mac;linux,0,Single-player;Steam Achievements;Steam Trading Cards;Captions available;Partial Controller Support;Steam Cloud;Includes Source SDK,Action,FPS;Action;Sci-fi,33,67902,2419,691,402,10000000-20000000,7.19


This table has the following columns:

**appid**: Unique identifier of each game.\
**name**: Name of the game.\
**release_date**: Date the game was released.\
**english**: Boolean variable indicating whether the game supports the English language.\
**developer**: The company that developed the game.\
**publisher**: The company that published the game.\
**platforms**: Platforms the game supports, separated by semicolons (for example, `windows;mac;linux`).\
**achievements**: Number of achievements the game has.\
**positive_ratings**: Number of positive ratings. Integer; minimum 0, maximum 2644404. Within the expected range.\
**negative_ratings**: Number of negative ratings. Integer; minimum 0, maximum 487076. Within the expected range.\
**average_playtime**: Average playtime per user. Integer; minimum 0, maximum 190625. Suspicious values.\
**median_playtime**: Median playtime per user. Integer; minimum 0, maximum 190625. Suspicious values.\
**price**: Price of the game. Decimal; minimum 0, maximum 421.99. Within the expected range.
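
Several columns in the sample above (`platforms`, `categories`, `genres`, `steamspy_tags`) pack multiple values into one semicolon-separated string, and `owners` holds a range such as `10000000-20000000`. As a minimal sketch of how such fields could be unpacked, the helpers below use hypothetical names and are not part of the notebook:

```python
def split_multi(value):
    """Split a semicolon-separated field into a list of trimmed values."""
    return [item.strip() for item in value.split(";") if item.strip()]


def parse_owner_range(value):
    """Turn an owners string like '5000000-10000000' into (low, high) integers."""
    low, high = value.split("-")
    return int(low), int(high)


# Values copied from the Counter-Strike row shown above
platforms = split_multi("windows;mac;linux")
owners_low, owners_high = parse_owner_range("10000000-20000000")
print(platforms, owners_low, owners_high)
```

Keeping the range as two integers makes it possible to sort or filter by estimated ownership, which the raw text column does not allow directly.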

In [0]:
%sql
SELECT * FROM steam_description_data
LIMIT 10

steam_appid,detailed_description,about_the_game,short_description
10,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.
20,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes."
30,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety of mission-specific objectives.","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety of mission-specific objectives.","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations."
40,"Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings.","Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings.","Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings."
50,"Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences.","Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences.","Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences."
60,"A futuristic action game that challenges your agility as well as your aim, Ricochet features one-on-one and team matches played in a variety of futuristic battle arenas.","A futuristic action game that challenges your agility as well as your aim, Ricochet features one-on-one and team matches played in a variety of futuristic battle arenas.","A futuristic action game that challenges your agility as well as your aim, Ricochet features one-on-one and team matches played in a variety of futuristic battle arenas."
70,"Named Game of the Year by over 50 publications, Valve's debut title blends action and adventure with award-winning technology to create a frighteningly realistic world where players must think to survive. Also includes an exciting multiplayer mode that allows you to play against friends and enemies around the world.","Named Game of the Year by over 50 publications, Valve's debut title blends action and adventure with award-winning technology to create a frighteningly realistic world where players must think to survive. Also includes an exciting multiplayer mode that allows you to play against friends and enemies around the world.","Named Game of the Year by over 50 publications, Valve's debut title blends action and adventure with award-winning technology to create a frighteningly realistic world where players must think to survive. Also includes an exciting multiplayer mode that allows you to play against friends and enemies around the world."
80,"With its extensive Tour of Duty campaign, a near-limitless number of skirmish modes, updates and new content for Counter-Strike's award-winning multiplayer game play, plus over 12 bonus single player missions, Counter-Strike: Condition Zero is a tremendous offering of single and multiplayer content.","With its extensive Tour of Duty campaign, a near-limitless number of skirmish modes, updates and new content for Counter-Strike's award-winning multiplayer game play, plus over 12 bonus single player missions, Counter-Strike: Condition Zero is a tremendous offering of single and multiplayer content.","With its extensive Tour of Duty campaign, a near-limitless number of skirmish modes, updates and new content for Counter-Strike's award-winning multiplayer game play, plus over 12 bonus single player missions, Counter-Strike: Condition Zero is a tremendous offering of single and multiplayer content."
130,"Made by Gearbox Software and originally released in 2001 as an add-on to Half-Life, Blue Shift is a return to the Black Mesa Research Facility in which you play as Barney Calhoun, the security guard sidekick who helped Gordon out of so many sticky situations.","Made by Gearbox Software and originally released in 2001 as an add-on to Half-Life, Blue Shift is a return to the Black Mesa Research Facility in which you play as Barney Calhoun, the security guard sidekick who helped Gordon out of so many sticky situations.","Made by Gearbox Software and originally released in 2001 as an add-on to Half-Life, Blue Shift is a return to the Black Mesa Research Facility in which you play as Barney Calhoun, the security guard sidekick who helped Gordon out of so many sticky situations."
220,"1998. HALF-LIFE sends a shock through the game industry with its combination of pounding action and continuous, immersive storytelling. Valve's debut title wins more than 50 game-of-the-year awards on its way to being named ""Best PC Game Ever"" by PC Gamer, and launches a franchise with more than eight million retail units sold worldwide.",,


## 2.4 Load

In [0]:
from pyspark.sql import SparkSession
import pandas as pd

# Reuse the active Spark session (or create one) with Hive support enabled
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Storage locations of the three source tables in the Hive warehouse
arquivo_steam = 'dbfs:/user/hive/warehouse/steam'
arquivo_description = 'dbfs:/user/hive/warehouse/steam_description_data'
arquivo_requirements = 'dbfs:/user/hive/warehouse/steam_requirements_data'

# Read each table into a Spark DataFrame
steam = spark.read.load(arquivo_steam)
description = spark.read.load(arquivo_description)
requirements = spark.read.load(arquivo_requirements)

# Convert to pandas for the merge and cleaning steps that follow
steam = steam.toPandas()
description = description.toPandas()
requirements = requirements.toPandas()



In [0]:
# Join the game data with the descriptions, then with the requirements, on the app id
steam_all_data = pd.merge(steam, description, left_on='appid', right_on='steam_appid')
steam_all_data = pd.merge(steam_all_data, requirements, left_on='appid', right_on='steam_appid')
steam_all_data.head(5)

Unnamed: 0,appid,name,release_date,english,developer,publisher,platforms,required_age,categories,genres,...,steam_appid_x,detailed_description,about_the_game,short_description,steam_appid_y,pc_requirements,mac_requirements,linux_requirements,minimum,recommended
0,10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,...,10.0,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,10,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
1,10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,...,10.0,000 levels of chaotic science fiction horror t...,in any order,from the dead easy beginning sectors to the c...,10,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
2,10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,...,10.0,000 levels of chaotic science fiction horror t...,in any order,from the dead easy beginning sectors to the c...,10,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
3,20,Team Fortress Classic,1999-04-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,...,20.0,One of the most popular online action games of...,One of the most popular online action games of...,One of the most popular online action games of...,20,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
4,30,Day of Defeat,2003-05-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,...,30.0,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,30,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
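
In the preview above, appid 10 appears three times, and two of those rows carry description text that does not belong to Counter-Strike. That pattern suggests the description table has more than one row for some keys. A minimal sketch, on made-up toy data, of checking key uniqueness before merging; the frames `games` and `descriptions` are hypothetical stand-ins for the notebook's tables:

```python
import pandas as pd

# Toy stand-ins for the notebook's tables (values are illustrative only)
games = pd.DataFrame({"appid": [10, 20],
                      "name": ["Counter-Strike", "Team Fortress Classic"]})
descriptions = pd.DataFrame({
    "steam_appid": [10, 10, 20],
    "short_description": ["first text", "stray fragment", "second text"],
})

# List the keys that occur more than once on the right-hand side
dup_mask = descriptions["steam_appid"].duplicated(keep=False)
duplicate_keys = sorted(descriptions.loc[dup_mask, "steam_appid"].unique().tolist())
print("keys with more than one row:", duplicate_keys)

# validate="one_to_one" makes pandas raise instead of silently multiplying rows
try:
    pd.merge(games, descriptions, left_on="appid", right_on="steam_appid",
             validate="one_to_one")
    merge_ok = True
except pd.errors.MergeError as exc:
    merge_ok = False
    print("merge rejected:", exc)
```

Inspecting the duplicated keys before dropping them shows whether the extra rows are harmless repeats or fragments of other records, which matters because `drop_duplicates` keeps only the first row per key.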


In [0]:
# Drop the English-support flag and the redundant join-key columns
steam_all_data_clean = steam_all_data.drop(columns=['english', 'steam_appid_y', 'steam_appid_x'])
steam_all_data_clean.head(5)

Unnamed: 0,appid,name,release_date,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,...,owners,price,detailed_description,about_the_game,short_description,pc_requirements,mac_requirements,linux_requirements,minimum,recommended
0,10,Counter-Strike,2000-11-01,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,...,10000000-20000000,7.19,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
1,10,Counter-Strike,2000-11-01,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,...,10000000-20000000,7.19,000 levels of chaotic science fiction horror t...,in any order,from the dead easy beginning sectors to the c...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
2,10,Counter-Strike,2000-11-01,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,...,10000000-20000000,7.19,000 levels of chaotic science fiction horror t...,in any order,from the dead easy beginning sectors to the c...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
3,20,Team Fortress Classic,1999-04-01,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,...,5000000-10000000,3.99,One of the most popular online action games of...,One of the most popular online action games of...,One of the most popular online action games of...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
4,30,Day of Defeat,2003-05-01,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,...,5000000-10000000,3.99,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",


In [0]:
# Keep a single row per game (the first occurrence of each appid)
steam_all_data_clean = steam_all_data_clean.drop_duplicates(subset='appid')
steam_all_data_clean

Unnamed: 0,appid,name,release_date,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,...,owners,price,detailed_description,about_the_game,short_description,pc_requirements,mac_requirements,linux_requirements,minimum,recommended
0,10,Counter-Strike,2000-11-01,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,...,10000000-20000000,7.19,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
3,20,Team Fortress Classic,1999-04-01,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,...,5000000-10000000,3.99,One of the most popular online action games of...,One of the most popular online action games of...,One of the most popular online action games of...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
4,30,Day of Defeat,2003-05-01,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,...,5000000-10000000,3.99,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
5,40,Deathmatch Classic,2001-06-01,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,...,5000000-10000000,3.99,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
6,50,Half-Life: Opposing Force,1999-11-01,Gearbox Software,Valve,windows;mac;linux,0,Single-player;Multi-player;Valve Anti-Cheat en...,Action,FPS;Action;Sci-fi,...,5000000-10000000,3.99,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27059,1065230,Room of Pandora,2019-04-24,SHEN JIAWEI,SHEN JIAWEI,windows,0,Single-player;Steam Achievements,Adventure;Casual;Indie,Adventure;Indie;Casual,...,0-20000,2.09,"<img src=""https://steamcdn-a.akamaihd.net/stea...","<img src=""https://steamcdn-a.akamaihd.net/stea...",The Room of Pandora is a third-person interact...,{'minimum': '<strong>Minimum:</strong><br><ul ...,[],[],OS: Windows 7 Processor: Inter Core i7 Memory:...,
27060,1065570,Cyber Gun,2019-04-23,Semyon Maximov,BekkerDev Studio,windows,0,Single-player,Action;Adventure;Indie,Action;Indie;Adventure,...,0-20000,1.69,Have you ever been so lonely that no one but y...,,,{'minimum': '<strong>Minimum:</strong><br><ul ...,[],[],"OS: Windows XP, Vista, 7, 8, 10 Processor: Int...",
27061,1065650,Super Star Blast,2019-04-24,EntwicklerX,EntwicklerX,windows,0,Single-player;Multi-player;Co-op;Shared/Split ...,Action;Casual;Indie,Action;Indie;Casual,...,0-20000,3.99,<strong>Super Star Blast </strong>is a space b...,<strong>Super Star Blast </strong>is a space b...,Super Star Blast is a space based game with ch...,{'minimum': '<strong>Minimum:</strong><br><ul ...,[],[],"OS: Windows 7, Windows 8, Windows 10 (32/64bit...",
27062,1066700,New Yankee 7: Deer Hunters,2019-04-17,Yustas Game Studio,Alawar Entertainment,windows;mac,0,Single-player;Steam Cloud,Adventure;Casual;Indie,Indie;Casual;Adventure,...,0-20000,5.19,Pursue a snow-white deer through an enchanted ...,,,{'minimum': '<strong>Minimum:</strong><br><ul ...,{'minimum': '<strong>Minimum:</strong><br><ul ...,[],OS: Windows XP or later Processor: 1.5 GHz Mem...,OS: Windows 7 or later Processor: 1.5 GHz Memo...


In [0]:
# Create a Hive internal (managed) table from the cleaned pandas DataFrame
sparkDF = spark.createDataFrame(steam_all_data_clean)

sparkDF.write.mode('overwrite').saveAsTable("steam_all_data")

## 2.5 Analysis

## 2.5.1 Data quality

In this step, the data quality of the table built in the previous steps is analyzed attribute by attribute.

In [0]:
steam_all_data_clean.isnull().sum()

Out[6]: appid                       0
name                        0
release_date                0
developer                   0
publisher                   0
platforms                   0
required_age                0
categories                  0
genres                      0
steamspy_tags               0
achievements                0
positive_ratings            0
negative_ratings            0
average_playtime            0
median_playtime             0
owners                      0
price                       0
detailed_description        0
about_the_game           6782
short_description        6966
pc_requirements             0
mac_requirements            0
linux_requirements          0
minimum                     5
recommended             13044
dtype: int64

The first analysis checks for missing values. As the result above shows, the only attributes with missing data are "about_the_game", "short_description", "minimum", and "recommended".

"about_the_game" and "short_description" hold the game's description on the platform, and it is not unusual for some games to have no description. The "recommended" column holds the recommended requirements for running the game, and it is also not unusual for this data to be missing. The "minimum" column has only 5 missing values, which suggests that an error occurred during extraction or that the values were lost in some other way.
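
To put the missing counts in proportion, they can be divided by the number of rows. A small sketch using the counts from the output above and the 27062 rows reported by `info()` for the cleaned table:

```python
# Missing-value counts copied from the isnull().sum() output
missing = {
    "about_the_game": 6782,
    "short_description": 6966,
    "minimum": 5,
    "recommended": 13044,
}
total_rows = 27062  # row count reported by info() for the cleaned table

# Share of rows with a missing value, as a percentage
missing_pct = {col: 100 * count / total_rows for col, count in missing.items()}

for col, pct in missing_pct.items():
    print(f"{col}: {pct:.2f}%")
```

In pandas, the same proportions can be obtained directly with `steam_all_data_clean.isnull().mean() * 100`.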

In [0]:
steam_all_data_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27062 entries, 0 to 27063
Data columns (total 25 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   appid                 27062 non-null  int64  
 1   name                  27062 non-null  object 
 2   release_date          27062 non-null  object 
 3   developer             27062 non-null  object 
 4   publisher             27062 non-null  object 
 5   platforms             27062 non-null  object 
 6   required_age          27062 non-null  int64  
 7   categories            27062 non-null  object 
 8   genres                27062 non-null  object 
 9   steamspy_tags         27062 non-null  object 
 10  achievements          27062 non-null  int64  
 11  positive_ratings      27062 non-null  int64  
 12  negative_ratings      27062 non-null  int64  
 13  average_playtime      27062 non-null  int64  
 14  median_playtime       27062 non-null  int64  
 15  owners             

In [0]:
steam_all_data_clean.describe()

Unnamed: 0,appid,required_age,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,price
count,27062.0,27062.0,27062.0,27062.0,27062.0,27062.0,27062.0,27062.0
mean,596323.9,0.353817,45.26624,1000.845,211.082773,149.856662,146.105351,6.078505
std,250810.4,2.402338,352.753851,18993.28,4285.965411,1827.474576,2354.443666,7.875836
min,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,401375.0,0.0,0.0,6.0,2.0,0.0,0.0,1.69
50%,599120.0,0.0,7.0,24.0,9.0,0.0,0.0,3.99
75%,798825.0,0.0,23.0,125.0,41.0,0.0,0.0,7.19
max,1069460.0,18.0,9821.0,2644404.0,487076.0,190625.0,190625.0,421.99


Next, the numeric columns of the table are analyzed to check whether they are within the expected range.

**appid**: Integer values. Minimum 10, maximum 1069460. Within the expected range.\
**required_age**: Integer values. Minimum 0, maximum 18. Within the expected range.\
**achievements**: Integer values. Minimum 0, maximum 9821. Within the expected range.\
**positive_ratings**: Integer values. Minimum 0, maximum 2644404. Within the expected range.\
**negative_ratings**: Integer values. Minimum 0, maximum 487076. Within the expected range.\
**average_playtime**: Integer values. Minimum 0, maximum 190625. Suspicious values.\
**median_playtime**: Integer values. Minimum 0, maximum 190625. Suspicious values.\
**price**: Decimal values. Minimum 0, maximum 421.99. Within the expected range.


According to this analysis, the only columns with suspicious values are "average_playtime" and "median_playtime", which are respectively the mean and the median of the playtime per user, because the maximum observed value is extremely unlikely. Even if the unit is minutes, the value still seems unrealistic. One possibility is that the values are actually decimals, but it is not possible to determine where the decimal point would go.

I did not find a way to resolve this problem, so I will keep both columns as they are. The rationale is that if some transformation was applied to these columns, it was applied to every row. In that case, for the purpose of answering the proposed questions, the absurd values will not be a problem.
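
The reasoning above rests on two pieces of arithmetic. First, if the unit is assumed to be minutes (an assumption; the source does not confirm the unit), the maximum of 190625 can be converted to hours and days. Second, a positive rescaling applied to every row leaves the ranking of games unchanged. A short sketch using three `average_playtime` values from the rows shown earlier:

```python
# Maximum reported by describe(); the unit (minutes) is an assumption
max_playtime = 190625
hours = max_playtime / 60
days = hours / 24
print(f"{hours:.1f} hours, about {days:.1f} days")

# average_playtime values taken from three rows of the "steam" table preview
playtime = {
    "Counter-Strike": 17612,
    "Team Fortress Classic": 277,
    "Half-Life": 1300,
}

# Ranking on the raw values and on uniformly rescaled values
rank_raw = sorted(playtime, key=playtime.get, reverse=True)
rank_scaled = sorted(playtime, key=lambda game: playtime[game] / 60, reverse=True)
print(rank_raw == rank_scaled)
```

This only supports questions that depend on the order of the values, such as a top-N ranking; any answer that reports playtime in absolute terms would still need the unit to be known.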

In [0]:
steam_all_data_clean.describe(include=object)

Unnamed: 0,name,release_date,developer,publisher,platforms,categories,genres,steamspy_tags,owners,detailed_description,about_the_game,short_description,pc_requirements,mac_requirements,linux_requirements,minimum,recommended
count,27062,27062,27062,27062,27062,27062,27062,27062,27062,27062,20280,20096,27062,27062,27062,27057,14018
unique,27020,2617,17107,14348,5,3333,1551,6418,13,26915,20269,20020,25182,8030,5263,24906,12226
top,Dark Matter,2018-07-13,Choice of Games,Big Fish Games,windows,Single-player,Action;Indie,Action;Indie;Casual,0-20000,\n,Those who love platform games may think this i...,Minimal physical puzzle with explosions,{'minimum': '<strong>Minimum:</strong><br><ul ...,[],[],OS: Windows 7,Requires a 64-bit processor and operating system
freq,3,64,94,212,18393,6104,1852,845,18591,24,3,12,133,15934,18784,137,808


Next, the non-numeric columns are analyzed.

**name**: Text. Within the expected range.\
**release_date**: Date. Within the expected range.\
**developer**: Text. Within the expected range.\
**publisher**: Text. Within the expected range.\
**platforms**: Text. Within the expected range.\
**categories**: Text. Within the expected range.\
**genres**: Text. Within the expected range.\
**steamspy_tags**: Text. Within the expected range.\
**owners**: Text. Within the expected range.\
**detailed_description**: Text. Within the expected range.\
**about_the_game**: Text. Within the expected range.\
**short_description**: Text. Within the expected range.\
**pc_requirements**: Text. Within the expected range.\
**mac_requirements**: Text. Within the expected range.\
**linux_requirements**: Text. Within the expected range.\
**minimum**: Text. Within the expected range.\
**recommended**: Text. Within the expected range.

## 2.5.2 Solving the problem

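1. What are the five best-rated games in the "Roguelike" category that were released in the last 12 months?

Since the data is from 2019, the question is reformulated as follows:

1. What are the five best-rated games in the "Roguelike" category that were released between 2018 and 2019?

Note that the query below filters `release_date` from 2000-01-01 to 2019-12-31, so the result also includes games released before 2018. Only games with more than 1000 positive ratings are considered.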

In [0]:
%sql
SELECT name, ((positive_ratings / (positive_ratings + negative_ratings)) * 100) as Ratings, release_date
FROM steam_all_data
WHERE (categories LIKE '%Rogue-like%' 
        OR genres LIKE '%Rogue-like%' 
        OR steamspy_tags LIKE '%Rogue-like%')
        AND release_date BETWEEN '2000-01-01' AND '2019-12-31'
AND positive_ratings > 1000
ORDER BY Ratings DESC
LIMIT 5;

name,Ratings,release_date
The Binding of Isaac: Rebirth,97.53582429355832,2014-11-04
Caves of Qud,97.02072538860104,2015-07-15
Crypt of the NecroDancer,96.3729159578088,2015-04-23
Slay the Spire,96.32601973199824,2019-01-23
FTL: Faster Than Light,96.3203072057827,2012-09-14
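
To make the rating calculation and the date window explicit, the sketch below reproduces the same logic in plain Python with the lower bound set to 2018-01-01. The rows, the function names and the thresholds passed as parameters are hypothetical. It is not the query that produced the output above.

```python
from datetime import date

# Hypothetical rows: (name, positive_ratings, negative_ratings, release_date).
games = [
    ("Game A", 9000, 1000, date(2018, 5, 1)),
    ("Game B", 4000, 1000, date(2019, 1, 23)),
    ("Game C", 9900, 100, date(2014, 11, 4)),  # outside the 2018-2019 window
    ("Game D", 500, 10, date(2018, 9, 9)),     # too few positive ratings
]


def rating_percent(positive, negative):
    """Share of positive ratings, in percent."""
    return positive / (positive + negative) * 100


def top_rated(rows, start, end, min_positive=1000, limit=5):
    """Rows released within [start, end] with more than min_positive positive ratings, best first."""
    eligible = [
        (name, rating_percent(pos, neg), released)
        for name, pos, neg, released in rows
        if start <= released <= end and pos > min_positive
    ]
    return sorted(eligible, key=lambda row: row[1], reverse=True)[:limit]


result = top_rated(games, date(2018, 1, 1), date(2019, 12, 31))
for name, rating, released in result:
    print(name, round(rating, 2), released)
```

In the SQL cell, the equivalent change would be to use `'2018-01-01'` as the lower bound of the `BETWEEN` clause.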


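2. What is the most popular game that includes the word "cyberpunk" in its description or tags?

Popularity is measured here by the percentage of positive ratings, considering only games with more than 1000 positive ratings.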

In [0]:
%sql
SELECT name, ((positive_ratings / (positive_ratings + negative_ratings)) * 100) as Ratings
FROM steam_all_data
WHERE (categories LIKE '%cyberpunk%' 
       OR genres LIKE '%cyberpunk%' 
       OR steamspy_tags LIKE '%cyberpunk%' 
       OR detailed_description LIKE '%cyberpunk%' 
       OR about_the_game LIKE '%cyberpunk%' 
       OR short_description LIKE '%cyberpunk%')
      AND positive_ratings > 1000
ORDER BY Ratings DESC
LIMIT 1;

name,Ratings
VA-11 Hall-A: Cyberpunk Bartender Action,97.67441860465117
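
One caveat about the filter: if the SQL engine's `LIKE` is case-sensitive (as it is in Spark SQL), `'%cyberpunk%'` does not match a tag written as `Cyberpunk`. The sketch below illustrates the difference with hypothetical rows. The helper `mentions` is an illustrative name, not part of the project.

```python
# Hypothetical rows: (name, steamspy_tags, short_description).
games = [
    ("Game A", "Cyberpunk;Indie", "A bartending story."),
    ("Game B", "Action;RPG", "Explore a cyberpunk city."),
    ("Game C", "Puzzle;Casual", "Match coloured tiles."),
]


def mentions(keyword, *fields):
    """True if keyword appears in any field, ignoring case; None fields are skipped."""
    needle = keyword.casefold()
    return any(needle in field.casefold() for field in fields if field is not None)


# Case-sensitive search, analogous to LIKE '%cyberpunk%'.
case_sensitive = [name for name, tags, desc in games
                  if "cyberpunk" in tags or "cyberpunk" in desc]

# Case-insensitive search, analogous to LOWER(column) LIKE '%cyberpunk%'.
case_insensitive = [name for name, tags, desc in games
                    if mentions("cyberpunk", tags, desc)]

print(case_sensitive)
print(case_insensitive)
```

In SQL, wrapping each column in `LOWER(...)` before the `LIKE` comparison would have the same effect.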


3. What are the three science-fiction strategy games with the highest total number of hours played?

The query uses `average_playtime` as a proxy for the total hours played.

In [0]:
%sql
SELECT name, average_playtime
FROM steam_all_data
WHERE genres LIKE '%Strategy%'
AND steamspy_tags LIKE '%Sci-fi%'
ORDER BY average_playtime DESC
LIMIT 3;

name,average_playtime
EVE Online,5123
DG2: Defense Grid 2,3601
UFO: Afterlight,2380


4. What is the average completion percentage of the scariest horror games available on Steam?

It is not possible to answer this question, because the dataset has no data on average game completion.

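5. What are the five most played games that are exactly 1 GB in size?

The query uses the text `1 GB available space` in the minimum requirements as a proxy for the game size and ranks the games by the `owners` range. Because `owners` is a text column, the ordering is alphabetical rather than numeric.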


In [0]:
%sql
SELECT name, owners
FROM steam_all_data
WHERE minimum LIKE '% 1 GB available space%'
ORDER BY owners DESC
LIMIT 5;

name,owners
Trove,5000000-10000000
Kingdoms and Castles,500000-1000000
AX:EL - Air XenoDawn,500000-1000000
Overcast - Walden and the Werewolf,500000-1000000
Duelyst,500000-1000000
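
Since `owners` stores ranges as text, sorting it directly compares strings character by character. A minimal sketch of the difference, assuming the `low-high` format seen above, follows. The value `10000000-20000000` is an assumed example and `lower_bound` is an illustrative helper name.

```python
# Owner ranges stored as text, in the "low-high" format.
owner_ranges = ["500000-1000000", "10000000-20000000", "5000000-10000000", "0-20000"]


def lower_bound(owner_range):
    """Numeric lower bound of a 'low-high' owners range."""
    low, _high = owner_range.split("-")
    return int(low)


# Text ordering, analogous to ORDER BY owners DESC on a string column.
text_order = sorted(owner_ranges, reverse=True)

# Numeric ordering on the lower bound of each range.
numeric_order = sorted(owner_ranges, key=lower_bound, reverse=True)

print(text_order)
print(numeric_order)
```

With text ordering, a range starting with `1` (such as the assumed `10000000-20000000`) is placed below ranges starting with `5`, even though it represents more owners.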


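6. What are the three most expensive simulation games that support virtual reality?

Note that the query below filters only on VR support; it does not restrict the results to the Simulation genre.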

In [0]:
%sql
SELECT name, price
FROM steam_all_data
WHERE (categories LIKE '%VR%' 
       OR genres LIKE '%VR%' 
       OR steamspy_tags LIKE '%VR%')
ORDER BY price DESC
LIMIT 3;

name,price
The Music Room,98.99
ARK: Survival Evolved,44.99
The Elder Scrolls V: Skyrim VR,39.99
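
A way to require both conditions is to split the semicolon-separated fields into entries and test each one. The sketch below uses hypothetical rows and assumes that the genre is written as `Simulation` and that VR support appears as an entry containing the word `VR` (for example `VR Support`); these labels are assumptions and should be checked against the data.

```python
# Hypothetical rows: (name, genres, categories, steamspy_tags, price).
games = [
    ("Game A", "Simulation;Indie", "Single-player;VR Support", "Simulation;VR", 39.99),
    ("Game B", "Action;Adventure", "Single-player", "VR;Action", 98.99),
    ("Game C", "Simulation", "Single-player", "Simulation;Driving", 59.99),
    ("Game D", "Simulation;Casual", "Single-player", "VR;Casual", 19.99),
]


def tokens(field):
    """Split a semicolon-separated field into a set of entries."""
    return {part.strip() for part in field.split(";") if part.strip()}


def is_vr_simulation(genres, categories, tags):
    """True when 'Simulation' is a genre and any category or tag contains the word 'VR'."""
    has_vr = any("VR" in entry.split() for entry in tokens(categories) | tokens(tags))
    return "Simulation" in tokens(genres) and has_vr


matches = [(name, price) for name, genres, cats, tags, price in games
           if is_vr_simulation(genres, cats, tags)]
most_expensive = sorted(matches, key=lambda row: row[1], reverse=True)[:3]
print(most_expensive)
```

In the SQL cell, the equivalent would be to add `AND genres LIKE '%Simulation%'` to the `WHERE` clause.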


7. Which game has the longest soundtrack in terms of the number of music tracks available?

There is no soundtrack data available.

8. Which games have the largest difference between the critics' rating and the players' rating?

There is no data on critics' ratings.

9. Which are the oldest real-time strategy games that still have an active community of online players?

The table has no data about the community of each game.

10. Which are the most popular independent games released by developers headquartered in South America?

The table has no data about where the developers are headquartered.

## 3 Self-assessment