<p align="center">
    <img src="./img/image-1.png">
</p>

# Success in video games

Over the last 10 years, the video game industry has become one of the highest-grossing sectors in the world, earning more than film and music combined. In 2023 alone it generated around $184.0 billion.

However, one thing the general public may not take into account is the overall cost of development. To put this in perspective, at the start of the sixth console generation (Xbox, PS2 and GameCube), development costs ranged from $2 million to $7 million for mid-sized games and from $15 million to $60 million for AAA games. The range is wide because it depended on how ambitious studios and publishers were with each release: years of development, marketing, licensing, whether the game was multiplatform, and so on. Today, in the ninth console generation, development costs are far higher: from $20 million to $50 million for mid-sized games and from $60 million to $600 million for AAA games. This wide spread covers games of every kind, including both major franchises and new IPs: Final Fantasy XVI ($60 million, 87 MC, 2023), GTA 5 ($500 million, 97 MC, 2013) and Star Citizen ($600 million, in Early Access). Here and below, MC is the game's Metacritic score.

<sup>[Here you can see the list of the most expensive video games ever developed](https://en.wikipedia.org/wiki/List_of_most_expensive_video_games_to_develop)</sup>

The problem is the return these games are expected to generate. Although there are many success stories, others have not been so lucky: in August 2024 Sony released Concord (62 MC), a 5v5 multiplayer game estimated to have cost $400 million. It sold around 25,000 units and averaged 700 players, [far from covering even a tiny fraction of its development cost, which made it an absolute failure](https://jonahwrites.blog/2024/09/05/concord-the-biggest-flop-in-gaming-history/). It did so badly that the servers were shut down two weeks after launch and players were refunded.

That is a very obvious case, but what if I told you that even highly rated games are not safe either? Alan Wake 2 (89 MC) cost $70 million to develop and was very well received by both the community and critics. However, [it had sold 1.3 million copies according to a Remedy report from February 2024](https://www.eurogamer.net/alan-wake-2-fastest-selling-remedy-game-but-yet-to-turn-a-profit), which does not cover its development costs. Tango Gameworks is a similar case, except that there Microsoft closed the studio because it did not meet sales expectations.
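To put these figures in proportion, a back-of-the-envelope check divides the reported development cost by the units sold, giving the net revenue each copy would have had to bring in just to recoup development. This is a simplified sketch: it ignores marketing, platform fees and any sales after the reported dates, and uses only the figures quoted above; the function name is illustrative.

```python
def break_even_per_copy(dev_cost_usd: float, units_sold: int) -> float:
    """Net revenue each copy must generate to recoup the development cost alone."""
    if units_sold <= 0:
        raise ValueError("units_sold must be positive")
    return dev_cost_usd / units_sold


# Figures quoted in the text: (development cost in USD, units sold)
cases = {
    "Concord": (400_000_000, 25_000),
    "Alan Wake 2": (70_000_000, 1_300_000),
}

for name, (cost, units) in cases.items():
    print(f"{name}: ${break_even_per_copy(cost, units):,.2f} per copy")
```

With these numbers Concord would have needed $16,000.00 per copy and Alan Wake 2 about $53.85 per copy, before marketing is even counted.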

Does the genre or the review score have anything to do with it? The players the game targets? Could it be a matter of timing? Is the most popular genre usually the most profitable? Throughout this analysis we will tackle these and other questions, using data to answer them.

## Our hypotheses
+ Does a game's critical success translate into sales success?
+ Does a game's player base guarantee its success?
+ Does a game's genre matter?
+ Is the release date a key factor in maximizing revenue?
+ Can other parameters be used to measure success?
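As one possible way to approach the first hypothesis, a rank (Spearman) correlation between critic score and revenue is a reasonable starting point: revenue figures tend to be heavily skewed, and a rank-based measure is less sensitive to a few blockbusters. This is an assumption about how the test could be run, not the method used later in the analysis. It expects a DataFrame with the `Metacritic score` and `Revenue` columns used in this notebook; the helper `score_revenue_spearman` is hypothetical.

```python
import pandas as pd


def score_revenue_spearman(games: pd.DataFrame,
                           score_col: str = "Metacritic score",
                           revenue_col: str = "Revenue") -> tuple[float, int]:
    """Spearman rank correlation between a score column and revenue.

    Rows missing either value are dropped first; returns (rho, rows used).
    """
    pair = games[[score_col, revenue_col]].dropna()
    # DataFrame.corr computes Spearman natively (no SciPy needed)
    rho = pair.corr(method="spearman").loc[score_col, revenue_col]
    return float(rho), len(pair)
```

A value close to +1 would support the first hypothesis; the row count shows how many games actually had both values.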

<sup><sub>
*This analysis focuses on the PC market **using Steam datasets:*** <br>
[Steam Games Dataset](https://www.kaggle.com/datasets/fronkongames/steam-games-dataset) <br>
[2024 Steam Statistics | Python | SQL | Tableau](https://www.kaggle.com/code/johnangelobelarma/2024-steam-statistics-python-sql-tableau/input)
</sub></sup>

## DATA COLLECTION

In [17]:
```python
import pandas as pd
import numpy as np
import time
#pd.set_option('display.max_columns', 200)
#pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
pd.options.mode.chained_assignment = None  # default='warn'
```

In [18]:
```python
df_1 = pd.read_csv("./../data/games-xs.csv")
# Drop the index column that to_csv() wrote when the reduced file was created
df_1.drop(columns="Unnamed: 0", inplace=True)
```

In [19]:
```python
df_1
```

| | AppID | Name | Release date | Price | Metacritic score | Average playtime forever | Developers | Publishers | Categories | Genres |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1026420 | WARSAW | Oct 2, 2019 | 23.99 | 62 | 67 | Pixelated Milk | Pixelated Milk,gaming company | Single-player,Steam Achievements,Steam Trading... | Indie,RPG |
| 1 | 22670 | Alien Breed 3: Descent | Nov 17, 2010 | 9.99 | 64 | 44 | Team17 Digital Ltd | Team17 Digital Ltd | Single-player,Multi-player,Co-op,Steam Achieve... | Action |
| 2 | 231330 | Deadfall Adventures | Nov 15, 2013 | 19.99 | 53 | 324 | The Farm 51 | THQ Nordic | Single-player,Multi-player,Co-op,Steam Achieve... | Action,Adventure |
| 3 | 897820 | Reigns: Game of Thrones | Oct 18, 2018 | 3.99 | 84 | 83 | Nerial | Devolver Digital | Single-player,Steam Achievements,Full controll... | Adventure,Indie,RPG |
| 4 | 12140 | Max Payne | Jan 6, 2011 | 3.49 | 89 | 168 | Remedy Entertainment | Rockstar Games | Single-player | Action |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3948 | 2305840 | Cat Quest III | Aug 8, 2024 | 19.99 | 84 | 0 | The Gentlebros | Kepler Interactive | Single-player,Multi-player,Co-op,Shared/Split ... | Action,Adventure,Indie,RPG |
| 3949 | 2366980 | Thank Goodness You're Here! | Aug 1, 2024 | 19.99 | 90 | 265 | Coal Supper | Panic | Single-player,Steam Achievements,Full controll... | Adventure,Casual,Indie |
| 3950 | 2394650 | Crypt Custodian | Aug 27, 2024 | 16.99 | 81 | 0 | Kyle Thompson | Kyle Thompson,Top Hat Studios, Inc.,H2 Interac... | Single-player,Steam Achievements,Full controll... | Adventure |
| 3951 | 1299690 | Gori: Cuddly Carnage | Aug 29, 2024 | 17.59 | 78 | 0 | Angry Demon Studio | Wired Productions,CouchPlay Interactive (Great... | Single-player,Steam Achievements,Full controll... | Action,Adventure,Indie |


In [20]:
```python
# Steps used to clean the original dataset; I created a new, smaller file so it could be uploaded to GitHub

# 1st CSV
# games_df_column_names = ['AppID', 'Name', 'Release date', 'Estimated owners', 'Peak CCU', 
#                     'Required age', 'Price', 'Unknown', 'DiscountDLC count', 'About the game', 
#                     'Supported languages', 'Full audio languages', 'Reviews', 'Header image', 
#                     'Website', 'Support url', 'Support email', 'Windows', 'Mac', 
#                     'Linux', 'Metacritic score', 'Metacritic url', 'User score', 
#                     'Positive', 'Negative', 'Score rank', 'Achievements', 
#                     'Recommendations', 'Notes', 'Average playtime forever', 
#                     'Average playtime two weeks', 'Median playtime forever', 
#                     'Median playtime two weeks', 'Developers', 'Publishers', 
#                     'Categories', 'Genres', 'Tags', 'Screenshots', 'Movies']
# columns_todrop= ['Estimated owners', "Peak CCU", "Required age", "Unknown"]

# df_1 = pd.read_csv("./../data/games.csv", header=None, skiprows=1)
# df_1.columns = games_df_column_names

# columns_todrop = ["Estimated owners", "Peak CCU", "Required age", "Unknown", "About the game", "Supported languages",
#                   "Full audio languages", "Reviews", "Header image", "Website", "Support url", "Support email", "Windows", "Mac",
#                   "Linux", "Metacritic url", "Positive", "Negative", "Score rank", "Achievements", "Recommendations", "Notes", "Tags",
#                   "Screenshots", "Movies", "Average playtime two weeks" ,"Median playtime forever", "Median playtime two weeks",
#                   "DiscountDLC count", "User score"]
# df_1.drop(columns_todrop, axis=1, inplace=True)

# df_1 = df_1[df_1["Metacritic score"] > 0]

# df_1_small = df_1.copy()
# # Step 4: specify the file path to save the data
# csv_file_path = "games-xs.csv"
# df_1_small.to_csv(csv_file_path)


# Insert the new columns next to their source columns.
# Only the first two comma-separated values are kept; .str[1] yields NaN
# when a game has a single category/genre.
df_1.reset_index(drop=True, inplace=True)
df_1["Categories"] = df_1["Categories"].astype(str)
categories = df_1["Categories"].str.split(",")
df_1.insert(9, "Category_1", categories.str[0])
df_1.insert(10, "Category_2", categories.str[1])

# Genres can be missing: split() and .str[] leave those rows as NaN
genres = df_1["Genres"].str.split(",")
df_1.insert(12, "Genre_1", genres.str[0])
df_1.insert(13, "Genre_2", genres.str[1])

# Date formatting: parse each date and convert it to the desired format (DD-MM-YYYY)
df_1["Release date"] = df_1["Release date"].apply(lambda s: pd.to_datetime(s).strftime("%d-%m-%Y"))

# 2nd CSV
df_2 = pd.read_csv("./../data/Steam_2024_bestRevenue_1500.csv")
df_2.rename(columns={"steamId": "AppID", "developers": "Developers", "publishers": "Publishers",
                     "publisherClass": "Publisher Class", "reviewScore": "User Score",
                     "revenue": "Revenue", "price": "Price", "releaseDate": "Release date",
                     "name": "Name", "avgPlaytime": "Average playtime forever"}, inplace=True)
```
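`Genre_1` and `Genre_2` keep only the first two genres of each game; Reigns: Game of Thrones, for example, is listed as `Adventure,Indie,RPG` but ends up with `Adventure` and `Indie`. For the genre hypothesis it may also help to have a long format with one row per (game, genre) pair. A minimal sketch using `Series.str.split` and `DataFrame.explode`; the helper name `genres_long` is hypothetical:

```python
import pandas as pd


def genres_long(games: pd.DataFrame, col: str = "Genres") -> pd.DataFrame:
    """One row per (game, genre); games without any genre are dropped."""
    long = (
        games[["AppID", "Name", col]]
        .dropna(subset=[col])
        .assign(Genre=lambda d: d[col].str.split(","))  # list of genres per game
        .explode("Genre")                               # one row per list element
    )
    long["Genre"] = long["Genre"].str.strip()
    return long.drop(columns=col).reset_index(drop=True)
```

For example, `genres_long(df_1)["Genre"].value_counts()` would count every genre a game is tagged with, not just the first two.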

In [21]:
```python
# info() prints its report and returns None, so call it on its own
df_1.info()
display(df_1.describe())
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3953 entries, 0 to 3952
Data columns (total 14 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   AppID                     3953 non-null   int64  
 1   Name                      3953 non-null   object 
 2   Release date              3953 non-null   object 
 3   Price                     3953 non-null   float64
 4   Metacritic score          3953 non-null   int64  
 5   Average playtime forever  3953 non-null   int64  
 6   Developers                3942 non-null   object 
 7   Publishers                3923 non-null   object 
 8   Categories                3953 non-null   object 
 9   Category_1                3953 non-null   object 
 10  Category_2                3597 non-null   object 
 11  Genres                    3948 non-null   object 
 12  Genre_1                   3948 non-null   object 
 13  Genre_2                   2990 non-null   object 
dtypes: float
```

| | AppID | Price | Metacritic score | Average playtime forever |
|---|---|---|---|---|
| count | 3953.0 | 3953.0 | 3953.0 | 3953.0 |
| mean | 483180.1 | 14.996949 | 72.920567 | 637.227928 |
| std | 428575.3 | 11.783016 | 10.57735 | 1792.649122 |
| min | 10.0 | 0.0 | 20.0 | 0.0 |
| 25% | 219640.0 | 6.99 | 67.0 | 38.0 |
| 50% | 354400.0 | 12.99 | 74.0 | 215.0 |
| 75% | 695100.0 | 19.99 | 80.0 | 557.0 |
| max | 2995920.0 | 69.99 | 97.0 | 42773.0 |

In [22]:
```python
# info() prints its report and returns None, so call it on its own
df_2.info()
display(df_2.describe())
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Name                      1500 non-null   object 
 1   Release date              1500 non-null   object 
 2   copiesSold                1500 non-null   int64  
 3   Price                     1500 non-null   float64
 4   Revenue                   1500 non-null   float64
 5   Average playtime forever  1500 non-null   float64
 6   User Score                1500 non-null   int64  
 7   Publisher Class           1500 non-null   object 
 8   Publishers                1499 non-null   object 
 9   Developers                1498 non-null   object 
 10  AppID                     1500 non-null   int64  
dtypes: float64(3), int64(3), object(5)
memory usage: 129.0+ KB
```

| | copiesSold | Price | Revenue | Average playtime forever | User Score | AppID |
|---|---|---|---|---|---|---|
| count | 1500.0 | 1500.0 | 1500.0 | 1500.0 | 1500.0 | 1500.0 |
| mean | 141482.6 | 17.519513 | 2632382.0 | 12.562704 | 76.201333 | 2183788.0 |
| std | 1132757.0 | 12.646612 | 27810240.0 | 21.542173 | 24.319438 | 606772.5 |
| min | 593.0 | 0.0 | 20674.0 | 0.0 | 0.0 | 24880.0 |
| 25% | 4918.75 | 9.99 | 45504.25 | 3.564848 | 72.0 | 1792795.0 |
| 50% | 11928.5 | 14.99 | 109053.0 | 6.762776 | 83.0 | 2321985.0 |
| 75% | 37869.75 | 19.99 | 455156.8 | 13.104473 | 92.0 | 2693228.0 |
| max | 30739150.0 | 99.99 | 837793400.0 | 296.332852 | 100.0 | 3107330.0 |
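The two `describe()` outputs suggest that the `Average playtime forever` columns are not in the same unit: in `df_1` the median is 215 and the maximum 42773, while in `df_2` the median is about 6.8 and the maximum about 296. A plausible reading, which should be confirmed against each dataset's documentation, is minutes in the Steam Games Dataset and hours in the 2024 revenue dataset. Under that assumption, a sketch to convert `df_1` to hours before comparing the two (the helper name is hypothetical):

```python
import pandas as pd


def minutes_to_hours(games: pd.DataFrame, col: str = "Average playtime forever") -> pd.DataFrame:
    """Return a copy with `col` converted from minutes to hours.

    ASSUMPTION: the column is stored in minutes, as df_1's range suggests.
    """
    out = games.copy()
    out[col] = out[col] / 60
    return out
```

Usage: `df_1_hours = minutes_to_hours(df_1)`; `df_1` itself is left unchanged.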

In [23]:
```python
display(df_1.head(5), df_2.head(5))
```

| | AppID | Name | Release date | Price | Metacritic score | Average playtime forever | Developers | Publishers | Categories | Category_1 | Category_2 | Genres | Genre_1 | Genre_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1026420 | WARSAW | 02-10-2019 | 23.99 | 62 | 67 | Pixelated Milk | Pixelated Milk,gaming company | Single-player,Steam Achievements,Steam Trading... | Single-player | Steam Achievements | Indie,RPG | Indie | RPG |
| 1 | 22670 | Alien Breed 3: Descent | 17-11-2010 | 9.99 | 64 | 44 | Team17 Digital Ltd | Team17 Digital Ltd | Single-player,Multi-player,Co-op,Steam Achieve... | Single-player | Multi-player | Action | Action | |
| 2 | 231330 | Deadfall Adventures | 15-11-2013 | 19.99 | 53 | 324 | The Farm 51 | THQ Nordic | Single-player,Multi-player,Co-op,Steam Achieve... | Single-player | Multi-player | Action,Adventure | Action | Adventure |
| 3 | 897820 | Reigns: Game of Thrones | 18-10-2018 | 3.99 | 84 | 83 | Nerial | Devolver Digital | Single-player,Steam Achievements,Full controll... | Single-player | Steam Achievements | Adventure,Indie,RPG | Adventure | Indie |
| 4 | 12140 | Max Payne | 06-01-2011 | 3.49 | 89 | 168 | Remedy Entertainment | Rockstar Games | Single-player | Single-player | | Action | Action | |


| | Name | Release date | copiesSold | Price | Revenue | Average playtime forever | User Score | Publisher Class | Publishers | Developers | AppID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | WWE 2K24 | 07-03-2024 | 165301 | 99.99 | 8055097.0 | 42.36514 | 71 | AAA | 2K | Visual Concepts | 2315690 |
| 1 | EARTH DEFENSE FORCE 6 | 25-07-2024 | 159806 | 59.99 | 7882151.0 | 29.651061 | 57 | Indie | D3PUBLISHER | SANDLOT | 2291060 |
| 2 | Sins of a Solar Empire II | 15-08-2024 | 214192 | 49.99 | 7815247.0 | 12.452593 | 88 | Indie | Stardock Entertainment | Ironclad Games Corporation,Stardock Entertainment | 1575940 |
| 3 | Legend of Mortal | 14-06-2024 | 440998 | 19.99 | 7756399.0 | 24.797817 | 76 | Indie | Paras Games,Obb Studio Inc. | Obb Studio Inc. | 1859910 |
| 4 | Shin Megami Tensei V: Vengeance | 13-06-2024 | 141306 | 59.99 | 7629252.0 | 34.258496 | 96 | AA | SEGA | ATLUS | 1875830 |
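As the `head()` output shows, both DataFrames store `Release date` as `DD-MM-YYYY` strings. That works for display, but such strings do not sort chronologically, so for the release-timing hypothesis it is probably easier to parse them back into datetimes with an explicit format and derive year and month. A sketch, assuming every value follows that format; `errors="coerce"` turns any that do not into `NaT` instead of raising, and the helper and new column names are hypothetical:

```python
import pandas as pd


def add_release_parts(games: pd.DataFrame, col: str = "Release date") -> pd.DataFrame:
    """Return a copy with a parsed release datetime plus year and month columns."""
    out = games.copy()
    out["release_dt"] = pd.to_datetime(out[col], format="%d-%m-%Y", errors="coerce")
    out["release_year"] = out["release_dt"].dt.year
    out["release_month"] = out["release_dt"].dt.month
    return out
```

For example, `add_release_parts(df_2).groupby("release_month")["Revenue"].median()` would give a first look at whether launch month relates to revenue.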


In [24]:
```python
games = pd.merge(df_1, df_2, how="outer")
games.head(10)
```



| | AppID | Name | Release date | Price | Metacritic score | Average playtime forever | Developers | Publishers | Categories | Category_1 | Category_2 | Genres | Genre_1 | Genre_2 | copiesSold | Revenue | User Score | Publisher Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Counter-Strike | 01-11-2000 | 9.99 | 88.0 | 10524.0 | Valve | Valve | Multi-player,PvP,Online PvP,Shared/Split Scree... | Multi-player | PvP | Action | Action | | | | | |
| 1 | 30 | Day of Defeat | 01-05-2003 | 4.99 | 79.0 | 1397.0 | Valve | Valve | Multi-player,Valve Anti-Cheat enabled | Multi-player | Valve Anti-Cheat enabled | Action | Action | | | | | |
| 2 | 70 | Half-Life | 08-11-1998 | 9.99 | 96.0 | 459.0 | Valve | Valve | Single-player,Multi-player,PvP,Online PvP,Stea... | Single-player | Multi-player | Action | Action | | | | | |
| 3 | 80 | Counter-Strike: Condition Zero | 01-03-2004 | 9.99 | 65.0 | 1523.0 | Valve | Valve | Single-player,Multi-player,Valve Anti-Cheat en... | Single-player | Multi-player | Action | Action | | | | | |
| 4 | 100 | Counter-Strike: Condition Zero | 01-03-2004 | 9.99 | 65.0 | 1321.0 | Valve | Valve | Single-player,Multi-player,Valve Anti-Cheat en... | Single-player | Multi-player | Action | Action | | | | | |
| 5 | 130 | Half-Life: Blue Shift | 01-06-2001 | 4.99 | 71.0 | 175.0 | Gearbox Software | Valve | Single-player,Remote Play Together | Single-player | Remote Play Together | Action | Action | | | | | |
| 6 | 220 | Half-Life 2 | 16-11-2004 | 9.99 | 96.0 | 606.0 | Valve | Valve | Single-player,Steam Achievements,Steam Trading... | Single-player | Steam Achievements | Action | Action | | | | | |
| 7 | 240 | Counter-Strike: Source | 01-11-2004 | 9.99 | 88.0 | 9171.0 | Valve | Valve | Multi-player,Cross-Platform Multiplayer,Steam ... | Multi-player | Cross-Platform Multiplayer | Action | Action | | | | | |
| 8 | 300 | Day of Defeat: Source | 12-07-2010 | 9.99 | 80.0 | 732.0 | Valve | Valve | Multi-player,Cross-Platform Multiplayer,Steam ... | Multi-player | Cross-Platform Multiplayer | Action | Action | | | | | |
| 9 | 380 | Half-Life 2: Episode One | 01-06-2006 | 7.99 | 87.0 | 253.0 | Valve | Valve | Single-player,Steam Achievements,Captions avai... | Single-player | Steam Achievements | Action | Action | | | | | |


In [25]:
```python
games.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5453 entries, 0 to 5452
Data columns (total 18 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   AppID                     5453 non-null   int64  
 1   Name                      5453 non-null   object 
 2   Release date              5453 non-null   object 
 3   Price                     5453 non-null   float64
 4   Metacritic score          3953 non-null   float64
 5   Average playtime forever  5453 non-null   float64
 6   Developers                5440 non-null   object 
 7   Publishers                5422 non-null   object 
 8   Categories                3953 non-null   object 
 9   Category_1                3953 non-null   object 
 10  Category_2                3597 non-null   object 
 11  Genres                    3948 non-null   object 
 12  Genre_1                   3948 non-null   object 
 13  Genre_2                   2990 non-null   object 
 14  copiesSo
```
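The merged frame has 5453 rows, exactly 3953 + 1500, which suggests the outer merge did not match a single game. Without `on=`, `pd.merge` joins on every column the two frames share (`AppID`, `Name`, `Release date`, `Price`, `Average playtime forever`, `Developers`, `Publishers`), so one differing value in any of them, for instance playtime stored in different units, is enough to prevent a match. A sketch of how the overlap could be checked by joining on `AppID` only, with `indicator=True` labelling where each row came from; the helper name and suffixes are hypothetical choices:

```python
import pandas as pd


def merge_on_appid(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Outer-merge on AppID only.

    Columns present in both frames get _df1/_df2 suffixes, and the `_merge`
    column says whether each row is 'left_only', 'right_only' or 'both'.
    """
    return pd.merge(left, right, on="AppID", how="outer",
                    suffixes=("_df1", "_df2"), indicator=True)
```

`merge_on_appid(df_1, df_2)["_merge"].value_counts()` would then show how many Steam games actually appear in both datasets.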

## Fields to collect for the dataset through web scraping
+ Metacritic score: more than 1,000 records are missing (1,500 of the 5,453 rows in `games` have no score), and this field is key to the first hypothesis.<br>
[Source page](https://www.metacritic.com)
+ User Score: to contrast the opinion of specialized critics with that of users. <br>
[Source page](https://steamdb.info/charts/)
+ Copies Sold: we could run a limited statistical analysis with the records we already have, but it would not be reliable. <br>
[Source page](https://github.com/molatosekgobela/Data-Science-Video-games-sales-dataset?tab=readme-ov-file)
+ avgPlaytime. <br>
[Source page](https://steamdb.info/charts/)
+ Publisher Class. <br>
[Source page](https://vginsights.com/publishers-database)
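Before scraping, it helps to know exactly which rows are missing each field. A small sketch that counts missing values per column and lists the `AppID`s still lacking a given field; the helper names are hypothetical and the column names are the ones in `games`:

```python
import pandas as pd


def missing_summary(df: pd.DataFrame, cols: list[str]) -> pd.Series:
    """Number of missing values per requested column, largest first."""
    return df[cols].isna().sum().sort_values(ascending=False)


def ids_missing(df: pd.DataFrame, col: str, id_col: str = "AppID") -> list[int]:
    """IDs of the rows whose `col` is missing, i.e. the ones to look up."""
    return df.loc[df[col].isna(), id_col].tolist()
```

For example, `missing_summary(games, ["Metacritic score", "User Score", "copiesSold", "Publisher Class"])` gives the size of each scraping task, and `ids_missing(games, "Metacritic score")` the list of games to search for (1,500 of them, according to `games.info()`).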