## Steam Game Analysis and Prediction

### Introduction

This project runs a predictive analysis on a public dataset of more than 80,000 Steam games, each described by many attributes. The primary objective is to develop a model that suggests similar games based on a user's selection. For instance, if a user selects Grand Theft Auto V, the model would return the five games that most closely resemble it, such as Grand Theft Auto: San Andreas.

By leveraging this predictive capability, we seek to enhance game recommendations for Steam users and gain deeper insights into the complex relationships between various game attributes.
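
To make the "top 5 similar games" idea concrete, here is a minimal sketch that ranks games by Jaccard similarity of their tag sets. The similarity measure, the helper names `jaccard` and `top_similar`, and the toy catalogue (including the made-up titles and tags) are assumptions for illustration only, not the model this project ends up using.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_similar(title: str, tags_by_game: dict, n: int = 5) -> list:
    """Return the n titles whose tag sets are most similar to `title`."""
    target = tags_by_game[title]
    scores = {
        other: jaccard(target, tags)
        for other, tags in tags_by_game.items()
        if other != title
    }
    # Sort by descending similarity, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:n]


# Hypothetical toy catalogue; the tags (and two of the titles) are made up.
toy_catalogue = {
    "Grand Theft Auto V": {"action", "open world", "crime"},
    "Grand Theft Auto: San Andreas": {"action", "open world", "crime", "classic"},
    "Farm Life": {"simulation", "farming"},
    "City Racer": {"racing", "open world"},
}

print(top_similar("Grand Theft Auto V", toy_catalogue, n=2))
```

A set-based measure like this only works well if the tags are consistent across games, which is what the feature engineering steps below try to achieve.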

In [None]:
import pandas as pd

### Feature Engineering
#### Genre Column

In [None]:
irrelevant_genres = [
    'design & illustration',
    'massively multiplayer',
    'animation & modeling',
    'software training',
    'audio production',
    'game development',
    'video production',
    'web publishing',
    'sexual content',
    'photo editing',
    'documentary',
    'accounting',
    'utilities',
    'education',
    '360 video',
    'episodic',
    'tutorial',
    'violent',
    'nudity',
    'movie',
    'short',
    'gore',
    ]

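# Split the comma-separated genre string into one row per (game, genre) pair.
# The exploded Series keeps the original row labels, so each label identifies a game.
df_exploded = df['Genres'].str.split(',').explode()

# Row labels of games that have at least one irrelevant genre (each label once).
is_irrelevant = df_exploded.isin(irrelevant_genres)
rows_to_drop = df_exploded[is_irrelevant].index.unique()
df.drop(rows_to_drop, inplace=True)

# Renumber the rows. Note that this keeps the old labels in a new 'index' column;
# pass drop=True to discard them instead.
df.reset_index(inplace=True)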
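
The explode-then-drop pattern above can be checked on a few rows. The sketch below uses a hypothetical four-row frame and a shortened `toy_irrelevant` list; only the `Genres` column name comes from the notebook.

```python
import pandas as pd

# Hypothetical toy data; the real dataset is assumed to store genres
# as a comma-separated string in a 'Genres' column.
toy = pd.DataFrame({
    "Name": ["Game A", "Tool B", "Game C", "Game D"],
    "Genres": ["action,indie", "utilities,education", "rpg,violent", "strategy"],
})
toy_irrelevant = ["utilities", "education", "violent"]

# One row per (game, genre); the index still points back to the original row.
exploded = toy["Genres"].str.split(",").explode()

# Labels of games with at least one irrelevant genre, each label only once.
rows_to_drop = exploded[exploded.isin(toy_irrelevant)].index.unique()

# Drop those games and renumber the remaining rows from zero.
kept = toy.drop(rows_to_drop).reset_index(drop=True)
print(kept["Name"].tolist())
```

The match is exact, so `'Utilities'` or `' utilities'` (with a leading space after the comma) would not be caught. If the raw column is not already lowercase and free of padding, normalizing the exploded values with `.str.strip().str.lower()` before the `isin` check would be needed; whether that applies to this dataset is not shown in this section.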

#### Tags Column

In [None]:
replace_tags = {
    'e-sports': 'sports',
    'action rpg': 'action',
    'action rts': 'action',
    'dark comedy': 'comedy',
    '2d platformer': 'platformer',
    '3d platformer': 'platformer',
    'action-adventure': 'adventure',
    'puzzle-platformer': 'puzzle',
    'turn-based combat': 'turn-based',
    'turn-based strategy': 'turn-based',
    'turn-based tactics': 'turn-based',
    'massively multiplayer': 'multiplayer',
    'local multiplayer': 'multiplayer',
    'crpg': 'rpg',
    'jrpg': 'rpg',
    'world war i': 'wargame',
    'world war ii': 'wargame',
    'cold war': 'wargame',
    'traditional roguelike': 'roguelike',
    'action roguelike': 'roguelike',
    'rogue-like': 'roguelike',
    'space sim': 'simulation',
    'farming sim': 'simulation',
    'dating sim': 'simulation',
    'cartoony': 'cartoon',
    'coding': 'programming',
    'hacking': 'programming',
    '2d fighter': 'fighting',
    '3d fighter': 'fighting',
    'immersive sim': 'immersive',
    'political sim': 'politics',
    'political': 'politics',
    'resource management': 'management',
    'time management': 'management',
    'inventory management': 'management',
    'arena shooter': 'shooter',
    'hero shooter': 'shooter',
    'looter shooter': 'shooter',
    'extraction shooter': 'shooter',
    'lore-rich': 'story rich',
    'mini golf': 'golf',
    'dark fantasy': 'fantasy',
    }

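# Split the comma-separated tag string into one row per (game, tag) pair.
df_exploded = df['Tags'].str.split(',').explode()

# Map specific tags onto their broader category; only exact matches are replaced.
df_exploded = df_exploded.replace(replace_tags)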
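
Merging tags can leave a game with the same tag twice (for example, `action rpg` and `action rts` both become `action`). The sketch below shows the replacement on hypothetical toy data and one possible way to regroup the tags per game without duplicates; `toy_tags`, `toy_map`, and the deduplication step are assumptions, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data and a shortened mapping, for illustration only.
toy_tags = pd.Series(
    ["action rpg,action rts,jrpg", "dark fantasy,fantasy", "golf"],
    name="Tags",
)
toy_map = {
    "action rpg": "action",
    "action rts": "action",
    "jrpg": "rpg",
    "dark fantasy": "fantasy",
}

# One row per (game, tag), then map specific tags to broader ones.
exploded = toy_tags.str.split(",").explode()
normalized = exploded.replace(toy_map)

# Regroup per game; unique() keeps first-seen order and removes repeats.
merged = normalized.groupby(level=0).unique().apply(list)
print(merged.tolist())
```

Tags that are not keys of the mapping, such as `golf` here, pass through unchanged.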

In [None]:
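# Regroup the normalized tags into one list per game (the index identifies the game).
df_grouped = df_exploded.groupby(level=0).agg(list).reset_index()

# Tags that occur fewer than 3000 times across the whole dataset.
tag_counts = df_exploded.value_counts()
low_frequency_values = tag_counts[tag_counts < 3000].index

# Exploded (game, tag) rows whose tag is one of the low-frequency tags.
filtered_df = df_exploded[df_exploded.isin(low_frequency_values)]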
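
The frequency filter can be traced on a small example. The sketch below uses a hypothetical exploded Series and a threshold of 2 instead of 3000. What the project does with the rare tags afterwards is not shown in this section; removing them before regrouping is one plausible follow-up and is labeled as such in the code.

```python
import pandas as pd

# Hypothetical exploded tags: the index is the game, the value is one tag.
exploded = pd.Series(
    ["action", "indie", "action", "golf", "indie", "action", "chess"],
    index=[0, 0, 1, 1, 2, 3, 3],
    name="Tags",
)
toy_threshold = 2  # the notebook uses 3000 on the full dataset

# Count how often each tag occurs and pick those below the threshold.
counts = exploded.value_counts()
rare = counts[counts < toy_threshold].index
is_rare = exploded.isin(rare)

rare_rows = exploded[is_rare]      # same selection as the notebook's filter
common_rows = exploded[~is_rare]   # assumption: keep only frequent tags

print(sorted(rare))
print(common_rows.groupby(level=0).agg(list).to_dict())
```

Computing `value_counts()` once and reusing the result avoids counting the full exploded Series twice.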