# Additional Visualizations

This notebook presents some additional statistics and visualizations from our dataset. First, we import the libraries we need.

In [None]:
import os
import ast
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly as py
import plotly.express as px
import panel as pn
import panel.widgets as pnw
import polars as pl

ModuleNotFoundError: No module named 'plotly'
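The error above means that `plotly` is not installed in the environment the kernel is running in. As a minimal sketch, the check below lists which of the third-party packages imported in this notebook cannot be found. The helper name `missing_modules` is our own and is not part of any library; it only handles top-level package names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party packages imported at the top of this notebook.
required = ["pandas", "seaborn", "matplotlib", "plotly", "panel", "polars"]
print(missing_modules(required))
```

Any name printed here needs to be installed into the same environment as the kernel before the import cell will run cleanly.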

Here's one last look at our dataset:

In [None]:
steamdataset = pd.read_csv('../data/steamdataset.csv')
steamdataset["release_date"] = pd.to_datetime(steamdataset["release_date"])
steamdataset["date"] = pd.to_datetime(steamdataset["date"])
steamdataset

We'll need to do some preprocessing for the data:

In [None]:
steamdataset["genres"] = steamdataset["genres"].apply(
    lambda x: ast.literal_eval(x) if isinstance(x, str) else x
)

steamdataset["release_year"] = steamdataset["release_date"].dt.year

rating_map = {
    'Overwhelmingly Positive': 5,
    'Very Positive': 4,
    'Positive': 3,
    'Mixed': 2,
    'Negative': 1,
    'Very Negative': 0
}
steamdataset["rating_num"] = steamdataset["overall_player_rating"].map(rating_map)

# Days since release (relative to when this code is run)
steamdataset["days_since_release"] = (
    pd.Timestamp.today() - steamdataset["release_date"]
).dt.days

steamdataset[["overall_player_rating", "rating_num", "release_year", "days_since_release"]].head()
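One thing to watch with `.map(rating_map)`: any label that is not a key in the dictionary silently becomes `NaN`. The sketch below shows how we could list such labels. It uses a small toy Series rather than the real column, and the label `"Mostly Positive"` is only an illustrative assumption about what the data might contain.

```python
import pandas as pd

# Toy stand-in for steamdataset["overall_player_rating"]; values are illustrative.
ratings = pd.Series(["Very Positive", "Mixed", "Mostly Positive", None])

rating_map = {
    "Overwhelmingly Positive": 5,
    "Very Positive": 4,
    "Positive": 3,
    "Mixed": 2,
    "Negative": 1,
    "Very Negative": 0,
}

rating_num = ratings.map(rating_map)

# Labels that exist in the data but have no entry in rating_map.
# Rows that were already missing are excluded by the notna() check.
unmapped = sorted(ratings[rating_num.isna() & ratings.notna()].unique())
print(unmapped)
```

Running the same check against the real column would tell us whether `rating_map` needs more entries before `rating_num` is used in any statistics.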

### Graph 1

Here we look at the average rec_ratio by genre and release year.

To show this, we'll use a heatmap.

In [None]:
sns.set_style("whitegrid")

exploded = steamdataset.explode("genres")

# Top 10 genres by frequency
top_genres = exploded["genres"].value_counts().head(10).index.tolist()
top_genre_df = exploded[exploded["genres"].isin(top_genres)].copy()

# Compute mean rec_ratio by (genre, release_year)
heat_df = (
    top_genre_df
    .groupby(["genres", "release_year"])["rec_ratio"]
    .mean()
    .reset_index()
)

heat_pivot = heat_df.pivot(
    index="genres",
    columns="release_year",
    values="rec_ratio"
)

plt.figure(figsize=(12, 5))
sns.heatmap(
    heat_pivot,
    cmap="viridis",
    vmin=0,
    vmax=1,
    annot=True,
    fmt=".2f",
    cbar_kws={"label": "Average recommendation ratio"}
)
plt.title("Average Recommendation Ratio by Genre and Release Year")
plt.xlabel("Release Year")
plt.ylabel("Genre")
plt.tight_layout()
plt.show()
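To make the reshaping behind the heatmap easier to follow, here is the same explode, group, and pivot sequence on a tiny made-up frame. The values are invented for illustration and are not taken from our dataset.

```python
import pandas as pd

# Three made-up games; the first one belongs to two genres.
toy = pd.DataFrame({
    "genres": [["Action", "Indie"], ["Action"], ["Action"]],
    "release_year": [2020, 2021, 2021],
    "rec_ratio": [0.8, 0.6, 0.4],
})

# One row per (game, genre) pair.
exploded = toy.explode("genres")

# Mean rec_ratio per (genre, year), then genres as rows and years as columns.
pivot = (
    exploded
    .groupby(["genres", "release_year"])["rec_ratio"]
    .mean()
    .reset_index()
    .pivot(index="genres", columns="release_year", values="rec_ratio")
)
print(pivot)
```

Two things carry over to the real heatmap: a game with several genres contributes to every one of its genre rows, and a (genre, year) combination with no games ends up as `NaN`, which seaborn leaves as a blank cell.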

### Graph 2

An interactive Plotly graph of rec_ratio against hours played