In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
import re
alt.data_transformers.disable_max_rows()
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.preprocessing import QuantileTransformer
from sklearn.decomposition import PCA
from collections import Counter
from bs4 import BeautifulSoup

from itertools import combinations
import networkx as nx
from pyvis.network import Network

from data_type_fix import data_type_fix

In [None]:
%run custom_theme.py

## Steam’s Explosion: How Game Releases Went from a Trickle to a Flood

![Steam game releases per year (SteamDB)](steamdb_game_releases_per_year.png)

![Steam game releases per month (SteamDB)](steamdb_game_releases_per_month.png)

For those unfamiliar with it, Steam is a digital distribution platform for PC games, created by Valve—the studio behind the iconic Half-Life, released in 1998. Steam itself launched in 2003 as a way to distribute updates and patches for Valve’s games. But in 2005, everything changed: Valve opened Steam to third-party developers, laying the groundwork for what would become the largest game storefront on the planet.

Steam’s early years were relatively modest. But in 2012, Valve introduced Steam Greenlight, a system that allowed indie developers to submit games for user voting. It dramatically lowered the barriers to entry and engaged the community in deciding what got published. Greenlight is often credited as the catalyst for the surge in game releases that followed.

Then came 2017, the year Steam removed almost all gatekeeping. With Steam Direct, any developer who submitted the required information and a small fee could publish a game without needing approval. The floodgates opened—and the platform was never the same. Since then, the number of games released each year has skyrocketed, peaking at over 18,000 new titles in 2024.

In just a decade, Steam transitioned from a curated digital store to an open bazaar—one where creativity thrives but discoverability becomes a brutal contest.

To ensure the relevance and interpretability of the results, this study focuses on Steam games that have a corresponding Metacritic page. This constraint helps isolate games that have surpassed a threshold of public and critical visibility, allowing the analysis to focus on factors associated with appreciation among both critics and players. While this approach may exclude certain under-the-radar indie successes, it enables a more consistent and interpretable comparison set. Future work may expand this scope to include the long tail of the Steam catalogue.

In [None]:
df = pd.read_csv('steam_games.csv')
df = data_type_fix(df)

In [None]:
df['year'] = pd.to_datetime(df['release_date']).dt.year
df['2024_release'] = df['year'] == 2024

In [None]:
df['2024_release'].value_counts()

In [None]:
df = df[~df['2024_release']]

In [None]:
df = df.drop(columns=['2024_release'])

In [None]:
metacritic_data = pd.read_csv('games_list_v3.csv')

In [None]:
df = df.merge(metacritic_data[['appid', 'n_crit_revs', 'n_user_revs', 'metacritic_user_score_0_100_from_reviews']], how='left', on='appid')

In [None]:
df['release_year'] = df['release_date'].dt.year

In [None]:
df[['release_year']].value_counts().reset_index().sort_values('release_year').reset_index(drop=True)

In [None]:
base = alt.Chart(df[['release_year']].value_counts().reset_index().sort_values('release_year').reset_index(drop=True)).mark_bar().encode(
    x = alt.X('release_year:O', axis=alt.Axis(labelAngle=270), title='year'),
    y = alt.Y('count:Q', title='')
)

rules_df = pd.DataFrame({
    'x': [2013, 2017],
    'label': ['Steam Greenlight', 'Steam Direct']
})

rules = alt.Chart(rules_df).mark_rule(strokeDash=[5], strokeWidth=3).encode(
    x='x:O',
    color=alt.Color('label:N',
        legend=alt.Legend(title='Steam Programs'),
        scale=alt.Scale(domain=['Steam Greenlight', 'Steam Direct'],
                        #range=['#00ff41', '#fe53bb'])  # Matrix green and bright pink
                        range=['lime', 'blue'])
    )
)


metacritic_games_per_year = (base + rules)

metacritic_games_per_year.save('assets/charts/metacritic_games_per_year.json')

In [None]:
metacritic_games_per_year

While the number of games released on Steam has exploded—reaching over 18,000 in 2024—only a small fraction of them ever make it to Metacritic, the review aggregator that collects professional and user scores. This second plot focuses on that fraction: games released on Steam that received enough visibility to merit a Metacritic page.

The growth is still there, from just 1 game per year in the early 2000s to a peak of over 450 in 2020, but it's much more modest. And crucially, the gap between total releases and critically visible games has widened dramatically. In 2023, for example, only 340 of the 14,000+ games released on Steam that year had a Metacritic presence. That’s roughly 2.4%.

This tells a deeper story: most games today are born into obscurity. Steam's open-door policy may have democratized access to publishing, but critical attention has become a scarce and unevenly distributed resource.
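
The roughly 2.4% share quoted for 2023 can be checked with a quick calculation. The snippet below is a minimal sketch that uses the rounded counts from the text (340 Metacritic-listed games out of about 14,000 Steam releases); `metacritic_share` is a hypothetical helper and not part of the notebook's pipeline.

```python
def metacritic_share(n_metacritic: int, n_steam: int) -> float:
    """Return the percentage of Steam releases that have a Metacritic page."""
    if n_steam <= 0:
        raise ValueError("n_steam must be positive")
    return 100 * n_metacritic / n_steam

# Rounded figures quoted in the text for 2023 (assumed, not recomputed from the data)
share_2023 = metacritic_share(340, 14_000)
print(f"{share_2023:.1f}%")  # prints 2.4%
```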

In [None]:
df.drop(columns=['revenue_estimated'], inplace = True)

In [None]:
(df['n_user_revs'] == float(0)).value_counts()

In [None]:
df = df[df['n_user_revs'] != float(0)].reset_index(drop = True)

Of the games that reached Metacritic, not all of them managed to spark conversation. Many were released, reviewed by critics, and then seemingly disappeared into the digital void. To focus our analysis on games that actually reached players and provoked reactions, we removed any title that received zero user reviews.

This step helped us narrow the dataset down to games that were not just visible—but engaged with. After all, it’s hard to measure appreciation or popularity without a single player speaking up. Whether it’s praise, critique, or outright fury, we wanted games that left a footprint in the form of player feedback.

In [None]:
df['avg_estimated_owners'] = 0.5*(df['min_estimated_owners'] + df['max_estimated_owners'])

In [None]:
df[df['all_time_peak_ccu'].isna()]['24h_peak_ccu'].describe()

In [None]:
# Count games whose scraped platform list mentions 'Windows PC'
count = df['platforms'].apply(lambda p: isinstance(p, list) and 'Windows PC' in p).sum()

print(count)

In [None]:
(df['windows'] == True).value_counts()

In [None]:
# Count games whose scraped platform list mentions 'Linux'
count = df['platforms'].apply(lambda p: isinstance(p, list) and 'Linux' in p).sum()

print(count)

In [None]:
(df['linux'] == True).value_counts()

In [None]:
# Count games whose scraped platform list mentions 'Mac'
count = df['platforms'].apply(lambda p: isinstance(p, list) and 'Mac' in p).sum()

print(count)

In [None]:
(df['mac'] == True).value_counts()

In [None]:
# Step 1: Clean 'platforms' column, removing certain entries safely
to_remove = {'Windows PC', 'Linux', 'Mac'}

df['platforms'] = df['platforms'].apply(
    lambda lst: [p for p in lst if p not in to_remove] if isinstance(lst, list) else []
)

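# Step 2: Add platforms based on the boolean columns
flag_labels = [('windows', 'Windows'), ('linux', 'Linux'), ('mac', 'Mac')]

def add_platforms(row):
    # `== True` treats a missing flag (NaN) as False instead of truthy
    extra = [label for column, label in flag_labels if row[column] == True]
    # Build a new list rather than mutating the one stored in the frame
    return list(row['platforms']) + extra

df['platforms'] = df[['platforms', 'windows', 'linux', 'mac']].apply(add_platforms, axis=1)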

In [None]:
df[df['plays_backloggd'].isna()]

In [None]:
df.at[1000, 'plays_backloggd'] = 50
df.at[1000, 'playing_backloggd'] = 1
df.at[1000, 'backlogs_backloggd'] = 24
df.at[1000, 'wishlist_backloggd'] = 7
df.at[1000, 'lists_backloggd'] = 17
df.at[1000, 'reviews_backloggd'] = 6
df.at[1000, 'likes_backloggd'] = 0

In [None]:
df.at[2292, 'plays_backloggd'] = 187
df.at[2292, 'playing_backloggd'] = 5
df.at[2292, 'backlogs_backloggd'] = 103
df.at[2292, 'wishlist_backloggd'] = 32
df.at[2292, 'lists_backloggd'] = 28
df.at[2292, 'reviews_backloggd'] = 6
df.at[2292, 'likes_backloggd'] = 6

In [None]:
df.at[4512, 'plays_backloggd'] = 126
df.at[4512, 'playing_backloggd'] = 0
df.at[4512, 'backlogs_backloggd'] = 34
df.at[4512, 'wishlist_backloggd'] = 7
df.at[4512, 'lists_backloggd'] = 25
df.at[4512, 'reviews_backloggd'] = 11
df.at[4512, 'likes_backloggd'] = 1

In [None]:
# Safety net: make sure every entry is a list (missing values become empty lists)
df['platforms'] = df['platforms'].apply(lambda p: p if isinstance(p, list) else [])

In [None]:
df['platforms'].isna().value_counts()

In [None]:
df['num_supported_languages'] = df['supported_languages'].apply(len)
df['num_audio_languages'] = df['full_audio_languages'].apply(len)
df['num_platforms'] = df['platforms'].apply(len)

In [None]:
df['reviews_score_fancy'].describe()

In [None]:
df['reviews_total'].describe()

In [None]:
df['reviews_score_fancy'] = df['reviews_score_fancy'].fillna(float(0))

df.columns = df.columns.str.strip()

In [None]:
df['metacritic_user_score'].isna().value_counts()

In [None]:
(df['metacritic_user_score'].isna() & df['metacritic_user_score_0_100_from_reviews'].notna()).value_counts()

In [None]:
# 1 where the Metacritic user score is missing, 0 otherwise
df['is_missing_metacritic_user_score'] = df['metacritic_user_score'].isna().astype(int)

In [None]:
df['is_missing_metacritic_user_score'].value_counts()

In [None]:
df['metacritic_user_score_filled'] = df['metacritic_user_score']

In [None]:
round(df['metacritic_user_score_0_100_from_reviews']*0.1, 1)

In [None]:
for idx, row in df.iterrows():
    if pd.isna(row['metacritic_user_score_filled']) and pd.notna(row['metacritic_user_score_0_100_from_reviews']):
        df.at[idx, 'metacritic_user_score_filled'] = round(row['metacritic_user_score_0_100_from_reviews']*0.1, 1)

In [None]:
# Last resort: fall back to the critic score, rescaled from 0-100 to 0-10
df['metacritic_user_score_filled'] = df['metacritic_user_score_filled'].fillna(df['metacritic_score']*0.1)

In [None]:
df[(df['reviews_total'] != df['positive'] + df['negative'])][['appid','name','developers','publishers','reviews_total','positive','negative','reviews_score_fancy']]

In [None]:
df.at[185,'reviews_score_fancy'] = 0.
df.at[185, 'reviews_total'] = 6
df.at[185, 'positive'] = 0
df.at[185, 'negative'] = 0

# For the rows below, round the positive count and derive the negative count
# from the total, so that positive + negative == reviews_total
df.at[735,'reviews_score_fancy'] = 96.0
#df.at[735, 'reviews_total'] = 6
df.at[735, 'positive'] = int(round(0.96 * df.at[735, 'reviews_total']))
df.at[735, 'negative'] = df.at[735, 'reviews_total'] - df.at[735, 'positive']

df.at[1409,'reviews_score_fancy'] = 74.0
df.at[1409, 'reviews_total'] = 43
df.at[1409, 'positive'] = int(round(0.74 * df.at[1409, 'reviews_total']))
df.at[1409, 'negative'] = df.at[1409, 'reviews_total'] - df.at[1409, 'positive']

df.at[1893,'reviews_score_fancy'] = 46.0
df.at[1893, 'reviews_total'] = 15
df.at[1893, 'positive'] = int(round(0.46 * df.at[1893, 'reviews_total']))
df.at[1893, 'negative'] = df.at[1893, 'reviews_total'] - df.at[1893, 'positive']

df.at[2477,'reviews_score_fancy'] = 0.
df.at[2477, 'reviews_total'] = 1
df.at[2477, 'positive'] = 0
df.at[2477, 'negative'] = 0

df.at[2541,'reviews_score_fancy'] = 76.0
df.at[2541, 'reviews_total'] = 120
df.at[2541, 'positive'] = int(round(0.76 * df.at[2541, 'reviews_total']))
df.at[2541, 'negative'] = df.at[2541, 'reviews_total'] - df.at[2541, 'positive']

df.at[3770,'reviews_score_fancy'] = 0.
df.at[3770, 'reviews_total'] = 3
df.at[3770, 'positive'] = 0
df.at[3770, 'negative'] = 0

df.at[4613,'reviews_score_fancy'] = 0.
df.at[4613, 'reviews_total'] = 0
df.at[4613, 'positive'] = 0
df.at[4613, 'negative'] = 0

In [None]:
df['n_user_revs'].describe()

In [None]:
df['likes_backloggd'].isna().value_counts()

In [None]:
total_average_estimated_owners_per_year = alt.Chart(df[['year', 'avg_estimated_owners']]).mark_bar().encode(
    x = alt.X('year:O', axis=alt.Axis(labelAngle=270), title=''),
    y = alt.Y('sum(avg_estimated_owners):Q', title='Total estimated owners')
).properties(
    #width=600
)

average_playtime_per_year = alt.Chart(df[['year', 'average_forever']]).mark_bar(color = '#00ff00').encode(
    x = alt.X('year:O', axis=alt.Axis(labelAngle=270)),
    y = alt.Y('mean(average_forever):Q', title='Average playtime (in minutes)')
).properties(
    #width=600
)

total_average_estimated_owners_per_year_plus_average_playtime_per_year = (total_average_estimated_owners_per_year & average_playtime_per_year)
#total_average_estimated_owners_per_year_plus_average_playtime_per_year.properties(width=500)

total_average_estimated_owners_per_year_plus_average_playtime_per_year.save('assets/charts/total_average_estimated_owners_per_year_plus_average_playtime_per_year.json')

In [None]:
total_average_estimated_owners_per_year_plus_average_playtime_per_year

The growth of the player base over the past decade is undoubtedly impressive. With more casual players joining the hardcore audience and a far wider choice of games, expectations are rising, and developers will find it harder to keep up.

2013 looks like a turning point: both the total estimated owners and the average playtime per game decline for games released afterward. From the previous discussion, we know this is also when the Steam library started to explode. The ballooning catalogue does not appear to have attracted proportionally more players or playtime. This may partly reflect lower average quality, although newer games have also had less time to accumulate owners and playtime.

What catches the attention here is the unreal spike of playtime in 2000. Similar but much smaller spikes also happened in 1998, 2004, and 2013. Let’s zoom in and see what happened in those years.

The y-axis shows the average playtime per game, and the size of the dots shows the size of the player base for that particular game.

In [None]:
explain_spikes = alt.Chart(df[['name','year', 'average_forever','avg_estimated_owners']]).transform_filter(alt.FieldOneOfPredicate(field='year', oneOf=[1998, 2000, 2001, 2003, 2004, 2013])).mark_point().encode(
    x = alt.X('year:O'),
    y = alt.Y('average_forever:Q', title='Average playtime (in minutes)'),
    size = alt.Size('avg_estimated_owners:Q', legend=alt.Legend(symbolType='circle', title='Avg estimated owners', format='.1s')),
    tooltip=[
        alt.Tooltip('name:N'),
        alt.Tooltip('average_forever:Q'),
        alt.Tooltip('avg_estimated_owners:Q', format=".1s")
    ]
).properties(
    width = 400
)

explain_spikes.save('assets/charts/explain_spikes.json')

In [None]:
explain_spikes

Now the answer is clear. Those playtime spikes were driven by some of the greatest hits in gaming history.

“Half-Life” was the cornerstone that first brought Valve fame and wealth. “Counter-Strike” was the legendary first-person shooter (FPS) that took the world by storm. That explains the extraordinary spike of playtime in 2000, and it is no wonder that its sequel caused another spike in 2004. The 2013 spike was driven by “Dota 2”, which has one of the largest player bases on Steam.

In [None]:
alt.Chart(df[['year', 'plays_backloggd']]).mark_bar().encode(
    x = alt.X('year:O', axis=alt.Axis(labelAngle=270)),
    y = alt.Y('mean(plays_backloggd):Q')
)

While sales and reviews reflect a game’s immediate success, the number of users who log a play on platforms like Backloggd offers a window into long-term engagement. When we look at the average number of plays per game by release year, the data tells a familiar story: standout years in video game history continue to echo through the habits of modern players.

Notably, 1998—the year of genre-defining titles like Half-Life—shows the highest average play counts, underscoring its enduring cultural legacy. Similar spikes appear in 2004, 2007, and the early 2010s, reflecting the lasting impact of titles such as Half-Life 2 and Portal. Even as the number of annual releases exploded in the late 2010s, the average plays per game saw a decline, suggesting a saturation point: more games are being released, but fewer stand out as lasting experiences.

In essence, the Backloggd play data reinforces a key idea already hinted at by ownership metrics—many of the most played games today were not released recently, but have proven their relevance across decades.

In [None]:
# Step 1: Flag missing all_time_peak_ccu values and impute them with the 24-hour peak
df['is_missing_all_time_peak'] = df['all_time_peak_ccu'].isna().astype(int)
df['all_time_peak_ccu_filled'] = df['all_time_peak_ccu'].fillna(df['24h_peak_ccu'])

In [None]:
# Step 1: Get the rows with the max value per year
max_rows = df.loc[df.groupby('year')['all_time_peak_ccu_filled'].idxmax(), ['year', 'name', 'all_time_peak_ccu_filled']]

# Step 2: Plot using the filtered DataFrame
alt.Chart(max_rows).mark_bar().encode(
    x = alt.X('year:O', axis=alt.Axis(labelAngle=270)),
    y = alt.Y('all_time_peak_ccu_filled:Q'),
    tooltip = ['name', 'all_time_peak_ccu_filled']
)


One of the clearest signs of a game's cultural impact is the moment it draws the most people in at once. The maximum number of concurrent players on Steam provides a powerful snapshot of that peak — capturing the precise moment when excitement, community buzz, and player interest all converged.

This metric reveals fascinating patterns over time. Counter-Strike: Global Offensive, released in 2012, reached an all-time peak of over 1.8 million simultaneous players, and Dota 2 (2013) peaked at over 1.2 million. More recently, Lost Ark (2022) and New World (2021) reached similarly massive peaks, reflecting the growing scale of online communities. Earlier titles, like POSTAL (1997) or Legacy of Kain: Soul Reaver (1999), show much smaller numbers, which reflects the era's limited infrastructure and reach rather than their relevance. Note that the chart groups games by release year, so a peak may have occurred long after launch.

While these values don't tell the full story of a game's lifespan or quality, they offer a compelling glimpse into moments when a title truly dominated players’ attention.

In [None]:
df['reviews_total'].describe()

In [None]:
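# Logarithmically spaced bins so that bars have uniform width on the log-scaled x-axis
# (games with zero reviews cannot be drawn on a log axis and are left out)
positive_totals = df.loc[df['reviews_total'] > 0, 'reviews_total']
log_bins = np.logspace(0, np.log10(positive_totals.max()), 100)
plt.hist(positive_totals, bins=log_bins)
plt.xscale('log')
plt.yscale('log')
plt.show()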

In [None]:
# Cap at the 99th percentile
cap = df['reviews_total'].quantile(0.99)
df_filtered = df[df['reviews_total'] <= cap]

alt.Chart(df_filtered).mark_bar().encode(
    x=alt.X('reviews_total:Q', bin=alt.Bin(maxbins=40), title='Total Reviews'),
    y=alt.Y('count()', title='Count')
)

In addition to playtime and estimated ownership, user reviews are one of the clearest indicators of how much attention a game has received. Steam reviews are both a signal of reach and community engagement. However, as the following plots show, this engagement is heavily skewed: while a small number of games accumulate tens of thousands of reviews, the vast majority receive very few — or none at all.
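
One simple way to see this kind of skew numerically is to compare the mean with the median. The sketch below uses ten made-up review counts (hypothetical values, not taken from the dataset): a single runaway hit is enough to pull the mean far above the median.

```python
import statistics

# Hypothetical review counts for ten games: most are small, one is a runaway hit
toy_reviews = [3, 12, 40, 85, 150, 400, 900, 2_500, 12_000, 250_000]

mean_reviews = statistics.mean(toy_reviews)
median_reviews = statistics.median(toy_reviews)

# In a distribution with a long upper tail, the mean sits far above the median
print(f"mean={mean_reviews:.0f}, median={median_reviews:.0f}")  # prints mean=26609, median=275
```

On the notebook's data, the analogous check would compare `df['reviews_total'].mean()` with `df['reviews_total'].median()`.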

In [None]:
def categorize_reviews(x):
    if x < 100:
        return '0-99'
    elif x < 1000:
        return '100-999'
    elif x < 10000:
        return '1000-9999'
    elif x < 100000:
        return '10000-99999'
    else:
        return '100000+'

df['reviews_bucket'] = df['reviews_total'].apply(categorize_reviews)

In [None]:
reviews_distribution = alt.Chart(df[['reviews_bucket']]).mark_bar().encode(
    x = alt.X('reviews_bucket:N', title='Number of reviews'),
    y = alt.Y('count()')
).properties(
    width=540
)

reviews_bucket_per_released_games = alt.Chart(df[['year', 'reviews_bucket']]).mark_bar().encode(
    x=alt.X('year:O', axis=alt.Axis(labelAngle=270), title='year'),
    y=alt.Y('count()', title='Number of games released'),
    color=alt.Color('reviews_bucket:N', title='Reviews Bucket',
                    sort=['0-99', '100-999', '1000-9999', '10000-99999', '100000+'], legend=alt.Legend(orient='bottom', title='Number of reviews')),
    tooltip=['year:O', 'reviews_bucket:N', 'count()']
)

reviews_distribution_plus_reviews_bucket_per_released_games = reviews_distribution & reviews_bucket_per_released_games
reviews_distribution_plus_reviews_bucket_per_released_games.save('assets/charts/reviews_distribution_plus_reviews_bucket_per_released_games.json')

In [None]:
reviews_distribution_plus_reviews_bucket_per_released_games

While thousands of games are released on Steam each year, only a small fraction capture the majority of user engagement. Based on user review counts, just 170 games have earned over 100,000 reviews, while nearly 1,600 titles sit in the modest 100–999 review range. The most common group — around 1,800 games — falls between 1,000 and 9,999 reviews. Meanwhile, 345 games remain virtually unseen, with fewer than 100 reviews to their name.

This stark imbalance in attention underscores the platform’s highly competitive landscape, where visibility and engagement are far from evenly distributed. The vast majority of games struggle for recognition, while a few dominate the discourse and player feedback.

As the number of games released each year on Steam surged throughout the 2010s, so did the spread of user reviews — but not evenly. Early years like 1997 to 2004 saw only a handful of games reach higher review brackets. From 2010 onward, however, the landscape shifted: each year brought dozens of new titles amassing between 10,000 and 99,999 reviews, while a select few — often high-profile releases — crossed the 100,000 mark.

Still, most games remained in lower tiers. In 2020 alone, while 24 games surpassed 100,000 reviews, over 180 were in the 1,000–9,999 range, and 35 had fewer than 100 reviews. The disparity between visibility and obscurity has widened alongside the volume of releases, underscoring a platform defined by both runaway hits and forgotten titles.
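
The bucket boundaries behind these counts can also be expressed with `pd.cut`. The sketch below is an alternative formulation on a hypothetical toy frame, not the notebook's own code; `right=False` is chosen so that each bin is closed on the left, which mirrors the `x < 100` style comparisons in `categorize_reviews`.

```python
import pandas as pd

# Hypothetical review totals, chosen to sit on each bucket boundary
toy = pd.DataFrame(
    {"reviews_total": [0, 99, 100, 999, 1_000, 9_999, 10_000, 99_999, 100_000]}
)

edges = [0, 100, 1_000, 10_000, 100_000, float("inf")]
labels = ["0-99", "100-999", "1000-9999", "10000-99999", "100000+"]

# right=False makes every bin [low, high), so 100 falls in '100-999', not '0-99'
toy["reviews_bucket"] = pd.cut(
    toy["reviews_total"], bins=edges, labels=labels, right=False
)

# Grouping on the categorical column keeps the buckets in their natural order
bucket_counts = toy.groupby("reviews_bucket", observed=False).size()
print(bucket_counts)
```

One behavioural difference is worth knowing: `pd.cut` leaves a missing `reviews_total` as missing, whereas the chained comparisons in `categorize_reviews` all evaluate to `False` for `NaN`, so such a row would end up in the `'100000+'` bucket.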

In [None]:
crit = alt.Chart(df).mark_bar().encode(
    x = alt.X('year:O', axis=alt.Axis(labelAngle=270)),
    y = alt.Y('sum(n_crit_revs):Q')
)

user = alt.Chart(df).mark_bar().encode(
    x = alt.X('year:O', axis=alt.Axis(labelAngle=270)),
    y = alt.Y('sum(n_user_revs):Q')
)

crit | user

**The Voices That Shape a Game’s Reputation**

Beyond player activity, the volume of reviews — both from professional critics and everyday users — offers essential insight into how visible and discussed a game was at the time of its release.

Critic reviews, often representing curated press outlets, grew steadily throughout the 2000s, spiking sharply after 2005. By 2014, Metacritic was aggregating over 13,000 professional reviews for Steam games released that year — a figure that remained high through the following decade. This trend reflects both an expanding game industry and the increasingly formalized role of games journalism.

On the other hand, user reviews present a different kind of signal: mass engagement. From relatively modest numbers in the early 2000s, user reviews exploded after 2010. Especially notable is the dramatic surge in 2020, where the total number of user reviews more than doubled compared to the previous year — a spike that may reflect the pandemic-era boom in gaming activity. By 2023, over 70,000 user reviews were recorded for that year’s releases alone.

Together, these two metrics help sketch the shifting landscape of game discourse: from the editorial voices of critics to the vast, often passionate reactions of players themselves.

Another interesting question is whether, and by how much, critics and players disagree in their evaluation of a game.

In [None]:
df['metacritic_user_score_0_100'] = df['metacritic_user_score']*10

In [None]:
df[['appid','name','metacritic_score','metacritic_user_score_0_100']]

In [None]:
df['metacritic_score_difference'] = (df['metacritic_user_score_0_100'] - df['metacritic_score'])

In [None]:
df[['appid','name','metacritic_score','metacritic_user_score_0_100','metacritic_score_difference']].sort_values(by='metacritic_score_difference', ascending=False).head(40)

In [None]:
metacritic_score = alt.Chart(df[['metacritic_score']]).mark_bar().encode(
    x = alt.X('metacritic_score:Q', bin=True, title='Metacritic Critic Score'),
    y = alt.Y('count():Q',title='count')
)

user_score = alt.Chart(df[['metacritic_user_score_0_100']]).mark_bar(color="#00ff00").encode(
    x = alt.X('metacritic_user_score_0_100:Q', bin=True, title='Metacritic User Score'),
    y = alt.Y('count():Q', title='')
)

critic_plus_user_score = metacritic_score | user_score
critic_plus_user_score.save('assets/charts/critic_plus_user_score.json')

In [None]:
critic_plus_user_score

In [None]:
steam_positive_review_ratio = alt.Chart(df[['reviews_score_fancy', 'reviews_bucket']]).mark_bar().encode(
    x = alt.X('reviews_score_fancy:Q', bin=True, title='Steam Positive Review Ratio'),
    y = alt.Y('count():Q', title='count'),
    color=alt.Color('reviews_bucket:N', title='Number of reviews',
                    sort=['0-99', '100-999', '1000-9999', '10000-99999', '100000+'])
    
)

steam_positive_review_ratio.save('assets/charts/steam_positive_review_ratio.json')

In [None]:
steam_positive_review_ratio

**Three Ways to Judge a Game**

The perception of quality can vary dramatically depending on who’s holding the mic. By looking at the distribution of scores from Metacritic critics, Metacritic users, and Steam user ratings, we see clear patterns emerge — and a few surprises.

Critics on Metacritic tend to hover in the middle-upper range, with a concentration between 70–80, and relatively few games rated below 60. This compressed distribution reflects a long-standing criticism of the games press: the reluctance to give truly low scores, perhaps due to industry dynamics or an editorial focus on already-promising titles.

Metacritic users, while more willing to dip into lower scores, also cluster in the 60–80 range — with a noticeable bump in the 50–60 interval and even some extreme low ratings. This spread suggests a wider range of sentiment, including disappointment or backlash, but still broadly centered around average-to-good experiences.

Steam user ratings, presented as a positive review ratio, tell a different story. The majority of games score above 80%, with the distribution piled up at the high end and a long tail toward low scores (a left-skewed shape), and more than 1,200 games sit in the 90–100 range. Steam’s binary “thumbs up or down” system tends to inflate these scores, and as a consequence it offers less nuance than Metacritic's scoring, which uses a broader numerical scale.

Together, these distributions illustrate the fragmented yet complementary nature of game evaluation: from the cautious optimism of critics to the vocal intensity of fans.
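
To make the Steam metric concrete, here is a minimal sketch of a positive review ratio, assuming `reviews_score_fancy` is the percentage of positive reviews, which is how this notebook treats it. `positive_ratio` is a hypothetical helper, and returning 0.0 for a game with no reviews is an assumption that mirrors the notebook's choice to fill missing scores with 0.

```python
def positive_ratio(positive: int, negative: int) -> float:
    """Percentage of reviews that are positive; 0.0 when there are no reviews."""
    total = positive + negative
    if total == 0:
        return 0.0
    return 100 * positive / total

# Two hypothetical games with very different review volumes but the same ratio
small_game = positive_ratio(9, 1)
big_game = positive_ratio(90_000, 10_000)
print(small_game, big_game)  # prints 90.0 90.0
```

The ratio alone cannot tell these two games apart, which is why the chart above colours each bar by review bucket.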

**Popularity Amplifies Positivity**

A closer look at Steam’s user rating system reveals that positivity tends to rise with popularity. Among games with extremely high review scores (90–100%), most are backed by hundreds to thousands of user reviews — and a sizable number (88 titles) even exceed 100,000 reviews. These aren’t just obscure cult favorites with inflated scores; they’re widely played and publicly endorsed.

Conversely, as we move toward lower Steam score brackets (below 60%), we also see a marked drop in review volume. Games in the 30–50% positive range often have fewer than 1,000 reviews, many even under 100. The implication is twofold: lower-rated games not only fare worse critically, but they also struggle to attract attention, compounding their invisibility.

This trend underscores how Steam’s score distribution is skewed not just by fan enthusiasm, but also by review volume bias. High-scoring, widely reviewed games benefit from social proof and visibility, while low-rated titles are often both less liked and less played. The takeaway? On Steam, positivity thrives in numbers.

In [None]:
df['metacritic_score_difference_pct'] = (df['metacritic_score_difference'] / df['metacritic_score']) * 100

In [None]:
bins = [-np.inf, -30, -20, -10, -5, 5, 10, 20, 30, np.inf]
labels = [
    'User <<< Critic (<-30%)',
    'User << Critic (-30% to -20%)',
    'User < Critic (-20% to -10%)',
    'User ~ Critic (-10% to -5%)',
    'User ≈ Critic (-5% to +5%)',
    'User ~ Critic (+5% to +10%)',
    'User > Critic (+10% to +20%)',
    'User >> Critic (+20% to +30%)',
    'User >>> Critic (>+30%)'
]

df['metacritic_score_difference_pct_category'] = pd.cut(df['metacritic_score_difference_pct'], bins=bins, labels=labels)

# Count games in each category
category_counts = df['metacritic_score_difference_pct_category'].value_counts().sort_index()

# Display results
print("Games by percentage difference category:")
print(category_counts)

In [None]:
metacritic_score_difference = alt.Chart(category_counts.reset_index()).mark_bar().encode(
    x = alt.X('count:Q'),
    y = alt.Y('metacritic_score_difference_pct_category:N', sort='-x', title=''),
    tooltip=['metacritic_score_difference_pct_category:N', 'count:Q']
).properties(
)
metacritic_score_difference.save('assets/charts/metacritic_score_difference.json')

In [None]:
metacritic_score_difference

While most games fall within a reasonable margin of agreement between critics and users, a small but striking group stands out at the fringes — titles where perception differs by more than 30% between reviewers and the broader player base.

In 92 games, users rated the experience more than 30% higher than critics. These are often cult hits, late bloomers, or games that critics may have undervalued at launch, possibly due to technical issues or unconventional design choices that later gained appreciation among dedicated fans. Such titles suggest that critical frameworks sometimes miss the emotional or community-driven impact a game might achieve over time.

Conversely, 238 games were rated over 30% lower by users than by critics. These cases often reflect backlash to hype, controversial monetization models, or post-launch issues not captured in early reviews. Games that launched with technical shortcomings, misleading marketing, or gameplay decisions unpopular with core audiences frequently fall into this group — highlighting a disconnect between polished pre-release impressions and real-world playability.

These extremes underscore a broader truth: critical consensus does not guarantee lasting approval, and games are increasingly judged over time, not just at launch. As gaming audiences grow more vocal and discerning, the gap between professional critique and grassroots sentiment becomes both more visible — and more telling.
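
The percentage difference behind these categories is the user score (rescaled to 0-100) minus the critic score, divided by the critic score. The sketch below walks through that calculation on four hypothetical games and bins the result with the same edges as the notebook; the short labels are placeholders for the longer ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical games: critic score on 0-100, user score on Metacritic's 0-10 scale
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "metacritic_score": [80, 50, 90, 80],
    "metacritic_user_score": [8.2, 7.0, 5.4, 6.0],
})

user_0_100 = toy["metacritic_user_score"] * 10
toy["diff_pct"] = (user_0_100 - toy["metacritic_score"]) / toy["metacritic_score"] * 100

bins = [-np.inf, -30, -20, -10, -5, 5, 10, 20, 30, np.inf]
labels = ["<-30", "-30..-20", "-20..-10", "-10..-5", "-5..+5",
          "+5..+10", "+10..+20", "+20..+30", ">+30"]

toy["category"] = pd.cut(toy["diff_pct"], bins=bins, labels=labels)
print(toy[["name", "diff_pct", "category"]])

# pd.cut closes each interval on the right by default, so a difference of
# exactly -30% lands in the lowest bin and exactly +5% in the middle one
boundary = pd.cut(pd.Series([-30.0, 5.0]), bins=bins, labels=labels)
print(boundary.tolist())  # prints ['<-30', '-5..+5']
```

Because the difference is divided by the critic score, the same point gap weighs more for a low-rated game: a 20-point gap is +40% when the critic score is 50 but only -25% when it is 80.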

In [None]:
user_vs_critic_score_by_difference = alt.Chart(df[df['metacritic_user_score'].notna()][['metacritic_score', 'metacritic_user_score_0_100', 'metacritic_score_difference_pct_category', 'name', 'metacritic_score_difference', 'metacritic_score_difference_pct']]).mark_circle().encode(
    x=alt.X('metacritic_score:Q',  title='Critic Score'),
    y=alt.Y('metacritic_user_score_0_100:Q', title='User Score'),
    color=alt.Color('metacritic_score_difference_pct_category:N', title='Score Difference Category', sort=[
        'User <<< Critic (<-30%)',
        'User << Critic (-30% to -20%)',
        'User < Critic (-20% to -10%)',
        'User ~ Critic (-10% to -5%)',
        'User ≈ Critic (-5% to +5%)',
        'User ~ Critic (+5% to +10%)',
        'User > Critic (+10% to +20%)',
        'User >> Critic (+20% to +30%)',
        'User >>> Critic (>+30%)'], scale=alt.Scale(range=[
"#E20194", "#FF00FF", "#FF66CF", "#DFABCF", "#FFFFFF", "#AFD5A6", "#79FF59", "#00FF00", "#23B700"
])
    ),
    tooltip=[alt.Tooltip('name', title='name'), alt.Tooltip('metacritic_score', title='critic'), alt.Tooltip('metacritic_user_score_0_100', title='user'), alt.Tooltip('metacritic_score_difference', title='Numerical difference'), alt.Tooltip('metacritic_score_difference_pct', title='Percentage difference', format='.2f'), alt.Tooltip('metacritic_score_difference_pct_category', title='Category')]
).properties(
    width=600,
    height=400
).interactive()

user_vs_critic_score_by_difference.save('assets/charts/user_vs_critic_score_by_difference.json')

In [None]:
user_vs_critic_score_by_difference

It would be interesting to analyze in more detail the games with a high misalignment between critics and the user base, especially by examining the text of their reviews.

In [None]:
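# Step 1: Create the missing flag and impute all_time_peak_ccu with the 24h peak
# (both columns are required by the feature list below)
df['is_missing_all_time_peak'] = df['all_time_peak_ccu'].isna().astype(int)
df['all_time_peak_ccu_filled'] = df['all_time_peak_ccu'].fillna(df['24h_peak_ccu'])

# Step 2: Define features for PCA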

# Step 3: Log transform skewed features
# (Skip binary features like is_missing_all_time_peak)
log_transform_features = [
    '24h_peak_ccu',
    'all_time_peak_ccu_filled',
    #'is_missing_all_time_peak',
    'avg_estimated_owners',
    'reviews_total',
    'plays_backloggd',
    #'playing_backloggd',
    'backlogs_backloggd',
    #'lists_backloggd',
    'reviews_backloggd',
    'recommendations',
    'n_crit_revs',
    'n_user_revs'
    #'average_forever',
]

# Apply log1p transformation (log(1 + x)) to avoid issues with zeros
for col in log_transform_features:
    df[col + '_log'] = np.log1p(df[col])

# Create the new feature matrix using transformed columns + the binary column
pca_features = [col + '_log' for col in log_transform_features] + ['is_missing_all_time_peak']

X = df[pca_features]

# Step 4: Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 5: PCA
pca = PCA(n_components=1)
popularity_score = pca.fit_transform(X_scaled)

# Step 6: Normalize to [0, 1]
df['popularity_score'] = MinMaxScaler().fit_transform(popularity_score)

In [None]:
critic_gt_user = df[df['metacritic_score_difference_pct_category'] == 'User <<< Critic (<-30%)'][['appid', 'name', 'metacritic_score_difference_pct', 'metacritic_score_difference_pct_category', 'popularity_score']].sort_values(by = 'popularity_score', ascending=[False]).head(15).reset_index(drop=True)
critic_gt_user

A striking pattern emerges when comparing user and critic scores: some highly anticipated and critically praised titles face significant backlash from players. Among the top 15 games where user scores trail critic scores by over 30%, franchises like Call of Duty, Battlefield, and Destiny 2 stand out. These games often launched amidst controversies involving aggressive monetization, technical issues, or unmet player expectations. Titles such as Battlefield™ 2042 and Fallout 76 struggled with buggy releases that soured community sentiment despite positive critical reception. Similarly, games like The Sims™ 4 and STAR WARS™ Battlefront™ II faced criticism over pay-to-win mechanics and content gating. This divergence highlights a recurring disconnect between professional reviews and player experiences, underscoring the complexities of evaluating games in the evolving landscape of live-service models and monetization strategies.

In [None]:
critic_gt_user.to_csv('critic_gt_user.csv', index=False)

In [None]:
user_gt_critic = df[df['metacritic_score_difference_pct_category'] == 'User >>> Critic (>+30%)'][['appid', 'name', 'metacritic_score_difference_pct', 'metacritic_score_difference_pct_category', 'popularity_score']].sort_values(by = ['popularity_score'], ascending=[False]).head(15).reset_index(drop=True)
user_gt_critic

Note that, in these top-15 lists, popularity_score is lower for the User >>> Critic (>+30%) games than for the User <<< Critic (<-30%) games.
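One way to check this observation beyond the two top-15 tables would be to compare the median popularity of every score-difference category. The sketch below uses a small hypothetical frame (`demo`) with made-up values. On the real data the same `groupby` would run on `df`.

```python
import pandas as pd

# Hypothetical rows for illustration only; real values would come from `df`.
demo = pd.DataFrame({
    'metacritic_score_difference_pct_category': [
        'User <<< Critic (<-30%)', 'User <<< Critic (<-30%)',
        'User >>> Critic (>+30%)', 'User >>> Critic (>+30%)',
    ],
    'popularity_score': [0.80, 0.70, 0.45, 0.35],
})

# Number of games and median popularity per category, most popular first.
summary = (
    demo.groupby('metacritic_score_difference_pct_category')['popularity_score']
        .agg(['count', 'median'])
        .sort_values('median', ascending=False)
)
print(summary)
```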

On the flip side, some games earn far more praise from players than from critics, with user scores exceeding critic scores by over 30%. This list includes cult favorites and niche titles like 7 Days to Die, Rain World, and POSTAL 2, which often develop passionate fanbases despite mixed or lukewarm professional reviews. Many of these games offer unique gameplay experiences, deep community engagement, or long-term updates that resonate strongly with players. MMORPGs like FINAL FANTASY XIV Online also showcase how ongoing content and player-driven evolution can elevate a game’s appreciation well beyond initial critical impressions. This phenomenon reflects the gap between traditional review criteria and the values that players often prioritize, such as replayability, community, and emotional connection.

In [None]:
user_gt_critic.to_csv('user_gt_critic.csv', index=False)

Popularity in the gaming world is more than just raw sales—it’s a combination of visibility, engagement, and community presence. To capture this complex reality, we created a composite Popularity Score using data from both Steam and gaming databases like Metacritic and Backloggd.

We began with ten core indicators of attention, ranging from how many people bought or played a game, to how often it was reviewed or recommended. These included:

- Player activity metrics like peak concurrent users (24h and all-time)

- Estimated number of owners

- Steam review counts (total and recommendations)

- Engagement on Backloggd (plays, backlogs, reviews)

- Number of critic and user reviews on Metacritic

Because these numbers can vary wildly between indie darlings and AAA blockbusters, we applied a logarithmic transformation to scale them more evenly. Then, we used Principal Component Analysis (PCA) to synthesize these features into a single, unified Popularity Score (which is then normalized between $0$ and $1$).

This score doesn’t just tell us how many people bought a game—it reflects how visible, discussed, and socially present a game is in the gaming ecosystem.

In [None]:
# Step 1: Create missing flag and impute all_time_peak_ccu
df['is_missing_all_time_peak'] = df['all_time_peak_ccu'].isna().astype(int)

def impute_all_time_peak(row):
    if pd.notna(row['all_time_peak_ccu']):
        return row['all_time_peak_ccu']
    else:
        return row['24h_peak_ccu']

df['all_time_peak_ccu_filled'] = df.apply(impute_all_time_peak, axis=1)

# Step 2: Define features for PCA

# Step 3: Log transform skewed features
# (Skip binary features like is_missing_all_time_peak)
log_transform_features = [
    '24h_peak_ccu',
    'all_time_peak_ccu_filled',
    #'is_missing_all_time_peak',
    'avg_estimated_owners',
    'reviews_total',
    'plays_backloggd',
    #'playing_backloggd',
    'backlogs_backloggd',
    #'lists_backloggd',
    'reviews_backloggd',
    'recommendations',
    'n_crit_revs',
    'n_user_revs'
    #'average_forever',
]

# Apply log1p transformation (log(1 + x)) to avoid issues with zeros
for col in log_transform_features:
    df[col + '_log'] = np.log1p(df[col])

# Create the new feature matrix using transformed columns + the binary column
pca_features = [col + '_log' for col in log_transform_features] + ['is_missing_all_time_peak']

X = df[pca_features]

# Step 4: Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 5: PCA
pca = PCA(n_components=1)
popularity_score = pca.fit_transform(X_scaled)

# Step 6: Normalize to [0, 1]
df['popularity_score'] = MinMaxScaler().fit_transform(popularity_score)

In [None]:
loadings = pca.components_
loadings
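The sign of a principal component is arbitrary, so it is worth confirming that the loadings above are mostly positive before reading the score as "higher means more popular". The sketch below runs on synthetic data with hypothetical names (`X_demo`, `pca_demo`, `score_demo`). It shows one way to inspect the explained variance and to flip the component when needed. It is an illustration under these assumptions, not the pipeline used above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for the log-transformed features (hypothetical data):
# three noisy measurements of one hidden "popularity" factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
X_demo = np.column_stack([
    latent + 0.3 * rng.normal(size=500),
    latent + 0.3 * rng.normal(size=500),
    latent + 0.3 * rng.normal(size=500),
])

X_demo_scaled = StandardScaler().fit_transform(X_demo)
pca_demo = PCA(n_components=1)
pc1 = pca_demo.fit_transform(X_demo_scaled).ravel()

# Share of the total variance captured by the single retained component.
explained = pca_demo.explained_variance_ratio_[0]

# Flip the component if its loadings are mostly negative, so that larger
# raw values always translate into a higher score.
if pca_demo.components_[0].sum() < 0:
    pc1 = -pc1

score_demo = MinMaxScaler().fit_transform(pc1.reshape(-1, 1)).ravel()
print(f"explained variance ratio: {explained:.2f}")
print(f"correlation with latent factor: {np.corrcoef(score_demo, latent)[0, 1]:.2f}")
```

A low explained variance ratio would suggest that a single component summarizes the indicators poorly, in which case the composite score should be read with more caution.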

In [None]:
hist = alt.Chart(df[['popularity_score']]).mark_bar().encode(
    x = alt.X('popularity_score:Q', title='Popularity Score', bin=alt.Bin(maxbins=30)),
    y = alt.Y('count()', title='count')
).properties(
    width=400
)


kde = alt.Chart(df[['popularity_score']]).transform_density(
    'popularity_score',
    as_=['popularity_score', 'density']
).mark_line(color='red').encode(
    x='popularity_score:Q',
    y=alt.Y('density:Q', axis=alt.Axis(labelPadding=25), title='KDE')
)

popularity_distribution = (hist + kde).properties(width=400, height=400).resolve_scale(y='independent')
popularity_distribution.save('assets/charts/popularity_distribution.json')

In [None]:
popularity_distribution

In [None]:
sns.histplot(df['popularity_score'], bins=30, kde=True)
plt.savefig('assets/images/popularity_distribution.png')
plt.show()

In [None]:
df.sort_values(by='popularity_score', ascending=False).head(15)[['name', 'popularity_score']]

In [None]:
top_15_popularity = alt.Chart(df.sort_values(by='popularity_score', ascending=False).head(15)[['name', 'popularity_score']]).mark_bar().encode(
    x = alt.X('popularity_score:Q', title='Popularity Score'),
    y = alt.Y('name:N', sort='-x', title=''),
    tooltip=['name','popularity_score']
)

top_15_popularity.save('assets/charts/top_15_popularity.json')

In [None]:
top_15_popularity

At the top of this popularity pyramid sit the undeniable titans:

- Blockbusters like Grand Theft Auto V, ELDEN RING, and Cyberpunk 2077, which combine massive marketing budgets with cinematic storytelling.

- Multiplayer giants like Counter-Strike: Global Offensive, Destiny 2, and Dota 2, which dominate through community and competition.

- Indie darlings like Stardew Valley, Hollow Knight, and Stray, which show that passion projects can thrive alongside AAA titans.

What’s striking is the diversity in genre, budget, and design philosophy. From The Witcher 3’s sprawling narrative to Rocket League’s arcade sportsmanship, the top 30 aren’t bound by formula. If anything, they represent different roads to success — a testament to the varied tastes of the modern gaming audience.

Another fascinating observation? Many of these games were not immediate hits. Titles like No Man’s Sky or Cyberpunk 2077 faced rocky launches but slowly climbed into favor through updates, transparency, and community trust — showing that redemption arcs are possible in the digital age.

This list is not just a leaderboard. It's a cultural snapshot. These are the games that didn’t just sell — they stayed. They sparked conversations, mods, memes, and memories. And in many cases, they’re still evolving.

In [None]:
# Group by release year (not by exact release date) so the line shows a true
# yearly median rather than a median of per-day medians
median_popularity_df = df.groupby('year')[['popularity_score']].median().reset_index()
median_popularity_df['label'] = 'Median popularity score'

median_popularity = alt.Chart(median_popularity_df).mark_line(point=True).encode(
    x = alt.X('year:O', axis=alt.Axis(labelAngle=270), title='year'),
    y = alt.Y('popularity_score:Q', title='', scale=alt.Scale(domain=[0.2,1.1])),
    color = alt.Color('label:N', legend=alt.Legend(title=''), scale=alt.Scale(range=['#00ff00'])),
    tooltip = ['year:O','popularity_score:Q']
)


# Most popular game of each release year
most_popular_games = df.sort_values('popularity_score', ascending=False).groupby('year').first().reset_index()
most_popular_games_df = most_popular_games[['year', 'popularity_score', 'name']].copy(deep=True)
most_popular_games_df['label'] = 'Max popularity score'

top_popularity = alt.Chart(most_popular_games_df).mark_line(point=True).encode(
    x=alt.X('year:O', axis=alt.Axis(labelAngle=270), title='year'),
    y=alt.Y('popularity_score:Q', title='', scale=alt.Scale(domain=[0.2,1.1])),
    color = alt.Color('label:N', legend=alt.Legend(title=''), scale=alt.Scale(range=['#ff00ff'])),
    tooltip=['name:N', 'year:O', 'popularity_score:Q']
)

median_popularity_and_max_popularity_by_year = (median_popularity + top_popularity).resolve_scale(color='independent')
median_popularity_and_max_popularity_by_year.save('assets/charts/median_popularity_and_max_popularity_by_year.json')

In [None]:
median_popularity_and_max_popularity_by_year

The median popularity score of games by release year paints a subtle but telling picture of how audience attention has evolved across generations. In the late 1990s and early 2000s, median scores regularly surpassed 0.6, suggesting a concentration of enduring, high-profile titles. Notably, 1998 and 2000 stand out, years marked by landmark releases that continue to command interest today.

However, from 2006 onward, a clear downward shift emerges. Median popularity scores hover closer to 0.35–0.45, reflecting a broader and more fragmented market. This likely mirrors the rise of digital distribution, indie development, and an explosion in the volume of new titles — where fewer individual games dominate attention the way they once did.

Recent years, particularly post-2020, show a modest resurgence, possibly due to pandemic-fueled engagement and the breakout success of a few widely streamed or viral titles. Still, the overall pattern suggests that while more games than ever are being played, the average title garners less sustained visibility — a testament to how player attention is now spread thin across an ever-expanding landscape.

In [None]:
top_games = df.sort_values('popularity_score', ascending=False).groupby('year').first().reset_index()

alt.Chart(top_games).mark_line(point=True).encode(
    x=alt.X('year:O', axis=alt.Axis(labelAngle=270), title='Year'),
    y=alt.Y('popularity_score:Q', title='Max Popularity Score'),
    tooltip=['name:N', 'year:O', 'popularity_score:Q']
)

Tracking the most popular game each year — as measured by a comprehensive popularity score — reveals a timeline of titles that not only defined their release windows but often reshaped the landscape of gaming altogether.

The late 1990s and early 2000s were heavily marked by Valve’s dominance, with Half-Life (1998), Counter-Strike (2000), and Half-Life 2 (2004) capturing widespread attention. The mid-2000s showed broader variety: Psychonauts (2006) earned cult status, while Team Fortress 2 (2007) signaled the arrival of enduring live-service models.

The 2010s saw an increasing blend of indie standouts (Stardew Valley, Hollow Knight) and AAA juggernauts (GTA V, Red Dead Redemption 2). Titles like Terraria (2011) and The Binding of Isaac: Rebirth (2014) proved that depth and replayability often trump scale in sustaining popularity.

In recent years, releases like Cyberpunk 2077, ELDEN RING, and Hogwarts Legacy reflect how marketing powerhouses and strong IPs continue to command outsized attention, even as the industry grows more crowded. Still, what binds these top performers is not just visibility at launch — it's their ability to retain a lasting cultural and player footprint.

While popularity tells us which games are getting the most attention, appreciation is about how deeply players value their time spent. To distill that sentiment into a measurable score, we combined three distinct signals of game quality:

- Steam review ratio: A grassroots signal of user satisfaction

- Metacritic critic score: A curated, professional assessment

- Metacritic user score: Broader public sentiment outside Steam’s ecosystem

We used Principal Component Analysis (PCA) to synthesize these variables into a single appreciation score (which is then normalized between $0$ and $1$) — one that cuts through isolated metrics and offers a more holistic picture of how a game is received across communities.

This approach lets us identify not just the most played games, but the most loved. 

In [None]:
appreciation_features = [
    'reviews_score_fancy',
    #'likes_backloggd',
    'metacritic_score',
    'metacritic_user_score_filled',
    #'is_missing_metacritic_user_score',
    #'few_steam_reviews',
    #'few_metacritic_user_reviews'
]

# log_transform_features = [
#     'adj_reviews_score_fancy',
#     #'likes_backloggd',
#     'metacritic_score',
#     'metacritic_user_score_filled',
# ]

# Apply log1p transformation (log(1 + x)) to avoid issues with zeros
# for col in log_transform_features:
#     df[col + '_log'] = np.log1p(df[col])

# Create the new feature matrix using transformed columns + the binary column
# pca_features = [col + '_log' for col in log_transform_features] + ['is_missing_metacritic_user_score']

X = df[appreciation_features] 

# Step 3: Standardize features before PCA
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 4: Run PCA (extract first principal component)
pca = PCA(n_components=1)
appreciation_score = pca.fit_transform(X_scaled)

# Add the score back to your DataFrame
df['appreciation_score'] = appreciation_score.flatten()

minmaxscaler = MinMaxScaler()
df['appreciation_score'] = minmaxscaler.fit_transform(df[['appreciation_score']])

In [None]:
loadings = pca.components_
loadings

In [None]:
hist = alt.Chart(df[['appreciation_score']]).mark_bar().encode(
    x = alt.X('appreciation_score:Q',title='Appreciation Score', bin=alt.Bin(maxbins=30)),
    y = alt.Y('count()', title='count')
)


kde = alt.Chart(df[['appreciation_score']]).transform_density(
    'appreciation_score',
    as_=['appreciation_score', 'density']
).mark_line(color='red').encode(
    x='appreciation_score:Q',
    y=alt.Y('density:Q', axis=alt.Axis(labelPadding=25), title='KDE')
)

appreciation_distribution = (hist + kde).properties(width=400, height=400).resolve_scale(y='independent')
appreciation_distribution.save('assets/charts/appreciation_distribution.json')

In [None]:
appreciation_distribution

In [None]:
sns.histplot(df['appreciation_score'], bins=30, kde=True)
plt.show()

In [None]:
df.sort_values('appreciation_score', ascending=False)['name'].values[:30]

In [None]:
top_15_appreciation = alt.Chart(df.sort_values(by='appreciation_score', ascending=False).head(15)[['name', 'appreciation_score']]).mark_bar().encode(
    x = alt.X('appreciation_score:Q', title='Appreciation Score'),
    y = alt.Y('name:N', sort='-x', title=''),
    tooltip=['name','appreciation_score']
)

top_15_appreciation.save('assets/charts/top_15_appreciation.json')

In [None]:
df.sort_values(by='appreciation_score', ascending=False).head(15)[['name', 'appreciation_score']]

In [None]:
top_15_appreciation

Popularity and appreciation do not always go hand in hand. While games like Grand Theft Auto V or Cyberpunk 2077 might dominate in terms of ownership and visibility, a different set of titles consistently rises to the top when we measure critical and user acclaim in unison.

Unsurprisingly, games like Baldur's Gate 3, Red Dead Redemption 2, Portal 2, and The Witcher 3 feature prominently. But the list also includes smaller-scale or older titles that continue to resonate: Celeste, System Shock 2, Pizza Tower, and Undertale all rank among the top 30.

What these games share is not budget, scope, or genre—but craftsmanship, consistency, and emotional resonance. Their high scores across all metrics suggest a convergence of design quality, storytelling, and community reception. They are games that players remember not just for what they did, but for how they made them feel.

This appreciation-driven ranking also reveals a powerful historical throughline. Half-Life, released in 1998, remains among the most appreciated games on the platform, joined by the later entries Half-Life 2 and Half-Life: Alyx. Legacy matters, but only when it’s earned.

In [None]:
median_appreciation_df = df.groupby('year')[['appreciation_score']].median()
median_appreciation_df['label'] = 'Median appreciation score'

In [None]:
median_appreciation_df.reset_index()

In [None]:
median_appreciation_df = df.groupby('year')[['appreciation_score']].median().reset_index()
median_appreciation_df['label'] = 'Median appreciation score'

median_appreciation = alt.Chart(median_appreciation_df).mark_line(point=True).encode(
    x = alt.X('year:O', axis=alt.Axis(labelAngle=270)),
    y = alt.Y('median(appreciation_score):Q', scale=alt.Scale(domain=[0.6,1.1])),
    color = alt.Color('label:N', legend=alt.Legend(title=''), scale=alt.Scale(range=['#00ff00'])),
    tooltip = ['year:O','median(appreciation_score):Q']
)


most_appreciated_games = df.sort_values('appreciation_score', ascending=False).groupby('year').first().reset_index()
most_appreciated_games_df = most_appreciated_games[['year', 'appreciation_score', 'name']].copy(deep=True)
most_appreciated_games_df['label'] = 'Max appreciation score'

top_appreciation = alt.Chart(most_appreciated_games_df).mark_line(point=True).encode(
    x=alt.X('year:O', axis=alt.Axis(labelAngle=270), title='year'),
    y=alt.Y('appreciation_score:Q', title='', scale=alt.Scale(domain=[0.6,1.1])),
    color = alt.Color('label:N', legend=alt.Legend(title=''), scale=alt.Scale(range=['#ff00ff'])),
    tooltip=['name:N', 'year:O', 'appreciation_score:Q']
).properties(

)

median_appreciation_and_max_appreciation_by_year = (median_appreciation + top_appreciation).resolve_scale(color='independent')
median_appreciation_and_max_appreciation_by_year.save('assets/charts/median_appreciation_and_max_appreciation_by_year.json')

In [None]:
median_appreciation_and_max_appreciation_by_year

As more and more games flood Steam each year, one question naturally arises:
Has the average quality of games improved, declined, or remained stable?

To answer that, we calculated the median appreciation score — a composite index of Steam user ratings, Metacritic critic scores, and Metacritic user feedback — for games released each year.

The results are revealing:

- From 1998 to around 2008, appreciation scores remained consistently high. Titles from these years are fewer and more curated, and those that predate Steam's 2003 launch were added to the platform retroactively, which likely favors enduring classics. The high score in 1998, for example, coincides with landmark releases like Half-Life.

- Between 2009 and 2019, we observe a noticeable dip in median appreciation, with a particularly low point between 2014 and 2016. This coincides with the post-Greenlight era, when Steam opened its gates to a massive influx of indie titles — many innovative, but also many unpolished or derivative. The sheer volume made it harder for consistently high-quality games to dominate the average.

- More recently, from 2020 to 2023, appreciation appears to be climbing again, albeit modestly. This could suggest that developers and publishers have adapted to the saturated market, putting more care into polish and feedback — or that players have become better at filtering for quality through reviews and tags.

This trend underscores a broader industry shift:

More games don’t always mean better games. As barriers to entry dropped, so did the median perception of quality — but also, standout titles had more room to shine. This duality continues to shape the market today, where curation, discoverability, and community word-of-mouth are as critical as ever.

In [None]:
most_appreciated_games = df.sort_values('appreciation_score', ascending=False).groupby('year').first().reset_index()

alt.Chart(most_appreciated_games).mark_line(point=True).encode(
    x=alt.X('year:O', axis=alt.Axis(labelAngle=270), title='Year'),
    y=alt.Y('appreciation_score:Q', title='Max Appreciation Score'),
    tooltip=['name:N', 'year:O', 'appreciation_score:Q']
).properties(
    width=600
)

While popularity may indicate reach, appreciation reveals resonance — the degree to which a game connects deeply with players over time. Using a composite appreciation score, we can trace a lineage of standout titles that left lasting emotional and artistic impact, often regardless of commercial hype.

Some years echo the familiar: Half-Life (1998), Counter-Strike (2000), and Half-Life 2 (2004) were not just hits — they redefined genres. Others highlight the rediscovery of classics, like System Shock 2 (2013) and FINAL FANTASY IX (2016), beloved long after their original releases.

There are surprise champions too. Pizza Tower (2023), with its frenetic energy and indie charm, captivated a niche audience enough to top its year. Similarly, Celeste (2018) and Hollow Knight (2017) emerged as critical darlings, proving that tightly designed, emotionally rich experiences still punch far above their budget.

In standout years like 2015 and 2020, games such as The Witcher 3 and Baldur's Gate 3 didn’t just impress — they redefined expectations for narrative and freedom in role-playing games. The trend suggests that appreciation often favors craftsmanship, depth, and originality over sheer scale or sales.

In [None]:
df['popularity_plus_appreciation'] = df['popularity_score'] + df['appreciation_score']
df['popularity_minus_appreciation'] = df['popularity_score'] - df['appreciation_score']

In [None]:
# Flag a game as indie when 'Indie' appears among its tags or genres.
# Missing tags/genres are stored as float NaN and are skipped.
def has_indie_label(values):
    return not isinstance(values, float) and any(v == 'Indie' for v in values)

df['is_indie'] = df['tags'].apply(has_indie_label) | df['genres'].apply(has_indie_label)

The scatter plot below positions games according to their appreciation (horizontal axis) and popularity (vertical axis) scores. With median lines dividing the space, four distinct profiles emerge. Games in the top-right quadrant — such as Red Dead Redemption 2 or The Witcher 3 — combine strong admiration with widespread reach. The top-left captures popular games with more mixed reception, while the bottom-right shows beloved titles with narrower appeal. The bottom-left hosts games that struggled to resonate broadly on either front. This quadrant framework helps decode the landscape of critical and commercial impact in a single glance.
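The quadrant split described above can be sketched as a single labeling step. The example below uses a toy frame (`games`) with made-up scores and a hypothetical `quadrant` column. Unlike the strict `<` and `>` filters used in the cells that follow, it assigns games sitting exactly on a median to the "low" side, which is a simplifying assumption.

```python
import numpy as np
import pandas as pd

# Toy scores (hypothetical values, not taken from the dataset).
games = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'popularity_score':   [0.9, 0.8, 0.2, 0.1, 0.5],
    'appreciation_score': [0.9, 0.3, 0.8, 0.2, 0.6],
})

# Median lines that divide the plot into four quadrants.
pop_med = games['popularity_score'].median()
appr_med = games['appreciation_score'].median()
high_pop = games['popularity_score'] > pop_med
high_appr = games['appreciation_score'] > appr_med

# Appreciation is on the horizontal axis, popularity on the vertical axis.
games['quadrant'] = np.select(
    [high_pop & high_appr, high_pop & ~high_appr, ~high_pop & high_appr],
    ['top-right', 'top-left', 'bottom-right'],
    default='bottom-left',
)
print(games)
```

With a column like this, each quadrant table could be obtained with one filter on `quadrant` instead of repeating the median comparisons.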

In [None]:
high_pop_high_appr = df[(df['popularity_score'] > df['popularity_score'].median()) & (df['appreciation_score'] > df['appreciation_score'].median())][['appid', 'name', 'popularity_score', 'appreciation_score', 'popularity_plus_appreciation', 'is_indie']].sort_values(by='popularity_plus_appreciation', ascending=False).head(15).reset_index(drop=True)

In [None]:
high_pop_high_appr

In [None]:
high_pop_low_appr = df[(df['popularity_score'] > df['popularity_score'].median()) & (df['appreciation_score'] < df['appreciation_score'].median())][['appid', 'name', 'popularity_score', 'appreciation_score', 'popularity_minus_appreciation', 'is_indie']].sort_values(by='popularity_minus_appreciation', ascending=False).head(15).reset_index(drop=True)
high_pop_low_appr

The upper-left quadrant of the popularity vs. appreciation plot reveals a compelling pattern: games that drew significant attention but failed to win over audiences. Titles like Battlefield™ 2042, Fallout 76, and The Lord of the Rings: Gollum™ were highly visible, yet their reception fell flat. These games often launched with high expectations — bolstered by marketing, IP strength, or franchise legacy — but ultimately struggled to deliver satisfying experiences. Their placement reflects a key insight: visibility doesn’t guarantee admiration.

In [None]:
high_pop_low_appr.to_csv('high_pop_low_appr.csv', index=False)

In [None]:
low_pop_low_appr = df[(df['popularity_score'] < df['popularity_score'].median()) & (df['appreciation_score'] < df['appreciation_score'].median())][['appid', 'name', 'popularity_score', 'appreciation_score', 'popularity_plus_appreciation', 'is_indie']].sort_values(by='popularity_plus_appreciation', ascending=True).head(15).reset_index(drop=True)

In [None]:
low_pop_low_appr

In [None]:
low_pop_high_appr = df[(df['popularity_score'] < df['popularity_score'].median()) & (df['appreciation_score'] > df['appreciation_score'].median())][['appid', 'name', 'popularity_score', 'appreciation_score', 'popularity_minus_appreciation', 'is_indie']].sort_values(by='popularity_minus_appreciation', ascending=True).head(15).reset_index(drop=True)
low_pop_high_appr

In the bottom-right quadrant lie the industry’s unsung heroes — games that received strong appreciation scores despite flying under the radar. Titles like Worm Jazz, MOTHERGUNSHIP: FORGE, and This Way Madness Lies didn’t command the same attention as mainstream blockbusters, but they resonated deeply with the players who discovered them. Whether due to niche appeal, limited marketing, or unconventional design, these games exemplify how quality and recognition don’t always align. They’re a reminder that some of gaming’s most rewarding experiences are waiting off the beaten path.

In [None]:
low_pop_high_appr.to_csv('low_pop_high_appr.csv', index=False)

In [None]:
# Interactive slider for the threshold
slider = alt.binding_range(min=0, max=1, step=0.01)
threshold = alt.param(name="threshold", value=0, bind=slider)

# Layer 1: points above the threshold
above = alt.Chart(df[['name', 'appreciation_score', 'popularity_score']]).mark_point().encode(
    x=alt.X('appreciation_score:Q', title='Appreciation Score'),
    y=alt.Y('popularity_score:Q', title='Popularity Score'),
    tooltip=['name', 'popularity_score', 'appreciation_score']
).transform_filter(
    alt.datum['appreciation_score'] >= threshold
)

# Layer 2: binned density below the threshold, with mark size encoding count()
below = alt.Chart(df[['name', 'appreciation_score', 'popularity_score']]).mark_circle(color='#00ff00', opacity=0.6).encode(
    x=alt.X('appreciation_score:Q', bin=alt.Bin(maxbins=10)),
    y=alt.Y('popularity_score:Q', bin=alt.Bin(maxbins=10)),
    size=alt.Size('count():Q', scale=alt.Scale(domain=[0, 300]), legend=None),
    tooltip=[
        alt.Tooltip('count():Q', title='Density')
    ]
).transform_filter(
    alt.datum['appreciation_score'] < threshold
)

# Dynamic vertical line
# dyn_line = alt.Chart().mark_rule(color='#fe53bb').encode(
#     x=alt.X(datum=alt.expr(threshold.name), type='quantitative'),
#     strokeWidth=alt.StrokeWidth(value=2)
# )

# Fixed vertical line (median appreciation)
vline = alt.Chart(pd.DataFrame({
    'x': [df['appreciation_score'].median()]
})).mark_rule(color='red', strokeDash=[4, 3], strokeWidth=3).encode(x='x')

# Fixed horizontal line (median popularity)
hline = alt.Chart(pd.DataFrame({
    'y': [df['popularity_score'].median()]
})).mark_rule(color='red', strokeDash=[4, 3], strokeWidth=3).encode(y='y')


high_pop_high_appr_chart = alt.Chart(high_pop_high_appr).mark_point(color = 'yellow').encode(
    x = alt.X('appreciation_score:Q'),
    y = alt.Y('popularity_score:Q'),
    tooltip = ['name', 'popularity_score', 'appreciation_score']
).transform_filter(
    alt.datum['appreciation_score'] >= threshold
)

high_pop_low_appr_chart = alt.Chart(high_pop_low_appr).mark_point(color = 'yellow').encode(
    x = alt.X('appreciation_score:Q'),
    y = alt.Y('popularity_score:Q'),
    tooltip = ['name', 'popularity_score', 'appreciation_score']
).transform_filter(
    alt.datum['appreciation_score'] >= threshold
)

low_pop_low_appr_chart = alt.Chart(low_pop_low_appr).mark_point(color = 'yellow').encode(
    x = alt.X('appreciation_score:Q'),
    y = alt.Y('popularity_score:Q'),
    tooltip = ['name', 'popularity_score', 'appreciation_score']
).transform_filter(
    alt.datum['appreciation_score'] >= threshold
)

low_pop_high_appr_chart = alt.Chart(low_pop_high_appr).mark_point(color = 'yellow').encode(
    x = alt.X('appreciation_score:Q'),
    y = alt.Y('popularity_score:Q'),
    tooltip = ['name', 'popularity_score', 'appreciation_score']
).transform_filter(
    alt.datum['appreciation_score'] >= threshold
)




# Final layered chart
popularity_vs_appreciation = alt.layer(
    above,         # points above the threshold
    below,         # binned density below the threshold
    high_pop_high_appr_chart,
    high_pop_low_appr_chart,
    low_pop_high_appr_chart,
    low_pop_low_appr_chart,
    #dyn_line,      # interactive threshold line
    vline,         # vertical median line
    hline          # horizontal median line
).add_params(threshold).properties(
    width = 600,
    height = 600
).interactive()


popularity_vs_appreciation.save('assets/charts/popularity_vs_appreciation.json')

In [None]:
popularity_vs_appreciation

In [None]:
df[df['normalized_name'] == 'freudsbonesthegame'][['appid', 'name', 'popularity_score', 'appreciation_score']]

While many games in our dataset aim for mass appeal, Freud’s Bones by Fortuna Imperatore proves that passion and vision can create powerful impact — even outside the mainstream. Nestled in the bottom-right quadrant of our popularity vs. appreciation analysis, the game stands out for its high critical regard despite modest visibility.

Blending psychoanalytic themes with an unusual narrative structure, Freud’s Bones carved out a niche for players craving introspection over action. In our interview with developer Fortuna Imperatore, she shared how the game’s indie spirit and psychological complexity were both a creative strength and a commercial challenge.

This rare case of high appreciation paired with low popularity underscores the trade-offs faced by experimental developers — and reminds us that innovation often begins at the margins.

Beyond raw numbers and critical acclaim lies a deeper story—one about who makes the games we play. As we navigate the divide between the industry's giants and its independent creators, a clear pattern begins to emerge. Indie games, often born from limited resources but boundless creativity, tend to occupy a different space in the gaming ecosystem than their mainstream, studio-backed counterparts.

But how do these two worlds compare when it comes to popularity and appreciation? Do indie games struggle for visibility despite critical praise? Are big-budget titles more likely to dominate the charts but divide audiences? To answer these questions, we looked at how indie and non-indie games perform across the spectrum of attention and acclaim.

In [None]:
def has_indie_label(labels):
    # Missing tags/genres are stored as NaN (a float); otherwise iterate over the labels
    if isinstance(labels, float):
        return False
    return any(label == 'Indie' for label in labels)

# A game counts as indie if 'Indie' appears among its tags or its genres
df['is_indie'] = df['tags'].apply(has_indie_label) | df['genres'].apply(has_indie_label)

In [None]:
df['is_indie'].value_counts()

In [None]:
df[(df['popularity_score'] < df['popularity_score'].median()) & (df['appreciation_score'] > df['appreciation_score'].median())]['is_indie'].value_counts()

In [None]:
df[(df['popularity_score'] > df['popularity_score'].median()) & (df['appreciation_score'] < df['appreciation_score'].median())]['is_indie'].value_counts()

In [None]:
df[(df['popularity_score'] > df['popularity_score'].median()) & (df['appreciation_score'] > df['appreciation_score'].median())]['is_indie'].value_counts()

In [None]:
df[(df['popularity_score'] < df['popularity_score'].median()) & (df['appreciation_score'] < df['appreciation_score'].median())]['is_indie'].value_counts()

In [None]:
# Percentage of indie games per year (the mean of a boolean column is its share of True values)
df.groupby('year')['is_indie'].mean() * 100

In [None]:
df[df['is_indie'] == True].groupby('year')['year'].agg('count')

In [None]:
# Group by year and is_indie to count games
games_by_type_per_year = df.groupby(['year', 'is_indie']).size().reset_index(name='game_count')

# Replace boolean values with readable labels
games_by_type_per_year['game_type'] = games_by_type_per_year['is_indie'].map({
    True: 'Indie Games',
    False: 'Non-Indie Games'
})

# Create stacked bar chart (one bar per year, segments colored by game type)
indie_vs_not_indie_by_year = alt.Chart(games_by_type_per_year).mark_bar().encode(
    x=alt.X('year:O', title='year', axis=alt.Axis(labelAngle=270)),
    y=alt.Y('game_count:Q', title='Number of Games'),
    color=alt.Color('game_type:N', title='Game Type',
                    scale=alt.Scale(domain=['Non-Indie Games', 'Indie Games'],
                                    range=['#ff00ff', '#00ff00'])),
    tooltip=['year:O', 'game_type:N', 'game_count:Q']
)

indie_vs_not_indie_by_year.save('assets/charts/indie_vs_not_indie_by_year.json')

In [None]:
indie_vs_not_indie_by_year

In [None]:
df[df['is_indie'] == True][['popularity_score']]

In [None]:
hist = alt.Chart(df[df['is_indie'] == True][['popularity_score']]).mark_bar().encode(
    x = alt.X('popularity_score:Q', bin=alt.Bin(maxbins=30)),
    y = alt.Y('count()')
).properties(
    width=400
)


kde = alt.Chart(df[df['is_indie'] == True][['popularity_score']]).transform_density(
    'popularity_score',
    as_=['popularity_score', 'density']
).mark_line(color='red').encode(
    x='popularity_score:Q',
    y='density:Q'
)

(hist + kde).resolve_scale(y='independent')

In [None]:
df[df['is_indie'] == True][['popularity_score']].median()

In [None]:
hist = alt.Chart(df[df['is_indie'] == False][['popularity_score']]).mark_bar().encode(
    x = alt.X('popularity_score:Q', bin=alt.Bin(maxbins=30)),
    y = alt.Y('count()')
).properties(
    width=400
)


kde = alt.Chart(df[df['is_indie'] == False][['popularity_score']]).transform_density(
    'popularity_score',
    as_=['popularity_score', 'density']
).mark_line(color='red').encode(
    x='popularity_score:Q',
    y='density:Q'
)

(hist + kde).resolve_scale(y='independent')

In [None]:
df[df['is_indie'] == False][['popularity_score']].median()

In [None]:
hist = alt.Chart(df[df['is_indie'] == True][['appreciation_score']]).mark_bar().encode(
    x = alt.X('appreciation_score:Q', bin=alt.Bin(maxbins=30)),
    y = alt.Y('count()')
).properties(
    width=400
)


kde = alt.Chart(df[df['is_indie'] == True][['appreciation_score']]).transform_density(
    'appreciation_score',
    as_=['appreciation_score', 'density']
).mark_line(color='red').encode(
    x='appreciation_score:Q',
    y='density:Q'
)

(hist + kde).resolve_scale(y='independent')

In [None]:
df[df['is_indie'] == True][['appreciation_score']].median()

In [None]:
hist = alt.Chart(df[df['is_indie'] == False][['appreciation_score']]).mark_bar().encode(
    x = alt.X('appreciation_score:Q', bin=alt.Bin(maxbins=30)),
    y = alt.Y('count()')
).properties(
    width=400
)


kde = alt.Chart(df[df['is_indie'] == False][['appreciation_score']]).transform_density(
    'appreciation_score',
    as_=['appreciation_score', 'density']
).mark_line(color='red').encode(
    x='appreciation_score:Q',
    y='density:Q'
)

(hist + kde).resolve_scale(y='independent')

In [None]:
df[df['is_indie'] == False][['appreciation_score']].median()

While independent developers were virtually absent from the early 2000s Steam landscape, things began to shift dramatically in the 2010s. In 2010, only a quarter of the games with a Metacritic page were indie titles. By 2015, that share had soared past 66%, reflecting the growing accessibility of game development tools and digital distribution platforms.

At its peak in the mid-2010s, indie representation regularly accounted for over half of all notable Steam releases—a cultural and creative boom that brought forth now-classic titles like Stardew Valley, Hollow Knight, and Celeste. Though recent years have seen a slight decline in their relative share—down to around 44% in 2023—indie games remain a driving force in shaping what we play and how we engage with games.

The data tells a clear story: indie games, once rare outliers, are now integral to the fabric of PC gaming. But how do they stack up in terms of visibility and critical acclaim? We break it down next.
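The yearly shares quoted above come down to a single aggregation: the mean of a boolean column is the fraction of `True` values. A minimal sketch on invented toy rows, assuming the `year` and `is_indie` columns used in this notebook:

```python
import pandas as pd

# Toy data (invented): four games in one year, three in another
toy = pd.DataFrame({
    'year':     [2010, 2010, 2010, 2010, 2015, 2015, 2015],
    'is_indie': [True, False, False, False, True, True, False],
})

# Mean of a boolean column = share of True values; multiply by 100 for a percentage
indie_share_pct = toy.groupby('year')['is_indie'].mean().mul(100).round(1)
print(indie_share_pct)
```

Reading the share next to the raw yearly count helps, because a percentage computed on a handful of games can swing widely from one year to the next.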

In [None]:
# Group medians by year and game type
games_by_type_per_year = (
    df.groupby(['year', 'is_indie'])['appreciation_score']
      .median()
      .reset_index()
)

# Add a descriptive label column
games_by_type_per_year['game_type'] = games_by_type_per_year['is_indie'].map({
    True: 'Indie Games',
    False: 'Non-Indie Games'
})

# Build the chart with a legend
indie_vs_not_indie_median_appreciation = alt.Chart(games_by_type_per_year).mark_line(point=True).encode(
    x=alt.X('year:O', axis=alt.Axis(labelAngle=270), title='year'),
    y=alt.Y('appreciation_score:Q', title='Median Appreciation Score', scale = alt.Scale(domain=[0.5, 1.1])),
    color=alt.Color('game_type:N',
        scale=alt.Scale(domain=['Indie Games', 'Non-Indie Games'],
                        range=['#00ff00', '#ff00ff']),  # green = indie, magenta = non-indie
        legend=alt.Legend(title='Game Type')
    ),
    tooltip=['year:O', 'game_type:N', 'appreciation_score:Q']
)

indie_vs_not_indie_median_appreciation.save('assets/charts/indie_vs_not_indie_median_appreciation.json')

In [None]:
indie_vs_not_indie_median_appreciation

Over the past two decades, indie games haven’t just increased in number—they’ve consistently won over players’ hearts. When comparing appreciation scores—our measure of how well-received a game is—indie titles have often outperformed their big-budget counterparts, especially in more recent years.

Since 2010, median appreciation scores for indie games have remained steady, hovering around 0.70–0.76. In contrast, non-indie games (every title without the Indie label, not only AAA productions) have seen greater fluctuation and, in some years, lagged behind. By 2023, indie games posted a median appreciation of 0.75, compared to 0.69 for non-indie releases.

This growing parity—and often superiority—suggests that while indie games might not always match AAA production values, they frequently resonate more deeply with players, offering originality, emotional depth, or innovative mechanics that large studios may overlook.

As the indie space continues to mature, the line between "small" and "great" has never been blurrier.
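To read the indie vs. non-indie gap year by year rather than off the chart, the medians can be pivoted side by side and subtracted. This is a sketch on invented toy scores, assuming the `year`, `is_indie` and `appreciation_score` columns; the `gap` column name is introduced here for illustration only.

```python
import pandas as pd

# Toy data (invented): two indie and two non-indie games per year
toy = pd.DataFrame({
    'year':               [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023],
    'is_indie':           [True, True, False, False, True, True, False, False],
    'appreciation_score': [0.70, 0.80, 0.70, 0.80, 0.60, 0.80, 0.50, 0.70],
})

# One row per year, one column per game type, median score in each cell
medians = (
    toy.pivot_table(index='year', columns='is_indie',
                    values='appreciation_score', aggfunc='median')
       .rename(columns={True: 'indie', False: 'non_indie'})
)

# Positive gap = indie median above non-indie median for that year
medians['gap'] = medians['indie'] - medians['non_indie']
print(medians.round(2))
```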

In [None]:
# Group medians by year and game type
games_by_type_per_year = (
    df.groupby(['year', 'is_indie'])['popularity_score']
      .median()
      .reset_index()
)

# Add a descriptive label column
games_by_type_per_year['game_type'] = games_by_type_per_year['is_indie'].map({
    True: 'Indie Games',
    False: 'Non-Indie Games'
})

# Build the chart with a legend
indie_vs_not_indie_median_popularity = alt.Chart(games_by_type_per_year).mark_line(point=True).encode(
    x=alt.X('year:O', axis=alt.Axis(labelAngle=270), title='year'),
    y=alt.Y('popularity_score:Q', title='Median Popularity Score', scale = alt.Scale(domain=[0.2, 0.9])),
    color=alt.Color('game_type:N',
        scale=alt.Scale(domain=['Indie Games', 'Non-Indie Games'],
                        range=['#00ff00', '#ff00ff']),  # green = indie, magenta = non-indie
        legend=alt.Legend(title='Game Type')
    ),
    tooltip=['year:O', 'game_type:N', 'popularity_score:Q']
)

indie_vs_not_indie_median_popularity.save('assets/charts/indie_vs_not_indie_median_popularity.json')

In [None]:
indie_vs_not_indie_median_popularity

While indie titles often outperform non-indie games on appreciation, they still lag behind on popularity. Median popularity scores for indie games trail those of non-indie releases in most years.

This gap is particularly stark in the early 2000s, when blockbuster titles dominated. But even in more recent years (e.g., 2020–2023), the most popular titles remain overwhelmingly non-indie.

Notably, however, some indie games have broken through this ceiling. Titles like Terraria, Hollow Knight, and Stardew Valley have not only achieved critical success but also popularity levels comparable to large-scale productions.

The notable spike in indie game popularity in 2008 can be attributed to a handful of highly successful titles released that year. Games like World of Goo, AudioSurf, Mount & Blade, and Defense Grid: The Awakening gained significant traction on Steam, reflecting both their innovation and increasing visibility for indie titles. With only nine indie games in the dataset for that year, these standout hits heavily influenced the median popularity score, pushing it above 0.5 — the highest for indie titles until 2022. This moment marks one of the early breakthroughs of indie games into broader public awareness during the digital distribution era.
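Because a median over a handful of games is easily moved by a few hits, it can help to report the group size next to each yearly median and flag thin years. The sketch below uses invented toy scores; `MIN_GAMES` is a hypothetical cutoff chosen only for illustration.

```python
import pandas as pd

# Toy data (invented): a thin year with 3 games and a fuller year with 6
toy = pd.DataFrame({
    'year': [2008, 2008, 2008, 2020, 2020, 2020, 2020, 2020, 2020],
    'popularity_score': [0.6, 0.5, 0.7, 0.2, 0.3, 0.25, 0.4, 0.35, 0.9],
})

MIN_GAMES = 5  # hypothetical threshold below which a yearly median is treated as fragile

# Median and number of games per year, side by side
per_year = toy.groupby('year')['popularity_score'].agg(median='median', n_games='count')
per_year['small_sample'] = per_year['n_games'] < MIN_GAMES
print(per_year)
```

Years flagged as `small_sample` could be drawn with a different marker or left out of trend comparisons.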

In [None]:
top_pop_indie = df[df['is_indie'] == True][['appid', 'name', 'popularity_score', 'appreciation_score', 'popularity_minus_appreciation']].sort_values(by='popularity_score', ascending = False).head(15).reset_index(drop=True)
top_pop_indie

In [None]:
top_pop_indie['name'].values

Indie games may lack the budgets of their AAA counterparts, but they’re more than capable of commanding attention. According to our composite popularity score, titles like Terraria, Stardew Valley, and Hollow Knight have emerged as cultural phenomena, rivaling mainstream releases in reach and resonance. From the chaotic charm of Fall Guys to the haunting elegance of Stray and the nostalgic brilliance of Cuphead, these top-performing indie titles illustrate how creativity, strong community support, and unique design can drive lasting impact — often without the backing of major publishers.

In [None]:
top_pop_indie.to_csv('top_pop_indie.csv', index=False)

In [None]:
top_appr_indie = df[df['is_indie'] == True][['appid', 'name', 'popularity_score', 'appreciation_score', 'popularity_minus_appreciation']].sort_values(by='appreciation_score', ascending = False).head(15).reset_index(drop=True)
top_appr_indie

In [None]:
top_appr_indie['name'].values

When it comes to appreciation, indie games shine even brighter. Topping the list are critical darlings like Hollow Knight, Hades, and Celeste — each praised for their masterful design, emotional storytelling, and polished gameplay. Titles such as Ori and the Will of the Wisps, Undertale, and Factorio showcase the genre’s breadth, from heartwarming narratives to intricate systems. Even more niche entries like Meg’s Monster or A Short Hike have found deep resonance with players. These standout games exemplify how artistic vision and thoughtful execution can earn enduring respect, regardless of a game’s scale or studio size.

In [None]:
top_appr_indie.to_csv('top_appr_indie.csv', index=False)

In [None]:
top_pop_indie_chart = alt.Chart(top_pop_indie).mark_bar(color='#00ff00').encode(
    x = alt.X('popularity_score:Q', title='Popularity Score'),
    y = alt.Y('name:N', sort='-x', title = ''),
    tooltip = ['name', 'popularity_score']
).properties(
    width=300
)

top_appr_indie_chart = alt.Chart(top_appr_indie).mark_bar().encode(
    x = alt.X('appreciation_score:Q', title='Appreciation Score'),
    y = alt.Y('name:N', sort='-x', title = ''),
    tooltip = ['name', 'appreciation_score']
).properties(
    width=300
)


top_pop_top_appr_indie = (top_pop_indie_chart | top_appr_indie_chart).resolve_scale(color='independent')
top_pop_top_appr_indie.save('assets/charts/top_pop_top_appr_indie.json')

In [None]:
top_pop_top_appr_indie

In [None]:
top_pop_not_indie = df[df['is_indie'] == False][['appid', 'name', 'popularity_score', 'appreciation_score', 'popularity_minus_appreciation']].sort_values(by='popularity_score', ascending = False).head(15).reset_index(drop=True)
top_pop_not_indie

In [None]:
top_pop_not_indie['name'].values

On the mainstream front, popularity is dominated by industry titans. Games like Grand Theft Auto V, ELDEN RING, and Cyberpunk 2077 lead the pack with massive player bases, marketing power, and cultural impact. Live-service giants such as Counter-Strike: Global Offensive, Destiny 2, and Dota 2 maintain long-term engagement, while narrative-heavy epics like Red Dead Redemption 2, The Witcher 3, and Baldur’s Gate 3 prove that cinematic storytelling can drive lasting interest. These titles demonstrate how broad appeal, technical excellence, and community ecosystems translate into enduring popularity in the AAA space.

In [None]:
top_pop_not_indie.to_csv('top_pop_not_indie.csv', index=False)

In [None]:
top_appr_not_indie = df[df['is_indie'] == False][['appid', 'name', 'popularity_score', 'appreciation_score', 'popularity_minus_appreciation']].sort_values(by='appreciation_score', ascending = False).head(15).reset_index(drop=True)
top_appr_not_indie

In [None]:
top_appr_not_indie['name'].values

When it comes to appreciation, timeless design and narrative mastery take center stage. Baldur’s Gate 3, Half-Life 2, and Portal 2 exemplify how innovative gameplay and strong world-building can earn both critical acclaim and lasting player devotion. Classics like Half-Life, BioShock, and Grand Theft Auto: San Andreas still resonate, while story-driven epics such as Red Dead Redemption 2, Persona 5 Royal, and The Witcher 3 are praised for their emotional depth and polish. These titles reflect how, beyond scale or budget, a well-crafted experience is what ultimately earns the deepest appreciation from players.
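Several titles are named in both the popularity and the appreciation rankings. A quick way to list that overlap is to intersect the two top-N tables. The sketch below uses invented toy games and joins on `name` for readability; on the real data `appid` would be the safer key, since names are not guaranteed to be unique.

```python
import pandas as pd

# Toy data (invented)
toy = pd.DataFrame({
    'name':               ['A', 'B', 'C', 'D', 'E'],
    'popularity_score':   [0.9, 0.8, 0.7, 0.2, 0.1],
    'appreciation_score': [0.9, 0.3, 0.8, 0.95, 0.2],
})

N = 3
top_pop = toy.nlargest(N, 'popularity_score')
top_appr = toy.nlargest(N, 'appreciation_score')

# Inner merge keeps only the games present in both rankings, in popularity order
in_both = top_pop.merge(top_appr[['name']], on='name')['name'].tolist()
print(in_both)
```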

In [None]:
top_appr_not_indie.to_csv('top_appr_not_indie.csv', index = False)

In [None]:
top_pop_not_indie_chart = alt.Chart(top_pop_not_indie).mark_bar(color='#00ff00').encode(
    x = alt.X('popularity_score:Q', title='Popularity Score'),
    y = alt.Y('name:N', sort='-x', title=''),
    tooltip = ['name', 'popularity_score']
).properties(
    width=300
)

top_appr_not_indie_chart = alt.Chart(top_appr_not_indie).mark_bar().encode(
    x = alt.X('appreciation_score:Q', title='Appreciation Score'),
    y = alt.Y('name:N', sort='-x', title=''),
    tooltip = ['name', 'appreciation_score']
).properties(
    width=300
)


top_pop_top_appr_not_indie = (top_pop_not_indie_chart | top_appr_not_indie_chart).resolve_scale(color = 'independent')
top_pop_top_appr_not_indie.save('assets/charts/top_pop_top_appr_not_indie.json')

In [None]:
top_pop_top_appr_not_indie

**Introduction to Steam Tags**

Steam tags serve as the primary vocabulary of the platform's discovery system. Whether applied by developers or players, tags capture everything from genre and theme to tone and mechanics. Analyzing these tags offers insight into what types of games capture attention — and which resonate most with players or critics.

Tag names that appear in two spellings and have to be reconciled before matching:

- 1990's vs 1990s
- 2.5D vs 25D
- 4 Player Local vs 44 Player Local
- Animation & Modeling vs Animation Modeling
- Base-Building vs Base Building
- Beat 'em up vs Beat em up
- Design & Illustration vs Design Illustration
- Dungeons & Dragons vs Dungeons Dragons
- Football (American) vs Football American
- Football (Soccer) vs Football Soccer
- LGBTQ+ vs LGBTQ
- 3Match 3 vs Match 3
- Point Click vs Point & Click
- Puzzle-Platformer vs Puzzle Platformer
- Rogue-like vs Roguelike
- Rogue-lite vs Roguelite
- 2Sequel vs Sequel
- Shoot 'Em Up vs Shoot Em Up
- e-sports vs eSports

In [None]:
# Load your HTML file
with open('steam_games_tags.html', 'r', encoding='utf-8') as file:
    html_content = file.read()

# Synonym map for matching purposes (only used for normalization)
synonym_map = {
    "Rogue-like": "Roguelike",
    "Rogue-lite": "Roguelite",
    "Base-Building": "Base Building",
    "Puzzle-Platformer": "Puzzle Platformer",
    "Match 3": "3Match 3",
    "2Sequel": "Sequel",
    "e-sports": "eSports",
    "Shoot 'Em Up": "Shoot Em Up",
    "Beat 'em up": "Beat em up",
    "Point & Click": "Point & Click",
    "Design & Illustration": "Design Illustration",
    "Animation & Modeling": "Animation Modeling",
    "Football (Soccer)": "Football Soccer",
    "Football (American)": "Football American",
    "4 Player Local" : "44 Player Local"
}

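def clean_tag(tag_text):
    # Drop every character that is not a word character, whitespace, or one of - & ' . +
    disallowed_chars = re.compile(r"[^\w\s\-&'.+]")
    cleaned = disallowed_chars.sub('', tag_text)
    # Collapse runs of whitespace left behind by the removed characters
    cleaned = re.sub(r'\s{2,}', ' ', cleaned).strip()
    # Apply synonym mapping
    return synonym_map.get(cleaned, cleaned)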

def extract_tags_to_dict(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    result_dict = {}

    taglist_wraps = soup.find_all('div', class_='taglist-wrap')

    for wrap in taglist_wraps:
        title_element = wrap.find('h2', class_='b')
        if title_element:
            title = title_element.get_text(strip=True)
            for emoji in ['🌫', '🎭', '🌱', '💥', '😌']:
                title = title.replace(emoji, '').strip()

            tags = []
            for label in wrap.find_all('div', class_='label'):
                tag_text = label.find('a').get_text(strip=True)
                cleaned_tag = clean_tag(tag_text)
                tags.append(cleaned_tag)

            result_dict[title] = tags

    return result_dict

def filter_tags_by_reference(reference_dict, original_tags):
    filtered_dict = {}

    # Create lookup from cleaned → original tag
    cleaned_to_original = {
        clean_tag(tag): tag for tag in original_tags
    }

    for category, ref_tags in reference_dict.items():
        matched = []
        for ref_tag in ref_tags:
            if ref_tag in cleaned_to_original:
                matched.append(cleaned_to_original[ref_tag])
        if matched:
            filtered_dict[category] = matched

    return filtered_dict

# Build reference dictionary
reference_dict = extract_tags_to_dict(html_content)

# Your original tags (preserved for display)
your_tags = list(set(df.explode('tags')['tags'].dropna().unique()))

# Final result
filtered_output = filter_tags_by_reference(reference_dict, your_tags)
print(filtered_output)

In [None]:
my_tags = []

for tag_list in filtered_output.values():
    for tag in tag_list:
        my_tags.append(tag)

my_tags_set = set(my_tags)

In [None]:
my_tags_set.difference(set(df.explode('tags')['tags'].dropna().unique()))

In [None]:
set(df.explode('tags')['tags'].dropna().unique()).difference(my_tags_set)

In [None]:
tag_groups = filtered_output

In [None]:
tag_groups.keys()

In [None]:
tag_groups

In [None]:
app_thr = df['appreciation_score'].median()
pop_thr = df['popularity_score'].median()

#print(f'appreciation thr = {app_thr}, popularity thr = {pop_thr}')

low_app = df['appreciation_score'] < app_thr
high_app = df['appreciation_score'] >= app_thr
low_pop = df['popularity_score'] < pop_thr
high_pop = df['popularity_score'] >= pop_thr

# Assign each game to a quadrant (P = popularity, A = appreciation, L/H = below / at-or-above the median).
# The column is named 'quadrant' because later cells filter on df['quadrant'];
# rows with a missing score keep the blank placeholder ' '.
df['quadrant'] = np.select(
    [low_pop & low_app, low_pop & high_app, high_pop & low_app, high_pop & high_app],
    ['LP-LA', 'LP-HA', 'HP-LA', 'HP-HA'],
    default=' '
)

In [None]:
def tags_stats_builder(df):
    tags_stats = df.explode('tags').groupby('tags').agg(
        #total_reviews = ('reviews_total', 'sum'),
        #avg_popularity=('popularity_score', 'mean'),
        #avg_appreciation=('appreciation_score', 'mean'),
        median_popularity=('popularity_score', 'median'),
        median_appreciation=('appreciation_score', 'median'),
        count=('popularity_score', 'count')
    ).reset_index()

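    tags_stats = tags_stats[~tags_stats['tags'].isin(['[', ']'])].reset_index(drop=True)

    # Look categories up by tag name rather than by position, so the rows filtered out
    # above cannot shift the assignment (if a tag sits in several groups, the last one wins)
    tag_to_category = {tag: tag_group for tag_group, tags in tag_groups.items() for tag in tags}
    tags_stats['category'] = tags_stats['tags'].map(tag_to_category)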

    return tags_stats

In [None]:
tags_stats = tags_stats_builder(df)

In [None]:
tag_groups.keys()

In [None]:
# Re-apply the tag-to-category mapping by tag name (if a tag sits in several groups, the last one wins)
tag_to_category = {tag: tag_group for tag_group, tags in tag_groups.items() for tag in tags}
tags_stats['category'] = tags_stats['tags'].map(tag_to_category)

In [None]:
tags_stats

In [None]:
k = 15  # number of top tags to show
top_k = tags_stats.sort_values('count', ascending=False).head(k)

top_15_tags = alt.Chart(top_k).mark_bar().encode(
    x=alt.X('count:Q', title='count'),
    y=alt.Y('tags:N', sort='-x', title=''),
    tooltip=['tags:N', 'count:Q']
).properties(
    #title=f'Top {k} Primary Tags',
    width=500
)

top_15_tags.save('assets/charts/top_15_tags.json')

In [None]:
top_15_tags

In [None]:
top_k['tags'].values

**Most Frequent Tags**

A first look at Steam’s tagging ecosystem reveals the most common features and genres that define the platform’s catalog. The most frequent tags—Singleplayer, Action, Adventure, and Indie—reflect the foundational role of narrative-driven and independently developed games in Steam’s library. Tags like Atmospheric, Story Rich, and Great Soundtrack point to an emphasis on immersive experiences, while Multiplayer and Co-op signal the continued relevance of shared play. Meanwhile, 2D, Strategy, and Puzzle showcase the variety in form and mechanics that appeal to different gaming audiences.
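The frequency ranking described above amounts to flattening the per-game tag lists and counting. A minimal sketch on invented toy games, assuming (as in this notebook) that `tags` holds a list per game and that missing tag lists are null:

```python
import pandas as pd

# Toy data (invented): two tagged games and one game without tags
toy = pd.DataFrame({
    'name': ['Game A', 'Game B', 'Game C'],
    'tags': [['Singleplayer', 'Indie', '2D'], ['Singleplayer', 'Action'], None],
})

# explode() gives one row per (game, tag); value_counts() sorts tags by frequency
tag_counts = toy.explode('tags')['tags'].dropna().value_counts()
print(tag_counts)
```

Each game contributes at most once to a tag's count as long as its tag list holds no duplicates, so the counts read as "number of games carrying this tag".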

In [None]:
tag_groups.keys()

We are going to consider the following tag categories:
- 'Themes & Moods'
- 'Top-Level Genres' 
- 'Visuals & Viewpoint'
- 'Genres'
- 'Sub-Genres'
- 'Players'
- 'Story'
- 'Level Design'
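For a static view of the same selection, the most frequent tags in each chosen category can be pulled out with a sort followed by a grouped `head`. The sketch below runs on an invented stand-in for `tags_stats` (columns `tags`, `category`, `count`); `selected` and `TOP_N` are illustrative names, and the "Other" category exists only in the toy data.

```python
import pandas as pd

# Toy stand-in for tags_stats (counts are invented)
toy_stats = pd.DataFrame({
    'tags':     ['Atmospheric', 'Retro', 'Magic', 'Open World', 'Sandbox', 'Crafting'],
    'category': ['Themes & Moods'] * 3 + ['Level Design', 'Level Design', 'Other'],
    'count':    [30, 10, 5, 20, 8, 50],
})

selected = ['Themes & Moods', 'Level Design']
TOP_N = 2

# Keep the selected categories, sort by count within each, then take the first TOP_N rows per category
top_per_category = (
    toy_stats[toy_stats['category'].isin(selected)]
    .sort_values(['category', 'count'], ascending=[True, False])
    .groupby('category')
    .head(TOP_N)
    .reset_index(drop=True)
)
print(top_per_category)
```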

In [None]:
tags_stats

In [None]:
attribute_options = ['Themes & Moods', 'Top-Level Genres', 'Visuals & Viewpoint', 'Genres', 'Sub-Genres', 'Players', 'Story', 'Level Design']

# Dropdown bound to the 'category' field
dropdown = alt.binding_select(options=attribute_options, name='Tag Type: ')
selection = alt.param(name='CategorySelector', bind=dropdown, value=attribute_options[0])

# Manual filter + top-10 ranking
tags_frequency_by_category = alt.Chart(tags_stats).transform_filter(
    alt.datum.category == selection
).transform_window(
    rank='rank(count)',
    sort=[alt.SortField('count', order='descending')],
    groupby=['category']
).transform_filter(
    alt.datum.rank <= 10
).mark_bar().encode(
    x=alt.X('count:Q', title='count'),
    y=alt.Y('tags:N', sort='-x', title=''),
    tooltip=['tags:N', 'count:Q']
).add_params(
    selection
).properties(
    width = 600,
    padding={"left": 100, "top": 10, "right": 100, "bottom": 10}
)

tags_frequency_by_category.save('assets/charts/tags_frequency_by_category.json')

In [None]:
tags_frequency_by_category

Steam’s most common tags reveal a platform defined by variety and player-driven exploration. Immersive themes like Atmospheric, Fantasy, and Sci-fi dominate, reflecting a strong appetite for escapism — while humor, nostalgia, and strategic tension provide tonal range.

At the genre level, Action and Adventure lead, but the prominence of Indie and Casual shows the platform’s openness to creativity and accessibility. Visual styles range from colorful 2D pixel art to stylized 3D, with First and Third-Person perspectives shaping player experience.

Steam balances reflex and reflection: Shooters and Platformers share space with Turn-Based Strategy and Visual Novels. Subgenres like Metroidvania, Puzzle-Platformer, and Rogue-lite highlight players’ appetite for layered, iterative gameplay.

In player modes, Singleplayer dominates, but Co-op and PvP tag prevalence confirms the rise of social and team-based experiences. Story-related tags like Story Rich, Choices Matter, and Visual Novel underscore demand for narrative depth and agency.

Finally, level design shows a split between freedom and structure — Open World and Sandbox thrive, but Side Scroller and Linear formats remain vital.

In [None]:
tags_stats[tags_stats['category'] == 'Level Design'].sort_values('count', ascending = False)[:10].reset_index(drop=True)[['tags', 'count', 'category']]

**Themes & Moods: Atmosphere and Nostalgia Lead the Way**

Among theme and mood tags, Atmospheric stands far ahead with 1,576 games — more than triple the count of any other in this category. This dominance suggests that immersive, mood-driven experiences are a core design focus and a major draw for players.

Behind it, Retro and Family Friendly stand out with over 450 titles each, reflecting two contrasting but popular directions: one rooted in nostalgia, the other in broad accessibility.

Other notable themes include Tactical, Mystery, and Relaxing, each with a few hundred games, showing that cerebral and calming experiences maintain solid representation. Tags like Magic, Surreal, Emotional, and Cyberpunk round out the list with smaller but still meaningful counts — indicating a steady interest in fantasy, introspection, and stylized futurism.

Overall, the spread highlights how mood and theme are central to genre identity, with both mainstream and niche emotional tones well represented in the Steam catalog.


**Top-Level Genres: Adventure and Indie Dominate the Landscape**

Within the top-level genres, Adventure leads by a wide margin, appearing in over 2,600 games — a strong signal of its versatility and broad appeal across game styles and audiences.

Indie follows with more than 2,000 titles, highlighting the significant role independent development plays in shaping the gaming ecosystem. Its presence across many genres underscores its creative diversity and growing influence.

RPG and Strategy come next, each with over 1,000 games. These genres cater to players who seek depth, progression, and tactical thinking, confirming their enduring popularity.

Finally, Casual rounds out the list with just under 1,000 games, representing accessible, low-commitment experiences that appeal to a wide demographic — from newcomers to those looking for relaxed gameplay.

In sum, this distribution reflects a healthy balance between narrative exploration, strategic depth, and inclusive design, with indie development continuing to thrive across all categories.


**Visuals & Viewpoint: Diverse Styles and Perspectives Define the Field**

The 2D tag leads in volume with over 1,100 games, showing the continued popularity of flat, often stylized aesthetics that prioritize clarity and artistic charm. It's closely followed by Third Person and First-Person perspectives, both highly represented, reflecting their foundational role in shaping player immersion and control.

Tags like Colorful, Pixel Graphics, and Cute highlight a strong appetite for visually distinctive and emotionally engaging styles, often associated with indie or family-friendly games. Their high counts suggest that players are drawn not just to realism but also to expressive, personality-driven design.

While 3D remains central, it trails behind 2D in tag count, reinforcing the idea that aesthetic variety often outweighs technical complexity in shaping player preference. Meanwhile, Stylized, Isometric, and Top-Down point to niche but thriving design approaches that offer unique spatial perspectives.

Altogether, the data reflects a rich ecosystem where multiple visual styles and viewpoints coexist — serving both artistic expression and gameplay clarity.


**Genres: Classic Formats with Enduring Presence**

Puzzle and Platformer games top the genre chart by count, showing that players continue to gravitate toward mechanics-driven experiences built on clarity and challenge. These genres have proven adaptability across generations, often serving as entry points for both players and developers.

Arcade and Point & Click also maintain strong representation, reflecting enduring appeal for quick reflex gameplay and narrative-rich interaction, respectively.

More complex strategy genres — like Turn-Based, RTS, and Grand Strategy — appear in smaller numbers but remain a vital part of the ecosystem, offering depth and long-term engagement. Rogue-like, with its procedural challenge and replayability, holds a solid middle ground, bridging traditional mechanics with modern twists.

Overall, the data shows that while some genres scale through accessibility, others thrive through specialization — all contributing to a well-rounded gaming landscape.


**Sub-Genres: Precision, Challenge, and Popular Mechanics**

First-Person Shooters (FPS) lead the sub-genre category by count, reflecting their long-standing popularity and adaptability across game generations. Hack and Slash follows closely, emphasizing fast-paced combat and visceral gameplay.

Puzzle-Platformers, Metroidvanias, and Platformers (2D and 3D) show strong representation as well, highlighting continued interest in spatial design, timing, and skillful movement — especially in indie and mid-scale development spaces.

Meanwhile, the Rogue-lite and Action Roguelike tags reflect a strong appetite for replayability and procedural challenge, combining intensity with variety. Dungeon Crawlers round out the list, pointing to a niche but enduring taste for exploration-heavy, loot-driven design.

Together, these sub-genres demonstrate that players consistently value mastery, tight feedback loops, and layered design — whether through action, puzzles, or procedural systems.


**Player Modes: Solo Dominance, Rich Social Variety**

Singleplayer leads by a wide margin, appearing in nearly 3,500 games — a clear sign that solo experiences remain the cornerstone of PC gaming. It reflects strong demand for immersive, self-paced play, often tied to narrative or exploration.

Multiplayer and Co-op follow as the most common social modes, showing that shared gameplay — whether competitive or collaborative — is also deeply embedded in game design. Notably, Local Co-Op and Local Multiplayer both have sizable counts, underscoring continued interest in couch play despite the rise of online platforms.

Online Co-Op and Team-Based modes highlight more coordinated, goal-oriented playstyles, while the smaller tags — 4 Player Local, Massively Multiplayer, and Co-op Campaign — suggest focused but passionate niches for specific formats.

Overall, while multiplayer options are diverse and well-represented, singleplayer remains dominant — showing that even in an increasingly connected world, many players still seek deeply personal gaming experiences.


**Story Tags: Rich Narratives Lead, Interactive Depth Follows**

"Story Rich" dominates this category with over 1,500 games, emphasizing just how central deep narrative experiences are to the gaming landscape. Players consistently gravitate toward games that offer substantial storytelling, emotional depth, and worldbuilding.

Beyond that, tags like "Choices Matter" and "Multiple Endings" show strong representation, reflecting a clear interest in agency and replayability. These elements suggest that players don’t just want to be told a story — they want to shape it.

More niche tags like "Visual Novel," "Interactive Fiction," and "Choose Your Own Adventure" cater to text-heavy or choice-driven experiences, attracting dedicated audiences even with lower overall counts. Themes such as "Historical," "Detective," and "Romance" round out the list, offering narrative variety and signaling demand for specific storytelling genres.

In sum, story remains a cornerstone of game design — with players valuing both depth and interactivity in the experiences they choose.


**Level Design: Freedom Dominates, But Structure Persists**

Open-world and exploration-based games lead in volume, with 856 and 817 titles respectively. This highlights a strong player preference for freedom, discovery, and emergent gameplay — features that have become defining elements in modern design.

Tags like "Sandbox" and "Side Scroller" show a healthy presence too, each offering a different take on interaction: one emphasizing player-driven creativity, the other precision and pacing. Meanwhile, more structured formats like "Linear" and "Nonlinear" appear less frequently, yet still carve out space for tightly crafted or branching experiences.

Overall, the data suggests that while open-ended design dominates in quantity, there's still demand for deliberate, curated environments — offering a balance between autonomy and authorship in level design.

In [None]:
# Build per-tag statistics separately for each popularity/appreciation quadrant
tags_stats_high_pop_high_appr = tags_stats_builder(df[df['quadrant'] == 'HP-HA'])
tags_stats_high_pop_low_appr = tags_stats_builder(df[df['quadrant'] == 'HP-LA'])
tags_stats_low_pop_high_appr = tags_stats_builder(df[df['quadrant'] == 'LP-HA'])
tags_stats_low_pop_low_appr = tags_stats_builder(df[df['quadrant'] == 'LP-LA'])

# Label each table with its quadrant so the rows stay identifiable after stacking
tags_stats_high_pop_high_appr['quadrant'] = 'HP-HA'
tags_stats_high_pop_low_appr['quadrant'] = 'HP-LA'
tags_stats_low_pop_high_appr['quadrant'] = 'LP-HA'
tags_stats_low_pop_low_appr['quadrant'] = 'LP-LA'

# Stack the four tables into one long table used by the quadrant chart below
tags_stats_pop_appr = pd.concat([tags_stats_high_pop_high_appr, tags_stats_high_pop_low_appr, tags_stats_low_pop_high_appr, tags_stats_low_pop_low_appr])

In [None]:
tags_stats[(tags_stats['count'] > 100) & (tags_stats['category'] == 'Themes & Moods')].sort_values('median_popularity', ascending = False)[:10].reset_index(drop=True)

In [None]:
tags_stats

In [None]:
#attribute_options = ['Themes & Moods', 'Top-Level Genres', 'Visuals & Viewpoint', 'Genres', 'Sub-Genres', 'Players', 'Story', 'Level Design']
attribute_options = ['Genres', 'Sub-Genres']

# Dropdowns bound to the 'category' and 'quadrant' fields
dropdown1 = alt.binding_select(options=attribute_options, name='Tag Type: ')
dropdown2 = alt.binding_select(options=['HP-HA', 'HP-LA', 'LP-HA', 'LP-LA'], name='Quadrant: ')
selection1 = alt.param(name='CategorySelector1', bind=dropdown1, value=attribute_options[0])
selection2 = alt.param(name='CategorySelector2', bind=dropdown2, value='HP-HA')


chart = alt.Chart(tags_stats_pop_appr).add_params(
    selection1,
    selection2
).transform_filter(
    (alt.datum.category == selection1) & (alt.datum.quadrant == selection2)
).transform_window(
    rank='rank(count)',
    sort=[alt.SortField('count', order='descending')],
    groupby=['category']
).transform_filter(
    alt.datum.rank <= 10
).mark_bar().encode(
    x=alt.X('count:Q'),
    y=alt.Y('tags:N', sort='-x', title=''),
    tooltip=['tags:N', 'count:Q']
).properties(width=600)


top_genres_subgenres_by_quadrant = chart

top_genres_subgenres_by_quadrant = top_genres_subgenres_by_quadrant.properties(
    # Add padding on the right side of the overall view
    padding={"left": 30, "top": 10, "right": 100, "bottom": 10}
)

top_genres_subgenres_by_quadrant.save('assets/charts/top_genres_subgenres_by_quadrant.json')

In [None]:
top_genres_subgenres_by_quadrant

In [None]:
tags_stats_pop_appr[(tags_stats_pop_appr['category'] == 'Sub-Genres') & (tags_stats_pop_appr['quadrant'] == 'LP-LA')][['tags', 'count', 'category', 'quadrant']].sort_values('count', ascending=False)[:10].reset_index(drop=True)

**Genres: High Popularity, High Appreciation**

This list showcases genres that consistently hit the sweet spot of High Popularity and High Appreciation (HP-HA) — indicating not just mass appeal, but strong player satisfaction as well.
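For readers who prefer a table to the interactive chart, the same "top tags per quadrant" ranking can be sketched in pandas. This is only an illustration: the column names mirror `tags_stats_pop_appr`, but the toy counts are invented and `top_tags_per_quadrant` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for tags_stats_pop_appr; the counts are invented for illustration.
toy = pd.DataFrame({
    "tags": ["Puzzle", "Platformer", "Arcade", "Puzzle", "Platformer", "Arcade"],
    "category": ["Genres"] * 6,
    "quadrant": ["HP-HA", "HP-HA", "HP-HA", "LP-LA", "LP-LA", "LP-LA"],
    "count": [120, 90, 60, 200, 210, 150],
})


def top_tags_per_quadrant(stats: pd.DataFrame, category: str, n: int = 10) -> pd.DataFrame:
    """Return the n most frequent tags of one category within each quadrant."""
    subset = stats[stats["category"] == category]
    return (
        subset.sort_values(["quadrant", "count"], ascending=[True, False])
        .groupby("quadrant", sort=False)
        .head(n)  # keeps the first n rows of each quadrant in sorted order
        .reset_index(drop=True)
    )


top2 = top_tags_per_quadrant(toy, "Genres", n=2)
print(top2)
```

Because the ranking uses raw counts, tags that are common overall can surface in several quadrants at once, which is worth keeping in mind when reading the lists that follow.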

Puzzle leads the pack, suggesting that logic-based, mentally stimulating gameplay continues to resonate deeply with a wide audience. Platformers, Arcade, and Point & Click follow closely, highlighting enduring affection for precise, mechanics-driven, and often nostalgic formats. These genres thrive by offering clear goals, tight feedback loops, and accessible design.

Interestingly, strategic genres — like Turn-Based Strategy, RTS, Grand Strategy, and Rogue-like — are also well-represented. Their presence signals a demand for thoughtful, systems-driven play where depth and challenge lead to lasting engagement. Even more niche genres like Rhythm and Beat ’em up appear here, showing that when done well, even less common formats can earn both player admiration and wide traction.

Altogether, these HP-HA genres reflect a strong player preference for games that balance engaging mechanics with high-quality execution — proving that timeless gameplay still holds powerful appeal.

**Genres: High Popularity, Low Appreciation**

This High Popularity–Low Appreciation (HP-LA) list highlights genres that attract attention but don’t always land well with players in terms of satisfaction.

Genres like Arcade, Puzzle, and Platformer continue to draw large player bases, but their lower appreciation suggests either oversaturation, shallow design, or execution that fails to innovate. They remain accessible and familiar, yet may lack the depth or novelty that today’s players increasingly seek.

Notably, strategic genres like RTS, Turn-Based Strategy, and Grand Strategy also appear here — a sign that while there's strong interest in complex systems, these games might struggle with approachability, UX, or evolving player expectations.

The presence of e-sports could point to high visibility and player engagement but also to frustrations with balancing, competitiveness, or limited appeal outside niche audiences.

In essence, these are genres with strong reach but room to grow — and possibly a disconnect between what players expect and what they receive.

**Genres: Low Popularity, High Appreciation**

This Low Popularity–High Appreciation (LP-HA) quadrant highlights genres that may not dominate in visibility but deeply resonate with their audiences.

Genres like Puzzle, Platformer, and Arcade continue to show strong player satisfaction despite lower prominence — signaling that these mechanics-driven experiences offer clarity, challenge, and replayability that loyal fans love. Their enduring appeal often lies in tight design and nostalgic roots.

More niche entries like Point & Click, Rhythm, and Tower Defense may serve smaller communities, but they punch above their weight in appreciation — likely due to focused design, clear goals, and rewarding mechanics.

Strategic and procedural genres like Turn-Based Strategy, RTS, and Rogue-like suggest that when well-executed, even complex systems can generate strong goodwill, even without mainstream appeal.

In short, these are under-the-radar favorites — genres that deliver meaningful experiences and foster deep engagement, even without widespread popularity.

**Genres: Low Popularity, Low Appreciation**

The Low Popularity–Low Appreciation (LP-LA) quadrant reflects genres that struggle in both visibility and player reception, often due to stagnation, oversaturation, or niche appeal without broad resonance.

Tags like Puzzle, Platformer, and Arcade appear here in notable numbers, suggesting that while these classic genres have strong historical roots, some titles may lack the innovation or polish needed to truly engage modern audiences.

More specialized formats like Point & Click, Turn-Based Strategy, and Rogue-like also show up frequently, perhaps weighed down by inconsistent quality or design complexity that doesn’t translate into wider appreciation.

Meanwhile, deeper strategic tags like RTS, Tower Defense, and Grand Strategy reinforce that even long-established genres need reinvention to maintain relevance.

This quadrant serves as a reminder: familiarity alone isn’t enough — consistent quality, fresh ideas, and player-centric design are key to sustaining both popularity and appreciation.

**Sub-Genres: High Popularity, High Appreciation**

This High Popularity–High Appreciation (HP-HA) list of sub-genres showcases formats that not only draw substantial player interest but also deliver highly satisfying experiences.

Leading the pack is FPS, reaffirming its status as a staple with broad appeal and polished execution. Action-heavy styles like Hack and Slash, Third-Person Shooter, and Action Roguelike also feature prominently, highlighting a player preference for fast-paced, skill-based combat when well-designed.

Meanwhile, Metroidvania, Souls-like, and Rogue-lite reflect growing appreciation for layered, exploratory, and challenging gameplay loops. Their presence suggests that complexity, progression, and replayability continue to resonate strongly when combined with strong design fundamentals.

Finally, hybrid and spatial genres like Puzzle-Platformer, 3D Platformer, and Dungeon Crawler round out the list — reinforcing that thoughtful level design and mechanical variety still earn high marks when executed with care.

Overall, these sub-genres strike a strong balance between accessibility, challenge, and player satisfaction.

**Sub-Genres: High Popularity, Low Appreciation**

The High Popularity–Low Appreciation (HP-LA) sub-genres list reflects formats that draw significant attention but may fall short in player satisfaction.

FPS, Hack and Slash, and Third-Person Shooter dominate in count, showing that while action and shooter formats remain widely played, they risk formulaic design, repetition, or lack of innovation — leading to lukewarm reception.

Genres like Souls-like, Rogue-lite, and Metroidvania appear here too, suggesting that their rising popularity may sometimes outpace the quality or polish of their implementations, especially in oversaturated indie markets.

3D Platformer and Dungeon Crawler further reflect this gap, where strong concepts may falter in execution. The presence of Immersive Sim and 2D Platformer hints at niche complexity or dated mechanics not always aligning with broader expectations.

Overall, these sub-genres succeed in attracting players but often struggle to deliver experiences that meet rising standards — pointing to opportunity for refinement and innovation.

**Sub-Genres: Low Popularity, High Appreciation**

The Low Popularity–High Appreciation (LP-HA) sub-genres reveal hidden gems: not widely played, but deeply valued by those who engage with them.

Puzzle-Platformer, 2D Platformer, and Shoot 'Em Up top the list, showing that skill-based, focused experiences still resonate strongly with dedicated audiences. These formats often shine through precise mechanics and tight level design.

Bullet Hell, Metroidvania, and Rogue-lite stand out as niche favorites, often delivering challenge and replayability in ways that reward mastery. Similarly, Hack and Slash and Dungeon Crawler appeal to players who enjoy depth and combat intensity, even if they're less mainstream.

Interestingly, even FPS and 3D Platformer — usually associated with mass appeal — appear here, suggesting that well-crafted entries in familiar formats can still surprise and satisfy.

In short, these sub-genres may not dominate in visibility, but they earn strong appreciation through thoughtful, refined design.


**Sub-Genres: Low Popularity, Low Appreciation**

The Low Popularity–Low Appreciation (LP-LA) sub-genres reveal formats that currently struggle to capture both wide attention and strong acclaim.

Despite their foundational appeal, staples like Hack and Slash, FPS, and Puzzle-Platformer appear here in high numbers — suggesting oversaturation, inconsistent quality, or a lack of innovation that’s causing fatigue among players.

Niche tags like Rogue-lite, Bullet Hell, and Shoot 'Em Up also populate this quadrant. These often demand high skill or feature repetitive gameplay, which may limit their broader appeal and risk polarizing audiences.

Meanwhile, Metroidvania, 3D Platformer, and Dungeon Crawler — known for depth and exploration — may suffer here from uneven execution or difficulty spikes that undercut accessibility.

Overall, this quadrant signals an opportunity: these sub-genres have strong foundations, but need creative refreshes, modern polish, or player-focused refinement to regain momentum and trust.

In [None]:
tags_stats[(tags_stats['Themes & Moods'] == True) & (tags_stats['count'] >= 300)][['tags', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)

**Themes & Moods: Feeling the Game**

Some games capture our attention with mechanics — others hook us through atmosphere and emotional tone. The most frequent themes and moods on Steam reflect players’ deepening appetite for tonal variety, from the whimsical to the intense.

Atmospheric, Fantasy, and Sci-fi top the list in both popularity and appreciation, showing that immersive worlds and speculative settings remain perennial favorites. Notably, Atmospheric titles boast a median popularity of 0.478 and an appreciation of 0.740 — comfortably above both thresholds. These are the kinds of games that pull players in not with action, but ambiance.
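The medians quoted here are per-tag aggregates over every game carrying that tag. As a minimal sketch of how such figures can be derived, assuming each game holds a list of tags alongside the `popularity_score` and `appreciation_score` columns: `tag_medians` is a hypothetical helper written for illustration, not the notebook's own `tags_stats_builder`, and the toy scores are invented.

```python
import pandas as pd

# Toy games table; tags and scores are invented for illustration only.
games = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "tags": [
        ["Atmospheric", "Horror"],
        ["Atmospheric", "Comedy"],
        ["Comedy"],
        ["Horror", "Atmospheric"],
    ],
    "popularity_score": [0.50, 0.40, 0.30, 0.60],
    "appreciation_score": [0.70, 0.80, 0.76, 0.68],
})


def tag_medians(df: pd.DataFrame, min_count: int = 1) -> pd.DataFrame:
    """Median popularity and appreciation per tag, keeping tags with enough games."""
    exploded = df.explode("tags")  # one row per (game, tag)
    stats = (
        exploded.groupby("tags")
        .agg(
            median_popularity=("popularity_score", "median"),
            median_appreciation=("appreciation_score", "median"),
            count=("popularity_score", "count"),
        )
        .reset_index()
    )
    return stats[stats["count"] >= min_count].reset_index(drop=True)


stats = tag_medians(games, min_count=2)
print(stats)
```

The `min_count` filter plays the same role as the `count >= 300` condition in the cell above: medians computed on a handful of games are too noisy to compare.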

Comedy also punches above its weight: both Comedy and Funny tags show not just strong visibility, but some of the highest median appreciation scores across the board (0.755 and 0.750, respectively). In a market often dominated by gritty realism, the success of humor is a striking reminder that levity has power.

At the lower end of the appreciation spectrum are more intense emotional experiences. Horror and Psychological Horror retain solid popularity, but appreciation dips just under the critical 0.71 mark — suggesting that while fear grabs attention, it may divide opinion or wear thin more quickly.

A few quieter standouts like Retro and Mystery perform admirably, especially in terms of appreciation. Retro, with a modest popularity of 0.359, still earns a stellar 0.752 in appreciation — proving that nostalgia, when done right, resonates deeply.

Ultimately, themes and moods help players choose not just what to play, but how to feel. And the data is clear: players appreciate when games make them laugh, wonder, or simply immerse them in beautifully strange worlds.

In [None]:
tags_stats[(tags_stats['Top-Level Genres'] == True) & (tags_stats['count'] >= 100)][['tags', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)

**Top-Level Genres: Popular Giants and Quiet Achievers**

When it comes to defining a game’s identity, top-level genres are the broad strokes — the categories players search first, and often the ones most closely tied to commercial expectations. But the data shows a fascinating divide: popularity doesn't always track with appreciation.

Action, the largest genre by volume (nearly 3,000 titles), predictably enjoys strong visibility, with a median popularity of 0.424 that sits above the threshold. Yet its median appreciation of 0.709 narrowly misses the 0.71 cutoff, hinting that while Action games draw eyes, they don’t always win hearts. A similar pattern appears in Simulation, RPG, and Strategy: all comfortably popular, but just shy of the highest appreciation tier.

By contrast, Adventure games thread the needle: with 0.432 in popularity and 0.724 in appreciation, they’re a rare example of both broad appeal and consistent acclaim. Action-Adventure follows closely, showing players value hybrid experiences that balance exploration with intensity.

Meanwhile, Casual games stand out as stealth favorites. Despite a below-threshold popularity of 0.361, they deliver high appreciation (0.723), proving that smaller, more accessible experiences can still make lasting impressions.

At the other end of the spectrum, Racing and Sports titles face an uphill climb. Both genres fall below 0.71 in appreciation — and in the case of Sports, well below it (0.659). These findings suggest niche audiences and limited appeal outside of enthusiast circles.

Finally, Indie games tell a familiar story: a massive pool of titles (over 2,000), low median popularity (0.344), but still solid appreciation (0.718). The Indie label may not guarantee visibility — but quality is clearly being recognized by those who seek it out.

In the end, the data underscores an essential truth: top-level genre is just the beginning of a game’s story. It may determine the crowd it attracts — but not necessarily the experience they walk away with.
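To make the threshold comparisons above concrete, the sketch below labels four of the genres discussed using the medians quoted in this section. It assumes the 0.39 popularity and 0.71 appreciation thresholds mentioned elsewhere in this analysis, and it assumes that a value equal to a threshold counts as "high"; `label_quadrant` is a hypothetical helper, and applying the game-level quadrant labels to tag-level medians is purely illustrative.

```python
import pandas as pd

POP_THRESHOLD = 0.39   # popularity threshold quoted in the text
APPR_THRESHOLD = 0.71  # appreciation threshold quoted in the text

# Medians quoted in this section for four top-level genres.
genres = pd.DataFrame({
    "tags": ["Action", "Adventure", "Casual", "Indie"],
    "median_popularity": [0.424, 0.432, 0.361, 0.344],
    "median_appreciation": [0.709, 0.724, 0.723, 0.718],
})


def label_quadrant(stats: pd.DataFrame,
                   pop_thr: float = POP_THRESHOLD,
                   appr_thr: float = APPR_THRESHOLD) -> pd.Series:
    """Combine two threshold checks into labels such as 'HP-LA'."""
    # Assumption: values equal to a threshold are treated as "high".
    pop = stats["median_popularity"].ge(pop_thr).map({True: "HP", False: "LP"})
    appr = stats["median_appreciation"].ge(appr_thr).map({True: "HA", False: "LA"})
    return pop + "-" + appr


genres["quadrant"] = label_quadrant(genres)
print(genres[["tags", "quadrant"]])
```

Under these assumptions Action lands in HP-LA, Adventure in HP-HA, and both Casual and Indie in LP-HA, matching the narrative above.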

In [None]:
tags_stats[(tags_stats['Visuals & Viewpoint'] == True) & (tags_stats['count'] >= 200)][['tags', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)

**Visuals & Viewpoints: Pixel Charm, 2D Prestige, and a Popularity Paradox**

Visuals are more than aesthetic — they shape tone, accessibility, and even gameplay itself. And yet, the data on visual styles and camera perspectives reveals a surprising contradiction: the most popular viewpoints aren’t always the most beloved.

Take 2D games. With a popularity just under the 0.39 threshold (0.371), they might be considered a niche in today’s 3D-dominated world. But the appreciation score tells another story — a robust 0.751, placing 2D titles among the most appreciated visual styles. It’s a similar tale for Pixel Graphics and Hand-drawn visuals: low popularity, high admiration. The retro aesthetic, often a hallmark of indie titles, may lack mainstream pull but consistently resonates with players.

Even Cartoony, Colorful, and Cute styles — often dismissed as “casual” or “kiddie” — post impressive appreciation numbers, all above 0.73. Cute, in particular, balances right on the popularity line (0.390), but earns one of the highest approval ratings (0.752). These visuals may not shout for attention, but they quietly charm players once discovered.

On the flip side, 3D and Realistic visuals, while popular (especially Realistic at 0.497), tend to underperform in appreciation. Realistic is the only visual style here to dip well below the 0.71 line (0.681), suggesting that visual fidelity alone no longer wins hearts; it may, in fact, raise expectations that are harder to meet.

In terms of camera perspective, Third Person and First-Person dominate in popularity — unsurprising, as they're standard in most AAA genres. However, their appreciation is middling, hovering around the threshold. Isometric and Top-Down views, while less common, are both above the 0.71 mark, indicating that their appeal may lie in their functional clarity and tactical depth rather than cinematic flair.

Then there’s Anime and Stylized — both blend distinct visual identity with strong appreciation. Anime-style games, in particular, strike a successful balance with high popularity (0.425) and high appreciation (0.734), one of the few tags to do both.

The verdict? While glossy 3D visuals and familiar camera angles bring visibility, it’s the artful, expressive, and nostalgic styles — especially in 2D and stylized formats — that truly leave an impact.

In [None]:
tags_stats[(tags_stats['Genres'] == True) & (tags_stats['count'] >= 200)][['tags', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)

**Genres in Focus: Balancing Popularity and Player Love**

When looking at how genre tags map onto popularity and appreciation, the picture is one of subtle contrasts rather than sharp divides. A few genres like Sandbox and JRPG manage to balance strong appreciation with solid popularity, sitting comfortably in the upper-right quadrant. Shooter and Action RPG also perform well on the popularity front, though their appreciation scores are slightly more muted.

On the other hand, genres like Puzzle, Platformer, and Visual Novel aren’t the most played but stand out for their consistently high appreciation. These tags suggest that while they might cater to smaller audiences, those audiences are deeply satisfied.

At the lower end of both metrics, genres such as Point & Click, Arcade, and especially Walking Simulator show limited reach and more tempered reception—possibly reflecting niche appeal or genre fatigue. Overall, while no genre dominates both dimensions, some carve out strong identities through either cultural relevance, emotional resonance, or lasting design appeal.

In [None]:
tags_stats[(tags_stats['Sub-Genres'] == True) & (tags_stats['count'] >= 150)][['tags', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)

**Subgenres Spotlight: A Tale of Niche Appeal and Broad Reach**

The subgenre tags show a rich variety in how different gameplay styles resonate with players in terms of both popularity and appreciation. A few patterns stand out.

Third-Person Shooter leads in popularity, crossing the upper threshold with a median of 0.56, though it doesn’t quite reach the appreciation benchmark. It’s a genre with mainstream appeal, but perhaps more functional than emotionally resonant. Similarly, FPS and Hack and Slash enjoy broad reach (both above 0.45 in popularity), yet land just below the appreciation threshold, suggesting these fast-paced experiences are more accessible than deeply loved.

On the other hand, Metroidvania, Puzzle-Platformer, and 2D Platformer all exhibit strong appreciation scores (above 0.74), indicating that games in these subgenres—though less dominant in overall player base—tend to leave a lasting impression on those who engage with them. Puzzle-Platformer in particular stands out with one of the highest appreciation medians (0.75), reflecting the value players place on creativity and design nuance.

Exploration, Rogue-lite, and 3D Platformer hover near the appreciation threshold and maintain moderate popularity, showing a balanced appeal. Meanwhile, subgenres like Action Roguelike and Turn-Based Tactics fall just short on both axes, indicating niche status despite their loyal fanbases.

Altogether, these results reinforce that subgenres focused on tight design and inventive mechanics tend to perform well in appreciation, even when they don’t dominate in popularity. Conversely, highly popular subgenres may sometimes sacrifice distinctiveness or emotional engagement for broader reach.

In [None]:
tags_stats[(tags_stats['Players'] == True) & (tags_stats['count'] >= 100)][['tags', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop=True)

**Player Modes: The Social Pulse of Gaming**

The distribution of player-oriented tags offers a clear picture of how different play modes perform in terms of popularity and appreciation, with some striking imbalances between the two dimensions.

Team-Based, Online Co-Op, and Massively Multiplayer stand out on the popularity front, all with a median popularity above 0.5. However, none of them cross the appreciation benchmark, suggesting that while these modes attract many players, they might not deliver consistently memorable or satisfying experiences. This is particularly noticeable for Massively Multiplayer, which is the least appreciated tag in the set despite its high popularity, a sign of potential issues with quality or coherence in large-scale multiplayer games.

In contrast, Singleplayer emerges as a strong performer on the appreciation side (0.73), and while it’s not one of the most popular tags, it still enjoys a solid player base, especially given its massive count. Similarly, Local Co-Op and Local Multiplayer—though less prominent overall—show relatively high appreciation, pointing to a persistent affection for shared-screen or couch co-op formats. This suggests that proximity and social play still hold unique value for many players.

Co-op and Multiplayer in general maintain a healthy balance, with both metrics above the thresholds. These modes appear to hit the sweet spot—broadly appealing and decently appreciated—likely due to their flexibility and ubiquity across genres.

PvP and 4 Player Local, meanwhile, fall into a middle-ground zone: neither disliked nor particularly distinguished. These results may reflect niche appeal or experiences that depend more on external factors (like player community or match quality) than on game design alone.

Overall, these data hint at a recurring theme: social and cooperative formats are popular but not always deeply appreciated, while singleplayer and local experiences, though more contained, often resonate more meaningfully with players.

In [None]:
tags_stats[(tags_stats['Story'] == True) & (tags_stats['count'] >= 100)][['tags', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop=True)

**Storytelling in Games: Where Player Choice Meets Rich Narratives**

Story-driven elements remain a vital pillar in game design, shaping immersive worlds and emotional connections that captivate players.

Leading the pack is the Story Rich tag, boasting both high popularity (0.466) and strong appreciation (0.748) across a vast catalog of 1,532 games. This confirms that players crave deep, well-crafted narratives that offer expansive storytelling and meaningful engagement.

Closely tied are tags like Choices Matter (popularity 0.438, appreciation 0.741) and Multiple Endings (0.428 / 0.744), both emphasizing player agency and branching storylines. These features resonate with players who want their decisions to impact the game world, enhancing replayability and emotional investment.

Genres centered on narrative interactivity such as Visual Novel (0.393 popularity, 0.741 appreciation), Interactive Fiction (0.376 / 0.729), and Choose Your Own Adventure (0.383 / 0.727) maintain solid appreciation even though their popularity sits at or just below the threshold. This suggests a niche but dedicated audience that values story-first gameplay.

Themes like Detective (0.400 popularity, 0.739 appreciation) and Historical settings (0.432 / 0.724) also find their footing, indicating players’ appetite for mystery, investigation, and richly contextualized worlds that blend narrative with setting.

Even tags with more modest popularity, such as Romance (0.413 popularity, 0.737 appreciation) and the below-threshold Narration (0.369 / 0.737), maintain high appreciation scores, showcasing how effective storytelling and emotional themes captivate those who experience them.

Overall, story-focused games demonstrate that strong narrative design and player choice remain central to player enjoyment. While not every narrative tag attracts mass popularity, those that do tend to also enjoy high appreciation, underscoring storytelling’s role in creating memorable and impactful gaming experiences.

In [None]:
tags_stats[(tags_stats['Level Design'] == True) & (tags_stats['count'] >= 100)][['tags', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop=True)

**Level Design Trends: Freedom Leads, But Structure Still Shines**

In the evolving landscape of video game design, level structure plays a pivotal role in shaping how players engage with game worlds. Whether it’s a tightly guided experience or a sprawling world ripe for discovery, the data reveals a strong appetite for a range of design philosophies.

Open World titles lead the popularity chart with a median score of 0.549, indicating their dominance in the modern gaming imagination. Despite their broad appeal, they register a slightly lower appreciation score of 0.718 — just above the threshold. This suggests that while players are drawn to freedom and scale, delivering consistent quality across such massive spaces remains a creative challenge.

Meanwhile, Sandbox design — often overlapping with open world but emphasizing systemic freedom — performs exceptionally well, with both high popularity (0.527) and strong appreciation (0.734). This reflects a clear enthusiasm for games that allow experimentation and emergent gameplay, where players craft their own stories and solutions.

Interestingly, Exploration as a design focus also maintains solid footing, with moderate popularity (0.430) and high appreciation (0.729). It underscores a broader trend: players enjoy being given the space to uncover secrets and immerse themselves in richly layered environments.

By contrast, Linear level design, though far less popular (0.362), boasts one of the highest appreciation scores (0.745). This disparity suggests that while linear games might not be as widely produced or sought-after, when done well, they resonate deeply with players seeking tight pacing, cinematic flow, and refined control over narrative delivery.

Side Scroller design straddles the middle ground: modest popularity (0.368, just under the threshold) but strong appreciation (0.736), pointing to enduring love for classic forms reimagined with modern mechanics.

Overall, the data points to a healthy diversity in player preferences. Whether expansive and emergent or compact and curated, great level design — when aligned with player expectations — remains one of the most appreciated elements in game development today.

In [None]:
df['popularity_plus_appreciation'] = df['popularity_score'] + df['appreciation_score']

In [None]:
# Drop rows without tags
df_clean = df.dropna(subset=['tags']).copy(deep=True)

# Create a 'tag_pairs' column with every 2-tag combination
df_clean['tag_pairs'] = df_clean['tags'].apply(
    lambda tags: list(combinations(sorted(set(tags)), 2)) if len(tags) >= 2 else []
)

# Explode into one row per pair (games with no pairs become NaN and are dropped)
df_pairs = df_clean.explode('tag_pairs').dropna(subset=['tag_pairs'])

# Extract the individual tags for readability
df_pairs['tag1'] = df_pairs['tag_pairs'].apply(lambda x: x[0])
df_pairs['tag2'] = df_pairs['tag_pairs'].apply(lambda x: x[1])

In [None]:
tag_pairs_stats = df_pairs.groupby(['tag1', 'tag2']).agg(
    median_popularity=('popularity_score', 'median'),
    median_appreciation=('appreciation_score', 'median'),
    median_popularity_plus_appreciation=('popularity_plus_appreciation', 'median'),
    median_popularity_minus_appreciation=('popularity_minus_appreciation', 'median'),
    count=('popularity_score', 'count')
).reset_index()

In [None]:
tag_pairs_stats.sort_values('count', ascending=False)[:10]

In [None]:
tag_pairs_stats[tag_pairs_stats['count'] >= 100].sort_values(by='median_popularity', ascending=False).reset_index(drop=True)[:10]

In [None]:
tag_pairs_stats.sort_values(by='count', ascending=False).reset_index(drop=True)[['tag1','tag2','count']][:10]

In [None]:
# Create tag pair labels
tag_pairs_stats['tag_pair'] = tag_pairs_stats['tag1'] + ' + ' + tag_pairs_stats['tag2']

alt.Chart(tag_pairs_stats.sort_values(by='count', ascending=False).reset_index(drop=True)[:100]).mark_rect().encode(
    y=alt.Y('tag1:N', title=''),
    x=alt.X('tag2:N', title='', axis=alt.Axis(labelAngle=270)),
    color=alt.Color('count:Q',
                    scale=alt.Scale(scheme='orangered', reverse=False),
                    title='Count'),
    tooltip=[
        'tag1', 'tag2',
        alt.Tooltip('count:Q'),
    ]
).properties(
    width=600,
    height=600
)

The most popular tag pairs blend immersion, freedom, and interaction. Leading combinations like Atmospheric + Third-Person Shooter or Open World + Stealth highlight the appeal of moody environments, tactical movement, and player agency.

Action-focused duos such as Shooter + Story Rich or FPS + Story Rich show growing demand for narrative depth in traditionally mechanic-heavy genres. Meanwhile, Online Co-Op pairings with immersive or first-person tags reflect the strong draw of shared, engaging experiences.

Across the board, high popularity emerges not from individual features, but from well-matched combinations that offer both intensity and depth — emotional, strategic, or social.

**Power Duos: The Tag Pairs Behind Gaming’s Most Popular Experiences**

In the vast and ever-growing landscape of video game design, certain combinations of features strike a chord so strongly with players that they become magnetic. Analyzing the top-performing tag pairs based on median popularity, a clear pattern emerges: success often lies in fusing immersive environments, dynamic perspectives, and cooperative or narrative elements.

At the very top, “Atmospheric + Third-Person Shooter” games take the crown with an astonishing median popularity of 0.655. This pairing, seen in titles like The Last of Us Part II or Control, blends moody, environmental storytelling with fluid combat — a formula that evidently resonates with players looking for depth and intensity.

Close behind is the combo of “Open World + Stealth” (0.647), suggesting that players relish the freedom to move and the thrill of remaining unseen. Think Assassin’s Creed or Metal Gear Solid V, where the sandbox is not just for roaming but for orchestrating silent takedowns.

Open-world brutality also finds an audience: both “Open World + Violent” and “Gore + Open World” appear in the top 10, each with popularity above 0.60. These combinations indicate the continued draw of mature, visceral experiences — a domain dominated by titles like GTA V, Days Gone, or Far Cry.

Narrative depth finds its place too. The pairings “Shooter + Story Rich” and “FPS + Story Rich” boast popularity scores above 0.62, while maintaining high appreciation scores (above 0.77). These numbers reinforce the rise of storytelling in genres traditionally defined by mechanics — a sign of players demanding both gunplay and gravitas.

On the multiplayer front, combinations like “Atmospheric + Online Co-Op” and “First-Person + Online Co-Op” reflect the social shift in modern gaming. These duos offer the best of both worlds: immersive perspectives and the ability to share those experiences with others. Whether it's surviving in eerie post-apocalyptic settings or exploring alien worlds, players crave connection without compromising mood or immersion.

Finally, “Sandbox + Third Person” rounds out the list — a pairing that underscores player love for spatial awareness and creative autonomy, echoing the design philosophies of games like Garry’s Mod, Just Cause, or Red Dead Redemption 2.

In short, these high-performing tag pairings tell a clear story: the most popular experiences are those that successfully blend freedom, immersion, and interaction, often underpinned by narrative or social depth. It’s not just about doing one thing well — it’s about how the right elements reinforce each other to build games players can’t get enough of.
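
To make the ranking above concrete, here is a minimal plain-Python sketch of how a median popularity per tag pair can be computed with a minimum-count filter. The `games` records, their scores, and the `pair_medians` helper are hypothetical stand-ins; the notebook itself works on the full DataFrame and keeps only pairs with at least 100 games.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

# Hypothetical toy records: (tags, popularity_score). Not real data.
games = [
    (["Atmospheric", "Shooter", "Story Rich"], 0.75),
    (["Atmospheric", "Shooter"], 0.5),
    (["Shooter", "Story Rich"], 0.25),
    (["Atmospheric", "Puzzle"], 0.125),
]

def pair_medians(records, min_count=1):
    """Return {pair: (median score, count)} for pairs seen at least min_count times."""
    scores = defaultdict(list)
    for tags, score in records:
        # sorted(set(...)) makes (a, b) and (b, a) map to the same key
        for pair in combinations(sorted(set(tags)), 2):
            scores[pair].append(score)
    return {
        pair: (median(values), len(values))
        for pair, values in scores.items()
        if len(values) >= min_count
    }

# Rank the surviving pairs by median score, highest first
ranked = sorted(
    pair_medians(games, min_count=2).items(),
    key=lambda item: item[1][0],
    reverse=True,
)
print(ranked[0])  # (('Atmospheric', 'Shooter'), (0.625, 2))
```

The minimum-count filter matters because a pair that appears in only a handful of games can reach an extreme median by chance.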

In [None]:
df['popularity_score'].median()

In [None]:
df['appreciation_score'].median()

When trying to categorize tag pairs or triples into the classic four-quadrant split—high vs. low popularity and high vs. low appreciation—a noticeable complication emerges. Unlike single tags, which often fall cleanly into one quadrant, combinations tend to cluster near the dividing lines. Most tag pairs and triples hover around the threshold values, making it difficult to assign them confidently to any one quadrant. This distribution blurs the boundaries and suggests that while combinations do add context, they also dilute clarity. In practice, this makes it harder to extract sharp, actionable insights from composite tags using the quadrant model.
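
One way to quantify how close a combination sits to the dividing lines is to report, next to its quadrant, the distance to the nearest threshold. The sketch below is illustrative only: the `quadrant` and `margin` helpers and the threshold values are assumptions, whereas the notebook uses the dataset medians of `popularity_score` and `appreciation_score` as thresholds.

```python
def quadrant(popularity, appreciation, pop_threshold, app_threshold):
    """Label a combination by which side of each threshold it falls on.

    Values exactly equal to a threshold are counted as "low" here; the
    notebook's strict comparisons would leave them out of every quadrant.
    """
    pop_side = "high" if popularity > pop_threshold else "low"
    app_side = "high" if appreciation > app_threshold else "low"
    return f"{pop_side}-popularity/{app_side}-appreciation"

def margin(popularity, appreciation, pop_threshold, app_threshold):
    """Distance to the nearest dividing line; a small value means a fragile label."""
    return min(abs(popularity - pop_threshold), abs(appreciation - app_threshold))

# Hypothetical thresholds and scores, chosen only for illustration
POP_T, APP_T = 0.5, 0.75
label = quadrant(0.5625, 0.6875, POP_T, APP_T)
distance = margin(0.5625, 0.6875, POP_T, APP_T)
print(label, distance)  # high-popularity/low-appreciation 0.0625
```

Sorting combinations by this margin would put the least ambiguous ones first, which is one possible way to decide which quadrant assignments are worth interpreting.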

In [None]:
tag_pairs_stats[(tag_pairs_stats['count'] >= 100) & (tag_pairs_stats['median_popularity'] > df['popularity_score'].median()) & (tag_pairs_stats['median_appreciation'] > df['appreciation_score'].median())].sort_values(by='median_popularity_plus_appreciation', ascending=False)[['tag1', 'tag2', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop=True)[:10]

Some tag pairings don’t just succeed—they excel, drawing both wide audiences and strong appreciation. Atmospheric + Third-Person Shooter leads with cinematic tension and satisfying action, showing how mood can enhance gameplay. Story Rich combined with Shooter or FPS demonstrates that players crave narrative, even in action-heavy experiences. Customization is another winner: Moddable + Singleplayer or Multiplayer games clearly resonate, offering creative freedom and replay value. And across genres, Great Soundtrack continues to boost emotional impact, whether paired with Sandbox or Third Person play. These combinations offer more than trends—they reflect what players genuinely value.

In [None]:
tag_pairs_stats[(tag_pairs_stats['count'] >= 100) & (tag_pairs_stats['median_popularity'] > df['popularity_score'].median()) & (tag_pairs_stats['median_appreciation'] < df['appreciation_score'].median())].sort_values(by='median_popularity_minus_appreciation', ascending=False)[['tag1', 'tag2', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop=True)[:10]

The data around popular but underappreciated tag pairs offers insight into a recurring dynamic in game design: crowd appeal versus critical engagement. Combinations like Co-op + Violent or Multiplayer + Stealth draw strong interest — their median popularity scores comfortably exceed the threshold — but they consistently fall short in appreciation. These games often promise action and social interaction, yet fail to deliver experiences that resonate or endure.

A similar pattern emerges with Open World games paired with Gore, FPS, or Online Co-Op. Their high visibility suggests marketing hooks and genre familiarity work well to bring players in. However, the appreciation scores remain below par, hinting at a potential fatigue with formulaic execution or bloated design. The thrill of scale and intensity may attract, but without emotional or mechanical depth, satisfaction wanes.

This segment of the market seems caught in a loop: emphasizing mass appeal through well-known mechanics and themes, but rarely earning the lasting regard that comes from originality, balance, or storytelling.

In [None]:
tag_pairs_stats[(tag_pairs_stats['count'] >= 100) & (tag_pairs_stats['median_popularity'] < df['popularity_score'].median()) & (tag_pairs_stats['median_appreciation'] > df['appreciation_score'].median())].sort_values(by='median_popularity_minus_appreciation', ascending=True)[['tag1', 'tag2', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop=True)[:10]

Some games don’t make waves in the mainstream, but they quietly earn deep appreciation from those who play them. Tag pairs like 2D Platformer + Puzzle-Platformer or Arcade + Pixel Graphics recall an era of tight design and visual charm—traits that still resonate strongly with dedicated players. Similarly, combinations like 2D + Singleplayer or Action + Bullet Hell may not dominate charts, but they deliver engaging, skill-driven experiences that foster loyalty. These are the connoisseur’s choices: less visible, but highly valued by their communities.

In [None]:
tag_pairs_stats[(tag_pairs_stats['count'] >= 100) & (tag_pairs_stats['median_popularity'] < df['popularity_score'].median()) & (tag_pairs_stats['median_appreciation'] < df['appreciation_score'].median())].sort_values(by='median_popularity_plus_appreciation', ascending=True)[['tag1', 'tag2', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop=True)[:10]

Some combinations struggle to find a foothold, with low popularity and modest appreciation. Pairs like Indie + Space and Action + VR highlight niche or emerging areas that haven’t yet reached mainstream appeal. Similarly, Arcade + Casual or Singleplayer + VR suggest experiments that haven’t fully connected with players. Even pairs built on traditionally popular genres, like 3D + Action-Adventure and Horror + Indie, show that innovation alone isn’t enough: execution and player interest must align to succeed. These pairs reveal opportunities for growth but also cautionary tales.

In [None]:
valid_tags = df['tags'].dropna()
valid_tags = valid_tags.apply(lambda tags: [tag.replace(' ', '_') for tag in tags])

tag_pairs = []
for tags in valid_tags:
    pairs = combinations(sorted(set(tags)), 2)  # Sort so (a, b) and (b, a) count as the same pair
    tag_pairs.extend(pairs)

# Count the pairs
pair_counter = Counter(tag_pairs)
pair_counter.pop(('[', ']'), None)

In [None]:
# Create a graph
G = nx.Graph()

# Add edges from tag pairs with their frequency as weight
for (tag1, tag2), count in pair_counter.items():
    if count >= 100:
        G.add_edge(tag1, tag2, weight=count/100)

In [None]:
with open("tags_graph.ncol", "w") as f:
    for u, v, data in G.edges(data=True):
        weight = data.get("weight", 1.0)
        f.write(f"{u} {v} {weight}\n")

In [None]:
# Initialize PyVis network
net = Network()

# Load graph from NetworkX
net.from_nx(G)

# Update node properties
for node in net.get_nodes():
    net.get_node(node)['label'] = str(node)
    net.get_node(node)['physics'] = True
    net.get_node(node)['size'] = 100          # Increase node size (default is around 15)
    net.get_node(node)['font'] = {'size': 100}  # Increase label font size

# Enable physics globally
net.toggle_physics(True)

# Adjust physics parameters to increase node distance
net.barnes_hut(spring_length=150)  # Default is around 100; increase to spread nodes more

# Show the graph
net.show('tags_graph.html', notebook=False)

In [None]:
# Drop rows without tags
df_clean = df.dropna(subset=['tags']).copy(deep=True)

# Create a 'tag_triples' column with all combinations of 3 tags
df_clean['tag_triples'] = df_clean['tags'].apply(
    lambda tags: list(combinations(sorted(set(tags)), 3)) if len(tags) >= 3 else []
)

# Explode into one row per triple
df_triples = df_clean.explode('tag_triples').dropna(subset=['tag_triples'])

# Extract the individual tags for readability
df_triples['tag1'] = df_triples['tag_triples'].apply(lambda x: x[0])
df_triples['tag2'] = df_triples['tag_triples'].apply(lambda x: x[1])
df_triples['tag3'] = df_triples['tag_triples'].apply(lambda x: x[2])

In [None]:
tag_triples_stats = df_triples.groupby(['tag1', 'tag2', 'tag3']).agg(
    median_popularity=('popularity_score', 'median'),
    median_appreciation=('appreciation_score', 'median'),
    median_popularity_plus_appreciation=('popularity_plus_appreciation', 'median'),
    median_popularity_minus_appreciation=('popularity_minus_appreciation', 'median'),
    count=('popularity_score', 'count')
).reset_index()

In [None]:
tag_triples_stats[tag_triples_stats['count'] >= 100].sort_values(by='median_popularity', ascending=False)[:10]

In [None]:
tag_triples_stats[tag_triples_stats['count'] >= 100].sort_values(by='median_popularity', ascending=False)[['tag1', 'tag2', 'tag3', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)[:10]

When looking at the most popular tag triples in today’s gaming landscape, a clear picture emerges: players crave scale, spectacle, and story—ideally all at once. The trio of Multiplayer, Story Rich, and Third Person tops the list, showing that narrative ambition isn’t reserved for solo play. Whether it’s a cooperative journey or a competitive arena, players still want a well-told story anchoring their experience.

Other standouts, like Atmospheric, Shooter, Story Rich and Adventure, Shooter, Story Rich, confirm that emotional depth and explosive action are no longer opposing forces—they’re complementary. These combinations are not only highly popular but also sit comfortably above the appreciation threshold, signaling critical and player acclaim.

Even formulas like Open World, Stealth, Singleplayer or Action, Open World, Stealth—which flirt with the edges of high appreciation—highlight a desire for agency and strategic freedom. What’s missing, interestingly, are novelty or niche mechanics. Instead, the data rewards polish, familiarity, and integration of proven elements into cohesive experiences.

This top tier of tag trios doesn’t reinvent the wheel, but it shows us which wheels are turning fastest—and why.
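
A caveat when reading triples: the number of combinations per game grows much faster for triples than for pairs, so the same games are spread over far more groups and fewer triples reach the 100-game minimum. The arithmetic can be checked with a short sketch; the tag counts per game used here are hypothetical, and `combo_counts` is an illustrative helper.

```python
from math import comb

def combo_counts(n_tags):
    """Number of distinct tag pairs and tag triples for a game with n_tags tags."""
    return comb(n_tags, 2), comb(n_tags, 3)

# Hypothetical numbers of tags per game
for n_tags in (5, 10, 20):
    pairs, triples = combo_counts(n_tags)
    print(f"{n_tags} tags -> {pairs} pairs, {triples} triples")
# 5 tags -> 10 pairs, 10 triples
# 10 tags -> 45 pairs, 120 triples
# 20 tags -> 190 pairs, 1140 triples
```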


In [None]:
tag_triples_stats[(tag_triples_stats['count'] >= 100) & (tag_triples_stats['median_popularity'] > df['popularity_score'].median()) & (tag_triples_stats['median_appreciation'] > df['appreciation_score'].median())].sort_values(by='median_popularity_plus_appreciation', ascending=False)[['tag1', 'tag2', 'tag3', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)[:10]

In [None]:
tag_triples_stats[(tag_triples_stats['count'] >= 100) & (tag_triples_stats['median_popularity'] > df['popularity_score'].median()) & (tag_triples_stats['median_appreciation'] < df['appreciation_score'].median())].sort_values(by='median_popularity_minus_appreciation', ascending=False)[['tag1', 'tag2', 'tag3', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)[:10]

In [None]:
tag_triples_stats[(tag_triples_stats['count'] >= 100) & (tag_triples_stats['median_popularity'] < df['popularity_score'].median()) & (tag_triples_stats['median_appreciation'] > df['appreciation_score'].median())].sort_values(by='median_popularity_minus_appreciation', ascending=True)[['tag1', 'tag2', 'tag3', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)[:10]

In [None]:
tag_triples_stats[(tag_triples_stats['count'] >= 100) & (tag_triples_stats['median_popularity'] < df['popularity_score'].median()) & (tag_triples_stats['median_appreciation'] < df['appreciation_score'].median())].sort_values(by='median_popularity_plus_appreciation', ascending=True)[['tag1', 'tag2', 'tag3', 'median_popularity', 'median_appreciation', 'count']].reset_index(drop = True)[:10]

Are some genres or tags more frequent among indie games? And which ones are more frequent among non-indie games?

In [None]:
inverted_tag_groups = {value : key for key, value_list in tag_groups.items() for value in value_list}

In [None]:
indie_tags = df[df['is_indie'] == True].explode('tags')['tags'].value_counts().reset_index()

for idx, row in indie_tags.iterrows():
    if row['tags'] in inverted_tag_groups.keys():
        indie_tags.at[idx, 'group'] = inverted_tag_groups[row['tags']]

In [None]:
alt.Chart(indie_tags[indie_tags['group'] == 'Sub-Genres'].sort_values(by='count', ascending=False)[:10]).mark_bar().encode(
    x = alt.X('count:Q'),
    y = alt.Y('tags:N', sort='-x')
)

In [None]:
indie_tags[indie_tags['group'] == 'Sub-Genres'].sort_values(by='count', ascending=False).reset_index(drop=True)[:10]

In [None]:
not_indie_tags = df[df['is_indie'] == False].explode('tags')['tags'].value_counts().reset_index()

for idx, row in not_indie_tags.iterrows():
    if row['tags'] in inverted_tag_groups.keys():
        not_indie_tags.at[idx, 'group'] = inverted_tag_groups[row['tags']]

In [None]:
alt.Chart(not_indie_tags[not_indie_tags['group'] == 'Sub-Genres'].sort_values(by='count', ascending=False)[:10]).mark_bar().encode(
    x = alt.X('count:Q'),
    y = alt.Y('tags:N', sort='-x')
)

In [None]:
not_indie_tags[not_indie_tags['group'] == 'Sub-Genres'].sort_values(by='count', ascending=False).reset_index(drop=True)[:10]

This comparison between sub-genre tags in indie and non-indie games reveals some clear stylistic and design preferences that distinguish the two spheres.

Indie games show a strong inclination toward mechanically tight, design-driven genres—Puzzle-Platformers, Rogue-lites, Metroidvanias, and 2D Platformers dominate the list. These are genres well-suited to smaller teams, emphasizing innovation in level design, procedural generation, and challenge-based gameplay over expensive production values. The prominence of Action Roguelikes and Shoot 'Em Ups further reflects a preference for replayability, precision mechanics, and retro inspirations—core pillars of many successful indie titles.

On the other hand, non-indie titles lean heavily into cinematic, high-production genres. FPS and Third-Person Shooters top the list, signaling big-budget investments in action, polish, and visual fidelity. Tags like Immersive Sim, Souls-like, and even 3D Platformer appear more frequently here than in the indie set, likely due to the development complexity and resource demands these genres typically require.

Some tags—Hack and Slash, FPS, and Dungeon Crawler—bridge both worlds, but their relative counts show how differently they're emphasized. For example, Metroidvania and 2D Platformer are far more prevalent in indie games, while Souls-like and Immersive Sim are nearly exclusive to non-indie productions.

In short, indie games tend to favor creative, systems-driven subgenres with lower production overhead, while non-indie games lean toward immersive, large-scale action experiences. This reflects not just budget differences, but also differing creative priorities and audience expectations.
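
Because the indie and non-indie groups may differ in size, raw tag counts are not directly comparable between them. A possible complement, sketched here in plain Python on hypothetical toy data, is to compare the share of games in each group that carry a tag. The `tag_shares` helper and both game lists are assumptions made for illustration.

```python
from collections import Counter

def tag_shares(tag_lists):
    """Fraction of games in a group that carry each tag."""
    n_games = len(tag_lists)
    # set(tags) ensures a tag is counted at most once per game
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    return {tag: count / n_games for tag, count in counts.items()}

# Hypothetical groups: each inner list holds the tags of one game
indie = [["Metroidvania", "FPS"], ["Metroidvania"], ["2D Platformer"], ["FPS"]]
non_indie = [["FPS"], ["FPS", "Souls-like"]]

indie_share = tag_shares(indie)
non_indie_share = tag_shares(non_indie)

# FPS has the same raw count (2) in both groups, but a different share
print(indie_share["FPS"], non_indie_share["FPS"])  # 0.5 1.0
```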

In [None]:
# top_count_tags = compute_subgroup_stats(df, 'tags', tag_map, 'tag', 'count')
# top_popularity_tags = compute_subgroup_stats(df, 'tags', tag_map, 'tag', 'median_popularity')
# top_appreciation_tags = compute_subgroup_stats(df, 'tags', tag_map, 'tag', 'median_appreciation')

In [None]:
# Series.min() skips NaN; the built-in min() can give order-dependent results when NaN is present
df['initial_price'].min()

In [None]:
# Series.min() skips NaN; the built-in min() can give order-dependent results when NaN is present
df['current_price'].min()

In [None]:
(df['current_price'] == 0.).value_counts()

In [None]:
df['initial_price'].isna().value_counts()

In [None]:
df['current_price'].isna().value_counts()

In [None]:
((df['initial_price'].isna()) & (df['current_price'].isna())).value_counts()

In [None]:
df['initial_price'].describe()

In [None]:
df['current_price'].describe()

In [None]:
for idx, row in df.iterrows():
    tags = row['tags']
    # Missing tags are stored as NaN (a float); skip those rows
    if isinstance(tags, float):
        continue
    # Games tagged "Free to Play" with no recorded price are treated as free
    if 'Free to Play' in tags and pd.isna(row['initial_price']):
        df.at[idx, 'initial_price'] = 0.0
        df.at[idx, 'current_price'] = 0.0

In [None]:
for idx, row in df.iterrows():
    if row['initial_price'] == 0.:
        df.at[idx, 'initial_price_range'] = 'Free' 
    elif 0. < row['initial_price'] <= 5.:
        df.at[idx, 'initial_price_range'] = '0-5'
    elif 5. < row['initial_price'] <= 10.:
        df.at[idx, 'initial_price_range'] = '5-10'
    elif 10. < row['initial_price'] <= 20.:
        df.at[idx, 'initial_price_range'] = '10-20'
    elif 20. < row['initial_price'] <= 50.:
        df.at[idx, 'initial_price_range'] = '20-50'
    elif 50. < row['initial_price'] <= 100.:
        df.at[idx, 'initial_price_range'] = '50-100'
    

In [None]:
for idx, row in df.iterrows():
    if row['current_price'] == 0.:
        df.at[idx, 'current_price_range'] = 'Free'
    elif 0. < row['current_price'] <= 5.:
        df.at[idx, 'current_price_range'] = '0-5'
    elif 5. < row['current_price'] <= 10.:
        df.at[idx, 'current_price_range'] = '5-10'
    elif 10. < row['current_price'] <= 20.:
        df.at[idx, 'current_price_range'] = '10-20'
    elif 20. < row['current_price'] <= 50.:
        df.at[idx, 'current_price_range'] = '20-50'
    elif 50. < row['current_price'] <= 70.:
        df.at[idx, 'current_price_range'] = '50-70'

In [None]:
df[df['is_indie'] == True]['initial_price_range'].value_counts()

In [None]:
(df['is_indie'] == True).value_counts()

In [None]:
df[df['is_indie'] == False]['initial_price_range'].value_counts()

In [None]:
df['initial_price_range'].unique()

In [None]:
df['current_price_range'].unique()

In [None]:
alt.Chart(df[df['initial_price'].notna()]).mark_bar().encode(
    x = alt.X('initial_price_range:O', sort=['Free','0-5', '5-10', '10-20', '20-50', '50-100']),
    y = alt.Y('count()')
).properties(
    width = 300
)

In [None]:
alt.Chart(df[df['current_price'].notna()]).mark_bar().encode(
    x = alt.X('current_price_range:O', sort=['Free','0-5', '5-10', '10-20', '20-50', '50-70']),
    y = alt.Y('count()')
).properties(
    width = 300
)

In [None]:
alt.Chart(df[df['initial_price'].notna()]).mark_line(point=True).encode(
    x = alt.X('initial_price_range:O', sort=['Free','0-5', '5-10', '10-20', '20-50', '50-100']),
    y = alt.Y('median(popularity_score):Q')
).properties(
    width=300
)

In [None]:
alt.Chart(df[df['current_price'].notna()]).mark_line(point=True).encode(
    x = alt.X('current_price_range:O', sort=['Free','0-5', '5-10', '10-20', '20-50', '50-70']),
    y = alt.Y('median(popularity_score):Q')
).properties(
    width=300
)

In [None]:
alt.Chart(df[df['initial_price'].notna()]).mark_line(point=True).encode(
    x = alt.X('initial_price_range:O', sort=['Free','0-5', '5-10', '10-20', '20-50', '50-100']),
    y = alt.Y('median(appreciation_score):Q')
).properties(
    width=300
)

In [None]:
alt.Chart(df[df['current_price'].notna()]).mark_line(point=True).encode(
    x = alt.X('current_price_range:O', sort=['Free','0-5', '5-10', '10-20', '20-50', '50-70']),
    y = alt.Y('median(appreciation_score):Q')
).properties(
    width=300
)

In [None]:
tags_by_initial_price = df.explode('tags').groupby(['initial_price_range', 'tags']).agg(count = ('tags', 'count')).reset_index()
tags_by_initial_price = tags_by_initial_price[~tags_by_initial_price['tags'].isin(['[', ']'])]
tags_by_initial_price['tag_group'] = tags_by_initial_price['tags'].apply(lambda tag: inverted_tag_groups[tag])

In [None]:
tags_by_initial_price

In [None]:
count = 0
for real_tag in df.explode('tags')['tags'].unique():
    if real_tag in inverted_tag_groups.keys():
        count += 1

print(len(df.explode('tags')['tags'].unique()), count)

In [None]:
tags_stats_free = df[df['initial_price_range'] == 'Free'].explode('tags').groupby('tags').agg(
    total_reviews = ('reviews_total', 'sum'),
    avg_popularity=('popularity_score', 'mean'),
    avg_appreciation=('appreciation_score', 'mean'),
    median_popularity=('popularity_score', 'median'),
    median_appreciation=('appreciation_score', 'median'),
    count=('popularity_score', 'count')
).reset_index()

In [None]:
for tag_group in tag_groups.keys():
    for idx, tag in enumerate(tags_stats_free['tags'].values):
        if tag in tag_groups[tag_group]:
            tags_stats_free.at[idx, tag_group] = True
        else:
            tags_stats_free.at[idx, tag_group] = False

In [None]:
tag_groups.keys()

In [None]:
alt.Chart(tags_stats_free[tags_stats_free['Primary_Genres'] == True]).mark_bar().encode(
    x=alt.X('count:Q', title='Frequency of the tag'),
    y=alt.Y('tags:N', sort='-x', title='Tag'),
    tooltip=['tags:N', 'median_popularity:Q', 'count:Q']
).properties(
    title='Frequency of Primary Genres Tags'
)

In [None]:
alt.Chart(tags_stats_free[tags_stats_free['Core_Gameplay_Mechanics'] == True]).mark_bar().encode(
    x=alt.X('count:Q', title='Frequency of the tag'),
    y=alt.Y('tags:N', sort='-x', title='Tag'),
    tooltip=['tags:N', 'median_popularity:Q', 'count:Q']
).properties(
    title='Frequency of Core Gameplay Mechanics Tags'
)

In [None]:
alt.Chart(tags_stats_free[tags_stats_free['Subgenre_Combinations'] == True]).mark_bar().encode(
    x=alt.X('count:Q', title='Frequency of the tag'),
    y=alt.Y('tags:N', sort='-x', title='Tag'),
    tooltip=['tags:N', 'median_popularity:Q', 'count:Q']
).properties(
    title='Frequency of Subgenre Combinations Tags'
)

How are indie games distributed according to price?
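
The answer relies on the `initial_price_range` buckets assigned earlier. As a compact restatement of that bucketing rule, the following plain-Python sketch maps a price to its range label with a binary search over the upper bin edges. The `price_range` helper and the `EDGES` and `LABELS` names are illustrative and not part of the notebook, which assigns the ranges row by row.

```python
from bisect import bisect_left
import math

# Upper (inclusive) edges of the paid price ranges and their labels
EDGES = [5.0, 10.0, 20.0, 50.0, 100.0]
LABELS = ["0-5", "5-10", "10-20", "20-50", "50-100"]

def price_range(price):
    """Return the range label for a price, or None if it is missing or out of range."""
    if price is None or math.isnan(price) or price < 0:
        return None
    if price == 0:
        return "Free"
    # bisect_left finds the first edge >= price, so each bin is (lower, upper]
    idx = bisect_left(EDGES, price)
    return LABELS[idx] if idx < len(LABELS) else None

print(price_range(0.0), price_range(5.0), price_range(19.99))  # Free 0-5 10-20
```

As in the notebook's version, prices above the last edge and missing prices get no label, so they drop out of any count by range.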

In [None]:
df[df['is_indie'] == True]['initial_price_range'].value_counts().reset_index()

In [None]:
alt.Chart(df[df['is_indie'] == True]['initial_price_range'].value_counts().reset_index()).mark_bar().encode(
    x = alt.X('count:Q'),
    y = alt.Y('initial_price_range:N', sort='-x')
)

In [None]:
df[df['is_indie'] == False]['initial_price_range'].value_counts()

In [None]:
alt.Chart(df[df['is_indie'] == False]['initial_price_range'].value_counts().reset_index()).mark_bar().encode(
    x = alt.X('count:Q'),
    y = alt.Y('initial_price_range:N', sort='-x')
)

If there's one takeaway from our analysis, it's this: there is no universal recipe for making a successful game. While our predictive models uncovered consistent patterns—like the strong influence of wishlist interest, the positive role of localization and accessibility, or the impact of some tags—they are not crystal balls. The models can detect statistical associations, not causal relationships. They help us understand what tends to matter, but not why or how it leads to success in every case.

Crucially, our findings show that success—whether measured as popularity or appreciation—can take many forms. AAA games, with their broad platform releases and marketing power, often dominate in popularity. But they are not always the most appreciated. In fact, our model highlights a curious quadrant: games that gain a lot of attention but receive lukewarm reception. These are often large-scale titles that may attract players initially but fail to deliver a satisfying experience over time.

In contrast, some indie games quietly achieve high appreciation with far fewer resources. Titles like Freud’s Bones, for example, are testament to how originality, thoughtful design, and strong narrative identity can create deeply resonant experiences. Our analysis found that features often associated with indie development appear frequently in games with high appreciation but limited popularity. This suggests that innovation and emotional connection matter just as much as scale.

In the end, what players truly value seems to go beyond big budgets or mainstream visibility. Games that are crafted with care, that express a clear creative vision, and that dare to be different can find strong appreciation—even if they don’t dominate the charts. Success in games, as in art, is multifaceted. And perhaps that’s exactly what makes this medium so exciting.

In [None]:
df.to_csv('steam_games_dataset.csv', index=False)