### The Prevalence of Adventure Games in Industry and Market - Data Storytelling

In [2]:
import pandas as pd
games = pd.read_csv('games_cleaned.csv')

In [3]:
games

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Genre_Weight,Team_Cluster,Summary_Cluster,Embeddings
0,Elden Ring,2022-02-25,"['Bandai Namco Entertainment', 'FromSoftware']",4.5,3900,Adventure RPG,"Elden Ring is a fantasy, action and open world...","[""The first playthrough of elden ring is one o...",17000,3800,4600,4800,218.852570,8,4,[[-3.41334455e-02 -4.38266108e-03 1.30809518e...
1,Hades,2019-12-10,['Supergiant Games'],4.3,2900,Adventure Brawler Indie RPG,A rogue-lite hack and slash dungeon crawler in...,[convinced this is a roguelike for people who ...,21000,3200,6300,3600,26.383694,8,0,[[-2.78440397e-02 9.41349193e-03 6.61249757e...
2,The Legend of Zelda: Breath of the Wild,2017-03-03,"['Nintendo', 'Nintendo EPD Production Group No...",4.4,4300,Adventure RPG,The Legend of Zelda: Breath of the Wild is the...,[This game is the game (that is not CS:GO) tha...,30000,2500,5000,2600,218.852570,0,16,[[ 0.01696528 0.00396071 -0.01538952 -0.00813...
3,Undertale,2015-09-15,"['tobyfox', '8-4']",4.2,3500,Adventure Indie RPG Turn Based Strategy,"A small child falls into the Underground, wher...",[soundtrack is tied for #1 with nier automata....,28000,679,4900,1800,82.538730,6,5,[[-1.72836576e-02 -2.89896205e-02 -2.97180321e...
4,Hollow Knight,2017-02-24,['Team Cherry'],4.4,3000,Adventure Indie Platform,A 2D metroidvania with an emphasis on close co...,"[""this games worldbuilding is incredible with ...",21000,2400,8300,2300,80.642260,3,0,[[-5.08508086e-02 -1.92051195e-03 4.75870864e...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1080,Back to the Future: The Game,2010-12-22,['Telltale Games'],3.2,94,Adventure Point-and-Click,Back to the Future: The Game is one of Telltal...,[Very enjoyable game. The story adds onto the ...,763,5,223,67,17.368029,8,10,[[ 1.86578103e-03 -3.67635973e-02 -2.50637764e...
1081,Team Sonic Racing,2019-05-21,"['Sumo Digital', 'Sega']",2.9,264,Arcade Racing,Team Sonic Racing combines the best elements o...,"[jogo morto mas bom not my cup of tea ""Compare...",1500,49,413,107,38.009930,6,14,[[-0.07032053 0.05564697 0.0136536 -0.01316...
1082,Dragon's Dogma,2012-05-22,['Capcom'],3.7,210,Brawler RPG,"Set in a huge open world, Dragon’s Dogma: Dark...",[Underrated. A grandes rasgos es como un MMO p...,1100,45,487,206,26.842472,6,11,[[-4.33658175e-02 -1.11472672e-02 -4.25991192e...
1083,Baldur's Gate 3,2020-10-06,['Larian Studios'],4.1,165,Adventure RPG Strategy Tactical Turn Based Str...,"An ancient evil has returned to Baldur's Gate,...",[Bu türe bu oyunla girmeye çalışmak hataydı sa...,269,79,388,602,45.836210,6,10,[[ 8.49182624e-03 -2.68020015e-02 1.04696816e...


For this exercise, I'll be using a cleaned version of the games dataset, originally from Kaggle, which is a collection of the 1000 most popular games from a website called Backloggd. Backloggd is a virtual game library system where users can create personalized collections of their favorite games and explore new ones.

As identifiers, we have the title of the game, as well as its genre. A game can belong to one or more genres.

First, let's start with Times Listed. Which genres are the most popular on the site?

In [16]:
import plotly.express as px
import pandas as pd

genre_data = games.groupby('Genres').agg({
    'Title': 'count',
    'Times Listed': 'mean'
}).reset_index()

genre_data.columns = ['Genre', 'Number of Games', 'Average Times Listed']

genre_data = genre_data.sort_values('Average Times Listed', ascending=False)

fig = px.scatter(
    genre_data,
    x='Genre',
    y='Average Times Listed',
    size='Number of Games',
    hover_name='Genre',
    hover_data=['Number of Games', 'Average Times Listed'],
    title='Game Genres: Popularity vs Frequency of Listing',
    labels={'Average Times Listed': 'Average Times Listed'},
    color='Average Times Listed',
    color_continuous_scale='plasma'
)

fig.update_layout(
    xaxis_title='',
    xaxis_tickangle=-45,
    xaxis_showgrid=False,
    xaxis_visible=False,
    yaxis_title='Average Times Listed',
    showlegend=False,
    title_x=0.5
)

fig.show()

I plot the average Times Listed per genre combination on the y-axis, with the different genre combos along the x-axis. Bubble size reflects the number of titles in each combo, so we can see how frequent each genre combination is: larger bubbles indicate more common combos, while smaller bubbles indicate less frequent ones.

There are a few points on the far left with an average Times Listed above 2000, but these are rare genre combos with only 1 or 2 games to their name. We can take a look at these later. The most visually striking clump sits in the 500 to 1000 average Times Listed range. These bubbles show a strong preference for genre combos featuring "Adventure." In fact, even as we hover over the less-listed genre combos on the far right, we see a lot of bubbles carrying the "Adventure" tag.

This leads to our first question - how popular is the term "Adventure" for describing a game's genre?

To answer this, I split the genre combos into individual words and count the frequency of each word. For example, "Adventure RPG" = 1 count for Adventure, and 1 count for RPG. I then plot the frequencies of each genre keyword.

In [21]:
import plotly.express as px
from collections import Counter
import pandas as pd

def split_genres(genre_string):
    if isinstance(genre_string, str):
        return [word.strip().lower() for word in genre_string.replace(',', ' ').split()]
    return []

all_genres = [
    word
    for genres in games['Genres'].dropna()
    for word in split_genres(genres)
]

genre_counts = Counter(all_genres)

genre_df = pd.DataFrame.from_dict(genre_counts, orient='index', columns=['Count']).reset_index()
genre_df.columns = ['Genre', 'Count']

genre_df = genre_df.sort_values('Count', ascending=False)

fig = px.scatter(
    genre_df,
    x='Genre',
    y='Count',
    size='Count',
    hover_name='Genre',
    hover_data=['Count'],
    title='Frequency of Game Genre Keywords',
    labels={'Count': 'Number of Occurrences'},
    color='Count',
    color_continuous_scale='viridis'
)

fig.update_layout(
    xaxis_title='',
    yaxis_title='Number of Occurrences',
    xaxis_showticklabels=False,
    xaxis_showgrid=False,
    showlegend=False,
    title_x=0.5
)

fig.update_traces(marker=dict(line=dict(width=1, color='DarkSlateGrey')))

fig.show()

Not surprisingly, the term "Adventure" (yellow) dominates the genre keywords. Out of 1085 games, "Adventure" appears in 65% of them. The next closest keyword is "RPG", which appears in 32% of games.
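The per-game share quoted above can be checked directly rather than read off the plot. This is a minimal sketch, not the notebook's own method: the `keyword_share` helper and the three-row `toy` frame are hypothetical and only illustrate the calculation on the same space-separated `Genres` format.

```python
import pandas as pd

def keyword_share(df, keyword, column="Genres"):
    """Return the fraction of rows whose genre string contains `keyword` as a whole word."""
    tokens = df[column].fillna("").str.lower().str.split()
    hits = tokens.apply(lambda words: keyword.lower() in words)
    return hits.mean()

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({"Genres": ["Adventure RPG", "Arcade Racing", "Adventure Indie Platform"]})
adventure_share = keyword_share(toy, "Adventure")
print(f"Share of toy games tagged Adventure: {adventure_share:.0%}")
```

On the real data, the equivalent call would be `keyword_share(games, "Adventure")`. This counts each game at most once, so it only matches the word counts above if no keyword repeats within a single game's genre string.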

Let's look deeper into the "Adventure" phenomenon.

Adventure games are defined as a video game genre in which the player assumes the role of a protagonist in an interactive story, driven by exploration and/or puzzle-solving. The genre's focus on story allows it to draw heavily from other narrative-based media, such as literature and film, encompassing a wide variety of genres (Wikipedia).

From the definition alone, it is clear that "Adventure" games can craft a compelling, film-like story that applies to a wide range of genres.

So does "Adventure" deserve to be its own genre, or is it more of a descriptor / theme that can be applied to a wide variety of games?

Let's form an experiment. To see if "Adventure" merits being its own genre, we will take a look at the various genre combos present in the set and see where "Adventure" shows its strongest ties. If "Adventure" is present across a significant number of genre pairings, it may be more fitting to call it a theme or descriptor rather than a genre. If "Adventure" is present in a non-significant number of combos, then perhaps the reality is that these types of games are simply more desirable and lead to better sales, which is why developers push them out.



As comparators, I use the next five most frequent genre keywords in the set: RPG, Shooter, Platform, Indie, and Strategy.

Experiment 1 - Adventure Vs. Next Top 5 Genres

p1 = proportion of unique genre combinations that include "Adventure" <br>
p2 = proportion of unique genre combinations that include the compared genre (e.g., RPG, Shooter, etc.)<br>

Null Hypothesis (H0):

There is no significant difference between the proportion of genre combinations that include "Adventure" and the proportion that include the compared genre (e.g., RPG, Shooter, etc.).

H0: p1 = p2

Alternative Hypothesis (Ha):

There is a significant difference between the proportion of genre combinations that include "Adventure" and the proportion that include the compared genre.

Ha: p1 ≠ p2


In [39]:
import pandas as pd
from collections import Counter

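def split_genres(genre_string):
    # Genres are space-separated in this dataset (e.g. "Adventure RPG"),
    # so split on whitespace; stray commas are treated as separators too.
    if isinstance(genre_string, str):
        return [genre.strip() for genre in genre_string.replace(',', ' ').split()]
    return []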

all_combinations = games['Genres'].dropna().unique()

total_combinations = len(all_combinations)

def calculate_percentage(keyword):
    combinations = [combo for combo in all_combinations if keyword.lower() in combo.lower()]
    combo_count = len(combinations)
    percentage = (combo_count / total_combinations) * 100
    return combo_count, percentage

keywords = ['Adventure', 'RPG', 'Shooter', 'Platform', 'Indie', 'Strategy']

results = {}
for keyword in keywords:
    count, percentage = calculate_percentage(keyword)
    results[keyword] = {'count': count, 'percentage': percentage}

print(f"Total unique genre combinations: {total_combinations}")
for keyword, data in results.items():
    print(f"Combinations including '{keyword}': {data['count']}")
    print(f"Percentage of combinations with '{keyword}': {data['percentage']:.2f}%")
    print()

for keyword in keywords:
    paired_genres = []
    for combo in all_combinations:
        if keyword.lower() in combo.lower():
            genres = split_genres(combo)
            paired_genres.extend([genre for genre in genres if genre.lower() != keyword.lower()])

    genre_counts = Counter(paired_genres)

Total unique genre combinations: 253
Combinations including 'Adventure': 137
Percentage of combinations with 'Adventure': 54.15%

Combinations including 'RPG': 76
Percentage of combinations with 'RPG': 30.04%

Combinations including 'Shooter': 51
Percentage of combinations with 'Shooter': 20.16%

Combinations including 'Platform': 54
Percentage of combinations with 'Platform': 21.34%

Combinations including 'Indie': 97
Percentage of combinations with 'Indie': 38.34%

Combinations including 'Strategy': 78
Percentage of combinations with 'Strategy': 30.83%



We see "Adventure" to be present in 54% of all combos, with the next highest being "Indie" at 38%. Let's run a z-test to quantify the difference.

In [43]:
import scipy.stats as stats
import numpy as np

def z_test(p1, n1, p2, n2):
    p_pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = np.sqrt(p_pooled * (1 - p_pooled) * (1/n1 + 1/n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return z, p_value

adventure_percentage = 54.15
genres = {
    'Adventure': 54.15,
    'RPG': 30.04,
    'Shooter': 20.16,
    'Platform': 21.34,
    'Indie': 38.34,
    'Strategy': 30.83
}

n = 253  # total unique genre combinations, from the previous cell's output
p1 = adventure_percentage / 100

for genre, percentage in genres.items():
    if genre != 'Adventure':
        p2 = percentage / 100
        z, p_value = z_test(p1, n, p2, n)
        print(f"Adventure vs {genre}:")
        print(f"Z-score: {z:.4f}")
        print(f"p-value: {p_value:.4f}")
        print()

Adventure vs RPG:
Z-score: 5.4925
p-value: 0.0000

Adventure vs Shooter:
Z-score: 7.9114
p-value: 0.0000

Adventure vs Platform:
Z-score: 7.6126
p-value: 0.0000

Adventure vs Indie:
Z-score: 3.5664
p-value: 0.0004

Adventure vs Strategy:
Z-score: 5.3059
p-value: 0.0000



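All the z-scores are greater than 3, meaning each observed difference between the two proportions is more than three standard errors away from zero.

All the p-values are well below 0.05 (most print as 0.0000 because they are rounded to four decimal places), indicating a significant difference between proportions. We reject the null hypothesis: the proportion of genre combos that include "Adventure" differs significantly from the proportion that include each comparison genre. One caveat is that the test treats the two proportions as independent samples, while here both are computed over the same 253 combos, so the p-values are best read as approximate.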
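As a worked check of the arithmetic, the Adventure vs RPG comparison can be recomputed from the raw counts printed earlier (137 and 76 combos out of 253) instead of the rounded percentages. This is a minimal sketch using only the standard library; the function name `two_proportion_z` is an assumption for illustration, and it implements the same pooled two-proportion z-test as the cell above.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal; erfc keeps very small
    # p-values from rounding to exactly zero
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts taken from the printed output: 137 "Adventure" and 76 "RPG" combos out of 253
z, p_value = two_proportion_z(137, 253, 76, 253)
print(f"z = {z:.4f}, p = {p_value:.2e}")
```

Printing the p-value in scientific notation shows how small it is, which the four-decimal format hides.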

To finish out this investigation, let's see who the development teams are behind adventure games.

For every game with "Adventure" in its genres, I add a tally to each team that worked on it. Are all teams adopting the "Adventure" popularity, or just a select few?

In [48]:
import pandas as pd
from collections import Counter
import plotly.express as px

def is_adventure(genres):
    return 'Adventure' in str(genres)

adventure_games = games[games['Genres'].apply(is_adventure)]

def split_teams(team_string):
    if isinstance(team_string, str):
        return [team.strip() for team in team_string.strip("[]").replace("'", "").split(',')]
    return []

team_counter = Counter()

for _, game in adventure_games.iterrows():
    teams = split_teams(game['Team'])
    team_counter.update(teams)

team_counts = pd.DataFrame.from_dict(team_counter, orient='index', columns=['Count']).reset_index()
team_counts.columns = ['Team', 'Count']
team_counts = team_counts.sort_values('Count', ascending=False)

fig = px.scatter(
    team_counts,
    x='Team',
    y='Count',
    size='Count',
    hover_name='Team',
    hover_data=['Count'],
    title='Teams Involved in Adventure Games',
    labels={'Count': 'Number of Adventure Games'},
    color='Count',
    color_continuous_scale='viridis'
)

fig.update_layout(
    xaxis_title='',
    yaxis_title='Number of Adventure Games',
    xaxis_showticklabels=False,
    xaxis_showgrid=False,
    showlegend=False,
    title_x=0.5
)

fig.update_traces(marker=dict(line=dict(width=1, color='DarkSlateGrey')))

fig.show()

It's clear Nintendo (yellow) is the main proponent behind "Adventure" games, with 100 titles to their name (this is still an understatement, as their subdivisions such as Nintendo EAD are counted separately). The next closest is Capcom, with 46 titles, so Nintendo has roughly 2.17x as many.

So now we know that while the use of "Adventure" as a genre is statistically significant within our set, it's not as if every game studio is pushing out "Adventure" games for the sake of it - a select few studios contribute the most, namely Nintendo, Capcom, and Square Enix.
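The team tallies depend on how the stringified `Team` lists are parsed. Stripping brackets and quotes works for simple names, but it also removes apostrophes inside names and splits any name that contains a comma. A hedged alternative is sketched here with `ast.literal_eval`; the `parse_teams` helper and the sample values are hypothetical, and truncated strings (such as those shown with "..." in the preview) are skipped rather than guessed at.

```python
import ast
from collections import Counter

def parse_teams(team_string):
    """Parse a stringified list such as "['Studio A', 'Studio B']" into a list of names."""
    if not isinstance(team_string, str):
        return []
    try:
        parsed = ast.literal_eval(team_string)
    except (ValueError, SyntaxError):
        return []  # malformed or truncated string: skip rather than guess
    if not isinstance(parsed, list):
        return []
    return [str(name).strip() for name in parsed]

# Hypothetical sample values in the same format as the Team column
sample = ["['Studio A', 'Studio B']", "[\"O'Brien Interactive\", 'Studio A']", None, "['broken"]
team_counter = Counter(name for value in sample for name in parse_teams(value))
print(team_counter.most_common(3))
```

Swapping this in for `split_teams` could change the counts slightly, so the figures quoted in the text would need to be re-run before relying on them.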

In [51]:
def split_teams(team_string):
    if isinstance(team_string, str):
        return [team.strip() for team in team_string.strip("[]").replace("'", "").split(',')]
    return []

team_counter = Counter()

for _, game in games.iterrows():
    teams = split_teams(game['Team'])
    team_counter.update(teams)

team_counts = pd.DataFrame.from_dict(team_counter, orient='index', columns=['Count']).reset_index()
team_counts.columns = ['Team', 'Count']
team_counts = team_counts.sort_values('Count', ascending=False)

fig = px.scatter(
    team_counts,
    x='Team',
    y='Count',
    size='Count',
    hover_name='Team',
    hover_data=['Count'],
    title='Total Games per Team (All Genres)',
    labels={'Count': 'Number of Games'},
    color='Count',
    color_continuous_scale='viridis'
)

fig.update_layout(
    xaxis_title='',
    yaxis_title='Number of Games',
    xaxis_showticklabels=False,
    xaxis_showgrid=False,
    showlegend=False,
    title_x=0.5
)

fig.update_traces(marker=dict(line=dict(width=1, color='DarkSlateGrey')))

fig.show()

Even when accounting for all genres, the same top three are visible, with Nintendo remaining as the clear favorite.
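Because the same teams lead both charts, raw counts alone cannot say whether a team favors "Adventure" or simply publishes more games overall. One way to separate the two is to divide each team's Adventure count by its total count. The sketch below uses made-up tallies (`Team A`, `Team B`, `Team C` are placeholders); in the notebook, the two inputs would come from the Adventure-only and all-genres counters, kept under separate names so the second cell does not overwrite the first.

```python
import pandas as pd

# Hypothetical per-team tallies, for illustration only
adventure_counts = pd.Series({"Team A": 8, "Team B": 3, "Team C": 1}, name="Adventure")
total_counts = pd.Series({"Team A": 10, "Team B": 12, "Team C": 1}, name="Total")

share = pd.concat([adventure_counts, total_counts], axis=1)
share["Adventure Share"] = share["Adventure"] / share["Total"]

# Drop teams with very few titles, whose share is not informative
# (the threshold of 5 is an arbitrary choice for this sketch)
share = share[share["Total"] >= 5].sort_values("Adventure Share", ascending=False)
print(share)
```

A team with a high count but an average share would suggest volume rather than a genre preference.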

To conclude, let's go back to our original question. Is "Adventure" a buzzword, or does it deserve to remain a genre? We know "Adventure" appears in over half of our games and in over half of our unique genre combinations. We also know that a select few teams are responsible for this large number of "Adventure" games.

While "Adventure" may have started as a distinct genre, its current usage suggests it has evolved into something closer to a broad category rather than a specific genre. It seems to be functioning more as a marketing term or a general descriptor of game elements rather than a precise classification.

The concentration of "Adventure" game production among a few teams suggests that there are dominant players setting trends in this space. This could be an interesting area for further market analysis.

From a user perspective, the broad application of "Adventure" might make it less useful as a search or filter criterion. Players looking for specific types of experiences might need more detailed descriptors.

The simplest explanation might be that the majority of video game buyers just love Adventure games, and the major video game companies either cater to that demand or create it with the products they release.

As to the correct answer - whether "Adventure" actually is a genre or a descriptor - it is difficult to determine with the current dataset. More metadata regarding the nature of the games themselves would need to be analyzed, such as characters, story, gameplay elements, etc.