# Exploring Board Game Geek 

[BoardGameGeek](https://boardgamegeek.com/) (BGG) is a game database with over 125,600 different tabletop games, including European-style board games, wargames, and card games. In addition to the game database, the site allows users to rate games on a 1–10 scale and publishes a ranked list of board games. 

The dataset used for this project is from [Kaggle](https://www.kaggle.com/datasets/threnjen/board-games-database-from-boardgamegeek), sourced from the BGG API. 

# Imports

In [223]:
import pandas as pd
import numpy as np
import plotly.express as px
from scipy import stats

import seaborn as sns
from matplotlib import pyplot as plt

In [224]:
boardgames_df = pd.read_csv('data/games.csv')

In [225]:
users_df = pd.read_csv('data/user_ratings.csv')

In [226]:
# Named mechanics_df because every later cell refers to it by that name
mechanics_df = pd.read_csv('data/mechanics.csv')
game_themes_df = pd.read_csv('data/themes.csv')

# Game Overviews 

In [227]:
boardgames_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21925 entries, 0 to 21924
Data columns (total 48 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   BGGId                21925 non-null  int64  
 1   Name                 21925 non-null  object 
 2   Description          21924 non-null  object 
 3   YearPublished        21925 non-null  int64  
 4   GameWeight           21925 non-null  float64
 5   AvgRating            21925 non-null  float64
 6   BayesAvgRating       21925 non-null  float64
 7   StdDev               21925 non-null  float64
 8   MinPlayers           21925 non-null  int64  
 9   MaxPlayers           21925 non-null  int64  
 10  ComAgeRec            16395 non-null  float64
 11  LanguageEase         16034 non-null  float64
 12  BestPlayers          21925 non-null  int64  
 13  GoodPlayers          21925 non-null  object 
 14  NumOwned             21925 non-null  int64  
 15  NumWant              21925 non-null 

In [228]:
boardgames_df.head()

Unnamed: 0,BGGId,Name,Description,YearPublished,GameWeight,AvgRating,BayesAvgRating,StdDev,MinPlayers,MaxPlayers,...,Rank:partygames,Rank:childrensgames,Cat:Thematic,Cat:Strategy,Cat:War,Cat:Family,Cat:CGS,Cat:Abstract,Cat:Party,Cat:Childrens
0,1,Die Macher,die macher game seven sequential political rac...,1986,4.3206,7.61428,7.10363,1.57979,3,5,...,21926,21926,0,1,0,0,0,0,0,0
1,2,Dragonmaster,dragonmaster tricktaking card game base old ga...,1981,1.963,6.64537,5.78447,1.4544,3,4,...,21926,21926,0,1,0,0,0,0,0,0
2,3,Samurai,samurai set medieval japan player compete gain...,1998,2.4859,7.45601,7.23994,1.18227,2,4,...,21926,21926,0,1,0,0,0,0,0,0
3,4,Tal der Könige,triangular box luxurious large block tal der k...,1992,2.6667,6.60006,5.67954,1.23129,2,4,...,21926,21926,0,0,0,0,0,0,0,0
4,5,Acquire,acquire player strategically invest business t...,1964,2.5031,7.33861,7.14189,1.33583,2,6,...,21926,21926,0,1,0,0,0,0,0,0


The categories and rankings are not very accurate, so we will drop those and add our own later. 
We will also drop games that are reimplementations of older games. 

In [229]:
boardgames_df = boardgames_df.drop(columns=['Rank:boardgame', 'Rank:strategygames', 'Rank:abstracts', 'Rank:familygames', 'Rank:thematic', 'Rank:cgs', 'Rank:wargames', 'Rank:partygames', 'Rank:childrensgames',
                                            'Cat:Thematic', 'Cat:Strategy', 'Cat:War', 'Cat:Family', 'Cat:CGS', 'Cat:Abstract', 'Cat:Party', 'Cat:Childrens'])
boardgames_df = boardgames_df.loc[boardgames_df['IsReimplementation'] == 0]
boardgames_df = boardgames_df.drop(columns=['NumImplementations'])
boardgames_df.head()

Unnamed: 0,BGGId,Name,Description,YearPublished,GameWeight,AvgRating,BayesAvgRating,StdDev,MinPlayers,MaxPlayers,...,ComMaxPlaytime,MfgAgeRec,NumUserRatings,NumComments,NumAlternates,NumExpansions,IsReimplementation,Family,Kickstarted,ImagePath
0,1,Die Macher,die macher game seven sequential political rac...,1986,4.3206,7.61428,7.10363,1.57979,3,5,...,240,14,5354,0,2,0,0,Classic Line (Valley Games),0,https://cf.geekdo-images.com/rpwCZAjYLD940NWwP...
2,3,Samurai,samurai set medieval japan player compete gain...,1998,2.4859,7.45601,7.23994,1.18227,2,4,...,60,10,15146,0,6,0,0,Euro Classics (Reiner Knizia),0,https://cf.geekdo-images.com/o9-sNXmFS_TLAb7Zl...
3,4,Tal der Könige,triangular box luxurious large block tal der k...,1992,2.6667,6.60006,5.67954,1.23129,2,4,...,60,12,340,0,0,0,0,,0,https://cf.geekdo-images.com/nYiYhUlatT2DpyXaJ...
4,5,Acquire,acquire player strategically invest business t...,1964,2.5031,7.33861,7.14189,1.33583,2,6,...,90,12,18655,0,6,2,0,3M Bookshelf,0,https://cf.geekdo-images.com/3C--kJRhi6kTPHsr9...
5,6,Mare Mediterraneum,ancient land mediterranean player attempt sati...,1989,3.0,6.5537,5.54614,1.6535,2,6,...,240,12,81,0,0,0,0,,0,https://cf.geekdo-images.com/277POF80AUz2ZE9XS...


There are a lot of games! However, some are VERY old.

In [230]:
fig = px.histogram(boardgames_df, x= 'YearPublished')
fig.show()

While it's very cool to see how long humans have been making board games (and that someone has mislabeled 'Dog-opoly' as having been published in 0 BC), we want to show users more modern games. 

In [231]:
modern_boardgames_df = boardgames_df.loc[boardgames_df['YearPublished']>=1960]

In [232]:
fig = px.histogram(modern_boardgames_df, x= 'YearPublished')

fig.add_annotation(x=1995, y=252,
            text="Settlers of Catan Released",
            showarrow=True,
            arrowhead=1)

fig.update_layout(
    xaxis_title_text='Year', # xaxis label
    yaxis_title_text='Number of Games Published', # yaxis label
)

fig.show()

That is easier to look at! It is often said that the popularity of 'Settlers of Catan' led to a board game explosion, and we certainly see more games published each year afterwards. 

## Settlers of Catan : Before and After

I think it would be interesting to run a before-and-after t-test on the number of games published per year before and after 1995 (Catan's release). However, the data is not normally distributed, so I am not sure how to do this yet. 
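One option when the yearly counts are not normally distributed is a rank-based test such as Mann-Whitney U, swapped in here for the t-test. The following is only a minimal sketch: `toy_df` is a made-up stand-in for `modern_boardgames_df`, and the split at 1995 is the assumption described above.

```python
import pandas as pd
from scipy import stats

# Toy stand-in for modern_boardgames_df: one row per game (made-up years).
toy_df = pd.DataFrame({
    'YearPublished': [1990] * 1 + [1991] * 3 + [1992] * 2 + [1993] * 4
                     + [1996] * 6 + [1997] * 8 + [1998] * 7 + [1999] * 9
})

# Number of games published in each year.
games_per_year = toy_df.groupby('YearPublished').size()

# Split the yearly counts at Catan's release year (1995).
before = games_per_year[games_per_year.index < 1995]
after = games_per_year[games_per_year.index > 1995]

# Mann-Whitney U compares the two samples by rank, so it does not assume normality.
# alternative='less' tests whether the "before" counts tend to be smaller.
u_stat, p_value = stats.mannwhitneyu(before, after, alternative='less')
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Yearly counts form a trending time series, so the independence assumption of the test is questionable; the result is better read as a rough indication than as proof.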

# Average Rating

BoardGameGeek uses a 1-10 rating system with the following values: 

- 10 - Outstanding. Always want to play and expect this will never change.

- 9 - Excellent game. Always want to play it.

- 8 - Very good game. I like to play. Probably I'll suggest it and will never turn down a game.

- 7 - Good game, usually willing to play.

- 6 - Ok game, some fun or challenge at least, will play sporadically if in the right mood.

- 5 - Average game, slightly boring, take it or leave it.

- 4 - Not so good, it doesn't get me but could be talked into it on occasion.

- 3 - Likely won't play this again although could be convinced. Bad.

- 2 - Extremely annoying game, won't play this ever again.

- 1 - Defies description of a game. You won't catch me dead playing this. Clearly broken.


Only games that have at least 30 user ratings are eligible for the site's ranking of top games.


In [233]:
fig = px.histogram(modern_boardgames_df, x= 'AvgRating')
fig.update_layout(
    xaxis_title_text='Average Rating', # xaxis label
    yaxis_title_text='Number of Games', # yaxis label
)

fig.show()

BGG also provides an 'adjusted' rating based on the number of ratings a board game has overall. You can read more [here](https://boardgamegeek.com/wiki/page/ratings).

In [234]:
fig = px.histogram(modern_boardgames_df, x= 'BayesAvgRating')

fig.update_layout(
    xaxis_title_text='Average Rating Adjusted by Number of Reviews', # xaxis label
    yaxis_title_text='Number of Games', # yaxis label
)

fig.show()

This rating causes games with *few votes* but very high ratings to rank lower than games with *many more votes* but a lower average rating. 

It also pulls scores toward the average rating of all games on the site, around 5.5-6 ("Ok game, some fun or challenge at least, will play sporadically if in the right mood."), by adding dummy ratings. With no way to verify these, I will use the average rating instead. 
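To make the shrinkage effect concrete, here is a small sketch of a Bayesian-style average. The number and value of BGG's dummy ratings are not given in this notebook, so `prior_votes=100` and `prior_mean=5.5` are assumptions chosen purely for illustration, as is the helper name `bayes_average`.

```python
def bayes_average(avg_rating, num_ratings, prior_votes=100, prior_mean=5.5):
    """Blend a game's average with `prior_votes` dummy ratings of `prior_mean`.

    Both prior values are illustrative assumptions, not BGG's real settings.
    """
    total = avg_rating * num_ratings + prior_mean * prior_votes
    return total / (num_ratings + prior_votes)

# A highly rated game with few votes versus a lower rated game with many votes.
few_votes = bayes_average(avg_rating=9.0, num_ratings=30)
many_votes = bayes_average(avg_rating=7.5, num_ratings=5000)
print(round(few_votes, 2), round(many_votes, 2))
```

Under these assumed values, the game with 30 ratings is pulled most of the way toward 5.5, while the game with 5,000 ratings barely moves.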

# Average Game Weight / Complexity 

Weight is a personal opinion of how difficult a game is to play. "Weight" is not formally defined by BGG, so different people have different ideas of what it means. The choices for game play weight are:
- 0 - Unrated 
- 1 - Light
- 2 - Medium Light 
- 3 - Medium
- 4 - Medium Heavy
- 5 - Heavy


In [235]:
fig = px.histogram(modern_boardgames_df, x= 'GameWeight')

fig.update_layout(
    xaxis_title_text='Average Game Weight', # xaxis label
    yaxis_title_text='Number of Games', # yaxis label
)

fig.show()

In [236]:
heavyweight = modern_boardgames_df.loc[modern_boardgames_df['GameWeight']>=3]

heavyweight.sort_values('GameWeight', ascending=False).head()

Unnamed: 0,BGGId,Name,Description,YearPublished,GameWeight,AvgRating,BayesAvgRating,StdDev,MinPlayers,MaxPlayers,...,ComMaxPlaytime,MfgAgeRec,NumUserRatings,NumComments,NumAlternates,NumExpansions,IsReimplementation,Family,Kickstarted,ImagePath
2011,2875,Patton in Flames,germany japan go flame western ally soviet...,2000,4.9167,6.96308,5.55155,1.16468,2,3,...,120,12,65,0,0,5,0,World in Flames (ADG),0,https://cf.geekdo-images.com/bb0HPevvpfPn00hKz...
13495,158793,Atlantic Wall: D-Day to Falaise,decision game websiteon june great armada ...,2014,4.8889,8.22013,5.59415,1.98527,2,6,...,14400,16,77,0,0,0,0,Grand Operational Simulation (Decision Games),0,https://cf.geekdo-images.com/qx_Ikwg_0GxWSUv-W...
2686,4102,Europa Universalis,quoteuropa universalisquot monster wargame dip...,1993,4.8537,6.84085,5.68581,2.1503,1,6,...,3600,14,343,0,0,1,0,,0,https://cf.geekdo-images.com/Gk4MDKpkH5ekO70SL...
16945,217197,Lucky Forward: Patton's Third Army in Lorraine,description publisherlucky forward pattonrsquo...,2020,4.8333,8.34848,5.54798,2.16195,2,2,...,0,0,33,0,0,0,0,Grand Operational Simulation (Decision Games),0,https://cf.geekdo-images.com/0DvCThupKFz7c-HBw...
5301,11532,The Eagle and the Sun,user summarysubtitle quotthe war pacific wor...,1991,4.8,3.68529,5.42496,2.56654,2,2,...,0,12,68,0,0,0,0,,0,https://cf.geekdo-images.com/ey4Lz9xB5wTOddPYq...


In [237]:
lightweight = modern_boardgames_df.loc[modern_boardgames_df['GameWeight']==1]
lightweight.sort_values('GameWeight', ascending=False).head()

Unnamed: 0,BGGId,Name,Description,YearPublished,GameWeight,AvgRating,BayesAvgRating,StdDev,MinPlayers,MaxPlayers,...,ComMaxPlaytime,MfgAgeRec,NumUserRatings,NumComments,NumAlternates,NumExpansions,IsReimplementation,Family,Kickstarted,ImagePath
338,391,Ocean,idea player populate ocean large school fish s...,1999,1.0,4.77083,5.48006,1.40297,2,6,...,20,8,48,0,0,0,0,,0,https://cf.geekdo-images.com/c26cEtrDKBk5exGPP...
15980,198059,Twist of Fate,overviewbase charles dickensrsquo novel oliver...,2016,1.0,4.92036,5.45386,1.57277,2,4,...,45,10,140,0,0,0,0,,0,https://cf.geekdo-images.com/-FtKCFdQ89CDoGGg3...
15940,197321,Cult Following,cult follow creative storycrafting card game r...,2016,1.0,7.03667,5.54217,1.6817,3,8,...,30,14,70,0,0,2,0,,0,https://cf.geekdo-images.com/QATUaephNUedtpTXL...
15949,197435,Santa VS Jesus,publishersanta vs jesus challenge base party g...,2016,1.0,5.22113,5.48368,1.83056,4,16,...,90,12,71,0,0,0,0,,1,https://cf.geekdo-images.com/giME48ubLoB3x98dj...
15953,197455,Dice Heist,description publisherdaring art heist roll dic...,2016,1.0,6.24011,5.64964,1.3782,2,5,...,20,14,462,0,0,0,0,,0,https://cf.geekdo-images.com/m60IflEbUmOM5mIUR...


In [248]:
# Grouping is by weight and community max playtime, so the name reflects playtime
weight_playtime = modern_boardgames_df.groupby(['GameWeight', 'ComMaxPlaytime']).mean(numeric_only=True)
weight_playtime

Unnamed: 0_level_0,Unnamed: 1_level_0,BGGId,YearPublished,AvgRating,BayesAvgRating,StdDev,MinPlayers,MaxPlayers,ComAgeRec,LanguageEase,BestPlayers,...,NumWeightVotes,MfgPlaytime,ComMinPlaytime,MfgAgeRec,NumUserRatings,NumComments,NumAlternates,NumExpansions,IsReimplementation,Kickstarted
GameWeight,ComMaxPlaytime,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
0.0000,0,187175.625000,2013.232143,6.143744,5.511520,1.563260,2.107143,12.464286,8.333333,303.0,0.0,...,0.0,0.0,6.607143,8.696429,39.500000,0.0,0.732143,0.071429,0.0,0.035714
0.0000,2,246693.000000,2018.000000,6.353260,5.564780,1.286780,3.000000,6.000000,,,0.0,...,0.0,2.0,2.000000,6.000000,138.000000,0.0,1.000000,0.000000,0.0,0.000000
0.0000,5,191989.750000,2015.000000,6.377117,5.512952,1.563870,2.000000,4.500000,3.000000,,0.0,...,0.0,5.0,5.000000,5.500000,35.250000,0.0,0.250000,0.750000,0.0,0.250000
0.0000,10,191155.826087,2014.913043,6.171736,5.516037,1.462574,1.652174,25.217391,3.533333,313.0,0.0,...,0.0,10.0,9.043478,5.565217,45.913043,0.0,1.913043,0.086957,0.0,0.000000
0.0000,15,216352.048780,2015.951220,5.925394,5.507760,1.521604,2.024390,7.439024,6.300000,326.0,0.0,...,0.0,15.0,12.951220,6.975610,43.048780,0.0,0.609756,0.024390,0.0,0.073171
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4.8000,0,11532.000000,1991.000000,3.685290,5.424960,2.566540,2.000000,2.000000,16.000000,563.0,0.0,...,10.0,0.0,0.000000,12.000000,68.000000,0.0,0.000000,0.000000,0.0,0.000000
4.8333,0,217197.000000,2020.000000,8.348480,5.547980,2.161950,2.000000,2.000000,16.000000,128.0,0.0,...,6.0,0.0,0.000000,0.000000,33.000000,0.0,0.000000,0.000000,0.0,0.000000
4.8537,3600,4102.000000,1993.000000,6.840850,5.685810,2.150300,1.000000,6.000000,16.000000,4.1,0.0,...,82.0,3600.0,3600.000000,14.000000,343.000000,0.0,0.000000,1.000000,0.0,0.000000
4.8889,14400,158793.000000,2014.000000,8.220130,5.594150,1.985270,2.000000,6.000000,12.000000,548.0,0.0,...,9.0,14400.0,120.000000,16.000000,77.000000,0.0,0.000000,0.000000,0.0,0.000000


# Adding Mechanics

Among board game nerds, the primary mechanics can be a big deal! For example, I prefer 'engine building' games over 'worker placement'. However... there are 157 mechanics. 

That is a lot of features!

In [164]:
mechanics_df.head()

Unnamed: 0,BGGId,Alliances,Area Majority / Influence,Auction/Bidding,Dice Rolling,Hand Management,Simultaneous Action Selection,Trick-taking,Hexagon Grid,Once-Per-Game Abilities,...,Contracts,Passed Action Token,King of the Hill,Action Retrieval,Force Commitment,Rondel,Automatic Resource Growth,Legacy Game,Dexterity,Physical
0,1,1,1,1,1,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3,0,1,0,0,1,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
3,4,0,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [165]:
# Every column except BGGId is a 0/1 flag, so the column sum is the number of games per mechanic
mechanics_series = mechanics_df.drop(columns=['BGGId']).sum()

mechanics_series.index

Index(['Alliances', 'Area Majority / Influence', 'Auction/Bidding',
       'Dice Rolling', 'Hand Management', 'Simultaneous Action Selection',
       'Trick-taking', 'Hexagon Grid', 'Once-Per-Game Abilities',
       'Set Collection',
       ...
       'Contracts', 'Passed Action Token', 'King of the Hill',
       'Action Retrieval', 'Force Commitment', 'Rondel',
       'Automatic Resource Growth', 'Legacy Game', 'Dexterity', 'Physical'],
      dtype='object', length=157)

## How common are these different mechanics?

In [166]:
mechanics_series = mechanics_series.sort_values(ascending=False)
fig = px.histogram(mechanics_series, x= mechanics_series.index, y=mechanics_series)

fig.update_layout(
    xaxis_title_text='Mechanic Featured', # xaxis label
    yaxis_title_text='Number of Games', # yaxis label
)

fig.show()

In [167]:
mechanics_popular = mechanics_series[:35]
fig = px.histogram(mechanics_popular, x= mechanics_popular.index, y=mechanics_popular)

fig.update_layout(
    xaxis_title_text='Mechanic Featured', # xaxis label
    yaxis_title_text='Number of Games', # yaxis label
)

fig.show()

Dice Rolling and Hand Management are the most common mechanics! Set Collection, Variable Player Powers, and Hexagon Grid are also quite popular. 

In [168]:
# The series is sorted in descending order, so the last 99 entries are the least common mechanics
mechanics_rare = mechanics_series[-99:]
fig = px.histogram(mechanics_rare, x= mechanics_rare.index, y=mechanics_rare)

fig.update_layout(
    xaxis_title_text='Mechanic Featured', # xaxis label
    yaxis_title_text='Number of Games', # yaxis label
)

fig.show()

As we get into the less common mechanics, we can see that many of these have fewer than 50 games associated with them. 

Therefore, we may be able to drop some of these to reduce the number of feature columns. 
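Dropping the rare mechanics could look something like the sketch below. `toy_mechanics` is a made-up stand-in for `mechanics_df`, and `min_games` is set to 2 only so the toy data shows an effect; for the real data the threshold discussed above would be 50.

```python
import pandas as pd

# Toy stand-in for mechanics_df: one 0/1 column per mechanic (hypothetical data).
toy_mechanics = pd.DataFrame({
    'BGGId': [1, 2, 3, 4],
    'Dice Rolling': [1, 1, 1, 0],
    'Hand Management': [1, 0, 1, 1],
    'Rondel': [0, 0, 1, 0],
})

min_games = 2  # illustrative threshold; 50 would match the text above

# Count games per mechanic, then find the mechanics below the threshold.
counts = toy_mechanics.drop(columns=['BGGId']).sum()
rare = counts[counts < min_games].index

# Drop the rare mechanic columns, keeping BGGId and the common mechanics.
trimmed = toy_mechanics.drop(columns=rare)
print(list(trimmed.columns))
```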

In [174]:
# corr_matrix = mechanics_df.corr()

# plt.figure(figsize=(30, 30))

# #create a mask to remove the duplicate upper half
# mask = np.triu(np.ones_like(corr_matrix, dtype=bool))  # np.bool was removed from NumPy; use the built-in bool

# heatmap = sns.heatmap(corr_matrix, mask=mask, vmin=-1, vmax=1, cmap='BrBG')
# heatmap.set_title('Mechanic Correlation Heatmap');

Several mechanics are correlated... but the heatmap is unreadable. Oops.
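With this many columns, a ranked list of the most strongly correlated pairs may be easier to read than a heatmap. This is a sketch on a made-up `toy_mechanics` frame, not the real `mechanics_df`; on the real data the `BGGId` column would need to be dropped first.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the mechanic flags (hypothetical data).
toy_mechanics = pd.DataFrame({
    'Dice Rolling':   [1, 1, 0, 0, 1, 0],
    'Hexagon Grid':   [1, 1, 0, 0, 1, 0],
    'Set Collection': [0, 1, 1, 0, 0, 1],
})

corr_matrix = toy_mechanics.corr()

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once.
upper = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
pairs = corr_matrix.where(upper).stack().dropna()

# Sort by absolute correlation, strongest first.
top_pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(top_pairs.head(10))
```

The result is a Series indexed by pairs of mechanic names, so `top_pairs.head(10)` lists the ten strongest relationships directly.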

## Finding the highest rated mechanics

In [216]:
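def mean_mechanic_rating(col):
    # Mean AvgRating of the games that use the mechanic (flag == 1).
    # Selecting rows directly avoids relying on the position of the '1' group.
    has_mechanic = mechanics_with_ratings_wide[col] == 1
    return mechanics_with_ratings_wide.loc[has_mechanic, 'AvgRating'].mean()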

In [217]:
mechanics_with_ratings_wide = modern_boardgames_df[['BGGId','AvgRating']]

mechanics_with_ratings_wide = mechanics_with_ratings_wide.merge(mechanics_df, on="BGGId")

In [218]:
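# mechanics_series must hold the mechanic counts here; a theme name such as
# 'Adventure' is not a column of mechanics_df and raises a KeyError.
mechanics_rating = pd.DataFrame()
mechanics_rating['mechanic'] = mechanics_series.index
mechanics_rating['num_games'] = mechanics_series.values
# Build the full list of means first; assigning inside a loop would overwrite
# the whole column with a single value on every iteration.
mechanics_rating['AvgRating'] = [mean_mechanic_rating(col) for col in mechanics_rating['mechanic']]

mechanics_rating.head()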

In [219]:
temp_group = mechanic_rating.groupby('Hand Management').mean()
temp_group

Unnamed: 0_level_0,BGGId,BayesAvgRating,Alliances,Area Majority / Influence,Auction/Bidding,Dice Rolling,Simultaneous Action Selection,Trick-taking,Hexagon Grid,Once-Per-Game Abilities,...,Contracts,Passed Action Token,King of the Hill,Action Retrieval,Force Commitment,Rondel,Automatic Resource Growth,Legacy Game,Dexterity,Physical
Hand Management,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,106407.711791,5.64533,0.001576,0.063157,0.052915,0.321954,0.04484,0.014509,0.137211,0.001444,...,0.006106,0.000197,0.00046,0.001576,0.00046,0.003086,0.000525,0.000525,0.060268,0.021468
1,144164.772264,5.769644,0.001306,0.116218,0.052233,0.167145,0.089318,0.027683,0.016715,0.002612,...,0.006529,0.000261,0.001567,0.007574,0.001045,0.002612,0.001045,0.00235,0.014886,0.017759


In [220]:
temp_group.iloc[1, temp_group.columns.get_loc('AvgRating')]

KeyError: 'AvgRating'

In [221]:
mean_mechanic_rating('Hand Management')

np.float64(6.532110280251441)

# Game Themes

Some of the themes start with 'Theme_', but not all of them.
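If the mixed naming becomes a nuisance, the prefix could be stripped from the column names. A minimal sketch on a made-up `toy_themes` frame follows; it is not applied in this notebook.

```python
import pandas as pd

# Toy stand-in for game_themes_df (hypothetical data).
toy_themes = pd.DataFrame({
    'BGGId': [1, 2],
    'Adventure': [0, 1],
    'Theme_Vikings': [1, 0],
})

# Remove a leading 'Theme_' only; names without the prefix are left unchanged.
toy_themes.columns = toy_themes.columns.str.replace(r'^Theme_', '', regex=True)
print(list(toy_themes.columns))
```

Two caveats: if both `X` and `Theme_X` exist, this produces duplicate column names, and the `re_theme` function further down matches the original names, so the renaming would have to be applied consistently or not at all.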

In [188]:
game_themes_df.head()

Unnamed: 0,BGGId,Adventure,Fantasy,Fighting,Environmental,Medical,Economic,Industry / Manufacturing,Transportation,Science Fiction,...,Theme_Fashion,Theme_Geocaching,Theme_Ecology,Theme_Chernobyl,Theme_Photography,Theme_French Foreign Legion,Theme_Cruise ships,Theme_Apache Tribes,Theme_Rivers,Theme_Flags identification
0,1,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [200]:
# Every column except BGGId is a 0/1 flag, so the column sum is the number of games per theme
themes_series = game_themes_df.drop(columns=['BGGId']).sum()

themes_series

Adventure                      1177
Fantasy                        2702
Fighting                       1668
Environmental                   194
Medical                          87
                               ... 
Theme_French Foreign Legion       2
Theme_Cruise ships                2
Theme_Apache Tribes               2
Theme_Rivers                      2
Theme_Flags identification        2
Length: 217, dtype: int64

In [191]:
themes_series = themes_series.sort_values(ascending=False)
fig = px.histogram(themes_series, x= themes_series.index, y=themes_series)

fig.update_layout(
    xaxis_title_text='Theme', # xaxis label
    yaxis_title_text='Number of Games', # yaxis label
)

fig.show()

In [197]:
themes_popular = themes_series[:25]

fig = px.histogram(themes_popular, x= themes_popular.index, y=themes_popular)

fig.update_layout(
    xaxis_title_text='Top Themes', # xaxis label
    yaxis_title_text='Number of Games', # yaxis label
)

fig.show()

In [215]:
# keep only themes with more than 50 games
themes_series = themes_series.loc[themes_series.values > 50]
themes_series

Adventure                           1177
Fantasy                             2702
Fighting                            1668
Environmental                        194
Medical                               87
                                    ... 
Theme_Archaeology / Paleontology      61
Theme_Witches                         59
Theme_Deserts                         57
Theme_Tropical                        55
Theme_Steampunk                       53
Length: 78, dtype: int64

In [None]:
def re_theme (theme):
    if theme in ["Adventure", "Pirates", "Theme_Superheroes", "Theme_Circus"]:
        return "Adventure"
    elif theme in ['Fantasy', 'Mythology' , 'Theme_Vikings' , 'Theme_Witches' , 'Theme_Steampunk' , 'Theme_Ninjas' , 'Theme_King Arthur / The Knights of the Round Table / Camelot' , 'Theme_Samurai' , 'Theme_Kaiju' , 'Theme_Gladiators' , 'Theme_Alchemy']:
        return "Fantasy"
    elif theme in ['Space Exploration' , 'Science Fiction' , 'Theme_Post-Apocalyptic' , 'Theme_Time Travel' , 'Theme_Robots' , 'Theme_Mad Science / Mad Scientist', 'Theme_Cyberpunk']:
        return "Science Fiction"
    elif theme in ['Crime', 'Spies/Secret Agents', 'Mafia', 'Theme_Mystery / Cri', 'Theme_Villainy' , 'Theme_Jail / Prison (Modern)']:
        return "Crime / Underworld"
    elif theme in ['Fighting', 'Civil War', 'Modern Warfare', 'World War I', 'World War II', 'Pike and Shot', 'American Indian Wars' , 'Napoleonic', 'American Revolutionary War' , 'Vietnam War' , 'American Civil War' , 'Korean War' , 'Theme_Sieg' , 'Theme_Mech Warfar' , 'Theme_Animal Battles']:
        return "Warfare"
    elif theme in ['Environmental' , 'Farming', 'Animals' , 'Theme_Anthropomorphic Animals' , 'Theme_Gardening' , 'Theme_Flowers' , 'Theme_Natur' , 'Theme_Weather' , 'Theme_Evolution' , 'Theme_Fruit' ]:
        return "Nature"
    elif theme in ['Medical', 'Theme_Biology']:
        return "Medical"
    elif theme in ['Economic' , 'Industry / Manufacturing' , 'City Building' , 'Theme_Mining' , 'Theme_Construction' , 'Theme_City' , 'Theme_Oil / Gas / Petroleu']:
        return "Industrial"
    elif theme in ['Transportation' , 'Nautical' , 'Travel' , 'Trains' , 'Aviation / Flight' , 'Racing' , 'Theme_Submarines' , 'Theme_Amusement Parks / Theme Parks' ,'Theme_Airships / Blimps / Dirigibles / Zeppelins' , 'Theme_Firefighting']:
        return "Transportation"
    elif theme in ['Civilization', 'Age of Reason', 'Renaissance', 'American West', 'Medieval' , 'Ancient' , 'Post-Napoleonic' , 'Religious' , 'Arabian' , 'Prehistoric' , 'Theme_Alternate History' , 'Theme_Colonial' , 'Theme_Retro' ,'Theme_Deserts' , 'Theme_Tropical' , 'Theme_Native Americans / First Peoples' , 'Theme_Tropical Islands' , 'Theme_Safaris']:
        return "Historical Setting"
    elif theme in ['Movies / TV / Radio theme' , 'Music' , 'Theme_Art' , 'Theme_Archaeology / Paleontology' , 'Theme_Love / Romanc' , 'Theme_Boardgaming' , 'Theme_Movie Industry']:
        return "Cultural"
    elif theme in ['Horror' , 'Zombies' , 'Theme_Cthulhu Mythos' , 'Theme_Dreams / Nightmares' , 'Theme_Survival']:
        return "Horror"
    elif theme == 'Trivia':
        return "Trivia"
    elif theme in ['Sports' , 'Theme_Fantasy Sports']:
        return 'Sports'
    else:
        return 'Other'
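
One way a mapping function like this might be used is to roll the per-theme counts up into the broader groups. The sketch below is self-contained, so it uses `toy_re_theme`, a cut-down hypothetical stand-in for `re_theme`, and made-up counts in place of `themes_series`.

```python
import pandas as pd

# Stand-in for themes_series: number of games per theme (made-up counts).
toy_counts = pd.Series({'Fantasy': 10, 'Theme_Vikings': 3, 'Horror': 4, 'Zombies': 2})

def toy_re_theme(theme):
    """Cut-down stand-in for re_theme, covering only the toy themes."""
    if theme in ['Fantasy', 'Theme_Vikings']:
        return 'Fantasy'
    elif theme in ['Horror', 'Zombies']:
        return 'Horror'
    return 'Other'

# Map each theme name to its broad group, then add up the counts per group.
broad_counts = toy_counts.groupby(toy_counts.index.map(toy_re_theme)).sum()
print(broad_counts.to_dict())
```

Because this adds up per-theme counts, a game tagged with two themes from the same group is counted twice; counting unique games per group would need the wide 0/1 table instead.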
    
    

Add relationships between variables
- difficulty vs. number of players, etc.
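
As a starting point for the difficulty versus player count idea, a rank correlation avoids assuming a linear relationship. This is only a sketch on made-up values; `toy_games` borrows the `GameWeight` and `MaxPlayers` column names from `modern_boardgames_df` but none of its data.

```python
import pandas as pd

# Toy stand-in for modern_boardgames_df (made-up values).
toy_games = pd.DataFrame({
    'GameWeight': [1.0, 1.5, 2.2, 3.1, 4.0, 4.8],
    'MaxPlayers': [8, 6, 5, 4, 4, 2],
})

# Spearman correlation works on ranks, so outliers and skew matter less.
rho = toy_games.corr(method='spearman').loc['GameWeight', 'MaxPlayers']
print(f"Spearman rho = {rho:.3f}")
```

On the real data, rows with a `GameWeight` of 0 (unrated) would probably need to be filtered out first, since they are not genuinely "light" games.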