<h1 id="title" style="color:#4974a5; border-bottom: 3px solid #4974a5;">
  Predicting the Top Board Games to Play
</h1>

![board game](board_game.jpg)

<!--
# Which board game should you play?
--->

## 📖 Background  
#### By [DataCamp.com](https://app.datacamp.com). 


![Board Game](../images/board_game.jpg)


After a tiring week, what better way to unwind than a board game night with friends and family? But the question remains: which game should you pick? You have gathered a dataset containing information on over `20,000` board games. It's time to put your analytical skills to work and use data-driven insights to persuade your group to try the game you've chosen!

[Competition overview page.](https://app.datacamp.com/learn/competitions/board-games) 

<h2 id="TOC" style="color:#207d06; text-align:left; padding: 0px; border-bottom: 3px solid #207d06;">TABLE OF CONTENTS</h2>


- [THE DATA](#the-data)
- [EXECUTIVE SUMMARY](#executive-summary)
- [EXPLORATION](#exploration)
- [DATA CLEANING](#data-cleaning)
    - [DATA FORMAT CONSISTENCY](#data-format-consistency)
        - [COLUMN HEADINGS](#column-headings)
        - [DATE FORMAT](#date-format)
    - [MISSING VALUES](#missing-values)

<!--
- [PREPROCESS DATA](#preprocess-data)
- [DATA CLEANING](#revise-data-cleaning)
    - [TARGET VARIABLE REFINEMENT](#target-variable-refinement)
- [SELECTING A TARGET VARIABLE](#select-target-variable)
- [FEATURE ENGINEERING](#feature-engineering)
    - [NLP REPRESENTATIONS](#nlp-representations)
        - [N-GRAMS](#n-grams)
- [MODELING](#modeling)
    - [SPLITTING DATA](#splitting-data)
    - [MODEL TRAINING](#model-training)
    - [MODEL EVALUATION](#model-evaluation)
    - [TUNE HYPERPARAMETERS](#tune-hyperparameters)
- [DEPLOYMENT](#deployment)
- [DISCUSSION](#discussion)
- [REFERENCES](#references)
    - [DATA SOURCES](#data-sources)
    - [TEXT REFERENCES](#text-references)
--->





<h2 id="the-data" style="color:#207d06; text-align:left; padding: 0px; border-bottom: 3px solid #207d06;">THE DATA</h2>


## 💾 The Data

You've come across a dataset titled `bgg_data.csv` containing details on over `20,000` ranked board games from the BoardGameGeek (BGG) website. BGG is the premier online hub for board game enthusiasts, hosting data on more than `100,000` games, inclusive of both ranked and unranked varieties. This platform thrives due to its active community, who contribute by posting reviews, ratings, images, videos, session reports, and participating in live discussions.

This specific dataset, assembled in `February 2021`, encompasses all ranked games listed on BGG up to that date. Games without a ranking were left out because they didn't garner enough reviews; for a game to earn a rank, it needs a minimum of `30` votes.

Each row in this dataset describes one board game, with the following fields:

| Column     | Description              |
|------------|--------------------------|
| `ID` | The ID of the board game. |
| `Name` | The name of the board game.|
| `Year Published` | The year when the game was published.|
| `Min Players` | The minimum number of players recommended for the game.|
| `Max Players` | The maximum number of players recommended for the game.|
| `Play Time` | The average play time suggested by game creators, measured in minutes.|
| `Min Age` | The recommended minimum age of players.|
| `Users Rated` | The number of users who rated the game.|
| `Rating Average` | The average rating of the game, on a scale of 1 to 10.|
| `BGG Rank` | The rank of the game on the BoardGameGeek (BGG) website.| 
| `Complexity Average` | The average complexity value of the game, on a scale of 1 to 5.|
| `Owned Users` |  The number of BGG registered owners of the game.| 
| `Mechanics` | The mechanics used by the game.| 
| `Domains` | The board game domains that the game belongs to.|

**Source:** Dilini Samarasinghe, July 5, 2021, "BoardGameGeek Dataset on Board Games", IEEE Dataport, doi: https://dx.doi.org/10.21227/9g61-bs59.

<h2 id="executive-summary" style="color:#207d06; text-align:left; padding: 0px; border-bottom: 3px solid #207d06;">EXECUTIVE SUMMARY</h2>

The top ten recommended boardgames are:...

<h5 style="text-align:right; padding-right: 10%;">
  <a href="#title">Top Of Page</a> / <a href="#TOC">TOC</a>
</h5>

<h2 id="exploration" style="color:#207d06; text-align:left; padding: 0px; border-bottom: 3px solid #207d06;">EXPLORATION</h2>


In [1]:
import pandas as pd
import numpy as np
from datetime import datetime as dt
from datetime import date

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_theme()

from scipy import stats

# import warnings
# warnings.filterwarnings('ignore')

# import os
# import json
# import requests


In [5]:
session = 'eda'


In [3]:
boardgame = pd.read_csv('../data/bgg_data.csv')
df = boardgame.copy()

df.head()


Unnamed: 0,ID,Name,Year Published,Min Players,Max Players,Play Time,Min Age,Users Rated,Rating Average,BGG Rank,Complexity Average,Owned Users,Mechanics,Domains
0,174430.0,Gloomhaven,2017.0,1,4,120,14,42055,8.79,1,3.86,68323.0,"Action Queue, Action Retrieval, Campaign / Bat...","Strategy Games, Thematic Games"
1,161936.0,Pandemic Legacy: Season 1,2015.0,2,4,60,13,41643,8.61,2,2.84,65294.0,"Action Points, Cooperative Game, Hand Manageme...","Strategy Games, Thematic Games"
2,224517.0,Brass: Birmingham,2018.0,2,4,120,14,19217,8.66,3,3.91,28785.0,"Hand Management, Income, Loans, Market, Networ...",Strategy Games
3,167791.0,Terraforming Mars,2016.0,1,5,120,12,64864,8.43,4,3.24,87099.0,"Card Drafting, Drafting, End Game Bonuses, Han...",Strategy Games
4,233078.0,Twilight Imperium: Fourth Edition,2017.0,3,6,480,14,13468,8.7,5,4.22,16831.0,"Action Drafting, Area Majority / Influence, Ar...","Strategy Games, Thematic Games"


In [9]:
df.shape


(20343, 14)

Wonderful! As promised, we have just over 20,000 board games listed in the dataset. Now let's check the status of the data with `.info()`:


In [10]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20343 entries, 0 to 20342
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   id                  20327 non-null  float64
 1   name                20343 non-null  object 
 2   year_published      20342 non-null  float64
 3   min_players         20343 non-null  int64  
 4   max_players         20343 non-null  int64  
 5   play_time           20343 non-null  int64  
 6   min_age             20343 non-null  int64  
 7   users_rated         20343 non-null  int64  
 8   rating_average      20343 non-null  float64
 9   bgg_rank            20343 non-null  int64  
 10  complexity_average  20343 non-null  float64
 11  owned_users         20320 non-null  float64
 12  mechanics           18745 non-null  object 
 13  domains             10184 non-null  object 
dtypes: float64(5), int64(6), object(3)
memory usage: 2.2+ MB


<h5 style="text-align:right; padding-right: 10%;">
  <a href="#title">Top Of Page</a> / <a href="#TOC">TOC</a>
</h5>

<h2 id="data-cleaning" style="color:#207d06; text-align:left; padding: 0px; border-bottom: 3px solid #207d06;">DATA CLEANING</h2>


<h3 id="data-format-consistency" style="color:#8fca6b; text-align:left; padding: 0px; border-bottom: 2px solid #8fca6b;">DATA FORMAT CONSISTENCY</h3>

<h4 id="column-headings" style="color:#c8d43e; text-align:left; padding: 0px; border-bottom: 1px solid #c8d43e;">COLUMN HEADINGS</h4>

Change the column headings to snake case for ease of use and consistency:

In [6]:
# Build snake_case headings: spaces become underscores, then everything is lower-cased
cols = [x.replace(' ', '_').lower() for x in df.columns]

# Confirm the result by printing the first few
cols[:4]


['id', 'name', 'year_published', 'min_players']

In [7]:
# Create a dictionary keying the old headings to the new:
old_cols = df.columns.tolist()
col_map = dict(zip(old_cols, cols))

col_map


{'ID': 'id',
 'Name': 'name',
 'Year Published': 'year_published',
 'Min Players': 'min_players',
 'Max Players': 'max_players',
 'Play Time': 'play_time',
 'Min Age': 'min_age',
 'Users Rated': 'users_rated',
 'Rating Average': 'rating_average',
 'BGG Rank': 'bgg_rank',
 'Complexity Average': 'complexity_average',
 'Owned Users': 'owned_users',
 'Mechanics': 'mechanics',
 'Domains': 'domains'}

In [39]:
# Rename the columns in the table.
df = df.rename(columns=col_map)
df.head()


Unnamed: 0,id,name,year_published,min_players,max_players,play_time,min_age,users_rated,rating_average,bgg_rank,complexity_average,owned_users,mechanics,domains
0,174430.0,Gloomhaven,2017.0,1,4,120,14,42055,8.79,1,3.86,68323.0,"Action Queue, Action Retrieval, Campaign / Bat...","Strategy Games, Thematic Games"
1,161936.0,Pandemic Legacy: Season 1,2015.0,2,4,60,13,41643,8.61,2,2.84,65294.0,"Action Points, Cooperative Game, Hand Manageme...","Strategy Games, Thematic Games"
2,224517.0,Brass: Birmingham,2018.0,2,4,120,14,19217,8.66,3,3.91,28785.0,"Hand Management, Income, Loans, Market, Networ...",Strategy Games
3,167791.0,Terraforming Mars,2016.0,1,5,120,12,64864,8.43,4,3.24,87099.0,"Card Drafting, Drafting, End Game Bonuses, Han...",Strategy Games
4,233078.0,Twilight Imperium: Fourth Edition,2017.0,3,6,480,14,13468,8.7,5,4.22,16831.0,"Action Drafting, Area Majority / Influence, Ar...","Strategy Games, Thematic Games"
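
As an aside, the same renaming can be sketched in a single step with the vectorized string methods on the column index. This is only a minimal illustration on a toy frame (`toy` is a stand-in name, not the real data):

```python
import pandas as pd

# Toy frame with a few of the original headings (stand-in for the real data).
toy = pd.DataFrame(columns=["ID", "Name", "Year Published", "BGG Rank"])

# Lower-case every heading and swap spaces for underscores in one pass.
toy.columns = toy.columns.str.lower().str.replace(" ", "_", regex=False)

print(toy.columns.tolist())
```

The dictionary approach used above has the advantage of leaving an explicit old-to-new mapping that can be reused later.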


<h5 style="text-align:right; padding-right: 10%;">
  <a href="#title">Top Of Page</a> / <a href="#TOC">TOC</a>
</h5>

<h4 id="date-format" style="color:#c8d43e; text-align:left; padding: 0px; border-bottom: 1px solid #c8d43e;">DATE FORMAT</h4>


The only date data is the year each game was published:

In [40]:
# Count the missing publication years
df['year_published'].isna().sum()


1

Only one.  Which one is it?

In [41]:

df[df['year_published'].isna()]


Unnamed: 0,id,name,year_published,min_players,max_players,play_time,min_age,users_rated,rating_average,bgg_rank,complexity_average,owned_users,mechanics,domains
13984,,Hus,,2,2,40,0,38,6.28,13986,0.02,,,


A quick check of the BGG page for [Hus](https://boardgamegeek.com/boardgame/25999/hus) shows no listed publication date, so I'll fill this `NaN` with `0` for now so that the column can be cast to an integer type.

In [42]:
df['year_published'] = df['year_published'].fillna(0, limit=1)

# Confirm the null count is zero now:
df['year_published'].isna().sum()


0

In [43]:
df['year_published'].describe()


count    20343.000000
mean      1984.152337
std        214.449652
min      -3500.000000
25%       2001.000000
50%       2011.000000
75%       2016.000000
max       2022.000000
Name: year_published, dtype: float64
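
One thing to keep in mind: a sentinel `0` is treated as a real year by summary statistics such as the mean above. A hedged alternative, sketched here on a toy series (the name `years` and its values are illustrative only), is pandas' nullable `Int64` dtype, which gives integer years while keeping the gap as `<NA>`:

```python
import pandas as pd

# Toy stand-in for the year column, with one missing value.
years = pd.Series([2017.0, 2015.0, None, -3500.0], name="year_published")

# Sentinel approach: the 0 then counts as a real year in statistics.
filled = years.fillna(0).astype(int)

# Alternative: the nullable integer dtype keeps the gap as <NA>.
nullable = years.astype("Int64")

print(filled.mean())    # the sentinel 0 is included in the average
print(nullable.mean())  # the missing value is skipped
```

Whether the sentinel or the nullable dtype is preferable depends on what later steps expect; some libraries do not accept `<NA>` values.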

The `min` value above, -3500, looks like a joke. Let's see how anomalous it is:

In [89]:
df[['id', 'name', 'year_published']].sort_values('year_published').head(15)


Unnamed: 0,id,name,year_published
8174,2399.0,Senet,-3500.0
20219,5546.0,Marbles,-3000.0
1275,2397.0,Backgammon,-3000.0
8924,1602.0,The Royal Game of Ur,-2600.0
172,188.0,Go,-2200.0
20002,3886.0,Nine Men's Morris,-1400.0
19648,19915.0,Three Men's Morris,-1400.0
20342,11901.0,Tic-Tac-Toe,-1300.0
20341,5432.0,Chutes and Ladders,-200.0
15134,21488.0,Petteia,-100.0


In [45]:
# Count how many records have a zero or negative year
df[df['year_published'] <= 0].shape[0]


196

Okay, so they're not joking after all. The games with negative years are historically recorded with BC dates.

I'll leave the `0` values as they are. I am hoping that, for our purposes, the `year_published` values are fine as plain numbers ranging from negative (BC) to positive (AD); I believe they are intuitively descriptive enough as a time scale on their own.

I'm not sure how to handle the BC dates more simply than with a custom class. In case datetime objects are needed later during feature engineering (for time series analyses, say), I'll generate them anyway and save them to a separate .csv file whose name starts with `df_dt_year_published`.

First, converting `year_published` to `int` cleans up the unnecessary decimals:

In [95]:
df['year_published'] = df['year_published'].astype(int)


In [96]:
temp = df.copy()

from datetime import datetime as dt

# Subclass datetime so each value can carry a BC flag.
# (datetime itself only supports years 1 through 9999.)
class CustomDateTime(dt):
    def __new__(cls, year, month, day, BC=False):
        self = super().__new__(cls, year, month, day)
        self.BC = BC
        return self

# Convert a signed numeric year to a CustomDateTime
def year_to_datetime(year):
    year = int(year)
    if year == 0:
        # 0 (the fill value used above for a missing year) maps to a 1 BC placeholder
        return CustomDateTime(1, 1, 1, BC=True)
    elif year < 0:
        # Negative years are BC: store the magnitude and set the flag
        return CustomDateTime(abs(year), 1, 1, BC=True)
    else:
        return CustomDateTime(year, 1, 1)

# Apply to the signed years so that the BC branch sees the negative values
temp['year_published'] = temp['year_published'].apply(year_to_datetime)

temp['year_published'].head()


0    2017-01-01 00:00:00
1    2015-01-01 00:00:00
2    2018-01-01 00:00:00
3    2016-01-01 00:00:00
4    2017-01-01 00:00:00
Name: year_published, dtype: object
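
If the custom class turns out to be more than is needed, one simpler option (an assumption on my part, not what the cells above do) is to keep the signed integer year and add an explicit era column. The column names `era` and `year_abs` below are made up for this sketch, and the four rows reuse years seen earlier in this notebook:

```python
import numpy as np
import pandas as pd

# Toy stand-in: signed years as stored in `year_published` (0 = fill value for unknown).
toy = pd.DataFrame({
    "name": ["Senet", "Go", "Hus", "Gloomhaven"],
    "year_published": [-3500, -2200, 0, 2017],
})

year = toy["year_published"]

# Label each row by era; anything not negative or zero is treated as AD.
toy["era"] = np.select([year < 0, year == 0], ["BC", "unknown"], default="AD")

# Keep the magnitude separately so it can be displayed as "3500 BC", etc.
toy["year_abs"] = year.abs()

print(toy)
```

Unlike the datetime objects, these two columns survive a round trip through a .csv file without losing the BC information.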

In [97]:
# Save the df for possible later use in feature engineering:
temp.to_csv(f'../data/df_dt_year_published_{session}.csv', index=False)

<h3 id="missing-values" style="color:#8fca6b; text-align:left; padding: 0px; border-bottom: 2px solid #8fca6b;">MISSING VALUES</h3>


In [38]:
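# NOTE: `df_bg` is assumed here to be the raw DataFrame with the original
# column headings (the same data as `boardgame` loaded above).
df_bg.isnull().sum()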


ID                       16
Name                      0
Year Published            1
Min Players               0
Max Players               0
Play Time                 0
Min Age                   0
Users Rated               0
Rating Average            0
BGG Rank                  0
Complexity Average        0
Owned Users              23
Mechanics              1598
Domains               10159
dtype: int64
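
Raw counts are easier to compare as a share of rows. A minimal sketch on a toy frame (the values are a stand-in, and `summary`, `n_missing`, and `pct_missing` are names chosen here):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw table, with a few gaps.
toy = pd.DataFrame({
    "ID": [174430.0, np.nan, 224517.0, np.nan],
    "Name": ["Gloomhaven", "Hus", "Brass: Birmingham", "Dwarfest"],
    "Domains": ["Strategy Games, Thematic Games", np.nan, "Strategy Games", np.nan],
})

# Count of missing values and the percentage of rows they represent, per column.
summary = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "pct_missing": (toy.isna().mean() * 100).round(1),
}).sort_values("n_missing", ascending=False)

print(summary)
```

On the full data, the 10,159 missing `Domains` values out of 20,343 rows work out to roughly half of the games.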

Okay, interesting.

All 20,343 entries have a game title, with no null values in `Name`. There are, however, 16 null values in `ID`.

In [16]:
df_bg[df_bg['ID'].isnull()]


Unnamed: 0,ID,Name,Year Published,Min Players,Max Players,Play Time,Min Age,Users Rated,Rating Average,BGG Rank,Complexity Average,Owned Users,Mechanics,Domains
10776,,Ace of Aces: Jet Eagles,1990.0,2,2,20,10,110,6.26,10778,0.02,,,
10835,,Die Erben von Hoax,1999.0,3,8,45,12,137,6.05,10837,0.02,,,
11152,,Rommel in North Africa: The War in the Desert ...,1986.0,2,2,0,12,53,6.76,11154,0.04,,,
11669,,Migration: A Story of Generations,2012.0,2,4,30,12,49,7.2,11671,2.0,,,
12649,,Die Insel der steinernen Wachter,2009.0,2,4,120,12,49,6.73,12651,0.03,,,
12764,,Dragon Ball Z TCG (2014 edition),2014.0,2,2,20,8,33,7.03,12766,2.5,,,
13282,,Dwarfest,2014.0,2,6,45,12,82,6.13,13284,1.75,,,
13984,,Hus,,2,2,40,0,38,6.28,13986,0.02,,,
14053,,Contrario 2,2006.0,2,12,0,14,37,6.3,14055,1.0,,,
14663,,Warage: Extended Edition,2017.0,2,6,90,10,49,7.64,14665,0.03,,,


### How many of the titles and ID numbers are unique?

In [9]:
df_bg.nunique()


ID                    20327
Name                  19976
Year Published          188
Min Players              11
Max Players              54
Play Time               116
Min Age                  21
Users Rated            2973
Rating Average          627
BGG Rank              20343
Complexity Average      383
Owned Users            3997
Mechanics              7381
Domains                  39
dtype: int64

### Do the duplicated game names also have duplicate data such as ID etc, or are they truly different games with the same name?

In [51]:
df_bg[['ID', 'Name']].value_counts().head(10)


ID        Name                                       
1.0       Die Macher                                     1
165041.0  Cargotrain                                     1
165190.0  Boom Bokken                                    1
165189.0  Altaria: Clash of Dimensions                   1
165186.0  Hitler's Reich: WW2 in Europe                  1
165095.0  Pirate Loot: Base Set                          1
165090.0  CLUE: Firefly Edition                          1
165046.0  Slavika: Equinox                               1
165044.0  EverZone: Strategic Battles in the Universe    1
165022.0  €uro Crisis                                    1
dtype: int64

It doesn't look like it: every ID and name pair occurs only once, so these are probably distinct games that share a name. Let's take a closer look at an example:

In [52]:
df_bg['Name'].value_counts().head(10)


Robin Hood          6
Gettysburg          4
Saga                4
Chaos               4
Cosmic Encounter    4
Gangster            4
Maya                3
Kung Fu             3
Polarity            3
War of the Ring     3
Name: Name, dtype: int64

In [48]:
df_bg[df_bg['Name'] == 'Robin Hood']


Unnamed: 0,ID,Name,Year Published,Min Players,Max Players,Play Time,Min Age,Users Rated,Rating Average,BGG Rank,Complexity Average,Owned Users,Mechanics,Domains
5352,104640.0,Robin Hood,2011.0,3,5,50,14,199,7.04,5354,1.5,262.0,"Hand Management, Role Playing",
11461,258137.0,Robin Hood,2019.0,2,2,120,12,39,6.92,11463,0.02,135.0,"Area Majority / Influence, Area Movement, Dice...",
15626,3569.0,Robin Hood,1990.0,2,6,60,8,81,5.66,15628,1.4,207.0,"Memory, Point to Point Movement",
16474,136.0,Robin Hood,1999.0,3,6,30,8,84,5.53,16476,1.75,221.0,,
16545,1947.0,Robin Hood,1991.0,2,6,60,12,61,5.59,16547,2.4,190.0,"Action Points, Campaign / Battle Card Driven, ...",
19470,31794.0,Robin Hood,1994.0,2,2,60,0,72,4.63,19472,1.6,109.0,Hexagon Grid,
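
The same check can be run for every repeated title at once rather than one name at a time. A small sketch, using three of the Robin Hood rows above plus Gloomhaven as toy data (`dupes` and `check` are names chosen here):

```python
import pandas as pd

# Toy stand-in: three of the Robin Hood rows plus one unrelated title.
toy = pd.DataFrame({
    "ID": [104640, 258137, 3569, 174430],
    "Name": ["Robin Hood", "Robin Hood", "Robin Hood", "Gloomhaven"],
    "Year Published": [2011, 2019, 1990, 2017],
})

# Keep every row whose Name appears more than once.
dupes = toy[toy["Name"].duplicated(keep=False)]

# Per repeated name: number of rows and number of distinct IDs.
# If the two match, each row is a different game sharing the title.
check = dupes.groupby("Name")["ID"].agg(rows="size", unique_ids="nunique")

print(check)
```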


Okay, turning back to the null ID numbers. A quick search confirms that these IDs are assigned by [boardgamegeek.com](https://boardgamegeek.com). See the [Gloomhaven](https://boardgamegeek.com/boardgame/174430/gloomhaven) page for an example: its URL contains the ID `174430`.

In [59]:
df_bg[df_bg['Name'] == 'Gloomhaven'][['ID', 'Name']]


Unnamed: 0,ID,Name
0,174430.0,Gloomhaven


In [311]:
df_null_ids = df_bg[df_bg['ID'].isnull()][['ID', 'Name']].copy()
df_null_ids


Unnamed: 0,ID,Name
10776,,Ace of Aces: Jet Eagles
10835,,Die Erben von Hoax
11152,,Rommel in North Africa: The War in the Desert ...
11669,,Migration: A Story of Generations
12649,,Die Insel der steinernen Wachter
12764,,Dragon Ball Z TCG (2014 edition)
13282,,Dwarfest
13984,,Hus
14053,,Contrario 2
14663,,Warage: Extended Edition


In [312]:
df_null_ids["Name"] = df_null_ids["Name"].apply(lambda x: x.replace("Die Insel der steinernen Wachter", "Die Insel der steinernen Wächter"))
df_null_ids["Name"] = df_null_ids["Name"].apply(lambda x: x.replace("Dracarys Dice Don't Get Burned!", "Dracarys Dice"))
df_null_ids


Unnamed: 0,ID,Name
10776,,Ace of Aces: Jet Eagles
10835,,Die Erben von Hoax
11152,,Rommel in North Africa: The War in the Desert ...
11669,,Migration: A Story of Generations
12649,,Die Insel der steinernen Wächter
12764,,Dragon Ball Z TCG (2014 edition)
13282,,Dwarfest
13984,,Hus
14053,,Contrario 2
14663,,Warage: Extended Edition


Searching for the first item on the list, ["Ace of Aces: Jet Eagles"](https://boardgamegeek.com/boardgame/1991/ace-aces-jet-eagles), does turn up a BGG ID number: 1991. Is that ID assigned to anything else in the data?

In [None]:
df_bg[df_bg['ID'] == 1991.0]


Unnamed: 0,ID,Name,Year Published,Min Players,Max Players,Play Time,Min Age,Users Rated,Rating Average,BGG Rank,Complexity Average,Owned Users,Mechanics,Domains


In [291]:
import requests
from bs4 import BeautifulSoup

title = "Dracarys Dice"

# Query the BGG search page for a single title
base_url = "https://boardgamegeek.com/geeksearch.php"
params = {
    "action": "search",
    "advsearch": "1",
    "objecttype": "boardgame",
    "q": title,
}

response = requests.get(base_url, params=params)
soup = BeautifulSoup(response.text, "html.parser")

# Find the link to the board game's page
result = soup.find_all("a", string=title)



In [293]:
result


[<a class="primary" href="/boardgame/269573/dracarys-dice">Dracarys Dice</a>]
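
The `href` in that result carries the BGG ID as the path segment after `/boardgame/`. A minimal sketch of pulling it out with a regular expression; `extract_game_id` is a hypothetical helper name, and the path layout is assumed from the single result shown above:

```python
import re


def extract_game_id(href):
    """Return the numeric ID from a '/boardgame/<id>/<slug>' path, or None."""
    # Match digits directly after '/boardgame/'; other object types are ignored.
    match = re.search(r"/boardgame/(\d+)(?:/|$)", href or "")
    return int(match.group(1)) if match else None


print(extract_game_id("/boardgame/269573/dracarys-dice"))
```

Returning `None` instead of raising keeps a batch lookup going when one title has no usable link.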

In [300]:
# Function to get the ID for a given title
def get_id_for_title(title):
    base_url = "https://boardgamegeek.com/geeksearch.php"
    params = {
        "action": "search",
        "advsearch": "1",
        "objecttype": "boardgame",
        "q": title,
    }

    response = requests.get(base_url, params=params, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Look for a link whose text exactly matches the title
    link = soup.find("a", string=title)
    if link is None:
        # No exact match on the results page
        return None

    href = link.get("href")
    if href and "/boardgame/" in href:
        # Extract the game ID from the href attribute
        game_id = href.split("/boardgame/")[1].split("/")[0]
        return int(game_id)
    return None


In [301]:
for name in df_null_ids['Name']:
    print(name)
    print(get_id_for_title(name))
    print()


Ace of Aces: Jet Eagles
1991

Die Erben von Hoax
413

Rommel in North Africa: The War in the Desert 1941-42
11113

Migration: A Story of Generations
143663

Die Insel der steinernen Wächter
54501

Dragon Ball Z TCG (2014 edition)
168077

Dwarfest
170337

Hus
25999

Contrario 2
27227

Warage: Extended Edition
198886

Rainbow
341510

Sexy, el juego del arte del flirteo
148211

Dracarys Dice
269573

Battleship: Tactical Capital Ship Combat 1925-1945
8173

The Umbrella Academy Game
316555

Hidden Conflict
15804



In [313]:
# Iterate through the DataFrame and update missing IDs
for index, row in df_null_ids.iterrows():
    if pd.isna(row['ID']):
        title = row['Name']
        game_id = get_id_for_title(title)
        df_null_ids.at[index, 'ID'] = game_id

# Display the updated DataFrame
df_null_ids


Unnamed: 0,ID,Name
10776,1991.0,Ace of Aces: Jet Eagles
10835,413.0,Die Erben von Hoax
11152,11113.0,Rommel in North Africa: The War in the Desert ...
11669,143663.0,Migration: A Story of Generations
12649,54501.0,Die Insel der steinernen Wächter
12764,168077.0,Dragon Ball Z TCG (2014 edition)
13282,170337.0,Dwarfest
13984,25999.0,Hus
14053,27227.0,Contrario 2
14663,198886.0,Warage: Extended Edition


In [315]:
df_bg["Name"] = df_bg["Name"].apply(lambda x: x.replace("Die Insel der steinernen Wachter", "Die Insel der steinernen Wächter"))
df_bg["Name"] = df_bg["Name"].apply(lambda x: x.replace("Dracarys Dice Don't Get Burned!", "Dracarys Dice"))

# Iterate through the DataFrame and update missing IDs
for index, row in df_bg.iterrows():
    if pd.isna(row['ID']):
        title = row['Name']
        game_id = get_id_for_title(title)
        df_bg.at[index, 'ID'] = game_id

# Confirm that no IDs are missing now
df_bg.isnull().sum()


ID                        0
Name                      0
Year Published            1
Min Players               0
Max Players               0
Play Time                 0
Min Age                   0
Users Rated               0
Rating Average            0
BGG Rank                  0
Complexity Average        0
Owned Users              23
Mechanics              1598
Domains               10159
dtype: int64
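
Once the IDs are known, the fill itself doesn't need a row loop or repeated web requests. A sketch, assuming the looked-up IDs have been collected into a dictionary (`id_lookup` is a hypothetical name; its three entries are copied from the printed results above):

```python
import numpy as np
import pandas as pd

# IDs as printed by the lookup loop (a subset, for illustration).
id_lookup = {
    "Ace of Aces: Jet Eagles": 1991,
    "Die Erben von Hoax": 413,
    "Hus": 25999,
}

# Toy stand-in for the raw table: one known ID and three gaps.
toy = pd.DataFrame({
    "ID": [174430.0, np.nan, np.nan, np.nan],
    "Name": ["Gloomhaven", "Ace of Aces: Jet Eagles", "Die Erben von Hoax", "Hus"],
})

# Map names to looked-up IDs and use them only where ID is missing.
toy["ID"] = toy["ID"].fillna(toy["Name"].map(id_lookup))

print(toy)
```

Keeping the lookup results in a dictionary also means the scraping step only has to succeed once; the fill can be rerun offline.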

In [None]:
df_bg['Mechanics'][0]


'Action Queue, Action Retrieval, Campaign / Battle Card Driven, Card Play Conflict Resolution, Communication Limits, Cooperative Game, Deck Construction, Deck Bag and Pool Building, Grid Movement, Hand Management, Hexagon Grid, Legacy Game, Modular Board, Once-Per-Game Abilities, Scenario / Mission / Campaign Game, Simultaneous Action Selection, Solo / Solitaire Game, Storytelling, Variable Player Powers'
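
Because each `Mechanics` value is one comma-separated string, counting individual mechanics means splitting it first. A minimal sketch on toy rows (the Brass: Birmingham list is cut to its first three entries, since the output earlier truncates it, and the third row mimics a game with no mechanics listed):

```python
import pandas as pd

# Toy stand-in for a few rows of the raw table.
toy = pd.DataFrame({
    "ID": [224517, 104640, 136],
    "Mechanics": [
        "Hand Management, Income, Loans",
        "Hand Management, Role Playing",
        None,
    ],
})

# Split each string into a list of mechanics; missing values stay missing.
mechanics = toy["Mechanics"].str.split(", ")

# One row per (game, mechanic), then count how often each mechanic occurs.
counts = mechanics.explode().dropna().value_counts()

print(counts)
```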

## 💪 Challenge
Explore and analyze the board game data, and share the intriguing insights with your friends through a report. Here are some steps that might help you get started:

* Is this dataset ready for analysis? Some variables have inappropriate data types, and there are outliers and missing values. Apply data cleaning techniques to preprocess the dataset.
* Use data visualization techniques to draw further insights from the dataset. 
* Find out if the number of players impacts the game's average rating.
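
For the last question, a grouped summary is one possible starting point. The sketch below uses only the five rows shown in the `df.head()` output earlier, which is far too few to draw any conclusion; it just illustrates the shape of the computation (`by_players` is a name chosen here):

```python
import pandas as pd

# The five rows from the df.head() output, reduced to the relevant columns.
toy = pd.DataFrame({
    "name": ["Gloomhaven", "Pandemic Legacy: Season 1", "Brass: Birmingham",
             "Terraforming Mars", "Twilight Imperium: Fourth Edition"],
    "max_players": [4, 4, 4, 5, 6],
    "rating_average": [8.79, 8.61, 8.66, 8.43, 8.70],
})

# Number of games and mean rating for each maximum player count.
by_players = (
    toy.groupby("max_players")["rating_average"]
       .agg(["count", "mean"])
       .round(2)
)

print(by_players)
```

Reporting the count next to the mean matters here, because player counts with only a handful of games will have unstable averages.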

## 🧑‍⚖️ Judging criteria

This is a community-based competition. The top 5 most upvoted entries will win.

The winners will receive DataCamp merchandise.

## ✅ Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- **Remove redundant cells** like the judging criteria, so the workbook is focused on your story.
- Make sure the workbook reads well and explains how you found your insights. 
- Try to include an **executive summary** of your recommendations at the beginning.
- Check that all the cells run without error.

## ⌛️ Time is ticking. Good luck!