# Parameter Tuning with Hyperopt
This notebook explores hyperparameter tuning with hyperopt, which uses Bayesian optimization, as an alternative to traditional grid search.

## Data
The dataset is Kaggle's BoardGameGeek dataset: https://www.kaggle.com/mrpantherson/board-game-data

We will build a model to predict whether a board game receives an above-average rating, that is, a `geek_rating` above the dataset mean.

In [3]:
import pandas as pd
from catboost import CatBoostClassifier, Pool
from hyperopt import tpe, hp, fmin, STATUS_OK, Trials

# Get Data

In [6]:
# data path
datafile = '~/Documents/Python/data/bgg_db_1806.csv'

df = pd.read_csv(datafile)
print(f"N records: {len(df)}")
df.head()

N records: 4999


|  | rank | bgg_url | game_id | names | min_players | max_players | avg_time | min_time | max_time | year | avg_rating | geek_rating | num_votes | image_url | age | mechanic | owned | category | designer | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | https://boardgamegeek.com/boardgame/174430/glo... | 174430 | Gloomhaven | 1 | 4 | 120 | 60 | 120 | 2017 | 8.98893 | 8.61858 | 15376 | https://cf.geekdo-images.com/original/img/lDN3... | 12 | Action / Movement Programming, Co-operative Pl... | 25928 | Adventure, Exploration, Fantasy, Fighting, Min... | Isaac Childres | 3.7543 |
| 1 | 2 | https://boardgamegeek.com/boardgame/161936/pan... | 161936 | Pandemic Legacy: Season 1 | 2 | 4 | 60 | 60 | 60 | 2015 | 8.6614 | 8.50163 | 26063 | https://cf.geekdo-images.com/original/img/P_Sw... | 13 | Action Point Allowance System, Co-operative Pl... | 41605 | Environmental, Medical | Rob Daviau, Matt Leacock | 2.821 |
| 2 | 3 | https://boardgamegeek.com/boardgame/182028/thr... | 182028 | Through the Ages: A New Story of Civilization | 2 | 4 | 240 | 180 | 240 | 2015 | 8.60673 | 8.30183 | 12352 | https://cf.geekdo-images.com/original/img/1d2h... | 14 | Action Point Allowance System, Auction/Bidding... | 15848 | Card Game, Civilization, Economic | Vlaada Chvátil | 4.3678 |
| 3 | 4 | https://boardgamegeek.com/boardgame/167791/ter... | 167791 | Terraforming Mars | 1 | 5 | 120 | 120 | 120 | 2016 | 8.38461 | 8.19914 | 26004 | https://cf.geekdo-images.com/original/img/o8z_... | 12 | Card Drafting, Hand Management, Set Collection... | 33340 | Economic, Environmental, Industry / Manufactur... | Jacob Fryxelius | 3.2456 |
| 4 | 5 | https://boardgamegeek.com/boardgame/12333/twil... | 12333 | Twilight Struggle | 2 | 2 | 180 | 120 | 180 | 2005 | 8.33954 | 8.19787 | 31301 | https://cf.geekdo-images.com/original/img/ZPnn... | 13 | Area Control / Area Influence, Campaign / Batt... | 42952 | Modern Warfare, Political, Wargame | Ananda Gupta, Jason Matthews | 3.5518 |


In [5]:
# geek rating distribution
df.geek_rating.describe()

count    4999.000000
mean        6.088576
std         0.483212
min         5.640240
25%         5.726970
50%         5.905240
75%         6.303585
max         8.618580
Name: geek_rating, dtype: float64

In [13]:
df['aboveavg'] = (df.geek_rating > df.geek_rating.mean()).astype('int')
df.groupby('aboveavg')['geek_rating'].describe()

| aboveavg | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3223.0 | 5.797524 | 0.125798 | 5.64024 | 5.686755 | 5.76681 | 5.88989 | 6.08855 |
| 1 | 1776.0 | 6.616764 | 0.442476 | 6.08879 | 6.263203 | 6.49368 | 6.882483 | 8.61858 |


In [14]:
df.aboveavg.value_counts(normalize=True)

0    0.644729
1    0.355271
Name: aboveavg, dtype: float64
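
The classes are imbalanced because the mean of `geek_rating` (6.089) sits above its median (5.905), so fewer than half of the games count as above average. Accuracy should therefore be read against a majority-class baseline. The quick check below recomputes that baseline from the class counts in the groupby output above. The variable names are illustrative, and the counts are hard-coded so the snippet runs on its own.

```python
# Class counts copied from the groupby output: 3223 games at or below the mean, 1776 above it.
counts = {0: 3223, 1: 1776}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = counts[majority_class] / total

print(f"Majority class: {majority_class}")             # Majority class: 0
print(f"Baseline accuracy: {baseline_accuracy:.4f}")   # Baseline accuracy: 0.6447
```

A classifier that cannot beat roughly 64.5% accuracy is doing no better than always predicting "not above average".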