# Predicting Scores from the Spreadsheet

## Background Reading

This is all done using fastai (https://www.fast.ai/).

If you want to understand what this is doing, look at the course here: https://course.fast.ai/

This is loosely based on https://github.com/fastai/fastbook/blob/master/09_tabular.ipynb

## Why are my results different?
It's worth pointing out that training neural nets is stochastic rather than deterministic, so you won't get exactly the same results as me.

I've initialised the random number generators so that there's as little variance as possible, but you'll still get slightly different results every time you re-train your model.
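
As a small illustration of what seeding does, here is a minimal sketch using only Python's standard `random` module. The helper name `draw_scores` is made up for this example. The same seed reproduces the same sequence, but that only pins down one generator, which is part of why a re-trained model can still drift a little.

```python
import random

def draw_scores(seed, n=3):
    """Return n pseudo-random numbers after seeding Python's generator."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw_scores(42)
second = draw_scores(42)
other = draw_scores(43)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```

The first value drawn with seed 42 is the same number that, multiplied by 10, appears at the top of the random control output at the end of this notebook.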

## Initialisation

In [None]:
# fastai - a popular, easy-to-use neural net library. See https://www.fast.ai/
!pip install fastai --upgrade
from fastai.tabular.all import *

# For the random predictions
import random
import numpy as np

# Google Drive
from google.colab import drive

# Mount and authorise Google Drive
drive.mount('/content/drive')

# You may need to change this if your data is elsewhere
dataPath = Path('/content/drive/MyDrive/guild')

Collecting fastai
  Downloading fastai-2.5.3-py3-none-any.whl (189 kB)
Collecting fastcore<1.4,>=1.3.22
  Downloading fastcore-1.3.27-py3-none-any.whl (56 kB)
Collecting fastdownload<2,>=0.0.5
  Downloading fastdownload-0.0.5-py3-none-any.whl (13 kB)
Installing collected packages: fastcore, fastdownload, fastai
  Attempting uninstall: fastai
    Found existing installation: fastai 1.0.61
    Uninstalling fastai-1.0.61:
      Successfully uninstalled fastai-1.0.61
Successfully installed fastai-2.5.3 fastcore-1.3.27 fastdownload-0.0.5
Mounted at /content/drive
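
If your data lives somewhere else, one option is to make the folder configurable instead of editing the cell each time. This is a minimal sketch using only the standard library. The environment variable `GUILD_DATA_DIR` and both helper names are made up for this example; they are not something Colab or fastai defines.

```python
import os
from pathlib import Path

# Default taken from the notebook; GUILD_DATA_DIR is a hypothetical override.
DEFAULT_DATA_DIR = '/content/drive/MyDrive/guild'

def resolve_data_path(env=os.environ):
    """Return the data folder, preferring the GUILD_DATA_DIR override if it is set."""
    return Path(env.get('GUILD_DATA_DIR', DEFAULT_DATA_DIR))

def missing_files(folder, names=('TrainingData.csv', 'TestData.csv', 'Season5.csv')):
    """List the CSV files this notebook reads that are not present in the folder."""
    return [name for name in names if not (folder / name).exists()]

data_path = resolve_data_path()
print(data_path, missing_files(data_path))
```

An empty list from `missing_files` means all three CSV files used later in the notebook were found.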


## Training Data

We have the data from Seasons 1-4. This has been simplified from Al's spreadsheet to work for this purpose.

In [None]:
df = pd.read_csv(dataPath /'TrainingData.csv')
df.head(15)

Unnamed: 0,Beer,Brewery,Country,Chooser,Style,ABV,Score,Score_Band
0,Hitachino Nest White Ale,Kiuchi,Japan,Dave,Wheat Beer,5.5,7.125,VERYGOOD
1,Holding Back the Tiers,Brew York,United Kingdom,Alan,xPA,5.2,7.1875,VERYGOOD
2,Menabrea lager,Menabrea,Italy,Jamie,Lager,4.8,7.25,VERYGOOD
3,Tusker,Tusker,Kenya,Sam,Lager,4.2,6.375,GOOD
4,Titus,Saltaire Brewery,United Kingdom,Morty,Ale,3.9,6.9375,GOOD
5,Helles,Augustiner Brauerei,Germany,Jamie,Lager,5.2,7.1875,VERYGOOD
6,Orval,Orval,Belgium,Baz,Ale,6.9,6.6875,GOOD
7,Nut Brown Ale,Samuel Smiths,United Kingdom,Daz,Ale,5.0,7.0,VERYGOOD
8,Brew Rasp,Brew York,United Kingdom,Morty,xPA,6.5,6.875,GOOD
9,Rhub. Streisand,Brew York,United Kingdom,Baz,xPA,5.5,6.8125,GOOD


# Categorised Model



## Categories

We don't have enough data to predict exact scores. Instead, we're going to categorise the scores into bands as follows:

* Less than 5.0 = BAD
* 5.0 to less than 6.0 = OK
* 6.0 to less than 7.0 = GOOD
* 7.0 or greater = VERYGOOD

We've also simplified the Styles, e.g. Helles and Pils have been mapped to Lager.
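
The banding and the style simplification can be sketched as two small functions. This is an illustration rather than the code used to prepare `TrainingData.csv`. The exact boundary handling is an assumption, chosen so that it agrees with the training rows shown earlier (7.0 is VERYGOOD, 6.9375 is GOOD), and only the two style mappings named in the text are included.

```python
# Style simplifications named in the text; any other style passes through unchanged.
STYLE_MAP = {'Helles': 'Lager', 'Pils': 'Lager'}

def score_band(score):
    """Map a numeric score to its band (boundaries are an assumption, see the note above)."""
    if score < 5.0:
        return 'BAD'
    if score < 6.0:
        return 'OK'
    if score < 7.0:
        return 'GOOD'
    return 'VERYGOOD'

def simplify_style(style):
    """Collapse a detailed style into its simplified group, if one is defined."""
    return STYLE_MAP.get(style, style)

# Scores taken from the training data preview.
for score in (7.125, 6.375, 7.0, 6.9375):
    print(score, score_band(score))
print(simplify_style('Helles'))
```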

## Building the Model

In [None]:
# Split the data into a training set and a validation set.
# The validation set is 10% of the data; really this should be bigger, but we've so little data!
# The random number generator is initialised with 42.
splits = RandomSplitter(valid_pct=0.1, seed=42)(range_of(df))

# Build a TabularPandas object
to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                   cat_names=['Brewery', 'Country', 'Chooser', 'Style'],  # Category (enum) fields, to be converted to numbers
                   cont_names=['ABV'],  # 'Continuous' or number fields
                   y_names='Score_Band',  # The column we want to predict
                   splits=splits)  # The indexes of the data to use as the training and validation sets
dls = to.dataloaders(bs=4)

random.seed(a=42)
np.random.seed(42)

# Do 50 epochs of training. The model is untrained, so there's no point in calculating the learning rate initially
learn = tabular_learner(dls, metrics=[error_rate, accuracy])
learn.fit_one_cycle(50)

epoch,train_loss,valid_loss,error_rate,accuracy,time
0,1.420421,1.384106,0.833333,0.166667,00:00
1,1.396093,1.425626,1.0,0.0,00:00
2,1.384985,1.418636,0.666667,0.333333,00:00
3,1.361986,1.442005,0.666667,0.333333,00:00
4,1.357639,1.378331,0.833333,0.166667,00:00
5,1.336068,1.287351,0.833333,0.166667,00:00
6,1.315704,1.359295,0.666667,0.333333,00:00
7,1.287216,1.326378,0.5,0.5,00:00
8,1.24439,1.195231,0.333333,0.666667,00:00
9,1.225419,1.214625,0.166667,0.833333,00:00
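
To give a feel for what the `Categorify` and `Normalize` steps passed as `procs` are doing, here is a conceptual sketch in plain Python. It is not fastai's implementation. Reserving code 0 for a `#na#` placeholder and using the population standard deviation are assumptions made for this example, and the function names are made up. The input values are the Style and ABV columns from the training data preview.

```python
from statistics import mean, pstdev

# Style and ABV columns from the first ten rows of the training data shown earlier.
styles = ['Wheat Beer', 'xPA', 'Lager', 'Lager', 'Ale', 'Lager', 'Ale', 'Ale', 'xPA', 'xPA']
abvs = [5.5, 5.2, 4.8, 4.2, 3.9, 5.2, 6.9, 5.0, 6.5, 5.5]

def categorify(values):
    """Give each distinct value an integer code, keeping 0 for a '#na#' placeholder."""
    vocab = ['#na#'] + sorted(set(values))
    lookup = {value: code for code, value in enumerate(vocab)}
    return vocab, [lookup[v] for v in values]

def normalize(values):
    """Rescale values to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

vocab, codes = categorify(styles)
scaled = normalize(abvs)
print(vocab)
print(codes)
print([round(x, 3) for x in scaled])
```

After this kind of preprocessing, every category is a small integer and ABV is centred around zero, which is the general shape of the values that `show_results` prints.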


## How does it do against our validation data set?

In [None]:
learn.show_results()

Unnamed: 0,Brewery,Country,Chooser,Style,ABV,Score_Band,Score_Band_pred
0,8.0,15.0,5.0,3.0,-0.364452,1.0,3.0
1,13.0,17.0,6.0,6.0,-0.164361,3.0,1.0
2,33.0,17.0,5.0,1.0,-0.66459,3.0,2.0
3,16.0,17.0,3.0,6.0,0.636007,1.0,1.0


## Let's try some Hypothetical Beers!

In [None]:
hypo = pd.read_csv(dataPath /'TestData.csv')
hypo.head(2)

Unnamed: 0,Beer,Brewery,Country,Chooser,Style,ABV
0,F** This,Elgoods,Belgium,Dave,Fruit Beer,7.0
1,Super Brew York,Brew York,United Kingdom,Jamie,Lager,4.5


In [None]:
for i in range(len(hypo)):
    predictRow, clas, probs = learn.predict(hypo.iloc[i])
    print(f"'{hypo.iloc[i].Beer}' is predicted as {dls.vocab[clas]} with probability {probs}")

'F** This' is predicted as BAD with probability tensor([0.8630, 0.0283, 0.0315, 0.0773])


'Super Brew York' is predicted as VERYGOOD with probability tensor([0.0347, 0.1231, 0.0878, 0.7544])
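
The raw tensor is easier to read once each probability is paired with its band. A minimal sketch, with a made-up helper `label_probs`: the label order used here is an assumption, because the outputs above only confirm that index 0 is BAD and index 3 is VERYGOOD. In the notebook you would take the labels from `dls.vocab` and convert the tensor with `probs.tolist()`.

```python
# Assumed label order; only the first and last positions are confirmed by the outputs above.
vocab = ['BAD', 'GOOD', 'OK', 'VERYGOOD']

def label_probs(probs, labels):
    """Pair each probability with its label, most likely first."""
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)

# Probabilities copied from the two predictions above.
predictions = {
    'F** This': [0.8630, 0.0283, 0.0315, 0.0773],
    'Super Brew York': [0.0347, 0.1231, 0.0878, 0.7544],
}

for beer, probs in predictions.items():
    ranked = label_probs(probs, vocab)
    print(beer + ':', ', '.join(f'{label} {p:.0%}' for label, p in ranked))
```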


## Things to do
*   More data (Beer Guild Live)
*   Estimate all the beers in the Season
*   More data (Steal it off the Internet)



## Predict the Season 5 beers using the Model we built

In [None]:
test = pd.read_csv(dataPath /'Season5.csv')
test.head(16)

Unnamed: 0,Beer,Brewery,Country,Chooser,Style,ABV
0,F*** 2020,Cold Town,United Kingdom,Sam,Lager,5.2
1,Fairytale Of Brew York 2021,Brew York,United Kingdom,Alan,Stout,6.6
2,Anarchy Kiss The Sun Can,Anarchy Brew Co,United Kingdom,Morty,Ale,5.3
3,Cape Point Pale Ale,Cape Brewing Co,South Africa,Dave,xPA,4.8
4,Cisk Lager Can,Simonds Farsons Cisk,Malta,Morty,Lager,4.2
5,Cold Town Proud As Helles Can,Cold Town,United Kingdom,Daz,Lager,5.1
6,Hofbrau Winterzwickl,Hofbrau Munchen,Germany,Jamie,Lager,5.5
7,Humpty Dumpty Christmas Crack,Humpty Dumpty,United Kingdom,Pete,Ale,7.0
8,Kloster Andechs Weissbier Dunkel,Kloster Andechs,Germany,Baz,Wheat Beer,5.0
9,Mighty Oak Wonky Donkey,Mighty Oak Brewery,United Kingdom,Sam,Ale,4.3


In [None]:

for i in range(len(test)):
    predictRow, clas, probs = learn.predict(test.iloc[i])
    print(f"{test.iloc[i].Beer} is predicted as {dls.vocab[clas]} with probability {probs}")

## Randomly Predict the Results as a control

In [None]:
import random
random.seed(a=42)

# Pick one of the four bands uniformly at random for each beer
for i in range(len(test)):
    print(f"{test.iloc[i].Beer} is predicted as {dls.vocab[random.randint(0, 3)]}")

F*** 2020 is predicted as BAD
Fairytale Of Brew York 2021 is predicted as BAD
Anarchy Kiss The Sun Can is predicted as OK
Cape Point Pale Ale is predicted as GOOD
Cisk Lager Can is predicted as GOOD
Cold Town Proud As Helles Can is predicted as GOOD
Hofbrau Winterzwickl is predicted as BAD
Humpty Dumpty Christmas Crack is predicted as BAD
Kloster Andechs Weissbier Dunkel is predicted as VERYGOOD
Mighty Oak Wonky Donkey is predicted as BAD
Moeder Overste Flip Top is predicted as BAD
Moon Gazer Skidaddler Norfolk Milk Stout Can is predicted as BAD
Nordbrau Ingolstadt Eisbock is predicted as GOOD
Saison Dupont is predicted as GOOD
Sloop Super Soft IPA Can is predicted as BAD
Wildcraft Wild Card Blackberry Stout Can is predicted as GOOD
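
Once the real Season 5 bands are known, the model and the random control can be compared on the same footing. The sketch below shows one way to do that and estimates what uniform random guessing scores on average. The function names are made up, the label order is an assumption, and the "actual" list is a placeholder, not real results.

```python
import random

# Assumed label order; in the notebook this would come from dls.vocab.
BANDS = ['BAD', 'GOOD', 'OK', 'VERYGOOD']

def accuracy(predicted, actual):
    """Fraction of positions where the predicted band equals the actual band."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

def random_guess_accuracy(actual, trials=2000, seed=42):
    """Average accuracy of uniform random guessing over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        guesses = [rng.choice(BANDS) for _ in actual]
        total += accuracy(guesses, actual)
    return total / trials

# Placeholder only: these are NOT real Season 5 results.
placeholder_actual = ['GOOD'] * 16
baseline = random_guess_accuracy(placeholder_actual)
print(round(baseline, 2))
```

With four equally likely bands the baseline should sit near one in four, so the model needs to beat that comfortably before its predictions mean much.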


# Floating Point Scores

I know we said this wouldn't work, but what the hey!

## Model

In [None]:
fpSplits = RandomSplitter(valid_pct=0.1, seed=42)(range_of(df))
fpPanda = TabularPandas(df, procs=[Categorify, FillMissing,Normalize],
                   cat_names = ['Brewery', 'Country', 'Chooser', 'Style'],
                   cont_names = ['ABV'],
                   y_names='Score',
                   splits=fpSplits)
fpDls = fpPanda.dataloaders(bs=4)

random.seed(a=42)
np.random.seed(42)

fpLearn = tabular_learner(fpDls, metrics=[error_rate,accuracy])

fpLearn.fit_one_cycle(50)



epoch,train_loss,valid_loss,error_rate,accuracy,time
0,44.546055,49.591038,1.0,0.0,00:00
1,44.689583,49.313339,1.0,0.0,00:00
2,44.587574,48.413284,1.0,0.0,00:00
3,44.486729,47.990356,1.0,0.0,00:00
4,44.326981,47.577698,1.0,0.0,00:00
5,44.104492,46.914082,1.0,0.0,00:00
6,43.73325,44.945892,1.0,0.0,00:00
7,43.237259,46.916645,1.0,0.0,00:00
8,42.625496,44.12645,1.0,0.0,00:00
9,41.821861,39.72784,1.0,0.0,00:00


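0% accuracy, lol! That is expected rather than damning: `error_rate` and `accuracy` are classification metrics, so they say nothing useful about a model that predicts a continuous score. The loss columns are the ones to watch here.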
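
For a model that predicts floating point scores, error measures such as mean absolute error (MAE) or root mean squared error (RMSE) are the usual choice. Below is a minimal pure-Python sketch of both. The actual scores are the first three rows of the training data; the "predicted" values are made up purely to exercise the functions.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean absolute error: the average size of the miss, in score points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error: like MAE, but penalises large misses more."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Actual scores from the training data preview; predictions are invented for illustration.
actual = [7.125, 7.1875, 7.25]
predicted = [6.9, 7.0, 7.5]

print(round(mae(predicted, actual), 3))
print(round(rmse(predicted, actual), 3))
```

fastai also provides `mae` and `rmse` metrics, which could probably be passed to `tabular_learner` in place of the classification ones, though that has not been tried in this notebook.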

## The Hypothetical Beers

In [None]:
for i in range(len(hypo)):
    
    predictRow, clas, probs = fpLearn.predict(hypo.iloc[i])
    print(f"{hypo.iloc[i].Beer},{clas[0]}")

F** This,5.398323059082031


Super Brew York,6.972620010375977


## Predictions

In [None]:
for i in range(len(test)):
    predictRow, clas, probs = fpLearn.predict(test.iloc[i])
    print(f"{test.iloc[i].Beer},{clas[0]}")

## Random Control

Again as a control, this time with a random score between 0 and 10 for each beer.

In [None]:

random.seed(a=42)

for i in range(len(test)):
    print(f"{test.iloc[i].Beer},{random.random()*10}")

F*** 2020,6.394267984578837
Fairytale Of Brew York 2021,0.25010755222666936
Anarchy Kiss The Sun Can,2.7502931836911926
Cape Point Pale Ale,2.2321073814882277
Cisk Lager Can,7.364712141640124
Cold Town Proud As Helles Can,6.766994874229113
Hofbrau Winterzwickl,8.921795677048454
Humpty Dumpty Christmas Crack,0.8693883262941615
Kloster Andechs Weissbier Dunkel,4.2192181968527045
Mighty Oak Wonky Donkey,0.29797219438070344
Moeder Overste Flip Top,2.1863797480360336
Moon Gazer Skidaddler Norfolk Milk Stout Can,5.053552881033624
Nordbrau Ingolstadt Eisbock,0.26535969683863625
Saison Dupont,1.988376506866485
Sloop Super Soft IPA Can,6.498844377795232
Wildcraft Wild Card Blackberry Stout Can,5.449414806032166
