# Hat game demo
In this notebook we
1. download a sample corpus
1. train a few models on it
1. write a class that follows the `AbstractPlayer` conventions
1. finally, play the game between the local models (players) and one remote player :)

Note that the exact output may not be reproducible: the remote player can change over time, or even fail or time out at some point.

In [1]:
from pathlib import Path

import numpy as np
import pandas as pd
import fasttext
from sklearn.metrics.pairwise import cosine_similarity
from tqdm import tqdm

from the_hat_game.game import Game
from flask_app.player import PlayerDefinition, AbstractPlayer, RemotePlayer

pd.set_option('display.max_colwidth', 200)

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/olegpolivin/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/olegpolivin/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


## Download sample text corpus

In [2]:
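%%bash
# create the target folder if it does not exist yet, then download and unpack the archive
mkdir -p texts
cd texts
wget --quiet http://qwone.com/~jason/20Newsgroups/20news-19997.tar.gz
tar -zxf 20news-19997.tar.gz
rm 20news-19997.tar.gz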
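
If `wget` or a bash kernel is not available (for example on Windows), the same download can be sketched in pure Python with the standard library. The helper name `download_and_extract` is hypothetical, and the sketch assumes the archive URL above is still reachable.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def download_and_extract(url: str, dest: Path) -> None:
    """Download a .tar.gz archive from `url` and unpack it into `dest` (hypothetical helper)."""
    dest.mkdir(parents=True, exist_ok=True)
    # Stream the archive into a temporary folder, so it is removed afterwards
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive.tar.gz"
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        # "r:gz" opens a gzip-compressed tar archive
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest)


# Usage with the same URL and target folder as the bash cell:
# download_and_extract("http://qwone.com/~jason/20Newsgroups/20news-19997.tar.gz", Path("texts"))
```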

In [3]:
file_path = Path('texts/20-newsgroups.txt')
folder = Path('texts/20_newsgroups/')

In [4]:
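# concatenate every message from every newsgroup into one plain-text training file
with open(file_path, 'w', encoding='utf-8') as f_write:
    files = list(folder.rglob('*'))
    for object_path in tqdm(files):
        # rglob also yields the newsgroup folders themselves; skip them
        if object_path.is_dir():
            continue
        # latin-1 can decode any byte, so odd characters in old messages never raise errors
        with open(object_path, encoding='latin-1') as stream:
            for line in stream:
                f_write.write(line)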

100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 20017/20017 [00:02<00:00, 9636.68it/s]


In [5]:
!wc -w {file_path}

 6046669 texts/20-newsgroups.txt
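
`wc -w` counts whitespace-separated tokens. A rough Python equivalent is sketched below (the function name `count_words` is hypothetical). The optional `count_line_ends` flag adds one token per line, on the assumption that fastText counts an end-of-sentence token `</s>` at every line break; if so, that may explain why the training log below reports "Read 7M words" for a file that `wc` counts as about 6M, and why `</s>` rather than `'the'` is likely the most frequent entry in the vocabulary.

```python
from pathlib import Path


def count_words(path: Path, count_line_ends: bool = False) -> int:
    """Count whitespace-separated tokens, roughly like `wc -w`.

    With count_line_ends=True, also count one token per line, mimicking
    fastText's end-of-sentence token (an assumption, see the text above).
    """
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            total += len(line.split())
            if count_line_ends:
                total += 1
    return total


# Usage: count_words(file_path) should be close to the `wc -w` figure above.
```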


In [6]:
!head -5 {file_path}

Newsgroups: talk.politics.mideast
Path: cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!bb3.andrew.cmu.edu!news.sei.cmu.edu!cis.ohio-state.edu!zaphod.mps.ohio-state.edu!cs.utexas.edu!uunet!brunix!doorknob!hm
From: hm@cs.brown.edu (Harry Mamaysky)
Subject: Heil Hernlem 
In-Reply-To: hernlem@chess.ncsu.edu's message of Wed, 14 Apr 1993 12:58:13 GMT
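
The training file still contains the Usenet headers (`Newsgroups:`, `Path:`, `From:`, ...), and fastText splits text only on whitespace, so tokens such as `pay.` and `taxes,` become separate vocabulary entries (they appear in the game logs below). A hypothetical preprocessing step, not used in this notebook, could drop the header block, which in these RFC 822-style messages ends at the first blank line, and lowercase each line while stripping punctuation. Retraining on such text would change every output below.

```python
import re


def strip_header(message: str) -> str:
    """Drop the header block, which ends at the first blank line."""
    _header, _blank, body = message.partition("\n\n")
    return body


def normalize_line(line: str) -> str:
    """Lowercase and keep only ASCII letters, digits, apostrophes and whitespace."""
    # Replace every other character with a space so that 'pay.' and 'pay' become one token
    cleaned = re.sub(r"[^a-z0-9'\s]", " ", line.lower())
    # Collapse runs of whitespace into single spaces
    return " ".join(cleaned.split())


# Hypothetical usage inside the concatenation loop above:
# text = object_path.read_text(encoding='latin-1')
# for line in strip_header(text).splitlines():
#     f_write.write(normalize_line(line) + '\n')
```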


## Train several models

In [7]:
%%time

# two skip-gram models (5- and 10-dimensional vectors) and one CBOW model (16-dimensional)
model_skipgram = fasttext.train_unsupervised(str(file_path), model='skipgram', dim=5)
model_cbow = fasttext.train_unsupervised(str(file_path), model='cbow', dim=16)
model_skipgram2 = fasttext.train_unsupervised(str(file_path), model='skipgram', dim=10)

Read 7M words
Number of words:  72228
Number of labels: 0
Progress: 100.0% words/sec/thread:  228161 lr:  0.000000 avg.loss:  1.815179 ETA:   0h 0m 0s
Read 7M words
Number of words

CPU times: user 6min 4s, sys: 3.92 s, total: 6min 8s
Wall time: 2min 19s


Progress: 100.0% words/sec/thread:  220364 lr:  0.000000 avg.loss:  1.748638 ETA:   0h 0m 0s


In [8]:
model_skipgram.words[1]

'the'

In [9]:
len(model_skipgram.words)

72228

In [10]:
model_skipgram['song']

array([-1.6399992 , -0.34246916,  0.50289637, -0.7360252 ,  0.47372568],
      dtype=float32)
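
fastText's `get_nearest_neighbors` ranks vocabulary words by cosine similarity to the query vector. The sketch below does the same ranking explicitly with NumPy (the already-imported `cosine_similarity` from scikit-learn would also work). The function name and the `vectors` matrix are assumptions: for a trained model, `vectors` could be built as `np.array([model.get_word_vector(w) for w in model.words])`.

```python
import numpy as np


def nearest_neighbors(query, vectors, words, k=10, exclude=()):
    """Return up to k (similarity, word) pairs closest to `query` by cosine similarity."""
    # Normalize rows and the query so that a dot product equals cosine similarity;
    # the small floor guards against all-zero vectors
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)
    q = query / max(np.linalg.norm(query), 1e-12)
    sims = unit @ q
    result = []
    for i in np.argsort(-sims):  # indices by decreasing similarity
        if words[i] in exclude:
            continue
        result.append((float(sims[i]), words[i]))
        if len(result) == k:
            break
    return result
```

The `(similarity, word)` pair order mirrors what `get_nearest_neighbors` returns, so the list comprehension in `LocalFasttextPlayer` below would work on either.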

In [11]:
!mkdir -p models
model_skipgram.save_model('models/skipgram.model')
model_skipgram2.save_model('models/skipgram2.model')
model_cbow.save_model('models/cbow.model')


In [12]:
!ls -lh models

total 2190872
-rw-r--r--  1 olegpolivin  staff   787M Jun 10 14:47 2021_06_05_processed.model
-rw-r--r--  1 olegpolivin  staff   132M Nov  6 22:27 cbow.model
-rw-r--r--  1 olegpolivin  staff    42M Nov  6 22:27 skipgram.model
-rw-r--r--  1 olegpolivin  staff    83M Nov  6 22:27 skipgram2.model


## Player classes for fastText models

In [13]:
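class LocalFasttextPlayer(AbstractPlayer):
    """Player backed by a local fastText model."""

    def __init__(self, model):
        self.model = model

    def find_words_for_sentence(self, sentence, n_closest):
        # get_nearest_neighbors returns (similarity, word) pairs sorted by similarity;
        # request exactly n_closest of them (the default k is 10)
        neighbours = self.model.get_nearest_neighbors(sentence, k=n_closest)
        words = [word for similarity, word in neighbours]
        return words

    def explain(self, word, n_words):
        # describe the secret word with its nearest neighbours
        return self.find_words_for_sentence(word, n_words)

    def guess(self, words, n_words):
        # join the clues into one query; local calls are instant and always "succeed"
        guessed_words = self.find_words_for_sentence(' '.join(words), n_words)
        return {"word_list": guessed_words, "time": 0, "code": 200}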
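
`LocalFasttextPlayer.guess` passes the joined clues (for example `'job employee office'`) to `get_nearest_neighbors`. fastText likely treats that whole string as one out-of-vocabulary token built from character n-grams rather than as three separate words, which may be one reason the local guesses look unrelated. A hypothetical alternative, sketched below, averages the clue words' vectors and ranks the vocabulary by cosine similarity; this is a swapped-in technique, not the notebook's method. The class name `MeanVectorPlayer` and its constructor are illustrative; in the notebook it would subclass `AbstractPlayer`. `explain` also skips words that contain the secret word, since the host removes those anyway.

```python
import numpy as np


class MeanVectorPlayer:
    """Hypothetical player: averages clue vectors and ranks words by cosine similarity.

    Mirrors the explain/guess interface of LocalFasttextPlayer.
    """

    def __init__(self, words, vectors):
        self.words = list(words)
        self.index = {w: i for i, w in enumerate(self.words)}
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        # Pre-normalize rows so that a dot product is a cosine similarity
        self.unit = vectors / np.maximum(norms, 1e-12)

    def _rank(self, query, n, skip):
        # Return the n most similar vocabulary words for which skip(word) is False
        q = query / max(np.linalg.norm(query), 1e-12)
        sims = self.unit @ q
        ranked = [self.words[i] for i in np.argsort(-sims)]
        return [w for w in ranked if not skip(w)][:n]

    def explain(self, word, n_words):
        if word not in self.index:
            return []
        query = self.unit[self.index[word]]
        # Skip the secret word and any word containing it
        return self._rank(query, n_words, lambda w: word.lower() in w.lower())

    def guess(self, words, n_words):
        known = [self.unit[self.index[w]] for w in words if w in self.index]
        if not known:
            return {"word_list": [], "time": 0, "code": 200}
        # The mean of the unit clue vectors serves as the query; clues are never guessed
        query = np.mean(known, axis=0)
        guesses = self._rank(query, n_words, lambda w: w in words)
        return {"word_list": guesses, "time": 0, "code": 200}
```

With a trained model one could try `MeanVectorPlayer(model_skipgram.words, np.array([model_skipgram.get_word_vector(w) for w in model_skipgram.words]))`; whether it guesses better would have to be checked by playing.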

In [14]:
# check remotely deployed service
remote_player = RemotePlayer('https://obscure-everglades-02893.herokuapp.com')

print(remote_player.explain('work', 10))
print(remote_player.guess(['job', 'employee', 'office'], 5))

['work', 'discontent', 'probably:', 'lopid', 'gives', 'putty', 'refund', 'strangest', 'enuff', 'inovative']
{'word_list': ['bars;', 'earnings', 'appellate', 'discoverd', 'phage'], 'time': 0.651013, 'code': 200}


In [15]:
local_player = LocalFasttextPlayer(model_skipgram)
print(local_player.explain('work', 10))
print(local_player.guess(['job', 'employee', 'office'], 5))

['normal.', 'cost,', 'autos', 'vehicle.', 'handy', 'subaru', 'cost', 'fit', 'handlebar', 'expensive,']
{'word_list': ['revision,', '[my', '[some', '>Will', 'WBT/SIL.'], 'time': 0, 'code': 200}


## Playing the game!

In [16]:
N_EXPLAIN_WORDS = 10
N_GUESSING_WORDS = 5
N_ROUNDS = 1
CRITERIA = 'soft'

PLAYERS = [
    PlayerDefinition('HerokuOrg team', RemotePlayer('https://obscure-everglades-02893.herokuapp.com')),
    PlayerDefinition('skipgram team', LocalFasttextPlayer(model_skipgram)),
    PlayerDefinition('skipgram2 team', LocalFasttextPlayer(model_skipgram2)),
    PlayerDefinition('cbow team', LocalFasttextPlayer(model_cbow))
]

WORDS = ['dollar', 'percent', 'billion', 'money']

game = Game(PLAYERS, WORDS, CRITERIA, N_ROUNDS, N_EXPLAIN_WORDS, N_GUESSING_WORDS, random_state=0)
game.run(verbose='print_logs', complete=False)

HOST to EXPLAINING PLAYER (HerokuOrg team): the word is "billion"
EXPLAINING PLAYER (HerokuOrg team) to HOST: my wordlist is []
HOST TO EXPLAINING PLAYER (HerokuOrg team): cleaning your word list. Now the list is []


SCORES: {'HerokuOrg team': 0}


HOST to EXPLAINING PLAYER (skipgram2 team): the word is "money"
EXPLAINING PLAYER (skipgram2 team) to HOST: my wordlist is ['pay.', 'taxes.', 'money.', 'insure', 'electricity.', "we'll", 'pay,', 'bill,', 'care.', 'pay']
HOST TO EXPLAINING PLAYER (skipgram2 team): cleaning your word list. Now the list is ['pay', 'taxes', 'insure', 'electricity', 'well', 'bill', 'care']

===ROUND 1===

HOST: ['pay']
GUESSING PLAYER (HerokuOrg team) to HOST: ['pay', 'reopened', 'considering', 'occasionally', 'hand-cocked']
HOST: False
GUESSING PLAYER (skipgram team) to HOST: ['suggestion,', 'pleases', '>suggest', 'fault?', 'complaint']
HOST: False
GUESSING PLAYER (cbow team) to HOST: ['>pay', 'payroll', 'earphone', 'telephone,

|  | Explanation for "money" (skipgram2 team) | Guess (HerokuOrg team) | Guess (skipgram team) | Guess (cbow team) |
|---|---|---|---|---|
| 0 | `[pay]` | `[pay, reopened, considering, occasionally, hand-cocked]` | `[suggestion,, pleases, >suggest, fault?, complaint]` | `[>pay, payroll, earphone, telephone,, phone)]` |
| 1 | `[pay, taxes]` | `[loaned, aging, deep, elapsed, wielded]` | `[#No,, >Personally,, >>really, that:, _really_]` | `[taxes, pays, taxes,, faxes, axes]` |
| 2 | `[pay, taxes, insure]` | `[greater, bulletproof, aging, elapsed, subside]` | `[AFDC, classmates, >(1), 'official', >issue]` | `[insure, microphone,, insurance,, phone,, tach,]` |
| 3 | `[pay, taxes, insure, electricity]` | `[]` | `[router, bib, Ny-Quil, synopsis, bilingual]` | `[electricity, viscosity, rivalry, electrolyte, axes;]` |
| 4 | `[pay, taxes, insure, electricity, well]` | `[]` | `[bilingual, CopyFromParent, order)., [with, +\|>]` | `[microphone,, electricity, viscosity, transport,, electrolyte]` |
| 5 | `[pay, taxes, insure, electricity, well, bill]` | `[>fund, filing, courtroom, ledges, squalor]` | `[biz.sco.*, gp160, cathodes, B", banks,]` | `[electricity, microphone,, telephone,, viscosity, transport,]` |
| 6 | `[pay, taxes, insure, electricity, well, bill, care]` | `[]` | `[cymbal, hostname, gp160, I), biz.sco.*]` | `[microphone,, viscosity, pool,, telephone,, freeware,]` |
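
In each of these tables, row `i` shows the first `i + 1` explanation words revealed by the host, next to the `N_GUESSING_WORDS` candidates (or an empty list) each guessing team returned at that step. A simplified, hypothetical version of that reveal loop is sketched below. It stops at the first correct guess and uses exact, case-insensitive matching; both are assumptions, since the real `Game` runs here with `CRITERIA = 'soft'` and its own scoring, neither of which is shown in this notebook. The function name and the `guessers` interface are illustrative.

```python
def play_reveal_rounds(explanation, secret, guessers, n_guess):
    """Reveal explanation words one at a time until some guesser names `secret`.

    `guessers` maps a team name to a function (revealed_words, n_guess) -> list of
    candidate words (a hypothetical simplification of player.guess).
    Returns (step, team) for the first success, or (None, None) if nobody guessed it.
    """
    for step in range(1, len(explanation) + 1):
        revealed = explanation[:step]
        for team, guess in guessers.items():
            candidates = guess(revealed, n_guess)
            # Exact, case-insensitive match: an assumption, not the 'soft' criterion
            if secret.lower() in (c.lower() for c in candidates):
                return step, team
    return None, None
```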


HOST to EXPLAINING PLAYER (skipgram team): the word is "percent"
EXPLAINING PLAYER (skipgram team) to HOST: my wordlist is ['Ararat', '60,000', 'gel', '140,000', 'lantern', 'camel', 'Akhalkalaki', '90,000', 'year)', 'charred']
HOST TO EXPLAINING PLAYER (skipgram team): cleaning your word list. Now the list is ['ararat', 'gel', 'lantern', 'camel', 'akhalkalaki', 'year', 'charred']

===ROUND 1===

HOST: ['ararat']
GUESSING PLAYER (HerokuOrg team) to HOST: ['ararat', 'manhattan', 'sea', 'eighth', 'defeated']
HOST: False
GUESSING PLAYER (skipgram2 team) to HOST: ['IgA', 'cratered', 'lava', 'Ararat', 'shrapnel']
HOST: False
GUESSING PLAYER (cbow team) to HOST: ['incandescent', 'quarterly', 'period-centred', 'occ

|  | Explanation for "percent" (skipgram team) | Guess (HerokuOrg team) | Guess (skipgram2 team) | Guess (cbow team) |
|---|---|---|---|---|
| 0 | `[ararat]` | `[ararat, manhattan, sea, eighth, defeated]` | `[IgA, cratered, lava, Ararat, shrapnel]` | `[incandescent, quarterly, period-centred, occurred,, tenth]` |
| 1 | `[ararat, gel]` | `[]` | `[}:, IgM, vilayets, Izmir, velapold]` | `[Tabaracci, Ararat, mosque(s), (Vincent, Zarathushtra]` |
| 2 | `[ararat, gel, lantern]` | `[gaston, 31-6, senators, championships, millennia]` | `[outskirts, artillery,, shrine, Kelbajar, Hasbani]` | `[lantern, eastern, Scandinavian, vanilla, *reserved*]` |
| 3 | `[ararat, gel, lantern, camel]` | `[plague, kadesh, sunday, huizenga, passers]` | `[velapold, IgM, }:, vilayets, dwarf]` | `[vanilla, caterpillar, roman.bmp, lantern, moorcockpratchettdenislearydelasoulu2iainmbanksneworderheathersbatmanpjorourke]` |
| 4 | `[ararat, gel, lantern, camel, akhalkalaki]` | `[]` | `[shrine, IgM, vilayets, Akhalkalaki, artillery,]` | `[roman.bmp, Battlefields", Sandstrom, Battlefields,", Tabaracci]` |
| 5 | `[ararat, gel, lantern, camel, akhalkalaki, year]` | `[>tuesday, morning, bellies, hurried, fury]` | `[shrine, vilayets, IgM, Kelbajar, azeri]` | `[Crossroads, camarade.', Battlefields", Sandstrom, Sandstrom,]` |
| 6 | `[ararat, gel, lantern, camel, akhalkalaki, year, charred]` | `[]` | `[IgM, }:, -mel, velapold, vilayets]` | `[Crossroads, velapold, collapsed, perished, headquartered]` |


HOST to EXPLAINING PLAYER (cbow team): the word is "dollar"
EXPLAINING PLAYER (cbow team) to HOST: my wordlist is ['airliner', 'champs', 'dollar/pound', 'draft,', 'travel', 'channel,', 'camp,', 'plane,', 'planar', 'rampant']
HOST TO EXPLAINING PLAYER (cbow team): cleaning your word list. Now the list is ['airliner', 'champs', 'draft', 'travel', 'channel', 'camp', 'plane', 'planar', 'rampant']

===ROUND 1===

HOST: ['airliner']
GUESSING PLAYER (HerokuOrg team) to HOST: []
HOST: False
GUESSING PLAYER (skipgram2 team) to HOST: ['airline', 'outs.', 'springs', 'SRB', 'spans']
HOST: False
GUESSING PLAYER (skipgram team) to HOST: ['takeoff', 'travel', 'brown.', 'powered,', 'cotton']
HOST: False

===ROUND 2===

HOS

|  | Explanation for "dollar" (cbow team) | Guess (HerokuOrg team) | Guess (skipgram2 team) | Guess (skipgram team) |
|---|---|---|---|---|
| 0 | `[airliner]` | `[]` | `[airline, outs., springs, SRB, spans]` | `[takeoff, travel, brown., powered,, cotton]` |
| 1 | `[airliner, champs]` | `[crest, boarding, pacemakers, metrodome, centerfielder]` | `[$5,, champs, SRB, Through, :-D]` | `[uv, guitar,, AP>, deliveries, berthing]` |
| 2 | `[airliner, champs, draft]` | `[pacemakers, sharks, crest, metrodome, 45th]` | `[#), classroom, Watters, minivan, medal]` | `[Martian, ]>, tony, NO-OP, Yl\|nen]` |
| 3 | `[airliner, champs, draft, travel]` | `[plats, >10pm, minorty, wings-leafs, cafe-bar]` | `[classroom, matchups, #), Through, Falloon]` | `[]>, Oldsmobile, Capacity, tony, backon@VMS.HUJI.AC.IL]` |
| 4 | `[airliner, champs, draft, travel, channel]` | `[elite, 58%, begins, fad, ton]` | `[matchups, #), wycliffe, F-16, Target]` | `[MW, Rodime, 2SA634, fraud,, SMT,]` |
| 5 | `[airliner, champs, draft, travel, channel, camp]` | `[spruce, skydome, prestige, bkfst, 48th]` | `[#), matchups, wycliffe, Target, Falloon]` | `[fraud,, backon@VMS.HUJI.AC.IL, Capacity, 2SA634, #!/bin/sh]` |
| 6 | `[airliner, champs, draft, travel, channel, camp, plane]` | `[yellowish, prune, powerplant, album, skydome]` | `[#), Spray, Target, F-16, Falloon]` | `[2SA634, fraud,, Rodime, MW, backon@VMS.HUJI.AC.IL]` |
| 7 | `[airliner, champs, draft, travel, channel, camp, plane, planar]` | `[]` | `[Target, #), MLV, F-16, Spray]` | `[(big, #!/bin/sh, harbor., GALILEO, 2SA634]` |
| 8 | `[airliner, champs, draft, travel, channel, camp, plane, planar, rampant]` | `[hairline, navajo, base:, time~s, numbered]` | `[MLV, Target, #), Scandinavia, Venerean]` | `[GALAXIES,, (big, M1, #!/bin/sh, GALILEO]` |
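
For all three games, the host's "cleaning" step in the logs above can be reproduced by lowercasing, dropping every non-letter character (`"we'll"` becomes `well`, `60,000` becomes empty), discarding empty tokens and any token that contains the secret word (`money.`, `dollar/pound`), and removing duplicates. The function below is a hypothetical reimplementation matched only to those three logged inputs and outputs; the actual `the_hat_game` code may do more, since it downloads NLTK's WordNet and Punkt data.

```python
import re


def clean_word_list(words, secret):
    """Hypothetical host-side cleaning, matched to the three game logs above."""
    secret = secret.lower()
    seen = set()
    cleaned = []
    for word in words:
        # Keep ASCII letters only: "pay." -> "pay", "we'll" -> "well", "60,000" -> ""
        token = re.sub(r"[^a-z]", "", word.lower())
        # Drop empty tokens, anything containing the secret word, and repeats
        if not token or secret in token or token in seen:
            continue
        seen.add(token)
        cleaned.append(token)
    return cleaned
```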


## View final game report

In [17]:
game.report_results(each_game=True)

=== Team scores in each game ===


| game | HerokuOrg team | skipgram2 team | skipgram team | cbow team |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 |


=== Team scores, summary ===


|  | explaining | guessing | total | response_200 | response_time |
|---|---|---|---|---|---|
| HerokuOrg team | 0 | 0 | 0.0 | 0.640212 | 0.85494 |
| cbow team | 0 | 0 | 0.0 | 1.0 | 0.0 |
| skipgram team | 0 | 0 | 0.0 | 1.0 | 0.0 |
| skipgram2 team | 0 | 0 | 0.0 | 1.0 | 0.0 |
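
The `response_200` and `response_time` columns presumably summarize the `code` and `time` fields of each player's responses; the local players hard-code `"code": 200` and `"time": 0`, which matches their `1.0` and `0.0`. A hypothetical way to compute such columns from a list of responses is sketched below. Whether the library averages time over all responses or only successful ones is not shown here, so averaging over successful responses is an assumption, as is the function name.

```python
from statistics import mean


def summarize_responses(responses):
    """Share of HTTP-200 responses and mean response time (hypothetical metrics).

    Each response is a dict shaped like the guess() output: {"word_list", "time", "code"}.
    The mean time is taken over successful responses only, which is an assumption.
    """
    if not responses:
        return {"response_200": 0.0, "response_time": 0.0}
    ok = [r for r in responses if r.get("code") == 200]
    return {
        "response_200": len(ok) / len(responses),
        "response_time": mean(r["time"] for r in ok) if ok else 0.0,
    }
```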
