# Skip-gram word2vec

In this notebook, I'll lead you through using TensorFlow to implement the word2vec algorithm using the skip-gram architecture. By implementing this, you'll learn about embedding words for use in natural language processing. This will come in handy when dealing with things like machine translation.

## Readings

Here are the resources I used to build this notebook. I suggest reading these either beforehand or while you're working on this material.

* A really good [conceptual overview](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/) of word2vec from Chris McCormick 
* [First word2vec paper](https://arxiv.org/pdf/1301.3781.pdf) from Mikolov et al.
* [NIPS paper](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) with improvements for word2vec also from Mikolov et al.
* An [implementation of word2vec](http://www.thushv.com/natural_language_processing/word2vec-part-1-nlp-with-deep-learning-with-tensorflow-skip-gram/) from Thushan Ganegedara
* TensorFlow [word2vec tutorial](https://www.tensorflow.org/tutorials/word2vec)

## Word embeddings

When you're dealing with words in text, you end up with tens of thousands of classes to predict, one for each word. Trying to one-hot encode these words is massively inefficient, you'll have one element set to 1 and the other 50,000 set to 0. The matrix multiplication going into the first hidden layer will have almost all of the resulting values be zero. This a huge waste of computation. 

![one-hot encodings](assets/one_hot_encoding.png)

To solve this problem and greatly increase the efficiency of our networks, we use what are called embeddings. Embeddings are just a fully connected layer like you've seen before. We call this layer the embedding layer and the weights are embedding weights. We skip the multiplication into the embedding layer by instead directly grabbing the hidden layer values from the weight matrix. We can do this because the multiplication of a one-hot encoded vector with a matrix returns the row of the matrix corresponding the index of the "on" input unit.

![lookup](assets/lookup_matrix.png)

Instead of doing the matrix multiplication, we use the weight matrix as a lookup table. We encode the words as integers, for example "heart" is encoded as 958, "mind" as 18094. Then to get hidden layer values for "heart", you just take the 958th row of the embedding matrix. This process is called an **embedding lookup** and the number of hidden units is the **embedding dimension**.

<img src='assets/tokenize_lookup.png' width=500>
 
There is nothing magical going on here. The embedding lookup table is just a weight matrix. The embedding layer is just a hidden layer. The lookup is just a shortcut for the matrix multiplication. The lookup table is trained just like any weight matrix as well.

Embeddings aren't only used for words of course. You can use them for any model where you have a massive number of classes. A particular type of model called **Word2Vec** uses the embedding layer to find vector representations of words that contain semantic meaning.



## Word2Vec

The word2vec algorithm finds much more efficient representations by finding vectors that represent the words. These vectors also contain semantic information about the words. Words that show up in similar contexts, such as "black", "white", and "red" will have vectors near each other. There are two architectures for implementing word2vec, CBOW (Continuous Bag-Of-Words) and Skip-gram.

<img src="assets/word2vec_architectures.png" width="500">

In this implementation, we'll be using the skip-gram architecture because it performs better than CBOW. Here, we pass in a word and try to predict the words surrounding it in the text. In this way, we can train the network to learn representations for words that show up in similar contexts.

First up, importing packages.

In [1]:
import time

import numpy as np
import tensorflow as tf

import utils

Load the [text8 dataset](http://mattmahoney.net/dc/textdata.html), a file of cleaned up Wikipedia articles from Matt Mahoney. The next cell will download the data set to the `data` folder. Then you can extract it and delete the archive file to save storage space.

In [2]:
from urllib.request import urlretrieve
from os.path import isfile, isdir
from tqdm import tqdm
import zipfile

dataset_folder_path = 'data'
dataset_filename = 'text8.zip'
dataset_name = 'Text8 Dataset'

class DLProgress(tqdm):
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        self.total = total_size
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num

if not isfile(dataset_filename):
    with DLProgress(unit='B', unit_scale=True, miniters=1, desc=dataset_name) as pbar:
        urlretrieve(
            'http://mattmahoney.net/dc/text8.zip',
            dataset_filename,
            pbar.hook)

if not isdir(dataset_folder_path):
    with zipfile.ZipFile(dataset_filename) as zip_ref:
        zip_ref.extractall(dataset_folder_path)
        
with open('data/text8') as f:
    text = f.read()

Text8 Dataset: 31.4MB [00:23, 1.33MB/s]                            


## Preprocessing

Here I'm fixing up the text to make training easier. This comes from the `utils` module I wrote. The `preprocess` function coverts any punctuation into tokens, so a period is changed to ` <PERIOD> `. In this data set, there aren't any periods, but it will help in other NLP problems. I'm also removing all words that show up five or fewer times in the dataset. This will greatly reduce issues due to noise in the data and improve the quality of the vector representations. If you want to write your own functions for this stuff, go for it.

In [3]:
words = utils.preprocess(text)
print(words[:30])

['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the', 'diggers', 'of', 'the', 'english', 'revolution', 'and', 'the', 'sans', 'culottes', 'of', 'the', 'french', 'revolution', 'whilst']


In [4]:
print("Total words: {}".format(len(words)))
print("Unique words: {}".format(len(set(words))))

Total words: 16680599
Unique words: 63641


And here I'm creating dictionaries to convert words to integers and backwards, integers to words. The integers are assigned in descending frequency order, so the most frequent word ("the") is given the integer 0 and the next most frequent is 1 and so on. The words are converted to integers and stored in the list `int_words`.

In [5]:
vocab_to_int, int_to_vocab = utils.create_lookup_tables(words)
int_words = [vocab_to_int[word] for word in words]

In [11]:
print(int_to_vocab[0])
print(vocab_to_int['the'])
int_words[0]

the
0


5240

## Subsampling

Words that show up often such as "the", "of", and "for" don't provide much context to the nearby words. If we discard some of them, we can remove some of the noise from our data and in return get faster training and better representations. This process is called subsampling by Mikolov. For each word $w_i$ in the training set, we'll discard it with probability given by 

$$ P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}} $$

where $t$ is a threshold parameter and $f(w_i)$ is the frequency of word $w_i$ in the total dataset.

I'm going to leave this up to you as an exercise. This is more of a programming challenge, than about deep learning specifically. But, being able to prepare your data for your network is an important skill to have. Check out my solution to see how I did it.

> **Exercise:** Implement subsampling for the words in `int_words`. That is, go through `int_words` and discard each word given the probablility $P(w_i)$ shown above. Note that $P(w_i)$ is the probability that a word is discarded. Assign the subsampled data to `train_words`.

In [12]:
from collections import Counter

freq = Counter(int_words)

Counter({0: 1061396,
         1: 593677,
         2: 416629,
         3: 411764,
         4: 372201,
         5: 325873,
         6: 316376,
         7: 264975,
         8: 250430,
         9: 192644,
         10: 183153,
         11: 131815,
         12: 125285,
         13: 118445,
         14: 116710,
         15: 115789,
         16: 114775,
         17: 112807,
         18: 111831,
         19: 109510,
         20: 108182,
         21: 102145,
         22: 99683,
         23: 95603,
         24: 91250,
         25: 76527,
         26: 73334,
         27: 72871,
         28: 68945,
         29: 62603,
         30: 61925,
         31: 61281,
         32: 58832,
         33: 54788,
         34: 54576,
         35: 53573,
         36: 44358,
         37: 44033,
         38: 39712,
         39: 39086,
         40: 37866,
         41: 35358,
         42: 32433,
         43: 31523,
         44: 29567,
         45: 28810,
         46: 28553,
         47: 28161,
         48: 28100,
       

In [18]:
threshold = 1e-5
for word in int_words:
    probability = 1 - np.sqrt((threshold/freq[word]))
    print(str(word)+":"+ str(probability))
    #train_words = # The final subsampled word list


5240:0.999818331895
3081:0.999867778529
11:0.999991290011
5:0.999994460433
194:0.999962781276
1:0.999995895834
3133:0.999866725886
45:0.999981369334
58:0.99997902831
155:0.999965562255
127:0.999968645721
741:0.999933642283
476:0.999945862838
10583:0.99970638989
133:0.999967780469
0:0.999996930545
27585:0.999367544468
1:0.999995895834
0:0.999996930545
102:0.999970972393
854:0.999929796466
2:0.999995100799
0:0.999996930545
15183:0.999616517506
58576:0.998709005551
1:0.999995895834
0:0.999996930545
150:0.999966166752
854:0.999929796466
3584:0.99985581255
0:0.999996930545
194:0.999962781276
10:0.999992610876
190:0.999963184502
58:0.99997902831
4:0.999994816642
5:0.999994460433
10756:0.999703825561
214:0.999960573046
6:0.999994377904
1325:0.999913997385
104:0.999970892574
454:0.999946563017
19:0.999990444067
58:0.99997902831
2733:0.999876250512
362:0.999951000404
6:0.999994377904
3673:0.999853510316
0:0.999996930545
708:0.999935097766
1:0.999995895834
371:0.999950413562
26:0.999988322569
40

193:0.99996289163
85:0.999973733278
47:0.999981155876
3235:0.999863791059
241:0.999958498721
1896:0.999897991043
6795:0.999784334454
133:0.999967780469
5240:0.999818331895
230:0.999959364363
6:0.999994377904
44:0.99998160938
6695:0.999786313078
158:0.999965314558
6:0.999994377904
1022:0.999924255465
1555:0.999906911272
1818:0.99990009985
0:0.999996930545
193:0.99996289163
134:0.999967709999
27:0.99998828553
0:0.999996930545
1112:0.999921066296
2771:0.999875097542
2980:0.999870140821
17:0.999990584747
5:0.999994460433
801:0.999931222322
1231:0.99991695452
4:0.999994816642
0:0.999996930545
193:0.99996289163
35:0.999986337599
17:0.999990584747
669:0.999936665717
6:0.999994377904
331:0.999952574134
10988:0.999699849887
152:0.999966016655
481:0.999945775359
1:0.999995895834
0:0.999996930545
756:0.999933050056
0:0.999996930545
45:0.999981369334
8954:0.999739179735
6:0.999994377904
2980:0.999870140821
421:0.999947998419
27:0.99998828553
0:0.999996930545
56:0.99997958631
1346:0.999913223733
32

1:0.999995895834
0:0.999996930545
208:0.999961086843
2:0.999995100799
629:0.999938926671
1:0.999995895834
0:0.999996930545
1411:0.999911053992
1:0.999995895834
867:0.999929447886
4:0.999994816642
302:0.999954418137
99:0.999971541024
0:0.999996930545
18900:0.999543564535
421:0.999947998419
689:0.999935968437
6:0.999994377904
12716:0.999664799238
4:0.999994816642
0:0.999996930545
595:0.999939926666
3:0.999995071942
8:0.999993680877
16:0.999990665816
20:0.999990385593
6987:0.999780206509
7088:0.999778051619
909:0.999928018425
79:0.999974284476
5:0.999994460433
85:0.999973733278
927:0.999927528621
738:0.999933744316
4:0.999994816642
803:0.999931140842
0:0.999996930545
30147:0.999309934441
1448:0.999910161116
1931:0.999896912694
6:0.999994377904
2521:0.999881654733
5:0.999994460433
234:0.999959062437
927:0.999927528621
3264:0.99986302655
2007:0.999894998294
2:0.999995100799
45147:0.999
18:0.999990543751
30147:0.999309934441
2029:0.999894532135
280:0.999955753945
6:0.999994377904
5:0.9999944

7088:0.999778051619
1693:0.999903145144
19:0.999990444067
0:0.999996930545
93:0.999972162013
5830:0.999805375264
6:0.999994377904
985:0.99992560891
5:0.999994460433
3558:0.999856408368
24:0.999989531522
1666:0.999903863929
2:0.999995100799
553:0.999941905613
1666:0.999903863929
6:0.999994377904
2601:0.999879788241
4326:0.9998382038
2258:0.99988861352
148:0.999966332066
1736:0.999901989042
40:0.99998374918
53:0.999980151462
2272:0.999888056584
6:0.999994377904
7433:0.999770584266
310:0.999954005363
4792:0.999827739759
2215:0.999889836838
73:0.999976010975
3848:0.999848900529
33145:0.999254644008
67:0.999977181792
36:0.999984985391
5240:0.999818331895
2:0.999995100799
371:0.999950413562
462:0.999946107108
445:0.999946895584
2:0.999995100799
813:0.999931009868
3416:0.999859280491
335:0.999952483203
5240:0.999818331895
13699:0.999644215967
0:0.999996930545
1002:0.999924962472
1:0.999995895834
5:0.999994460433
93:0.999972162013
2752:0.999875774001
2:0.999995100799
29282:0.999340619527
2:0.9

2353:0.999885816772
2:0.999995100799
447:0.999946820546
1311:0.999914313689
222:0.999959961545
0:0.999996930545
683:0.99993619043
439:0.999947155738
47:0.999981155876
382:0.99994986193
23:0.999989772624
7683:0.999765596385
38:0.999984131381
2320:0.999886699645
43:0.999982189085
345:0.999951814922
2:0.999995100799
42:0.999982440731
2825:0.999873811384
6:0.999994377904
0:0.999996930545
228:0.99995964657
99:0.999971541024
46:0.999981285677
52:0.999980204423
1447:0.999910197349
3921:0.99984714554
4:0.999994816642
2424:0.999884220789
601:0.999939796166
2:0.999995100799
345:0.999951814922
634:0.999938570488
41:0.999983182705
59:0.999979014461
25:0.999988568778
23697:0.999449518117
2641:0.99987891013
19:0.999990444067
30:0.999987292299
2277:0.999887845569
7668:0.999766237709
27:0.99998828553
7683:0.999765596385
10:0.999992610876
2181:0.99989063034
23:0.999989772624
387:0.999949716354
545:0.999942427864
76:0.999974890694
55:0.999979628288
11577:0.999686887854
382:0.99994986193
2:0.999995100799

218:0.999960422026
1753:0.999901419206
1623:0.999905041417
34:0.999986463726
7683:0.999765596385
439:0.999947155738
345:0.999951814922
634:0.999938570488
54:0.999979760983
11:0.999991290011
161:0.999965097634
56:0.99997958631
175:0.999964109029
1979:0.999895629285
92:0.999972251112
89:0.999973284369
140:0.999967340137
4:0.999994816642
2561:0.999880732439
1114:0.999921041695
8520:0.999748422697
703:0.999935179628
1:0.999995895834
749:0.999933318514
3196:0.999864914193
1:0.999995895834
140:0.999967340137
312:0.999953976144
13:0.999990811563
184:0.999963455916
2392:0.999884989073
28:0.999987956614
584:0.999940302497
43:0.999982189085
1596:0.999905844553
47:0.999981155876
1979:0.999895629285
66:0.999977332944
304:0.999954361207
405:0.999948504209
98:0.999971559445
275:0.999955977455
6890:0.999782299828
0:0.999996930545
132:0.999968013713
2195:0.999890301803
79:0.999974284476
2:0.999995100799
79:0.999974284476
47:0.999981155876
6890:0.999782299828
153:0.999965862552
46:0.999981285677
4511:0

68:0.999977127541
7943:0.999760268349
6:0.999994377904
1484:0.999908940273
157:0.999965366602
6053:0.999800795232
11577:0.999686887854
82:0.999973914424
56:0.99997958631
25:0.999988568778
5245:0.999818031369
15289:0.999613666295
4:0.999994816642
43:0.999982189085
1116:0.999920992423
6:0.999994377904
1937:0.999896637721
1593:0.999905844553
477:0.999945846964
4182:0.999841089568
28:0.999987956614
68:0.999977127541
5517:0.999812353344
6:0.999994377904
3777:0.999850596424
0:0.999996930545
2270:0.999888056584
1366:0.999912361586
1:0.999995895834
55:0.999979628288
11577:0.999686887854
82:0.999973914424
56:0.99997958631
25:0.999988568778
232:0.999959124044
13:0.999990811563
135:0.999967622108
47:0.999981155876
5894:0.999803883865
7943:0.999760268349
6:0.999994377904
3777:0.999850596424
32508:0.99927452375
56:0.99997958631
52:0.999980204423
1979:0.999895629285
28:0.999987956614
1214:0.999917437099
11:0.999991290011
101:0.999971049133
11577:0.999686887854
34:0.999986463726
49:0.999980474187
124

5:0.999994460433
1417:0.999910877546
73:0.999976010975
816:0.999930977008
107:0.999970790934
30:0.999987292299
13102:0.99965700283
13:0.999990811563
135:0.999967622108
9:0.999992795195
7:0.999993856759
7:0.999993856759
9:0.999992795195
17:0.999990584747
1372:0.999912226654
7683:0.999765596385
5438:0.999814304662
130:0.999968150282
4:0.999994816642
0:0.999996930545
85:0.999973733278
351:0.999951532773
32:0.999986962534
695:0.999935423365
17:0.999990584747
4988:0.999824046164
18:0.999990543751
6464:0.999791485586
2:0.999995100799
34585:0.999233035011
1864:0.999898777959
1:0.999995895834
30:0.999987292299
11577:0.999686887854
995:0.999925235604
7683:0.999765596385
5438:0.999814304662
130:0.999968150282
17:0.999990584747
280:0.999955753945
18:0.999990543751
0:0.999996930545
129:0.999968465397
1023:0.999924233727
1:0.999995895834
1337:0.999913516293
5262:0.999818031369
382:0.99994986193
11873:0.999682179137
3825:0.999849585791
146:0.999966487454
11577:0.999686887854
371:0.999950413562
7683:

5:0.999994460433
5811:0.999805742828
1562:0.999906708949
5143:0.999820394698
0:0.999996930545
6029:0.999801189307
1:0.999995895834
0:0.999996930545
16974:0.999581146092
557:0.999941609261
2914:0.99987196312
24:0.999989531522
0:0.999996930545
704:0.999935166006
1:0.999995895834
0:0.999996930545
456:0.999946417464
4:0.999994816642
16974:0.999581146092
2:0.999995100799
0:0.999996930545
908:0.999928092901
1:0.999995895834
13:0.999990811563
32:0.999986962534
890:0.999928461847
26:0.999988322569
52:0.999980204423
31:0.999987225701
4487:0.999834478822
147:0.999966406239
167:0.999964417092
4:0.999994816642
0:0.999996930545
19704:0.999528595479
47:0.999981155876
692:0.999935863166
1:0.999995895834
16974:0.999581146092
768:0.999932487889
14701:0.999624706687
3529:0.999856996861
300:0.999954750668
6:0.999994377904
0:0.999996930545
146:0.999966487454
13158:0.99965496722
319:0.999953303646
444:0.999946910554
14:0.999990743518
9:0.999992795195
319:0.999953303646
33:0.99998648994
10:0.999992610876
13

34:0.999986463726
0:0.999996930545
5277:0.999817425814
23967:0.999440983006
1:0.999995895834
418:0.999948110552
11450:0.999689913164
2:0.999995100799
2:0.999995100799
3672:0.999853510316
2:0.999995100799
5332:0.999816506039
6200:0.999797139794
337:0.999952402534
0:0.999996930545
864:0.999929518017
1:0.999995895834
4345:0.999837778579
18399:0.999552786405
100:0.999971457681
33:0.99998648994
17:0.999990584747
1971:0.999895855932
1073:0.999922478291
84:0.999973809064
0:0.999996930545
1175:0.999918889289
810:0.999931026281
51:0.999980221466
18897:0.999543564535
4:0.999994816642
4345:0.999837778579
18399:0.999552786405
100:0.999971457681
39:0.999984004811
34:0.999986463726
32:0.999986962534
64:0.999977866137
2134:0.999891662152
1:0.999995895834
6648:0.999787282185
62240:0.998709005551
23:0.999989772624
0:0.999996930545
799:0.999931254833
337:0.999952402534
1398:0.999911646159
8691:0.999744345004
6962:0.999780735495
24311:0.999432038166
0:0.999996930545
769:0.999932472498
1:0.999995895834
0:

1:0.999995895834
0:0.999996930545
45:0.999981369334
4954:0.999824588396
1780:0.999900985246
1472:0.999909464254
4:0.999994816642
6909:0.99978178211
115:0.999969815919
160:0.999965171802
1020:0.999924320565
264:0.999956406215
54:0.999979760983
11:0.999991290011
3163:0.999865889555
5:0.999994460433
10:0.999992610876
5:0.999994460433
794:0.999931416701
861:0.999929605388
10944:0.999699849887
1435:0.999910413828
4:0.999994816642
0:0.999996930545
4:0.999994816642
601:0.999939796166
5:0.999994460433
4168:0.999841489344
1:0.999995895834
5:0.999994460433
646:0.999937922914
2168:0.999891020715
0:0.999996930545
956:0.999926656724
2501:0.99988214887
19:0.999990444067
918:0.999927755923
52:0.999980204423
2567:0.999880562421
32:0.999986962534
10:0.999992610876
243:0.99995833695
6116:0.999799195168
23:0.999989772624
5:0.999994460433
2083:0.999892973117
10953:0.999699849887
1442:0.999910305785
11:0.999991290011
4:0.999994816642
5:0.999994460433
28:0.999987956614
5:0.999994460433
28:0.999987956614
5:0

1:0.999995895834
853:0.999929813759
287:0.999955390026
36:0.999984985391
1415:0.99991094825
26:0.999988322569
0:0.999996930545
163:0.999964885251
404:0.999948511035
5277:0.999817425814
13843:0.999641942563
91:0.999972575451
4:0.999994816642
0:0.999996930545
85:0.999973733278
81:0.99997410751
78:0.999974660391
16:0.999990665816
16878:0.999581146092
1:0.999995895834
0:0.999996930545
307:0.999954140983
209:0.999961048486
10:0.999992610876
5:0.999994460433
10858:0.999701192848
3115:0.999867196821
23:0.999989772624
5:0.999994460433
152:0.999966016655
33314:0.999254644008
882:0.999928771026
0:0.999996930545
2828:0.999873811384
381:0.999949899699
2:0.999995100799
0:0.999996930545
2095:0.99989260342
1:0.999995895834
1352:0.999912994123
0:0.999996930545
192:0.999963059117
4731:0.999829003608
415:0.999948194177
10:0.999992610876
757:0.999932989958
7056:0.999779136948
23:0.999989772624
0:0.999996930545
4450:0.999835378401
381:0.999949899699
4127:0.999842279926
5:0.999994460433
167:0.999964417092


1078:0.999922244298
1:0.999995895834
0:0.999996930545
3568:0.999856111384
788:0.99993162543
19:0.999990444067
6201:0.999797139794
17888:0.99956147099
60:0.999978242868
10467:0.999708888745
9:0.999992795195
253:0.999957674209
2:0.999995100799
3:0.999995071942
64:0.999977866137
35:0.999986337599
17:0.999990584747
3555:0.999856408368
18:0.999990543751
0:0.999996930545
1772:0.999901130535
76:0.999974890694
24:0.999989531522
43:0.999982189085
163:0.999964885251
5744:0.99980754991
6201:0.999797139794
7718:0.999764949753
10467:0.999708888745
2:0.999995100799
5518:0.999812353344
118:0.999969327174
60:0.999978242868
5:0.999994460433
3666:0.999853667241
1787:0.999900839314
6201:0.999797139794
4601:0.999831926839
10467:0.999708888745
1239:0.999916753337
18:0.999990543751
29:0.999987361299
7004:0.999780206509
35:0.999986337599
39546:0.999122941981
0:0.999996930545
340:0.999952234507
1:0.999995895834
10467:0.999708888745
1014:0.999924450264
29:0.999987361299
12882:0.999660968248
1188:0.999918295854

6201:0.999797139794
4:0.999994816642
189:0.999963239269
0:0.999996930545
422:0.999947949132
1:0.999995895834
6201:0.999797139794
40:0.99998374918
53:0.999980151462
441:0.999947089206
18:0.999990543751
3958:0.999846426221
4569:0.999832634518
4:0.999994816642
5682:0.999808610249
1:0.999995895834
6674:0.999786799284
3:0.999995071942
8:0.999993680877
15:0.999990706777
21:0.999990105555
22684:0.999465477516
4:0.999994816642
633:0.999938686066
31551:0.999292893219
2716:0.999876721582
3:0.999995071942
8:0.999993680877
21:0.999990105555
9:0.999992795195
10205:0.999714867026
24517:0.999432038166
1373:0.999912226654
4:0.999994816642
6201:0.999797139794
1090:0.999921721964
20:0.999990385593
482:0.999945727465
3:0.999995071942
8:0.999993680877
8:0.999993680877
15:0.999990706777
2669:0.999878103154
4:0.999994816642
633:0.999938686066
13106:0.99965700283
256:0.999956982781
259:0.999956737918
3814:0.999849924944
904:0.999928167147
3:0.999995071942
8:0.999993680877
8:0.999993680877
21:0.999990105555
2

4:0.999994816642
3:0.999995071942
12:0.999991065907
16:0.999990665816
20:0.999990385593
35:0.999986337599
134:0.999967709999
5:0.999994460433
663:0.999937068322
1:0.999995895834
0:0.999996930545
13113:0.99965700283
201:0.999962048721
4:0.999994816642
0:0.999996930545
3973:0.999846244799
4:0.999994816642
3:0.999995071942
12:0.999991065907
16:0.999990665816
22:0.999989984112
35:0.999986337599
108:0.999970765982
29:0.999987361299
45:0.999981369334
4247:0.999839871846
155:0.999965562255
3350:0.999860923188
4:0.999994816642
0:0.999996930545
2182:0.99989063034
276:0.999955964652
4822:0.999826967865
19:0.999990444067
0:0.999996930545
2880:0.999872691495
17:0.999990584747
737:0.99993380241
24:0.999989531522
105:0.999970833207
16893:0.999581146092
2:0.999995100799
1869:0.999898674088
738:0.999933744316
1573:0.999906464411
2120:0.999891915575
5:0.999994460433
4919:0.999825125646
23:0.999989772624
5998:0.999801970491
14345:0.999632392689
765:0.999932671762
27:0.99998828553
3:0.999995071942
12:0.9

28389:0.999354502776
364:0.999950763404
161:0.999965097634
23:0.999989772624
50:0.999980471953
5162:0.999819812507
1027:0.999924102907
48:0.999981135433
2587:0.999880134175
3178:0.999865526275
222:0.999959961545
0:0.999996930545
201:0.999962048721
1598:0.99990580279
17719:0.999565627757
56:0.99997958631
48:0.999981135433
533:0.999942861807
39374:0.999122941981
1:0.999995895834
58331:0.998709005551
9916:0.999720491503
50608:0.998881966011
84:0.999973809064
0:0.999996930545
1229:0.999917011734
1573:0.999906464411
17:0.999990584747
4949:0.999824588396
0:0.999996930545
2219:0.999889702901
43380:0.999046537411
18:0.999990543751
6829:0.999783324305
6:0.999994377904
7784:0.99976364027
0:0.999996930545
183:0.9999635386
1:0.999995895834
246:0.999958055699
1258:0.999916081864
6083:0.9998
5:0.999994460433
176:0.99996395782
3317:0.999861855259
1496:0.999908598584
159:0.999965297854
139:0.999967362757
29:0.999987361299
214:0.999960573046
6:0.999994377904
0:0.999996930545
562:0.999941409156
18:0.999

2:0.999995100799
428:0.999947657608
3:0.999995071942
3:0.999995071942
12:0.999991065907
21:0.999990105555
16:0.999990665816
1573:0.999906464411
108:0.999970765982
0:0.999996930545
7785:0.99976364027
1:0.999995895834
3350:0.999860923188
5:0.999994460433
1592:0.999905844553
1:0.999995895834
0:0.999996930545
83:0.999973854757
1573:0.999906464411
4035:0.999844394489
0:0.999996930545
1032:0.999923883361
1:0.999995895834
29:0.999987361299
9134:0.999735557057
28:0.999987956614
1130:0.999920444272
10694:0.999705116088
77:0.999974791963
29:0.999987361299
749:0.999933318514
493:0.999945101332
2:0.999995100799
29:0.999987361299
116:0.999969809042
493:0.999945101332
24:0.999989531522
9049:0.999737387134
4:0.999994816642
30:0.999987292299
3:0.999995071942
12:0.999991065907
21:0.999990105555
20:0.999990385593
820:0.999930878145
6:0.999994377904
1697:0.999903008569
149:0.999966238173
19219:0.999538734396
1573:0.999906464411
10:0.999992610876
92:0.999972251112
3186:0.999865282442
23:0.999989772624
176

6:0.999994377904
1268:0.999915754928
2:0.999995100799
7423:0.999771185619
29:0.999987361299
2605:0.999879701288
1229:0.999917011734
33:0.99998648994
967:0.999926339047
10633:0.99970638989
8105:0.999757464375
18:0.999990543751
1329:0.999913805914
0:0.999996930545
336:0.999952424086
1:0.999995895834
0:0.999996930545
15462:0.999610750528
18:0.999990543751
3216:0.999864168544
6:0.999994377904
0:0.999996930545
2605:0.999879701288
77:0.999974791963
0:0.999996930545
427:0.999947679105
2:0.999995100799
267:0.999956327297
2037:0.999894237614
21347:0.999493630316
14:0.999990743518
5982:0.999802357646
24301:0.999432038166
1573:0.999906464411
11:0.999991290011
215:0.999960530074
21347:0.999493630316
14:0.999990743518
13375:0.999650784852
19:0.999990444067
75:0.999975120221
1036:0.999923839222
39:0.999984004811
1262:0.99991593373
6:0.999994377904
4116:0.999842670806
932:0.999927433276
74:0.999975913053
121:0.999969228584
1573:0.999906464411
6647:0.999787282185
24:0.999989531522
2358:0.99988566761
4

9:0.999992795195
21:0.999990105555
3:0.999995071942
8:0.999993680877
7:0.999993856759
3:0.999995071942
1648:0.999904392637
12102:0.999677251388
1227:0.999917182667
2:0.999995100799
2381:0.999885140903
1573:0.999906464411
17:0.999990584747
86:0.999973681476
13:0.999990811563
15463:0.999610750528
29:0.999987361299
3178:0.999865526275
2:0.999995100799
200:0.999962122299
6785:0.999784334454
6:0.999994377904
157:0.999965366602
1839:0.999899292563
4:0.999994816642
29:0.999987361299
2381:0.999885140903
37:0.999984930083
66:0.999977332944
215:0.999960530074
35:0.999986337599
89:0.999973284369
174:0.9999641252
200:0.999962122299
3384:0.999859971992
4:0.999994816642
3411:0.999859419611
1366:0.999912361586
200:0.999962122299
1255:0.999916229218
41:0.999983182705
35:0.999986337599
1967:0.999895912363
35:0.999986337599
17:0.999990584747
15463:0.999610750528
0:0.999996930545
298:0.999954833822
6949:0.999780735495
397:0.999949068343
13:0.999990811563
0:0.999996930545
384:0.999949760794
1:0.9999958958

7:0.999993856759
7:0.999993856759
15:0.999990706777
16270:0.99959175171
1080:0.999922197244
26:0.999988322569
10:0.999992610876
147:0.999966406239
109:0.999970558932
296:0.99995489818
2757:0.99987558185
1573:0.999906464411
30228:0.999309934441
204:0.999961833729
18:0.999990543751
3605:0.999855209252
121:0.999969228584
271:0.99995621441
7:0.999993856759
12:0.999991065907
7:0.999993856759
9:0.999992795195
12:0.999991065907
16:0.999990665816
12:0.999991065907
22:0.999989984112
9:0.999992795195
16:0.999990665816
3:0.999995071942
8:0.999993680877
8:0.999993680877
8:0.999993680877
2757:0.99987558185
1573:0.999906464411
5:0.999994460433
95:0.999971853869
3:0.999995071942
12:0.999991065907
8:0.999993680877
7:0.999993856759
18:0.999990543751
144:0.999966584885
10049:0.999717157288
754:0.999933124953
34:0.999986463726
1101:0.99992143258
3:0.999995071942
2:0.999995100799
1101:0.99992143258
9:0.999992795195
3:0.999995071942
7:0.999993856759
3638:0.999854289937
4:0.999994816642
49:0.999980474187
29

289:0.999955323295
602:0.999939785253
11:0.999991290011
195:0.999962518151
1005:0.999924877827
26:0.999988322569
0:0.999996930545
1532:0.999907510433
1:0.999995895834
5219:0.999818630937
2:0.999995100799
3617:0.99985490475
737:0.99993380241
9:0.999992795195
1:0.999995895834
0:0.999996930545
51:0.999980221466
240:0.99995863033
931:0.999927452375
1:0.999995895834
450:0.99994672252
602:0.999939785253
3617:0.99985490475
6187:0.999797139794
735:0.999933903707
1510:0.999908059338
27:0.99998828553
0:0.999996930545
6057:0.999800398804
2:0.999995100799
4:0.999994816642
173:0.999964171282
452:0.999946692248
73:0.999976010975
31:0.999987225701
16126:0.999595111835
227:0.999959662988
0:0.999996930545
173:0.999964171282
29244:0.999340619527
67:0.999977181792
8349:0.999751548002
2:0.999995100799
13765:0.999644215967
35:0.999986337599
36:0.999984985391
2019:0.999894707671
5:0.999994460433
24929:0.999422649731
1:0.999995895834
18348:0.999552786405
4:0.999994816642
0:0.999996930545
1755:0.99990137127
1

1595:0.999905844553
4:0.999994816642
3617:0.99985490475
14:0.999990743518
64:0.999977866137
2:0.999995100799
89:0.999973284369
602:0.999939785253
17:0.999990584747
540:0.999942674345
6:0.999994377904
13153:0.99965496722
49:0.999980474187
19394:0.999533747596
1:0.999995895834
2270:0.999888056584
6330:0.999794587992
4:0.999994816642
0:0.999996930545
804:0.999931140842
723:0.999934501876
1:0.999995895834
0:0.999996930545
247:0.999958033543
35:0.999986337599
835:0.9999304111
602:0.999939785253
23:0.999989772624
4314:0.99983841516
33:0.99998648994
35:0.999986337599
36:0.999984985391
88:0.99997341685
261:0.999956620032
539:0.999942674345
76:0.999974890694
19:0.999990444067
29:0.999987361299
89:0.999973284369
1:0.999995895834
0:0.999996930545
194:0.999962781276
261:0.999956620032
4357:0.999837564707
5:0.999994460433
172:0.999964187369
511:0.999944211506
69:0.999976941012
19:0.999990444067
33:0.99998648994
10:0.999992610876
1923:0.999897185485
18:0.999990543751
0:0.999996930545
683:0.999936190

23:0.999989772624
0:0.999996930545
4872:0.9998261855
1:0.999995895834
383:0.99994977347
3:0.999995071942
1797:0.999900545504
369:0.999950643606
258:0.999956746013
41:0.999983182705
23:0.999989772624
0:0.999996930545
358:0.999951164283
1:0.999995895834
166:0.999964495674
1797:0.999900545504
37:0.999984930083
1176:0.999918835874
1025:0.999924190196
258:0.999956746013
13:0.999990811563
135:0.999967622108
5:0.999994460433
397:0.999949068343
6180:0.999797555917
11106:0.999697108734
19:0.999990444067
397:0.999949068343
68:0.999977127541
734:0.99993391814
166:0.999964495674
397:0.999949068343
5010:0.999823498873
6:0.999994377904
21765:0.999487010824
5:0.999994460433
2679:0.999877739282
1705:0.999902825566
76:0.999974890694
87:0.999973665967
0:0.999996930545
397:0.999949068343
3627:0.999854598318
0:0.999996930545
11106:0.999697108734
1298:0.999914719713
0:0.999996930545
397:0.999949068343
29925:0.999325800138
37:0.999984930083
13:0.999990811563
0:0.999996930545
1119:0.999920918342
1:0.99999589

16:0.999990665816
21:0.999990105555
5:0.999994460433
401:0.999948734798
658:0.999937242064
12832:0.999662900069
28:0.999987956614
27080:0.999379826327
24:0.999989531522
723:0.999934501876
2:0.999995100799
0:0.999996930545
18324:0.999552786405
28:0.999987956614
164:0.99996464908
50174:0.998881966011
1331:0.999913741805
24:0.999989531522
889:0.999928516701
2:0.999995100799
28:0.999987956614
164:0.99996464908
1331:0.999913741805
24:0.999989531522
4494:0.999834478822
2:0.999995100799
59320:0.998709005551
28:0.999987956614
164:0.99996464908
1331:0.999913741805
24:0.999989531522
5475:0.999813661002
28:0.999987956614
164:0.99996464908
24:0.999989531522
53242:0.998881966011
18:0.999990543751
5475:0.999813661002
28:0.999987956614
164:0.99996464908
513:0.999944150635
24:0.999989531522
16433:0.99959175171
2:0.999995100799
41057:0.999087129071
1:0.999995895834
128:0.999968527929
28:0.999987956614
164:0.99996464908
1331:0.999913741805
26514:0.999391419381
24:0.999989531522
2495:0.999882393628
2:0.9

8:0.999993680877
9:0.999992795195
12:0.999991065907
6:0.999994377904
390:0.999949505252
298:0.999954833822
297:0.999954870632
9680:0.999724759059
3:0.999995071942
8:0.999993680877
20:0.999990385593
7:0.999993856759
6:0.999994377904
390:0.999949505252
2844:0.999873508894
2036:0.999894237614
298:0.999954833822
4176:0.999841289834
924:0.999927585647
3:0.999995071942
8:0.999993680877
16:0.999990665816
16:0.999990665816
6:0.999994377904
3:0.999995071942
8:0.999993680877
16:0.999990665816
22:0.999989984112
298:0.999954833822
1140:0.999920063923
1242:0.999916666667
3:0.999995071942
8:0.999993680877
16:0.999990665816
15:0.999990706777
6:0.999994377904
3:0.999995071942
8:0.999993680877
16:0.999990665816
22:0.999989984112
298:0.999954833822
1071:0.999922571298
768:0.999932487889
3:0.999995071942
8:0.999993680877
9:0.999992795195
12:0.999991065907
66:0.999977332944
298:0.999954833822
2501:0.99988214887
4175:0.999841289834
28:0.999987956614
1170:0.999919075415
298:0.999954833822
333:0.999952504645

11782:0.999683772234
10:0.999992610876
32087:0.99927452375
24:0.999989531522
0:0.999996930545
4471:0.99983493045
19:0.999990444067
275:0.999955977455
25:0.999988568778
50:0.999980471953
240:0.99995863033
69:0.999976941012
3:0.999995071942
14:0.999990743518
567:0.999941064202
2:0.999995100799
19:0.999990444067
54:0.999979760983
5:0.999994460433
493:0.999945101332
10:0.999992610876
21245:0.999493630316
2:0.999995100799
34072:0.999233035011
35:0.999986337599
36:0.999984985391
903:0.999928167147
19:0.999990444067
26:0.999988322569
17:0.999990584747
147:0.999966406239
5528:0.999812353344
13:0.999990811563
82:0.999973914424
4:0.999994816642
284:0.999955693185
6:0.999994377904
1135:0.999920242097
0:0.999996930545
4499:0.999834478822
1:0.999995895834
3:0.999995071942
14:0.999990743518
199:0.999962295011
2258:0.99988861352
13:0.999990811563
275:0.999955977455
11:0.999991290011
17158:0.999577422873
191:0.999963159528
60:0.999978242868
0:0.999996930545
4896:0.999825657989
1:0.999995895834
1169:0.

135:0.999967622108
1:0.999995895834
0:0.999996930545
378:0.999950043693
1207:0.999917661303
1:0.999995895834
41882:0.999087129071
200:0.999962122299
501:0.999944758969
10:0.999992610876
26272:0.999391419381
24352:0.999432038166
56:0.99997958631
4:0.999994816642
389:0.999949665421
187:0.999963325896
4308:0.99983841516
242:0.999958398285
75:0.999975120221
3546:0.999856703521
13446:0.999650784852
55:0.999979628288
42:0.999982440731
1777:0.999900985246
3588:0.999855662433
6:0.999994377904
29:0.999987361299
2786:0.999874705997
370:0.999950480484
54:0.999979760983
11:0.999991290011
6:0.999994377904
4936:0.999824857639
0:0.999996930545
10249:0.999713700833
61085:0.998709005551
0:0.999996930545
6506:0.999790573046
7617:0.999766873798
0:0.999996930545
10501:0.999708888745
1734:0.999901989042
13:0.999990811563
0:0.999996930545
6846:0.999783324305
2350:0.999885891134
0:0.999996930545
5315:0.999816814174
0:0.999996930545
2700:0.999877187312
2:0.999995100799
59704:0.998709005551
0:0.999996930545
12

573:0.999940713083
17:0.999990584747
446:0.999946888095
168:0.999964412586
215:0.999960530074
190:0.999963184502
4285:0.999839044305
30:0.999987292299
16852:0.999581146092
13:0.999990811563
47:0.999981155876
1:0.999995895834
29:0.999987361299
948:0.999926872758
2:0.999995100799
3923:0.99984714554
4457:0.999835154882
4:0.999994816642
0:0.999996930545
1115:0.99992101707
6:0.999994377904
0:0.999996930545
9:0.999992795195
15:0.999990706777
90:0.999972661672
874:0.999929111879
1:0.999995895834
0:0.999996930545
33226:0.999254644008
0:0.999996930545
3929:0.999846966658
2363:0.99988559281
40:0.99998374918
16004:0.999598390336
13:0.999990811563
398:0.999948988886
110:0.999970557656
1308:0.999914376532
573:0.999940713083
18:0.999990543751
490:0.999945380465
10:0.999992610876
3617:0.99985490475
364:0.999950763404
58106:0.998804771391
1263:0.999915874256
141:0.999967183999
6489:0.999790573046
43486:0.999046537411
12301:0.999672087082
1:0.999995895834
110:0.999970557656
948:0.999926872758
1591:0.99

5:0.999994460433
2120:0.999891915575
2270:0.999888056584
7850:0.999762308656
23:0.999989772624
3617:0.99985490475
2:0.999995100799
144:0.999966584885
7527:0.999768751355
2:0.999995100799
50:0.999980471953
292:0.999955184429
23:0.999989772624
0:0.999996930545
8641:0.999745999746
1:0.999995895834
0:0.999996930545
257:0.99995676623
1:0.999995895834
4686:0.9998299949
2:0.999995100799
0:0.999996930545
257:0.99995676623
1:0.999995895834
890:0.999928461847
168:0.999964412586
1952:0.999896137175
9429:0.999729828387
23:0.999989772624
3651:0.999853979585
24:0.999989531522
691:0.999935876353
1913:0.999897509992
1839:0.999899292563
1:0.999995895834
111:0.999970416273
149:0.999966238173
15952:0.999598390336
11075:0.999697108734
2:0.999995100799
746:0.999933407284
10436:0.999710114482
168:0.999964412586
1397:0.999911646159
36:0.999984985391
6:0.999994377904
38:0.999984131381
5770:0.999806833148
0:0.999996930545
63:0.999977901294
21069:0.9995
3222:0.999864043064
76:0.999974890694
168:0.999964412586
1

17:0.999990584747
6413:0.999792832302
18:0.999990543751
2358:0.99988566761
917:0.999927793599
445:0.999946895584
402:0.999948579152
462:0.999946107108
28:0.999987956614
225:0.999959787112
445:0.999946895584
11:0.999991290011
168:0.999964412586
215:0.999960530074
37:0.999984930083
693:0.999935504002
19:0.999990444067
225:0.999959787112
445:0.999946895584
39:0.999984004811
1486:0.999908826802
27:0.99998828553
917:0.999927793599
445:0.999946895584
94:0.999971963596
26:0.999988322569
215:0.999960530074
37:0.999984930083
5218:0.999818630937
30:0.999987292299
14747:0.999624706687
1:0.999995895834
0:0.999996930545
5410:0.999814941697
1:0.999995895834
0:0.999996930545
9939:0.999719393233
398:0.999948988886
4:0.999994816642
314:0.999953804587
1:0.999995895834
32:0.999986962534
46:0.999981285677
9715:0.999723710518
5444:0.999814304662
14:0.999990743518
1255:0.999916229218
6:0.999994377904
47:0.999981155876
21606:0.999487010824
219:0.999960353654
37:0.999984930083
14535:0.999627322004
24:0.999989

3:0.999995071942
8:0.999993680877
20:0.999990385593
15:0.999990706777
41791:0.999087129071
3:0.999995071942
8:0.999993680877
20:0.999990385593
22:0.999989984112
1179:0.999918755554
2589:0.999880047971
3:0.999995071942
8:0.999993680877
20:0.999990385593
22:0.999989984112
2925:0.999871752706
16190:0.999595111835
3:0.999995071942
8:0.999993680877
20:0.999990385593
22:0.999989984112
36:0.999984985391
4556:0.999832868432
1955:0.999896024951
12534:0.999668503228
1:0.999995895834
30462:0.999309934441
29215:0.999340619527
3:0.999995071942
8:0.999993680877
20:0.999990385593
8:0.999993680877
7005:0.999780206509
13106:0.99965700283
3:0.999995071942
8:0.999993680877
15:0.999990706777
9:0.999992795195
2522:0.999881571771
53465:0.998881966011
3:0.999995071942
8:0.999993680877
15:0.999990706777
20:0.999990385593
6232:0.999796299789
3:0.999995071942
8:0.999993680877
15:0.999990706777
20:0.999990385593
4029:0.999844582532
902:0.999928204184
1:0.999995895834
7005:0.999780206509
3:0.999995071942
8:0.9999

8:0.999993680877
7:0.999993856759
14:0.999990743518
42:0.999982440731
653:0.999937426629
2254:0.99988875146
25:0.999988568778
5207:0.999818928508
21757:0.999487010824
2:0.999995100799
3561:0.999856260106
1180:0.999918755554
807:0.999931124511
54:0.999979760983
11:0.999991290011
8888:0.99974093612
2:0.999995100799
14897:0.999622035527
3102:0.999867430424
15493:0.999610750528
13295:0.999652894933
10504:0.999708888745
24138:0.999440983006
5052:0.999822668274
5441:0.999814304662
5907:0.999803883865
26774:0.999379826327
2:0.999995100799
10642:0.999705116088
13:0.999990811563
0:0.999996930545
1128:0.999920619919
2:0.999995100799
5702:0.999808258753
6019:0.999801581052
11563:0.999688411524
2:0.999995100799
8228:0.999754559653
15376:0.999610750528
5907:0.999803883865
10602:0.99970638989
895:0.999928296526
1288:0.99991499745
6208:0.999797139794
7237:0.999775266713
389:0.999949665421
21209:0.999493630316
5859:0.999804633834
10839:0.999702517941
37808:0.999154845745
2:0.999995100799
6045:0.999800

57:0.999979489067
2758:0.99987558185
2963:0.99987068485
1675:0.999903596264
17:0.999990584747
5163:0.999819812507
1565:0.99990662765
2:0.999995100799
601:0.999939796166
5:0.999994460433
18154:0.999557192557
4:0.999994816642
2347:0.999886039424
253:0.999957674209
17:0.999990584747
1420:0.999910806674
7707:0.999764949753
5254:0.999818031369
0:0.999996930545
4734:0.999829003608
549:0.999942284208
27:0.99998828553
293:0.999955161909
69:0.999976941012
3:0.999995071942
7:0.999993856759
6:0.999994377904
79:0.999974284476
21:0.999990105555
7:0.999993856759
4720:0.999829003608
4:0.999994816642
11739:0.999683772234
280:0.999955753945
6:0.999994377904
5:0.999994460433
3448:0.99985843701
955:0.999926676442
4:0.999994816642
0:0.999996930545
38846:0.999154845745
22:0.999989984112
12:0.999991065907
382:0.99994986193
513:0.999944150635
729:0.999934104914
33:0.99998648994
48:0.999981135433
9:0.999992795195
3691:0.999852877528
5:0.999994460433
147:0.999966406239
14986:0.999619306506
217:0.999960471529
2

1:0.999995895834
29:0.999987361299
5478:0.999813336652
96:0.999971783368
263:0.999956431048
0:0.999996930545
1429:0.999910557281
1:0.999995895834
13533:0.999648635816
13533:0.999648635816
997:0.999925172838
4:0.999994816642
2621:0.999879351582
3:0.999995071942
3:0.999995071942
9:0.999992795195
3:0.999995071942
3:0.999995071942
16:0.999990665816
3:0.999995071942
3:0.999995071942
20:0.999990385593
3:0.999995071942
16:0.999990665816
9:0.999992795195
3:0.999995071942
16:0.999990665816
16:0.999990665816
3:0.999995071942
20:0.999990385593
3:0.999995071942
3:0.999995071942
20:0.999990385593
15:0.999990706777
3:0.999995071942
20:0.999990385593
21:0.999990105555
3:0.999995071942
20:0.999990385593
22:0.999989984112
3:0.999995071942
20:0.999990385593
12:0.999991065907
3:0.999995071942
15:0.999990706777
3:0.999995071942
3:0.999995071942
15:0.999990706777
9:0.999992795195
2:0.999995100799
3:0.999995071942
21:0.999990105555
3:0.999995071942
3854:0.999848727745
12810:0.999662900069
0:0.999996930545
4

15714:0.999604715292
328:0.999952706913
62:0.999977905071
0:0.999996930545
197:0.99996249707
195:0.999962518151
2515:0.999881820135
19:0.999990444067
144:0.999966584885
15345:0.999613666295
10:0.999992610876
0:0.999996930545
242:0.999958398285
56:0.99997958631
3735:0.999851586956
0:0.999996930545
3052:0.99986846659
1:0.999995895834
0:0.999996930545
70:0.999976396312
2:0.999995100799
0:0.999996930545
663:0.999937068322
1:0.999995895834
0:0.999996930545
22143:0.999480124755
35:0.999986337599
10:0.999992610876
36:0.999984985391
0:0.999996930545
132:0.999968013713
368:0.999950697623
11:0.999991290011
0:0.999996930545
3622:0.999854751777
5476:0.999813336652
0:0.999996930545
386:0.999949716354
1:0.999995895834
30:0.999987292299
2831:0.999873710795
13543:0.999648635816
18663:0.999548246049
15345:0.999613666295
224:0.999959787112
290:0.999955309913
34:0.999986463726
257:0.99995676623
3:0.999995071942
9:0.999992795195
2:0.999995100799
291:0.999955296518
353:0.999951412783
34:0.999986463726
4216

1076:0.999922314718
141:0.999967183999
62762:0.998709005551
15717:0.999604715292
362:0.999951000404
4700:0.999829748694
37:0.999984930083
1076:0.999922314718
17617:0.999569668517
46358:0.999
15717:0.999604715292
997:0.999925172838
4:0.999994816642
978:0.999925732108
3:0.999995071942
21:0.999990105555
3:0.999995071942
12651:0.999664799238
14:0.999990743518
729:0.999934104914
322:0.999953124428
14192:0.999634851628
168:0.999964412586
10:0.999992610876
5:0.999994460433
56:0.99997958631
1596:0.999905844553
23:0.999989772624
110:0.999970557656
386:0.999949716354
7277:0.999774123024
12651:0.999664799238
34:0.999986463726
29:0.999987361299
290:0.999955309913
4:0.999994816642
3768:0.99985076289
168:0.999964412586
10:0.999992610876
967:0.999926339047
4:0.999994816642
205:0.999961612213
160:0.999965171802
6335:0.999794587992
139:0.999967362757
2:0.999995100799
12651:0.999664799238
5331:0.999816506039
168:0.999964412586
52:0.999980204423
168:0.999964412586
21166:0.999493630316
118:0.999969327174


6:0.999994377904
10772:0.999703825561
13190:0.99965496722
46:0.999981285677
685:0.999936125378
11788:0.999683772234
151:0.999966096825
46:0.999981285677
52:0.999980204423
6751:0.999785330605
12997:0.99965900283
13533:0.999648635816
10772:0.999703825561
6:0.999994377904
3173:0.999865647696
6:0.999994377904
1133:0.999920393872
2:0.999995100799
3794:0.999850261813
0:0.999996930545
330:0.999952584797
6:0.999994377904
688:0.99993598156
26:0.999988322569
726:0.999934247513
430:0.999947412105
5452:0.999813983667
8424:0.99975
10:0.999992610876
1491:0.999908712907
4:0.999994816642
978:0.999925732108
3:0.999995071942
22:0.999989984112
3:0.999995071942
10772:0.999703825561
13190:0.99965496722
0:0.999996930545
1825:0.999899849662
3850:0.999848900529
533:0.999942861807
18:0.999990543751
13533:0.999648635816
10772:0.999703825561
110:0.999970557656
5392:0.999815257767
6:0.999994377904
0:0.999996930545
3850:0.999848900529
4236:0.999840076745
3:0.999995071942
1:0.999995895834
0:0.999996930545
197:0.999

61:0.999977979657
35:0.999986337599
1286:0.999915089427
26:0.999988322569
6:0.999994377904
16555:0.999588306515
12651:0.999664799238
11:0.999991290011
5:0.999994460433
390:0.999949505252
4:0.999994816642
978:0.999925732108
3:0.999995071942
9:0.999992795195
3:0.999995071942
168:0.999964412586
1591:0.999905886261
26:0.999988322569
14:0.999990743518
1447:0.999910197349
11:0.999991290011
3728:0.999851750137
11:0.999991290011
5:0.999994460433
1887:0.999898149895
1:0.999995895834
3850:0.999848900529
14665:0.999624706687
76:0.999974890694
16555:0.999588306515
1447:0.999910197349
54328:0.998804771391
0:0.999996930545
3878:0.999848205815
1:0.999995895834
0:0.999996930545
4823:0.999826967865
110:0.999970557656
14495:0.999629883395
10:0.999992610876
110:0.999970557656
214:0.999960573046
1:0.999995895834
44665:0.999
110:0.999970557656
2466:0.999882958853
14:0.999990743518
26773:0.999379826327
4:0.999994816642
978:0.999925732108
3:0.999995071942
21:0.999990105555
3:0.999995071942
16555:0.9995883065

26:0.999988322569
6:0.999994377904
16555:0.999588306515
12651:0.999664799238
26:0.999988322569
10:0.999992610876
1491:0.999908712907
4:0.999994816642
978:0.999925732108
3:0.999995071942
21:0.999990105555
3:0.999995071942
0:0.999996930545
19685:0.999528595479
67:0.999977181792
25536:0.99941277978
63420:0.998709005551
0:0.999996930545
31629:0.999292893219
10:0.999992610876
5:0.999994460433
298:0.999954833822
2642:0.999878821257
840:0.99993022501
19:0.999990444067
13928:0.999639625015
0:0.999996930545
1378:0.999912192824
1:0.999995895834
0:0.999996930545
253:0.999957674209
0:0.999996930545
31629:0.999292893219
10:0.999992610876
10:0.999992610876
5:0.999994460433
16837:0.999581146092
506:0.999944512905
1:0.999995895834
5:0.999994460433
7325:0.999772961695
14:0.999990743518
14578:0.999627322004
5:0.999994460433
24391:0.999432038166
3798:0.999850261813
1:0.999995895834
242:0.999958398285
14:0.999990743518
30949:0.999309934441
0:0.999996930545
197:0.99996249707
10:0.999992610876
1491:0.999908

5:0.999994460433
82:0.999973914424
14:0.999990743518
93:0.999972162013
11:0.999991290011
109:0.999970558932
728:0.999934119215
4:0.999994816642
978:0.999925732108
3:0.999995071942
15:0.999990706777
9:0.999992795195
2197:0.999890301803
3550:0.999856556172
13533:0.999648635816
35:0.999986337599
322:0.999953124428
0:0.999996930545
1026:0.999924124758
8733:0.999743505412
5089:0.999821542347
60:0.999978242868
29:0.999987361299
5426:0.999814624001
8733:0.999743505412
74:0.999975913053
22029:0.999480124755
5:0.999994460433
242:0.999958398285
46:0.999981285677
105:0.999970833207
2204:0.999890036831
4627:0.999831210105
32:0.999986962534
6:0.999994377904
13533:0.999648635816
10:0.999992610876
22013:0.999480124755
0:0.999996930545
66:0.999977332944
1190:0.999918241257
1:0.999995895834
22013:0.999480124755
168:0.999964412586
15397:0.999610750528
345:0.999951814922
1503:0.99990833015
5444:0.999814304662
10:0.999992610876
243:0.99995833695
88:0.99997341685
30:0.999987292299
39613:0.999122941981
32:0

110:0.999970557656
850:0.999929865564
6:0.999994377904
1360:0.999912562823
0:0.999996930545
330:0.999952584797
3947:0.999846607002
619:0.999939321461
0:0.999996930545
375:0.999950260451
19:0.999990444067
371:0.999950413562
10:0.999992610876
3516:0.999857142857
882:0.999928771026
21159:0.999493630316
11782:0.999683772234
33770:0.999254644008
49:0.999980474187
584:0.999940302497
0:0.999996930545
840:0.99993022501
82:0.999973914424
6890:0.999782299828
5:0.999994460433
13533:0.999648635816
1875:0.999898569897
53026:0.998881966011
56:0.99997958631
10:0.999992610876
144:0.999966584885
15345:0.999613666295
26:0.999988322569
10:0.999992610876
5:0.999994460433
6867:0.999782813879
1:0.999995895834
43:0.999982189085
55775:0.998804771391
11:0.999991290011
0:0.999996930545
1976:0.999895686085
362:0.999951000404
960:0.999926597473
154:0.99996568211
3780:0.999850596424
240:0.99995863033
1829:0.999899496218
124:0.999968889442
46:0.999981285677
960:0.999926597473
154:0.99996568211
38:0.999984131381
546

58:0.99997902831
29:0.999987361299
1839:0.999899292563
34:0.999986463726
1765:0.99990122704
126:0.999968681108
2:0.999995100799
0:0.999996930545
63:0.999977901294
926:0.999927528621
1:0.999995895834
389:0.999949665421
95:0.999971853869
6:0.999994377904
2298:0.999887205291
2:0.999995100799
1692:0.999903190542
1192:0.999918213918
4088:0.999843251716
1:0.999995895834
918:0.999927755923
37742:0.999154845745
4355:0.999837564707
17:0.999990584747
4227:0.999840280859
1022:0.999924255465
2:0.999995100799
10797:0.999702517941
1:0.999995895834
459:0.999946177408
12842:0.999662900069
18:0.999990543751
0:0.999996930545
145:0.999966532528
14:0.999990743518
113:0.999970282023
28:0.999987956614
4571:0.999832399619
16031:0.999598390336
26:0.999988322569
17:0.999990584747
36:0.999984985391
20489:0.999512049964
5496:0.999813010602
2:0.999995100799
8874:0.99974093612
1:0.999995895834
79:0.999974284476
15122:0.999616517506
2:0.999995100799
1375:0.999912226654
6:0.999994377904
2685:0.99987755612
1348:0.999

119:0.999969324287
1:0.999995895834
5:0.999994460433
50:0.999980471953
152:0.999966016655
4474:0.999834705099
1:0.999995895834
18856:0.999543564535
19:0.999990444067
17:0.999990584747
234:0.999959062437
44430:0.999046537411
739:0.999933700646
8135:0.999756747872
38:0.999984131381
2799:0.999874607534
6:0.999994377904
1897:0.999897937927
1553:0.999906951579
6:0.999994377904
13744:0.999644215967
1210:0.99991754943
2:0.999995100799
10218:0.999713700833
2663:0.999878193616
445:0.999946895584
2:0.999995100799
0:0.999996930545
4355:0.999837564707
1:0.999995895834
284:0.999955693185
821:0.999930828554
1:0.999995895834
4355:0.999837564707
4355:0.999837564707
14:0.999990743518
520:0.999943503727
3560:0.999856408368
23:0.999989772624
1820:0.9999
40:0.99998374918
967:0.999926339047
26:0.999988322569
4:0.999994816642
821:0.999930828554
4:0.999994816642
55:0.999979628288
172:0.999964187369
1114:0.999921041695
47:0.999981155876
200:0.999962122299
639:0.999938313899
5680:0.999808610249
684:0.999936164

1873:0.999898622033
42:0.999982440731
82:0.999973914424
3607:0.999855057241
195:0.999962518151
299:0.999954815383
67:0.999977181792
153:0.999965862552
43:0.999982189085
8841:0.99974180111
28:0.999987956614
405:0.999948504209
28:0.999987956614
1743:0.999901752814
1015:0.999924450264
6:0.999994377904
114:0.99997023203
37:0.999984930083
153:0.999965862552
46:0.999981285677
73:0.999976010975
1015:0.999924450264
6:0.999994377904
250:0.999957937128
511:0.999944211506
10:0.999992610876
1516:0.999907903506
6:0.999994377904
0:0.999996930545
400:0.999948761723
19:0.999990444067
4236:0.999840076745
19:0.999990444067
511:0.999944211506
32:0.999986962534
10:0.999992610876
37:0.999984930083
6:0.999994377904
788:0.99993162543
19:0.999990444067
49:0.999980474187
763:0.999932823848
25:0.999988568778
551:0.99994192521
28:0.999987956614
138:0.999967404395
19:0.999990444067
49:0.999980474187
763:0.999932823848
25:0.999988568778
384:0.999949760794
813:0.999931009868
16913:0.999581146092
270:0.999956235381


678:0.999936294101
569:0.999940899749
10:0.999992610876
16815:0.999584772601
6:0.999994377904
5:0.999994460433
2271:0.999888056584
1926:0.999897185485
1:0.999995895834
371:0.999950413562
219:0.999960353654
26:0.999988322569
10:0.999992610876
575:0.99994055617
6:0.999994377904
44:0.99998160938
199:0.999962295011
29454:0.999325800138
54:0.999979760983
11:0.999991290011
8779:0.999742657494
5136:0.999820394698
2:0.999995100799
3569:0.999856111384
5902:0.999803883865
4:0.999994816642
535:0.999942740167
6:0.999994377904
43:0.999982189085
683:0.99993619043
1614:0.999905297255
4480:0.999834705099
772:0.999932318008
243:0.99995833695
38:0.999984131381
200:0.999962122299
3878:0.999848205815
6:0.999994377904
3125:0.999866961979
1:0.999995895834
0:0.999996930545
82:0.999973914424
56:0.99997958631
458:0.999946216345
114:0.99997023203
3531:0.999856850416
552:0.999941915414
6:0.999994377904
8780:0.999742657494
28:0.999987956614
684:0.999936164433
611:0.999939543646
6983:0.999780206509
2545:0.99988098

0:0.999996930545
4480:0.999834705099
569:0.999940899749
99:0.999971541024
46:0.999981285677
38:0.999984131381
108:0.999970765982
2159:0.999891214341
953:0.999926715832
1:0.999995895834
43:0.999982189085
844:0.999930122876
2:0.999995100799
28983:0.999340619527
27:0.99998828553
5254:0.999818031369
6659:0.999787282185
27:0.99998828553
43:0.999982189085
4480:0.999834705099
1287:0.999915028142
4:0.999994816642
0:0.999996930545
531:0.999943010459
907:0.999928111485
38:0.999984131381
53:0.999980151462
23692:0.999449518117
13:0.999990811563
3560:0.999856408368
4:0.999994816642
0:0.999996930545
5420:0.999814624001
13360:0.999650784852
2996:0.999869700619
1316:0.999914219165
3292:0.999862379529
4:0.999994816642
3752:0.999851258971
6415:0.9997923863
553:0.999941905613
409:0.999948428938
10140:0.999716019083
27379:0.999367544468
13487:0.999648635816
4480:0.999834705099
13487:0.999648635816
2399:0.999884760196
138:0.999967404395
61:0.999977979657
0:0.999996930545
341:0.999952064688
17:0.99999058474

29:0.999987361299
22162:0.999472953723
1:0.999995895834
0:0.999996930545
1772:0.999901130535
3:0.999995071942
8:0.999993680877
21:0.999990105555
12:0.999991065907
498:0.999944976817
23:0.999989772624
55:0.999979628288
1769:0.999901178823
3252:0.999863410409
86:0.999973681476
311:0.99995399076
60113:0.998709005551
5:0.999994460433
208:0.999961086843
1:0.999995895834
450:0.99994672252
8377:0.999750777607
77:0.999974791963
225:0.999959787112
14400:0.999629883395
24:0.999989531522
329:0.999952632694
2:0.999995100799
50:0.999980471953
15271:0.999613666295
1275:0.999915544878
8386:0.999750777607
31961:0.99927452375
32:0.999986962534
208:0.999961086843
86:0.999973681476
11:0.999991290011
208:0.999961086843
10:0.999992610876
37:0.999984930083
3418:0.999859140958
74:0.999975913053
14:0.999990743518
1344:0.999913289003
215:0.999960530074
0:0.999996930545
695:0.999935423365
9231:0.999733688179
23:0.999989772624
118:0.999969327174
311:0.99995399076
1:0.999995895834
32:0.999986962534
543:0.99994252

141:0.999967183999
3644:0.999854135009
39:0.999984004811
34:0.999986463726
253:0.999957674209
4989:0.999824046164
77:0.999974791963
0:0.999996930545
439:0.999947155738
2:0.999995100799
926:0.999927528621
46:0.999981285677
39:0.999984004811
643:0.999938208
6:0.999994377904
1804:0.999900348173
1141:0.999920038372
453:0.999946570645
677:0.999936345748
3474:0.999858009541
754:0.999933124953
67:0.999977181792
36:0.999984985391
178:0.9999638968
1:0.999995895834
653:0.999937426629
4480:0.999834705099
5550:0.999811689106
178:0.9999638968
1:0.999995895834
4480:0.999834705099
1537:0.999907312
10421:0.999710114482
18:0.999990543751
185:0.999963438823
178:0.9999638968
1:0.999995895834
10876:0.999701192848
2559:0.999880732439
6415:0.9997923863
178:0.9999638968
1:0.999995895834
4480:0.999834705099
2392:0.999884989073
14149:0.999637261875
170:0.999964335718
158:0.999965314558
0:0.999996930545
9724:0.999723710518
1:0.999995895834
6415:0.9997923863
150:0.999966166752
6415:0.9997923863
6415:0.9997923863

33:0.99998648994
35:0.999986337599
10:0.999992610876
419:0.999948068587
67:0.999977181792
3502:0.999857433513
2264:0.999888405645
14:0.999990743518
9861:0.999721576977
732:0.999934033265
42:0.999982440731
10062:0.999717157288
15006:0.999619306506
1:0.999995895834
0:0.999996930545
289:0.999955323295
70:0.999976396312
25:0.999988568778
3111:0.999867313777
5979:0.999802357646
780:0.999932084272
746:0.999933407284
10436:0.999710114482
22732:0.999465477516
25209:0.99941277978
746:0.999933407284
16435:0.99959175171
2:0.999995100799
0:0.999996930545
2419:0.999884298311
1:0.999995895834
7305:0.999773544593
291:0.999955296518
4:0.999994816642
0:0.999996930545
3:0.999995071942
12:0.999991065907
90:0.999972661672
122:0.999969212546
23:0.999989772624
0:0.999996930545
975:0.999925936085
1:0.999995895834
173:0.999964171282
1898:0.999897937927
33:0.99998648994
893:0.99992835176
5:0.999994460433
50:0.999980471953
3395:0.999859834507
2:0.999995100799
4503:0.999834251614
3572:0.999856111384
13:0.9999908

28:0.999987956614
1913:0.999897509992
2646:0.999878821257
1:0.999995895834
14688:0.999624706687
37:0.999984930083
0:0.999996930545
19293:0.999533747596
701:0.999935301841
7305:0.999773544593
401:0.999948734798
10:0.999992610876
86:0.999973681476
4:0.999994816642
0:0.999996930545
236:0.999958697428
78:0.999974660391
0:0.999996930545
368:0.999950697623
2:0.999995100799
95:0.999971853869
1:0.999995895834
701:0.999935301841
7305:0.999773544593
30:0.999987292299
3:0.999995071942
3:0.999995071942
90:0.999972661672
122:0.999969212546
1892:0.999898044076
15669:0.999604715292
322:0.999953124428
418:0.999948110552
36925:0.999183503419
1282:0.99991515058
19:0.999990444067
46:0.999981285677
38:0.999984131381
5:0.999994460433
261:0.999956620032
295:0.999955003318
6:0.999994377904
7305:0.999773544593
33:0.99998648994
10:0.999992610876
1086:0.999921984442
8441:0.999749215069
6:0.999994377904
114:0.99997023203
33:0.99998648994
10:0.999992610876
88:0.99997341685
15694:0.999604715292
26:0.999988322569
3

KeyboardInterrupt: 

## Making batches

Now that our data is in good shape, we need to get it into the proper form to pass it into our network. With the skip-gram architecture, for each word in the text, we want to grab all the words in a window around that word, with size $C$. 

From [Mikolov et al.](https://arxiv.org/pdf/1301.3781.pdf): 

"Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples... If we choose $C = 5$, for each training word we will select randomly a number $R$ in range $< 1; C >$, and then use $R$ words from history and $R$ words from the future of the current word as correct labels."

> **Exercise:** Implement a function `get_target` that receives a list of words, an index, and a window size, then returns a list of words in the window around the index. Make sure to use the algorithm described above, where you choose a random number of words from the window.

In [None]:
def get_target(words, idx, window_size=5):
    ''' Get a list of words in a window around an index. '''
    
    # Your code here
    
    return

Here's a function that returns batches for our network. The idea is that it grabs `batch_size` words from a words list. Then for each of those words, it gets the target words in the window. I haven't found a way to pass in a random number of target words and get it to work with the architecture, so I make one row per input-target pair. This is a generator function by the way, helps save memory.

In [None]:
def get_batches(words, batch_size, window_size=5):
    ''' Create a generator of word batches as a tuple (inputs, targets) '''
    
    n_batches = len(words)//batch_size
    
    # only full batches
    words = words[:n_batches*batch_size]
    
    for idx in range(0, len(words), batch_size):
        x, y = [], []
        batch = words[idx:idx+batch_size]
        for ii in range(len(batch)):
            batch_x = batch[ii]
            batch_y = get_target(batch, ii, window_size)
            y.extend(batch_y)
            x.extend([batch_x]*len(batch_y))
        yield x, y
    

## Building the graph

From [Chris McCormick's blog](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/), we can see the general structure of our network.
![embedding_network](./assets/skip_gram_net_arch.png)

The input words are passed in as integers. This will go into a hidden layer of linear units, then into a softmax layer. We'll use the softmax layer to make a prediction like normal.

The idea here is to train the hidden layer weight matrix to find efficient representations for our words. We can discard the softmax layer becuase we don't really care about making predictions with this network. We just want the embedding matrix so we can use it in other networks we build from the dataset.

I'm going to have you build the graph in stages now. First off, creating the `inputs` and `labels` placeholders like normal.

> **Exercise:** Assign `inputs` and `labels` using `tf.placeholder`. We're going to be passing in integers, so set the data types to `tf.int32`. The batches we're passing in will have varying sizes, so set the batch sizes to [`None`]. To make things work later, you'll need to set the second dimension of `labels` to `None` or `1`.

In [None]:
train_graph = tf.Graph()
with train_graph.as_default():
    inputs = 
    labels = 

## Embedding



The embedding matrix has a size of the number of words by the number of units in the hidden layer. So, if you have 10,000 words and 300 hidden units, the matrix will have size $10,000 \times 300$. Remember that we're using tokenized data for our inputs, usually as integers, where the number of tokens is the number of words in our vocabulary.


> **Exercise:** Tensorflow provides a convenient function [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) that does this lookup for us. You pass in the embedding matrix and a tensor of integers, then it returns rows in the matrix corresponding to those integers. Below, set the number of embedding features you'll use (200 is a good start), create the embedding matrix variable, and use `tf.nn.embedding_lookup` to get the embedding tensors. For the embedding matrix, I suggest you initialize it with a uniform random numbers between -1 and 1 using [tf.random_uniform](https://www.tensorflow.org/api_docs/python/tf/random_uniform).

In [None]:
n_vocab = len(int_to_vocab)
n_embedding =  # Number of embedding features 
with train_graph.as_default():
    embedding = # create embedding weight matrix here
    embed = # use tf.nn.embedding_lookup to get the hidden layer output

## Negative sampling



For every example we give the network, we train it using the output from the softmax layer. That means for each input, we're making very small changes to millions of weights even though we only have one true example. This makes training the network very inefficient. We can approximate the loss from the softmax layer by only updating a small subset of all the weights at once. We'll update the weights for the correct label, but only a small number of incorrect labels. This is called ["negative sampling"](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf). Tensorflow has a convenient function to do this, [`tf.nn.sampled_softmax_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss).

> **Exercise:** Below, create weights and biases for the softmax layer. Then, use [`tf.nn.sampled_softmax_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss) to calculate the loss. Be sure to read the documentation to figure out how it works.

In [None]:
# Number of negative labels to sample
n_sampled = 100
with train_graph.as_default():
    softmax_w = # create softmax weight matrix here
    softmax_b = # create softmax biases here
    
    # Calculate the loss using negative sampling
    loss = tf.nn.sampled_softmax_loss 
    
    cost = tf.reduce_mean(loss)
    optimizer = tf.train.AdamOptimizer().minimize(cost)

## Validation

This code is from Thushan Ganegedara's implementation. Here we're going to choose a few common words and few uncommon words. Then, we'll print out the closest words to them. It's a nice way to check that our embedding table is grouping together words with similar semantic meanings.

In [None]:
with train_graph.as_default():
    ## From Thushan Ganegedara's implementation
    valid_size = 16 # Random set of words to evaluate similarity on.
    valid_window = 100
    # pick 8 samples from (0,100) and (1000,1100) each ranges. lower id implies more frequent 
    valid_examples = np.array(random.sample(range(valid_window), valid_size//2))
    valid_examples = np.append(valid_examples, 
                               random.sample(range(1000,1000+valid_window), valid_size//2))

    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
    
    # We use the cosine distance:
    norm = tf.sqrt(tf.reduce_sum(tf.square(embedding), 1, keep_dims=True))
    normalized_embedding = embedding / norm
    valid_embedding = tf.nn.embedding_lookup(normalized_embedding, valid_dataset)
    similarity = tf.matmul(valid_embedding, tf.transpose(normalized_embedding))

In [None]:
# If the checkpoints directory doesn't exist:
!mkdir checkpoints

## Training

Below is the code to train the network. Every 100 batches it reports the training loss. Every 1000 batches, it'll print out the validation words.

In [None]:
epochs = 10
batch_size = 1000
window_size = 10

with train_graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=train_graph) as sess:
    iteration = 1
    loss = 0
    sess.run(tf.global_variables_initializer())

    for e in range(1, epochs+1):
        batches = get_batches(train_words, batch_size, window_size)
        start = time.time()
        for x, y in batches:
            
            feed = {inputs: x,
                    labels: np.array(y)[:, None]}
            train_loss, _ = sess.run([cost, optimizer], feed_dict=feed)
            
            loss += train_loss
            
            if iteration % 100 == 0: 
                end = time.time()
                print("Epoch {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Avg. Training loss: {:.4f}".format(loss/100),
                      "{:.4f} sec/batch".format((end-start)/100))
                loss = 0
                start = time.time()
            
            if iteration % 1000 == 0:
                ## From Thushan Ganegedara's implementation
                # note that this is expensive (~20% slowdown if computed every 500 steps)
                sim = similarity.eval()
                for i in range(valid_size):
                    valid_word = int_to_vocab[valid_examples[i]]
                    top_k = 8 # number of nearest neighbors
                    nearest = (-sim[i, :]).argsort()[1:top_k+1]
                    log = 'Nearest to %s:' % valid_word
                    for k in range(top_k):
                        close_word = int_to_vocab[nearest[k]]
                        log = '%s %s,' % (log, close_word)
                    print(log)
            
            iteration += 1
    save_path = saver.save(sess, "checkpoints/text8.ckpt")
    embed_mat = sess.run(normalized_embedding)

Restore the trained network if you need to:

In [None]:
with train_graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=train_graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    embed_mat = sess.run(embedding)

## Visualizing the word vectors

Below we'll use T-SNE to visualize how our high-dimensional word vectors cluster together. T-SNE is used to project these vectors into two dimensions while preserving local stucture. Check out [this post from Christopher Olah](http://colah.github.io/posts/2014-10-Visualizing-MNIST/) to learn more about T-SNE and other ways to visualize high-dimensional data.

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

In [None]:
viz_words = 500
tsne = TSNE()
embed_tsne = tsne.fit_transform(embed_mat[:viz_words, :])

In [None]:
fig, ax = plt.subplots(figsize=(14, 14))
for idx in range(viz_words):
    plt.scatter(*embed_tsne[idx, :], color='steelblue')
    plt.annotate(int_to_vocab[idx], (embed_tsne[idx, 0], embed_tsne[idx, 1]), alpha=0.7)