# Thinking in tensors in PyTorch

Hands-on training by [Piotr Migdał](https://p.migdal.pl) (2019).


## Text generation

* [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) by Andrej Karpathy
* [RecurrentJS](http://cs.stanford.edu/people/karpathy/recurrentjs) - an in-browser demo by Andrej Karpathy
* [Unsupervised sentiment neuron by OpenAI](https://openai.com/blog/unsupervised-sentiment-neuron/)
* [Generating Magic cards using deep, recurrent neural networks](https://www.mtgsalvation.com/forums/magic-fundamentals/custom-card-creation/612057-generating-magic-cards-using-deep-recurrent-neural)

Other

* [Training a Keras model to generate colors](https://heartbeat.fritz.ai/how-to-train-a-keras-model-to-generate-colors-3bc79e54971b)


## Various practical links

* [What is the best way to remove accents in a Python unicode string?](https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string)
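
As a point of comparison with the `unidecode` package used in this notebook, here is a minimal sketch of accent stripping with only the standard library. The helper name `strip_accents` is hypothetical, not part of the notebook. It decomposes each character with NFKD and drops the combining marks, so letters with no Unicode decomposition (for example `ł` or `ß`) pass through unchanged, whereas `unidecode` transliterates those too.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after NFKD decomposition (standard library only)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep base characters; drop accents, which NFKD splits off as combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("núñez"))  # accented vowels and ñ lose their marks
print(strip_accents("łuk"))    # 'ł' has no decomposition, so it is kept as-is
```

This is why a transliteration library is the more convenient choice for a dataset that also contains characters such as `ł`, `ß`, or a non-breaking space.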

In [51]:
!pip install unidecode



In [57]:
import numpy as np
import pandas as pd
from collections import Counter
from unidecode import unidecode

In [29]:
names = pd.read_csv("https://www.dropbox.com/s/nu2y0p3i2jvwfki/surnames.csv?dl=1")

In [30]:
names.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20071 entries, 0 to 20070
Data columns (total 2 columns):
name        20071 non-null object
language    20071 non-null object
dtypes: object(2)
memory usage: 313.7+ KB


In [31]:
names.sample(5)

| | name | language |
|---|---|---|
| 9351 | mcmahon | Irish |
| 1841 | shadid | Arabic |
| 14015 | jakushov | Russian |
| 9783 | mackay | Scottish |
| 18759 | yuzhilin | Russian |


In [37]:
names['language'].value_counts()

Russian       9405
English       3668
Arabic        2000
Japanese       991
German         724
Italian        709
Czech          519
Spanish        298
Dutch          297
French         277
Chinese        268
Irish          232
Greek          203
Polish         139
Scottish       100
Korean          94
Portuguese      74
Vietnamese      73
Name: language, dtype: int64

In [49]:
names['name'].apply(len).value_counts().sort_index()

2       47
3      335
4     1319
5     2711
6     3926
7     3681
8     3068
9     2272
10    1460
11     711
12     324
13     121
14      46
15      13
16       8
17      24
18       3
19       1
20       1
Name: name, dtype: int64

In [65]:
letters_all = Counter()
for name in names['name']:
    letters_all.update(name)

In [66]:
letters_all.most_common()

[('a', 16515),
 ('o', 11103),
 ('e', 10763),
 ('i', 10421),
 ('n', 9958),
 ('r', 8262),
 ('s', 7983),
 ('h', 7688),
 ('k', 6920),
 ('l', 6710),
 ('v', 6313),
 ('t', 5955),
 ('u', 4720),
 ('m', 4351),
 ('d', 3899),
 ('b', 3657),
 ('y', 3616),
 ('g', 3217),
 ('c', 3070),
 ('z', 1932),
 ('f', 1778),
 ('p', 1711),
 ('j', 1349),
 ('w', 1127),
 (' ', 125),
 ('q', 98),
 ("'", 87),
 ('x', 73),
 ('-', 25),
 ('ö', 24),
 ('é', 23),
 ('í', 14),
 ('ó', 13),
 ('á', 13),
 ('ä', 13),
 ('ü', 11),
 ('à', 10),
 ('ß', 9),
 ('ú', 7),
 ('ñ', 6),
 ('ò', 3),
 ('ś', 3),
 ('1', 3),
 ('è', 2),
 ('ã', 2),
 ('ż', 2),
 ('ê', 1),
 ('ç', 1),
 ('ù', 1),
 ('ì', 1),
 ('õ', 1),
 (':', 1),
 ('\xa0', 1),
 ('ń', 1),
 ('ł', 1),
 ('ą', 1),
 ('/', 1)]

In [67]:
letters = Counter()
for name in names['name']:
    letters.update(unidecode(name))
letters.most_common()

[('a', 16554),
 ('o', 11144),
 ('e', 10789),
 ('i', 10436),
 ('n', 9965),
 ('r', 8262),
 ('s', 8004),
 ('h', 7688),
 ('k', 6920),
 ('l', 6711),
 ('v', 6313),
 ('t', 5955),
 ('u', 4739),
 ('m', 4351),
 ('d', 3899),
 ('b', 3657),
 ('y', 3616),
 ('g', 3217),
 ('c', 3071),
 ('z', 1934),
 ('f', 1778),
 ('p', 1711),
 ('j', 1349),
 ('w', 1127),
 (' ', 126),
 ('q', 98),
 ("'", 87),
 ('x', 73),
 ('-', 25),
 ('1', 3),
 (':', 1),
 ('/', 1)]

In [70]:
# Ids follow the insertion order of `letters` (first appearance in the data).
char2id = {c: i for i, c in enumerate(letters)}
id2char = {i: c for c, i in char2id.items()}

In [69]:
char2id

{'a': 0,
 'b': 1,
 'n': 2,
 'o': 3,
 'r': 4,
 'c': 5,
 's': 6,
 'l': 7,
 'e': 8,
 'q': 9,
 'u': 10,
 't': 11,
 'g': 12,
 'm': 13,
 'i': 14,
 'z': 15,
 'd': 16,
 'f': 17,
 'v': 18,
 'j': 19,
 'y': 20,
 'h': 21,
 'x': 22,
 'p': 23,
 "'": 24,
 ' ': 25,
 'k': 26,
 'w': 27,
 '-': 28,
 ':': 29,
 '/': 30,
 '1': 31}
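
The two dictionaries are inverses of each other, which can be checked with a round trip. Below is a minimal, self-contained sketch on a toy sample. The names `toy_names`, `toy_char2id`, `toy_id2char`, `encode`, and `decode` are hypothetical, and sorting the characters is an assumption made here so that the ids are reproducible regardless of data order (the notebook itself uses insertion order).

```python
from collections import Counter

# Hypothetical toy sample standing in for names['name'].
toy_names = ["mackay", "shadid", "o'neil"]

toy_letters = Counter()
for name in toy_names:
    toy_letters.update(name)

# Sort characters so ids do not depend on the order of the data.
toy_char2id = {c: i for i, c in enumerate(sorted(toy_letters))}
toy_id2char = {i: c for c, i in toy_char2id.items()}


def encode(name, mapping):
    """Turn a string into a list of integer ids."""
    return [mapping[c] for c in name]


def decode(ids, mapping):
    """Turn a list of integer ids back into a string."""
    return "".join(mapping[i] for i in ids)


ids = encode("mackay", toy_char2id)
print(ids, decode(ids, toy_id2char))
```

A character missing from the mapping raises `KeyError`, so any new name has to go through the same transliteration step as the training data before encoding.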


In [None]:
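def encode_name(name):
    # Assumed completion (the original body is empty): map each
    # transliterated character to its integer id from char2id.
    return [char2id[c] for c in unidecode(name)]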