In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

In [5]:
import torch

# Find letter index from all_letters, e.g. "a" = 0
def letterToIndex(letter):
    return all_letters.find(letter)

# Just for demonstration, turn a letter into a <1 x n_letters> Tensor
def letterToTensor(letter):
    tensor = torch.zeros(1, n_letters)
    tensor[0][letterToIndex(letter)] = 1
    return tensor

# Turn a line into a <line_length x 1 x n_letters> Tensor,
# i.e. an array of one-hot letter vectors
def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][letterToIndex(letter)] = 1
    return tensor

print(letterToTensor('J'))

print(lineToTensor('Jones').size())



Columns 0 to 12 
    0     0     0     0     0     0     0     0     0     0     0     0     0

Columns 13 to 25 
    0     0     0     0     0     0     0     0     0     0     0     0     0

Columns 26 to 38 
    0     0     0     0     0     0     0     0     0     1     0     0     0

Columns 39 to 51 
    0     0     0     0     0     0     0     0     0     0     0     0     0

Columns 52 to 56 
    0     0     0     0     0
[torch.FloatTensor of size 1x57]

torch.Size([5, 1, 57])
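
The printed vector has 57 columns with its single 1 in column 35. The same indexing can be reproduced in plain Python, without PyTorch. This is only a sketch: the alphabet below is an assumption (it is not defined in this section) chosen because it has 57 symbols and places ``J`` at index 35, and ``letter_to_one_hot`` / ``line_to_one_hot`` are hypothetical names.

```python
import string

# Assumption: this alphabet is not shown in this section. It is used here
# because it has 57 symbols and puts 'J' at index 35, matching the output.
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

def letter_to_one_hot(letter):
    """Return a list of length n_letters with a single 1 at the letter's index."""
    vec = [0] * n_letters
    vec[all_letters.find(letter)] = 1
    return vec

def line_to_one_hot(line):
    """Return one one-hot row per character of the line."""
    return [letter_to_one_hot(ch) for ch in line]

j = letter_to_one_hot('J')
print(n_letters, j.index(1))
print(len(line_to_one_hot('Jones')))
```

Note that ``str.find`` returns -1 for a character outside the alphabet, which would silently set the last position, so names are best normalized to the alphabet before encoding.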


Creating the Network
====================

Before autograd, creating a recurrent neural network in Torch involved
cloning the parameters of a layer over several timesteps. The layers
held hidden state and gradients, which are now handled entirely by the
graph itself. This means you can implement an RNN in a very "pure" way,
as regular feed-forward layers.

This RNN module (mostly copied from `the PyTorch for Torch users
tutorial <https://github.com/pytorch/tutorials/blob/master/Introduction%20to%20PyTorch%20for%20former%20Torchies.ipynb>`__)
is just 2 linear layers. The first operates on the concatenated input
and hidden state and is followed by a Tanh to produce the new hidden
state. The second maps that hidden state to the output, with a
LogSoftmax layer after it.
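
Written as equations, one step of the module defined in the code cell below can be summarized as follows. The symbols are introduced here for illustration and do not appear in the code: :math:`x_t` is the one-hot letter at step :math:`t`, :math:`h_{t-1}` the previous hidden state, :math:`[\,\cdot\,;\,\cdot\,]` denotes concatenation, :math:`W_{act}, b_{act}` stand for the weights and bias of ``i2act``, and :math:`W_{o}, b_{o}` for those of ``i2o``.

.. math::

   h_t = \tanh\left(W_{act}\,[x_t ; h_{t-1}] + b_{act}\right), \qquad
   o_t = \log \mathrm{softmax}\left(W_{o}\, h_t + b_{o}\right)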

.. figure:: https://i.imgur.com/Z2xbySO.png
   :alt: 





In [8]:
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        
        self.hidden_size = hidden_size
        
        self.i2act = nn.Linear(input_size + hidden_size, hidden_size)
        self.act2h = nn.Tanh()
        self.i2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax()
    
    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.act2h(self.i2act(combined))
        output = self.i2o(hidden)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return Variable(torch.zeros(1, self.hidden_size))

n_hidden = 128
rnn = RNN(n_letters, n_hidden, n_categories)

To run a step of this network we need to pass an input (in our case, the
Tensor for the current letter) and a previous hidden state (which we
initialize as zeros at first). We'll get back the output (the
log-probability of each language) and a next hidden state (which we keep
for the next step).
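
To make the flow of the hidden state concrete, here is a framework-free sketch of the same kind of step, written with plain lists. Everything in it is an assumption for illustration: the toy sizes (3 input symbols, 2 hidden units, 3 classes), the hand-picked weights, and the names ``linear``, ``log_softmax``, ``rnn_step`` and ``params``. It is not the tutorial's trained network.

```python
import math

def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def rnn_step(x, hidden, params):
    """One step: concatenate, tanh hidden update, log-softmax output."""
    combined = x + hidden  # list concatenation of input and hidden state
    hidden = [math.tanh(v)
              for v in linear(params["W_act"], params["b_act"], combined)]
    output = log_softmax(linear(params["W_o"], params["b_o"], hidden))
    return output, hidden

# Hand-picked toy weights (illustrative only, not trained values).
params = {
    "W_act": [[0.5, -0.3, 0.1, 0.2, 0.0],
              [-0.2, 0.4, 0.3, -0.1, 0.6]],  # 2 hidden units, 3 inputs + 2 hidden
    "b_act": [0.0, 0.1],
    "W_o": [[1.0, -1.0], [0.5, 0.5], [-0.7, 0.2]],  # 3 output classes
    "b_o": [0.0, 0.0, 0.0],
}

hidden = [0.0, 0.0]  # zeroed initial hidden state
for x in ([1, 0, 0], [0, 0, 1], [0, 1, 0]):  # three one-hot "letters"
    output, hidden = rnn_step(x, hidden, params)

print(output, hidden)
```

Only the output of the last step is used for classification; the hidden state returned by each step is fed into the next one.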

Remember that PyTorch modules operate on Variables rather than straight
up Tensors.




In [9]:
input = Variable(letterToTensor('A'))
hidden = Variable(torch.zeros(1, n_hidden))

output, next_hidden = rnn(input, hidden)

For the sake of efficiency we don't want to be creating a new Tensor for
every step, so we will use ``lineToTensor`` instead of
``letterToTensor`` and use slices. This could be further optimized by
pre-computing batches of Tensors.




In [10]:
input = Variable(lineToTensor('Albert'))
hidden = Variable(torch.zeros(1, n_hidden))

output, next_hidden = rnn(input[0], hidden)
print(output)

Variable containing:

Columns 0 to 9 
-2.8967 -2.9108 -2.8523 -2.9770 -2.9834 -2.9110 -2.9024 -2.9157 -2.7928 -2.9418

Columns 10 to 17 
-2.8244 -2.8168 -2.9630 -2.9928 -2.7859 -2.7875 -2.8640 -2.9500
[torch.FloatTensor of size 1x18]



As you can see the output is a ``<1 x n_categories>`` Tensor, where
every item is the log-likelihood of that category (higher is more likely).
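
Because the final layer is ``LogSoftmax``, the printed values are logarithms of probabilities. A quick check in plain Python, using the 18 values copied from the output above, shows that exponentiating them gives numbers that sum to roughly 1 (only roughly, since the printed values are rounded to four places):

```python
import math

# Log-probabilities copied from the printed output above (rounded to 4 places).
log_probs = [
    -2.8967, -2.9108, -2.8523, -2.9770, -2.9834, -2.9110, -2.9024, -2.9157,
    -2.7928, -2.9418, -2.8244, -2.8168, -2.9630, -2.9928, -2.7859, -2.7875,
    -2.8640, -2.9500,
]

probs = [math.exp(v) for v in log_probs]
total = sum(probs)

# Index of the largest log-probability, i.e. the most likely category.
best = max(range(len(log_probs)), key=log_probs.__getitem__)

print(round(total, 3))
print(best, round(probs[best], 4))
```

All probabilities are close to 1/18 here because the network is still untrained.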




Training
========

Preparing for Training
----------------------

Before going into training we should make a few helper functions. The
first interprets the output of the network, which we know to be a
log-likelihood for each category. We can use ``Tensor.topk`` to get the
index of the greatest value:




In [11]:
def categoryFromOutput(output):
    top_n, top_i = output.data.topk(1) # Tensor out of Variable with .data
    category_i = top_i[0][0]
    return all_categories[category_i], category_i

print(categoryFromOutput(output))

(u'Korean', 14)


We will also want a quick way to get a training example (a name and its
language):




In [12]:
import random

def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

def randomTrainingExample():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    category_tensor = Variable(torch.LongTensor([all_categories.index(category)]))
    line_tensor = Variable(lineToTensor(line))
    return category, line, category_tensor, line_tensor

for i in range(10):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    print('category =', category, '/ line =', line)

category = Dutch / line = Dale
category = Spanish / line = Obando
category = Portuguese / line = Pinheiro
category = Japanese / line = Mashita
category = Polish / line = Wyrzyk
category = Dutch / line = Smits
category = German / line = Dresdner
category = English / line = Chung
category = Polish / line = Czajka
category = Dutch / line = Dale


Training the Network
--------------------

Now all it takes to train this network is to show it a bunch of examples,
have it make guesses, and tell it if it's wrong.

For the loss function ``nn.NLLLoss`` is appropriate, since the last
layer of the RNN is ``nn.LogSoftmax``.
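
For a single example, the negative log-likelihood loss is simply the negated log-probability that the network assigns to the correct class. The sketch below illustrates this in plain Python; ``log_softmax`` and ``nll_loss`` are hypothetical helper names written for this example, not the PyTorch implementations.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class for one example."""
    return -log_probs[target]

n_categories = 18  # matches the 1x18 output shown earlier

# An undecided network: equal scores for every class.
uniform = log_softmax([0.0] * n_categories)
print(round(nll_loss(uniform, 3), 4))  # 2.8904, i.e. ln(18)

# A network that strongly favours class 0.
confident = log_softmax([5.0] + [0.0] * (n_categories - 1))
print(nll_loss(confident, 0) < nll_loss(confident, 1))  # True
```

The value ln(18) ≈ 2.89 is the loss of a network that cannot yet tell the 18 categories apart, which is consistent with losses of about 2.8 to 3.0 at the very start of training.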




In [13]:
criterion = nn.NLLLoss()

Each loop of training will:

-  Create input and target tensors
-  Create a zeroed initial hidden state
-  Read each letter in and

   -  Keep hidden state for next letter

-  Compare final output to target
-  Back-propagate
-  Update the parameters using their gradients
-  Return the output and loss
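
The parameter update performed by ``train`` in the following cell is plain gradient descent: each parameter moves against its gradient, scaled by the learning rate. A minimal sketch on a one-parameter toy problem shows the rule and why the learning rate matters. The quadratic loss, the learning rates and the names ``grad_fn`` and ``run_sgd`` are assumptions made up for this example.

```python
def grad_fn(p):
    """Gradient of the toy loss (p - 3)^2, which is minimised at p = 3."""
    return 2.0 * (p - 3.0)

def run_sgd(learning_rate, steps=50, start=0.0):
    """Apply p <- p - learning_rate * grad repeatedly and return the final p."""
    p = start
    for _ in range(steps):
        p -= learning_rate * grad_fn(p)
    return p

good = run_sgd(0.1)       # approaches 3
too_high = run_sgd(1.1)   # overshoots further on every step and diverges
too_low = run_sgd(1e-5)   # barely moves away from the starting point

print(good, too_high, too_low)
```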




In [14]:
learning_rate = 0.005 # If you set this too high, it might explode. If too low, it might not learn

def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()

    rnn.zero_grad()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    # Subtract each parameter's gradient, scaled by the learning rate, from its value
    for p in rnn.parameters():
        p.data.add_(-learning_rate, p.grad.data)

    return output, loss.data[0]

Now we just have to run that with a bunch of examples. Since the
``train`` function returns both the output and loss we can print its
guesses and also keep track of loss for plotting. Since there are
thousands of examples we print only every ``print_every`` examples, and
record the average loss over each ``plot_every`` examples for plotting.
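
The averaging used for the plot can be sketched on its own: accumulate the loss, and every fixed number of examples store the mean and reset the accumulator. ``chunked_means`` and the loss values below are hypothetical and serve only as an illustration.

```python
def chunked_means(values, chunk_size):
    """Mean of each complete chunk of chunk_size consecutive values."""
    means, running = [], 0.0
    for i, v in enumerate(values, start=1):
        running += v
        if i % chunk_size == 0:
            means.append(running / chunk_size)
            running = 0.0
    return means  # a trailing incomplete chunk is dropped

losses = [3.0, 2.0, 1.0, 2.0, 0.5, 1.5, 9.9]  # made-up values
means = chunked_means(losses, 3)
print(means)
```

Averaging smooths out the large swings between individual examples, so the plotted curve shows the trend rather than the noise.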




In [16]:
import time
import math

n_iters = 100000
print_every = 50
plot_every = 1000



# Keep track of losses for plotting
current_loss = 0
all_losses = []

def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

start = time.time()

for iter in range(1, n_iters + 1):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output, loss = train(category_tensor, line_tensor)
    current_loss += loss

    # Print iter number, loss, name and guess
    if iter % print_every == 0:
        guess, guess_i = categoryFromOutput(output)
        correct = '✓' if guess == category else '✗ (%s)' % category
        print('%d %d%% (%s) %.4f %s / %s %s' % (iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))

    # Add current loss avg to list of losses
    if iter % plot_every == 0:
        all_losses.append(current_loss / plot_every)
        current_loss = 0

50 0% (0m 1s) 2.8291 Gallo / Portuguese ✗ (Spanish)
100 0% (0m 1s) 2.8042 Sokolofsky / Portuguese ✗ (Polish)
150 0% (0m 2s) 2.9854 Treasach / Scottish ✗ (Irish)
200 0% (0m 3s) 2.8547 Shin / Scottish ✗ (Korean)
250 0% (0m 4s) 2.9801 Vo / Portuguese ✗ (Vietnamese)
300 0% (0m 5s) 2.7912 Murphy / Scottish ✓
350 0% (0m 6s) 2.7536 Watt / Scottish ✓
400 0% (0m 7s) 2.9354 Donnchadh / Scottish ✗ (Irish)
450 0% (0m 8s) 2.8370 Flaxman / Scottish ✗ (English)
500 0% (0m 9s) 2.9657 Brady / Scottish ✗ (Irish)
550 0% (0m 10s) 2.9063 Skala / Portuguese ✗ (Polish)
600 0% (0m 11s) 2.7067 Baumhauer / German ✓
650 0% (0m 12s) 2.8402 Bohunovsky / Portuguese ✗ (Czech)
700 0% (0m 13s) 2.7522 Mccallum / Portuguese ✗ (Scottish)
750 0% (0m 14s) 2.9097 Mullins / Portuguese ✗ (French)
800 0% (0m 15s) 2.8010 Krantz / Scottish ✗ (German)
850 0% (0m 16s) 2.8372 Sauvageot / Scottish ✗ (French)
900 0% (0m 17s) 3.0503 Olguin / Scottish ✗ (Spanish)
950 0% (0m 18s) 2.8124 Montagna / Portuguese ✗ (Italian)
1000 1% (0m 19s)

8300 8% (2m 42s) 2.4254 Gott / Arabic ✗ (German)
8350 8% (2m 43s) 3.6321 Leon / Chinese ✗ (French)
8400 8% (2m 44s) 2.0680 Uhlik / Arabic ✗ (Czech)
8450 8% (2m 45s) 1.8258 Waller / English ✓
8500 8% (2m 46s) 3.1132 Quiros / Greek ✗ (Spanish)
8550 8% (2m 47s) 1.9205 Verona / Spanish ✗ (Italian)
8600 8% (2m 48s) 2.7278 Hlypovka / Polish ✗ (Russian)
8650 8% (2m 49s) 2.3963 Belmonte / Dutch ✗ (Spanish)
8700 8% (2m 50s) 3.6660 Ryskamp / Japanese ✗ (Dutch)
8750 8% (2m 51s) 4.8459 Furlan / Scottish ✗ (Italian)
8800 8% (2m 52s) 0.9124 Shum / Chinese ✓
8850 8% (2m 53s) 0.9038 Belobrov / Russian ✓
8900 8% (2m 54s) 3.5511 Kasamatsu / Greek ✗ (Japanese)
8950 8% (2m 55s) 2.1985 Sheinfeld / Scottish ✗ (German)
9000 9% (2m 56s) 1.1039 Salib / Arabic ✓
9050 9% (2m 57s) 1.4416 Maestri / Italian ✓
9100 9% (2m 58s) 2.5174 Schulte / English ✗ (German)
9150 9% (2m 59s) 0.6950 Ban / Chinese ✓
9200 9% (3m 0s) 2.8847 Jagoda / Japanese ✗ (Polish)
9250 9% (3m 1s) 1.5464 Tse / Chinese ✓
9300 9% (3m 2s) 1.0699 Sc

16550 16% (5m 24s) 2.3752 Morra / Spanish ✗ (Italian)
16600 16% (5m 25s) 2.1311 Samaha / Japanese ✗ (Arabic)
16650 16% (5m 26s) 0.3045 Janowski / Polish ✓
16700 16% (5m 27s) 3.6987 Valentei / Italian ✗ (Russian)
16750 16% (5m 28s) 0.5348 Choi / Korean ✓
16800 16% (5m 29s) 0.1267 Adamidis / Greek ✓
16850 16% (5m 30s) 2.2626 Vainonen / Dutch ✗ (Russian)
16900 16% (5m 31s) 2.0546 Ota / Spanish ✗ (Japanese)
16950 16% (5m 32s) 1.0595 Khouri / Arabic ✓
17000 17% (5m 33s) 3.1930 Aonghuis / Greek ✗ (Irish)
17050 17% (5m 34s) 1.6585 Kanak / Polish ✗ (Czech)
17100 17% (5m 35s) 1.9482 Karubo / Portuguese ✗ (Japanese)
17150 17% (5m 36s) 0.4610 Nelli / Italian ✓
17200 17% (5m 37s) 2.1091 Mcdonald / French ✗ (Scottish)
17250 17% (5m 38s) 0.8070 Polymenakou / Greek ✓
17300 17% (5m 39s) 3.4108 Szewc / Dutch ✗ (Polish)
17350 17% (5m 40s) 1.9328 Page / French ✓
17400 17% (5m 41s) 1.7422 Lichtenberg / Dutch ✗ (German)
17450 17% (5m 42s) 3.3128 Lichman / Irish ✗ (Russian)
17500 17% (5m 43s) 2.4584 Pey / V

24700 24% (7m 59s) 1.1750 Chan / Vietnamese ✗ (Chinese)
24750 24% (8m 0s) 1.6882 Errington / Russian ✗ (English)
24800 24% (8m 1s) 1.2034 Ahearn / Irish ✓
24850 24% (8m 2s) 1.3590 Flannery / English ✓
24900 24% (8m 3s) 0.5316 Treasach / Irish ✓
24950 24% (8m 4s) 2.0463 Kara / Arabic ✗ (Czech)
25000 25% (8m 6s) 1.9377 Czabal / Arabic ✗ (Czech)
25050 25% (8m 7s) 1.2503 Lac / Chinese ✗ (Vietnamese)
25100 25% (8m 8s) 3.7453 Sauvageau / Japanese ✗ (French)
25150 25% (8m 9s) 0.3337 Trieu / Vietnamese ✓
25200 25% (8m 10s) 1.4016 Kohler / German ✓
25250 25% (8m 11s) 3.6267 Manus / Arabic ✗ (Irish)
25300 25% (8m 12s) 1.8765 Aiza / Japanese ✗ (Spanish)
25350 25% (8m 13s) 1.6894 Pole / Scottish ✗ (English)
25400 25% (8m 14s) 0.4260 Nishiwaki / Japanese ✓
25450 25% (8m 15s) 3.9370 Wallace / French ✗ (Scottish)
25500 25% (8m 16s) 0.0700 Onikov / Russian ✓
25550 25% (8m 17s) 2.8756 Kasa / Japanese ✗ (Czech)
25600 25% (8m 18s) 0.4642 Belanger / French ✓
25650 25% (8m 19s) 1.8171 Heidl / French ✗ (Cze

32900 32% (10m 35s) 1.3041 Agthoven / Dutch ✓
32950 32% (10m 36s) 0.4312 O'Hannagain / Irish ✓
33000 33% (10m 37s) 0.1282 Horiatis / Greek ✓
33050 33% (10m 38s) 0.8097 Paredes / Portuguese ✓
33100 33% (10m 39s) 1.7314 OuYang / Vietnamese ✗ (Chinese)
33150 33% (10m 40s) 0.5045 Ban / Chinese ✓
33200 33% (10m 41s) 0.5402 O'Dell / Irish ✓
33250 33% (10m 42s) 1.8404 Braun / Dutch ✗ (German)
33300 33% (10m 43s) 1.1645 Sokal / Czech ✗ (Polish)
33350 33% (10m 44s) 0.6214 Takei / Japanese ✓
33400 33% (10m 44s) 2.3483 Martel / Dutch ✗ (French)
33450 33% (10m 45s) 2.6520 Brun / Vietnamese ✗ (German)
33500 33% (10m 46s) 1.7737 Hierro / Portuguese ✗ (Spanish)
33550 33% (10m 47s) 3.2888 Roy / Korean ✗ (French)
33600 33% (10m 48s) 2.8733 Seaghdha / Spanish ✗ (Irish)
33650 33% (10m 49s) 0.7623 Faucheux / French ✓
33700 33% (10m 50s) 1.2745 Sakellariou / Polish ✗ (Greek)
33750 33% (10m 51s) 2.2186 Kocian / Arabic ✗ (Czech)
33800 33% (10m 52s) 1.3471 Lang / Vietnamese ✗ (Chinese)
33850 33% (10m 52s) 0.1

41050 41% (13m 4s) 0.7895 Gardinier / French ✓
41100 41% (13m 5s) 1.3278 Nightingale / Scottish ✗ (English)
41150 41% (13m 6s) 1.2665 Mansour / Arabic ✓
41200 41% (13m 7s) 1.8634 Phung / Chinese ✗ (Vietnamese)
41250 41% (13m 8s) 1.3658 Samaha / Japanese ✗ (Arabic)
41300 41% (13m 9s) 2.2291 Diaz / Vietnamese ✗ (Spanish)
41350 41% (13m 10s) 0.6130 Ahearn / Irish ✓
41400 41% (13m 11s) 1.3013 Seif / Korean ✗ (Arabic)
41450 41% (13m 12s) 0.3394 Dertilis / Greek ✓
41500 41% (13m 13s) 3.7023 Gaspar / Arabic ✗ (Spanish)
41550 41% (13m 14s) 1.1632 Durand / French ✓
41600 41% (13m 15s) 0.8393 Abasolo / Spanish ✓
41650 41% (13m 16s) 1.7264 Szweda / Arabic ✗ (Polish)
41700 41% (13m 17s) 0.6533 Maloof / Arabic ✓
41750 41% (13m 18s) 1.1823 Sandoval / Spanish ✓
41800 41% (13m 19s) 0.7704 Bieber / German ✓
41850 41% (13m 20s) 0.0547 Julev / Russian ✓
41900 41% (13m 20s) 0.7665 Zabek / Polish ✓
41950 41% (13m 21s) 1.9071 Nemec / German ✗ (Czech)
42000 42% (13m 22s) 3.3342 Fay / Vietnamese ✗ (French)
42

49300 49% (15m 41s) 0.6630 Beauchene / French ✓
49350 49% (15m 42s) 1.0269 Jeon / Korean ✓
49400 49% (15m 43s) 4.3302 Roig / Korean ✗ (Spanish)
49450 49% (15m 44s) 0.4980 Ramires / Portuguese ✓
49500 49% (15m 45s) 0.1219 Shon / Korean ✓
49550 49% (15m 46s) 0.6739 Lestrange / French ✓
49600 49% (15m 47s) 2.8123 Kassis / Greek ✗ (Arabic)
49650 49% (15m 48s) 1.4531 Suero / Portuguese ✗ (Spanish)
49700 49% (15m 49s) 0.0968 Crocetti / Italian ✓
49750 49% (15m 50s) 0.6332 Dao / Vietnamese ✓
49800 49% (15m 51s) 0.4707 Andrysiak / Polish ✓
49850 49% (15m 52s) 0.2284 Alvarez / Spanish ✓
49900 49% (15m 53s) 1.3749 Lobo / Portuguese ✓
49950 49% (15m 54s) 1.8405 Pachr / German ✗ (Czech)
50000 50% (15m 55s) 0.6723 Le / Vietnamese ✓
50050 50% (15m 56s) 0.1594 Tsuda / Japanese ✓
50100 50% (15m 57s) 0.3238 Luong / Vietnamese ✓
50150 50% (15m 58s) 0.0941 Chweh / Korean ✓
50200 50% (15m 59s) 4.5187 Zaruba / Arabic ✗ (Czech)
50250 50% (16m 0s) 4.6188 Lihtenshtedt / German ✗ (Russian)
50300 50% (16m 1s) 0

57550 57% (18m 20s) 0.6868 Rooijakkers / Dutch ✓
57600 57% (18m 21s) 1.0673 Halabi / Italian ✗ (Arabic)
57650 57% (18m 22s) 1.1735 King / Scottish ✓
57700 57% (18m 23s) 1.7343 Rompuy / English ✗ (Dutch)
57750 57% (18m 24s) 1.8591 Sinclair / Dutch ✗ (Scottish)
57800 57% (18m 25s) 0.4063 Esparza / Spanish ✓
57850 57% (18m 26s) 0.3904 Mokeev / Russian ✓
57900 57% (18m 27s) 6.4217 Althaus / Portuguese ✗ (German)
57950 57% (18m 27s) 2.1872 Krol / Scottish ✗ (Polish)
58000 57% (18m 28s) 0.1675 Onischenko / Russian ✓
58050 58% (18m 29s) 5.3489 Steinborn / Dutch ✗ (Czech)
58100 58% (18m 30s) 1.1170 Blanchet / French ✓
58150 58% (18m 31s) 0.4378 Paschalis / Greek ✓
58200 58% (18m 32s) 1.4467 Craig / Korean ✗ (Scottish)
58250 58% (18m 33s) 4.3828 Giles / Spanish ✗ (French)
58300 58% (18m 34s) 1.6364 Nonomura / Portuguese ✗ (Japanese)
58350 58% (18m 35s) 0.9315 Araki / Japanese ✓
58400 58% (18m 36s) 0.0335 Tsukehara / Japanese ✓
58450 58% (18m 37s) 2.2129 Desjardins / English ✗ (French)
58500 58%

65850 65% (20m 57s) 2.0851 Kensington / Scottish ✗ (English)
65900 65% (20m 58s) 0.2316 Havroshkin / Russian ✓
65950 65% (20m 59s) 0.2026 Vu / Vietnamese ✓
66000 66% (21m 0s) 0.1493 Koumanidis / Greek ✓
66050 66% (21m 1s) 7.0032 Cham / Korean ✗ (Arabic)
66100 66% (21m 2s) 0.4283 Kozlow / Polish ✓
66150 66% (21m 3s) 2.0308 Korandak / Polish ✗ (Czech)
66200 66% (21m 4s) 0.1733 Zhirmunsky / Russian ✓
66250 66% (21m 5s) 0.0533 Mckenzie / Scottish ✓
66300 66% (21m 6s) 0.5107 Devin / Irish ✓
66350 66% (21m 6s) 0.1049 Niall / Irish ✓
66400 66% (21m 7s) 2.3136 Storey / Polish ✗ (English)
66450 66% (21m 8s) 0.1204 Tokudome / Japanese ✓
66500 66% (21m 9s) 0.6043 Maclean / Scottish ✓
66550 66% (21m 10s) 0.2840 Kijek / Polish ✓
66600 66% (21m 11s) 2.9220 Wren / Korean ✗ (English)
66650 66% (21m 12s) 1.0255 Hopkins / Dutch ✗ (English)
66700 66% (21m 13s) 0.0480 Zabek / Polish ✓
66750 66% (21m 13s) 0.9589 Dubois / French ✓
66800 66% (21m 14s) 2.4090 Buchan / Irish ✗ (English)
66850 66% (21m 15s) 2.5

74100 74% (23m 25s) 0.8136 Pinheiro / Portuguese ✓
74150 74% (23m 26s) 2.3488 Eckstein / Scottish ✗ (German)
74200 74% (23m 27s) 1.2715 Schultz / German ✓
74250 74% (23m 28s) 1.6399 Araullo / Portuguese ✗ (Spanish)
74300 74% (23m 29s) 1.7830 Spitznogle / Czech ✗ (German)
74350 74% (23m 30s) 2.5001 Mojjis / English ✗ (Czech)
74400 74% (23m 31s) 0.7372 Rios / Portuguese ✓
74450 74% (23m 32s) 1.0669 Travieso / Spanish ✓
74500 74% (23m 33s) 0.1208 Vaikin / Russian ✓
74550 74% (23m 34s) 0.9470 Etxeberria / Spanish ✓
74600 74% (23m 35s) 0.5730 Basurto / Portuguese ✓
74650 74% (23m 36s) 0.9631 Busto / Italian ✓
74700 74% (23m 36s) 0.0436 Nassar / Arabic ✓
74750 74% (23m 37s) 0.0061 Akrivopoulos / Greek ✓
74800 74% (23m 38s) 0.3984 Rosa / Spanish ✓
74850 74% (23m 39s) 0.7077 Salib / Arabic ✓
74900 74% (23m 40s) 0.7215 Fraser / Scottish ✓
74950 74% (23m 41s) 0.0330 Drivakis / Greek ✓
75000 75% (23m 42s) 1.9392 Stewart / Scottish ✗ (English)
75050 75% (23m 43s) 1.1237 Lac / Chinese ✗ (Vietnamese

82450 82% (25m 57s) 0.1118 Nenci / Italian ✓
82500 82% (25m 58s) 3.1820 Desjardins / German ✗ (French)
82550 82% (25m 59s) 0.0511 Chweh / Korean ✓
82600 82% (26m 0s) 1.6537 Grec / Dutch ✗ (Spanish)
82650 82% (26m 1s) 1.4642 Kasimor / Arabic ✗ (Czech)
82700 82% (26m 2s) 0.2767 Jon / Korean ✓
82750 82% (26m 2s) 0.2898 Poirier / French ✓
82800 82% (26m 3s) 0.3019 Chemlik / Czech ✓
82850 82% (26m 4s) 4.1878 Cassidy / Arabic ✗ (English)
82900 82% (26m 5s) 0.9507 Rossum / Dutch ✓
82950 82% (26m 6s) 0.0062 Polymenakou / Greek ✓
83000 83% (26m 7s) 0.3678 Brodeur / French ✓
83050 83% (26m 8s) 1.3988 Scolaidhe / French ✗ (Irish)
83100 83% (26m 9s) 0.4026 Melo / Portuguese ✓
83150 83% (26m 10s) 0.3809 Teunissen / Dutch ✓
83200 83% (26m 10s) 0.2611 Ri / Korean ✓
83250 83% (26m 11s) 0.1887 Mifsud / Arabic ✓
83300 83% (26m 12s) 0.0032 Hayakawa / Japanese ✓
83350 83% (26m 13s) 1.6512 Hawes / Dutch ✗ (English)
83400 83% (26m 14s) 0.0045 Manoukarakis / Greek ✓
83450 83% (26m 15s) 2.5694 Bosque / French

90750 90% (28m 25s) 0.4976 Nicks / English ✓
90800 90% (28m 26s) 2.0710 Chou / Korean ✗ (Chinese)
90850 90% (28m 27s) 0.3743 Kwong / Chinese ✓
90900 90% (28m 28s) 1.3516 Rian / Chinese ✗ (Irish)
90950 90% (28m 29s) 1.0397 Mai / Chinese ✗ (Vietnamese)
91000 91% (28m 30s) 0.5943 Schuchert / German ✓
91050 91% (28m 31s) 0.4396 Tedesco / Italian ✓
91100 91% (28m 32s) 1.6047 Meisner / German ✗ (Dutch)
91150 91% (28m 33s) 1.1672 Solos / Polish ✗ (Spanish)
91200 91% (28m 33s) 0.2646 Ahearn / Irish ✓
91250 91% (28m 34s) 0.0709 Park  / Korean ✓
91300 91% (28m 35s) 0.4183 Gibson / Scottish ✓
91350 91% (28m 36s) 1.7786 Bacon / English ✗ (Czech)
91400 91% (28m 37s) 0.5038 Finnegan / Irish ✓
91450 91% (28m 38s) 0.2864 Arechavaleta / Spanish ✓
91500 91% (28m 39s) 1.7128 Denzel / Dutch ✗ (German)
91550 91% (28m 40s) 0.6292 Duchamps / French ✓
91600 91% (28m 41s) 0.0787 Trinh / Vietnamese ✓
91650 91% (28m 42s) 1.6299 Gagnon / English ✗ (French)
91700 91% (28m 43s) 4.5099 Aden / Dutch ✗ (Russian)
91750

99200 99% (30m 59s) 0.7778 Schmidt / German ✓
99250 99% (31m 0s) 0.0582 Metrofanis / Greek ✓
99300 99% (31m 1s) 0.0726 Shammas / Arabic ✓
99350 99% (31m 1s) 1.6162 Abasolo / Italian ✗ (Spanish)
99400 99% (31m 2s) 2.6028 Testa / Czech ✗ (Italian)
99450 99% (31m 3s) 0.6694 Bai / Chinese ✓
99500 99% (31m 4s) 0.2188 Shaw / Scottish ✓
99550 99% (31m 5s) 0.0014 Arrigucci / Italian ✓
99600 99% (31m 6s) 0.0134 Chrysanthopoulos / Greek ✓
99650 99% (31m 7s) 0.2666 Walentowicz / Polish ✓
99700 99% (31m 8s) 2.2762 Donnelly / Irish ✗ (English)
99750 99% (31m 8s) 0.0527 Sowka / Polish ✓
99800 99% (31m 9s) 0.7537 Lestrange / French ✓
99850 99% (31m 10s) 1.0558 Janosik / Polish ✗ (Czech)
99900 99% (31m 11s) 0.4035 White / Scottish ✓
99950 99% (31m 12s) 0.0196 Qian / Chinese ✓
100000 100% (31m 13s) 2.9633 Denham / Arabic ✗ (English)


Plotting the Results
--------------------

Plotting the historical loss from ``all_losses`` shows the network
learning:




In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

plt.figure()
plt.plot(all_losses)

Evaluating the Results
======================

To see how well the network performs on different categories, we will
create a confusion matrix, indicating for every actual language (rows)
which language the network guesses (columns). To calculate the confusion
matrix a bunch of samples are run through the network with
``evaluate()``, which is the same as ``train()`` minus the backprop.
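
The bookkeeping behind a row-normalized confusion matrix can be sketched in plain Python, independent of the network. The category subset and the (actual, guess) pairs below are made up for illustration:

```python
categories = ["Greek", "Korean", "Chinese"]  # toy subset of categories

# (actual, guessed) pairs; made-up data for illustration only
samples = [("Greek", "Greek"), ("Greek", "Greek"),
           ("Korean", "Korean"), ("Korean", "Chinese"),
           ("Chinese", "Chinese"), ("Chinese", "Chinese"),
           ("Chinese", "Korean"), ("Korean", "Korean")]

n = len(categories)
confusion = [[0.0] * n for _ in range(n)]

# Rows are the actual category, columns the guessed category
for actual, guess in samples:
    confusion[categories.index(actual)][categories.index(guess)] += 1

# Normalize each row so it sums to 1 (skip empty rows to avoid dividing by zero)
for row in confusion:
    total = sum(row)
    if total:
        row[:] = [v / total for v in row]

for name, row in zip(categories, confusion):
    print(name, [round(v, 2) for v in row])
```

After normalization, each diagonal entry is the fraction of that category's samples that were guessed correctly, and each off-diagonal entry is the fraction mistaken for another category.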




In [None]:
# Keep track of correct guesses in a confusion matrix
confusion = torch.zeros(n_categories, n_categories)
n_confusion = 10000

# Just return an output given a line
def evaluate(line_tensor):
    hidden = rnn.initHidden()
    
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    
    return output

# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output = evaluate(line_tensor)
    guess, guess_i = categoryFromOutput(output)
    category_i = all_categories.index(category)
    confusion[category_i][guess_i] += 1

# Normalize by dividing every row by its sum
for i in range(n_categories):
    confusion[i] = confusion[i] / confusion[i].sum()

# Set up plot
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(confusion.numpy())
fig.colorbar(cax)

# Set up axes
ax.set_xticklabels([''] + all_categories, rotation=90)
ax.set_yticklabels([''] + all_categories)

# Force label at every tick
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

# sphinx_gallery_thumbnail_number = 2
plt.show()

You can pick out bright spots off the main diagonal that show which
languages it guesses incorrectly, e.g. Chinese for Korean, and Spanish
for Italian. It seems to do very well with Greek, and very poorly with
English (perhaps because of overlap with other languages).




Running on User Input
---------------------




In [None]:
def predict(input_line, n_predictions=3):
    print('\n> %s' % input_line)
    output = evaluate(Variable(lineToTensor(input_line)))

    # Get top N categories
    topv, topi = output.data.topk(n_predictions, 1, True)
    predictions = []

    for i in range(n_predictions):
        value = topv[0][i]
        category_index = topi[0][i]
        print('(%.2f) %s' % (value, all_categories[category_index]))
        predictions.append([value, all_categories[category_index]])

    return predictions

predict('Dovesky')
predict('Jackson')
predict('Satoshi')

The final versions of the scripts `in the Practical PyTorch
repo <https://github.com/spro/practical-pytorch/tree/master/char-rnn-classification>`__
split the above code into a few files:

-  ``data.py`` (loads files)
-  ``model.py`` (defines the RNN)
-  ``train.py`` (runs training)
-  ``predict.py`` (runs ``predict()`` with command line arguments)
-  ``server.py`` (serves predictions as a JSON API with bottle.py)

Run ``train.py`` to train and save the network.

Run ``predict.py`` with a name to view predictions:

::

    $ python predict.py Hazaki
    (-0.42) Japanese
    (-1.39) Polish
    (-3.51) Czech

Run ``server.py`` and visit http://localhost:5533/Yourname to get JSON
output of predictions.




Exercises
=========

-  Try with a different dataset of line -> category, for example:

   -  Any word -> language
   -  First name -> gender
   -  Character name -> writer
   -  Page title -> blog or subreddit

-  Get better results with a bigger and/or better shaped network

   -  Add more linear layers
   -  Try the ``nn.LSTM`` and ``nn.GRU`` layers
   -  Combine several of these RNNs into a higher-level network


