In [308]:
from char_decoder import *
from sanity_check import *
from vocab import *
from utils import *
from model_embeddings import *
from nmt_model import *
import numpy as np

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### 2 (a) `__init__` of `CharDecoder`

To initialize `CharDecoder`, the only thing we have to get right is the shapes:

- the `LSTM` works on embedded characters, so its `input_size` must equal the character embedding size, in our case `char_embedding_size`, and its hidden size is `hidden_size` (this fixes the shapes of the `LSTM` weight matrices);
- the `Linear` layer projects onto the $V_{char}$ space to get scores (logits) over characters; as usual it is easier to reason about a linear layer in terms of `in_features` and `out_features` than about its weight matrix: the input is an `LSTM` hidden state of size `hidden_size`, and `out_features` is $|V_{char}|$, the length of the character vocabulary;
- finally, the `Embedding` layer maps character ids from $V_{char}$ to vectors of size $e_{char}$, with `padding_idx` set to the id of `<pad>`;
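
A minimal sketch of the three layers, assuming only `torch`. The class name `CharDecoderSketch` and its `vocab_size`/`pad_idx` arguments are hypothetical stand-ins for what the real `CharDecoder` reads from `target_vocab`; the attribute names are copied from the module printout further down.

```python
import torch.nn as nn


class CharDecoderSketch(nn.Module):
    """Hypothetical stand-in for CharDecoder.__init__ (not the assignment's code)."""

    def __init__(self, hidden_size, char_embedding_size, vocab_size, pad_idx=0):
        super().__init__()
        # LSTM consumes character embeddings and produces hidden states of size hidden_size
        self.charDecoder = nn.LSTM(input_size=char_embedding_size, hidden_size=hidden_size)
        # Linear maps each hidden state to one unnormalized score per character in V_char
        self.char_output_projection = nn.Linear(in_features=hidden_size, out_features=vocab_size)
        # Embedding maps character ids (0 .. vocab_size - 1) to vectors of size e_char;
        # the <pad> row receives no gradient thanks to padding_idx
        self.decoderCharEmb = nn.Embedding(vocab_size, char_embedding_size, padding_idx=pad_idx)


decoder = CharDecoderSketch(hidden_size=3, char_embedding_size=4, vocab_size=30)
print(decoder)
```

With these sizes it should list the same `LSTM(4, 3)`, `Linear(in_features=3, out_features=30)` and `Embedding(30, 4, padding_idx=0)` layers as the real decoder.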

In [309]:
char_vocab = DummyVocab()

In [310]:
len(char_vocab.char2id)

30

In [311]:
print(char_vocab.char2id.items())

dict_items([('<pad>', 0), ('{', 1), ('}', 2), ('<unk>', 3), ('a', 4), ('b', 5), ('c', 6), ('d', 7), ('e', 8), ('f', 9), ('g', 10), ('h', 11), ('i', 12), ('j', 13), ('k', 14), ('l', 15), ('m', 16), ('n', 17), ('o', 18), ('p', 19), ('q', 20), ('r', 21), ('s', 22), ('t', 23), ('u', 24), ('v', 25), ('w', 26), ('x', 27), ('y', 28), ('z', 29)])


In [312]:
HIDDEN_SIZE, EMBED_SIZE

(3, 3)

In [313]:
decoder = CharDecoder(
    hidden_size=HIDDEN_SIZE,
    char_embedding_size=EMBED_SIZE+1,
    target_vocab=char_vocab)

In [314]:
decoder

CharDecoder(
  (charDecoder): LSTM(4, 3)
  (char_output_projection): Linear(in_features=3, out_features=30, bias=True)
  (decoderCharEmb): Embedding(30, 4, padding_idx=0)
)

In [315]:
!python3 sanity_check.py 2a

--------------------------------------------------------------------------------
Running Sanity Check for Question 2a: CharDecoder.__init__()
--------------------------------------------------------------------------------
Sanity Check Passed for Question 2a: CharDecoder.__init__()!
--------------------------------------------------------------------------------


### 2 (b) `forward()` function of `CharDecoder`

The shape of the output is specified: `(length, batch, self.vocab_size)`. Let's reuse the example from `sanity_check.py` and check it:

- `sequence_length` is indeed `4`;
- `batch_size` is `5`;
- finally, `self.vocab_size` is `30` (see above);
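
The forward pass is embedding, then `LSTM`, then projection. A sketch with stand-alone layers (the sizes and the name `forward_sketch` are assumptions for illustration, not the assignment's code) shows how a `(length, batch)` input becomes `(length, batch, vocab_size)` scores:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, emb_size, hidden_size = 30, 4, 3
emb = nn.Embedding(vocab_size, emb_size, padding_idx=0)
lstm = nn.LSTM(emb_size, hidden_size)      # batch_first=False: (length, batch, features)
proj = nn.Linear(hidden_size, vocab_size)


def forward_sketch(input_ids, dec_hidden=None):
    """input_ids: (length, batch) LongTensor of char ids.
    Returns scores (length, batch, vocab_size) and the final (h, c) of the LSTM."""
    x = emb(input_ids)                          # (length, batch, emb_size)
    hiddens, dec_hidden = lstm(x, dec_hidden)   # (length, batch, hidden_size)
    scores = proj(hiddens)                      # (length, batch, vocab_size)
    return scores, dec_hidden


inpt = torch.zeros(4, 5, dtype=torch.long)      # same shape as the notebook example
scores, (h, c) = forward_sketch(inpt)
print(scores.shape)  # torch.Size([4, 5, 30])
print(h.shape)       # torch.Size([1, 5, 3])
```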

In [316]:
BATCH_SIZE

5

In [317]:
sequence_length = 4
inpt = torch.zeros(sequence_length, BATCH_SIZE, dtype=torch.long)

In [318]:
logits, (dec_hidden1, dec_hidden2) = decoder.forward(inpt)

In [319]:
logits.shape

torch.Size([4, 5, 30])

In [320]:
!python3 sanity_check.py 2b

--------------------------------------------------------------------------------
Running Sanity Check for Question 2b: CharDecoder.forward()
--------------------------------------------------------------------------------
Sanity Check Passed for Question 2b: CharDecoder.forward()!
--------------------------------------------------------------------------------


### 2 (c) `train_forward()` of `CharDecoder`

`train_forward()` computes the loss used to train `CharDecoder`. We follow the form of the loss specified in the handout (PDF): the cross-entropy of every next-character prediction, summed rather than averaged.

First of all:

- if `char_sequence` is `<START>,m,u,s,i,c,<END>`, we remove the last symbol to get the input for `forward()`;
- we create the `target` sequence by removing the first symbol, so that each input position is paired with the next symbol;
- we call `forward()` on the input to get the logits `s`;

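Then we apply `nn.CrossEntropyLoss`, configured to account for the following:

- the loss is summed, not averaged, across the batch (`reduction='sum'`);
- padding positions must be ignored (`ignore_index` set to the id of the pad symbol);
- the logits of shape `(length, batch, vocab_size)` and the target of shape `(length, batch)` must be flattened to `(length * batch, vocab_size)` and `(length * batch,)` before being fed to `nn.CrossEntropyLoss`;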
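
Putting these steps together, a hedged sketch of the loss computation as a stand-alone function (`train_forward_sketch` and the toy sizes are hypothetical; `<pad>` is assumed to have id `0`, as in `char2id` above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD = 0  # assumed id of <pad>
vocab_size, emb_size, hidden_size = 30, 4, 3
emb = nn.Embedding(vocab_size, emb_size, padding_idx=PAD)
lstm = nn.LSTM(emb_size, hidden_size)
proj = nn.Linear(hidden_size, vocab_size)


def train_forward_sketch(char_sequence, dec_hidden=None):
    """char_sequence: (length, batch) ids including '{' and '}' for every word.
    Returns the summed cross-entropy over all non-pad target characters."""
    inputs = char_sequence[:-1]    # x_1 .. x_n      (drop the last position)
    targets = char_sequence[1:]    # x_2 .. x_{n+1}  (drop the first position)
    hiddens, _ = lstm(emb(inputs), dec_hidden)
    scores = proj(hiddens)         # (length - 1, batch, vocab_size)
    loss_fn = nn.CrossEntropyLoss(reduction='sum', ignore_index=PAD)
    return loss_fn(scores.reshape(-1, vocab_size), targets.reshape(-1))


# two words stored as columns (length=4, batch=2): '{ab}' and '{c}' + <pad>,
# with DummyVocab-style ids '{'=1, '}'=2, a=4, b=5, c=6
seq = torch.tensor([[1, 1],
                    [4, 6],
                    [5, 2],
                    [2, 0]])
loss = train_forward_sketch(seq)
print(loss)
```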

In [321]:
! python3 sanity_check.py 2c

--------------------------------------------------------------------------------
Running Sanity Check for Question 2c: CharDecoder.train_forward()
--------------------------------------------------------------------------------
Sanity Check Passed for Question 2c: CharDecoder.train_forward()!
--------------------------------------------------------------------------------


# test on tiny dataset

## get batch of data

Let's now use our dataset to debug `CharDecoder` like in the first part.

In [322]:
!head -3 './en_es_data/train_tiny.en'

Thank you so much, Chris. And it's truly a great honor to have the opportunity to come to this stage twice; I'm extremely grateful.
I have been blown away by this conference, and I want to thank all of you for the many nice comments about what I had to say the other night.
And I say that sincerely, partly because (Mock sob) I need that.  Put yourselves in my position.


In [323]:
train_data_src = read_corpus('./en_es_data/train_tiny.es', source='src')
train_data_tgt = read_corpus('./en_es_data/train_tiny.en', source='tgt')
train_data = list(zip(train_data_src, train_data_tgt))

In [324]:
train_batch_size = 3

In [325]:
it = batch_iter(train_data, batch_size=train_batch_size, shuffle=False)

In [326]:
src_sents, tgt_sents = next(it)

This time we need `tgt_sents`, not `src_sents`, since the character decoder generates target-side words. These are still `list[list[str]]`, sorted (by source-sentence length, longest first) and enclosed in `<s>` and `</s>`.

In [327]:
[len(tgt_sents[i]) for i in range(len(tgt_sents))]

[32, 26, 20]

In [328]:
print([tgt_sents[i][:5] for i in range(3)])

[['<s>', 'I', 'have', 'been', 'blown'], ['<s>', 'Thank', 'you', 'so', 'much,'], ['<s>', 'And', 'I', 'say', 'that']]


In [329]:
print([tgt_sents[i][-5:] for i in range(3)])

[['say', 'the', 'other', 'night.', '</s>'], ['twice;', "I'm", 'extremely', 'grateful.', '</s>'], ['yourselves', 'in', 'my', 'position.', '</s>']]


## get vocab

In [330]:
vocab = Vocab.load('vocab_tiny_q2.json')

In [331]:
# the sizes of vocabs are even smaller than before
len(vocab.src), len(vocab.tgt)

(26, 32)

In [332]:
list(vocab.tgt.word2id.items())[:5]

[('<pad>', 0), ('<s>', 1), ('</s>', 2), ('<unk>', 3), ('to', 4)]

In [333]:
# the same char_list as before; char2id also holds the 4 special symbols <pad>, {, }, <unk> (92 + 4 = 96)
len(vocab.tgt.char_list)

92

In [334]:
len(vocab.tgt.char2id)

96

## encode characters

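In the first part we encoded words with `to_input_tensor_char()`. We still need this encoding here, now for the target sentences; according to the handout it returns a tensor of shape `(max_sentence_length, batch_size, max_word_length)`.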
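
A pure-Python sketch of what this encoding involves, assuming the conventions visible in this notebook's outputs (`{` and `}` around every word, `<pad>` filling, special symbols first in `char2id`). `encode_batch` is a hypothetical helper, not the assignment's `to_input_tensor_char`; it returns nested lists ordered `(batch, sentence, word)`, so the first two axes would still have to be swapped (a transpose, not a reshape) to obtain the handout's layout.

```python
def encode_batch(sents, char2id, max_word_len=21):
    """Hypothetical sketch: encode list[list[str]] into char ids.

    Each word becomes [id('{')] + char ids + [id('}')], padded with <pad> up to
    max_word_len; sentences are padded with all-<pad> words to the longest
    sentence. Returns nested lists of shape (batch, max_sent_len, max_word_len).
    """
    start, end = char2id['{'], char2id['}']
    pad, unk = char2id['<pad>'], char2id['<unk>']
    pad_word = [pad] * max_word_len
    max_sent_len = max(len(s) for s in sents)
    batch = []
    for sent in sents:
        words = []
        for w in sent:
            ids = ([start] + [char2id.get(ch, unk) for ch in w] + [end])[:max_word_len]
            words.append(ids + [pad] * (max_word_len - len(ids)))
        words += [pad_word] * (max_sent_len - len(words))  # pad the sentence with empty words
        batch.append(words)
    return batch


# toy vocabulary in the same spirit as DummyVocab: specials first, then letters
char2id = {'<pad>': 0, '{': 1, '}': 2, '<unk>': 3}
for ch in 'abcdefghijklmnopqrstuvwxyz':
    char2id[ch] = len(char2id)

batch = encode_batch([['have', 'been'], ['so']], char2id, max_word_len=7)
print(batch[0][0])  # [1, 11, 4, 25, 8, 2, 0]
print(batch[1][1])  # [0, 0, 0, 0, 0, 0, 0]  (padding word)
```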

In [335]:
target_padded_chars = vocab.tgt.to_input_tensor_char(tgt_sents, device=torch.device('cpu')) 

In [336]:
target_padded_chars.shape

torch.Size([32, 3, 21])

In [337]:
target_padded_chars_resh = target_padded_chars.reshape(3, 32, 21)

In [338]:
target_padded_chars_resh.shape

torch.Size([3, 32, 21])

In [339]:
target_padded_chars_resh[0, :5, :5]

tensor([[ 1, 90, 48, 91,  2],
        [ 1, 12,  2,  0,  0],
        [ 1, 37, 30, 51, 34],
        [ 1, 31, 34, 34, 43],
        [ 1, 31, 41, 44, 52]])

In [340]:
[[vocab.tgt.id2char[i] for i in row] for row in target_padded_chars_resh.numpy()[0, :5, :5]]

[['{', '<', 's', '>', '}'],
 ['{', 'I', '}', '<pad>', '<pad>'],
 ['{', 'h', 'a', 'v', 'e'],
 ['{', 'b', 'e', 'e', 'n'],
 ['{', 'b', 'l', 'o', 'w']]

## CharDecoder

### `init` 

Let's verify the shapes once more:

- the `Embedding` layer takes ids from our target char vocabulary (`96` entries, see above) and produces embeddings of size `7` (set below);
- the `LSTM` maps these embeddings to hidden states of size `hidden_size` (`5`, set below);
- finally, the `Linear` layer projects these hidden states back to `96` scores, one per character, to predict the next character;

In [341]:
char_decoder = CharDecoder(hidden_size=5, char_embedding_size=7, target_vocab=vocab.tgt)

In [342]:
char_decoder

CharDecoder(
  (charDecoder): LSTM(7, 5)
  (char_output_projection): Linear(in_features=5, out_features=96, bias=True)
  (decoderCharEmb): Embedding(96, 7, padding_idx=0)
)

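And now one of the most difficult parts: we have to go through the forward pass of `CharDecoder`.

The input to `forward()` in `CharDecoder` is specified as `(length, batch)`. How can we get this shape from `target_padded_chars`, which is a 3D tensor? It looks like we have to reshape it. The code inside `forward()` in `nmt_model.py` also removes the first elements before reshaping (`target_padded_chars[1:]`), and it's not obvious why.

There are 2 main questions:

- why should we remove `<s>` here if we are told in `train_forward` that `char_sequence` corresponds to the sequence `x_1 ... x_{n+1}` from the handout (e.g., `<START>,m,u,s,i,c,<END>`), in other words **including** the start symbol?
- by removing it like `target_padded_chars[1:]` we seem to break our examples: instead of `<s> I have been blown ...` we get `been blown ...` (but maybe we just shouldn't reshape after this operation).

A likely answer to both, judging by the outputs below:

- the two start symbols live on different axes. `target_padded_chars[1:]` slices the *sentence* axis, so it drops the word `<s>` at time step 0 of every sentence, which the word-level decoder never has to predict. The `<START>` and `<END>` of `train_forward` are the character-level `{` and `}` inside each word, and they survive the slice;
- the broken examples come from the reshape, not from the slice: `view(3, 31, 21)` reinterprets memory instead of swapping axes. Moreover, `target_padded_chars[:5, 0]` below decodes to `<s>`, `been`, `by`, `and`, `to`, i.e. every third word of the first sentence, which suggests that this `to_input_tensor_char` reshapes a `(batch, sentence, word)` tensor into `(sentence, batch, word)` instead of transposing it. With the handout's `(max_sentence_length, batch_size, max_word_length)` layout, column `b` would be sentence `b` in order, and `[1:]` would remove exactly its `<s>`.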
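
A toy tensor makes the axis question concrete. Assuming a hypothetical batch of 2 sentences with 3 words each, where every word is reduced to a single id `10 * sentence + position`, compare a transpose with a reshape to the same shape, and see what `[1:]` removes:

```python
import torch

# (batch=2, sent_len=3, word_len=1): word id = 10 * sentence + position
by_sentence = torch.tensor([[[0], [1], [2]],
                            [[10], [11], [12]]])

right = by_sentence.transpose(0, 1)   # (sent_len, batch, word_len): column b is sentence b
wrong = by_sentence.reshape(3, 2, 1)  # same shape, but memory is only reinterpreted

print(right[:, 0, 0].tolist())  # [0, 1, 2]   -> sentence 0 in order
print(wrong[:, 0, 0].tolist())  # [0, 2, 11]  -> mixes sentences

# slicing the sentence axis drops time step 0 (the <s> word) of every sentence
print(right[1:].transpose(0, 1)[..., 0].tolist())  # [[1, 2], [11, 12]]
```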

In [343]:
max_word_len = target_padded_chars.shape[-1]

In [344]:
max_word_len

21

In [345]:
target_chars = target_padded_chars[1:].view(-1, max_word_len)

In [346]:
target_chars.shape

torch.Size([93, 21])

In [347]:
# this is the (length, batch) shape that train_forward expects
target_chars.t().shape

torch.Size([21, 93])

What did we remove? Compare the first rows of column `0` before and after the slice:

In [348]:
target_padded_chars.shape

torch.Size([32, 3, 21])

In [349]:
target_padded_chars[:5, 0, :5]

tensor([[ 1, 90, 48, 91,  2],
        [ 1, 31, 34, 34, 43],
        [ 1, 31, 54,  2,  0],
        [ 1, 30, 43, 33,  2],
        [ 1, 49, 44,  2,  0]])

In [350]:
target_padded_chars[1:][:5, 0, :5]

tensor([[ 1, 31, 34, 34, 43],
        [ 1, 31, 54,  2,  0],
        [ 1, 30, 43, 33,  2],
        [ 1, 49, 44,  2,  0],
        [ 1, 44, 35,  2,  0]])

In [351]:
target_padded_chars.view(3, 32, 21)[0, :5, :5]

tensor([[ 1, 90, 48, 91,  2],
        [ 1, 12,  2,  0,  0],
        [ 1, 37, 30, 51, 34],
        [ 1, 31, 34, 34, 43],
        [ 1, 31, 41, 44, 52]])

In [352]:
target_padded_chars[1:].shape

torch.Size([31, 3, 21])

In [353]:
target_padded_chars[1:].view(3, 31, 21)[0, :5, :5]

tensor([[ 1, 31, 34, 34, 43],
        [ 1, 31, 41, 44, 52],
        [ 1, 30, 52, 30, 54],
        [ 1, 31, 54,  2,  0],
        [ 1, 49, 37, 38, 48]])

### `forward()`

In [354]:
target_padded_chars.shape

torch.Size([32, 3, 21])

In [355]:
char_sequence = target_padded_chars.view(21, 3 * 32)

In [356]:
char_sequence.shape

torch.Size([21, 96])

In [357]:
input = char_sequence[:-1, :]

In [358]:
input.shape

torch.Size([20, 96])

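What exactly did we remove? Not the start and end symbols: `char_sequence[-1]` is all padding, and `char_sequence[1]` contains pieces of several different words. `view(21, 3 * 32)` only reinterprets the memory of the `(32, 3, 21)` tensor, so a column of `char_sequence` is not a word (see "what did we crop?" below); `target_padded_chars.view(-1, max_word_len).t()`, as with `target_chars.t()` above, would give one word per column.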

In [390]:
char_sequence[-1, :]

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [392]:
char_sequence[1, :]

tensor([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 30, 52, 30, 54,  2,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 31, 54,  2,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 49, 37,
        38, 48,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         1, 32, 44, 43, 35, 34, 47, 34, 43, 32, 34, 66,  2,  0,  0,  0,  0,  0,
         0,  0,  0,  1, 30, 43])

In [359]:
score, dec_hidden = char_decoder(input)

In [360]:
# that's in fact (length, batch, self.vocab_size)
score.shape

torch.Size([20, 96, 96])

In [361]:
target = char_sequence[1:, :]

In [362]:
target.shape

torch.Size([20, 96])

### `CrossEntropy` loss

#### theory

How do we compute the `CrossEntropy` loss? In A2 we saw that $\mathrm{CE}(y, \hat{y}) = -\sum_{w} y_w \log \hat{y}_w$, and when $y$ is a one-hot vector this equals $-\log \hat{y}_o$, where $o$ is the correct class. Here $\hat{y}_w$ must be probabilities, in other words scores after a softmax.

The `pytorch` documentation gives the same quantity for a single training example $x$ (a vector of unnormalized scores):

$$\text{loss}(x, \text{class}) = -\log\left(\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])}\right)$$

With the default `reduction='mean'`, the loss for a batch is the average over the training examples.
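
Spelling out the step from the A2 formula to the PyTorch one (with $\hat{y} = \mathrm{softmax}(x)$), and the two batch reductions that matter here, where $K$ is the set of examples whose target is not `ignore_index`:

$$\mathrm{CE}(y, \hat{y}) = -\sum_{w} y_w \log \hat{y}_w = -\log \hat{y}_o = -\log\left(\frac{\exp(x[o])}{\sum_j \exp(x[j])}\right) = -x[o] + \log \sum_j \exp(x[j])$$

$$\ell_{\text{mean}} = \frac{1}{|K|} \sum_{i \in K} \ell_i, \qquad \ell_{\text{sum}} = \sum_{i \in K} \ell_i$$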

#### parameters

Two options of `nn.CrossEntropyLoss` need to be set:

- `ignore_index (int, optional)`: specifies a target value that is ignored and does not contribute to the input gradient; we set it to the id of `<pad>`;
- `reduction (string, optional)`: specifies the reduction to apply to the output, `'none' | 'mean' | 'sum'`; we need `'sum'`;
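
A quick check of both options on random logits (a sketch; the pad id `0` is an assumption matching `char2id` above): with `reduction='sum'` and `ignore_index`, the loss is the sum of the per-example losses over non-pad targets only, and the default `'mean'` averages over those same examples.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD = 0                                   # assumed pad id
logits = torch.randn(4, 5)                # 4 examples, 5 classes
target = torch.tensor([2, PAD, 1, PAD])   # two padded positions

per_example = nn.CrossEntropyLoss(reduction='none')(logits, target)
summed = nn.CrossEntropyLoss(reduction='sum', ignore_index=PAD)(logits, target)
averaged = nn.CrossEntropyLoss(ignore_index=PAD)(logits, target)  # default reduction='mean'

keep = target != PAD
print(torch.allclose(summed, per_example[keep].sum()))     # True
print(torch.allclose(averaged, per_example[keep].mean()))  # True
```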

#### example

Let's first look at an example of the `CrossEntropy` loss.

In [363]:
loss = nn.CrossEntropyLoss()

In [364]:
torch.manual_seed(42)
input = torch.randn(3, 5)

In [365]:
input

tensor([[ 0.3367,  0.1288,  0.2345,  0.2303, -1.1229],
        [-0.1863,  2.2082, -0.6380,  0.4617,  0.2674],
        [ 0.5349,  0.8094,  1.1103, -1.6898, -0.9890]])

In [366]:
torch.manual_seed(42)
target_ex = torch.randint(low=0, high=5, size=(3,), dtype=torch.long)

In [367]:
target_ex

tensor([2, 2, 1])

In [368]:
loss(input, target_ex)

tensor(1.9635)

Can we compute this value manually?

In [369]:
np.exp(input[range(input.numpy().shape[0]), target_ex.numpy()].numpy())

array([1.2642289 , 0.52834964, 2.2464635 ], dtype=float32)

In [370]:
np.sum(np.exp(input.numpy()), axis=1)

array([ 5.3863764, 13.350886 ,  7.5455084], dtype=float32)

In [371]:
z = np.exp(input[range(input.shape[0]), target_ex.numpy()].numpy()) / np.sum(np.exp(input.numpy()), axis=1)

In [372]:
z

array([0.23470862, 0.03957412, 0.29772195], dtype=float32)

In [373]:
1.2642289 / 5.3863764

0.2347086066989303

In [374]:
np.mean(-np.log(z))

1.9635286

It is even easier with `softmax`:

In [375]:
s = torch.softmax(input, dim=1).numpy()

In [376]:
s

array([[0.25997174, 0.21117601, 0.23470864, 0.23374145, 0.06040223],
       [0.06216823, 0.68155295, 0.03957412, 0.11884615, 0.09785859],
       [0.22626513, 0.29772198, 0.40225777, 0.02445913, 0.04929599]],
      dtype=float32)

In [377]:
z = s[range(input.numpy().shape[0]), target_ex.numpy()]

In [378]:
z

array([0.23470864, 0.03957412, 0.29772198], dtype=float32)

In [379]:
np.mean(-np.log(z))

1.9635285

#### input

To compute loss we have to provide `input` and `target`: `output = loss(input, target)`:

- `input` has shape `(batch_size, n_classes)`: each row holds the unnormalized scores (logits) of one example;
- `target` has shape `(batch_size,)` and contains the correct class of each training example in the batch;
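
Flattening is one option. A sketch comparing it with the `(N, C, d)` form that `nn.functional.cross_entropy` also accepts (random scores with the shapes of this section, pad id `0` assumed):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
length, batch, vocab = 20, 96, 96        # shapes from the cells in this section
score = torch.randn(length, batch, vocab)
target = torch.randint(0, vocab, (length, batch))

# option 1: flatten to (length * batch, vocab) and (length * batch,)
flat = F.cross_entropy(score.reshape(-1, vocab), target.reshape(-1),
                       reduction='sum', ignore_index=0)

# option 2: keep the sequence axis; input (N, C, d) = (batch, vocab, length), target (N, d)
seq = F.cross_entropy(score.permute(1, 2, 0), target.t(),
                      reduction='sum', ignore_index=0)

print(torch.allclose(flat, seq, rtol=1e-4))  # True (only the summation order differs)
```

Both give the same sum; `reshape` copies only when needed, so it also avoids the `.contiguous().view(-1)` call used in the cells below.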

In [380]:
score.shape

torch.Size([20, 96, 96])

In [381]:
target.shape

torch.Size([20, 96])

In [382]:
target[0, :]

tensor([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 30, 52, 30, 54,  2,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 31, 54,  2,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 49, 37,
        38, 48,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         1, 32, 44, 43, 35, 34, 47, 34, 43, 32, 34, 66,  2,  0,  0,  0,  0,  0,
         0,  0,  0,  1, 30, 43])

In [383]:
score.view(-1, score.shape[-1]).shape

torch.Size([1920, 96])

In [384]:
target.view(-1).shape

torch.Size([1920])

In [386]:
loss = nn.CrossEntropyLoss(reduction='sum',
                           ignore_index=vocab.tgt.char2id['<pad>'])

In [387]:
loss(score.view(-1, score.shape[-1]), target.contiguous().view(-1))

tensor(2022.9661, grad_fn=<NllLossBackward>)

This concludes our debugging of part 2. Everything else is out of scope: this assignment is too big.

## what did we crop?

Let's review our steps before we crop our data:

- get a batch of examples as `list[list[str]]`;
- encode them with the `char` vocabulary;
- reshape them to `(max_word_len, number_of_words)`, in our case `(21, 96)` with `96 = 3 * 32`; the intent is to stack the words as columns;
- this is where we crop them into `input` and `target`;

In [398]:
[tgt_sents[i][:8] for i in range(len(tgt_sents))]

[['<s>', 'I', 'have', 'been', 'blown', 'away', 'by', 'this'],
 ['<s>', 'Thank', 'you', 'so', 'much,', 'Chris.', 'And', "it's"],
 ['<s>', 'And', 'I', 'say', 'that', 'sincerely,', 'partly', 'because']]

In [399]:
[len(tgt_sents[i]) for i in range(len(tgt_sents))]

[32, 26, 20]

In [400]:
target_padded_chars = vocab.tgt.to_input_tensor_char(tgt_sents, device=torch.device('cpu')) 

In [401]:
target_padded_chars.shape

torch.Size([32, 3, 21])

Can we get our examples back?

In [403]:
target_padded_chars.reshape(3, 32, 21)[:, :5, :10]

tensor([[[ 1, 90, 48, 91,  2,  0,  0,  0,  0,  0],
         [ 1, 12,  2,  0,  0,  0,  0,  0,  0,  0],
         [ 1, 37, 30, 51, 34,  2,  0,  0,  0,  0],
         [ 1, 31, 34, 34, 43,  2,  0,  0,  0,  0],
         [ 1, 31, 41, 44, 52, 43,  2,  0,  0,  0]],

        [[ 1, 90, 48, 91,  2,  0,  0,  0,  0,  0],
         [ 1, 23, 37, 30, 43, 40,  2,  0,  0,  0],
         [ 1, 54, 44, 50,  2,  0,  0,  0,  0,  0],
         [ 1, 48, 44,  2,  0,  0,  0,  0,  0,  0],
         [ 1, 42, 50, 32, 37, 66,  2,  0,  0,  0]],

        [[ 1, 90, 48, 91,  2,  0,  0,  0,  0,  0],
         [ 1,  4, 43, 33,  2,  0,  0,  0,  0,  0],
         [ 1, 12,  2,  0,  0,  0,  0,  0,  0,  0],
         [ 1, 48, 30, 54,  2,  0,  0,  0,  0,  0],
         [ 1, 49, 37, 30, 49,  2,  0,  0,  0,  0]]])

In [408]:
[[vocab.tgt.id2char[i] for i in target_padded_chars.reshape(3, 32, 21).numpy()[0, j, :8]]
 for j in range(5)
]

[['{', '<', 's', '>', '}', '<pad>', '<pad>', '<pad>'],
 ['{', 'I', '}', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>'],
 ['{', 'h', 'a', 'v', 'e', '}', '<pad>', '<pad>'],
 ['{', 'b', 'e', 'e', 'n', '}', '<pad>', '<pad>'],
 ['{', 'b', 'l', 'o', 'w', 'n', '}', '<pad>']]

In [409]:
char_sequence = target_padded_chars.view(21, 3 * 32)

In [410]:
char_sequence.shape

torch.Size([21, 96])

In [412]:
char_sequence.reshape(96, 21)[:5, :10]

tensor([[ 1, 90, 48, 91,  2,  0,  0,  0,  0,  0],
        [ 1, 12,  2,  0,  0,  0,  0,  0,  0,  0],
        [ 1, 37, 30, 51, 34,  2,  0,  0,  0,  0],
        [ 1, 31, 34, 34, 43,  2,  0,  0,  0,  0],
        [ 1, 31, 41, 44, 52, 43,  2,  0,  0,  0]])

In [413]:
char_sequence.reshape(96, 21)[32:37, :10]

tensor([[ 1, 90, 48, 91,  2,  0,  0,  0,  0,  0],
        [ 1, 23, 37, 30, 43, 40,  2,  0,  0,  0],
        [ 1, 54, 44, 50,  2,  0,  0,  0,  0,  0],
        [ 1, 48, 44,  2,  0,  0,  0,  0,  0,  0],
        [ 1, 42, 50, 32, 37, 66,  2,  0,  0,  0]])

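Can we interpret `char_sequence` without reshaping it back? The simple example below shows what `reshape`/`view` does here: the 21-element word vectors are laid one after another into rows of length 96, so a row of `char_sequence` is a run of consecutive words, not one character position of every word.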

In [421]:
x = np.arange(18).reshape(9, 2)

In [422]:
x

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17]])

In [423]:
x.reshape(2, 9)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17]])
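
To get one word per column, the words have to be transposed rather than re-viewed. On the same toy array:

```python
import numpy as np

# 9 "words" of length 2, stored one per row (like the (96, 21) layout)
x = np.arange(18).reshape(9, 2)

viewed = x.reshape(2, 9)   # what view(21, 96) does: rows are refilled in memory order
transposed = x.T           # what .t() does: column j is word j

print(viewed[:, 1])        # [ 1 10] -> pieces of two different words
print(transposed[:, 1])    # [2 3]   -> word 1 intact
```

For our data the analogous expression would be `target_padded_chars.view(-1, max_word_len).t()`, mirroring `target_chars.t()` earlier: each column of the result is one word, starting with `{` (id `1`) or consisting only of padding.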

In [415]:
char_sequence[0, :]

tensor([ 1, 90, 48, 91,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  1, 12,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  1, 37, 30, 51, 34,  2,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 31, 34, 34, 43,  2,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 31, 41, 44, 52, 43,
         2,  0,  0,  0,  0,  0])

In [426]:
char_sequence[1, :]

tensor([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 30, 52, 30, 54,  2,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 31, 54,  2,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 49, 37,
        38, 48,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         1, 32, 44, 43, 35, 34, 47, 34, 43, 32, 34, 66,  2,  0,  0,  0,  0,  0,
         0,  0,  0,  1, 30, 43])

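Now let's crop it into `input` and `target`. Since each row of `char_sequence` is a run of consecutive words, dropping row `0` actually removes the first five words of the first sentence (`<s>`, `I`, `have`, `been`, `blown`), not the start symbols.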

In [428]:
target = char_sequence[1:, :]

In [429]:
target.shape

torch.Size([20, 96])

In [430]:
target[0, :]

tensor([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 30, 52, 30, 54,  2,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 31, 54,  2,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1, 49, 37,
        38, 48,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         1, 32, 44, 43, 35, 34, 47, 34, 43, 32, 34, 66,  2,  0,  0,  0,  0,  0,
         0,  0,  0,  1, 30, 43])

What if we first reshape back to one word per row and then crop along the character axis? I guess the result would be better.

In [431]:
target2 = char_sequence.reshape(96, 21)[:, 1:]

In [432]:
target2.shape

torch.Size([96, 20])

In [433]:
target2[:5, :10]

tensor([[90, 48, 91,  2,  0,  0,  0,  0,  0,  0],
        [12,  2,  0,  0,  0,  0,  0,  0,  0,  0],
        [37, 30, 51, 34,  2,  0,  0,  0,  0,  0],
        [31, 34, 34, 43,  2,  0,  0,  0,  0,  0],
        [31, 41, 44, 52, 43,  2,  0,  0,  0,  0]])

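Now we actually cropped the start symbol `{` (id `1`) from every word, and `target2.t()` has the same `(20, 96)` shape as `target` above. The end symbol `}` does not have to be cropped from the input: for a word shorter than `max_word_len`, the `}` in `input` is followed by `<pad>`, so its target is ignored by `ignore_index`.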
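
A tiny pure-Python illustration, assuming the `DummyVocab`-style ids from the top of the notebook (`{`=1, `}`=2, `<pad>`=0, `h`=11, `a`=4, `v`=25, `e`=8): list the (input, target) pairs produced by the shift for one padded word and keep only those with a non-pad target, as `ignore_index` does.

```python
PAD = 0
word = [1, 11, 4, 25, 8, 2, 0, 0]   # "{have}" padded to length 8

inputs, targets = word[:-1], word[1:]
pairs = list(zip(inputs, targets))
counted = [(i, t) for i, t in pairs if t != PAD]   # pairs that contribute to the loss

print(counted)  # [(1, 11), (11, 4), (4, 25), (25, 8), (8, 2)]
```

The pair `(2, 0)`, i.e. `}` followed by padding, is dropped, while the model is still trained to emit `}` after the last letter through the pair `(8, 2)`.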

In [434]:
!python3 sanity_check.py 2d

--------------------------------------------------------------------------------
Running Sanity Check for Question 2d: CharDecoder.decode_greedy()
--------------------------------------------------------------------------------
Sanity Check Passed for Question 2d: CharDecoder.decode_greedy()!
--------------------------------------------------------------------------------
