<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Continuous-bag-of-words-(CBOW)-model-for-text-classification" data-toc-modified-id="Continuous-bag-of-words-(CBOW)-model-for-text-classification-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Continuous bag of words (CBOW) model for text classification</a></span><ul class="toc-item"><li><span><a href="#Subjectivity-Dataset" data-toc-modified-id="Subjectivity-Dataset-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Subjectivity Dataset</a></span></li><li><span><a href="#Tokenization" data-toc-modified-id="Tokenization-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Tokenization</a></span><ul class="toc-item"><li><span><a href="#Simple-Tokenization" data-toc-modified-id="Simple-Tokenization-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Simple Tokenization</a></span></li><li><span><a href="#Much-better-tokenization-with-Spacy" data-toc-modified-id="Much-better-tokenization-with-Spacy-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>Much better tokenization with Spacy</a></span></li></ul></li><li><span><a href="#Split-dataset-in-train-and-validation" data-toc-modified-id="Split-dataset-in-train-and-validation-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Split dataset in train and validation</a></span></li><li><span><a href="#Word-to-index-mapping" data-toc-modified-id="Word-to-index-mapping-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Word to index mapping</a></span></li><li><span><a href="#Sentence-encoding" data-toc-modified-id="Sentence-encoding-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Sentence encoding</a></span></li><li><span><a href="#Embedding-layer" data-toc-modified-id="Embedding-layer-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Embedding layer</a></span></li><li><span><a href="#Continuous-Bag-of-Words-Model" data-toc-modified-id="Continuous-Bag-of-Words-Model-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Continuous Bag 
of Words Model</a></span></li></ul></li><li><span><a href="#Training-the-CBOW-model" data-toc-modified-id="Training-the-CBOW-model-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Training the CBOW model</a></span></li><li><span><a href="#Data-loaders-for-SGD" data-toc-modified-id="Data-loaders-for-SGD-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Data loaders for SGD</a></span></li></ul></div>

In [1]:
# import pytorch libraries
%matplotlib inline
import torch 
import torch.autograd as autograd 
import torch.nn as nn 
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

# Continuous bag of words (CBOW) model for text classification
This notebook shows how to use a continuous bag of words (CBOW) model with PyTorch. The task is a text classification problem described [here](https://people.cs.umass.edu/~miyyer/pubs/2015_acl_dan.pdf). The CBOW model was first described [here](https://arxiv.org/pdf/1301.3781.pdf).

## Subjectivity Dataset
The subjectivity dataset has 5000 subjective and 5000 objective processed sentences. To get the data:
```
wget http://www.cs.cornell.edu/people/pabo/movie-review-data/rotten_imdb.tar.gz
```

In [2]:
def unpack_dataset():
    ! wget http://www.cs.cornell.edu/people/pabo/movie-review-data/rotten_imdb.tar.gz
    ! mkdir data
    ! tar -xvf rotten_imdb.tar.gz -C data

In [3]:
unpack_dataset()

--2021-02-04 10:33:18--  http://www.cs.cornell.edu/people/pabo/movie-review-data/rotten_imdb.tar.gz
Resolving www.cs.cornell.edu (www.cs.cornell.edu)... 132.236.207.36
Connecting to www.cs.cornell.edu (www.cs.cornell.edu)|132.236.207.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 519599 (507K) [application/x-gzip]
Saving to: ‘rotten_imdb.tar.gz’


2021-02-04 10:33:19 (1.40 MB/s) - ‘rotten_imdb.tar.gz’ saved [519599/519599]

mkdir: cannot create directory ‘data’: File exists
quote.tok.gt9.5000
plot.tok.gt9.5000
subjdata.README.1.0


In [3]:
!ls data

aclImdb  cars.csv  plot.tok.gt9.5000  quote.tok.gt9.5000  subjdata.README.1.0


In [4]:
! head -2 data/plot.tok.gt9.5000

the movie begins in the past where a young boy named sam attempts to save celebi from a hunter . 
emerging from the human psyche and showing characteristics of abstract expressionism , minimalism and russian constructivism , graffiti removal has secured its place in the history of modern art while being created by artists who are unconscious of their artistic achievements . 


In [7]:
! head -10 data/quote.tok.gt9.5000

smart and alert , thirteen conversations about one thing is a small gem . 
color , musical bounce and warm seas lapping on island shores . and just enough science to send you home thinking . 
it is not a mass-market entertainment but an uncompromising attempt by one artist to think about another . 
a light-hearted french film about the spiritual quest of a fashion model seeking peace of mind while in a love affair with a veterinarian who is a non-practicing jew . 
my wife is an actress has its moments in looking at the comic effects of jealousy . in the end , though , it is only mildly amusing when it could have been so much more . 
works both as an engaging drama and an incisive look at the difficulties facing native americans . 
even a hardened voyeur would require the patience of job to get through this interminable , shapeless documentary about the swinging subculture . 
when perry fists a bull at the moore farm , it's only a matter of time before he gets the upper hand in m

In [5]:
from pathlib import Path
PATH = Path("data")
list(PATH.iterdir())

[PosixPath('data/quote.tok.gt9.5000'),
 PosixPath('data/aclImdb'),
 PosixPath('data/plot.tok.gt9.5000'),
 PosixPath('data/cars.csv'),
 PosixPath('data/subjdata.README.1.0')]

## Tokenization
Tokenization is the task of chopping up text into pieces, called tokens.

spaCy is an open-source software library for advanced Natural Language Processing. Here we will use it for tokenization.  
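
To see why tokenization is more than splitting on spaces, here is a minimal sketch using only the standard library. The sentence `raw` and the regular expression are made up for illustration; this is a rough rule, not what spaCy does internally. Splitting on whitespace leaves punctuation glued to words, while the regex separates it.

```python
import re

# Hypothetical raw sentence (the dataset files are already pre-tokenized).
raw = "Dr. Smith's film isn't bad, is it?"

# Whitespace splitting keeps punctuation attached to the neighboring word.
whitespace_tokens = raw.lower().split()

# A crude rule: runs of word characters, or any single non-space symbol.
regex_tokens = re.findall(r"\w+|[^\w\s]", raw.lower())

print(whitespace_tokens)
print(regex_tokens)
```

The regex version still breaks contractions such as "isn't" in an arbitrary way, which is one reason to prefer a dedicated tokenizer on raw text.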

### Simple Tokenization

In [8]:
# We need each line in the file 
def read_file(path):
    """ Read file returns a list of lines.
    """
    with open(path, encoding = "ISO-8859-1") as f:
        content = f.readlines()
    return content

In [9]:
obj_lines = read_file(PATH/"plot.tok.gt9.5000")

In [10]:
obj_lines[0]

'the movie begins in the past where a young boy named sam attempts to save celebi from a hunter . \n'

In [11]:
np.array(obj_lines[0].strip().lower().split(" "))

array(['the', 'movie', 'begins', 'in', 'the', 'past', 'where', 'a',
       'young', 'boy', 'named', 'sam', 'attempts', 'to', 'save', 'celebi',
       'from', 'a', 'hunter', '.'], dtype='<U8')

### Much better tokenization with Spacy

In [12]:
#!pip install -U spacy

In [13]:
import spacy

In [14]:
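# the first time, download the English model
# (spaCy v3 removed the 'en' shortcut, so use the full package name)
#!python3 -m spacy download en_core_web_sm

In [15]:
tok = spacy.load('en_core_web_sm')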

In [16]:
obj_lines = read_file(PATH/"plot.tok.gt9.5000")

In [17]:
len(obj_lines)

5000

In [18]:
obj_lines[0]

'the movie begins in the past where a young boy named sam attempts to save celebi from a hunter . \n'

In [19]:
test = tok(obj_lines[0])

In [20]:
np.array([x for x in test])

array([the, movie, begins, in, the, past, where, a, young, boy, named,
       sam, attempts, to, save, celebi, from, a, hunter, ., 
], dtype=object)

## Split dataset in train and validation

In [21]:
from sklearn.model_selection import train_test_split

In [22]:
sub_content = read_file(PATH/"quote.tok.gt9.5000")
obj_content = read_file(PATH/"plot.tok.gt9.5000")
sub_content = np.array([line.strip().lower() for line in sub_content])
obj_content = np.array([line.strip().lower() for line in obj_content])
sub_y = np.zeros(len(sub_content))  # label 0: subjective (quote file)
obj_y = np.ones(len(obj_content))   # label 1: objective (plot file)
X = np.append(sub_content, obj_content)
y = np.append(sub_y, obj_y)

In [23]:
X[0], y[0]

('smart and alert , thirteen conversations about one thing is a small gem .',
 0.0)

In [24]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [25]:
X_train[:5], y_train[:5]

(array(['will god let her fall or give her a new path ?',
        "the director's twitchy sketchbook style and adroit perspective shifts grow wearisome amid leaden pacing and indifferent craftsmanship ( most notably wretched sound design ) .",
        "welles groupie/scholar peter bogdanovich took a long time to do it , but he's finally provided his own broadside at publishing giant william randolph hearst .",
        'based on the 1997 john king novel of the same name with a rather odd synopsis : " a first novel about a seasoned chelsea football club hooligan who represents a disaffected society operating by brutal rules .',
        'yet , beneath an upbeat appearance , she is struggling desperately with the emotional and physical scars left by the attack .'],
       dtype='<U691'),
 array([1., 0., 0., 1., 1.]))

## Word to index mapping
In the interest of time we will tokenize by splitting on whitespace instead of using spaCy. Here we compute a vocabulary of words from the training set and a mapping from each word to an index.
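
As a toy illustration of the idea, the sketch below counts in how many sentences each word occurs, drops the rare words, and assigns indices, reserving 0 for padding and 1 for unknown words. The three sentences, the `min_docs` threshold, and the `toy_` names are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical three-sentence training set.
toy_train = ["the movie is good", "the plot is thin", "good good movie"]

# Document frequency: each word counts at most once per sentence.
doc_freq = Counter(w for line in toy_train for w in set(line.split()))

# Keep only words that occur in at least `min_docs` sentences.
min_docs = 2
kept = sorted(w for w, c in doc_freq.items() if c >= min_docs)

# Index 0 is padding and index 1 stands for any out-of-vocabulary word.
toy_vocab2index = {"<PAD>": 0, "UNK": 1}
for w in kept:
    toy_vocab2index[w] = len(toy_vocab2index)

print(toy_vocab2index)
```

Words such as "plot" and "thin" fall below the threshold, so at encoding time they would map to the "UNK" index.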

In [26]:
from collections import defaultdict

In [27]:
def get_vocab(content):
    """Computes a dict of document counts per word.
    
    For each word, counts the number of lines (documents) that contain it.
    """
    vocab = defaultdict(float)
    for line in content:
        words = set(line.split())
        for word in words:
            vocab[word] += 1
    return vocab      

In [28]:
#Getting the vocabulary from the training set
word_count = get_vocab(X_train)

In [30]:
#word_count

In [29]:
len(word_count.keys())

21415

In [31]:
# let's delete words that are very infrequent
for word in list(word_count):
    if word_count[word] < 5:
        del word_count[word]

In [32]:
len(word_count.keys())

4065

In [33]:
## Finally we need an index for each word in the vocab
vocab2index = {"<PAD>":0, "UNK":1} # init with padding and unknown
words = ["<PAD>", "UNK"]
for word in word_count:
    vocab2index[word] = len(words)
    words.append(word)

In [35]:
#vocab2index

## Sentence encoding
Here we encode each sentence as a sequence of indices corresponding to each word.

In [36]:
x_train_len = np.array([len(x.split()) for x in X_train])
x_val_len = np.array([len(x.split()) for x in X_val])

In [37]:
np.percentile(x_train_len, 95) # let's set the max sequence length to N=40

43.0

In [38]:
X_train[0]

'will god let her fall or give her a new path ?'

In [41]:
# returns the index of the word or the index of "UNK" otherwise
vocab2index.get("will", vocab2index["UNK"])

7

In [40]:
np.array([vocab2index.get(w, vocab2index["UNK"]) for w in X_train[0].split()])

array([ 7, 10,  2,  8, 12,  9,  3,  8, 11,  6,  4,  5])

In [42]:
def encode_sentence(s, N=40):
    enc = np.zeros(N, dtype=np.int32)
    enc1 = np.array([vocab2index.get(w, vocab2index["UNK"]) for w in s.split()])
    l = min(N, len(enc1))
    enc[:l] = enc1[:l]
    return enc

In [43]:
encode_sentence(X_train[0])

array([ 7, 10,  2,  8, 12,  9,  3,  8, 11,  6,  4,  5,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0], dtype=int32)

In [44]:
x_train_len = np.minimum(x_train_len, 40)
x_val_len = np.minimum(x_val_len, 40)

In [45]:
x_train = np.vstack([encode_sentence(x) for x in X_train])
x_train.shape

(8000, 40)

In [46]:
x_val = np.vstack([encode_sentence(x) for x in X_val])
x_val.shape

(2000, 40)

## Embedding layer
Most deep learning models represent words as dense vectors of real numbers (word embeddings) rather than as one-hot encodings. The module `torch.nn.Embedding` stores these vectors. It takes two required arguments: the vocabulary size and the dimensionality of the embeddings. The embeddings are initialized with random vectors.
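
One way to see the connection to one-hot encodings: looking up rows of the embedding matrix gives the same result as multiplying one-hot vectors by that matrix, without ever building the one-hot vectors. A minimal sketch, where `toy_embed` and `ids` are illustrative names not used elsewhere in this notebook:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical vocabulary of 6 words with 3-dimensional embeddings.
toy_embed = nn.Embedding(6, 3)
ids = torch.LongTensor([2, 5, 2])

# Direct lookup: selects rows 2, 5 and 2 of the weight matrix.
looked_up = toy_embed(ids)

# Same result through an explicit one-hot matrix of shape (3, 6).
one_hot = F.one_hot(ids, num_classes=6).float()
via_matmul = one_hot @ toy_embed.weight

print(torch.allclose(looked_up, via_matmul))
```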

In [47]:
# an Embedding module containing 10 words with embedding size 4
# embedding will be initialized at random
embed = nn.Embedding(10, 4, padding_idx=0)
embed.weight

Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0519,  0.1758,  0.6141, -0.6515],
        [ 1.1543,  0.6676,  1.4841,  0.5959],
        [-1.6590,  0.0840, -0.2068, -1.0586],
        [ 0.6152,  1.2437,  0.6772, -1.0129],
        [ 0.4551,  0.4638, -2.9965, -0.2842],
        [ 2.2650, -1.3430,  0.5266,  0.1712],
        [ 0.9990,  1.1439,  0.8873, -1.0860],
        [-0.5740,  0.2533, -0.1649,  0.3825],
        [-0.5041, -0.6094,  1.7138,  0.4771]], requires_grad=True)

Note that the row at `padding_idx` (index 0) is the zero vector.

In [49]:
# given a list of ids we can "look up" the embedding corresponding to each id
# can you see that some vectors are the same?
a = torch.LongTensor([[1,4,1,5,1,0]])
embed(a)

tensor([[[ 0.0519,  0.1758,  0.6141, -0.6515],
         [ 0.6152,  1.2437,  0.6772, -1.0129],
         [ 0.0519,  0.1758,  0.6141, -0.6515],
         [ 0.4551,  0.4638, -2.9965, -0.2842],
         [ 0.0519,  0.1758,  0.6141, -0.6515],
         [ 0.0000,  0.0000,  0.0000,  0.0000]]], grad_fn=<EmbeddingBackward>)

This would be the representation of a sentence with words with indices [1,4,1,5,1] and a padding at the end. Below we have an example with two sentences. The first sentence has length 3 and the second has length 2. To store both in one tensor, we pad the end of the second sentence.

In [50]:
a = torch.LongTensor([[1,4,1], [1,3,0]])

Our model averages the embeddings of the words in each sentence. Here is how we do it.

In [51]:
s = torch.FloatTensor([3, 2]) # number of real (non-padding) words in each sentence

In [52]:
embed(a)

tensor([[[ 0.0519,  0.1758,  0.6141, -0.6515],
         [ 0.6152,  1.2437,  0.6772, -1.0129],
         [ 0.0519,  0.1758,  0.6141, -0.6515]],

        [[ 0.0519,  0.1758,  0.6141, -0.6515],
         [-1.6590,  0.0840, -0.2068, -1.0586],
         [ 0.0000,  0.0000,  0.0000,  0.0000]]], grad_fn=<EmbeddingBackward>)

In [49]:
embed(a).sum(dim=1)

tensor([[-0.0220, -2.9403,  1.4205, -1.3848],
        [ 1.7230, -2.5306, -2.2319, -1.2701]], grad_fn=<SumBackward1>)

In [53]:
sum_embs = embed(a).sum(dim=1) 
sum_embs/ s.view(s.shape[0], 1)

tensor([[ 0.2396,  0.5318,  0.6351, -0.7720],
        [-0.8036,  0.1299,  0.2036, -0.8551]], grad_fn=<DivBackward0>)
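
The lengths in `s` were typed by hand above. As a sketch of an alternative, they could be derived from the padded index tensor itself, assuming index 0 is used only for padding. The names `toy_embed`, `batch`, and `lengths` are illustrative and not part of the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

toy_embed = nn.Embedding(10, 4, padding_idx=0)
batch = torch.LongTensor([[1, 4, 1], [1, 3, 0]])

# Count the non-padding positions in each row; shape (2, 1).
lengths = (batch != 0).sum(dim=1, keepdim=True).float()

# Padding rows embed to zero, so the sum only covers real words.
# clamp guards against division by zero for an all-padding row.
mean_emb = toy_embed(batch).sum(dim=1) / lengths.clamp(min=1.0)

print(lengths)
print(mean_emb.shape)
```

This only works because `padding_idx=0` keeps the padding embedding at zero; with a different padding index, the mask would have to compare against that index instead.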

## Continuous Bag of Words Model

In [66]:
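class CBOW(nn.Module):
    """Averages the word embeddings of a sentence and applies a linear layer."""
    def __init__(self, vocab_size, emb_size=100):
        super(CBOW, self).__init__()
        # index 0 is reserved for padding, so its embedding stays at zero
        self.word_emb = nn.Embedding(vocab_size, emb_size, padding_idx=0)
        self.linear = nn.Linear(emb_size, 1)
        
    def forward(self, x, s):
        # x: (batch, seq_len) word indices; s: (batch, 1) sentence lengths
        x = self.word_emb(x)       # (batch, seq_len, emb_size)
        x = x.sum(dim=1)/ s        # mean over the non-padding words
        x = self.linear(x)         # (batch, 1) logit
        return x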

In [65]:
model = CBOW(vocab_size=5, emb_size=3)

In [57]:
model.word_emb.weight

Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000],
        [ 0.1589, -1.9838,  0.8531],
        [-0.6887,  0.4271,  0.8066],
        [-0.4199,  0.1744,  0.5959],
        [ 0.3572, -0.9152, -0.3212]], requires_grad=True)

In [54]:
s = s.view(s.shape[0], 1)
model(a, s)

tensor([[0.5772],
        [0.3820]], grad_fn=<AddmmBackward>)

# Training the CBOW model 

In [67]:
V = len(words)
model = CBOW(vocab_size=V, emb_size=500)
print(V)

4067


In [68]:
def val_metrics(model):
    model.eval()
    x = torch.LongTensor(x_val) #.cuda()
    y = torch.Tensor(y_val).unsqueeze(1) #.cuda()
    s = torch.Tensor(x_val_len).view(x_val_len.shape[0], 1)
    # no gradients are needed for evaluation
    with torch.no_grad():
        y_hat = model(x, s)
        loss = F.binary_cross_entropy_with_logits(y_hat, y)
    y_pred = y_hat > 0  # a logit above 0 means a probability above 0.5
    correct = (y_pred.float() == y).float().sum()
    accuracy = correct/y_pred.shape[0]
    return loss.item(), accuracy.item()

In [69]:
# accuracy of a random model should be around 0.5
val_metrics(model)

(0.7028073072433472, 0.49050000309944153)

In [70]:
def train_epocs(model, epochs=10, lr=0.01):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for i in range(epochs):
        model.train()
        x = torch.LongTensor(x_train)  #.cuda()
        y = torch.Tensor(y_train).unsqueeze(1)
        s = torch.Tensor(x_train_len).view(x_train_len.shape[0], 1)
        y_hat = model(x, s)
        loss = F.binary_cross_entropy_with_logits(y_hat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        val_loss, val_accuracy = val_metrics(model)
        print("train_loss %.3f val_loss %.3f val_accuracy %.3f" % (loss.item(), val_loss, val_accuracy))

In [85]:
model = CBOW(vocab_size=V, emb_size=25)

In [86]:
train_epocs(model, epochs=20, lr=0.2)

train_loss 0.706 val_loss 0.672 val_accuracy 0.549
train_loss 0.668 val_loss 0.548 val_accuracy 0.804
train_loss 0.536 val_loss 0.432 val_accuracy 0.845
train_loss 0.401 val_loss 0.333 val_accuracy 0.882
train_loss 0.290 val_loss 0.283 val_accuracy 0.892
train_loss 0.225 val_loss 0.254 val_accuracy 0.902
train_loss 0.176 val_loss 0.257 val_accuracy 0.899
train_loss 0.150 val_loss 0.258 val_accuracy 0.910
train_loss 0.123 val_loss 0.274 val_accuracy 0.907
train_loss 0.106 val_loss 0.293 val_accuracy 0.908
train_loss 0.088 val_loss 0.319 val_accuracy 0.905
train_loss 0.073 val_loss 0.350 val_accuracy 0.901
train_loss 0.062 val_loss 0.379 val_accuracy 0.897
train_loss 0.050 val_loss 0.412 val_accuracy 0.893
train_loss 0.042 val_loss 0.448 val_accuracy 0.892
train_loss 0.035 val_loss 0.485 val_accuracy 0.888
train_loss 0.029 val_loss 0.525 val_accuracy 0.882
train_loss 0.024 val_loss 0.561 val_accuracy 0.879
train_loss 0.020 val_loss 0.592 val_accuracy 0.878
train_loss 0.016 val_loss 0.623

In [82]:
train_epocs(model, epochs=10, lr=0.01)

train_loss 0.047 val_loss 0.386 val_accuracy 0.889
train_loss 0.045 val_loss 0.388 val_accuracy 0.888
train_loss 0.044 val_loss 0.390 val_accuracy 0.888
train_loss 0.042 val_loss 0.392 val_accuracy 0.887
train_loss 0.041 val_loss 0.394 val_accuracy 0.887
train_loss 0.040 val_loss 0.397 val_accuracy 0.887
train_loss 0.039 val_loss 0.399 val_accuracy 0.888
train_loss 0.037 val_loss 0.402 val_accuracy 0.889
train_loss 0.036 val_loss 0.405 val_accuracy 0.889
train_loss 0.035 val_loss 0.408 val_accuracy 0.888


# Data loaders for SGD

Nearly all of deep learning is powered by one very important algorithm: **stochastic gradient descent (SGD)**. SGD can be seen as an approximation of **gradient descent** (GD). In GD you have to run through *all* the samples in your training set to do a single iteration. In SGD you use *only one* or *a subset* of the training samples to update the parameters in a particular iteration. The subset used in each iteration is called a **batch** or **minibatch**.
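
To make the difference concrete, here is a small standard-library sketch that fits a single weight `w` in `y = w * x` by least squares. The data, the learning rate, and the helper names `grad` and `fit` are assumptions for illustration. With `batch_size` equal to the dataset size, each epoch makes one GD update; with a smaller `batch_size`, the same pass over the data makes several cheaper, noisier updates.

```python
import random

random.seed(0)

# Hypothetical toy data drawn from y = 3x plus a little noise.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]

def grad(w, idx):
    """Gradient of the mean squared error over the samples in idx."""
    return sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in idx) / len(idx)

def fit(batch_size, epochs=50, lr=0.1):
    w = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        random.shuffle(order)  # new minibatches every epoch
        for start in range(0, len(order), batch_size):
            w -= lr * grad(w, order[start:start + batch_size])
    return w

w_gd = fit(batch_size=len(xs))  # full batch: 1 update per epoch
w_sgd = fit(batch_size=20)      # minibatches: 10 updates per epoch

print(w_gd, w_sgd)
```

Both runs see the data the same number of times, but the minibatch run applies ten times as many updates.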

In [61]:
from torch.utils.data import Dataset, DataLoader

Next we are going to create a data loader. The data loader provides the following features:
* Batching the data
* Shuffling the data
* Loading the data in parallel using multiprocessing workers
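
These features can be tried on a tiny in-memory dataset before moving to the real one. The sketch below uses `TensorDataset` as a stand-in for a custom dataset; the tensors `toy_x` and `toy_y` are made up for illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Ten hypothetical samples with one feature each and a binary label.
toy_x = torch.arange(10).view(10, 1)
toy_y = torch.arange(10) % 2
toy_ds = TensorDataset(toy_x, toy_y)

# batch_size groups samples, shuffle reorders them every epoch, and
# num_workers=0 loads in the main process (values > 0 use worker processes).
toy_loader = DataLoader(toy_ds, batch_size=4, shuffle=True, num_workers=0)

# The last batch is smaller because 10 is not a multiple of 4.
batch_sizes = [xb.shape[0] for xb, yb in toy_loader]

# Every sample appears exactly once per pass, whatever the order.
seen = sorted(int(v) for xb, _ in toy_loader for v in xb.view(-1))

print(batch_sizes)
print(seen)
```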

In [62]:
def encode_sentence2(s, N=40):
    enc = np.zeros(N, dtype=np.int32)
    enc1 = np.array([vocab2index.get(w, vocab2index["UNK"]) for w in s.split()])
    l = min(N, len(enc1))
    enc[:l] = enc1[:l]
    return enc, l

In [63]:
encode_sentence2(X_train[0])

(array([ 5,  8,  2, 12, 10,  9,  4, 12, 11,  6,  7,  3,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0], dtype=int32),
 12)

In [64]:
class SubjectivityDataset(Dataset):
    def __init__(self, X, y):
        self.x = X
        self.y = y
    
    def __len__(self):
        return len(self.y)
    
    def __getitem__(self, idx):
        x = self.x[idx]
        x, s = encode_sentence2(x)
        return x, self.y[idx], s
    
sub_dataset_train = SubjectivityDataset(X_train, y_train)

In [65]:
train_loader = DataLoader(sub_dataset_train, batch_size=5, shuffle=True)
x, y, s = next(iter(train_loader))

In [66]:
x, y, s

(tensor([[ 264,  130, 2552,   47,    1,   34,  240, 1384,   70,   11,    1,    1,
            25,   42,  730,  768,  130,  173,   34,   21,   42,  467, 1783,   41,
            25,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
             0,    0,    0,    0],
         [3398, 1729,   11, 3630, 1569,   34,    1,   13, 2092,   47,    1,   34,
           453,   21, 1952,   96,   11, 1487,    1,    1,   47,   11,  153,   25,
             0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
             0,    0,    0,    0],
         [  13,  451,  872,   81,  427,   60,   11, 1281,  178,   18,    1,   22,
            25,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
             0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
             0,    0,    0,    0],
         [  13,  120,  689,    1,   31,    1,  414,    1,   21,    1,    1,   81,
          1220,   41,    1,   30,  314,   37,    1, 1659,   34, 2778, 1154,

In [67]:
model = CBOW(vocab_size=V, emb_size=50)

In [68]:
train_loader = DataLoader(sub_dataset_train, batch_size=500, shuffle=True)

In [69]:
def train_epocs(model, epochs=10, lr=0.01):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for i in range(epochs):
        total_loss = 0
        total = 0
        model.train()
        for x, y, s in train_loader:
            x = x.type(torch.LongTensor)  #.cuda()
            y = y.type(torch.FloatTensor).unsqueeze(1)
            s = s.type(torch.Tensor).view(s.shape[0], 1)
            y_hat = model(x, s)
            loss = F.binary_cross_entropy_with_logits(y_hat, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += x.size(0)*loss.item()
            total += x.size(0)
        train_loss = total_loss/total
        val_loss, val_accuracy = val_metrics(model)
        
        print("train_loss %.3f val_loss %.3f val_accuracy %.3f" % (train_loss, val_loss, val_accuracy))

In [70]:
train_epocs(model, epochs=10)

train_loss 0.648 val_loss 0.586 val_accuracy 0.788
train_loss 0.493 val_loss 0.412 val_accuracy 0.854
train_loss 0.323 val_loss 0.301 val_accuracy 0.881
train_loss 0.229 val_loss 0.256 val_accuracy 0.892
train_loss 0.178 val_loss 0.240 val_accuracy 0.900
train_loss 0.147 val_loss 0.235 val_accuracy 0.904
train_loss 0.124 val_loss 0.237 val_accuracy 0.903
train_loss 0.107 val_loss 0.242 val_accuracy 0.905
train_loss 0.093 val_loss 0.245 val_accuracy 0.905
train_loss 0.081 val_loss 0.252 val_accuracy 0.902
