# **Assignment 1 - Language Models**
#### **Due: September 27 (Tuesday), 2022**

## **Notes**

### **Introduction**

Welcome to CSE 527A. The goal for the first assignment is to make sure you are familiar with all the tools you need to complete the programming assignments for the course. 

Each assignment contains two parts: a written and coding portion. The coding portion for each homework assignment will be delivered through a Colaboratory notebook such as this one. Please use as many as code and markdown cells to run and explain all the steps you took in order to answer each question.

### **Comments/Documentation**

Please follow PEP 8 style guidelines (https://peps.python.org/pep-0008/) for commenting your code. Furthermore, please remember to manually save your work once in a while. If you are connected to a hosted runtime that if for whatever reason it disconnects you will have to rerun all connected code cells.

### **Getting Started**

In order to compile code efficiently please pay attention to if you are using a hardware accelerator or not. If you are directly calling libraries like Tensorflow, Keras, or Pytorch, it is advised to switch to a GPU.

To access a GPU, go to `Edit->Notebook settings` and in the `Hardware accelerator` dropdown choose `GPU`. 
As soon as you run a code cell, you will be connected to a cloud instance with a GPU.
Try running the code cell below to check that a GPU is connected (select the cell then either click the play button at the top left or press `Ctrl+Enter` or `Shift+Enter`).

The free version of Google Colab will provide the necessary hardware for this course. Please keep in mind the RAM and Disk Space that you are allocated and that you are not given an infinite active runtime.

If your local machine has a GPU that you find outperforms the cloud GPU then you can follow the necessary documentation to use a GPU with your environment.

### **Lost GPU/TPU Access on Colab**

If you are not allocated a GPU or cannot connect to a GPU (limits are reached for Collab), Kaggle also provides free access to GPUs and TPUs. Please transfer your work to a Kaggle runtime instance by downloading your file on Colab as a '.ipynb' file and importing the file into Kaggle.

### **Submission Instructions**

We will use Gradescope for assignment submission. You can upload files individually or as part of a zip file, but if using a zip file be sure you are zipping the files directly and not a folder that contains them. Please note if designated output is cleared, you will receive a 0.

To download this notebook, go to `File->Download .ipynb`.  Please rename the file to match the name in our file list. 

When submitting your ipython notebooks, make sure everything runs correctly if the cells are executed in order starting from a fresh session.  Note that just because a cell runs in your current session doesn't mean it doesn't rely on code that you have already changed or deleted.  If the code doesn't take too long to run, we recommend re-running everything with `Runtime->Restart and run all...`.

When you upload your submission to the Gradescope assignment, you should get immediate feedback that confirms your submission was processed correctly. Note that Gradesope will allow you to submit multiple times before the deadline, and we will use the latest submission for grading.

## **Setup**

In [None]:
from google.colab import drive # one option to load datasets
from google.colab import files
drive.mount('/content/gdrive')
!nvidia-smi -L # check if using GPU

## **Problem 1**

## **1.1**

Write a program to compute unsmoothed unigrams, bigrams, and trigrams (you may not import nltk).

In [88]:
def unigram(word):
    unigram1 = {}
    for i in word:
        if i not in unigram1:
            unigram1[i] = 1
        else:

            unigram1[i] += 1
    
    return unigram1


def bigram(word):
    bigram1 = {}
    for i in range(len(word)-1):
        com = word[i] +" "+ word[i+1]
        if com not in bigram1:
            bigram1[com] = 1
        else:
            bigram1[com] += 1

    return bigram1


def trigram(word):
    trigram1 = {}
    for i in range(len(word)-2):
        com = word[i] +" "+ word[i+1] + " " + word[i+2]
        if com not in trigram1:
            trigram1[com] = 1
        else:
            trigram1[com] += 1

    return trigram1



In [89]:
def calculate(count_1, count_2, count_3):
    sum = 0
    for i in count_1:
        sum += count_1[i]
    pro_unigram = {}
    for i in count_1:
        pro_unigram[i] = count_1[i] / sum

    
    pro_bigram = {}
    for i in count_2:
        divid = i.split(" ")[1]
        pro_bigram[i] = count_2[i] / count_1[divid]

    
    pro_trigram = {} 
    keys = list(count_3.keys())

    cur = []
    for i in range(len(keys)):
        cur.extend(keys[i].split(" "))
        com_did = cur[1] + ' ' + cur[2]
        divend = count_2[com_did]
        par = count_3[keys[i]] / divend
        pro_trigram[keys[i]] = par


    
    return pro_unigram, pro_bigram, pro_trigram
        
    
    


## **1.2**

Train your model on the Wikitext-2-v1 training corpus (https://huggingface.co/datasets/wikitext). Explain the differences between your most common unigrams, bigrams, trigrams (pad beginning and end of your sentences and please remove puncutation and unknown tokens from corpus).


In [90]:
## your code here
!pip install datasets
import unicodedata
from datasets import list_datasets, load_dataset
import string


dataset = load_dataset("wikitext", 'wikitext-2-v1' )
print(dataset)


            

def process_data(wiki_data):
    pun = get_punctuation()

    pun.add('<unk>')


    vocab = {'pad': 0, '<start>': 1, '<end>': 2}
    data = {"train": [], "validation": [], "test": []}

    for type in ['test', 'train', 'validation']:
        for idx, line in enumerate(wiki_data[type]):
            text = line['text'].strip()
            words = [i.strip() for i in text.split() if i.strip() not in pun]
            if len(words) == 0:
                continue
            else:
                for word in words:
                    if word not in vocab:
                        vocab[word] = len(vocab)
  
                data[type].append(" ".join(words))

    return data, vocab



def get_punctuation():
    pun = set()
    for cp in range(17 * 65536):
        char = chr(cp)
        if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
                (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
            pun.add(char)
        cat = unicodedata.category(char)
        if cat.startswith("P"):
            pun.add(char)
    return pun




Found cached dataset wikitext (/Users/liuzijie/.cache/huggingface/datasets/wikitext/wikitext-2-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)


  0%|          | 0/3 [00:00<?, ?it/s]

DatasetDict({
    test: Dataset({
        features: ['text'],
        num_rows: 4358
    })
    train: Dataset({
        features: ['text'],
        num_rows: 36718
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 3760
    })
})


## **1.3**

Calculate the perplexity for each n-gram (unigram, bigram, and trigram) on all splits (traning, validataion, test sets of Wikitext-2). And, without writing code, discuss what might happen to the perplexity if you continue to increase the number of words in your n-gram (4-gram, 5-gram, etc.)?

In [91]:
import numpy as np

def cal_perplexity(pro):
    p = 0
    values_list = list(pro.values())
    # print(len(values_list))
    for i in values_list:
        p += np.log(i)
    #     # p = math.pow(pro[keys_list[i]], -1/len(values_list))
        a = - p / len(values_list)
        perp = 2 ** a 

    return perp 




In [92]:
data, vocab = process_data(dataset)

for i in ["test", "validation", "train"]:
    sign_data = ['start']
    for j in data[i]:
        sign_data.extend(j.split(" "))
    sign_data.append('end')
    
    count_1 = unigram(sign_data)
    count_2 = bigram(sign_data)
    count_3 = trigram(sign_data)
    
    pro_unigram, pro_bigram, pro_trigram = calculate(count_1, count_2, count_3)
    
    
    result1 = cal_perplexity(pro_unigram)
    result2 = cal_perplexity(pro_bigram)
    result3 = cal_perplexity(pro_trigram)
    print("The perplexity of unigram in " + i + " set is :", result1) #2003.057839147159
    print("The perplexity of bigram in " + i + " set is :", result2)
    print("The perplexity of Trigram in " + i + " set is :", result3)


The perplexity of unigram in test set is : 1985.554379948258
The perplexity of bigram in test set is : 15.610586099937263
The perplexity of Trigram in test set is : 0.9494600761856709
                                                               
The perplexity of unigram in validation set is : 1883.698481889108
The perplexity of bigram in validation set is : 14.681681512382676
The perplexity of Trigram in validation set is : 4.70062424354332
                                                               
The perplexity of unigram in train set is : 2003.057839147159
The perplexity of bigram in train set is : 98.2681028472694
The perplexity of Trigram in train set is : 6.4820862937523


## **1.4**

Enable Laplace smoothing and Add-K smoothing (0.1,0.05,0.01) to your code. Discuss the changes in perplexity values between n-grams as you try different smoothing methods/values.

In [93]:
## your code here


def calculate_laplace(count_1, count_2, count_3):
    sum = 0
    for i in count_1:
        sum += count_1[i]
    pro_unigram = {}
    for i in count_1:
        pro_unigram[i] = (count_1[i] + 1) / (sum + len(vocab))

    
    pro_bigram = {}
    for i in count_2:
        divid = i.split(" ")[1]
        pro_bigram[i] = (count_2[i] + 1) / (count_1[divid] + len(vocab))

    
    pro_trigram = {} 
    keys = list(count_3.keys())
    cur = []
    for i in range(len(keys)):
        cur.extend(keys[i].split(" "))
        com_did = cur[1] + ' ' + cur[2]
        divend = count_2[com_did]
        par = count_3[keys[i]] + 1 / divend + len(vocab)
        pro_trigram[keys[i]] = par
    
    return pro_unigram, pro_bigram, pro_trigram


  
def calculate_addK(K, count_1, count_2, count_3):
    sum = 0
    for i in count_1:
        sum += count_1[i]

    pro_unigram = {}
    for i in count_1:
        pro_unigram[i] = (count_1[i] + K) / (sum + K * len(vocab))

    
    pro_bigram = {}
    for i in count_2:
        divid = i.split(" ")[1]
        pro_bigram[i] = (count_2[i] + K) / (count_1[divid] + K * len(vocab))

    
    pro_trigram = {} 
    keys = list(count_3.keys())
    cur = []
    for i in range(len(keys)):
        cur.extend(keys[i].split(" "))
        com_did = cur[1] + ' ' + cur[2]
        divend = count_2[com_did]
        par = (count_3[keys[i]] + K) / (divend + K * len(vocab))
        pro_trigram[keys[i]] = par
    
    return pro_unigram, pro_bigram, pro_trigram





In [97]:
data, vocab = process_data(dataset)

for i in ["test", "validation"]:
    sign_data = ['start']
    for j in data[i]:
        sign_data.extend(j.split(" "))
    sign_data.append('end')
    
    count_1 = unigram(sign_data)
    count_2 = bigram(sign_data)
    count_3 = trigram(sign_data)
    
    pro_unigram_la, pro_bigram_la, pro_trigram_la = calculate_laplace(count_1, count_2, count_3) 
    
    result1 = cal_perplexity(pro_unigram)
    result2 = cal_perplexity(pro_bigram)
    result3 = cal_perplexity(pro_trigram)
    print("The perplexity of unigram in " + i + "set with laplace smoothing is:", result1) 
    print("The perplexity of bigram in " + i + "set with laplace smoothing is:", result2)
    print("The perplexity of Trigram in " + i + "set with laplace smoothing is:", result3)
    
    for j in [0.1, 0.05, 0.01]:
        pro_unigram, pro_bigram, pro_trigram = calculate_addK( j, count_1, count_2, count_3)
        result1 = cal_perplexity(pro_unigram)
        result2 = cal_perplexity(pro_bigram)
        result3 = cal_perplexity(pro_trigram)
        print("K=",j)
        print("The perplexity of unigram in " + i + "set with add-K smoothing is :", result1) #2003.057839147159
        print("The perplexity of Bigram in " + i + "set with add-K smoothing is :", result2)
        print("The perplexity of Trigram in " + i + "set with add-K smoothing is :", result3)




The perplexity of unigram in testset with laplace smoothing is: 4020.891842558921
The perplexity of bigram in testset with laplace smoothing is: 100.03398193161631
The perplexity of Trigram in testset with laplace smoothing is: 55.375952726011484
K= 0.1
The perplexity of unigram in testset with add-K smoothing is : 1944.4564747625152
The perplexity of Bigram in testset with add-K smoothing is : 248.16163719375461
The perplexity of Trigram in testset with add-K smoothing is : 246.1126345233863
K= 0.05
The perplexity of unigram in testset with add-K smoothing is : 1964.347637783388
The perplexity of Bigram in testset with add-K smoothing is : 169.49316229450773
The perplexity of Trigram in testset with add-K smoothing is : 157.01515024815285
K= 0.01
The perplexity of unigram in testset with add-K smoothing is : 1981.2009225578959
The perplexity of Bigram in testset with add-K smoothing is : 73.94672662303982
The perplexity of Trigram in testset with add-K smoothing is : 52.88515163727598

In [96]:
data, vocab = process_data(dataset)

for i in ["train"]:
    sign_data = ['start']
    for j in data[i]:
        sign_data.extend(j.split(" "))
    sign_data.append('end')
    
    count_1 = unigram(sign_data)
    count_2 = bigram(sign_data)
    count_3 = trigram(sign_data)
    
    pro_unigram_la, pro_bigram_la, pro_trigram_la = calculate_laplace(count_1, count_2, count_3) 
    
    result1 = cal_perplexity(pro_unigram)
    result2 = cal_perplexity(pro_bigram)
    result3 = cal_perplexity(pro_trigram)
    print("The perplexity of unigram in " + i + "set with laplace smoothing is:", result1) 
    print("The perplexity of bigram in " + i + "set with laplace smoothing is:", result2)
    print("The perplexity of Trigram in " + i + "set with laplace smoothing is:", result3)
    
    for j in [0.1, 0.05, 0.01]:
        pro_unigram, pro_bigram, pro_trigram = calculate_addK( j, count_1, count_2, count_3)
        result1 = cal_perplexity(pro_unigram)
        result2 = cal_perplexity(pro_bigram)
        result3 = cal_perplexity(pro_trigram)
        print("K=",j)
        print("The perplexity of unigram in " + i + "set with add-K smoothing is :", result1) #2003.057839147159
        print("The perplexity of Bigram in " + i + "set with add-K smoothing is :", result2)
        print("The perplexity of Trigram in " + i + "set with add-K smoothing is :", result3)
    break



The perplexity of unigram in trainset with laplace smoothing is: 1981.2009225578959
The perplexity of bigram in trainset with laplace smoothing is: 73.94672662303982
The perplexity of Trigram in trainset with laplace smoothing is: 52.88515163727598
K= 0.1
The perplexity of unigram in trainset with add-K smoothing is : 3988.3851522610685
The perplexity of Bigram in trainset with add-K smoothing is : 281.8580179853999
The perplexity of Trigram in trainset with add-K smoothing is : 242.465570050648
K= 0.05
The perplexity of unigram in trainset with add-K smoothing is : 4006.3164333053182
The perplexity of Bigram in trainset with add-K smoothing is : 201.68171795263464
The perplexity of Trigram in trainset with add-K smoothing is : 155.7188918604515
K= 0.01
The perplexity of unigram in trainset with add-K smoothing is : 4020.891842558921
The perplexity of Bigram in trainset with add-K smoothing is : 100.03398193161631
The perplexity of Trigram in trainset with add-K smoothing is : 55.37595

## **Problem 2**



(Eisenstein Ch. 6) Using the Pytorch library, train an LSTM language model from the same Wikitext training corpus you used in problem 1. After each epoch of training, compute its perplexity on the Wikitext validation corpus. Stop training when the perplexity stops improving.

1. Fully describe your model architecture, hyperparameters, and experimental procedure.
2. After each epoch of training, compute your LM’s perplexity on the development data. Plot the development perplexity against # of epochs. Additionally, compute and report the perplexity on test
data.
3. Compare experimental results such as perplexity and training time between your n-gram and neural models (include smoothed and unsmooth n-grams). Provide graphs that demonstrate your
results.


In [None]:
## your code here

## your code here
# coding: utf-8
import time
import math
import os
import torch
import torch.nn as nn
import os
from io import open
import torch
import json
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

from tqdm import tqdm
from datasets import load_dataset
import unicodedata



class Config:
    def __init__(self):

        self.emsize = 200
        self.nhid = 200
        self.nlayers = 3
        self.lr = 20.0
        self.clip = 0.25

        self.epochs = 4
        self.batch_size = 20
        # seq length
        self.bptt = 35
        self.dropout = 0.2
        self.tied = False
        self.seed = 123
        self.log_interval = 500
        self.save = 'model.pt'

args = Config()

torch.manual_seed(args.seed)

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# get punctuationc
def get_punctuation():
    pun = set()
   
    for cp in range(17 * 65536):
        char = chr(cp)
        if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
                (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
            pun.add(char)
        cat = unicodedata.category(char)
        if cat.startswith("P"):
            pun.add(char)
    return pun


puns = get_punctuation()


class Dictionary(object):
    def __init__(self):
        self.word2idx = {"<unk>":0}
        self.idx2word = ["<unk>"]
        

    def add_word(self, word):
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


class Corpus(object):
    def __init__(self):
        self.dictionary = Dictionary()
        data = load_dataset("wikitext", "wikitext-2-v1")
        print(f"raw train sample: {data['train']}")
        print(f"raw train sample: {data['validation']}")
        print(f"raw train sample: {data['test']}")
        # validation
        self.train = self.tokenize(data['train'])
        self.valid = self.tokenize(data['validation'])
        self.test = self.tokenize(data['test'])
        print(f"vocab size: {len(self.dictionary.word2idx)}")
        

    def tokenize(self, data):
      
        for line in data:
            line = line['text'].strip()
            words = line.split() + ['<eos>']
            for word in words:
                if word in puns:
                    continue
                self.dictionary.add_word(word)

        idss = []
        for line in data:
            line = line['text'].strip()
            words = line.split() + ['<eos>']
            ids = []
            for word in words:
                if word in puns:
                    continue
                ids.append(self.dictionary.word2idx[word])
            idss.append(torch.tensor(ids).type(torch.int64))
        ids = torch.cat(idss)

        return ids


###############################################################################
# Model architecture
###############################################################################

    
class RNNModel(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5, tie_weights=False):
        super(RNNModel, self).__init__()
        self.ntoken = ntoken
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)

        self.decoder = nn.Linear(nhid, ntoken)

        if tie_weights:
            if nhid != ninp:
                raise ValueError('When using the tied flag, nhid must be equal to emsize')
            self.decoder.weight = self.encoder.weight

        self.init_weights()

        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        nn.init.uniform_(self.encoder.weight, -initrange, initrange)
        nn.init.zeros_(self.decoder.bias)
        nn.init.uniform_(self.decoder.weight, -initrange, initrange)

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        decoded = self.decoder(output)
        decoded = decoded.view(-1, self.ntoken)
        return F.log_softmax(decoded, dim=1), hidden

    def init_hidden(self, bsz):
        weight = next(self.parameters())
        return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                    weight.new_zeros(self.nlayers, bsz, self.nhid))
      





###############################################################################
# Load data
###############################################################################

corpus = Corpus()


def batchify(data, bsz):
    # Work out how cleanly we can divide the dataset into bsz parts.
    nbatch = data.size(0) // bsz
    # Trim off any extra elements that wouldn't cleanly fit (remainders).
    data = data.narrow(0, 0, nbatch * bsz)
    # Evenly divide the data across the bsz batches.
    data = data.view(bsz, -1).t().contiguous()
    return data.to(device)

eval_batch_size = 10
train_data = batchify(corpus.train, args.batch_size)
val_data = batchify(corpus.valid, eval_batch_size)
test_data = batchify(corpus.test, eval_batch_size)

###############################################################################
# Build the model
###############################################################################

ntokens = len(corpus.dictionary)

model = RNNModel(ntokens, args.emsize, args.nhid, args.nlayers, args.dropout, args.tied).to(device)

criterion = nn.NLLLoss()

###############################################################################
# Training code
###############################################################################

def repackage_hidden(h):
    """Wraps hidden states in new Tensors, to detach them from their history."""

    if isinstance(h, torch.Tensor):
        return h.detach()
    else:
        return tuple(repackage_hidden(v) for v in h)



def get_batch(source, i):
    seq_len = min(args.bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].view(-1)
    return data, target


def evaluate(data_source):
    # Turn on evaluation mode which disables dropout.
    model.eval()
    total_loss = 0.
    ntokens = len(corpus.dictionary)
    hidden = model.init_hidden(eval_batch_size)
    with torch.no_grad():
        for i in range(0, data_source.size(0) - 1, args.bptt):
            data, targets = get_batch(data_source, i)
            output, hidden = model(data, hidden)
            hidden = repackage_hidden(hidden)
            total_loss += len(data) * criterion(output, targets).item()
    return total_loss / (len(data_source) - 1)


def train():
    # Turn on training mode which enables dropout.
    model.train()
    total_loss = 0.
    start_time = time.time()
    ntokens = len(corpus.dictionary)
    hidden = model.init_hidden(args.batch_size)
    for batch, i in enumerate(range(0, train_data.size(0) - 1, args.bptt)):
        data, targets = get_batch(train_data, i)
        # Starting each batch, we detach the hidden state from how it was previously produced.
        # If we didn't, the model would try backpropagating all the way to start of the dataset.
        model.zero_grad()
        
        hidden = repackage_hidden(hidden)
        output, hidden = model(data, hidden)
        loss = criterion(output, targets)
        loss.backward()

        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
        for p in model.parameters():
            p.data.add_(p.grad, alpha=-lr)

        total_loss += loss.item()

        if batch % args.log_interval == 0 and batch > 0:
            cur_loss = total_loss / args.log_interval
            elapsed = time.time() - start_time
            print('| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.2f} | ms/batch {:5.2f} | '
                    'loss {:5.2f} | ppl {:8.2f}'.format(
                epoch, batch, len(train_data) // args.bptt, lr,
                elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss)))
            total_loss = 0
            start_time = time.time()
        


# Loop over epochs.
lr = args.lr
best_val_loss = None

records = []
# At any point you can hit Ctrl + C to break out of training early.
try:
    for epoch in range(1, args.epochs+1):
        record = {"epoch":epoch}
        epoch_start_time = time.time()
        train()
        record['epoch_cost_time'] = time.time() - epoch_start_time
        val_loss = evaluate(val_data)
        

        print('-' * 89)
        print('| end of epoch {:3d} | time: {:5.2f}s | valid loss {:5.2f} | '
                'valid ppl {:8.2f}'.format(epoch, (time.time() - epoch_start_time),
                                           val_loss, math.exp(val_loss)))
        record['valid_loss'] = val_loss
        record['vaild_ppl'] = math.exp(val_loss)
        print('-' * 89)
        # Save the model if the validation loss is the best we've seen so far.
        if not best_val_loss or val_loss < best_val_loss:
            with open(args.save, 'wb') as f:
                torch.save(model, f)
            best_val_loss = val_loss
        else:
            # Anneal the learning rate if no improvement has been seen in the validation dataset.
            lr /= 4.0
        # Run on test data.
        test_loss = evaluate(test_data)
        print('=' * 89)
        print('| end of epoch {:3d} | test loss {:5.2f} | test ppl {:8.2f}'.format(epoch, test_loss, math.exp(test_loss)))
        record['test_loss'] = test_loss
        record['test_ppl'] = math.exp(test_loss)
        print('=' * 89)
        print(record)
        records.append(record)
        with open("records.json", "w", encoding="utf-8") as f:
            f.write("\n".join([json.dumps(i, ensure_ascii=False) for i in records]))
        

except KeyboardInterrupt:
    print('-' * 89)
    print('Exiting from training early')


    
###############################################################################
# Print info
###############################################################################

for record in records:
    print(record)

In [None]:
# PLOT

import matplotlib.pyplot as plt
 
name_list = ['Monday','Tuesday','Friday','Sunday']
num_list = [1985.554379948258,0.6,7.8,6]
num_list1 = [1,2,3,1]
x =list(range(len(num_list)))
total_width, n = 0.8, 2
width = total_width / n
 
plt.bar(x, num_list, width=width, label='boy',fc = 'y')
for i in range(len(x)):
    x[i] = x[i] + width
plt.bar(x, num_list1, width=width, label='girl',tick_label = name_list,fc = 'r')
plt.legend()
plt.show()
