<a href="https://colab.research.google.com/github/ericburdett/cs673-personal-tutor/blob/master/Personal_Tutor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Personal Tutor

This notebook contains code for the Personal Tutor System built for CS673: Computational Creativity.


## Imports

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
from torchvision import transforms, utils, datasets
from tqdm import tqdm
from torch.nn.parameter import Parameter
import pdb
import torchvision
import os
import gzip
import tarfile
from PIL import Image, ImageOps
import gc
import pandas as pd
from IPython.core.ultratb import AutoFormattedTB
__ITB__ = AutoFormattedTB(mode = 'Verbose',color_scheme='LightBg', tb_offset = 1)

assert torch.cuda.is_available(), "Request a GPU from Runtime > Change runtime type"

## Word Distribution

In [6]:
# Download the full word-frequency distribution (word, count) from GitHub
!wget -O word_dist_full.csv https://raw.githubusercontent.com/ericburdett/cs673-personal-tutor/master/data/word_dist_full.csv

--2020-01-29 15:39:50--  https://raw.githubusercontent.com/ericburdett/cs673-personal-tutor/master/data/word_dist_full.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 163042 (159K) [text/plain]
Saving to: ‘word_dist_full.csv’


2020-01-29 15:39:50 (4.58 MB/s) - ‘word_dist_full.csv’ saved [163042/163042]



In [0]:
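class WordDist(Dataset):
  """Word-frequency table exposed as a PyTorch Dataset of (word, freq) pairs."""

  def __init__(self):
    # keep_default_na=False stops pandas from reading words such as "null" or "nan" as missing values
    self.df = pd.read_csv('word_dist_full.csv', header=None, names=['word', 'freq'],
                          keep_default_na=False)

  def getdf(self):
    # Expose the underlying DataFrame for ad hoc inspection
    return self.df

  def __getitem__(self, index):
    # Positional lookup: an int returns one (word, freq) pair, a slice returns a pair of Series
    return self.df['word'].iloc[index], self.df['freq'].iloc[index]

  def __len__(self):
    return len(self.df)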

In [14]:
words = WordDist()
print('Num Words: ', len(words))  # len() gives the row count; printing the object only shows its repr
words[0:20]



(0      the
 1       of
 2      and
 3       to
 4        a
 5       in
 6      for
 7       is
 8       on
 9     that
 10      by
 11    this
 12    with
 13       i
 14     you
 15      it
 16     not
 17      or
 18      be
 19     are
 Name: word, dtype: object, 0     23135851162
 1     13151942776
 2     12997637966
 3     12136980858
 4      9081174698
 5      8469404971
 6      5933321709
 7      4705743816
 8      3750423199
 9      3400031103
 10     3350048871
 11     3228469771
 12     3183110675
 13     3086225277
 14     2996181025
 15     2813163874
 16     2633487141
 17     2590739907
 18     2398724162
 19     2393614870
 Name: freq, dtype: int64)
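
The raw counts shown above can be turned into relative frequencies, which is one way such a distribution could drive weighted sampling. The sketch below is only an illustration: it hard-codes the first five rows of the output rather than reading the CSV, the names `top_words`, `relative_freq`, and `sample` are hypothetical, and the resulting probabilities are relative to this five-word subset, not to the full file.

```python
import random

# First five (word, count) rows copied from the output above.
top_words = [
    ("the", 23135851162),
    ("of", 13151942776),
    ("and", 12997637966),
    ("to", 12136980858),
    ("a", 9081174698),
]

total = sum(count for _, count in top_words)

# Relative frequency of each word within this five-word subset only.
relative_freq = {word: count / total for word, count in top_words}

# Draw words in proportion to their counts (seeded so the run is repeatable).
rng = random.Random(0)
sample = rng.choices(
    [word for word, _ in top_words],
    weights=[count for _, count in top_words],
    k=10,
)

print(relative_freq)
print(sample)
```

The same normalization would apply to the whole `freq` column if the full table were used instead of the hard-coded subset.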

## GPT2 Language Model

In [0]:
# Install the gpt2-client package, a wrapper used here to download and run GPT2
!pip install gpt2-client

In [0]:
from gpt2_client import GPT2Client
import os

In [0]:
# Download a few different corpora to work with GPT2
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz'
!tar -xvf text_files.tar.gz

In [0]:
# Fine-tune the smallest GPT2 model (117M) on The Lord of the Rings
gpt2 = GPT2Client('117M') # Options include "117M", "345M", "774M", "1.5B"
gpt2.load_model()
gpt2.finetune('./text_files/lotr.txt', return_text=True)

Downloading checkpoint: 1.00kit [00:00, 897kit/s]
Downloading encoder.json: 1.04Mit [00:00, 52.4Mit/s]
Downloading hparams.json: 1.00kit [00:00, 505kit/s]

Created `models/117M` directory to save model weights and checkpoints.

Downloading model.ckpt.data-00000-of-00001: 498Mit [00:06, 76.5Mit/s]
Downloading model.ckpt.index: 6.00kit [00:00, 1.76Mit/s]
Downloading model.ckpt.meta: 472kit [00:00, 48.8Mit/s]
Downloading vocab.bpe: 457kit [00:00, 42.8Mit/s]


Loading checkpoint models/117M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:03<00:00,  3.57s/it]


dataset has 719670 tokens
Training...
[1 | 9.65] loss=3.57 avg=3.57
[2 | 11.94] loss=3.57 avg=3.57
[3 | 14.25] loss=3.63 avg=3.59
[4 | 16.55] loss=3.63 avg=3.60
[5 | 18.88] loss=3.24 avg=3.53
[6 | 21.20] loss=3.44 avg=3.51
[7 | 23.54] loss=3.30 avg=3.48
[8 | 25.88] loss=3.47 avg=3.48
[9 | 28.24] loss=3.37 avg=3.47
[10 | 30.62] loss=3.30 avg=3.45
[11 | 33.02] loss=3.48 avg=3.45
[12 | 35.44] loss=3.58 avg=3.46
[13 | 37.89] loss=3.24 avg=3.45
[14 | 40.34] loss=3.26 avg=3.43
[15 | 42.79] loss=3.18 avg=3.41
[16 | 45.27] loss=3.31 avg=3.41
[17 | 47.75] loss=3.39 avg=3.41
[18 | 50.23] loss=3.25 avg=3.40
[19 | 52.69] loss=3.16 avg=3.38
[20 | 55.14] loss=3.19 avg=3.37
[21 | 57.58] loss=3.12 avg=3.36
[22 | 60.01] loss=3.24 avg=3.35
[23 | 62.42] loss=3.36 avg=3.35
[24 | 64.81] loss=3.18 avg=3.34
[25 | 67.20] loss=3.16 avg=3.34
[26 | 69.58] loss=3.15 avg=3.33
[27 | 71.94] loss=3.20 avg=3.32
[28 | 74.29] loss=3.32 avg=3.32
[29 | 76.65] loss=3.08 avg=3.31
[30 | 79.00] loss=3.24 avg=3.31
[31 | 81.35]
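
Each complete line of the training log above has the shape `[step | time] loss=... avg=...`. The sketch below parses the first five lines, copied verbatim, and recomputes the `avg` column. It rests on two assumptions that the notebook does not confirm: that the second field is elapsed time, and that `avg` is a bias-corrected exponential moving average of the loss with decay 0.99. The second assumption is inferred only from the logged numbers, which it reproduces to within rounding. The names `LINE_RE`, `records`, and `DECAY` are hypothetical.

```python
import re

# First five lines copied from the training log above.
log_text = """\
[1 | 9.65] loss=3.57 avg=3.57
[2 | 11.94] loss=3.57 avg=3.57
[3 | 14.25] loss=3.63 avg=3.59
[4 | 16.55] loss=3.63 avg=3.60
[5 | 18.88] loss=3.24 avg=3.53
"""

# Matches "[step | time] loss=x avg=y"; incomplete lines are skipped.
LINE_RE = re.compile(r"\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)")

records = [
    (int(step), float(elapsed), float(loss), float(avg))
    for step, elapsed, loss, avg in LINE_RE.findall(log_text)
]

DECAY = 0.99  # assumed value, inferred from the logged numbers

# Bias-corrected exponential moving average: a decayed sum of losses
# divided by the matching decayed sum of weights.
weighted_sum, weight_total = 0.0, 0.0
recomputed = []
for _, _, loss, _ in records:
    weighted_sum = weighted_sum * DECAY + loss
    weight_total = weight_total * DECAY + 1.0
    recomputed.append(weighted_sum / weight_total)

for (step, _, loss, logged_avg), avg in zip(records, recomputed):
    print(f"step {step}: loss={loss:.2f} logged avg={logged_avg:.2f} recomputed avg={avg:.4f}")
```

Because the logged losses are themselves rounded to two decimals, the recomputed values can only be expected to agree with the logged `avg` to about 0.01.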

["You could get away without much trouble \nwith that.' \n\n'I am sure it is no comfort to have to worry about that for \nmany days,' said Frodo. 'If you could leave the Ring and leave it for \nmany years, you might at least get away without much trouble.' \n\n'I think we shall get away,' said Pippin. 'But there is another burden \non us. Let's get home before the weather changes.' \n\n'All ready to go?' said Frodo. 'We are hungry. What is to be \ndone?' \n\n'Much at least!' said Pippin. 'But we must think about what we are to do. \n\nThe hunt was on. The Elves had caught some great birds. We must hunt \nsome more. There is some dappled wood, and some strange things about it. \n\nIt might be better if we could go straight to it.' \n\n'Maybe not,' said Frodo. 'But this will be the end of all our planning \nand our trial of our strength. I think there are some strange things about this \nwood. It is full of queer things. Some of the Elves seem to have escaped \nto the North from the hole