This notebook (and the slides from lecture 8) will help you go straight from training a model in Colab to deploying it on a webpage with TensorFlow.js, without having to leave the browser.

Configure this notebook to work with your GitHub account by populating these fields.

In [1]:
!pip install tensorflowjs



In [0]:
import numpy as np
import urllib.request  # a bare "import urllib" does not guarantee urllib.request is loaded
import re

In [0]:
# your github username
USER_NAME = "fak2116" 

# the email associated with your commits
# (may not matter if you leave it as this)
USER_EMAIL = "fak2116@columbia.edu" 

# the personal access token you've created (see the lecture 8 slides for instructions);
# keep it secret: anyone who has it can act on your GitHub account within its scopes
TOKEN = "YOUR_TOKEN_HERE"

# site name
# for example, if my user_name is "foo", then this notebook will create
# a site at https://foo.github.io/hw4/
SITE_NAME = "hw4"
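Hard-coding a token in a notebook makes it easy to leak, for example when the notebook itself is pushed to GitHub. A minimal alternative sketch, assuming you store the token in an environment variable whose name (GITHUB_TOKEN) is our own choice, with getpass as a fallback prompt; read_token is a hypothetical helper, not part of the lecture code:

```python
import getpass
import os


def read_token(env_var="GITHUB_TOKEN"):
    """Return a GitHub token without hard-coding it in the notebook.

    env_var is a hypothetical variable name. If it is unset, prompt with
    getpass so the token is neither echoed nor saved in the cell output.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    return getpass.getpass("GitHub token: ")


# Usage in the notebook (assumed): TOKEN = read_token()
```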

Next, run this cell to configure git.

In [0]:
!git config --global user.email {USER_EMAIL}
!git config --global user.name  {USER_NAME}

Clone your GitHub pages repo (see the lecture 8 slides for instructions on how to create one).

In [0]:
import os
repo_path = USER_NAME + '.github.io'
if not os.path.exists(os.path.join(os.getcwd(), repo_path)):
  !git clone https://{USER_NAME}:{TOKEN}@github.com/{USER_NAME}/{USER_NAME}.github.io

In [5]:
os.chdir(repo_path)
!git pull

From https://github.com/fak2116/fak2116.github.io
   69c6942..1f2ed50  master     -> origin/master
Already up to date.


Create a folder for your site.

In [0]:
project_path = os.path.join(os.getcwd(), SITE_NAME)
if not os.path.exists(project_path): 
  os.mkdir(project_path)
os.chdir(project_path)

These paths will be used by the converter script.

In [0]:
# DO NOT MODIFY
MODEL_DIR = os.path.join(project_path, "model_js")
if not os.path.exists(MODEL_DIR):
  os.mkdir(MODEL_DIR)

As an example, we will download three books, split them into sentences, and vectorize them. (Check out https://www.gutenberg.org/ for plenty of free e-books.)

In [0]:
urls = ["https://www.gutenberg.org/files/58344/58344-0.txt", "https://www.gutenberg.org/files/11/11-0.txt", "https://www.gutenberg.org/files/98/98-0.txt"]
beginning = ["In the shade", "Alice was beginning", "It was the best"]
dest = "temp"

In [11]:
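labels = []
texts = []
test = []

for i, (url, begintext) in enumerate(zip(urls, beginning)):
  urllib.request.urlretrieve(url, dest)
  # Read each downloaded book once (the "-0.txt" Gutenberg files are UTF-8).
  with open(dest, encoding='utf-8') as f:
    text = f.read()
  print('Corpus length:', len(text))
  # Skip the Gutenberg header: start at the book's first words.
  beginindex = text.find(begintext)
  # Split into sentences at . ? or ! (plus any closing quotes/brackets).
  sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text[beginindex:])
  sentences = [s.replace("\n", ' ') for s in sentences]
  # First 1000 sentences for training, the next two held out for testing.
  subset = sentences[:1000]
  texts += subset
  labels += [i] * len(subset)
  test += sentences[1000:1002]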

Corpus length: 255017
Corpus length: 163817
Corpus length: 776697
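The re.split pattern above treats ".", "?" or "!", optionally followed by ASCII closing quotes or brackets and surrounding spaces, as a sentence boundary. A small check on made-up strings (the helper name split_sentences is ours) shows that a curly closing quote such as ’ is not in that set, so it ends up at the start of the next sentence, which is why some test sentences further down begin with ’ or ‘:

```python
import re

# Same pattern as the cell above: optional spaces, one of . ? !,
# any ASCII closing quotes/brackets, then optional spaces.
SENTENCE_END = r' *[\.\?!][\'"\)\]]* *'


def split_sentences(text):
    """Split text into sentences and flatten line breaks into spaces."""
    return [s.replace("\n", " ") for s in re.split(SENTENCE_END, text)]


print(split_sentences('Hello there. How are\nyou? "Fine!" Good'))
# ['Hello there', 'How are you', '"Fine', 'Good']
print(split_sentences('She said ‘Yes.’  Then left'))
# ['She said ‘Yes', '’  Then left']
```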


In [12]:
print(len(labels))
print(texts[999])
print(test[0])
print(len(test))

3000
 Worthless, it seemed to him, worthless and meaningless was the life he had been leading; nothing living, nothing that was in anyway beautiful or worth keeping had remained with him
He stood there alone and empty, like a castaway on the shore
6


In [0]:
x_train=texts
y_train=labels

In [0]:
#Shuffle the training examples

npx = np.array(x_train)
npy = np.array(y_train)
np.random.seed(0)
indices = np.arange(len(x_train))
np.random.shuffle(indices)
npx = npx[indices]
npy = npy[indices]
x_train = npx.tolist()
y_train = npy.tolist()
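The cell above indexes both arrays with one shared permutation, so every sentence keeps its label. An equivalent sketch using NumPy's newer Generator API; this is a swap-in technique and produces a different order than np.random.seed(0) plus np.random.shuffle. shuffle_together is a hypothetical helper:

```python
import numpy as np


def shuffle_together(xs, ys, seed=0):
    """Shuffle two parallel lists with the same random permutation."""
    assert len(xs) == len(ys)
    perm = np.random.default_rng(seed).permutation(len(xs))
    return [xs[i] for i in perm], [ys[i] for i in perm]


texts_demo = ["a", "b", "c", "d"]
labels_demo = [0, 1, 2, 3]
xs, ys = shuffle_together(texts_demo, labels_demo)
# Each text is still paired with its original label.
print(xs, ys)
```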

In [0]:
# Store back in texts and labels
texts = x_train
labels = y_train

Tokenize the documents and create a word index (word -> number).

In [20]:
max_len = 20
num_words = 1000
from keras.preprocessing.text import Tokenizer
# Fit the tokenizer on the training data
t = Tokenizer(num_words=num_words)
t.fit_on_texts(x_train)

Using TensorFlow backend.


In [21]:
print(t.word_index)



Here's how we vectorize a document.

In [22]:
vectorized = t.texts_to_sequences([texts[0]])
print(vectorized)

[[36, 265, 396, 24, 242, 859, 1, 587, 95, 8, 49, 2, 1, 103, 195, 215, 22, 35, 1, 4, 10, 356, 1, 4, 10, 95, 37, 37, 860, 37, 37, 74, 29, 199, 86, 216, 2, 356]]
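Conceptually, the Tokenizer lower-cases each text, turns punctuation into spaces, and ranks words by frequency (index 1 is the most common word; 0 is reserved for padding). texts_to_sequences then drops any word whose index is num_words or higher. Note that word_index itself keeps every word, so code that reuses it (such as the browser code further down) has to apply the same cut-off. The following is a simplified pure-Python re-implementation for illustration only, not Keras' own code; the helper names are ours:

```python
from collections import Counter

# Keras Tokenizer's default filters: these characters become separators.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
TABLE = str.maketrans({c: " " for c in FILTERS})


def to_words(text):
    """Lower-case, replace filtered characters with spaces, split."""
    return [w for w in text.lower().translate(TABLE).split(" ") if w]


def build_word_index(texts):
    counts = Counter()
    for text in texts:
        counts.update(to_words(text))
    # Most frequent first; ties keep first-seen order (sorted is stable).
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def to_sequences(texts, word_index, num_words):
    # Unknown words and words with index >= num_words are dropped.
    return [[word_index[w] for w in to_words(t)
             if w in word_index and word_index[w] < num_words]
            for t in texts]


corpus = ["The cat sat.", "The dog sat!", "The cat ran."]
index = build_word_index(corpus)
print(index)
# {'the': 1, 'cat': 2, 'sat': 3, 'dog': 4, 'ran': 5}
print(to_sequences(["The bird sat on the mat; the dog ran"], index, num_words=4))
# [[1, 3, 1, 1]]
```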


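Pad (or truncate) each sequence to exactly max_len tokens. pad_sequences truncates from the front by default (truncating='pre'), so only the last 20 tokens of the 38-token sentence above are kept; shorter sequences are filled with zeros at the end (padding='post').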

In [0]:
from keras.preprocessing.sequence import pad_sequences
padded = pad_sequences(vectorized, maxlen=max_len, padding='post')

In [24]:
print(padded)

[[  1   4  10 356   1   4  10  95  37  37 860  37  37  74  29 199  86 216
    2 356]]
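A minimal pure-Python sketch of the padding behaviour used here (padding='post' with Keras' default truncating='pre'); pad_post is our own helper, not part of Keras. Feeding it the 38-token sequence above reproduces the padded output:

```python
def pad_post(seq, maxlen, value=0):
    """Keep the last maxlen ids (truncating='pre'), then pad at the end."""
    kept = list(seq)[max(0, len(seq) - maxlen):]
    return kept + [value] * (maxlen - len(kept))


vectorized = [36, 265, 396, 24, 242, 859, 1, 587, 95, 8, 49, 2, 1, 103, 195,
              215, 22, 35, 1, 4, 10, 356, 1, 4, 10, 95, 37, 37, 860, 37, 37,
              74, 29, 199, 86, 216, 2, 356]
print(pad_post(vectorized, 20))
# [1, 4, 10, 356, 1, 4, 10, 95, 37, 37, 860, 37, 37, 74, 29, 199, 86, 216, 2, 356]
print(pad_post([90, 861, 777], 5))
# [90, 861, 777, 0, 0]
```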


We will save the word index in metadata. Later, we'll use it to convert words typed in the browser to numbers for prediction.

In [0]:
metadata = {
  'word_index': t.word_index,
  'max_len': max_len,
  'vocabulary_size': num_words,
}

Vectorize and pad all of the training sentences.

In [26]:
x_train = t.texts_to_sequences(x_train)
x_train = pad_sequences(x_train, maxlen=max_len, padding='post')
print(x_train)

[[  1   4  10 ... 216   2 356]
 [ 90 861 777 ...   0   0   0]
 [  7  13  21 ...  12  93 112]
 ...
 [314 233  77 ...   0   0   0]
 [192  17  21 ...   0   0   0]
 [ 17 182  63 ...   0   0   0]]


Define a model.

In [27]:
embedding_size = 64
n_classes = 3
epochs = 5

import keras
model = keras.Sequential()
# Map each word id to a dense vector; each input is a sequence of max_len ids.
model.add(keras.layers.Embedding(num_words, embedding_size, input_shape=(max_len,)))
# Two stacked LSTMs: the first returns the full sequence for the second.
model.add(keras.layers.LSTM(128, return_sequences=True))
model.add(keras.layers.LSTM(128))
# One softmax probability per book.
model.add(keras.layers.Dense(n_classes, activation='softmax'))
model.compile('adam', 'sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 20, 64)            64000     
_________________________________________________________________
lstm_1 (LSTM)                (None, 20, 128)           98816     
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 387       
Total params: 294,787
Trainable params: 294,787
Non-trainable params: 0
_________________________________________________________________
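The parameter counts in the summary follow from the layer sizes: an Embedding layer has vocab x dim weights, an LSTM has four gates, each with input weights, recurrent weights and a bias, and a Dense layer has in x out + out. A quick arithmetic check (the helper names are ours):

```python
def embedding_params(vocab, dim):
    return vocab * dim


def lstm_params(input_dim, units):
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


layers = [
    embedding_params(1000, 64),  # num_words, embedding_size
    lstm_params(64, 128),        # first LSTM reads 64-d embeddings
    lstm_params(128, 128),       # second LSTM reads 128-d outputs
    dense_params(128, 3),        # n_classes outputs
]
print(layers, sum(layers))
# [64000, 98816, 131584, 387] 294787
```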


In [28]:
model.fit(x_train, y_train, epochs=epochs, validation_split=.1)

Train on 2700 samples, validate on 300 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fd741c08ef0>
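Keras' validation_split holds out the last fraction of the arrays, selected before fit does any shuffling. That is why the examples were shuffled earlier: unshuffled, all 300 validation sentences would come from the third book. A sketch of that split rule; split_like_keras is our own helper mirroring the int(len(x) * (1 - validation_split)) cut point:

```python
def split_like_keras(xs, ys, validation_split):
    """Hold out the last validation_split fraction, as Keras does."""
    split_at = int(len(xs) * (1.0 - validation_split))
    return (xs[:split_at], ys[:split_at]), (xs[split_at:], ys[split_at:])


unshuffled = [0] * 1000 + [1] * 1000 + [2] * 1000
(train_x, train_y), (val_x, val_y) = split_like_keras(
    list(range(3000)), unshuffled, 0.1)
print(len(train_y), len(val_y), set(val_y))
# 2700 300 {2}
```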

Use the model to classify a new sentence.

In [53]:
test_example = "As Siddhartha left the grove where the buddha, the perfect one, remained behind, where Govinda remained behind, he felt that he was also leaving behind his life so far, that it was separating itself from him."
x_test = t.texts_to_sequences([test_example])
x_test = pad_sequences(x_test, maxlen=max_len, padding='post')
print(x_test)
preds = model.predict(x_test)
print(preds)
print(np.argmax(preds))

[[250 255   8 154  11   8   9 519 864 255  10 136  36 336  11   7   9 292
   35  22]]
[[9.9995959e-01 1.3686732e-05 2.6658401e-05]]
0
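np.argmax returns the index of the highest probability, which is the label assigned in the download loop (0, 1 or 2, in the order of urls). A small sketch mapping it back to a book; the BOOKS list holds short titles of our own choosing, and the probabilities are the ones printed above:

```python
import numpy as np

# Hypothetical label names, in the same order as urls/beginning.
BOOKS = ["Siddhartha", "Alice's Adventures in Wonderland", "A Tale of Two Cities"]


def label_to_book(preds):
    """Map a (1, n_classes) probability array to (label, title, probability)."""
    label = int(np.argmax(preds[0]))
    return label, BOOKS[label], float(preds[0][label])


preds = np.array([[9.9995959e-01, 1.3686732e-05, 2.6658401e-05]])
print(label_to_book(preds))
```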


Test the model on the held-out test sentences (two per book).

In [30]:
for ex in test:
  print(ex)
  x_test = t.texts_to_sequences([ex])
  x_test = pad_sequences(x_test, maxlen=max_len, padding='post')
  print(x_test)
  preds = model.predict(x_test)
  print(preds)
  print(np.argmax(preds))

He stood there alone and empty, like a castaway on the shore
[[  8 209  41 305   2 855  53   5  24   1   0   0   0   0   0   0   0   0
    0   0]]
[[0.8461247  0.03898405 0.11489121]]
0
  In low spirits, Siddhartha betook himself to one of the pleasure gardens he owned, he locked the gate, sat down under a mango tree, felt the death in his heart and bleakness in his heart, he sat and felt how something in him was dying, wilting, coming to its end
[[  6  10 188   2   6  10 188   8 217   2 154  90 127   6  22   9 547   3
   70 277]]
[[0.9939143  0.00149519 0.00459053]]
0
’  The soldiers were silent, and looked at Alice, as the question was evidently meant for her
[[ 13   1 981  39 323   2  83  23  34  18   1 301   9  20  26   0   0   0
    0   0]]
[[7.0537822e-06 9.9996901e-01 2.3927287e-05]]
1
  ‘Yes
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[0.09965076 0.5978687  0.30248055]]
1
Your bank-notes had a musty odour, as if they were fast decomposing into rags again
[[ 73 317  15   5  18 
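To summarize the loop above with a single number, compare the argmax predictions with the expected labels: test holds two sentences per book, in book order, so the labels are [0, 0, 1, 1, 2, 2]. With only six sentences this is a very noisy estimate. A sketch with a helper of our own; the commented usage assumes the notebook's t, max_len and model:

```python
import numpy as np


def accuracy(probs, labels):
    """Fraction of rows whose argmax matches the expected label."""
    predicted = np.argmax(np.asarray(probs), axis=1)
    return float(np.mean(predicted == np.asarray(labels)))


test_labels = [i for i in range(3) for _ in range(2)]  # [0, 0, 1, 1, 2, 2]
# In the notebook (assumed usage):
#   x = pad_sequences(t.texts_to_sequences(test), maxlen=max_len, padding='post')
#   print(accuracy(model.predict(x), test_labels))
```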

Convert the model to the TensorFlow.js format and save the metadata next to it.

In [55]:
import json
import tensorflowjs as tfjs

# Save the word index etc. next to the model so the page can load both.
metadata_json_path = os.path.join(MODEL_DIR, 'metadata.json')
with open(metadata_json_path, 'wt') as f:
  json.dump(metadata, f)
tfjs.converters.save_keras_model(model, MODEL_DIR)
print('\nSaved model artifacts in directory: %s' % MODEL_DIR)


Saved model artifacts in directory: /content/fak2116.github.io/hw4/model_js
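The converter writes a model.json (the architecture plus a weightsManifest listing the binary weight shard files) alongside the shards themselves. Before pushing, checking that every file the page will request is actually present can save a debugging round trip. A sketch, assuming that model.json layout; missing_artifacts is our own helper:

```python
import json
import os


def missing_artifacts(model_dir):
    """Return files the page expects in model_dir that do not exist."""
    expected = ["model.json", "metadata.json"]
    model_json = os.path.join(model_dir, "model.json")
    if os.path.exists(model_json):
        with open(model_json) as f:
            manifest = json.load(f).get("weightsManifest", [])
        # Each manifest group lists the shard files its weights were split into.
        for group in manifest:
            expected.extend(group.get("paths", []))
    return [name for name in expected
            if not os.path.exists(os.path.join(model_dir, name))]


# In the notebook (assumed usage): print(missing_artifacts(MODEL_DIR))
```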


Write an index.html and an index.js file configured to load our model.

In [0]:

index_html = """
<!doctype html>

<body>
  <style>
    #text-entry {
      font-size: 120%;
      width: 60%;
      height: 200px;
    }
  </style>
  <h1>
    HW 4 - Classification 
  </h1>
  <hr>
  <div class="create-model">
    <button id="load-model" style="display:none">Load model</button>
  </div>
  <div>
    <div>
      <span>Vocabulary size: </span>
      <span id="vocabularySize"></span>
    </div>
    <div>
      <span>Max length: </span>
      <span id="maxLen"></span>
    </div>
  </div>
  <hr>
  <div>
    <select id="example-select" class="form-control">
      <option value="example1">SIDDHARTHA: A poem of India</option>
      <option value="example2">Alice's Adventures in Wonderland</option>
      <option value="example3">A Tale of Two Cities</option>
    </select>
  </div>
  <div>
    <textarea id="text-entry"></textarea>
  </div>
  <hr>
  <div>
    <span id="status">Standing by.</span>
  </div>

  <script src='https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js'></script>
  <script src='index.js'></script>
</body>
"""

In [0]:
index_js = """
const HOSTED_URLS = {
  model:
      'model_js/model.json',
  metadata:
      'model_js/metadata.json'
};

const examples = {
  'example1':
      'He stood there alone and empty, like a castaway on the shore',
  'example2':
      'Alice was beginning to get very tired of sitting by her sister on the bank',
  'example3':
      'There were a king with a large jaw and a queen with a plain face'      
};

function status(statusText) {
  console.log(statusText);
  document.getElementById('status').textContent = statusText;
}

function showMetadata(metadataJSON) {
  document.getElementById('vocabularySize').textContent =
      metadataJSON['vocabulary_size'];
  document.getElementById('maxLen').textContent =
      metadataJSON['max_len'];
}

function settextField(text, predict) {
  const textField = document.getElementById('text-entry');
  textField.value = text;
  doPredict(predict);
}

function setPredictFunction(predict) {
  const textField = document.getElementById('text-entry');
  textField.addEventListener('input', () => doPredict(predict));
}

function disableLoadModelButtons() {
  document.getElementById('load-model').style.display = 'none';
}

function doPredict(predict) {
  const textField = document.getElementById('text-entry');
  const result = predict(textField.value);
  let scoreString = 'Class scores: ';
  // Index of the highest-scoring class (0, 1 or 2, in training-label order).
  let maxLabel = 0;
  for (let i = 0; i < result.score.length; ++i) {
    scoreString += i + ' -> ' + result.score[i].toFixed(3) + ', ';
    if (result.score[i] > result.score[maxLabel]) {
      maxLabel = i;
    }
  }
  console.log('MaxLabel: ' + maxLabel);
  status(scoreString + 'predicted class: ' + maxLabel +
      ' (elapsed: ' + result.elapsed.toFixed(3) + ' ms)');
}

function prepUI(predict) {
  setPredictFunction(predict);
  const testExampleSelect = document.getElementById('example-select');
  testExampleSelect.addEventListener('change', () => {
    settextField(examples[testExampleSelect.value], predict);
  });
  settextField(examples['example1'], predict);
}

async function urlExists(url) {
  status('Testing url ' + url);
  try {
    const response = await fetch(url, {method: 'HEAD'});
    return response.ok;
  } catch (err) {
    return false;
  }
}

async function loadHostedPretrainedModel(url) {
  status('Loading pretrained model from ' + url);
  try {
    const model = await tf.loadLayersModel(url);
    status('Done loading pretrained model.');
    disableLoadModelButtons();
    return model;
  } catch (err) {
    console.error(err);
    status('Loading pretrained model failed.');
  }
}

async function loadHostedMetadata(url) {
  status('Loading metadata from ' + url);
  try {
    const metadataJson = await fetch(url);
    const metadata = await metadataJson.json();
    status('Done loading metadata.');
    return metadata;
  } catch (err) {
    console.error(err);
    status('Loading metadata failed.');
  }
}

class Classifier {

  async init(urls) {
    this.urls = urls;
    this.model = await loadHostedPretrainedModel(urls.model);
    await this.loadMetadata();
    return this;
  }

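  async loadMetadata() {
    const metadata =
        await loadHostedMetadata(this.urls.metadata);
    showMetadata(metadata);
    this.maxLen = metadata['max_len'];
    console.log('maxLen = ' + this.maxLen);
    this.wordIndex = metadata['word_index'];
    this.vocabularySize = metadata['vocabulary_size'];
  }

  predict(text) {
    // Lower-case, then turn every run of characters other than letters,
    // digits and apostrophes into a space (a rough approximation of the
    // Keras Tokenizer's default filters).
    const words = text.toLowerCase().replace(/[^a-z0-9']+/g, ' ')
        .split(' ').filter(w => w.length > 0);
    // Look up word indices, skipping unknown words and words outside the
    // first vocabulary_size entries, as texts_to_sequences does in Python.
    const ids = words.map(w => this.wordIndex[w])
        .filter(id => id !== undefined && id < this.vocabularySize);
    // Mirror pad_sequences(..., padding='post'): keep the last maxLen ids
    // (truncating='pre' is the default) and leave zeros after them.
    const kept = ids.slice(Math.max(0, ids.length - this.maxLen));
    const inputBuffer = tf.buffer([1, this.maxLen], 'float32');
    for (let i = 0; i < kept.length; ++i) {
      inputBuffer.set(kept[i], 0, i);
    }
    const input = inputBuffer.toTensor();

    status('Running inference');
    const beginMs = performance.now();
    const predictOut = this.model.predict(input);
    const score = predictOut.dataSync();
    // Free the tensors; tf.js does not garbage-collect GPU memory.
    predictOut.dispose();
    input.dispose();
    const endMs = performance.now();

    return {score: score, elapsed: (endMs - beginMs)};
  }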
};

async function setup() {
  if (await urlExists(HOSTED_URLS.model)) {
    status('Model available: ' + HOSTED_URLS.model);
    const button = document.getElementById('load-model');
    button.addEventListener('click', async () => {
      const predictor = await new Classifier().init(HOSTED_URLS);
      prepUI(x => predictor.predict(x));
    });
    button.style.display = 'inline-block';
  }

  status('Standing by.');
}

setup();
"""

In [0]:
with open('index.html','w') as f:
  f.write(index_html)
  
with open('index.js','w') as f:
  f.write(index_js)

In [59]:
!ls

index.html  index.js  model_js	temp


Commit and push everything. Note: we're storing large binary files in GitHub, which isn't ideal; if you want to deploy a model down the road, it's better to host it in a cloud storage bucket.
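To see how much you are about to push, you can total up the file sizes. For scale, the model summary reports 294,787 parameters; stored as 32-bit floats that is 294,787 x 4 = 1,179,148 bytes, roughly 1.2 MB of weight shards. A sketch with a helper of our own:

```python
import os


def dir_size_mb(path):
    """Total size of all files under path, in megabytes (10**6 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e6


print(294787 * 4 / 1e6)  # float32 weights only, in MB
# In the notebook (assumed usage): print(dir_size_mb(MODEL_DIR))
```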

In [0]:
pythondir = os.path.join(project_path, "python")
if not os.path.exists(pythondir):
  os.mkdir(pythondir)
os.chdir(pythondir)

In [0]:
!touch 8_model.ipynb

In [78]:
!git add . 
!git commit -m "colab -> github"
!git push https://{USER_NAME}:{TOKEN}@github.com/{USER_NAME}/{USER_NAME}.github.io/ master

[master 80fc4cc] colab -> github
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 hw4/python/8_model.ipynb
Counting objects: 5, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (5/5), 394 bytes | 394.00 KiB/s, done.
Total 5 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/fak2116/fak2116.github.io/
   b9948dd..80fc4cc  master -> master


All done! Hopefully everything worked. You may need to wait a few moments for the changes to appear on your site. If it isn't working, check the JavaScript console for errors (in Chrome: View -> Developer -> JavaScript Console).

In [61]:
print("Now, visit https://%s.github.io/%s/" % (USER_NAME, SITE_NAME))

Now, visit https://fak2116.github.io/hw4/


If you are debugging and Chrome is failing to pick up your changes even though you've verified they're present in your GitHub repo, see the second answer to: https://superuser.com/questions/89809/how-to-force-refresh-without-cache-in-google-chrome