# Prepare Top2Vec Models for Deployment

Let's try to make a couple of models to deploy, using these embedding models:
- universal-sentence-encoder
- universal-sentence-encoder-large

Each one is trained on a couple of columns from health_tech.csv:
- article_clean (only articles with more than 200 words)
- title_clean
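
The plan is a 2 × 2 grid (two embedding models times two text columns), so four models in total. A minimal sketch that enumerates that grid together with the save paths this notebook uses further down; the `SHORT_NAMES` table and the `training_plan` helper are hypothetical conveniences, not part of Top2Vec:

```python
from itertools import product

# Hypothetical abbreviation table matching the .t2v file names used in this notebook.
SHORT_NAMES = {
    "universal-sentence-encoder": "univ-sent-enco",
    "universal-sentence-encoder-large": "univ-sent-enco-larg",
}
DOC_SETS = ["articles", "titles"]


def training_plan(embedding_models=SHORT_NAMES, doc_sets=DOC_SETS):
    """Return (doc_set, embedding_model, save_path) for every combination."""
    plan = []
    # Iterating a dict yields its keys, i.e. the full embedding model names.
    for model_name, doc_set in product(embedding_models, doc_sets):
        path = f"{doc_set}.{embedding_models[model_name]}.t2v"
        plan.append((doc_set, model_name, path))
    return plan


for doc_set, model_name, path in training_plan():
    print(f"{doc_set:<9} {model_name:<33} -> {path}")
```

Each tuple corresponds to one `Top2Vec(docs, embedding_model=...)` call followed by `.save(path)`, which is the sequence of training cells below.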

The full list of embedding models Top2Vec supports is in its source: https://github.com/ddangelov/Top2Vec/blob/d625b507aa18a921b7d0a3710d1a4c176f9b8f84/top2vec/Top2Vec.py#L51

This notebook is copied from the Top2Vec Explorations notebook https://colab.research.google.com/drive/1BqLeOZG9wPcmO7Z9LwpxoZ8TE69M5zQn


## Installs, mounts, and imports

In [2]:
!pip install top2vec



In [3]:
!pip install "top2vec[sentence_encoders]"



In [4]:
from google.colab import drive
import os
drive.mount('/content/drive')
os.chdir('/content/drive/My Drive/Colab Notebooks/fourthbrain/Week 13')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
import numpy as np
import pandas as pd
import json
import ipywidgets as widgets
from IPython.display import clear_output, display
from top2vec import Top2Vec
import matplotlib.pyplot as plt
pd.set_option('display.max_colwidth', 140)

## Load `health_tech.csv` and prepare train sets

Prepare 2 sets of docs:
- article_clean > 200 words
- title_clean

In [6]:
df = pd.read_csv('health_tech.csv')

In [7]:
docs_article = df.loc[df.num_words_per_article>200].article_clean.tolist()

In [8]:
docs_title = df.title_clean.tolist()
docs_title = [x for x in docs_title if isinstance(x, str)]  # drop missing titles, which pandas reads as float NaN
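
The two filters do different jobs: the articles filter is a strict `> 200` on the precomputed `num_words_per_article` column, while the titles filter drops missing values, which pandas stores as float `NaN` in a text column. A stdlib-only sketch of the same two rules on a few made-up rows (the sample data and the `prepare_docs` helper are hypothetical):

```python
# Made-up rows standing in for health_tech.csv; a missing title is float("nan"),
# which is how pandas represents an empty cell in a text column.
rows = [
    {"title_clean": "ai triage tools", "article_clean": "word " * 250, "num_words_per_article": 250},
    {"title_clean": float("nan"), "article_clean": "word " * 201, "num_words_per_article": 201},
    {"title_clean": "remote monitoring", "article_clean": "word " * 200, "num_words_per_article": 200},
]


def prepare_docs(rows, min_words=200):
    """Apply the notebook's two selection rules and return (articles, titles)."""
    # Strictly more than `min_words`: a 200-word article is excluded.
    articles = [r["article_clean"] for r in rows if r["num_words_per_article"] > min_words]
    # Keep only real strings; NaN placeholders are floats and get dropped.
    titles = [r["title_clean"] for r in rows if isinstance(r["title_clean"], str)]
    return articles, titles


articles, titles = prepare_docs(rows)
print(len(articles), len(titles))  # 2 2
```

Because each filter drops a different set of rows, the two training sets end up with different sizes, as the counts below show.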

In [9]:
print('Articles:', len(docs_article))
print('Titles:', len(docs_title))

Articles: 85724
Titles: 129679


## Train `universal-sentence-encoder` with articles (~10 min)

In [None]:
model_articles = Top2Vec(docs_article, embedding_model='universal-sentence-encoder')

2022-05-11 07:14:33,455 - top2vec - INFO - Pre-processing documents for training
2022-05-11 07:17:35,503 - top2vec - INFO - Downloading universal-sentence-encoder model
2022-05-11 07:17:52,202 - top2vec - INFO - Creating joint document/word embedding
2022-05-11 07:22:34,070 - top2vec - INFO - Creating lower dimension embedding of documents
2022-05-11 07:24:12,432 - top2vec - INFO - Finding dense areas of documents
2022-05-11 07:24:17,348 - top2vec - INFO - Finding topics
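
Top2Vec logs one line when each stage starts, so the timestamps give a rough breakdown of the ~10 minutes. A small sketch that parses those log lines (copied from the output above) with `datetime.strptime`; `stage_durations` is a hypothetical helper, and each stage is assumed to last until the next one is logged, so the final "Finding topics" stage has no measured end here:

```python
from datetime import datetime

# The top2vec log lines from the articles run, copied from the cell output above.
LOG = """\
2022-05-11 07:14:33,455 - top2vec - INFO - Pre-processing documents for training
2022-05-11 07:17:35,503 - top2vec - INFO - Downloading universal-sentence-encoder model
2022-05-11 07:17:52,202 - top2vec - INFO - Creating joint document/word embedding
2022-05-11 07:22:34,070 - top2vec - INFO - Creating lower dimension embedding of documents
2022-05-11 07:24:12,432 - top2vec - INFO - Finding dense areas of documents
2022-05-11 07:24:17,348 - top2vec - INFO - Finding topics
"""


def stage_durations(log_text):
    """Return [(stage, seconds)]; a stage lasts until the next logged stage starts."""
    events = []
    for line in log_text.splitlines():
        # Format: "<timestamp> - <logger> - <level> - <message>"
        stamp, _, _, message = line.split(" - ", 3)
        events.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"), message))
    # Pair each event with the next one; the last event has no end time.
    return [
        (message, (next_time - time).total_seconds())
        for (time, message), (next_time, _) in zip(events, events[1:])
    ]


for stage, seconds in stage_durations(LOG):
    print(f"{seconds:7.1f} s  {stage}")
```

For this run the joint embedding stage dominates at about 282 s (4.7 min), followed by pre-processing at about 182 s and the lower-dimension embedding (Top2Vec uses UMAP for this step) at about 98 s.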


In [None]:
model_articles.save('articles.univ-sent-enco.t2v')

## Train `universal-sentence-encoder` with titles (~3 min)

In [None]:
model_titles = Top2Vec(docs_title, embedding_model='universal-sentence-encoder')

2022-05-11 07:30:45,746 - top2vec - INFO - Pre-processing documents for training
2022-05-11 07:30:52,310 - top2vec - INFO - Downloading universal-sentence-encoder model
2022-05-11 07:30:56,898 - top2vec - INFO - Creating joint document/word embedding
2022-05-11 07:31:26,693 - top2vec - INFO - Creating lower dimension embedding of documents
2022-05-11 07:33:32,650 - top2vec - INFO - Finding dense areas of documents
2022-05-11 07:33:40,514 - top2vec - INFO - Finding topics


In [None]:
model_titles.save('titles.univ-sent-enco.t2v')

## Train `universal-sentence-encoder-large` with articles (Crashed after 13 min)



### Here's the log

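The kernel log below lists the newest entries first. The runtime had no GPU (`CUDA_ERROR_NO_DEVICE`), so TensorFlow ran the encoder on the CPU, and the `tcmalloc: large alloc` requests range from about 3.2 GB up to a single 18.5 GB request shortly before the kernel restarts at 10:45:36 AM. That points to running out of system RAM, so I'll try again with a high-RAM runtime.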

|Timestamp|Level|Message|
|---|---|---|
|May 11, 2022, 10:45:36 AM|WARNING|WARNING:root:kernel b177b925-1125-4021-83fc-f8b07fd89f7b restarted|
|May 11, 2022, 10:45:36 AM|INFO|KernelRestarter: restarting kernel \(1/5), keep random ports|
|May 11, 2022, 10:45:21 AM|WARNING|tcmalloc: large alloc 18504712192 bytes == 0x38f11e000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:43:50 AM|WARNING|tcmalloc: large alloc 8845033472 bytes == 0x10bb78000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:43:39 AM|WARNING|tcmalloc: large alloc 8845033472 bytes == 0x10ee92000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:43:27 AM|WARNING|tcmalloc: large alloc 8845033472 bytes == 0x10ee92000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:43:16 AM|WARNING|tcmalloc: large alloc 8845033472 bytes == 0x10bb78000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:43:04 AM|WARNING|tcmalloc: large alloc 8845033472 bytes == 0x11076c000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:42:53 AM|WARNING|tcmalloc: large alloc 8845033472 bytes == 0x11076c000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:39:23 AM|WARNING|tcmalloc: large alloc 10231734272 bytes == 0x113c18000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:39:10 AM|WARNING|tcmalloc: large alloc 10231734272 bytes == 0x113c18000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:38:57 AM|WARNING|tcmalloc: large alloc 10231734272 bytes == 0x113c4c000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:38:44 AM|WARNING|tcmalloc: large alloc 10231734272 bytes == 0x113c4c000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:38:31 AM|WARNING|tcmalloc: large alloc 10231734272 bytes == 0x1155a4000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:38:17 AM|WARNING|tcmalloc: large alloc 10231734272 bytes == 0x1155a4000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:38:06 AM|WARNING|tcmalloc: large alloc 7857053696 bytes == 0x10ccf0000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:37:55 AM|WARNING|tcmalloc: large alloc 7857053696 bytes == 0x10ccf0000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:37:45 AM|WARNING|tcmalloc: large alloc 7857053696 bytes == 0x10ccf0000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:37:35 AM|WARNING|tcmalloc: large alloc 7857053696 bytes == 0x10ccf0000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:37:24 AM|WARNING|tcmalloc: large alloc 7857053696 bytes == 0x10e6d4000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:37:11 AM|WARNING|tcmalloc: large alloc 7857053696 bytes == 0x10b192000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:37:05 AM|WARNING|tcmalloc: large alloc 3240804352 bytes == 0xdfe04000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:37:00 AM|WARNING|tcmalloc: large alloc 3240804352 bytes == 0xedc64000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:37:00 AM|WARNING|2022-05-11 17:37:00\.478254: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 3240797184 exceeds 10% of free system memory.|
|May 11, 2022, 10:36:54 AM|WARNING|tcmalloc: large alloc 3240804352 bytes == 0xedc64000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:36:54 AM|WARNING|2022-05-11 17:36:54\.900567: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 3240797184 exceeds 10% of free system memory.|
|May 11, 2022, 10:36:48 AM|WARNING|tcmalloc: large alloc 3240804352 bytes == 0xe6d34000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:36:48 AM|WARNING|2022-05-11 17:36:48\.962790: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 3240797184 exceeds 10% of free system memory.|
|May 11, 2022, 10:36:43 AM|WARNING|tcmalloc: large alloc 3240804352 bytes == 0xedc64000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:36:43 AM|WARNING|2022-05-11 17:36:43\.702054: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 3240797184 exceeds 10% of free system memory.|
|May 11, 2022, 10:36:37 AM|WARNING|tcmalloc: large alloc 3240804352 bytes == 0xf5ce0000 @  0x7f756abafb6b 0x7f756abcf379 0x7f74f8f99257 0x7f74e742330f 0x7f74e74bfd2b 0x7f74e72d3d97 0x7f74e72d4600 0x7f74e72d4708 0x7f74f2675c43 0x7f74e7666228 0x7f74e75f8ea3 0x7f74ed02cce1 0x7f74ed0299a3 0x7f74e7d1e8d5 0x7f756a5916db 0x7f756a8ca61f|
|May 11, 2022, 10:36:37 AM|WARNING|2022-05-11 17:36:37\.062152: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 3240797184 exceeds 10% of free system memory.|
|May 11, 2022, 10:35:38 AM|WARNING|2022-05-11 17:35:38\.651029: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected|
|May 11, 2022, 10:28:53 AM|INFO|Adapting to protocol v5\.1 for kernel b177b925-1125-4021-83fc-f8b07fd89f7b|
|May 11, 2022, 10:28:51 AM|INFO|Kernel started: b177b925-1125-4021-83fc-f8b07fd89f7b|
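
The `tcmalloc: large alloc N bytes` rows are the useful signal in this log. A short sketch that pulls those byte counts out of the log text with a regular expression and reports them in gigabytes (10^9 bytes); only a few rows are pasted in, with the stack addresses trimmed, and `large_allocs` is a hypothetical helper:

```python
import re

# A few rows from the kernel log above (stack addresses trimmed), newest first as in the table.
LOG_ROWS = """\
|May 11, 2022, 10:45:21 AM|WARNING|tcmalloc: large alloc 18504712192 bytes == 0x38f11e000 @ ...|
|May 11, 2022, 10:43:50 AM|WARNING|tcmalloc: large alloc 8845033472 bytes == 0x10bb78000 @ ...|
|May 11, 2022, 10:39:23 AM|WARNING|tcmalloc: large alloc 10231734272 bytes == 0x113c18000 @ ...|
|May 11, 2022, 10:38:06 AM|WARNING|tcmalloc: large alloc 7857053696 bytes == 0x10ccf0000 @ ...|
|May 11, 2022, 10:37:05 AM|WARNING|tcmalloc: large alloc 3240804352 bytes == 0xdfe04000 @ ...|
"""

ALLOC_RE = re.compile(r"tcmalloc: large alloc (\d+) bytes")


def large_allocs(text):
    """Return every tcmalloc 'large alloc' size found in `text`, in bytes."""
    return [int(n) for n in ALLOC_RE.findall(text)]


sizes = large_allocs(LOG_ROWS)
peak = max(sizes)
print(f"{len(sizes)} large allocations, peak {peak / 1e9:.1f} GB")
# 5 large allocations, peak 18.5 GB
```

The largest request comes about 15 seconds before the restart at 10:45:36 AM, which is consistent with the kernel being killed for lack of RAM rather than failing inside Top2Vec itself.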

### Trying again with high memory


In [1]:
modellarge_articles = Top2Vec(docs_article, embedding_model='universal-sentence-encoder-large')

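NameError: ignored

That `NameError` is expected at this point: the high-RAM runtime started a fresh kernel (note the cell counter is back at `In [1]`), so nothing defined by earlier cells exists, including the `Top2Vec` import and `docs_article`. Rerun the setup and data-loading cells first; a new runtime most likely needs the `pip install` cells rerun as well.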
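
Because a runtime change throws away the kernel state, a cheap guard at the top of a long training cell can fail fast with a message that lists every missing name. A minimal sketch; `REQUIRED_NAMES`, `missing_names`, and `check_ready` are hypothetical helpers, not part of Top2Vec or Colab:

```python
# Names the large-model training cell depends on, defined by earlier cells.
REQUIRED_NAMES = ["Top2Vec", "docs_article"]


def missing_names(required, namespace):
    """Return the names in `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


def check_ready(namespace, required=REQUIRED_NAMES):
    """Raise a readable error if the setup cells have not been run in this kernel."""
    missing = missing_names(required, namespace)
    if missing:
        raise NameError(
            "re-run the install/import and data-loading cells first; "
            f"undefined: {', '.join(missing)}"
        )


# In the notebook, call this as the first line of the training cell:
#     check_ready(globals())
```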

In [None]:
modellarge_articles.save('articles.univ-sent-enco-larg.t2v')

## Train `universal-sentence-encoder-large` with titles (~?? min)

In [None]:
modellarge_titles = Top2Vec(docs_title, embedding_model='universal-sentence-encoder-large')

2022-05-11 19:17:14,362 - top2vec - INFO - Pre-processing documents for training
2022-05-11 19:17:19,034 - top2vec - INFO - Downloading universal-sentence-encoder-large model
2022-05-11 19:17:36,894 - top2vec - INFO - Creating joint document/word embedding
2022-05-11 19:19:36,695 - top2vec - INFO - Creating lower dimension embedding of documents


In [None]:
modellarge_titles.save('titles.univ-sent-enco-larg.t2v')

### Here's the log

This run crashed too. Newest entries are at the top.

|Timestamp|Level|Message|
|---|---|---|
|May 11, 2022, 12:19:41 PM|WARNING|WARNING:root:kernel e1696367-906c-4f18-8203-574a481e44af restarted|
|May 11, 2022, 12:19:41 PM|INFO|KernelRestarter: restarting kernel \(1/5), keep random ports|
|May 11, 2022, 12:17:22 PM|WARNING|2022-05-11 19:17:22\.909661: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.|
|May 11, 2022, 12:07:45 PM|INFO|Adapting to protocol v5\.1 for kernel e1696367-906c-4f18-8203-574a481e44af|
|May 11, 2022, 11:52:50 AM|WARNING|WARNING:root:kernel e1696367-906c-4f18-8203-574a481e44af restarted|
|May 11, 2022, 11:52:50 AM|INFO|KernelRestarter: restarting kernel \(1/5), keep random ports|
|May 11, 2022, 11:52:48 AM|WARNING|2022-05-11 18:52:48\.287318: F tensorflow/core/common_runtime/device/device_event_mgr.cc:221] Unexpected Event status: 1|
|May 11, 2022, 11:52:48 AM|WARNING|2022-05-11 18:52:48\.287223: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered|
|May 11, 2022, 11:52:47 AM|WARNING|2022-05-11 18:52:47\.321228: W tensorflow/core/common_runtime/bfc_allocator.cc:343] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.|
|May 11, 2022, 11:52:17 AM|WARNING|2022-05-11 18:52:17\.226814: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.|
|May 11, 2022, 11:38:09 AM|INFO|Adapting to protocol v5\.1 for kernel e1696367-906c-4f18-8203-574a481e44af|
|May 11, 2022, 11:38:08 AM|INFO|Kernel started: e1696367-906c-4f18-8203-574a481e44af|
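
The kernel log uses local time while the top2vec and TensorFlow lines use UTC, so lining up the crash with the training stages takes a time-zone conversion. A sketch under one stated assumption: local time is UTC-7, read off the first log, where the row stamped 10:37:00 AM carries a TensorFlow message stamped 17:37:00. The `parse_top2vec` and `parse_kernel_log` helpers are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Assumption: kernel-log timestamps are local time at UTC-7 (see lead-in).
LOCAL = timezone(timedelta(hours=-7))


def parse_top2vec(stamp):
    """Parse a top2vec log timestamp, assumed to be UTC like the TensorFlow lines."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=timezone.utc)


def parse_kernel_log(stamp):
    """Parse a timestamp from the kernel-log table, assumed to be UTC-7."""
    return datetime.strptime(stamp, "%b %d, %Y, %I:%M:%S %p").replace(tzinfo=LOCAL)


start = parse_top2vec("2022-05-11 19:17:14,362")         # Pre-processing documents
last_stage = parse_top2vec("2022-05-11 19:19:36,695")    # Creating lower dimension embedding
restart = parse_kernel_log("May 11, 2022, 12:19:41 PM")  # kernel restarted

# Subtracting aware datetimes compares the underlying UTC instants.
print(f"restart {(restart - start).total_seconds():.0f} s after training started")
print(f"restart {(restart - last_stage).total_seconds():.0f} s after the last logged stage")
```

By this reading the titles run died about 2.5 minutes in, a few seconds into the lower-dimension (UMAP) step. The log does not say why; the earlier 11:52 AM failure was a different, GPU-side error (`CUDA_ERROR_ILLEGAL_ADDRESS`).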

In [None]:
!pip freeze

absl-py==1.0.0
alabaster==0.7.12
albumentations==0.1.12
altair==4.2.0
appdirs==1.4.4
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arviz==0.12.0
astor==0.8.1
astropy==4.3.1
astunparse==1.6.3
atari-py==0.2.9
atomicwrites==1.4.0
attrs==21.4.0
audioread==2.1.9
autograd==1.4
Babel==2.10.1
backcall==0.2.0
beautifulsoup4==4.6.3
bleach==5.0.0
blis==0.4.1
bokeh==2.3.3
Bottleneck==1.3.4
branca==0.5.0
bs4==0.0.1
CacheControl==0.12.11
cached-property==1.5.2
cachetools==4.2.4
catalogue==1.0.0
certifi==2021.10.8
cffi==1.15.0
cftime==1.6.0
chardet==3.0.4
charset-normalizer==2.0.12
click==7.1.2
cloudpickle==1.3.0
cmake==3.22.4
cmdstanpy==0.9.5
colorcet==3.0.0
colorlover==0.3.0
community==1.0.0b1
contextlib2==0.5.5
convertdate==2.4.0
coverage==3.7.1
coveralls==0.5
crcmod==1.7
cufflinks==0.17.3
cvxopt==1.2.7
cvxpy==1.0.31
cycler==0.11.0
cymem==2.0.6
Cython==0.29.28
daft==0.0.4
dask==2.12.0
datascience==0.10.6
debugpy==1.0.0
decorator==4.4.2
defusedxml==0.7.1
descartes==1.1.0
dill==0.3.4
distributed=