print-sentence-vectors instead of print-vectors
mpagli committed Jul 5, 2017
1 parent 6bd297f commit 56c0777
Showing 1 changed file with 11 additions and 7 deletions.
18 changes: 11 additions & 7 deletions get_sentence_embeddings_from_pre-trained_models.ipynb
@@ -26,18 +26,20 @@
"\n",
"As mentioned in the readme, here are the pretrained models you can download:\n",
"\n",
"- [sent2vec_wiki_unigrams](https://drive.google.com/uc?export=download&confirm=FHHw&id=0BwblUWuN_Bn9akZpdVg0Qk8zbGs) 5GB (600dim, trained on wikipedia)\n",
"- [sent2vec_wiki_bigrams](https://drive.google.com/uc?export=download&confirm=IcCE&id=0BwblUWuN_Bn9RURIYXNKeE5qS1U) 16GB (700dim, trained on wikipedia)\n",
"- [sent2vec_twitter_unigrams](https://drive.google.com/uc?export=download&confirm=D2U1&id=0BwblUWuN_Bn9RkdEZkJwQWs4WmM) 13GB (700dim, trained on tweets)\n",
"- [sent2vec_twitter_bigrams](https://drive.google.com/uc?export=download&confirm=BheQ&id=0BwblUWuN_Bn9VTEyUzA2ZFNmVWM) 23GB (700dim, trained on tweets)"
"- [sent2vec_wiki_unigrams](https://drive.google.com/open?id=0B6VhzidiLvjSa19uYWlLUEkzX3c) 5GB (600dim, trained on english wikipedia)\n",
"- [sent2vec_wiki_bigrams](https://drive.google.com/open?id=0B6VhzidiLvjSaER5YkJUdWdPWU0) 16GB (700dim, trained on english wikipedia)\n",
"- [sent2vec_twitter_unigrams](https://drive.google.com/open?id=0B6VhzidiLvjSaVFLM0xJNk9DTzg) 13GB (700dim, trained on english tweets)\n",
"- [sent2vec_twitter_bigrams](https://drive.google.com/open?id=0B6VhzidiLvjSeHI4cmdQdXpTRHc) 23GB (700dim, trained on english tweets)\n",
"- [sent2vec_toronto books_unigrams](https://drive.google.com/open?id=0B6VhzidiLvjSOWdGM0tOX1lUNEk) 2GB (700dim, trained on the [BookCorpus dataset](http://yknzhu.wixsite.com/mbweb))\n",
"- [sent2vec_toronto books_bigrams](https://drive.google.com/open?id=0B6VhzidiLvjSdENLSEhrdWprQ0k) 7GB (700dim, trained on the [BookCorpus dataset](http://yknzhu.wixsite.com/mbweb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"From here, one simple way to get sentence embeddings is to use the `print-vectors` command as shown in the README. To properly use our models you ideally need to use the same preprocessing used during training. We provide here some simple code wrapping around the `print-vectors` command and handling the tokenization to match our models properly."
"From here, one simple way to get sentence embeddings is to use the `print-sentence-vectors` command as shown in the README. To properly use our models you ideally need to use the same preprocessing used during training. We provide here some simple code wrapping around the `print-sentence-vectors` command and handling the tokenization to match our models properly."
]
},
{
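The cell above notes that sentences should be preprocessed the same way as during training before they are piped to `print-sentence-vectors`. As a rough illustration only, here is a minimal tokenization sketch; the choice of NLTK's `TweetTokenizer` plus lowercasing is an assumption for this example, not necessarily the exact pipeline used to train the released models.

```python
# Illustrative preprocessing sketch (assumption: NLTK's TweetTokenizer plus
# lowercasing; the released models may expect a different pipeline).
from nltk.tokenize import TweetTokenizer

_tokenizer = TweetTokenizer()

def preprocess_sentence(sentence):
    """Lowercase the sentence and re-join its tokens with single spaces."""
    return ' '.join(_tokenizer.tokenize(sentence.lower()))

print(preprocess_sentence("Sent2vec embeds whole sentences, not just words!"))
```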
@@ -66,6 +68,8 @@
"\n",
"MODEL_WIKI_UNIGRAMS = os.path.abspath(\"./sent2vec_wiki_unigrams\")\n",
"MODEL_WIKI_BIGRAMS = os.path.abspath(\"./sent2vec_wiki_bigrams\")\n",
"MODEL_TORONTOBOOKS_UNIGRAMS = os.path.abspath(\"./sent2vec_wiki_unigrams\")\n",
"MODEL_TORONTOBOOKS_BIGRAMS = os.path.abspath(\"./sent2vec_wiki_bigrams\")\n",
"MODEL_TWITTER_UNIGRAMS = os.path.abspath('./sent2vec_twitter_unigrams')\n",
"MODEL_TWITTER_BIGRAMS = os.path.abspath('./sent2vec_twitter_bigrams')"
]
@@ -155,7 +159,7 @@
" embeddings_path = os.path.abspath('./'+timestamp+'_fasttext.embeddings.txt')\n",
" dump_text_to_disk(test_path, sentences)\n",
" call(fasttext_exec_path+\n",
" ' print-vectors '+\n",
" ' print-sentence-vectors '+\n",
" model_path + ' < '+\n",
" test_path + ' > ' +\n",
" embeddings_path, shell=True)\n",
@@ -302,7 +306,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
"version": "3.5.2"
}
},
"nbformat": 4,