print-sentence-vectors instead of print-vectors
mpagli committed Jul 5, 2017
1 parent 6bd297f commit 56c0777
Showing 1 changed file with 11 additions and 7 deletions.
18 changes: 11 additions & 7 deletions get_sentence_embeddings_from_pre-trained_models.ipynb
@@ -26,18 +26,20 @@
"\n",
"As mentioned in the readme, here are the pretrained models you can download:\n",
"\n",
"- [sent2vec_wiki_unigrams](https://drive.google.com/uc?export=download&confirm=FHHw&id=0BwblUWuN_Bn9akZpdVg0Qk8zbGs) 5GB (600dim, trained on wikipedia)\n",
"- [sent2vec_wiki_bigrams](https://drive.google.com/uc?export=download&confirm=IcCE&id=0BwblUWuN_Bn9RURIYXNKeE5qS1U) 16GB (700dim, trained on wikipedia)\n",
"- [sent2vec_twitter_unigrams](https://drive.google.com/uc?export=download&confirm=D2U1&id=0BwblUWuN_Bn9RkdEZkJwQWs4WmM) 13GB (700dim, trained on tweets)\n",
"- [sent2vec_twitter_bigrams](https://drive.google.com/uc?export=download&confirm=BheQ&id=0BwblUWuN_Bn9VTEyUzA2ZFNmVWM) 23GB (700dim, trained on tweets)"
"- [sent2vec_wiki_unigrams](https://drive.google.com/open?id=0B6VhzidiLvjSa19uYWlLUEkzX3c) 5GB (600dim, trained on english wikipedia)\n",
"- [sent2vec_wiki_bigrams](https://drive.google.com/open?id=0B6VhzidiLvjSaER5YkJUdWdPWU0) 16GB (700dim, trained on english wikipedia)\n",
"- [sent2vec_twitter_unigrams](https://drive.google.com/open?id=0B6VhzidiLvjSaVFLM0xJNk9DTzg) 13GB (700dim, trained on english tweets)\n",
"- [sent2vec_twitter_bigrams](https://drive.google.com/open?id=0B6VhzidiLvjSeHI4cmdQdXpTRHc) 23GB (700dim, trained on english tweets)\n",
"- [sent2vec_toronto books_unigrams](https://drive.google.com/open?id=0B6VhzidiLvjSOWdGM0tOX1lUNEk) 2GB (700dim, trained on the [BookCorpus dataset](http://yknzhu.wixsite.com/mbweb))\n",
"- [sent2vec_toronto books_bigrams](https://drive.google.com/open?id=0B6VhzidiLvjSdENLSEhrdWprQ0k) 7GB (700dim, trained on the [BookCorpus dataset](http://yknzhu.wixsite.com/mbweb))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"From here, one simple way to get sentence embeddings is to use the `print-vectors` command as shown in the README. To properly use our models you ideally need to use the same preprocessing used during training. We provide here some simple code wrapping around the `print-vectors` command and handling the tokenization to match our models properly."
"From here, one simple way to get sentence embeddings is to use the `print-sentence-vectors` command as shown in the README. To properly use our models you ideally need to use the same preprocessing used during training. We provide here some simple code wrapping around the `print-sentence-vectors` command and handling the tokenization to match our models properly."
]
},
{
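The cell above notes that sentences should be preprocessed the same way as during training before they are piped to `print-sentence-vectors`. As a rough illustration only, here is a minimal tokenization sketch; the choice of NLTK's `TweetTokenizer` plus lowercasing is an assumption for this example, not necessarily the exact pipeline used to train the released models.

```python
# Illustrative preprocessing sketch (assumption: NLTK's TweetTokenizer plus
# lowercasing; the released models may expect a different pipeline).
from nltk.tokenize import TweetTokenizer

_tokenizer = TweetTokenizer()

def preprocess_sentence(sentence):
    """Lowercase the sentence and re-join its tokens with single spaces."""
    return ' '.join(_tokenizer.tokenize(sentence.lower()))

print(preprocess_sentence("Sent2vec embeds whole sentences, not just words!"))
```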
@@ -66,6 +68,8 @@
"\n",
"MODEL_WIKI_UNIGRAMS = os.path.abspath(\"./sent2vec_wiki_unigrams\")\n",
"MODEL_WIKI_BIGRAMS = os.path.abspath(\"./sent2vec_wiki_bigrams\")\n",
"MODEL_TORONTOBOOKS_UNIGRAMS = os.path.abspath(\"./sent2vec_wiki_unigrams\")\n",
"MODEL_TORONTOBOOKS_BIGRAMS = os.path.abspath(\"./sent2vec_wiki_bigrams\")\n",
"MODEL_TWITTER_UNIGRAMS = os.path.abspath('./sent2vec_twitter_unigrams')\n",
"MODEL_TWITTER_BIGRAMS = os.path.abspath('./sent2vec_twitter_bigrams')"
]
@@ -155,7 +159,7 @@
" embeddings_path = os.path.abspath('./'+timestamp+'_fasttext.embeddings.txt')\n",
" dump_text_to_disk(test_path, sentences)\n",
" call(fasttext_exec_path+\n",
" ' print-vectors '+\n",
" ' print-sentence-vectors '+\n",
" model_path + ' < '+\n",
" test_path + ' > ' +\n",
" embeddings_path, shell=True)\n",
@@ -302,7 +306,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
"version": "3.5.2"
}
},
"nbformat": 4,