Binary model that was trained on Common crawl #428

MrBoor · 2018-02-06T04:35:41Z

Hello!
I enjoy using your library and pretrained vectors. I see that for the vectors trained on wiki you provide both the binary model and the pretrained vectors. However, for the vectors trained on Common Crawl, you only provide the pretrained vectors. Would it be possible for you to publish the binary model for them as well?

Thanks,
Alexander.

orech · 2018-02-12T20:08:07Z

That would be very helpful for me as well

JovanVeljanoski · 2018-02-14T17:51:48Z

I would also very much appreciate it if you could publish the binary model. Thanks!

rboyes · 2018-03-08T16:11:12Z

Yes it would be very useful

rboyes · 2018-03-09T08:00:56Z

The English link you posted above only contains the word vectors, not the model .bin files, which are what we are asking for.

With the model files, we can create vectors for out-of-vocabulary words, but we can't do that with the word vectors only.
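To illustrate why the text file alone is not enough, here is a minimal sketch of reading the text (.vec) format: a header line with the word count and dimension, then one word and its values per line. The sample data and the `load_vec` helper are made up for illustration and are not part of fastText. The result is a plain lookup table, so a word that is not in the file has nothing to fall back on.

```python
import io


def load_vec(stream):
    """Parse text-format word vectors.

    Expected layout: a header line "<count> <dim>", then one line per word
    holding the word followed by <dim> space-separated floats.
    """
    count, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:dim + 1]]
    if len(vectors) != count:
        raise ValueError("header count does not match number of rows")
    return vectors


# Tiny made-up file standing in for a real .vec download.
SAMPLE = "2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n"
vectors = load_vec(io.StringIO(SAMPLE))

in_vocab = vectors.get("hello")       # present in the table
out_of_vocab = vectors.get("helloo")  # None: the table has no entry to use
```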

phdowling · 2018-04-05T14:32:02Z

Also interested in this. The bin files for English would be very valuable.

m09 · 2018-04-24T20:36:19Z

I would also be interested in the binary vectors.

Schneitzer · 2018-05-11T11:09:57Z

Is there a reason why the .bin file will not be made open to the public?

It would be really helpful to be able to generate OOV word vectors for English words, but without the .bin file this would not be possible.

maxfriedrich · 2018-05-11T11:24:14Z

I found a link to an English .bin in the comments of #494: https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki-news-300d-1M-subword.bin.zip

Schneitzer · 2018-05-11T12:08:00Z

Thank you maxfriedrich.

However, I think most of us would like to see the .bin file for the Common Crawl corpus. The link you provided only contains the vectors trained on Wikipedia and news data, not on Common Crawl.

I'm currently working on text classification tasks on Tweets, so it would be nice to have the Common Crawl vectors. Hope it will be published later.

rktamplayo · 2018-05-21T03:50:10Z

Any update on this? I hope an admin at least assigns someone to answer our queries...

thusithaC · 2018-06-23T11:57:47Z

This is indeed strange. For non-English languages, the Common Crawl binaries are available, but for English (which is the most widely used) they are missing?

yuchsiao · 2018-08-11T05:41:55Z

Just checking back in to see if there is any plan to release the Common Crawl version of the binaries for English. Any update?

vdpappu · 2018-08-14T07:20:45Z

Just bumping this. Checking if we could get the binaries for Common Crawl.

EdouardGrave · 2018-08-14T19:35:15Z

Hi all,

Thank you for raising this issue.

The model trained on the Common Crawl data did not use subwords, and thus the binary model would not contain any more information than the text file that we released. In particular, this binary model could not be used to compute representations for out-of-vocabulary words. This is the reason why we did not release the binary model.
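As a rough illustration of this point, the sketch below shows how a subword model can build a vector for an unseen word by averaging the vectors of its character n-grams, and why that is impossible when no n-grams are used. It is a simplified stand-in, not the fastText implementation: the bucket count, dimension, random vectors, the plain FNV-1a hash, and the helper names `char_ngrams` and `oov_vector` are all assumptions made for the example.

```python
import random


def char_ngrams(word, minn, maxn):
    """Character n-grams of '<word>' with lengths minn..maxn.

    Returns an empty list when maxn is 0, i.e. when subwords are disabled.
    """
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(max(minn, 1), maxn + 1)
            for i in range(len(token) - n + 1)]


def fnv1a(text):
    """32-bit FNV-1a hash over UTF-8 bytes (simplified stand-in hash)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h


# Hypothetical subword table: random vectors standing in for trained ones.
BUCKETS, DIM = 1000, 4
rng = random.Random(0)
bucket_vectors = [[rng.uniform(-1, 1) for _ in range(DIM)]
                  for _ in range(BUCKETS)]


def oov_vector(word, minn, maxn):
    """Average the n-gram vectors of an unseen word, or None if it has none."""
    grams = char_ngrams(word, minn, maxn)
    if not grams:
        return None  # no subwords: nothing to average for an unseen word
    rows = [bucket_vectors[fnv1a(g) % BUCKETS] for g in grams]
    return [sum(col) / len(rows) for col in zip(*rows)]


with_subwords = oov_vector("helloo", 3, 6)     # a DIM-sized vector
without_subwords = oov_vector("helloo", 0, 0)  # None
```

With `maxn` set to 0 the word contributes no n-grams, so a binary file for such a model holds the same per-word table as the text file.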

We will likely release a model trained on crawl data with subwords in the near future (both binary and text models will be released).

Best,
Edouard.

thusithaC · 2018-11-04T11:36:27Z

@EdouardGrave Hi Edo, any update on the subword model trained on Common Crawl?

EdouardGrave closed this as completed Aug 14, 2018

This was referenced Sep 11, 2018

fastText facebookresearch/InferSent#10

Closed

handling OOV entities facebookresearch/InferSent#46

Closed

plan to support fasttext? facebookresearch/InferSent#83

Closed
