tweet-lm.gz #1

Closed
diegoceccarelli opened this issue Jan 15, 2014 · 4 comments

@diegoceccarelli

I cannot find the tweet-lm file in the repo. Could you add it, or explain how it can be generated? Thanks.

@gouwsmeister
Owner

The language models depend on the type of data you'll be working with, and should be trained using SRILM. Training a new language model with SRILM is easy and fast. If you just want to try out the tool, you can use these two language models, which I trained on LA Times English and on a sample of tweets I was working with:

i) https://dl.dropboxusercontent.com/u/2424861/latimes-lm.gz
ii) https://dl.dropboxusercontent.com/u/2424861/tweet-lm.gz
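
For anyone who wants to train a model on their own data instead, here is a minimal sketch of an SRILM run. The file names `tweets.txt` and `tweet-lm.gz`, the trigram order, and the `NGRAM_COUNT` variable are assumptions for illustration, not the settings used for the two models linked above. It also assumes SRILM's `ngram-count` is installed and on your PATH.

```shell
#!/bin/sh
# Hypothetical defaults; override them from the environment as needed.
NGRAM_COUNT="${NGRAM_COUNT:-ngram-count}"  # SRILM's n-gram counting/estimation tool
CORPUS="${CORPUS:-tweets.txt}"             # assumed: tokenized text, one sentence per line
LM_OUT="${LM_OUT:-tweet-lm.gz}"            # SRILM compresses output when the name ends in .gz

# Train a trigram model with SRILM's default smoothing.
# $1 = training text, $2 = output language model file.
train_lm() {
    "$NGRAM_COUNT" -order 3 -text "$1" -lm "$2"
}

# Only run when both the tool and the corpus are actually present.
if command -v "$NGRAM_COUNT" >/dev/null 2>&1 && [ -f "$CORPUS" ]; then
    train_lm "$CORPUS" "$LM_OUT"
else
    echo "skipping: need $NGRAM_COUNT on PATH and a corpus at $CORPUS" >&2
fi
```

The smoothing method and n-gram order are worth tuning for your data; the values here are only a starting point.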

@phavanh

phavanh commented Sep 24, 2015

I would like to try out your system. Would you mind sharing your two language models again? Many thanks.

@gouwsmeister
Owner

I uploaded these two files. latimes-lm.gz is split into two parts to stay below the 50 MB limit. To join them, run: cat latimes-lm.gz.part* > latimes-lm.gz
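
If you want to check that the reassembled file is intact, the join step can be wrapped with a gzip integrity test. This is only a sketch: the `join_parts` helper is a hypothetical name, and it assumes the part files sort correctly in glob order, as the `latimes-lm.gz.part*` pattern implies.

```shell
#!/bin/sh
# Concatenate "<name>.part*" into "<name>" and verify the result.
# $1 = target file name, e.g. latimes-lm.gz
join_parts() {
    out="$1"
    # The shell expands the glob in sorted order, so parts are joined in sequence.
    cat "$out".part* > "$out" || return 1
    # gzip -t tests the archive without extracting it.
    gzip -t "$out"
}

# Only run when part files are present in the current directory.
if ls latimes-lm.gz.part* >/dev/null 2>&1; then
    join_parts latimes-lm.gz && echo "latimes-lm.gz is intact"
fi
```

A failing `gzip -t` usually means a part is missing or was downloaded incompletely.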

@diegoceccarelli
Author

thanks
