fnlp

This repo contains scripts to train NLP models on text data.

Dependencies

pytorch
numpy
nltk (for nltk.tokenize)

Train new GloVe vectors

glove.py contains a GloVe model written in PyTorch. dataset.py contains a Dataset class, written so that PyTorch's torch.utils.data.DataLoader utility class can be used to batch the data during training.
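For orientation, here is a minimal sketch of the kind of data a GloVe dataset usually serves: (word id, context id, co-occurrence count) triples, plus the weighting function from the GloVe paper. This is not the code in dataset.py. The class name, the function name, and the window size are hypothetical, and x_max=100 and alpha=0.75 are the defaults from the paper rather than values confirmed for this repo. The sketch is pure Python; a real implementation would typically subclass torch.utils.data.Dataset, which requires the same `__len__` and `__getitem__` methods.

```python
from collections import Counter


class CooccurrenceDataset:
    """Map-style dataset of (word_id, context_id, count) triples.

    Hypothetical illustration; not the class in dataset.py.
    """

    def __init__(self, tokens, window_size=2):
        # Assign each distinct token an integer id in order of first appearance.
        self.word_to_id = {}
        for tok in tokens:
            self.word_to_id.setdefault(tok, len(self.word_to_id))
        ids = [self.word_to_id[tok] for tok in tokens]

        # Count how often each (center, context) pair occurs within a
        # symmetric window. Simplification: every pair counts as 1.0,
        # with no distance-based down-weighting.
        counts = Counter()
        for i, center in enumerate(ids):
            lo = max(0, i - window_size)
            hi = min(len(ids), i + window_size + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(center, ids[j])] += 1.0
        self.pairs = [(w, c, x) for (w, c), x in sorted(counts.items())]

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        return self.pairs[index]


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f(x) from the GloVe paper (defaults assumed)."""
    return (x / x_max) ** alpha if x < x_max else 1.0


dataset = CooccurrenceDataset("the stock price of the stock rose".split(), window_size=1)
print(len(dataset), dataset[0])
```

Because the class exposes only `__len__` and `__getitem__`, a DataLoader can index it, shuffle it, and collate the triples into batches without knowing how the counts were built.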

$ python3 glove.py --input wiki_data.txt --batch_size 512

Check the word vectors

Trained word vectors are available on the releases page.

Let's check if the closest words make sense.

$ python3 test_word_vectors.py --word IRA
roth, iras, sep, 401, contribute

$ python3 test_word_vectors.py --word option
call, options, put, exercise, underlying

$ python3 test_word_vectors.py --word stock
shares, share, market, stocks, price
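Closest-word lookups like the ones above are commonly done by ranking the vocabulary by cosine similarity to the query vector. The sketch below assumes that approach; test_word_vectors.py may do it differently. The function names and the toy vectors are made up for illustration and are not taken from the released vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_words(word, vectors, top_k=5):
    """Return the top_k words most similar to `word`, excluding the word itself."""
    query = vectors[word]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_k]]


# Made-up 3-dimensional vectors, for illustration only.
toy_vectors = {
    "stock": [0.9, 0.1, 0.0],
    "shares": [0.8, 0.2, 0.1],
    "market": [0.6, 0.5, 0.0],
    "banana": [0.0, 0.1, 0.9],
}
print(closest_words("stock", toy_vectors, top_k=2))
```

With real vectors, the same ranking is usually computed as one matrix-vector product over a normalized embedding matrix, which is much faster than a Python loop.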

Notes

This CPU-only implementation is not yet optimized. For training on CPU, it might be best to download the GloVe software from here.

Credits

GloVe Paper
TorchGlove repo

License

MIT
