Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make vector loading more flexible. #353

Merged
merged 10 commits into from
Sep 12, 2018

Conversation

anna-hope
Copy link

@anna-hope anna-hope commented Jul 21, 2018

  • Add support for passing a max_vectors argument, in situations where the entire set of pre-trained vectors can't fit in memory (most pre-trained vector sets are sorted in the descending order of word frequency, so if you pass, say, max_vectors=100_000, you will get pre-trained for 100_000 most common words)
  • Add ability to read pre-trained vectors from gzipped files (without tar)
  • Read pre-trained vectors by iterating over the vectors file object line by line, without reading the whole file into memory first
  • Only load the file in binary mode to make parsing logic more compact
  • Make vector loading more memory-efficient (based on Fix vector loading memory requirements #298)
  • Minor refactoring to improve style

@mttk
Copy link
Contributor

mttk commented Sep 11, 2018

@notnami could you fix the issues Travis has with code style so this can be merged? (there's just a few indentations and unused imports)

@anna-hope
Copy link
Author

anna-hope commented Sep 12, 2018

I fixed the issues with style, but now the Python 3.5 build has failed for an unclear reason. I believe that same build had passed before. Someone with write access to the repo could restart it, or I could close this PR and re-submit.

@mttk
Copy link
Contributor

mttk commented Sep 12, 2018

@notnami I restarted it, probably just timed out. Will merge when it passes.

@mttk mttk merged commit 0bbcd0a into pytorch:master Sep 12, 2018
@mttk
Copy link
Contributor

mttk commented Sep 12, 2018

@notnami thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants