Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "How to use GloVe as a TensorFlow GPU Layer" #147

Closed
wants to merge 1 commit into from

Conversation

guillaume-chevalier
Copy link
Contributor

No description provided.

@NirantK
Copy link
Collaborator

NirantK commented Oct 13, 2018

Hey @guillaume-chevalier, great work!

Could you consider adding tests to your own repository? That'd make it easier for users like me to trust your approach (and code).

The open issues are also worth looking into, specially given that they raise concerns about how re-usable this code is?

@guillaume-chevalier
Copy link
Contributor Author

Hi @NirantK, thank you for your quick response.

For now, running the code "as-is" should produce the same output as in the notebook. The embeddings loaded are by default the 25-dimensional embeddings, for example, it fits in the RAM of my laptop for a quick demo. I think the output of the notebook is convincing that it works. It would indeed give a better output using the full 300 dims, which would require changing a constant

By the way, it uses the Chakin downloader, which is just a collection of embeddings ready to be download out-of-the-box: https://github.com/chakki-works/chakin
This way, running the code should download and unzip automatically some official embeddings, so it's easy to validate for anyone.

Somehow, I recognize my code would benefit from a refactor, such as extracting the code to a class rather than as a notebook. And once it's a class, then it'd be easily TDD-testable.

By the way, the issues opened on my project only seems to be questions resulting from the misusage of modified versions of my code (as the code in the issues is different and/or has different filenames and paths than my original code). Real issues would be more like using "utf8" in the file's open() method for the thing to work on windows and not only under UNIX. There is also 1 typo that would need an edit (an exponent by two in the latex). If you have any other questions, just ask. I'll add those issues myself for now, as I'd prefer waiting before the release of TensorFlow 2.0 before editing the code again.

@guillaume-chevalier
Copy link
Contributor Author

Note: 3+ persons already ran my code on their laptop, so that's why I know for the Linux/Windows thing.

@the-ethan-hunt
Copy link
Collaborator

@NirantK I think @guillaume-chevalier is right in the case regarding the issues and the test results. Should I merge this?

@NirantK
Copy link
Collaborator

NirantK commented Oct 13, 2018

I'm still on the bench whether this is awesome enough to be listed together with Richard Socher et al @the-ethan-hunt.

The apprehension comes from two questions:

  1. Is this battle-tested enough? e.g. similar to NLTK, spaCy or even indic-libraries
  2. Whether this is useful/novel enough to consider some leeway for testing? (as we did for KoNLPy)

Which is why I asked him to increase the code quality via tests et al.
That would have definitely biased me in favour of including it.

What do you think?

@NirantK
Copy link
Collaborator

NirantK commented Oct 30, 2018

Closing this for now. Happy to review again with some changes

@NirantK NirantK closed this Oct 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants