Add libtorchtext cpp example #1817
@parmeet one thing I wanted to double check with you is whether to reuse the |
`gpt2_bpe_vocab.bpe` and `gpt2_bpe_encoder.json` seem to be a second copy of the same assets already added in https://github.com/pytorch/text/tree/main/test/asset
This kind of trend is one of the reasons why I was suggesting not to check-in such a huge asset. #1462 (comment)
I think we could instead just ask users to download the artifacts in the README? I agree with @mthrok that we should avoid checking in artifacts.
Thanks for the feedback @mthrok and @parmeet. I just removed the assets and added instructions on how to download them.
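For readers following the download-instead-of-check-in approach discussed above, a fetch step could look roughly like the sketch below. It is not taken from the PR: the helper names `fetch_if_missing` and `fetch_assets` are hypothetical, and the base URL must be whatever location the example's README actually points to.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_if_missing(url: str, dest: Path) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the file was already there.
    (Hypothetical helper, not part of torchtext.)
    """
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return True


def fetch_assets(base_url: str, names, dest_dir: Path):
    """Fetch each named asset from `base_url` into `dest_dir`.

    Returns the list of names that were actually downloaded.
    """
    base = base_url.rstrip("/")
    return [n for n in names if fetch_if_missing(f"{base}/{n}", dest_dir / n)]


# Example call (the base URL is a placeholder, to be replaced with the
# location given in the example's README):
# fetch_assets("<asset base URL>",
#              ["gpt2_bpe_vocab.bpe", "gpt2_bpe_encoder.json"],
#              Path("assets"))
```

Because files that already exist are skipped, the step can be rerun safely before each build of the example.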
Thanks @Nayef211 for adding this example application, LGTM!!
Hi @Nayef211 , I was wondering if we need a CMakeLists.txt in |
@JiedaokouWangguan thanks for calling out this issue. I think it should be resolved by #1908. Would you be able to check out the PR locally and see whether the example works for you?
Thanks for the quick fix! Let me give it a try.
Reference Issue #1644
Description
- Adds an example showing how to use the `libtorchtext` and `libtorch` libraries in a C++ application
- The example uses `GPT2BPETokenizer` and shows that the tokenization results are consistent across Python and C++

The example is inspired by the libtorchaudio examples from the torchaudio repo and the tokenizer example from @mreso