Add libtorchtext cpp example #1817

Merged: 6 commits into pytorch:main on Jul 8, 2022

Conversation

@Nayef211 (Contributor) commented on Jul 7, 2022

Reference Issue #1644

Description

  • Adding an example that demonstrates how to use the libtorchtext and libtorch libraries in a C++ application
  • The example uses a scripted GPT2BPETokenizer and shows that tokenization results are consistent between Python and C++

The example is inspired by the libtorchaudio examples in the torchaudio repo and the tokenizer example from @mreso.
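For readers skimming the PR, the C++ side of such an example boils down to loading the scripted tokenizer with the TorchScript C++ API and calling it on a sample string. The sketch below is only illustrative; the command-line interface, the tokenizer.pt file name, and the input sentence are assumptions, not necessarily what this PR's main.cpp contains.

```cpp
#include <torch/script.h>

#include <iostream>
#include <string>
#include <vector>

int main(int argc, char* argv[]) {
  if (argc != 2) {
    std::cerr << "Usage: tokenizer <path-to-scripted-tokenizer.pt>" << std::endl;
    return 1;
  }

  // Load the tokenizer that was scripted and saved from Python with
  // torch.jit.script(tokenizer).save("tokenizer.pt").
  torch::jit::script::Module tokenizer = torch::jit::load(argv[1]);

  // Call the scripted module's forward method on a sample sentence.
  std::vector<torch::jit::IValue> inputs;
  inputs.emplace_back(std::string("Hello world! How are you?"));
  torch::jit::IValue tokens = tokenizer.forward(inputs);

  // For a GPT-2 BPE tokenizer the output is a list of tokens, which can
  // be compared against the output of the same scripted module in Python.
  std::cout << tokens << std::endl;
  return 0;
}
```

Because the scripted module references torchtext's custom C++ classes, the binary also needs to be linked against libtorchtext (in addition to libtorch) so those operators are registered before torch::jit::load runs.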

@Nayef211 (Contributor, Author) commented on Jul 7, 2022

@parmeet one thing I wanted to double-check with you is whether we should reuse the gpt2_bpe_encoder.json and gpt2_bpe_vocab.bpe already in the test/asset folder instead of checking them into the example folder. I thought it would be cleaner if all artifacts for the example were self-contained in one folder, but I could go either way here.

@Nayef211 marked this pull request as ready for review on July 7, 2022 14:45
@mthrok (Contributor) left a comment:


gpt2_bpe_vocab.bpe and gpt2_bpe_encoder.json appear to be a second copy of the same assets already checked in at https://github.com/pytorch/text/tree/main/test/asset.

This kind of trend is one of the reasons I suggested not checking in such large assets. #1462 (comment)

@parmeet (Contributor) commented on Jul 7, 2022

I think instead we could just ask users to download the artifacts in the README? I agree with @mthrok that we should avoid checking in artifacts.

wget https://download.pytorch.org/models/text/gpt2_bpe_vocab.bpe

wget https://download.pytorch.org/models/text/gpt2_bpe_encoder.json

@Nayef211 (Contributor, Author) commented on Jul 7, 2022

> I think instead we could just ask users to download the artifacts in the README? I agree with @mthrok that we should avoid checking in artifacts.
>
> wget https://download.pytorch.org/models/text/gpt2_bpe_vocab.bpe
> wget https://download.pytorch.org/models/text/gpt2_bpe_encoder.json

Thanks for the feedback @mthrok and @parmeet. I've removed the assets and added instructions on how to download them.

@Nayef211 requested a review from @mthrok on July 7, 2022 17:02
@parmeet (Contributor) left a comment:


Thanks @Nayef211 for adding this example application, LGTM!!

Review thread on examples/libtorchtext/tokenizer/main.cpp (outdated, resolved)
@Nayef211 merged commit cf94d30 into pytorch:main on Jul 8, 2022
@Nayef211 deleted the example/libtorchtext branch on July 8, 2022 03:33
@JiedaokouWangguan commented:

Hi @Nayef211, I was wondering if we need a CMakeLists.txt in examples/libtorchtext/tokenizer/? I was not able to run the script. I'm sorry if the question is too dumb, I'm very new to CMake.

@Nayef211 (Contributor, Author) commented:

> Hi @Nayef211, I was wondering if we need a CMakeLists.txt in examples/libtorchtext/tokenizer/? I was not able to run the script. I'm sorry if the question is too dumb, I'm very new to CMake.

@JiedaokouWangguan thanks for calling out this issue. I think it should be resolved by #1908. Would you be able to check out that PR locally and see whether you can get the example working?
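For anyone hitting the same build question, a standalone CMakeLists.txt for an example like this essentially needs to find libtorch and link the executable against both libtorch and libtorchtext. The sketch below is only a hedged illustration, not the file added in #1908; the target name, the TORCHTEXT_LIBRARY variable, and the paths are assumptions to adapt to your own install.

```cmake
cmake_minimum_required(VERSION 3.18)
project(tokenizer_example)

# libtorch ships a CMake package config, so find_package(Torch) provides
# TORCH_LIBRARIES and TORCH_CXX_FLAGS.
find_package(Torch REQUIRED)

add_executable(tokenizer main.cpp)

# TORCHTEXT_LIBRARY is an assumed cache variable; pass it on the command
# line, e.g. -DTORCHTEXT_LIBRARY=/path/to/libtorchtext.so, pointing at
# wherever your libtorchtext build or install placed the shared library.
target_link_libraries(tokenizer "${TORCH_LIBRARIES}" "${TORCHTEXT_LIBRARY}")

set_property(TARGET tokenizer PROPERTY CXX_STANDARD 17)
```

Depending on the platform and linker, you may also need to keep libtorchtext from being dropped as an unused dependency (for example with -Wl,--no-as-needed on Linux) so its custom tokenizer operators are registered when the scripted module is loaded.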

@JiedaokouWangguan commented:

Thanks for the quick fix! Let me give it a try.
