
Update load_sp_model func to accept a file object #965

Merged · 9 commits · Sep 4, 2020

Conversation

zhangguanheng66 (Contributor) commented Sep 3, 2020

Tested against pytext.utils.file_io.PathManager; it works with spm = load_sp_model(PathManager.open('m_user.model', 'rb')).

codecov bot commented Sep 3, 2020

Codecov Report

Merging #965 into master will increase coverage by 0.00%.
The diff coverage is 85.71%.


@@           Coverage Diff           @@
##           master     #965   +/-   ##
=======================================
  Coverage   77.53%   77.54%           
=======================================
  Files          44       44           
  Lines        3085     3090    +5     
=======================================
+ Hits         2392     2396    +4     
- Misses        693      694    +1     
Impacted Files                 Coverage          Δ
torchtext/data/functional.py   86.84% <85.71%>   (-1.04%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b62e34b...eb72d18.

zhangguanheng66 (Contributor, Author) commented:

@hudeven Let me know if this API works for the internal cases.

Comment on lines +58 to +61
elif isinstance(spm, io.BufferedReader):
    return torch.ops.torchtext.load_sp_model_string(spm.read())
else:
    raise RuntimeError('the input of the load_sp_model func is not supported.')
Contributor:

@zhangguanheng66 PathManager.open() returns the _io.TextIOWrapper type and will fail here. How about removing the check?

Suggested change:
-elif isinstance(spm, io.BufferedReader):
-    return torch.ops.torchtext.load_sp_model_string(spm.read())
-else:
-    raise RuntimeError('the input of the load_sp_model func is not supported.')
+return torch.ops.torchtext.load_sp_model_string(spm.read())

zhangguanheng66 (Contributor, Author) commented Sep 3, 2020:

I noticed that and was thinking about adding that check.
However, PathManager.open() returns f as _io.TextIOWrapper, which does not seem to work with the SentencePiece constructor. It needs to read in binary mode, where PathManager.open(..., 'rb') returns io.BufferedReader. @hudeven
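A minimal sketch of why the mode matters, using the standard built-in open() (PathManager behaves analogously per the discussion above); the temp-file path here is only illustrative:

```python
import io
import os
import tempfile

# Write a small binary file standing in for a SentencePiece model.
with tempfile.NamedTemporaryFile(suffix='.model', delete=False) as tmp:
    tmp.write(b'\x00\x01 fake model bytes')
    path = tmp.name

# Text mode yields io.TextIOWrapper; read() would return str, which
# the SentencePiece loader cannot consume.
f_text = open(path)
print(type(f_text).__name__)             # TextIOWrapper
f_text.close()

# Binary mode yields io.BufferedReader; read() returns raw bytes.
f_bin = open(path, 'rb')
print(type(f_bin).__name__)              # BufferedReader
print(isinstance(f_bin.read(), bytes))   # True
f_bin.close()

os.unlink(path)
```

This is why the isinstance check in the diff accepts io.BufferedReader but rejects the default text-mode wrapper.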

Contributor:

@zhangguanheng66 you are right! LGTM. Note: there is a test failure.

@zhangguanheng66 zhangguanheng66 merged commit 53e3ae2 into pytorch:master Sep 4, 2020
elif isinstance(spm, io.BufferedReader):
    return torch.ops.torchtext.load_sp_model_string(spm.read())
else:
    raise RuntimeError('the input of the load_sp_model func is not supported.')
Contributor:

If the type is not supported, you can include the actual type in the error message to make it easier to debug.
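A hedged sketch of this suggestion, with a simplified stand-in for the dispatcher (the function body and return values here are illustrative, not torchtext's actual code):

```python
import io

def load_sp_model(spm):
    """Simplified stand-in for torchtext's load_sp_model dispatch (hypothetical)."""
    if isinstance(spm, str):
        return ('path', spm)           # would load from a file path
    elif isinstance(spm, io.BufferedReader):
        return ('bytes', spm.read())   # would load from a binary file object
    else:
        # Reporting the offending type makes the failure easy to diagnose.
        raise TypeError(
            'Unsupported input type for load_sp_model: {}. '
            'Supported types are str and io.BufferedReader.'.format(type(spm)))

try:
    load_sp_model(12345)
except TypeError as e:
    print(e)   # message names <class 'int'>
```

The caller immediately sees which type was passed instead of a generic "not supported" message.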

@@ -68,7 +68,7 @@ def test_sentencepiece_numericalizer(self):
     def test_sentencepiece_tokenizer(self):
         test_sample = 'SentencePiece is an unsupervised text tokenizer and detokenizer'
         model_path = get_asset_path('spm_example.model')
-        sp_model = load_sp_model(model_path)
+        sp_model = load_sp_model(open(model_path, 'rb'))
Contributor:
Consider adding a negative test that assertRaises on an unsupported input type.
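The suggested negative test could look like the sketch below, using a simplified stand-in for load_sp_model (the real loader needs an actual model file, so the dispatcher here is hypothetical):

```python
import io
import unittest

def load_sp_model(spm):
    # Simplified stand-in mirroring the dispatch added in this PR (hypothetical).
    if isinstance(spm, str):
        return 'model-from-path'
    elif isinstance(spm, io.BufferedReader):
        return 'model-from-file-object'
    else:
        raise RuntimeError('the input of the load_sp_model func is not supported.')

class TestLoadSpModel(unittest.TestCase):
    def test_unsupported_input_raises(self):
        # Neither a path string nor a binary file object is accepted.
        with self.assertRaises(RuntimeError):
            load_sp_model(12345)
        with self.assertRaises(RuntimeError):
            load_sp_model(['not', 'supported'])

if __name__ == '__main__':
    unittest.main()
```

Such a test pins down the error path so a future refactor cannot silently start accepting (and mishandling) arbitrary inputs.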

3 participants