Support 'bytes' type in torchscript #18005

Open
pritamdamania87 opened this issue Mar 14, 2019 · 5 comments
Labels
jit-backlog oncall: jit Add this issue/PR to JIT oncall triage queue triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@pritamdamania87
Contributor

pritamdamania87 commented Mar 14, 2019

🚀 Feature

Need support for 'bytes' type in torchscript

Motivation

When dealing with image data, we have use cases where we download a blob from a blobstore and need to pass it to a TorchScript module for further processing (e.g. torchvision.transforms).

Currently there is no clean way to do this: pybind requires Python strings to be UTF-8 encoded, so we can't pass arbitrary bytes to a TorchScript module.

Supporting the python 'bytes' type in torchscript would be a clean way to support use cases like this.
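
To illustrate the constraint described above, here is a minimal pure-Python sketch (no PyTorch or pybind involved) showing that a binary blob is generally not valid UTF-8, so it cannot pass through an interface that expects UTF-8 text. The four-byte blob is made-up example data resembling the start of a JPEG file, and `is_valid_utf8` is a hypothetical helper, not part of any library.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumed example data: bytes resembling the start of a JPEG file.
jpeg_prefix = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# Ordinary text encoded as UTF-8, for comparison.
text_payload = "hello".encode("utf-8")

print(is_valid_utf8(text_payload))  # True
print(is_valid_utf8(jpeg_prefix))   # False: 0xFF never appears in valid UTF-8
```

Any interface that insists on UTF-8 strings would reject the second payload, which is the situation a dedicated bytes type is meant to avoid.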

cc @suo

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 14, 2019
@pritamdamania87
Contributor Author

@dzhulgakov

@dzhulgakov
Collaborator

Desired semantics for str and bytes:

  • they behave like the str and bytes types in Python 3 (though encode/decode functions likely won't be part of TorchScript for a while)
  • in Python 3, str and bytes can be converted only to the corresponding Python types
  • in Python 2, both TorchScript str and bytes can accept Python's str type as input. Since TorchScript is statically typed at this level, we can disambiguate which type the value should become. When transferred back to Python, both types convert to the regular str in Python 2.
  • in Python 2, we might consider accepting the unicode type as a source for str, but it might be cleaner and simpler to just forbid it
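
For reference, the Python 3 behaviour that the first bullet points to can be sketched in plain Python. This is ordinary CPython, not TorchScript; it only shows the semantics a TorchScript bytes type would presumably mirror. `describe` is a hypothetical helper introduced for illustration.

```python
def describe(value):
    """Name the Python 3 type of a str or bytes value."""
    if isinstance(value, bytes):
        return "bytes"
    if isinstance(value, str):
        return "str"
    raise TypeError("expected str or bytes")


text = "caf\u00e9"          # str: a sequence of Unicode code points
raw = text.encode("utf-8")  # bytes: a sequence of integers in range(256)

assert describe(text) == "str" and describe(raw) == "bytes"
assert len(text) == 4 and len(raw) == 5  # the accented letter takes two bytes in UTF-8
assert text[0] == "c"                    # indexing a str yields a str
assert raw[0] == 99                      # indexing bytes yields an int

# The two types do not mix implicitly; concatenation raises TypeError.
try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False
assert mixed_ok is False
```

The explicit `encode` call is the only bridge between the two types here, which is why the statically typed disambiguation mentioned for Python 2 has no counterpart in Python 3.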

@apaszke
Contributor

apaszke commented Mar 18, 2019

Can't you just store the raw data in a byte tensor? I really think we should be careful about the limitations of what we implement, or we'll end up with a complete reimplementation of Python.

@pritamdamania87
Contributor Author

@apaszke Short term, we are using a byte tensor for this. In general, though, the concern is that if we store arbitrary blobs in a byte tensor, it's not clear what the individual elements of the tensor represent. For example, if I store a JPEG blob in a byte tensor and then reshape it, the new tensor doesn't have much meaning (similarly for multiplying or adding tensors).

@apaszke
Contributor

apaszke commented Mar 19, 2019

And how does adding a byte type help with this? Sure, you can't resize/multiply them, but why would you do that to a tensor that has binary data in the first place?

@suo suo added the jit-backlog label Oct 3, 2019
@wanchaol wanchaol added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 2, 2019

7 participants