Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add String data type support #58

Merged
merged 4 commits into from
Feb 24, 2021

Conversation

marshallpierce
Copy link
Contributor

  • Tensor construction now uses different logic for primitive types (using their native in-memory layout) and strings (filling the tensor with onnxruntime's FillStringTensor).
    • Extended TypeToTensorElementDataType to also be able to expose utf8 contents, if present
    • Utf8Data trait to make it possible to use both String and &str
  • call_ort helper that takes care of mapping the ort status to a Result so you can't forget to do it
  • print_structure example that shows the inputs and outputs of an .onnx model (names, types, etc)

Without this, interacting with session inputs for a model with a string input would lead to SIGABRT as the auto-generated Debug, etc, didn't know what to do with an enum variant of 8 (onnxruntime's string type).

If this looks good, I'll work on an integration test with a trivial model to ensure that feeding data through actually does work, but I wanted to get feedback on the approach first.

- Tensor construction now uses different logic for primitive types (using their native in-memory layout) and strings (filling the tensor with onnxruntime's `FillStringTensor`).
  - Extended `TypeToTensorElementDataType` to also be able to expose utf8 contents, if present
  - `Utf8Data` trait to make it possible to use both `String` and `&str`
- `call_ort` helper that takes care of mapping the ort status to a Result so you can't forget to do it
- `print_structure` example that shows the inputs and outputs of an .onnx model (names, types, etc)
@marshallpierce marshallpierce marked this pull request as ready for review February 16, 2021 22:12
@marshallpierce
Copy link
Contributor Author

Strings outputs aren't properly handled, and also even a trivial string model that only applies Tensorflow's unique produces multiple outputs of different types, so output types will need to become more flexible. Returning to draft status until I have that resolved.

@marshallpierce marshallpierce marked this pull request as draft February 18, 2021 23:16
Copy link
Owner

@nbigaouette nbigaouette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work and sorry for my late reply!

Except from small nitpicks, I wouldn't mind merging this. I would even like to see a small/simple .onnx model committed to test the functionality. Can you create a one that would take a <10s KB?

Thanks!

onnxruntime/src/error.rs Outdated Show resolved Hide resolved
onnxruntime/src/error.rs Outdated Show resolved Hide resolved
onnxruntime/src/tensor/ort_tensor.rs Show resolved Hide resolved
onnxruntime/src/tensor/ort_tensor.rs Show resolved Hide resolved
@marshallpierce
Copy link
Contributor Author

I'll tidy up this stuff per your comments. I've found that just applying TensorFlow's unique as an ultra simple "model" produces a tiny 424 byte .onnx artifact that doesn't need any custom operators, so no problem checking that in, but it produces outputs of both string and int types. Could add a test that it doesn't crash when you run it with string inputs, but can't really meaningfully assert on the output until (1) string output and (2) dynamic type output are addressed in a follow up PR.

@marshallpierce marshallpierce marked this pull request as ready for review February 22, 2021 17:20
@nbigaouette
Copy link
Owner

Awesome! Thanks for this!! 👍

@nbigaouette nbigaouette merged commit 73d770f into nbigaouette:master Feb 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants