Clarification about TensorProto.segment #2630

Open
edgchen1 opened this issue Feb 27, 2020 · 4 comments
Labels
spec clarification Clarification of the ONNX spec needed

Comments

@edgchen1
Contributor

TensorProto has a segment member.

From the ONNX proto definition:

  // For very large tensors, we may want to store them in chunks, in which
  // case the following fields will specify the segment that is stored in
  // the current TensorProto.
  message Segment {
    optional int64 begin = 1;
    optional int64 end = 2;
  }
  optional Segment segment = 3;

What do begin and end mean? How are large tensors supposed to be represented with multiple TensorProtos?
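The spec does not say what begin and end mean, so the following is only a sketch of one possible reading. It assumes that each chunk's segment is a half-open range [begin, end) of element offsets into the flattened (row-major) tensor, and that the chunks together must tile the whole tensor exactly. `Chunk` and `reassemble` are hypothetical names, not part of the ONNX API. Under this reading, a segment such as begin: 2, end: 1 is an inverted range and would be rejected.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Sequence


@dataclass
class Chunk:
    """Hypothetical stand-in for one TensorProto holding part of a tensor."""
    begin: int           # ASSUMED: first flattened element index (inclusive)
    end: int             # ASSUMED: one past the last element index (exclusive)
    values: List[float]  # this chunk's payload, e.g. its float_data


def reassemble(dims: Sequence[int], chunks: Sequence[Chunk]) -> List[float]:
    """Concatenate chunks into one flat buffer under the [begin, end) assumption."""
    total = prod(dims)  # number of elements in the full tensor
    out: List[float] = []
    expected = 0  # next element offset that must be covered
    for c in sorted(chunks, key=lambda ch: ch.begin):
        # Reject inverted or out-of-bounds ranges first.
        if not (0 <= c.begin <= c.end <= total):
            raise ValueError(f"invalid segment [{c.begin}, {c.end}) for {total} elements")
        # Chunks must be contiguous: no gaps and no overlaps.
        if c.begin != expected:
            raise ValueError(f"gap or overlap at element {expected}: chunk starts at {c.begin}")
        if len(c.values) != c.end - c.begin:
            raise ValueError("chunk payload length does not match its segment")
        out.extend(c.values)
        expected = c.end
    if expected != total:
        raise ValueError(f"segments cover {expected} of {total} elements")
    return out
```

For example, a 2x3 tensor split into `Chunk(0, 3, ...)` and `Chunk(3, 6, ...)` reassembles in order regardless of the order the chunks arrive in. Whether the real field uses element or byte offsets, or inclusive or exclusive ends, is exactly the open question here.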

External data is another way to handle large tensors. Does that mechanism supersede this, or is there still a use for segment?

On a related note, I also had some TensorProto questions here: #2392. Any clarification there would be appreciated as well.

@linkerzhang
Member

Thank you! @edgchen1

Yep, this looks confusing. @postrational, any comments on this?

I'd suggest deprecating one of the two mechanisms for storing large tensors, if possible.

@postrational
Contributor

The external data field was designed to work around protobuf's file size limit.
The Segment approach does not seem to address this issue.
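The following is a minimal sketch of why external data sidesteps the protobuf size limit: the tensor bytes live in a separate file, and the TensorProto only carries key/value entries pointing into it. It assumes the keys described in ONNX's external-data documentation: "location" is a path relative to the model's directory, and the optional "offset" and "length" are byte counts stored as strings. `read_external_bytes` is a hypothetical helper, and the plain dict stands in for the repeated `external_data` entries. This is not the onnx library's own loader.

```python
import os
from typing import Dict


def read_external_bytes(entries: Dict[str, str], base_dir: str) -> bytes:
    """Load a tensor's raw bytes from the file named by its external_data entries."""
    # "location" is assumed to be relative to the directory holding the model.
    path = os.path.join(base_dir, entries["location"])
    # "offset" and "length" are stored as strings; offset defaults to 0.
    offset = int(entries.get("offset", "0"))
    with open(path, "rb") as f:
        f.seek(offset)
        if "length" in entries:
            length = int(entries["length"])
            data = f.read(length)
            if len(data) != length:
                raise ValueError(f"expected {length} bytes at offset {offset}, got {len(data)}")
            return data
        return f.read()  # no length given: read to the end of the file
```

Because only the small key/value entries are serialized in the model, the .onnx file itself stays small no matter how large the weights file grows.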

I haven't yet seen a model that uses segmented tensors, but there may be users out there.

@boydjohnson

This is the Rust debug output (printed with println!) of this test proto: https://github.com/onnx/onnx/blob/master/onnx/backend/test/data/node/test_identity_sequence/test_data_set_0/input_0.pb

TensorProto { dims: [120], data_type: 1, segment: Some(Segment { begin: 2, end: 1 }), float_data: [], int32_data: [], string_data: [], int64_data: [], name: "", doc_string: "", raw_data: [], external_data: [], data_location: Default, double_data: [], uint64_data: [] }

Could someone comment on what the begin and end fields of segment mean and where to find the Tensor data?

@edgchen1
Contributor Author

FWIW, I think this is still an unanswered question. The comments haven't been clarified yet:

onnx/onnx/onnx.in.proto, lines 517 to 524 in a252e65:

  // For very large tensors, we may want to store them in chunks, in which
  // case the following fields will specify the segment that is stored in
  // the current TensorProto.
  message Segment {
    optional int64 begin = 1;
    optional int64 end = 2;
  }
  optional Segment segment = 3;

@jcwchen jcwchen reopened this Sep 15, 2022
@justinchuby justinchuby added the spec clarification Clarification of the ONNX spec needed label Jul 31, 2023
6 participants