Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpack indices of arbitrary component types #227

Merged
merged 3 commits into from
Nov 27, 2023
Merged

Unpack indices of arbitrary component types #227

merged 3 commits into from
Nov 27, 2023

Conversation

shawnhatori
Copy link
Contributor

Since the memcpy only needs the size of the index data, it can use the component size (when equal to the stride) to determine the total number of bytes to copy. This allows it to work for different component types, including the common
cgltf_component_type_r_16u.

This also adds a reference comment for the function at the top of the file, consistent with cgltf_accessor_unpack_floats.

@zeux
Copy link
Contributor

zeux commented Sep 18, 2023

This doesn't seem right. When the input data is floats, we will now memcpy floats into 32-bit unsigned int output which is not what the caller expects. When the input data is r16u, we will now memcpy 16-bit indices into 32-bit unsigned int output which is also not what the caller expects. This function gives the caller a 32-bit unsigned index buffer while handling format conversions; the memcpy is an optimization for the case where no conversion is required.

@shawnhatori
Copy link
Contributor Author

shawnhatori commented Sep 18, 2023

You're right, the change as written was not clear. I'll explain the intention.

What the API currently does: Requires index data to be stored in a uint32_t array. If the component size is smaller than uint32_t, each element is padded to 32-bit boundaries (e.g. for r16u, 2, 1, 0, 3 -> 0x02000000 0x01000000 0x00000000 0x03000000).

What I was expecting: Takes in a void* for an index array and an expected component_size (i.e. the data type of the index array). If the component_size of the accessor is larger than expected, returns 0. If it is smaller than expected, pad to expected (this replicates the existing padding behavior but for any larger type). Else, they're equal so memcopy everything (fast path) (e.g. for r16u, 2, 1, 0, 3 -> 0x0200 0x0100 0x0000 0x0300).

In my use case (and I assume many others), I own and know my mesh data, which all have 16-bit index values so my indices are stored as an array of uint16_t. My graphics API backends are also expecting index data packed as R16U. So I was hoping to have an API function to handle the unpacking boilerplate that respects the data type of the index array.

I'll leave it up to the maintainers to determine if that's a use case worth supporting, and if so, how to without a breaking API change. For the sake of discussion, I'll add a temporary new function that aims to handle this case as described above.

Since the `memcpy` only needs the size of the index data, it can
use the component size (when equal to the stride) to determine
the total number of bytes to copy. This allows it to work for
different component types, including the common
`cgltf_component_type_r_16u`.

This also adds a reference comment for the function at the top of
the file, consistent with `cgltf_accessor_unpack_floats`.
@shawnhatori
Copy link
Contributor Author

@zeux Does the most recent code change address the data type issues you identified in your previous comment? I believe it's correct now.

For what it's worth, I've been using this change in my own asset pipeline to support my 16-bit index use case and it has worked great. I did test in a debugger using 32-bit indices and the data layout appears correct (i.e. no breaking change).

If this is a use case worth supporting, the interface obviously needs tweaks. This could possibly be an _ex function.

@zeux
Copy link
Contributor

zeux commented Oct 25, 2023

Yeah I think this addresses my concerns; I tested this in gltfpack which uses 32-bit indices and the change works without issues. I think we would need to either add this argument as mandatory to unpack_indices, or call the new function something else. Because unpack_indices was added after the last numbered release, I would recommend just adding a required argument to the function - all callers should be trivial to adjust.

There's still one corner case that I mentioned that isn't handled the same way with this change: namely, when the input is floats, memcpy will reinterpret the bits if the destination is unsigned int. If the component type is passed instead of a component size, the function can do memcpy only when the accessor type matches the expected type. You will note that cgltf_component_read_index does support floats as the input.

However, thinking about this and looking at the history, I am not sure why we're supporting reading floats as indices in the first place. This was initially introduced all the way back in 1f77128 and then gradually refactored into the current state. I would expect read_index to be used when glTF spec disallows floating point values, and the only example OTOH that's even theoretically plausible is feature IDs in upcoming EXT_mesh_features, that I would reasonably expect to read as floats.

Maybe @prideout has context here -- is reading floats-as-indices via cgltf_accessor_read_index used in Filament?

cgltf.h Outdated
{
*dest = (cgltf_uint)cgltf_component_read_index(element, accessor->component_type);
cgltf_size index_data = cgltf_component_read_index(element, accessor->component_type);
memcpy(dest, &index_data, index_component_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor note is that this only works on little-endian platforms. Maybe that's fine though, as we have issues with endian conversion in general, but a comment would be nice here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this should also copy out_component_size, not index_component_size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, I would need to measure this, but this might significantly regress the 16->32 bit conversion speed, because the memcpy here is very short and variable size so the compiler won't inline it. It might be better to explicitly handle index_component_size=1,2,4 with a switch and a fixed-length write directly.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Do you want to profile it before we merge it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll need to double check perf. We never need to copy 1 byte indices here since even if out_component_size is 1, this forces us down the memcpy path, so I think this can just check if out_component_size is 2 and write a 2-byte or 4-byte index accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this should also copy out_component_size, not index_component_size.

I'm not sure this is correct. If the output array is uint32_t (out_component_size) but the each index is uint16_t (index_component_size), we want to:

  1. Copy just the uint16_t of index data.
  2. Move the index pointer to the next index (+ 16 bits).
  3. Move the array pointer to the next index (+ 32 bits).

This is what the current loop is doing. Let me know if I'm misunderstanding your proposed change.

I think this can just check if out_component_size is 2 and write a 2-byte or 4-byte index accordingly.

Keep in mind, this else block handles the case where index_component_size < out_component_size. According to the glTF specification, index data can be UNSIGNED_BYTE, UNSIGNED_SHORT, or UNSIGNED_INT. So the cases would be (as you said earlier):

  • index size 1 to output size 2, 4, 8
  • index size 2 to output size 4, 8
  • index size 4 to output size 8

Finally, I would need to measure this, but this might significantly regress the 16->32 bit conversion speed, because the memcpy here is very short and variable size so the compiler won't inline it. It might be better to explicitly handle index_component_size=1,2,4 with a switch and a fixed-length write directly.

Yeah looking at Godbolt, I think you're right. Switching from the variable copy to the fixed-length one inlines on GCC/Clang/MSVC.
https://godbolt.org/z/6rqYYfvoq

I've done this in the latest change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the cost of a little copy-paste, I moved the switch outside of the for loop so you don't pay the cost of the branch every loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy just the uint16_t of index data.

This will leave gaps in the target data: for every 4 bytes that the caller expects, you'll be only writing 2. This works without changes in gltfpack (which expects 32-bit output), but that's only because gltfpack uses zero-initialized memory for the output buffer which should not be expected.

Since we have separate loops anyway, I would recommend just writing uint16_t or uint32_t according to the output (critically!) component size. This will probably also mean we'll be able to drop the case 1 as it becomes redundant, and this will actually remove the extra endianness concerns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking to require the user to provide a zero-initialized array. You're right that this isn't necessarily a reasonable expectation.

Ah okay I think I see what you're saying now. I've made this change and tested it for the 16-bit data -> 32-bit array case, works as expected.

@prideout
Copy link
Contributor

Maybe @prideout has context here -- is reading floats-as-indices via cgltf_accessor_read_index used in Filament?

No. If we were to remove that particular case from the switch, I don't think we would adversely affect Filament.

As I recall, I included that case for completeness, since this function was originally conceived as a general purpose conversion utility, which is consistent with a strict interpretation of its docblock at the top of the file.

Note: I am no longer affiliated with Filament or Google.

@zeux
Copy link
Contributor

zeux commented Oct 25, 2023

Great, thanks! I submitted #232 to remove these to avoid future confusion and ensure that unpack_indices never encounters valid scenarios where it could decode float -> int.

Copy link
Contributor

@zeux zeux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good to merge.

@jkuhlmann jkuhlmann merged commit 3209a22 into jkuhlmann:master Nov 27, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants