Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ScalarType to U8 for colors in lookup tables stored in binary files #47

Merged

Conversation

Siimeloni
Copy link
Contributor

When trying to parse a binary .vtk file the program had an error in the Vtk::parse_legacy_be() function.

2024-08-15T11:56:05.184570Z ERROR GST convert failed error=
   0: Parse error: Complete

The file has following structure:

# vtk DataFile Version 3.0
vtk output
BINARY
DATASET UNSTRUCTURED_GRID
POINTS 2804 float
---data---

CELL_TYPES 3633
--- data---

CELL_DATA 3633

SCALARS colors float 1
LOOKUP_TABLE color_table
---data---

LOOKUP_TABLE color_table 2
---data---
contains (in binary, encoded as u8):
1 0 0 1
1 1 1 1
-----------

The error happens in the attribute_lookup_table() function.

It was not possible to parse this file because vtkio was expecting the color values (r, g, b, a) to be encoded in f32 binary and not in u8. Since we are able to write these files on our own we tested output with the color values written as f32, but on import ParaView would throw an error. This does not happen with the u8 color values, so we assume that u8 is the correct encoding here.

When changing the appropriate location in vtkio we are able to successfully parse these files.

@elrnv
Copy link
Owner

elrnv commented Aug 16, 2024

This looks like a bug, nice find and thank you for the PR!
The correct behavior is in the spec and is thankfully consistent with your comment:

The definition of color scalars (i.e., unsigned char values directly mapped to color) varies depending upon the number of values (nValues) per scalar. If the file format is ASCII, the color scalars are defined using nValues float values between (0,1). If the file format is BINARY, the stream of data consists of nValues unsigned char values per scalar value.

We will need to parse as float for ASCII and u8 for binary. Could you update the PR to do this and add a test? I would be happy to accept.

@MrMuetze
Copy link

Heyo 👋 I've been looking into this issue together with @Siimeloni and we've found a couple of things that probably make this fix a bit more involved. We wanted to summarize our findings so far to maybe get some input from @elrnv and to just have something to come back to in the future.

Changing the parser part is only half of the required fix here as we also need to adjust the writer side. We have mostly looked at the cube_complex_test() since it covers binary and ASCII for parsing and writing. For the parser, the fix can be done as described by @elrnv and correctly parses F32 for ASCII input and U8 for binary input.

Here we run into the first issue. Since the cube_complex_test() tests the writer and the parser (and the writer hasn't been adjusted so far), the test fails because for binary output, the writer still writes F32 values. Thus the parser does not encounter U8 which leads to a test failure.

Then I started to look into the writer code more deeply. I guess that, up until this point, the assumptions under which the VTK model and the parser/writer infrastructure haven been devised include that all attributes for a given dataset always remain in the same type independent of the input/output format. The writer functions are relatively rigid in regards to the output type. They just write whatever buffer they have been fed. For this specific lookup table we would have to decide whether to write F32 or U8 based on the output type. This is something that cannot be easily done right away (if I haven't overlooked anything).

Since I wanted to try something anyway, I hacked something into the writer code (adjusting

vtkio/src/writer.rs

Lines 881 to 883 in 02d9100

IOBuffer::F32(v) => {
write_buf_impl(v, &mut self.0, W::write_f32::<BO>)?;
}
) to convert F32 input to U8 if the length would match the lookup table data from the test. This leads to the writer/parser combination working nicely, but of course that is not a practical solution. Additionally, the test still would not work because a difference in the VTK Model (test expects F32, but now we parse a U8 buffer).

This might be another issue that came up. With the changes described above, the VTK model might contain an IOBuffer::F32 or an IOBuffer::U8 for the lookup table, while still representing the same kind of data. So based on input/output types, we might have to do one of the following conversions when writing the model again:

F32 -> F32
F32 -> U8
U8 -> F32
U8 -> U8

I'm not sure how practical it is that the VTK model is directly dependent on the input data type at this point. Maybe it would make sense to alway convert this lookup table data to either F32 or U8? Another question is if we should adjust all writer functions to be more flexible in terms of their output or if it would be more sensible to add new functions that handle this specific lookup table case? This is the point where we would like to get some feedback. Maybe we have overlooked something up to this point or are not seeing the best solution. We would love to hear your thoughts on this!

Best regards

This is implemented in the writer and parser.
@elrnv
Copy link
Owner

elrnv commented Aug 22, 2024

Good point. I added an writer implementation here: GiGainfosystems#1 . Please review to see if I missed anything.

@MrMuetze
Copy link

MrMuetze commented Sep 3, 2024

I've incorporated the remaining changes discussed on the other PR and I think this should be ready now. Thanks again for your input and work on this!

@elrnv
Copy link
Owner

elrnv commented Sep 4, 2024

Just one comment above. Otherwise ready to merge. Thank you, @MrMuetze, for adding these changes here!

@MrMuetze
Copy link

MrMuetze commented Sep 4, 2024

Cool, Im not sure what comment you are referring to though. 😅 Am I not seeing something on my end?

src/writer.rs Outdated
@@ -869,8 +863,8 @@ mod write_vtk_impl {
}

match buf {
IOBuffer::Bit(v) => write_buf_impl(v, &mut self.0, |x| x * 255 as u8)?,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we change this to if x != 0 { 255 } else { 0 } to make it explicit. I think all of these other than u8 and f32 are just trying to predict the intent of the user who may have made a mistake. Since Bit is either zero or 1, I think it's more appropriate to have 1 correspond to the largest byte value.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doesn't apply anymore. I've removed the * 255 part here (see GiGainfosystems#1 (comment)). My reasoning was that the bit case also includes a Vec<u8> (which is a bit curious). So I handled it like the other match arm.

pub enum IOBuffer {
    /// Bit array is stored in 8 bit chunks.
    Bit(Vec<u8>),
...
}

So for this type of conversion only the float values need to be multiplied by 255.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, you are absolutely right, thank you for that reference!

@elrnv
Copy link
Owner

elrnv commented Sep 4, 2024

sorry about that, I forgot to submit the review! Do you see the comment now?

@elrnv elrnv merged commit 661f7cd into elrnv:master Sep 6, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants