Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi Byte Characters cause flat file definitions to shift #52

Conversation

QuickenTheDead
Copy link

@QuickenTheDead QuickenTheDead commented May 4, 2023

In current state, if you have a file that contain multi-byte characters (for example, em dash, accented characters, etc). The unmarshal moves the characters over x positions depending on the bytes. This causes some values from other fields to not be in the correct struct field.

This PR will change the string unmarshal to convert to a rune before picking the start and end positions and select full characters instead of bytes.

@ianlopshire
Copy link
Owner

@QuickenTheDead, Thanks for taking the time to submit a PR!

I believe this case is covered by using the "use codepoint indices" option. Can you test after setting that flag and see if the problem persists?

decoder := fixedwidth.NewDecoder(strings.NewReader(data))
decoder.SetUseCodepointIndices(true)
// Decode as usual now

@QuickenTheDead
Copy link
Author

@ianlopshire You're right, I was able to set that in a specific decoder and it does work as expected.

Question: Why isn't SetUseCodepointIndices(true) the default for the Unmarshal() method? Maybe I'm missing something, but I would thinking that would be a much more user friendly option then the number of bytes that it currently does.

With that, thanks for writing package! It's helped me save time on my current project

@ianlopshire
Copy link
Owner

@QuickenTheDead,

You are 100% right that using codepoint indices would be the ideal default behavior.

The initial releases of the library didn't support the option, so to maintain backward compatibility, it was made an opt-in feature.

When I get around to releasing v1.0.0 I'll make the behavior opt-out instead of opt-in. I've added #53 and #54 to track this.

Thanks again for the PR!

@ianlopshire ianlopshire closed this May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants