encode/decode strings with UTF-8 #57

Merged
merged 2 commits into from
May 15, 2019

dirk-thomas
Member

@dirk-thomas dirk-thomas commented May 15, 2019

This patch rolls back the changes from #26 and encodes / decodes (non W-)strings as UTF-8.

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Signed-off-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
Signed-off-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
@dirk-thomas dirk-thomas added the "enhancement" (New feature or request) and "in review" (Waiting for review, Kanban column) labels May 15, 2019
@dirk-thomas dirk-thomas self-assigned this May 15, 2019
@hidmic
Contributor

hidmic commented May 15, 2019

> This patch rolls back the changes from #26

Does it? Also, I wonder, will this play along with other non-Python nodes? Encoded UTF-8 isn't ASCII, nor is every stream of bytes UTF-8 decodable. What's the premise here?

@dirk-thomas
Member Author

> Does it?

It doesn't revert the exact change but rolls back the semantic change of enforcing ASCII.

> will this play along with other non-Python nodes?

Yes, the referenced test cases cover C++ as well as Python and the tests in test_communication ensure that the information is correctly passed between languages.

> Encoded UTF-8 isn't ASCII, nor is every stream of bytes UTF-8 decodable.

We follow the X-Types spec to encode (non W-)strings with UTF-8. If you only use ASCII characters, nothing changes here, since every ASCII character has the same byte representation when encoded with UTF-8. If you do use non-ASCII characters, the sender encodes them, the wire format contains the UTF-8 encoded bytes, and the receiver side decodes them back into a string.
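
To illustrate the point about ASCII and the encode/decode round trip, here is a minimal sketch in plain Python. The helper names `to_wire` and `from_wire` are hypothetical and are not part of this patch; they only stand in for the sending and receiving side.

```python
# Hypothetical helpers for illustration only; not the code from this PR.
def to_wire(text: str) -> bytes:
    """Encode a (non W-)string as UTF-8 bytes, as the sender would."""
    return text.encode("utf-8")


def from_wire(data: bytes) -> str:
    """Decode UTF-8 bytes back into a string, as the receiver would."""
    return data.decode("utf-8")


# ASCII-only text: the UTF-8 bytes are identical to the ASCII bytes.
ascii_text = "hello"
assert to_wire(ascii_text) == ascii_text.encode("ascii")

# Non-ASCII text: some characters take more than one byte on the wire,
# but the receiver recovers the original string.
non_ascii_text = "héllo wörld"
wire_bytes = to_wire(non_ascii_text)
assert len(wire_bytes) > len(non_ascii_text)
assert from_wire(wire_bytes) == non_ascii_text
```

For ASCII-only payloads the bytes on the wire are unchanged, which is why existing ASCII users should see no difference.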

If you have an external entity which publishes strings with data which can't be UTF-8 decoded (and thus violates the X-Types spec), our receiving nodes will bail.
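
As a rough sketch of that failure case, the snippet below shows how Python itself reacts to bytes that are not valid UTF-8. The `try_decode` helper is hypothetical, and returning `None` is only a stand-in: how a receiving node actually reports the error is not specified in this conversation.

```python
from typing import Optional


# Hypothetical helper for illustration only; not the code from this PR.
def try_decode(data: bytes) -> Optional[str]:
    """Return the decoded string, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: a real receiver would surface this as an error.
        return None


# "é" encoded as Latin-1 is the single byte 0xE9, which is not valid UTF-8.
latin1_bytes = "é".encode("latin-1")
assert try_decode(latin1_bytes) is None

# The same character encoded as UTF-8 decodes without problems.
assert try_decode("é".encode("utf-8")) == "é"
```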

@hidmic
Contributor

hidmic commented May 15, 2019

> We follow the X-Types spec

Oh, that's it. Perfect. LGTM too!

@dirk-thomas dirk-thomas merged commit d5b6cb1 into master May 15, 2019
@delete-merged-branch delete-merged-branch bot deleted the dirk-thomas/string-utf8 branch May 15, 2019 20:10
@dirk-thomas dirk-thomas removed the "in review" (Waiting for review, Kanban column) label May 15, 2019