UTF8 and Unicode #25

Merged
merged 14 commits into from
Sep 11, 2019

Conversation

ppaulweber
Contributor

  • provides UTF-8 byte sequence parsing and a code point abstraction according to the RFC 3629 standard
  • provides a generic include point called Unicode, which provides an alias for UTF8 and defines the Unicode planes, blocks, and ranges
  • Unicode includes helper functions to check whether a UTF8 value lies inside one or more block ranges
  • contains unit tests for both abstractions
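
For readers unfamiliar with RFC 3629, the decoding step described above can be sketched roughly as follows. This is a minimal, self-contained C++17 illustration and not the code from this PR: the names `sequenceLength` and `decode` are hypothetical, and the sketch assumes the input holds exactly one encoded code point.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not the PR's API): number of bytes announced by the
// lead byte, or 0 if the byte cannot start a valid RFC 3629 sequence.
std::size_t sequenceLength( std::uint8_t lead )
{
    if( lead < 0x80 ) return 1;                    // 0xxxxxxx (ASCII)
    if( lead >= 0xC2 && lead <= 0xDF ) return 2;   // 110xxxxx (0xC0/0xC1 are always overlong)
    if( lead >= 0xE0 && lead <= 0xEF ) return 3;   // 1110xxxx
    if( lead >= 0xF0 && lead <= 0xF4 ) return 4;   // 11110xxx (capped at U+10FFFF)
    return 0;                                      // continuation byte or invalid lead
}

// Hypothetical helper: decodes exactly one UTF-8 sequence into a 32-bit
// code point, or returns std::nullopt if the bytes are not well-formed.
std::optional< std::uint32_t > decode( std::string_view bytes )
{
    if( bytes.empty() ) return std::nullopt;

    const auto lead = static_cast< std::uint8_t >( bytes[ 0 ] );
    const std::size_t length = sequenceLength( lead );
    if( length == 0 || bytes.size() != length ) return std::nullopt;

    // Payload bits carried by the lead byte, indexed by sequence length.
    static constexpr std::uint8_t leadMask[] = { 0x00, 0x7F, 0x1F, 0x0F, 0x07 };
    std::uint32_t codePoint = lead & leadMask[ length ];

    for( std::size_t i = 1; i < length; ++i )
    {
        const auto byte = static_cast< std::uint8_t >( bytes[ i ] );
        if( ( byte & 0xC0 ) != 0x80 ) return std::nullopt;  // must be 10xxxxxx
        codePoint = ( codePoint << 6 ) | ( byte & 0x3F );
    }

    // Reject overlong encodings, UTF-16 surrogates, and out-of-range values.
    static constexpr std::uint32_t minimum[] = { 0, 0x0, 0x80, 0x800, 0x10000 };
    if( codePoint < minimum[ length ] ) return std::nullopt;
    if( codePoint >= 0xD800 && codePoint <= 0xDFFF ) return std::nullopt;
    if( codePoint > 0x10FFFF ) return std::nullopt;

    return codePoint;
}
```

Under these assumptions, `decode( "\xF0\x9F\x9A\x80" )` yields U+1F680, while an overlong sequence such as `"\xC0\x80"` is rejected.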

* added UTF-8 support as its own standard implementation
  - http://tools.ietf.org/html/rfc3629
* allows creating a UTF-8 abstraction of a byte sequence and decoding
  it as a 32-bit value
* added some basic unit tests
* added code value description
* updated some basic unit tests
* updated byte sequence detection
* provided new `byteSequenceLengthIndication` functionality
* split single header into header file and compilation unit
* updated expansion functionality to support UTF-8
* provided a new internal helper function to extract a UTF-8 slice of a
  given source file and the source positions
* added support to represent a UTF-8 byte sequence as a Unicode value
  and string representation
* preliminary support for common Unicode planes and block ranges
* added functionality to test if UTF-8 characters are inside certain
  block ranges
* provided proper unit tests
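
The block range check mentioned in the commits above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: the `Range` struct, its `contains` member, and `insideAny` are hypothetical names, and only `Plane::SUPPLEMENTARY_MULTILINGUAL` and `plane()` mirror identifiers visible in the diff. The block boundaries are those published by the Unicode Standard.

```cpp
#include <cstdint>
#include <initializer_list>

// Plane numbers as defined by Unicode (only the first three are listed here).
enum class Plane : std::uint8_t
{
    BASIC_MULTILINGUAL = 0,
    SUPPLEMENTARY_MULTILINGUAL = 1,
    SUPPLEMENTARY_IDEOGRAPHIC = 2,
};

// Hypothetical range type: an inclusive interval of code points.
struct Range
{
    std::uint32_t first;
    std::uint32_t last;

    constexpr bool contains( std::uint32_t codePoint ) const
    {
        return first <= codePoint && codePoint <= last;
    }

    // A plane spans 0x10000 code points, so the plane index is the
    // code point shifted right by 16 bits.
    constexpr Plane plane() const
    {
        return static_cast< Plane >( first >> 16 );
    }
};

constexpr Range BASIC_LATIN{ 0x0000, 0x007F };
constexpr Range TRANSPORT_AND_MAP_SYMBOLS{ 0x1F680, 0x1F6FF };

// Hypothetical helper: true if the code point lies in at least one range.
inline bool insideAny( std::uint32_t codePoint, std::initializer_list< Range > ranges )
{
    for( const auto& range : ranges )
    {
        if( range.contains( codePoint ) ) return true;
    }
    return false;
}
```

With these definitions, `insideAny( 0x1F680, { BASIC_LATIN, TRANSPORT_AND_MAP_SYMBOLS } )` is true, and `TRANSPORT_AND_MAP_SYMBOLS.plane()` is `Plane::SUPPLEMENTARY_MULTILINGUAL`.
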
EXPECT_EQ( range.plane(), Plane::SUPPLEMENTARY_MULTILINGUAL );
break;
}
case Block::TRANSPORT_AND_MAP_SYMBOLS:
Contributor

unreachable?

Better to use multiple test cases (or a parameterized test case) instead of the for loop.
That makes the test easier to follow and makes it easier to identify the faulty block when a test case fails.
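
One way to picture this suggestion is a table of named cases, so that a failure points directly at the offending block. The sketch below uses only the standard library to stay self-contained; it is an assumption-laden illustration, and `BlockCase`, `planeIndex`, `blockCases`, and `failingBlocks` are hypothetical names. In GoogleTest, each row would typically become its own instance of a value-parameterized test (`TEST_P` with `INSTANTIATE_TEST_SUITE_P`).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical test row: a block's name, its inclusive bounds, and the
// plane index it is expected to belong to.
struct BlockCase
{
    const char* name;
    std::uint32_t first;
    std::uint32_t last;
    std::uint32_t expectedPlane;
};

// Plane index of a code point: each plane spans 0x10000 code points.
inline std::uint32_t planeIndex( std::uint32_t codePoint )
{
    return codePoint >> 16;
}

// Block boundaries taken from the Unicode Standard.
inline std::vector< BlockCase > blockCases()
{
    return {
        { "BASIC_LATIN", 0x0000, 0x007F, 0 },
        { "EMOTICONS", 0x1F600, 0x1F64F, 1 },
        { "TRANSPORT_AND_MAP_SYMBOLS", 0x1F680, 0x1F6FF, 1 },
    };
}

// Returns the names of all blocks whose bounds fall outside the expected
// plane, so a failure names the faulty block instead of a loop index.
inline std::vector< std::string > failingBlocks()
{
    std::vector< std::string > failures;
    for( const auto& testCase : blockCases() )
    {
        const bool ok = planeIndex( testCase.first ) == testCase.expectedPlane
                        && planeIndex( testCase.last ) == testCase.expectedPlane;
        if( !ok ) failures.emplace_back( testCase.name );
    }
    return failures;
}
```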

Contributor Author

@emmanuel099 good catch. I've created an issue that addresses this comment, and I will fix the problem with the suggested solution in a future PR.

2 participants