Add unicode letters generator #70
"""
if sys.version_info.major == 2:
    chr_function = unichr
    range_function = xrange
chr_function = unichr # pylint:disable=undefined-variable
range_function = xrange # pylint:disable=undefined-variable
ACK pending comments
Force-pushed from 4178f1b to 204485f.
Unicode Standard Annex 44: Unicode Character Database, section 5.5.1 General Category Values explains the meaning of values such as "Lu" or "No".
Do you think the types of characters emitted by
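For reference, the general category values mentioned above can be inspected from Python with `unicodedata.category`. This is only an illustrative sketch; the sample characters and the `categories` name are chosen here and are not part of the PR.

```python
import unicodedata

# Sample characters picked for illustration only (not taken from the PR).
samples = ('A', 'a', '1', '\u00bd', '\n')

# Map each sample to its two-letter general category, e.g. "Lu" or "No".
categories = {char: unicodedata.category(char) for char in samples}

for char, category in categories.items():
    print(repr(char), category)
```

Every category whose first letter is "L" (Lu, Ll, Lt, Lm, Lo) denotes a letter; "No" is a non-decimal number such as a vulgar fraction, and "Cc" is a control character.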
This generator is a helper for the gen_utf8 function; it provides the list of Unicode letters supported by the system. This avoids generating Unicode strings that contain control characters and other non-letter characters. Also adds tests for the generator to ensure it does not generate unwanted characters.
Closes omaciel#69
@Ichimonji10 I have updated the docstring. For testing purposes I think letters alone should suffice, as we will probably get at least one UTF-8 glyph of 2 or more bytes. We can certainly add more categories, but I think it is better to "stay safe": a single multi-byte character is enough to determine whether the system is capable of handling UTF-8 strings.
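A minimal sketch of the idea discussed here, assuming Python 3 only (so no `unichr`/`xrange` shim). The function name `letters` and its `limit` parameter are hypothetical and are not the names used in the PR.

```python
import sys
import unicodedata


def letters(limit=sys.maxunicode):
    """Yield each character up to ``limit`` whose general category is a letter.

    Hypothetical helper for illustration; the PR's actual generator may differ.
    """
    for code_point in range(limit + 1):
        char = chr(code_point)
        # Letter categories all start with "L": Lu, Ll, Lt, Lm, Lo.
        if unicodedata.category(char).startswith('L'):
            yield char


# Restrict to the Latin-1 range to keep the example small.
sample = list(letters(0xFF))
```

Filtering on the category prefix keeps control characters, digits, and punctuation out, while still yielding characters such as 'é' that need more than one byte in UTF-8.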
Force-pushed from 204485f to 8bbf290.
On my system I got a total of 48270 Unicode letters. That is 73.66% of the maxunicode value, 65535.
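A count like this could be reproduced roughly as follows. This is a sketch under assumptions: it runs on Python 3, `letter_share` is a hypothetical helper name, and the exact count depends on the Unicode database version bundled with the interpreter, so it may not match the figure above.

```python
import unicodedata


def letter_share(limit):
    """Return (letter count, percentage of ``limit``) for code points 0..limit.

    Hypothetical helper; the percentage divides by ``limit`` itself to mirror
    the arithmetic in the comment above.
    """
    count = sum(
        1
        for code_point in range(limit + 1)
        if unicodedata.category(chr(code_point)).startswith('L')
    )
    return count, round(count / limit * 100, 2)


# 0xFFFF (65535) is sys.maxunicode on narrow Python 2 builds.
count, percentage = letter_share(0xFFFF)
print(count, percentage)
```

The reported percentage is consistent with the reported count: 48270 / 65535 is approximately 0.7366.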
ACK
I have done some practical testing, and here are the results. With these changes:
Without the changes:
For this test I commented out all non-UTF-8 test data, removed most of the failure output, and left just the invalid messages. I also ran it more than once to get other random values.
ACK