expand set of codepoints returned #21
Comments
The upper limit (inclusive) of UTF8 is 0x10FFFF.
The tricky part is determining all the valid ranges and avoiding characters that are not "real" but I totally agree with you that ultimately
I'm working on this. You'll have a pull request soon.
The current pull request fixes this issue.
Merged! You should update
When asked to generate UTF8 characters, the `FauxFactory.generate_string` class method returns only characters in the CJK character set. Although CJK characters are valid UTF8 characters, they do not represent the entire range of valid UTF8 characters. By the same logic, `generate_string` could return only ASCII characters when asked for UTF8 characters, and given that ASCII characters are a subset of UTF8, it would technically be acting correctly.

It would be better if the `generate_string` method returned a fuller set of UTF8 characters when asked to generate UTF8 characters.

The lower limit of valid UTF8 code points is 0x0, and I'm not sure what the upper limit is. According to RFC 3629, several values are also off-limits: the bytes 0xC0, 0xC1 and 0xF5–0xFF never appear in a valid UTF8 sequence, and the code points 0xD800–0xDFFF (the UTF-16 surrogates) cannot be encoded.
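As a rough illustration of what a fuller set could look like, here is a minimal sketch that draws code points from the whole range 0x0–0x10FFFF and skips the surrogates. The names `random_code_point` and `random_utf8_string` are hypothetical and are not part of FauxFactory's API, and this is not necessarily how the pull request mentioned in the comments solved it. The optional `assigned_only` flag is an assumption about what "real" characters might mean: it uses the standard `unicodedata` module to reject unassigned and private-use code points.

```python
import random
import unicodedata

MAX_CODE_POINT = 0x10FFFF            # inclusive upper limit of Unicode code points
SURROGATES = range(0xD800, 0xE000)   # 0xD800-0xDFFF cannot be encoded as UTF-8


def random_code_point(rng, assigned_only=False):
    """Return one random character that can be encoded as UTF-8.

    Hypothetical helper: uses rejection sampling over the full range.
    """
    while True:
        cp = rng.randint(0, MAX_CODE_POINT)
        if cp in SURROGATES:
            continue
        char = chr(cp)
        # Assumption: "Cn" (unassigned) and "Co" (private use) are the
        # code points that are encodable but not "real" characters.
        if assigned_only and unicodedata.category(char) in ("Cn", "Co"):
            continue
        return char


def random_utf8_string(length, rng=None, assigned_only=False):
    """Return a string of `length` random UTF-8-encodable characters."""
    if rng is None:
        rng = random.Random()
    return "".join(random_code_point(rng, assigned_only) for _ in range(length))
```

Calling `random_utf8_string(10)` yields ten characters from anywhere in the encodable range, and `random_utf8_string(10, assigned_only=True)` restricts the result to code points that the local Python's Unicode database treats as assigned. Rejection sampling keeps the sketch short; a production version would more likely precompute the allowed ranges.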