expand set of codepoints returned #21

Ichimonji10 · 2014-05-16T17:44:51Z

When asked to generate UTF8 characters, the FauxFactory.generate_string class method returns only characters from the CJK character set. Although CJK characters are valid UTF8 characters, they do not represent the entire range of valid UTF8 characters. By the same logic, generate_string could return only ASCII characters when asked for UTF8 characters and, since ASCII is a subset of UTF8, it would technically be behaving correctly.

It would be better if the generate_string method returned a fuller set of UTF8 characters when asked to generate UTF8 characters.

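The lower limit of valid UTF8 code points is 0x0, and I'm not sure what the upper limit is. According to RFC 3629, some values are also off-limits: the byte values 0xC0, 0xC1 and 0xF5–0xFF never appear in UTF8-encoded text, and the code points 0xD800–0xDFFF (reserved for UTF-16 surrogates) must not be encoded.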
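These RFC 3629 constraints can be checked against Python's built-in UTF-8 codec. This is a minimal sketch, not FauxFactory code; the helper name `utf8_bytes` is hypothetical. It encodes code points at the boundaries of each UTF-8 sequence length, confirms that none of the resulting bytes fall in the forbidden set, and confirms that the strict codec rejects surrogates.

```python
# Byte values that RFC 3629 says never appear in well-formed UTF-8.
FORBIDDEN_BYTES = {0xC0, 0xC1} | set(range(0xF5, 0x100))


def utf8_bytes(code_point):
    """Hypothetical helper: UTF-8 encoding of code_point, or None if unencodable."""
    try:
        return chr(code_point).encode("utf-8")
    except UnicodeEncodeError:
        # Python's strict UTF-8 codec refuses lone surrogates (0xD800-0xDFFF).
        return None


# First/last code points of the 1-, 2-, 3- and 4-byte ranges, plus the
# code points just outside the surrogate block.
boundaries = [0x0, 0x7F, 0x80, 0x7FF, 0x800, 0xD7FF, 0xE000, 0xFFFF, 0x10000, 0x10FFFF]
for cp in boundaries:
    data = utf8_bytes(cp)
    assert data is not None
    assert not FORBIDDEN_BYTES.intersection(data)
    print(f"U+{cp:04X} -> {data.hex(' ')}")

# Surrogates cannot be encoded at all.
for cp in (0xD800, 0xDFFF):
    assert utf8_bytes(cp) is None
```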


Ichimonji10 · 2014-05-17T11:47:58Z

The upper limit (inclusive) of UTF8 is 0x10FFFF.
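With the range pinned down as 0x0–0x10FFFF minus the surrogate block, a generator can draw uniformly from the 1,112,064 remaining code points. The sketch below is an illustration only, not the code from the pull request; `random_code_point` and `random_utf8_string` are hypothetical names.

```python
import random

MAX_CODE_POINT = 0x10FFFF  # inclusive upper limit of Unicode / UTF-8
SURROGATE_START, SURROGATE_END = 0xD800, 0xDFFF
SURROGATE_COUNT = SURROGATE_END - SURROGATE_START + 1  # 2,048
VALID_COUNT = MAX_CODE_POINT + 1 - SURROGATE_COUNT  # 1,112,064


def random_code_point(rng=random):
    """Hypothetical helper: uniform pick from 0x0-0x10FFFF, skipping surrogates."""
    index = rng.randrange(VALID_COUNT)
    # Indices at or past the start of the surrogate block are shifted above it,
    # so every non-surrogate code point is equally likely.
    if index >= SURROGATE_START:
        index += SURROGATE_COUNT
    return index


def random_utf8_string(length, rng=random):
    """Hypothetical helper: a string of `length` UTF-8-encodable code points."""
    return "".join(chr(random_code_point(rng)) for _ in range(length))


text = random_utf8_string(10, random.Random(42))
text.encode("utf-8")  # never raises, because no surrogates were chosen
```

Most draws land on code points that are unassigned or reserved for private use, so the output is always encodable but rarely looks like ordinary text.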

omaciel · 2014-05-17T13:48:41Z

The tricky part is determining all the valid ranges and avoiding characters that are not "real", but I totally agree with you that generate_string can ultimately be improved. I need to spend a bit more time researching the topic so that I can take a stab at it.
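One way to avoid characters that are not "real" is to look up the Unicode general category of each candidate and reject certain categories. Which categories to exclude is an assumption here (unassigned, surrogate, private use and control), the helper names are hypothetical, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import random
import unicodedata

# Assumption: these general categories count as "not real" characters.
# Cn = unassigned (includes noncharacters), Cs = surrogate,
# Co = private use, Cc = control.
EXCLUDED_CATEGORIES = {"Cn", "Cs", "Co", "Cc"}


def is_real_character(code_point):
    """Hypothetical helper: True if the code point's category is not excluded."""
    return unicodedata.category(chr(code_point)) not in EXCLUDED_CATEGORIES


def random_real_character(rng=random):
    """Hypothetical helper: rejection-sample until a "real" character appears."""
    while True:
        cp = rng.randrange(0x110000)  # any code point 0x0-0x10FFFF
        if is_real_character(cp):
            return chr(cp)
```

Rejection sampling keeps the choice uniform over the accepted characters without hard-coding range tables, at the cost of several retries per character when most of the code space is unassigned.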

Ichimonji10 · 2014-05-17T13:51:14Z

I'm working on this. You'll have a pull request soon.

omaciel · 2014-05-17T13:56:56Z

Sweet!!!

Ichimonji10 · 2014-05-17T14:14:26Z

The current pull request fixes this issue.

omaciel · 2014-05-17T15:13:46Z

Merged! You should update README.rst, HISTORY.rst and add yourself to AUTHORS.rst :)

omaciel closed this as completed May 17, 2014
