Some Recorder-generated text asserts need improved unicode normalization #1128

Closed
mdmintz opened this issue Dec 16, 2021 · 0 comments · Fixed by #1133
Labels
bug Uh oh... Something needs to be fixed

mdmintz commented Dec 16, 2021

Some Recorder-generated text asserts need improved unicode normalization.
(See https://unicode.org/reports/tr15/#Norm_Forms for details on Unicode normalization forms.)

Essentially, some characters (such as accented or non-Latin letters) can be represented by more than one sequence of Unicode code points, and the Recorder isn't consistent about which form it generates. As a result, a Recorder-generated self.assert_text(TEXT, SELECTOR) line may fail even when TEXT appears to match the visible text on the web page, because the two strings use different Unicode forms.

Here's an example of that:

ipdb> 'й'.encode()
b'\xd0\xb9'
ipdb> 'й'.encode()
b'\xd0\xb8\xcc\x86'
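
For anyone who wants to reproduce this outside of ipdb, here is a minimal sketch using Python's standard unicodedata module. It builds both forms of the character explicitly and compares them before and after normalization. The helper name same_text and the choice of NFC are assumptions for illustration; this is not necessarily how the Recorder fix is implemented.

```python
import unicodedata

# Two visually identical spellings of the same Cyrillic letter.
composed = "\u0439"          # single precomposed code point
decomposed = "\u0438\u0306"  # base letter followed by a combining breve


def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(composed.encode())                # b'\xd0\xb9'
print(decomposed.encode())              # b'\xd0\xb8\xcc\x86'
print(composed == decomposed)           # False: raw comparison fails
print(same_text(composed, decomposed))  # True: equal once normalized
```

Normalizing both the expected text and the page text to the same form before comparing is one way to keep an assertion from failing on strings that look the same.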
@mdmintz mdmintz added the bug Uh oh... Something needs to be fixed label Dec 16, 2021
@mdmintz mdmintz self-assigned this Dec 16, 2021