@vstinner vstinner (Member) commented Sep 22, 2025

@serhiy-storchaka serhiy-storchaka (Member) left a comment

Please add a test for the codec cookie on the second line (with a non-ASCII first line).

Also add a test that specifies the ASCII encoding but has non-ASCII content that can still be decoded as UTF-8, e.g. '#coding=ascii €'.encode('utf-8'), plus the corresponding two-line case.
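To illustrate why this case is worth testing, here is a minimal sketch showing that the suggested bytes are valid UTF-8 but do not match the declared ASCII encoding. The helper `decodes_as` and the two-line sample are hypothetical and are not taken from the PR's test suite.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly with the given codec."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# One-line case taken from the review comment.
one_line = '#coding=ascii €'.encode('utf-8')

# Hypothetical two-line variant: non-ASCII comment first, cookie second.
two_lines = '# €\n#coding=ascii\n'.encode('utf-8')

for data in (one_line, two_lines):
    # The bytes are valid UTF-8 but contradict the declared ASCII encoding.
    print(decodes_as(data, 'utf-8'), decodes_as(data, 'ascii'))
```

A decoder that silently falls back to UTF-8 would accept both samples, so a test built on them can tell whether the declared encoding is actually enforced.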

@vstinner vstinner (Member, Author) commented

@serhiy-storchaka: I added more tests, please review the updated PR. Is this what you wanted?

@serhiy-storchaka serhiy-storchaka (Member) left a comment

Thank you for the update. In the two-line cases, please use non-ASCII data in the first line, before the codec cookie. Test that the tokenizer uses the correct encoding to decode comments in the first line.

It may already be tested elsewhere, but I would also add tests for non-ASCII data in the first and second comment lines when no codec cookie is present (so UTF-8 should be used), for both valid and invalid UTF-8.
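The no-cookie cases could be sketched roughly as follows, using the standard library's `tokenize.detect_encoding()`. The helper `detect` and the sample sources are assumptions made for illustration; the tests actually added to the PR may be structured differently.

```python
import io
import tokenize


def detect(source: bytes):
    """Return the detected encoding, or None if the source is rejected."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError:
        return None
    return encoding


# No codec cookie anywhere, so UTF-8 is the implicit encoding.
valid_first = '# €\nx = 1\n'.encode('utf-8')
valid_second = '#\n# €\nx = 1\n'.encode('utf-8')

# 0xff can never appear in well-formed UTF-8.
invalid_first = b'# \xff\nx = 1\n'
invalid_second = b'#\n# \xff\nx = 1\n'

for source in (valid_first, valid_second, invalid_first, invalid_second):
    print(detect(source))
```

Only the first two lines are inspected by `detect_encoding()`, which is why the non-ASCII data is placed in the first or second comment line.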

I expect the tokenizer to correctly decode files that match the explicit or implicit encoding and to reject files that do not. The interpreter should behave the same way.
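One way to check that expectation is to feed the same bytes to both the `tokenize` module and the built-in `compile()` and compare the outcomes. This is only a sketch: the helper names and samples are hypothetical, and any of the listed exceptions is treated as a rejection rather than asserting a specific error type.

```python
import io
import tokenize


def tokenizer_accepts(source: bytes) -> bool:
    """Return True if the tokenize module tokenizes the source without error."""
    try:
        list(tokenize.tokenize(io.BytesIO(source).readline))
    except (SyntaxError, UnicodeError, tokenize.TokenError):
        return False
    return True


def compiler_accepts(source: bytes) -> bool:
    """Return True if the interpreter compiles the source without error."""
    try:
        compile(source, '<sketch>', 'exec')
    except (SyntaxError, ValueError):  # UnicodeError is a ValueError subclass
        return False
    return True


# Matches the implicit UTF-8 encoding.
good = "x = '€'\n".encode('utf-8')

# Does not match: 0xff is not valid UTF-8 and no cookie is present.
bad = b'x = "\xff"\n'

for source in (good, bad):
    print(tokenizer_accepts(source), compiler_accepts(source))
```

Both helpers should agree on each sample: the first is accepted by both and the second is rejected by both.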

@vstinner vstinner (Member, Author) commented

Ok, I added more tests. Please review the updated PR.
