Trim Zero White Space #609

Merged
2 commits merged on Jul 21, 2022

elachlan
Contributor

Fixes #608

@@ -125,6 +125,7 @@ public class SourceGenerator : ISourceGenerator

private const string NativeMethodsTxtAdditionalFileName = "NativeMethods.txt";
private const string NativeMethodsJsonAdditionalFileName = "NativeMethods.json";
private static readonly char[] ZeroWhiteSpace = new char[] { '\uFEFF', '\u200B' };
Member

The first one is the BOM; the second one is the zero-width space.
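
For illustration, here is a minimal sketch of why these two characters need explicit handling. It assumes .NET Framework 4.0 or later (or .NET Core/.NET), where the parameterless `Trim()` relies on `char.IsWhiteSpace` and therefore leaves both characters in place. Only the array comes from the diff above; the class `ZeroWidthTrimDemo` and the helper `CleanLine` are hypothetical names, not the generator's code.

```csharp
using System;

public static class ZeroWidthTrimDemo
{
    // U+FEFF: ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark (BOM).
    // U+200B: ZERO WIDTH SPACE.
    private static readonly char[] ZeroWhiteSpace = new char[] { '\uFEFF', '\u200B' };

    // Hypothetical helper: trims ordinary whitespace first, then any
    // zero-width characters left at either end of the line.
    public static string CleanLine(string line)
    {
        return line.Trim().Trim(ZeroWhiteSpace);
    }
}
```

With this helper, an entry such as `"\uFEFFActivateKeyboardLayout\u200B  "` comes back as `"ActivateKeyboardLayout"`, whereas `Trim()` alone would only drop the trailing spaces.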

Contributor Author

They are ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF). Should I add a comment documenting each one? Also, should I only be checking for ZERO WIDTH SPACE (U+200B)?

Member

Yes, comments with these meanings would be great.
That said, I would expect our TextReader to remove the BOM. Is it not doing that?
And I wonder why we'd want to tolerate other strange Unicode characters. We already have code elsewhere in the source generator to detect any non-ANSI character and emit a warning (or maybe an error) so the user can correct it. This change looks like it would just allow these characters to creep in. Is that really a win?
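
On the BOM question, a short sketch may help. `StreamReader` skips a byte order mark only at the very start of a stream, so a U+FEFF that ends up later in the file (for example after content is pasted or merged) is returned as an ordinary character. The class and method names below are hypothetical, and whether the generator reads NativeMethods.txt through a `StreamReader` like this is an assumption.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Reads all lines from UTF-8 bytes through a StreamReader.
    public static List<string> ReadLines(byte[] utf8Bytes)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(new MemoryStream(utf8Bytes), Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Builds sample input: a leading BOM, then a second U+FEFF at the start of line two.
    public static byte[] Sample()
    {
        var bytes = new List<byte>(Encoding.UTF8.GetPreamble());
        bytes.AddRange(Encoding.UTF8.GetBytes("Foo\n\uFEFFBar\n"));
        return bytes.ToArray();
    }
}
```

Reading `BomDemo.Sample()` yields `Foo` for the first line with the BOM removed, while the second line still begins with U+FEFF.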

Contributor Author

@elachlan, Jul 21, 2022

We ran into it while dealing with a merge conflict via GitHub. I think it would be helpful to sanitize the inputs as best as possible. A warning instead of an error would then pick up the ZWSP for fixing. I don't think it should stop the project from compiling, but it should probably let the user know it's there as well.
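
One way to read this proposal: strip the zero-width characters so the build proceeds, and report that they were found so the user can fix the file. The sketch below is hypothetical. `LenientLineReader`, `Sanitize`, and the message text are not taken from the generator, which would report through its own diagnostics.

```csharp
using System;

public static class LenientLineReader
{
    // U+FEFF is the BOM / ZERO WIDTH NO-BREAK SPACE; U+200B is ZERO WIDTH SPACE.
    private static readonly char[] ZeroWhiteSpace = new char[] { '\uFEFF', '\u200B' };

    // Returns the cleaned entry. 'warning' is null when no zero-width
    // character had to be stripped from either end of the line.
    public static string Sanitize(string line, int lineNumber, out string warning)
    {
        string trimmed = line.Trim();
        string cleaned = trimmed.Trim(ZeroWhiteSpace);
        warning = cleaned.Length == trimmed.Length
            ? null
            : "Line " + lineNumber + ": zero-width character(s) removed; please delete them from the file.";
        return cleaned;
    }
}
```

The caller accepts the cleaned name either way and surfaces the warning only when one was produced, which matches the tolerant-input idea without hiding the problem from the user.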

Edit: this is kind of what I mean https://en.wikipedia.org/wiki/Robustness_principle

Contributor Author

@elachlan, Jul 21, 2022

I can also see your point on the ZWSP. But the code already runs a Trim() against it, which is pretty similar, so ActivateKeyboardLayout will work.

Member

Alright, fair enough. I'll take the PR. Can you just add the comments we talked about?

@AArnott enabled auto-merge (squash) July 21, 2022 02:25
@AArnott merged commit 26be795 into microsoft:main Jul 21, 2022
Successfully merging this pull request may close these issues.

Zero width space characters break NativeMethods.txt
3 participants