Skip to content

Range analysis and useless-comparison query: don't treat all unicode surrogates as if they are U+FFFD #7239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Nov 25, 2021

No description provided.

@aschackmull
Copy link
Contributor

I don't think this is the right fix. Instead I think that CharacterLiteral.getCodePointValue should be fixed to return the value as if the Java code had cast the character to an int. That's what the qldoc states, and I believe that was the intention of the predicate. We'll want something that uses the current implementation as much as possible, but for chars that would yield the result 65533, we should instead parse the literal.

@aschackmull
Copy link
Contributor

The predicate getCodePointValue is fairly new, so I doubt there's any code relying on its current buggy behaviour.

@smowton smowton force-pushed the smowton/fix/useless-comparison-surrogates branch from a6ccd43 to db39c0b Compare November 25, 2021 14:08
@smowton
Copy link
Contributor Author

smowton commented Nov 25, 2021

👍 pushed the in-place fix instead

@@ -731,7 +743,11 @@ class CharacterLiteral extends Literal, @characterliteral {
* this literal. The result is the same as if the Java code had cast
* the character to an `int`.
*/
int getCodePointValue() { result.toUnicode() = this.getValue() }
int getCodePointValue() {
if this.getLiteral().matches("'\\u%'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the match pattern slightly more precise.

Suggested change
if this.getLiteral().matches("'\\u%'")
if this.getLiteral().matches("'\\u____'")

@@ -713,6 +713,18 @@ class DoubleLiteral extends Literal, @doubleliteral {
override string getAPrimaryQlClass() { result = "DoubleLiteral" }
}

// Implementation taken from @p0 at https://github.com/github/codeql/issues/4145
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this comment, I think.

@smowton
Copy link
Contributor Author

smowton commented Nov 25, 2021

@aschackmull done

Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically possible to construct examples where getCodePointValue still returns the wrong value 65533 (when a surrogate literal doesn't match "'\\u____'"), but these examples are extremely obscure, so I'm not sure that it's worth the effort to properly support them until we see an actual need.

@aschackmull aschackmull merged commit 57fd397 into github:main Nov 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants