Range analysis and useless-comparison query: don't treat all unicode surrogates as if they are U+FFFD #7239

smowton · 2021-11-25T12:01:05Z

No description provided.

aschackmull · 2021-11-25T13:37:08Z

I don't think this is the right fix. Instead I think that CharacterLiteral.getCodePointValue should be fixed to return the value as if the Java code had cast the character to an int. That's what the qldoc states, and I believe that was the intention of the predicate. We'll want something that uses the current implementation as much as possible, but for chars that would yield the result 65533, we should instead parse the literal.
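To illustrate what "as if the Java code had cast the character to an int" means for surrogates, here is a minimal Java sketch. The class and method names are hypothetical and are not part of the CodeQL library. It only shows that a lone surrogate has its own code-unit value, distinct from 65533 (U+FFFD).

```java
final class SurrogateCastDemo {
    // Hypothetical helper: the value a Java program sees when it casts a char to an int.
    static int castToInt(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        char highSurrogate = '\ud800'; // lone high surrogate, a legal char literal
        char lowSurrogate = '\udfff';  // lone low surrogate
        char replacement = '\ufffd';   // U+FFFD REPLACEMENT CHARACTER

        System.out.println(castToInt(highSurrogate)); // 55296
        System.out.println(castToInt(lowSurrogate));  // 57343
        System.out.println(castToInt(replacement));   // 65533

        // Surrogates are ordinary UTF-16 code units as far as the cast is concerned.
        System.out.println(Character.isSurrogate(highSurrogate)); // true
        System.out.println(Character.isSurrogate(replacement));   // false
    }
}
```

A predicate that reports 65533 for every surrogate literal would make comparisons against 55296 or 57343 look impossible, which is presumably how the useless-comparison query ended up with false positives.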

aschackmull · 2021-11-25T13:38:23Z

The predicate getCodePointValue is fairly new, so I doubt there's any code relying on its current buggy behaviour.

smowton · 2021-11-25T14:08:23Z

👍 pushed the in-place fix instead

aschackmull · 2021-11-25T14:44:15Z

java/ql/lib/semmle/code/java/Expr.qll

@@ -731,7 +743,11 @@ class CharacterLiteral extends Literal, @characterliteral {
   * this literal. The result is the same as if the Java code had cast
   * the character to an `int`.
   */
-  int getCodePointValue() { result.toUnicode() = this.getValue() }
+  int getCodePointValue() {
+    if this.getLiteral().matches("'\\u%'")


Let's make the match pattern slightly more precise.

Suggested change

-    if this.getLiteral().matches("'\\u%'")
+    if this.getLiteral().matches("'\\u____'")

aschackmull · 2021-11-25T14:45:08Z

java/ql/lib/semmle/code/java/Expr.qll

@@ -713,6 +713,18 @@ class DoubleLiteral extends Literal, @doubleliteral {
  override string getAPrimaryQlClass() { result = "DoubleLiteral" }
 }

+// Implementation taken from @p0 at https://github.com/github/codeql/issues/4145


No need for this comment, I think.

smowton · 2021-11-25T15:21:29Z

@aschackmull done

aschackmull

It's technically possible to construct examples where getCodePointValue still returns the wrong value 65533 (when a surrogate literal doesn't match "'\\u____'"), but these examples are extremely obscure, so I'm not sure that it's worth the effort to properly support them until we see an actual need.
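As a rough Java analogue of the "parse the literal" idea, the sketch below recovers the code unit from the source text of a literal that has the simple form: quote, backslash, a single `u`, four hex digits, quote. The class and method names are hypothetical, and this is not the QL that was merged. It is also slightly stricter than the `"'\\u____'"` pattern, which accepts any four characters. One candidate for the obscure unsupported case is an escape with a repeated `u`, such as `'\uud800'`, which Java accepts but which does not have the simple form.

```java
import java.util.OptionalInt;
import java.util.regex.Pattern;

final class CharLiteralEscapeParser {
    // Quote, backslash, exactly one 'u', four hex digits, quote.
    private static final Pattern SIMPLE_UNICODE_ESCAPE =
        Pattern.compile("'\\\\u[0-9a-fA-F]{4}'");

    // Returns the UTF-16 code unit for a simple unicode-escape literal,
    // or empty if the source text has any other shape.
    static OptionalInt parseSimpleUnicodeEscape(String literalText) {
        if (!SIMPLE_UNICODE_ESCAPE.matcher(literalText).matches()) {
            return OptionalInt.empty();
        }
        // Characters 3..6 are the four hex digits.
        return OptionalInt.of(Integer.parseInt(literalText.substring(3, 7), 16));
    }

    public static void main(String[] args) {
        // Simple form: parsed directly from the source text.
        System.out.println(parseSimpleUnicodeEscape("'\\ud800'"));
        // Repeated 'u': not the simple form, so the caller would need a fallback.
        System.out.println(parseSimpleUnicodeEscape("'\\uud800'").isPresent());
    }
}
```

A caller would fall back to the existing value-based computation whenever the result is empty, which mirrors the intent of keeping the current implementation for everything except the simple escaped form.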

smowton requested a review from a team as a code owner November 25, 2021 12:01

github-actions bot added the documentation and Java labels Nov 25, 2021

CharacterLiteral.getCodePointValue: fix handling of surrogates

db39c0b

smowton force-pushed the smowton/fix/useless-comparison-surrogates branch from a6ccd43 to db39c0b November 25, 2021 14:08

aschackmull reviewed Nov 25, 2021

Apply review comments

ce63549

Update charLiterals.expected

7ac5791

aschackmull approved these changes Nov 26, 2021

aschackmull merged commit 57fd397 into github:main Nov 26, 2021

MathiasVP mentioned this pull request Nov 26, 2021

LGTM.com - false positive #7238

Closed
