Status: Closed
Labels: question (Further information is requested)
Version
CodeQL CLI version: 2.6.0
Description of the issue
Slightly related to #5297
The predicate `getValue()` of CodeQL's `StringLiteral` and `CharacterLiteral` seems to replace unpaired Unicode surrogates (U+D800 to U+DBFF and U+DC00 to U+DFFF) with the character `?`.
This is not a display problem in the Query Console or the VS Code extension; the database really seems to contain `?` as the value.
This can lead to incorrect query results, since the value reported by CodeQL does not match what the source code contains.
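The issue does not say where the `?` comes from. Purely as an illustration, and not as a diagnosis of the CodeQL Java extractor, the sketch below shows one well-known way a lone surrogate silently turns into `?` in Java: `String.getBytes(StandardCharsets.UTF_8)` replaces malformed input, such as an unpaired surrogate, with the UTF-8 encoder's replacement byte, which is `?`. The class `SurrogateDemo` and both of its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only (assumption): not a description of how CodeQL stores values. */
public final class SurrogateDemo {
    private SurrogateDemo() {}

    /** Returns true if s contains a UTF-16 surrogate unit that is not part of a valid pair. */
    public static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return true; // high surrogate not followed by a low surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate without a preceding high surrogate
            }
        }
        return false;
    }

    /** Encodes to UTF-8 and decodes again; getBytes replaces malformed input with '?'. */
    public static String roundTripUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lone = "x" + (char) 0xD800 + "y"; // lone high surrogate between two letters
        System.out.println(hasUnpairedSurrogate(lone));
        System.out.println(roundTripUtf8(lone));
    }
}
```

Running `main` prints `true` followed by `x?y`: the surrogate is lost without any exception, which matches the kind of symptom described above.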
Reproduction steps
Run the following query:
```ql
import java

from StringLiteral s, string literal, string value
where
  literal = s.getLiteral()
  and value = s.getValue()
  // Value contains '?'
  and value.matches("%?%")
  // But literal does not contain '?': neither literally nor as an octal or Unicode escape
  and not literal.matches(["%?%", "%\\77%", "%\\077%", "%\\u003f%", "%\\u003F%"])
select s, value
```
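A minimal Java input for the query might look like the following; the class and field names are hypothetical. Based on the behavior described above, `HIGH_ONLY` and `LOW_ONLY` would be expected to appear in the results with `?` in place of the surrogate. `PAIRED` holds a valid surrogate pair (U+1F600) and is included only for comparison, since the issue concerns unpaired surrogates. `LONE_CHAR` exercises `CharacterLiteral`, which the query above does not cover because it only selects `StringLiteral`.

```java
/** Hypothetical test input: literals containing unpaired surrogates written as Unicode escapes. */
public class UnpairedSurrogates {
    // High surrogate without a following low surrogate.
    static final String HIGH_ONLY = "a\uD800b";
    // Low surrogate without a preceding high surrogate.
    static final String LOW_ONLY = "a\uDC00b";
    // Character literal holding a lone surrogate (CharacterLiteral, not matched by the query).
    static final char LONE_CHAR = '\uDBFF';
    // Valid surrogate pair for U+1F600, for comparison.
    static final String PAIRED = "\uD83D\uDE00";

    public static void main(String[] args) {
        // Prints the UTF-16 unit actually stored by javac: d800, not 3f ('?').
        System.out.println(Integer.toHexString(HIGH_ONLY.charAt(1)));
    }
}
```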
Workaround
A workaround might be to use `getLiteral()`, which contains the Unicode escape sequences for the surrogate characters. However, you then have to parse the escape sequences manually, which is rather error-prone.
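The manual parsing roughly involves two passes defined by the Java Language Specification: Unicode escapes are translated first (JLS §3.3), then escape sequences such as `\n` and octal escapes are processed (JLS §3.10.7). The sketch below shows those passes in Java rather than QL, keeping every UTF-16 unit, including lone surrogates, unchanged. `LiteralDecoder` and `decode` are hypothetical names; `decode` expects only the text between the quotes (stripping the quotes from the `getLiteral()` text is left out), and text blocks are not handled.

```java
/** Hypothetical helper (not part of CodeQL): decodes a Java string literal body, keeping lone surrogates. */
public final class LiteralDecoder {
    private LiteralDecoder() {}

    /** Decodes the source text between the quotes of a string literal into its value. */
    public static String decode(String body) {
        return processEscapes(translateUnicodeEscapes(body));
    }

    /** Step 1 (JLS 3.3): replace Unicode escapes with the UTF-16 unit they denote. */
    static String translateUnicodeEscapes(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int rawBackslashes = 0; // contiguous raw backslashes immediately before index i
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            // A backslash starts a Unicode escape only if preceded by an even number of backslashes.
            if (c == '\\' && rawBackslashes % 2 == 0 && i + 1 < s.length() && s.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < s.length() && s.charAt(j) == 'u') j++; // one or more 'u' characters are allowed
                if (j + 4 > s.length()) throw new IllegalArgumentException("truncated Unicode escape at " + i);
                int unit = 0;
                for (int k = j; k < j + 4; k++) {
                    int d = hexValue(s.charAt(k));
                    if (d < 0) throw new IllegalArgumentException("bad hex digit at " + k);
                    unit = unit * 16 + d;
                }
                out.append((char) unit); // may be a lone surrogate; no re-encoding happens
                rawBackslashes = 0;      // the produced character never starts another Unicode escape
                i = j + 4;
            } else {
                out.append(c);
                rawBackslashes = (c == '\\') ? rawBackslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    /** Step 2 (JLS 3.10.7): process escape sequences such as \n, \\ and octal escapes. */
    static String processEscapes(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\') { out.append(c); continue; }
            if (i >= s.length()) throw new IllegalArgumentException("dangling backslash");
            char e = s.charAt(i++);
            switch (e) {
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'f': out.append('\f'); break;
                case 'r': out.append('\r'); break;
                case 's': out.append(' '); break; // space escape, Java 15+
                case '"': out.append('"'); break;
                case '\'': out.append('\''); break;
                case '\\': out.append('\\'); break;
                default:
                    if (e < '0' || e > '7') throw new IllegalArgumentException("bad escape \\" + e);
                    // Octal: up to three digits if the first is 0-3, otherwise up to two.
                    int value = e - '0';
                    int maxDigits = (e <= '3') ? 3 : 2;
                    for (int n = 1; n < maxDigits && i < s.length()
                            && s.charAt(i) >= '0' && s.charAt(i) <= '7'; n++) {
                        value = value * 8 + (s.charAt(i++) - '0');
                    }
                    out.append((char) value);
            }
        }
        return out.toString();
    }

    /** ASCII-only hex digit value, or -1 (Character.digit would also accept non-ASCII digits). */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

The even-backslash rule is the part most easily missed: in the source text `\\uD800` the backslash before `u` is itself escaped, so the value is the six characters `\uD800`, not a surrogate.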