Skip to content

Ruby: interpret string escape sequences in getConstantValue() #8164

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 16, 2022

Conversation

nickrolfe
Copy link
Contributor

Motivation: being able to write queries that can tell the difference between the string values '\n' (U+005C, U+006E) and "\n" (U+000A).

I've added a new test in ruby/ql/test/library-tests/ast/escape_sequences that I hope covers all the edge cases, and I think all the other test changes make sense.

I'm not sure if any of the new unescape predicates need to be cached. I'll run DCA and see if anything stands out.

@nickrolfe nickrolfe added the Ruby label Feb 22, 2022
@nickrolfe nickrolfe requested a review from a team as a code owner February 22, 2022 11:07
@@ -3,6 +3,7 @@ private import AST
private import Constant
private import TreeSitter
private import codeql.ruby.controlflow.CfgNodes
private import codeql.NumberUtils

int parseInteger(Ruby::Integer i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of parseInteger should probably use NumberUtils

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll have a go at that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this locally, but my implementation is more conservative. My new parseHexInt returns no value for 0xdeadbeef, while the current parseInteger implementation returns -559038737 (i.e. wraps to a negative number). Ruby treats 0xdeadbeef as the positive/unsigned integer 3735928559.

Should I attempt to update my implementation of parseHexInt to allow wrapping, even if it's not correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no problem changing the meaning of parseInteger if the old definition is incorrect. Returning no value is probably better than incorrect wrapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've made that change. I made some improvements to the binary/hex/octal parsing predicates in the process. Now they can all parse 2^31-1, return no result for 2^31, and cope with an arbitrary number of leading zeros (e.g. 0x00000000000000000007f). I've also added those edge cases to the tests.

@nickrolfe
Copy link
Contributor Author

The DCA run showed a significant slowdown on one project in particular. I'm investigating.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but as you say, performance needs to be investigated.

@nickrolfe nickrolfe force-pushed the nickrolfe/escape_sequences branch from 5f343bd to f8ecd0b Compare March 14, 2022 15:42
@nickrolfe nickrolfe force-pushed the nickrolfe/escape_sequences branch from f8ecd0b to 488c8ef Compare March 14, 2022 15:51
hvitved
hvitved previously approved these changes Mar 16, 2022
@nickrolfe nickrolfe merged commit 82ef2a1 into main Mar 16, 2022
@nickrolfe nickrolfe deleted the nickrolfe/escape_sequences branch March 16, 2022 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants