Skip to content

Conversation

owen-mc
Copy link
Contributor

@owen-mc owen-mc commented Mar 11, 2024

When we see "a" + "b" + "c" + "d", do not add a row to the constvalues table for the intermediate strings "ab" and "abc". We still have entries for the string literals ("a", "b", "c", and "d") and the whole string concatenation ("abcd").

This was inspired by golang/go-frontend, which is a somewhat pathological case with a concatenation of 7098 literals of 60 characters each. The db goes down from 2.5GB to 170MB.

When we see "a" + "b" + "c" + "d", do not add a
row to the constvalues table for the intermiediate
strings "ab" and "abc". We still have entries for
the string literals ("a", "b", "c", and "d") and
the whole string concatenation ("abcd").
@github-actions github-actions bot added the Go label Mar 11, 2024
@owen-mc owen-mc marked this pull request as ready for review March 11, 2024 17:06
@owen-mc owen-mc requested a review from a team as a code owner March 11, 2024 17:06
@owen-mc
Copy link
Contributor Author

owen-mc commented Mar 11, 2024

I'm thinking about how to test this. Should I add a check that we don't have a string with value "ab" in a db for a go file containing "a" + "b" + "c"? I feel like that isn't quite testing the right thing. Or could we have a pathological example and check the db isn't too big?

@smowton
Copy link
Contributor

smowton commented Mar 11, 2024

The former sounds right to me

@owen-mc
Copy link
Contributor Author

owen-mc commented Mar 13, 2024

DCA shows nothing interesting, so I'll do a QA run to see if that turns up any other repos with this pathologically long string concatenations.

@owen-mc owen-mc changed the title Go: extractor: avoid long string concatenations Go: extractor: do not store intermediate values in long string concatenations Mar 13, 2024
@owen-mc
Copy link
Contributor Author

owen-mc commented Mar 15, 2024

QA results looked broadly fine. I saw one case where " Stringpool size measured as 271614442" was replaced by "Stringpool size measured as 11567514" (23 times smaller), which is obviously encouraging. There are three alert changes I should look into - they may just because the base for my comparison wasn't quite right.

@owen-mc
Copy link
Contributor Author

owen-mc commented Mar 21, 2024

I looked into one of the alert changes, which was two extra alerts for go/incomplete-hostname-regex. The short version is that they are TPs and there was a bug in the way that the query was written which was hiding them before. I've filed this issue to fix it properly, but it shouldn't block this PR being merged.

I also started looking at the other alert change, which is a lost result for go/incorrect-integer-conversion in evilsocket/opensnitch. I've figured out where the problem is happening but not why. It seems to be that we don't follow flow into a particular function. I can't see how the changes in this PR would affect that, as there aren't any string concatenations in sight. It isn't caused by an incorrect base for the comparison as I can reproduce it with the base of the PR. When I have more time to investigate I will look into the predicate for viable callees.

@owen-mc
Copy link
Contributor Author

owen-mc commented Mar 27, 2024

I investigated the second alert change some more. It is intermittent, though it did happen on QA and the first time I ran locally. The flow doesn't follow a method call because there is no parameter node defined, which is because there is no link between the Function (object/entity) and its declaration. This must come from a problem during extraction. There was a problem extracting the method Process.BuildTree, defined in daemon/procmon/details.go. Process is defined in the sibling file daemon/procmon/process.go. It turns out that we need to have protoc installed when we run the makefile to generate some code which is then imported by these files. Not having that package available stops us from extracting the Process type correctly on the first pass.

I don't think this should block this PR, as it seems unrelated to the changes made here. At worst we have perturbed something non-determinimistic which causes this to be a problem more often.

smowton
smowton previously approved these changes Apr 2, 2024
Copy link
Contributor

@atorralba atorralba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rubber-stamping @smowton's previous approval.

@owen-mc owen-mc merged commit 1c0ef90 into github:main Apr 10, 2024
@owen-mc owen-mc deleted the go/extractor/no-intermediate-string-values branch April 10, 2024 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants