-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Python: Port py/weak-crypto-key to use type-tracking #5075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f9a6c65
to
cfa7989
Compare
To deal with the changes from moving query files around, I found it easier to rebase on top of main, instead of merging. |
Tests working can be verified by running ``` ls ql/python/ql/test/experimental/library-tests/frameworks/crypto*/*.py | xargs -L1 sh -c 'python $0 || exit 255' ```
I did spend some time to figure out how to best write `minimumSecureKeySize` predicate. I wanted to write once and for all the recommended sizes for each cryptosystem. I considered making the predicate such as ```codeql int minimumSecureKeySize() { this.getName() = "RSA" and result = 2048 or this.getName() = "DSA" and result = 2048 or this.getName() = "ECC" and result = 244 } ``` but then it would be impossible to add a new model without also being able to modify the body of this predicate -- which seems like a bad way to start off a brand new way of modeling things. So I considered if we could add it to the non-range class, such as ```codeql class RSAKeyGeneration extends KeyGeneration { RSAKeyGeneration() { this.getName() = "RSA" } override int minimumSecureKeySize() { result = 2048 } } ``` This has the major problem that when you're writing the models for a new API (and therefore extending KeyGeneration::Range), there is no way for you to see that you need to take this extra step :| (also problem about how we should define `minimumSecureKeySize` on `KeyGeneration` class then, since if we make it abstract, we effectively disable the ability to refine `KeyGeneration` since any subclass must provide an implementation.) So, therefore I ended up with this solution ;)
* Removed backend arugment that is not required * Added DSA constants (they are just accidentially the same as RSA right now) * Removed FakeWeakEllipticCurve and used a real weak elliptic curve instead
instead of points-to. Looking at query results also made me realize I didn't supply a very good "origin" for ECC in cryptography package, so I improved that 👍 -- maybe that sohuld have been split into multiple commits... too late :(
Since WeakCrypto always makes me think that it's about all weak crypto (like using MD5, or completely broken ciphers such as ARC4 ro DES) and not just about weak key generation.
after asking around, this seems to be the right approach
This was the result of an internal dicussion we had about this some time ago.
We used to handle this, but no more :( Adding this example was inspired by looking at results differences
From looking at old results on LGTM.com, this was quite common (and those alerts doesn't really provide value).
and since those were reverted, I rebased once again 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi - thanks for including a change log entry.
I'm a little surprised to see a query filename change. I've no idea how commonly users include queries by path in their query suites, but wonder if this needs to be flagged beyond just a mention in the change log (potentially to CodeQL CLI, code scanning, LGTM.com and LGTM Enterprise users). What error message will users see if they are affected?
@AlonaHlobina and @sj, I'll leave the discussion about whether this needs more communication to you, since you have a much better idea of how many people this migh affect than I do.
Also, do we have any documentation on the new type-tracking approach you mention here?
Co-authored-by: Felicity Chapman <felicitymay@github.com>
I cleared this up with @sj internally, who gave it a green light ✔️
We're going to work on that soon, but currently there is not any documentation for this (at least not for Python). |
Thank you for following up on those questions 😄 |
Like we've done for pretty much everything else. An experiment to see what this means for query performance.
Internal evaluation showed that this didn't perform better than normal (forward) type-tracking, but it feels more like the right approach.
I removed the global data-flow stuff for tracking integer literals again, since I got better performance (266s -> 176s) from using type-tracking. Type-tracking is the approach we've been using in other places of the code (at least I have). So since I'm still a bit uncertain what's the right approach, I think it's less disruptive to use the type-tracking approach for now. As highlighted in the commit message, type back-tracking didn't yield better performance, but does feel more like the right approach. |
Converted back to draft, to make sure we don't merge it in for the next RC branch, since performance with type-tracking might be a bit wobbly. |
Since RC branch has been created, I'm happy to get this merged now 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks great! Easy to read. I wonder about the exclusion of test code; should that really be user configurable, as in the analysis run excludes test code? Or, if we bake it into queries like this, should we have a concept for it?
Co-authored-by: yoff <lerchedahl@gmail.com>
I'm not 100% sure what you're asking here. But I'm guessing you're asking whether this query tries to exclude results in test code, and whether that's really a good idea -- since results in test code won't be shown by default (at least not on LGTM.com). If that is indeed what you're asking, I'm only ignoring cases where the key-size comes from a test, since that usually isn't very interesting. I saw quite a few real projects that used this setup, and wanted to get rid of these. I also added our own test-case in https://github.com/github/codeql/pull/5075/files#diff-b3091cad1d47a897a2e25187124c5ce95545ad23d6054b9afbc5de4c628b6d7f |
I see, it is a specific part of the logic that is excluded, so only the suggestion with a concept makes sense. However |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Draft since I will need to run this query on some real projects to verify that results are still looking good.
The major difference between this query and the old (besides type-tracking vs. points-to) is how good we are at tracking integer literals. I've used the same simple approach with local source nodes as we/I have used in the other type-tracking based modeling, but I'm guessing that since being able to correctly track integer literals is so important for this query, we might need to do something a bit more sophisticated. I'm waiting to look at query results before attacking this problem, but my idea would be to try and use some global data-flow to track things properly.
EDIT: I had to use this global data-flow approach, since results were too poor without it. The most controversial bit is that I introduced a new data-flow copy in 40c592a, to avoid re-evaluation.
EDIT2: I removed that again, since I got better performance (266s -> 176s) from using type-tracking for this instead. Type-tracking is the approach we've been using in other places of the code (at least I have). So since I'm still a bit uncertain what's the right approach, I think it's less disruptive to use the type-tracking approach for now.
Although it's only a draft now, I would still like some input on the way I ended up implementingwe did this, thanks 👍minimumSecureKeySize
in 66df9af, which is why I have requested a review already. So feel free to skip rest of this PR for now 😉 -- come to think of it, we would probably be better off discussing this in person, but since I already wrote up my problem, would be great if you could both read through my thought process @tausnb and @yoff 😊