Python: Port py/weak-crypto-key to use type-tracking #5075

RasmusWL · 2021-02-02T16:26:51Z

Draft since I will need to run this query on some real projects to verify that results are still looking good.

The major difference between this query and the old one (besides type-tracking vs. points-to) is how good we are at tracking integer literals. I've used the same simple approach with local source nodes that we/I have used in the other type-tracking-based modeling, but since correctly tracking integer literals is so important for this query, I'm guessing we might need something a bit more sophisticated. I'm waiting to look at query results before attacking this problem, but my idea would be to use some global data-flow to track things properly.

EDIT: I had to use this global data-flow approach, since results were too poor without it. The most controversial bit is that I introduced a new data-flow copy in 40c592a, to avoid re-evaluation.

EDIT2: I removed that again, since I got better performance (266s -> 176s) from using type-tracking for this instead. Type-tracking is the approach we've been using in other places of the code (at least I have). So since I'm still a bit uncertain what's the right approach, I think it's less disruptive to use the type-tracking approach for now.
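To make the integer-literal problem concrete, here is a minimal Python sketch of what a purely local lookup can and cannot resolve. It is a toy built on the standard `ast` module, not CodeQL type-tracking or the query's actual logic; the helper names (`resolve_int`, `weak_key_sizes`) and the 2048-bit default are assumptions chosen for illustration.

```python
import ast

# Sample code under analysis. It is only parsed, never executed.
SOURCE = """\
SMALL = 1024
rsa.generate_private_key(public_exponent=65537, key_size=SMALL)
rsa.generate_private_key(public_exponent=65537, key_size=4096)

def make_key(bits):
    return rsa.generate_private_key(public_exponent=65537, key_size=bits)

make_key(1024)
"""


def resolve_int(node, constants):
    """Resolve a node to an int if it is a literal or a known module-level constant."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.Name):
        return constants.get(node.id)
    return None


def weak_key_sizes(source, minimum=2048):
    """Return sorted (line, bits) pairs for key_size arguments below `minimum`."""
    tree = ast.parse(source)

    # Module-level names bound directly to an integer literal.
    constants = {}
    for stmt in tree.body:
        if (isinstance(stmt, ast.Assign)
                and isinstance(stmt.value, ast.Constant)
                and type(stmt.value.value) is int):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    constants[target.id] = stmt.value.value

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if kw.arg == "key_size":
                    bits = resolve_int(kw.value, constants)
                    if bits is not None and bits < minimum:
                        findings.append((node.lineno, bits))
    return sorted(findings)


print(weak_key_sizes(SOURCE))
```

The constant on line 2 of the sample is resolved, but the 1024 that reaches `key_size` through the `make_key` parameter is not, which is the kind of case that needs something stronger than a local lookup.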

Although it's only a draft now, I would still like some input on the way I ended up implementing `minimumSecureKeySize` in 66df9af, which is why I have already requested a review. So feel free to skip the rest of this PR for now 😉 -- come to think of it, we would probably be better off discussing this in person, but since I already wrote up my problem, it would be great if you could both read through my thought process @tausnb and @yoff 😊 (we did this, thanks 👍)

RasmusWL · 2021-02-17T13:38:25Z

To deal with the changes from moving query files around, I found it easier to rebase on top of main, instead of merging.


I spent some time figuring out how best to write the `minimumSecureKeySize` predicate. I wanted to write down, once and for all, the recommended sizes for each cryptosystem. I considered making the predicate something like

```codeql
int minimumSecureKeySize() {
  this.getName() = "RSA" and result = 2048
  or
  this.getName() = "DSA" and result = 2048
  or
  this.getName() = "ECC" and result = 224
}
```

but then it would be impossible to add a new model without also being able to modify the body of this predicate -- which seems like a bad way to start off a brand new way of modeling things. So I considered whether we could add it to the non-range class, such as

```codeql
class RSAKeyGeneration extends KeyGeneration {
  RSAKeyGeneration() { this.getName() = "RSA" }

  override int minimumSecureKeySize() { result = 2048 }
}
```

This has the major problem that when you're writing the models for a new API (and therefore extending `KeyGeneration::Range`), there is no way for you to see that you need to take this extra step :| (There is also the problem of how we should then define `minimumSecureKeySize` on the `KeyGeneration` class: if we make it abstract, we effectively disable the ability to refine `KeyGeneration`, since any subclass must provide an implementation.) So, therefore, I ended up with this solution ;)
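As a loose analogy only, the two rejected options can be sketched in Python. QL classes and abstract predicates do not behave like Python's `abc` module, so this shows the shape of the trade-off rather than QL semantics. All Python names here (`MINIMUM_SECURE_KEY_SIZE`, `is_weak`, `RefinedKeyGeneration`, `can_instantiate`) are hypothetical.

```python
from abc import ABC, abstractmethod

# Option 1 analogue: one central table of thresholds.
# Supporting a new cryptosystem means editing this table itself.
MINIMUM_SECURE_KEY_SIZE = {"RSA": 2048, "DSA": 2048}


def is_weak(name: str, bits: int) -> bool:
    """True when `bits` is below the recommended minimum for cryptosystem `name`."""
    return bits < MINIMUM_SECURE_KEY_SIZE[name]


# Option 2 analogue: each subclass supplies its own threshold.
class KeyGeneration(ABC):
    @abstractmethod
    def minimum_secure_key_size(self) -> int:
        """Recommended minimum key size in bits."""


class RsaKeyGeneration(KeyGeneration):
    def minimum_secure_key_size(self) -> int:
        return 2048


class RefinedKeyGeneration(KeyGeneration):
    """Hypothetical refinement that says nothing about key sizes."""


def can_instantiate(cls) -> bool:
    """Report whether `cls` can be constructed without arguments."""
    try:
        cls()
        return True
    except TypeError:
        # Raised by ABC when an abstract method is left unimplemented.
        return False


print(is_weak("RSA", 1024), can_instantiate(RefinedKeyGeneration))
```

In this analogy the central table cannot grow without being edited, and the abstract method blocks any subclass that does not implement it.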


instead of points-to. Looking at query results also made me realize I didn't supply a very good "origin" for ECC in the cryptography package, so I improved that 👍 -- maybe that should have been split into multiple commits... too late :(


RasmusWL · 2021-02-19T14:08:07Z

> To deal with the changes from moving query files around, I found it easier to rebase on top of main, instead of merging.

and since those were reverted, I rebased once again 😅

felicitymay

Hi - thanks for including a change log entry.

I'm a little surprised to see a query filename change. I've no idea how commonly users include queries by path in their query suites, but wonder if this needs to be flagged beyond just a mention in the change log (potentially to CodeQL CLI, code scanning, LGTM.com and LGTM Enterprise users). What error message will users see if they are affected?

@AlonaHlobina and @sj, I'll leave the discussion about whether this needs more communication to you, since you have a much better idea of how many people this might affect than I do.

Also, do we have any documentation on the new type-tracking approach you mention here?

python/change-notes/2021-02-02-port-weak-crypto-key-query.md


RasmusWL · 2021-02-23T14:25:09Z

> I'm a little surprised to see a query filename change. I've no idea how commonly users include queries by path in their query suites, but wonder if this needs to be flagged beyond just a mention in the change log (potentially to CodeQL CLI, code scanning, LGTM.com and LGTM Enterprise users). What error message will users see if they are affected?

I cleared this up with @sj internally, who gave it a green light ✔️

> Also, do we have any documentation on the new type-tracking approach you mention here?

We're going to work on that soon, but currently there is not any documentation for this (at least not for Python).

felicitymay · 2021-02-23T17:54:48Z

Thank you for following up on those questions 😄


RasmusWL · 2021-02-25T10:34:50Z

I removed the global data-flow stuff for tracking integer literals again, since I got better performance (266s -> 176s) from using type-tracking. Type-tracking is the approach we've been using in other places of the code (at least I have). So since I'm still a bit uncertain what's the right approach, I think it's less disruptive to use the type-tracking approach for now.

As highlighted in the commit message, type back-tracking didn't yield better performance, but does feel more like the right approach.

RasmusWL · 2021-02-26T16:08:33Z

Converted back to draft, to make sure we don't merge it in for the next RC branch, since performance with type-tracking might be a bit wobbly.

RasmusWL · 2021-03-02T07:51:28Z

Since RC branch has been created, I'm happy to get this merged now 👍

yoff

Generally looks great! Easy to read. I wonder about the exclusion of test code; should that really be user configurable, as in the analysis run excludes test code? Or, if we bake it into queries like this, should we have a concept for it?

python/ql/src/semmle/python/Concepts.qll

python/ql/src/semmle/python/frameworks/Cryptodome.qll

python/ql/src/semmle/python/frameworks/Cryptography.qll


RasmusWL · 2021-03-18T10:05:43Z

> I wonder about the exclusion of test code; should that really be user configurable, as in the analysis run excludes test code? Or, if we bake it into queries like this, should we have a concept for it?

I'm not 100% sure what you're asking here. But I'm guessing you're asking whether this query tries to exclude results in test code, and whether that's really a good idea -- since results in test code won't be shown by default (at least not on LGTM.com).

If that is indeed what you're asking, I'm only ignoring cases where the key-size comes from a test, since that usually isn't very interesting. I saw quite a few real projects that used this setup, and wanted to get rid of these. I also added our own test-case in https://github.com/github/codeql/pull/5075/files#diff-b3091cad1d47a897a2e25187124c5ce95545ad23d6054b9afbc5de4c628b6d7f

yoff · 2021-03-18T19:48:13Z

I see, it is a specific part of the logic that is excluded, so only the suggestion with a concept makes sense. However, `TestScope` is almost that already. So I am happy to leave it as is for now, and if we find the need to exclude bits in test code more often in the future, we can make it more convenient to do so.

yoff

LGTM

RasmusWL requested a review from a team February 2, 2021 16:26

github-actions bot added documentation Python labels Feb 2, 2021

RasmusWL force-pushed the crypto branch 2 times, most recently from f9a6c65 to cfa7989 Compare February 8, 2021 20:14

RasmusWL force-pushed the crypto branch from 1136548 to 213b9b9 Compare February 17, 2021 13:37

RasmusWL added 19 commits February 19, 2021 13:26

Python: Add a few tests for crypto frameworks

4ab61bb

Tests working can be verified by running

```
ls ql/python/ql/test/experimental/library-tests/frameworks/crypto*/*.py | xargs -L1 sh -c 'python $0 || exit 255'
```

Python: Add missing annotations to new crypto tests

1bf9f7d

Python: Add modeling for cryptography PyPI package

bd40965

Python: Add modeling for pycryptodomex PyPI package

6e4c627

Python: Add modeling for pycryptodome PyPI package

d5ff477

Python: Rewrite py/weak-crypto-key tests

2429c6c

* Removed backend argument that is not required
* Added DSA constants (they are just accidentally the same as RSA right now)
* Removed FakeWeakEllipticCurve and used a real weak elliptic curve instead

Python: Rename WeakCrypto to WeakCryptoKey

0e9a54e

Since WeakCrypto always makes me think that it's about all weak crypto (like using MD5, or completely broken ciphers such as ARC4 or DES) and not just about weak key generation.

Python: Use camelCase for RSA/DSA/ECC

32d0790

after asking around, this seems to be the right approach

Python: Fix bad join in crypto models

8d3170b

Python: Add test of public_key method with cryptodome

bfbaa85

Added in 3.10 release https://github.com/Legrandin/pycryptodome/blob/master/Changelog.rst#3100-6-february-2021

Python: Port cryptography models to use API graphs (mostly)

1eabfbd

Python: Port cryptodome models to use API graphs

2a8f720

Python: Make KeyGeneration range member overrides final

37f0d5a

This was the result of an internal discussion we had about this some time ago.

Python: Add weak crypto key example through function call

a658334

We used to handle this, but no more :( Adding this example was inspired by looking at results differences

Python: Better IntegerLiteral tracking for weak crypto key

dfa223a

Python: Add example of test-code with weak crypto key

bfc8ead

Python: Ignore weak key-sizes from test-code in weak-crypto-key

d084261

From looking at old results on LGTM.com, this was quite common (and those alerts don't really provide value).

RasmusWL force-pushed the crypto branch from 213b9b9 to d084261 Compare February 19, 2021 14:07

Python: Introduce DataFlowOnlyInternalUse to avoid re-evaluation

40c592a

RasmusWL marked this pull request as ready for review February 22, 2021 14:04

RasmusWL requested a review from felicitymay as a code owner February 22, 2021 14:04

felicitymay reviewed Feb 23, 2021


python/change-notes/2021-02-02-port-weak-crypto-key-query.md

Python: Apply suggestions from code review

fd18fd8

Co-authored-by: Felicity Chapman <felicitymay@github.com>

RasmusWL added 4 commits February 25, 2021 11:30

Merge branch 'main' into crypto

2798771

Python: Use type-tracking for integer literal tracking

c195c64

Like we've done for pretty much everything else. An experiment to see what this means for query performance.

Python: Use type back-tracking for keysize on key-generation

4610b1b

Internal evaluation showed that this didn't perform better than normal (forward) type-tracking, but it feels more like the right approach.

Docs: Add crypto to supported Python frameworks

472ff97

RasmusWL marked this pull request as draft February 26, 2021 16:08

RasmusWL closed this Feb 27, 2021

RasmusWL deleted the crypto branch February 27, 2021 09:54

RasmusWL restored the crypto branch February 27, 2021 10:27

RasmusWL reopened this Feb 27, 2021

RasmusWL marked this pull request as ready for review March 2, 2021 07:51

RasmusWL mentioned this pull request Mar 2, 2021

Python: Port stack trace exposure #5118

Merged

yoff requested changes Mar 17, 2021


Python: Apply suggestions from code review

7b92012

Co-authored-by: yoff <lerchedahl@gmail.com>

RasmusWL requested a review from yoff March 18, 2021 10:05

yoff approved these changes Mar 18, 2021


yoff merged commit 746e994 into github:main Mar 18, 2021

RasmusWL deleted the crypto branch March 19, 2021 08:54

RasmusWL mentioned this pull request Jun 10, 2021

Python: Support EC keygen without class-instance for cryptography #5836

Merged
