Sink endpoint characteristics #11055
@kaeluka IIRC you said there's extensive testing in the PR checks, so the fact that those have passed indicates this PR has made no change to the training data or the endpoints that get scored at inference time, right?
Yes, but they're no guarantee 👍. One crucial test is
I have left a few comments. Have you also started a performance evaluation for this change that you could link here?
It's this experiment that I pinged you about yesterday, because I don't understand the failure errors
Thanks @tiferet - I've had a look and LGTM, I don't have anything to add to @kaeluka's comments. My only suggestion would be regarding the below:
I think we should define a canary database (or a couple, to hit all the sink types) that we can all use and refer to, to ensure we're reproducing the set of endpoints. Concretely, I'd like us to define something along these lines:
While a couple of databases won't capture all edge cases, the aim would be to have a simple target so we can track progress and easily identify regressions.
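The canary check described above could be sketched roughly as follows. This is a Python illustration rather than QL, so that it runs on its own. The function name `diff_endpoints`, the sink type labels, and the endpoint strings are all hypothetical; in practice the endpoint sets would come from running the endpoint queries on the agreed canary databases.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: sink type -> set of endpoint locations.
EndpointsBySinkType = Dict[str, Set[str]]


def diff_endpoints(
    baseline: EndpointsBySinkType, current: EndpointsBySinkType
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return, per sink type, the (missing, unexpected) endpoints.

    Sink types whose endpoint sets are identical are omitted, so an empty
    result means the canary database reproduces the baseline exactly.
    """
    report = {}
    for sink_type in sorted(set(baseline) | set(current)):
        expected = baseline.get(sink_type, set())
        actual = current.get(sink_type, set())
        missing = expected - actual
        unexpected = actual - expected
        if missing or unexpected:
            report[sink_type] = (missing, unexpected)
    return report


# Illustrative data only; these are not real results.
baseline = {
    "SqlInjection": {"app.js:10:5", "db.js:42:9"},
    "Xss": {"view.js:7:3"},
}
current = {
    "SqlInjection": {"app.js:10:5"},
    "Xss": {"view.js:7:3", "view.js:19:1"},
}
report = diff_endpoints(baseline, current)
```

An empty report would mean the change reproduces the baseline endpoints; any entry points at the sink type where a regression should be investigated.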
EndpointCharacteristic() { any() }

// Indicators with confidence at or above this threshold are considered to be high-confidence indicators.
float getHighConfidenceThreshold() { result = 0.8 }
One question I had, actually: what is the role of this predicate? I'm guessing it's a categorical selection wrapper on the more fine-grained confidence floats, but why do we need it at this stage? (Same question for the Medium one below.)
You're right, I could have left these for a later PR. I use them in the endpoint selection code. For example, to implement logic such as "if the list of characteristics includes positive indicators with high confidence for this class, select this as a training sample belonging to the class". I put them in EndpointCharacteristic because I think the place that sets confidences for various types of endpoints should also define what we mean by "high confidence".
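To make the selection rule quoted above concrete, here is a minimal Python sketch (not the QL from this PR). The `Implication` record and `is_training_sample_for` are hypothetical names; only the 0.8 threshold and the three fields (class, positive/negative indicator, confidence) come from the code under review.

```python
from dataclasses import dataclass
from typing import Iterable

# Mirrors getHighConfidenceThreshold() in the snippet under review.
HIGH_CONFIDENCE_THRESHOLD = 0.8


@dataclass(frozen=True)
class Implication:
    """What one characteristic says about an endpoint (hypothetical record)."""

    endpoint_class: int          # 0 is the negative class; positive ints are sink types
    is_positive_indicator: bool  # True: indicates membership of the class
    confidence: float            # in [0, 1]


def is_training_sample_for(
    implications: Iterable[Implication], endpoint_class: int
) -> bool:
    """True if any characteristic is a high-confidence positive indicator for the class."""
    return any(
        imp.endpoint_class == endpoint_class
        and imp.is_positive_indicator
        and imp.confidence >= HIGH_CONFIDENCE_THRESHOLD
        for imp in implications
    )


# Illustrative endpoint with two characteristics; class 1 is a made-up sink type.
implications = [
    Implication(endpoint_class=1, is_positive_indicator=True, confidence=0.9),
    Implication(endpoint_class=2, is_positive_indicator=True, confidence=0.5),
]
selected_for_1 = is_training_sample_for(implications, 1)
selected_for_2 = is_training_sample_for(implications, 2)
```

Keeping the threshold next to the confidences, as argued above, means the selection code never hard-codes its own cut-off.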
Should we introduce them in the PR that will use them then? I know you have written more code locally, but for us that will make things a little easier to follow along.
IMHO it's not worth deleting them from this PR just to add them in the next PR 🤷
OK, but in the future please do not introduce concepts that were not discussed beforehand and are not used in the PR you open.
Agree with Jean. In general, it's nice to avoid introducing dead code.
I think we could accomplish Jean's suggestion using QL unit tests. The test case could have sample code with all the possible endpoint types, and you can check in an
Regarding Jean and Aditya's testing suggestions, these QL tests already exist 👍 : e.g.
I think I've addressed all the review comments 🏓 I ran
What's left is to get the timing DCA tests to run (latest attempt), plus a (hopefully) final review.
@kaeluka Using the latest CLI seems to have solved the DCA failures. I don't know how to read the resulting report. The instructions say
@tiferet see the Note on performance we added to the instructions recently:
cc @esbena to confirm
That's great. To clarify, I had in mind something more basic that just checked the endpoints, without coupling it to the format, as I thought we might have to break that format during implementation of the design (if only temporarily; as discussed, we'll need to write glue code to minimise disruption to the pipeline inputs). But we can proceed with these tests for now and cross that bridge when we get to it.
But the absolute time difference is sometimes big (e.g. 191.7 seconds). That's why I assumed the
I only had a quick look but the diff is only big for one source and from memory even then
The ATM thresholds have not been crossed. You can confirm by looking in reports/any.md and observing the ToC:
^ nothing interesting. For completeness:
@esbena Thank you for the clarification about where to look! I've updated our instructions to reflect this ❤️
This looks generally good! A few comments.
all new changes are benign
Write the reasons that indicate that an endpoint is a sink for each sink type. Also fix import error.
If the list of reasons includes positive indicators with maximal confidence for this class, it's a known sink for the class. This negates the need for each query config to define the isKnownSink predicate individually.
Change the name to EndpointCharacteristics.
Make the implementations of specific `EndpointCharacteristic`s private.
fed54ef to 833041c
Just some minor comment changes. I'm not too familiar with this area, so I'm not even sure if these suggestions are correct. Feel free to take or leave them.
* This predicate describes what the characteristic tells us about an endpoint.
*
* Params:
* endpointClass: Class 0 is the negative class. Each positive int corresponds to a single sink type.
I know what you mean here, but it took me a second since 0 is not negative. I don't have any suggestions on improvement, though.
* isPositiveIndicator: Does this characteristic indicate this endpoint _is_ a member of the class, or that it
* _isn't_ a member of the class?
Minor:
* isPositiveIndicator: Does this characteristic indicate this endpoint _is_ a member of the class, or that it
* _isn't_ a member of the class?
* isPositiveIndicator: If true, this endpoint is a member of the class.
* confidence: A number in [0, 1], which tells us how strong an indicator this characteristic is for the endpoint
* belonging / not belonging to the given class.
I'm not confident that this comment change makes things better. It's mostly for my own understanding.
* confidence: A number in [0, 1], which tells us how strong an indicator this characteristic is for the endpoint
* belonging / not belonging to the given class.
* confidence: A float in [0, 1], which tells us how strong an indicator this characteristic is for the endpoint
* belonging / not belonging to the given class. 0 means complete confidence that this characteristic _does not_ indicate belonging to this endpoint. And 1 means complete confidence that this characteristic _does_ belong.
This PR adds the class `EndpointCharacteristic` (formerly referred to as `ClassificationReason` in the design doc). As a first step, it implements only the four characteristics that indicate that an endpoint is a sink. Subsequent PRs will add characteristics that indicate an endpoint is not a sink.
The definition of a known sink can now be written in a generic fashion in the base class ATMConfig.qll without needing each query's config to implement it independently.
The same logic will be used to surface positive training samples in a subsequent PR.
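As a rough illustration of how a generic known-sink definition can live in a base config, here is a Python sketch (not the QL in ATMConfig.qll). The classes `Characteristic`, `BaseConfig`, `SqlInjectionConfig` and `XssConfig`, the class numbers, and the reading of "maximal confidence" as 1.0 are all assumptions made for the example.

```python
from typing import List, NamedTuple

# Assumption: "maximal confidence" means the top of the [0, 1] range.
MAXIMAL_CONFIDENCE = 1.0


class Characteristic(NamedTuple):
    """Hypothetical stand-in for one endpoint characteristic."""

    name: str
    endpoint_class: int          # 0 is the negative class; positive ints are sink types
    is_positive_indicator: bool
    confidence: float            # in [0, 1]


class BaseConfig:
    """Stand-in for the base config: subclasses only name their sink class."""

    sink_class: int = 0

    def is_known_sink(self, characteristics: List[Characteristic]) -> bool:
        # Generic rule: a maximal-confidence positive indicator for this
        # config's class makes the endpoint a known sink.
        return any(
            c.endpoint_class == self.sink_class
            and c.is_positive_indicator
            and c.confidence == MAXIMAL_CONFIDENCE
            for c in characteristics
        )


class SqlInjectionConfig(BaseConfig):
    sink_class = 1  # hypothetical class number


class XssConfig(BaseConfig):
    sink_class = 2  # hypothetical class number


# Illustrative endpoint: a known SQL injection sink, and only a weak XSS candidate.
endpoint = [
    Characteristic("sql-sink", 1, True, 1.0),
    Characteristic("maybe-xss", 2, True, 0.6),
]
sql_known = SqlInjectionConfig().is_known_sink(endpoint)
xss_known = XssConfig().is_known_sink(endpoint)
```

Neither subclass overrides `is_known_sink`; each only declares which class it cares about, which is the point of moving the definition into the base class.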
Update: I've written the characteristics that will replace NotASinkReason and verified that I can reproduce the current selection of training examples. I need to clean up that code before opening a PR with it, though.
Timing experiment: https://github.com/github/codeql-dca-main/issues/8273
Closes https://github.com/github/ml-ql-adaptive-threat-modeling/issues/2096