Skip to content

Conversation

dbartol
Copy link
Contributor

@dbartol dbartol commented Jul 18, 2024

This PR adds a new column to the results view to show the provenance information for each path node. For each model that contributed to that path node, we show the kind of model (source, sink, or step) and a link to the YAML source code for that model.

This feature is currently enabled only under the canary flag, and requires CodeQL CLI 2.18.1 or newer.

Relatively little thought has gone into the UI layout, so please submit feedback on how to improve it. I just wanted to get something working well enough for QL authors to use it and complain about it:)

Implementation notes

Links to models are displayed using the same SarifLocation widget that we use for locations in the source archive, except that databaseUri is set to undefined. There was one case in the Location() component function where we were treating an undefined databaseUri as being unclickable, but I'm convinced that none of the five call sites can actually hit that path. I removed that case so that an undefined databaseUri just gets passed through to the callback message to the host.

We didn't have an easy way of getting the user's settings from within the WebView code, so I added a setUserSettings message to the results view and compare view. For now, shouldDisplayProvenance is the only setting, but it will now be easy to add others. The settings are propagated as props to most descendant components.

When parsing the SARIF file, we used to tell the streaming parser to include only the results array for the first run. We now need the extensions property for that run as well. Since that would results in loading two disjoint subtrees, the easiest thing to do was to switch the parser to parse everything except a couple of potentially large arrays that we still don't need.

I needed to plumb the Run object through the component tree to allow the path node row to look up the model's location.

I'm still working on tests.

Screenshot 2024-07-18 Provenance

@dbartol dbartol added the Complexity: Medium Requires a moderate level of detail in design or review. label Jul 18, 2024
@aschackmull
Copy link

Does it also show provenance that isn't MaD? I.e. the cases where it's just some arbitrary string. Currently there's no structure to those cases - maybe they should have a prefix as well, such as e.g. QL: (if that's something you prefer).

This is roughly equivalent to the validation we had before, when we were only including `runs.0.results`.
@dbartol
Copy link
Contributor Author

dbartol commented Jul 19, 2024

Does it also show provenance that isn't MaD

No, but it's easy enough to add. We don't even emit those in SARIF today, but we certainly have that information at the point we emit the other provenance taxa in SARIF.

@aschackmull
Copy link

No, but it's easy enough to add.

It would be nice to have.

@dbartol
Copy link
Contributor Author

dbartol commented Jul 29, 2024

@dbartol dbartol marked this pull request as ready for review July 29, 2024 20:29
@dbartol dbartol requested review from a team as code owners July 29, 2024 20:29
Copy link
Contributor

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good. I have a question, but it's probably a noop. I haven't actually tried to run this.

return [];
}

return taxa.flatMap((taxonRef, index) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the situations where we will have multiple taxa for a given path element? Realistically, are there any degenerate cases where we can have lots of them and it can make the UI unreadable?

Either:

  1. Ignore this problem since this is a canary-only feature
  2. Only show the first X links and the user will need to find the rest directly from the the sarif
  3. Show the first X links and add a "more" button that adds the remainder

Most likely, 1 is fine, but we will probably need to handle degenerate cases if this becomes a public feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, the number is almost always 1 or 2. The 2 case is usually due to having both a step model and a sink model on the same node. In theory I think there can be any number of models on a given node, most likely to happen if multiple MaD rows match the same AST element. @aschackmull might have a better idea of how likely this is to be a real problem.

In any case, I think we can leave this as-is unless it becomes a problem in practice, at least as long as it's still canary-only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that. I think you already mentioned to me that it would be at most two, but I'd forgotten.

@dbartol dbartol merged commit 4d2f84d into main Jul 30, 2024
15 checks passed
@dbartol dbartol deleted the dbartol/provenance branch July 30, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Complexity: Medium Requires a moderate level of detail in design or review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants