Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of cherry picks in the specification #216

Closed
zacchiro opened this issue Jan 9, 2024 · 7 comments · Fixed by #221
Closed

Handling of cherry picks in the specification #216

zacchiro opened this issue Jan 9, 2024 · 7 comments · Fixed by #221

Comments

@zacchiro
Copy link
Contributor

zacchiro commented Jan 9, 2024

Currently, the specification (I'm looking at version 1.6.1, at the time of writing) does not mention how to handle cherry pick commits at all. However, as I understand it, the osv.dev implementation of the evaluation logic does handle them, to some extent.

It would be nice to specify properly how cherry picks should be handled, so that independent implementations of the evaluation logic can be created, compared, etc.

I have a few specific examples of cherry picks cases that the semantics non-trivial, which I can post here.
But before doing so I'd like to understand if there is an interest in making the OSV format specification evolve to describe how cherry picks should be handled.

@oliverchang
Copy link
Contributor

Hi! Thanks for the interest and question.

The spec makes no mention of cherrypicks, because in the context of the evaluation algorithm, all that matters are the events that are there in the data source. Adding cherrypicks explicitly into the algorithm is also difficult, because it's hard if not impossible to guarantee that all cherrypicks can be found algorithmically.

Re the OSV.dev implementation -- OSV.dev does have a best effort cherry pick detection as an additional validation /augmentation step separate from the evaluation. That is, OSV.dev infrastructrure will perform best effect detection of cherrypicks, and add them to the source OSV advisory. There are other similar things that OSV.dev performs here to augment the source OSV entry.

Then, evaluation happens after this, as per the OSV evaluation rules.

Does this answer your question?

@zacchiro
Copy link
Contributor Author

Hello, and thanks a lot for your quick answer.

It is unfortunate that, due to cherry picks (and maybe other features?), the OSV.dev implementation cannot be fully compared with the specification. As far as I understand, when API users receive an answer about whether a commit is vulnerable or not, they don't know if cherry picks were involved in that answer. So there is currently no way of detecting if the answer is "wrong" due to a bug or if it is "good" due to OSV.dev going the extra mail of taking cherry picks into account. Maybe that's just the way it could be. Or maybe we can make things a bit better on this front by including in API answers whether cherry picks were involved in the decision (arguably this should be discussed in an issue related to the OSV.dev implementation, not here).

Still, I'd like to offer a counter-argument to this:

Adding cherrypicks explicitly into the algorithm is also difficult, because it's hard if not impossible to guarantee that all cherrypicks can be found algorithmically.

The specification could assume there exists a general notion of "equivalent commits" (equivalence classes if you want), that could be implemented in practice by cherry pick detection or any other means. And then, based on that, specify how equivalent commits should be handled in the evaluation algorithm. This way you can decouple how one does cherry pick detection from how they are used by the evaluation. Is this something you could consider?

(Yes, I realize this would not completely solve the problem of the OSV.dev/specification divergence, but it would reduce the uncertainty margin for any specification implementer.)

@zacchiro zacchiro changed the title handling of cherry picks in the specification Handling of cherry picks in the specification Jan 10, 2024
@zacchiro
Copy link
Contributor Author

zacchiro commented Jan 10, 2024

Re the OSV.dev implementation -- OSV.dev does have a best effort cherry pick detection as an additional validation /augmentation step separate from the evaluation. That is, OSV.dev infrastructrure will perform best effect detection of cherrypicks, and add them to the source OSV advisory.

Aside from what to say about cherry picks in the spec or not, I have a followup question about the augmentation mentioned above: does it result in a combinatorial explosion of ranges to be considered?

Consider this case:

  • introduced: commit i_1
  • limit: f_1
  • commit i_2 is a cherry pick of i_1
  • commit f_2 is a cherry pick of f_1

In terms of commit ranges to be traversed for evaluation, does one need to consider all the 4 combinations: i_1..f_1, i_1..f_2, i_2..f_1, i_2..f_2? As I understand from your answer, this is what the OSV.dev implementation (implicitly) does. But it'd be great to have confirmation of this.
(Yes: in most cases i_1..f_1 and i_2..f_2 would be on disconnected parts of the Git history, so the combinatorics is not a practical problem. But complicated cases where they are connected can be built, so it is important to know for how to handle the most general case.)

If commit equivalence ends up being discussed in the spec, this is one of the tricky cases that would need to be clarified.

@oliverchang
Copy link
Contributor

Hello, and thanks a lot for your quick answer.

It is unfortunate that, due to cherry picks (and maybe other features?), the OSV.dev implementation cannot be fully compared with the specification.

Thanks for pointing this out. I believe we can potentially resolve this by adding flags/settings to get our implementation to fully conform. Would you also be able to help us understand the motivation behind this comparison you're looking into so we can better understand how to make this fit your needs?

The specification could assume there exists a general notion of "equivalent commits" (equivalence classes if you want),

Our goal with the OSV spec is to provide an unambiguous evaluation algorithm, given a fixed set of inputs (i.e. events). The fact that there is no definition for "equivalent" commits would complicate this algorithm.

I absolutely agree cherrypick detection is crucial for git and something that needs to be solved for producers of OSV entries. Perhaps this would be a better fit for a documented "guide" of sorts for DB publishers where we talk about this issue and hte importance of including all cherrypicked/equivalent commits in the data?

@zacchiro
Copy link
Contributor Author

zacchiro commented Jan 11, 2024 via email

@oliverchang
Copy link
Contributor

Regarding my motivation for this. I'm co-founder and tech lead of the Software Heritage project; we are considering applying the OSV.dev data and evaluation logic to our archive.

Thank you for the background!

Happy to review something such or contribute in other ways you consider useful.

We already have text in the spec to prefer fixed over last_affected, so I think it definitely makes sense to add some details on fixed vs limit.

Would you mind contributing a PR for that?

zacchiro added a commit to zacchiro/osv-schema that referenced this issue Jan 16, 2024
Rationale: reduce the amount of false negatives in commits belonging to
branches forked in-between `introduced` and `fixed`/`limit`.

Closes ossf#216.
@zacchiro
Copy link
Contributor Author

We already have text in the spec to prefer fixed over last_affected, so I think it definitely makes sense to add some details on fixed vs limit.

Would you mind contributing a PR for that?

Sure, here it is: #221.

zacchiro added a commit to zacchiro/osv-schema that referenced this issue Jan 16, 2024
Rationale: reduce the amount of false negatives in commits belonging to
branches forked in-between `introduced` and `fixed`/`limit`.

Closes ossf#216.

Signed-off-by: Stefano Zacchiroli <zack@upsilon.cc>
zacchiro added a commit to zacchiro/osv-schema that referenced this issue Jan 16, 2024
Rationale: reduce the amount of false negatives in commits belonging to
branches forked in-between `introduced` and `fixed`/`limit`.

Closes ossf#216.

Signed-off-by: Stefano Zacchiroli <zack@upsilon.cc>
zacchiro added a commit to zacchiro/osv-schema that referenced this issue Jan 29, 2024
Rationale: reduce the amount of false negatives in commits belonging to
branches forked in-between `introduced` and `fixed`/`limit`.

Closes ossf#216.

Signed-off-by: Stefano Zacchiroli <zack@upsilon.cc>
oliverchang pushed a commit that referenced this issue Feb 14, 2024
Rationale: reduce the amount of false negatives in commits belonging to
branches forked in-between `introduced` and `fixed`/`limit`.

Closes #216.

Signed-off-by: Stefano Zacchiroli <zack@upsilon.cc>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants