Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add package vulnerability reporting API #9552

Merged
merged 9 commits into from Jul 26, 2021
Merged

Add package vulnerability reporting API #9552

merged 9 commits into from Jul 26, 2021

Conversation

dekkagaijin
Copy link
Contributor

PyPI today does not store information about known vulnerabilities. We’d like to build the necessary changes such that known vulnerabilities can be associated with package releases. The ultimate goal is that users of “pip” will be automatically warned if their dependencies are vulnerable.

See: #9407, https://discuss.python.org/t/proposing-a-community-maintained-database-of-pypi-package-vulnerabilities/8374

Signed-off-by: Jake Sanders jsand@google.com

@dekkagaijin
Copy link
Contributor Author

/cc @di @oliverchang

@dekkagaijin
Copy link
Contributor Author

Tests are passing, but <100% coverage. Please direct hatred at the changelist

@ewjoachim
Copy link
Contributor

Woo, I'm seeing a lot of interactions with the Token Disclosure code I contributed :)
If I may suggest a few general things:

  • In the token disclosure code, there are 2 places where we take an arbitrary JSON payload and extract data from it (the meta API, and the disclosure payload), while trying very hard to check that it has the expected format. There isn't (yet) a lot of warehouse APIs where we accept JSON input payloads, so I had to do a little bit of work there, but if we're going toward refactoring and/or imitating this code, I think it would be much much easier to integrate a validation lib such as https://pypi.org/project/jsonschema/ (we could also go with Pydantic, Marshmallow or others like that but they're also quite oriented toward serialization and we're only interested in deserialization here). This may make your work much easier, and if you think it's worth it, I can try and make a first PR or draft to do that on the token disclosure side.
  • It really wasn't easy to test the full token disclosure scenario (PyPI has many unit tests but fewer integration tests and it's not always easy to write those). I made the "notgithub" service (see Notgithub: a service to simulate GitHub token scanning #9269) exactly to ease that, and I'll be glad if you want to extend it to be capable of sending vulnerabilities given the format seems to be quite close. As of today "notgithub"'s only known use-case is Warehouse, so I don't have a problem if we add some coupling.
  • Finally, around this part of the code, there are also discussions to integrate GitLab secrets detection too (Gitlab secrets detection #9280), but the format is not yet defined. If we move this code, let's make sure we make implementing future token scanning inputs easier rather than harder (I know it's not very actionnable advice, but in case of a doubt during implementation regarding re-usability, I hope this can provide some guidance)

@dekkagaijin dekkagaijin force-pushed the vulnz-api branch 7 times, most recently from 683ff3d to 3f01152 Compare June 1, 2021 23:11
@dekkagaijin
Copy link
Contributor Author

/assign @di

@dekkagaijin
Copy link
Contributor Author

@di This is in a reviewable state, at least for a first pass. I'll be adding more using tests for 100% coverage of what exists and modeling retrieve_public_key_payload and extract_public_keys after what exists or the token leak API

Copy link
Member

@di di left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preliminary review

warehouse/integrations/vulnerabilities/utils.py Outdated Show resolved Hide resolved
warehouse/packaging/models.py Outdated Show resolved Hide resolved
warehouse/integrations/vulnerabilities/utils.py Outdated Show resolved Hide resolved
Comment on lines 453 to 434
vulnerabilities = orm.relationship(
Vulnerability,
backref="releases",
secondary=lambda: release_vulnerabilities,
passive_deletes=True,
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably define this on the Vulnerability model instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we not want both?

Comment on lines 30 to 46
class VulnerabilityReport:
def __init__(
self,
project: str,
versions: List[str],
vulnerability_id: str,
advisory_link: str,
aliases: List[str],
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably condense the Vulnerability database model and this class into a single class. The class can also live at warehouse/integrations/vulnerabilities/models.py instead, and be imported into warehouse.packaging.models.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, do the model representation, report validating, and DB interaction in a single class?

@dekkagaijin dekkagaijin force-pushed the vulnz-api branch 3 times, most recently from e912f0b to 1f71f43 Compare June 14, 2021 22:51
@dekkagaijin dekkagaijin force-pushed the vulnz-api branch 8 times, most recently from 14e463d to 2eb2332 Compare June 18, 2021 00:59
@dekkagaijin
Copy link
Contributor Author

@di I'm still finishing up the last few test cases, but please TAL

@dekkagaijin dekkagaijin force-pushed the vulnz-api branch 3 times, most recently from f10a17f to 2e7d51a Compare June 19, 2021 00:17
@dekkagaijin dekkagaijin changed the title [WIP] Add package vulnerability API Add package vulnerability API Jun 19, 2021

def test_retrieve_public_key_payload_json_error(self):
response = pretend.stub(
text="Still a non-json teapot",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Is it bad that I laughed at this given this is a copy of a joke I made :D ) (👍)(🍵)

@ewjoachim
Copy link
Contributor

ewjoachim commented Jul 2, 2021

(Finished this round of review. I forgot to make a general comment. I appreciate that there was a lot of code inspired from my PR, but if we need to do this a 3rd time, I'm afraid that will start to make quite a lot of duplication. This is by no means a way to block this PR (https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)) but we can think already about how we can avoid duplicating that much code next time)
(Great job 👍 👍 👍 🎉 )

@dekkagaijin dekkagaijin force-pushed the vulnz-api branch 2 times, most recently from 76e70ff to 91ae897 Compare July 12, 2021 21:23
@dekkagaijin
Copy link
Contributor Author

I would still like to see tagging implemented before merge.

Tagging?

@ewdurbin
Copy link
Member

I would still like to see tagging implemented before merge.

Tagging?

see the comment that is in reply to #9552 (comment)

Jake Sanders added 3 commits July 14, 2021 12:25
Signed-off-by: Jake Sanders <jsand@google.com>
Signed-off-by: Jake Sanders <jsand@google.com>
…tructor

Signed-off-by: Jake Sanders <jsand@google.com>
@dekkagaijin
Copy link
Contributor Author

I would still like to see tagging implemented before merge.

done

Copy link
Member

@ewdurbin ewdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one request (the timing metric tag) and some suggestions. none of it strictly critical for merge.

revision = "1dbb95161e5a"
down_revision = "08ccc59d9857"

# Note: It is VERY important to ensure that a migration does not lock for a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment(s) from template can be removed (not critical)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



def downgrade():
# ### commands auto generated by Alembic - please adjust! ###
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment(s) from template can be removed (not critical)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



def upgrade():
# ### commands auto generated by Alembic - please adjust! ###
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment(s) from template can be removed (not critical)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

def analyze_vulnerability(request, vulnerability_report, origin, metrics):
metrics.increment("warehouse.vulnerabilities.received", tags=[f"origin:{origin}"])
try:
_analyze_vulnerability(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's wrap this call in with metrics.timed('warehouse.vulnerabilities.analysis', tags=[f"origin:{origin}"]):

this will emit timing metrics so we can keep tabs on this programatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, good idea

Jake Sanders added 2 commits July 22, 2021 17:00
Signed-off-by: Jake Sanders <jsand@google.com>
Signed-off-by: Jake Sanders <jsand@google.com>
@di di merged commit 3c54782 into pypi:main Jul 26, 2021
@dekkagaijin dekkagaijin deleted the vulnz-api branch July 26, 2021 15:02
domdfcoding pushed a commit to domdfcoding/warehouse that referenced this pull request Jun 7, 2022
* Add package vulnerability reporting API

Signed-off-by: Jake Sanders <jsand@google.com>

* genericize vuln report handling, split osv into a subpackage

Signed-off-by: Jake Sanders <jsand@google.com>

* use kwargs instead of positionals for VulnerabilityReportRequest constructor

Signed-off-by: Jake Sanders <jsand@google.com>

* use tags rather than namespacing metrics by origin

* also load all releases when loading vulnerability records

* add description for vuln report model columns

* remove comments from vulnerability migration

Signed-off-by: Jake Sanders <jsand@google.com>

* instrument `_analyze_vulnerability`

Signed-off-by: Jake Sanders <jsand@google.com>

Co-authored-by: Dustin Ingram <di@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants