Migrate Package managers #93

Merged
merged 14 commits into from
Dec 15, 2023

Conversation

keshav-space
Member

@keshav-space keshav-space commented Sep 19, 2023

This PR migrates the existing package manager code from VulnerableCode to FetchCode, and also refactors and streamlines its consumption using the PURL router.

- Fetch all versions for a given PURL
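A minimal sketch of the router idea, for illustration only: a registry maps a PURL type to a function that lists versions. The names `route`, `purl_type_and_name`, `list_versions`, and the stand-in `fake_pypi_versions` are hypothetical and are not taken from this PR; the PURL parsing is deliberately simplified.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical registry: PURL type (e.g. "pypi") -> version-listing function.
_HANDLERS: Dict[str, Callable[[str], Iterable[str]]] = {}


def route(purl_type: str):
    """Register the decorated function as the handler for ``purl_type``."""
    def decorator(func):
        _HANDLERS[purl_type] = func
        return func
    return decorator


def purl_type_and_name(purl: str) -> Tuple[str, str]:
    """
    Split 'pkg:type/[namespace/]name[@version]' into (type, name).
    Simplified: qualifiers and subpath are dropped, nothing is decoded.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"Not a PURL: {purl!r}")
    body = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    type_, _, rest = body.partition("/")
    name = rest.rsplit("@", 1)[0]
    return type_.lower(), name


@route("pypi")
def fake_pypi_versions(name: str) -> List[str]:
    # Stand-in data; a real handler would query the registry's API.
    return {"demo": ["1.0", "1.1"]}.get(name, [])


def list_versions(purl: str) -> List[str]:
    """Dispatch on the PURL type and return one versioned PURL per version."""
    type_, name = purl_type_and_name(purl)
    handler = _HANDLERS.get(type_)
    if handler is None:
        return []  # unsupported package type
    return [f"pkg:{type_}/{name}@{version}" for version in handler(name)]
```

Supporting a new ecosystem in this sketch means registering one more function with `@route(...)`; callers keep using `list_versions`.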

Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Member

@JonoYang JonoYang left a comment

@keshav-space I've left some suggestions to avoid accessing a dictionary multiple times for the same key-value pair.
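As an illustration of that kind of suggestion (not the actual code under review), the sketch below looks a nested value up once and reuses the local name. The function `summarize_release` and the shape of the `release` dictionary are hypothetical.

```python
def summarize_release(release: dict) -> str:
    """Return 'name version' for a release mapping, or 'unknown'."""
    # Bind the nested mapping once instead of repeating release["info"][...]
    # for every field that is read from it.
    info = release.get("info") or {}
    version = info.get("version")
    if not version:
        return "unknown"
    return f"{info.get('name', 'unnamed')} {version}"
```

Besides saving repeated lookups, a single binding gives one place to handle a missing key.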

src/fetchcode/package_managers.py Outdated Show resolved Hide resolved
src/fetchcode/package_managers.py Outdated Show resolved Hide resolved
src/fetchcode/package_managers.py Outdated Show resolved Hide resolved
src/fetchcode/package_managers.py Outdated Show resolved Hide resolved
https://crates.io/policies#crawlers

Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
@keshav-space
Member Author

Thanks @JonoYang, made changes as per your suggestions.

@pombredanne
Member

LGTM... we need to find a better name for this module, maybe "package_versions.py" for now?

Beyond this we need to have a better design.
https://github.com/package-url/packageurl-python/blob/main/src/packageurl/contrib/purl2url.py#L70 design is not great but has its use.

Here are some thoughts (to track in new issue(s)):

  1. We are going to have discrete functions, each with a router, that return a very specific URL or a small piece of data, like today. But the functions that create URLs and the functions that fetch should be split in two, so that we can have URL-only functions (as explained in 2.) that do not fetch anything.

  2. Functions that only transform a PURL into a URL should be in one place (likely packageurl-python).

  3. Then anything that fetches remote data should be of two kinds:

  • One may return raw data from a JSON or XML API
  • One may return a ScanCode Package object converted from this raw data

  4. Some basic functions that only return versions may simply return lists of PURLs.

  5. We need to account for repository_url and download_url qualifiers.
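The split described in the points above could be sketched roughly as follows, using PyPI as the example. This is an assumption-laden illustration, not the planned design: the function names are hypothetical, and the sketch assumes that a custom `repository_url` exposes the same `/pypi/<name>/json` path as pypi.org, which may not hold for every index.

```python
import json
from typing import Callable, List, Optional
from urllib.request import urlopen

DEFAULT_PYPI_REPOSITORY = "https://pypi.org"


def pypi_api_url(name: str, repository_url: Optional[str] = None) -> str:
    """URL-only step: build the JSON API URL without any network access."""
    base = (repository_url or DEFAULT_PYPI_REPOSITORY).rstrip("/")
    return f"{base}/pypi/{name}/json"


def fetch_json(url: str) -> dict:
    """Fetch step: return the raw, decoded JSON document at ``url``."""
    with urlopen(url) as response:
        return json.load(response)


def pypi_version_purls(
    name: str,
    repository_url: Optional[str] = None,
    fetch: Callable[[str], dict] = fetch_json,
) -> List[str]:
    """Conversion step: turn the raw API data into a list of versioned PURLs."""
    raw = fetch(pypi_api_url(name, repository_url))
    # Versions are returned in the order the API lists them (not sorted).
    return [f"pkg:pypi/{name}@{version}" for version in raw.get("releases", {})]
```

Passing the fetcher as a parameter keeps the URL builder and the converter testable without network access.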

We also need to migrate VulnerableCode to this new code. Can you start a PR in parallel so that we avoid duplicating code? In VCIO you could use a temporary requirement in setup.cfg such as git+https://github.com/nexB/fetchcode@6faf26353b8cbc65b07ba6c3285a0e8c8a1c9f1b to pin a specific commit.

Member

@pombredanne pombredanne left a comment

Thanks. See feedback.

tests/test_package_managers.py Outdated Show resolved Hide resolved
tests/test_package_managers.py Outdated Show resolved Hide resolved
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
@keshav-space
Member Author

LGTM... we need to find a better name for this module, maybe "package_versions.py" for now?

package_managers.py is now renamed to package_versions.py.

Beyond this we need to have a better design. https://github.com/package-url/packageurl-python/blob/main/src/packageurl/contrib/purl2url.py#L70 design is not great but has its use.

Here are some thoughts (to track in new issue(s)):

  1. We are going to have discrete functions, each with a router, that return a very specific URL or a small piece of data, like today. But the functions that create URLs and the functions that fetch should be split in two, so that we can have URL-only functions (as explained in 2.) that do not fetch anything.
  2. Functions that only transform a PURL into a URL should be in one place (likely packageurl-python).
  3. Then anything that fetches remote data should be of two kinds:
  • One may return raw data from a JSON or XML API
  • One may return a ScanCode Package object converted from this raw data
  4. Some basic functions that only return versions may simply return lists of PURLs.
  5. We need to account for repository_url and download_url qualifiers.

We're tracking this here: aboutcode-org/purldb#233

We also need to migrate VulnerableCode to this new code. Can you start a PR in parallel so that we avoid duplicating code? In VCIO you could use a temporary requirement in setup.cfg such as git+https://github.com/nexB/fetchcode@6faf26353b8cbc65b07ba6c3285a0e8c8a1c9f1b to pin a specific commit.

Added PR for this here: aboutcode-org/vulnerablecode#1354

Member

@JonoYang JonoYang left a comment

@keshav-space I left some comments regarding adding comments or docstring tests.
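For readers unfamiliar with docstring tests, here is a small generic example of the pattern. The helper `strip_v_prefix` is hypothetical and is not claimed to exist in `package_versions.py`.

```python
import doctest


def strip_v_prefix(tag: str) -> str:
    """
    Return ``tag`` without a leading "v", as often used in release tags.

    >>> strip_v_prefix("v1.2.3")
    '1.2.3'
    >>> strip_v_prefix("1.2.3")
    '1.2.3'
    """
    return tag[1:] if tag.startswith("v") else tag


if __name__ == "__main__":
    # Runs the examples embedded in the docstrings above.
    doctest.testmod()
```

The examples in the docstring double as documentation and as tests that fail if the behavior drifts.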

src/fetchcode/package_versions.py Outdated Show resolved Hide resolved
src/fetchcode/package_versions.py Outdated Show resolved Hide resolved
src/fetchcode/package_versions.py Outdated Show resolved Hide resolved
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Member

@JonoYang JonoYang left a comment

@keshav-space I think this looks good!

@keshav-space keshav-space merged commit 0caa2aa into master Dec 15, 2023
11 checks passed
@keshav-space keshav-space deleted the package-managers branch December 15, 2023 08:03
3 participants