Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update inspect_manifest to accept archives #1037

Merged
merged 4 commits into from
Jan 25, 2024
Merged

Conversation

AyanSinhaMahapatra
Copy link
Member

Reference: #1034

Updates inspect_manifest pipeline to accept inputs other than manifests and does package resolving on any manifests that is found in the input.

The inspect_manifest pipeline is now renamed to inspect_manifests
and this supports uploading a whole package/codebase archive to
find manifests and resolve all packages in them, as opposed to
supporting only manifests to be uploaded.

Reference: #1034
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Reference: #1034
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
scanpipe/pipelines/inspect_manifest.py Outdated Show resolved Hide resolved
scanpipe/pipes/resolve.py Outdated Show resolved Hide resolved
scanpipe/pipes/resolve.py Outdated Show resolved Hide resolved
Comment on lines +69 to +72
if manifest_type:
resource.update(status=flag.APPLICATION_PACKAGE)

return project.codebaseresources.filter(status=flag.APPLICATION_PACKAGE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not efficient as we do 1 query per detected manifest_type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to optimize this further, could you suggest how to do this here?

we do 1 query per detected manifest_type

Yes we are updating the status for each manifest present, and we can't optimize this to get all the manifests at once because we have to apply handler.is_datafile(location) on every location individually.

We can create a filter here by looping through each handler type and collecting path_patterns for each handler type, and then create a queryset by filtering by these patterns, but this would not be correct. Because the default is_datafile implementation here https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/models.py#L974 is more than just path patterns, and this is also often overridden by ecosystem classes that inherit from it.

We also cannot collect all the resources in a list and then create a queryset to update it at once, as this would not be ideal? I couldn't find examples for this elsewhere as everywhere we have to update each resource/object individually we run update() individually.

@tdruez
Copy link
Member

tdruez commented Jan 25, 2024

@AyanSinhaMahapatra could you merge the conflict (following the pipeline renaming) and have a look at the comments?

Reference: #1037
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
@AyanSinhaMahapatra
Copy link
Member Author

AyanSinhaMahapatra commented Jan 25, 2024

@tdruez thanks for the improvement suggestions!
This is merged and updated now, ready for your review again.
See comments above for more details.

@tdruez tdruez merged commit 0ff317f into main Jan 25, 2024
7 checks passed
@tdruez tdruez deleted the update-inspect-manifest branch January 25, 2024 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants