Update inspect_manifest to accept archives #1037
The inspect_manifest pipeline is renamed to inspect_manifests. It now supports uploading a whole package/codebase archive, finding the manifests in it, and resolving all the packages they declare, instead of accepting only uploaded manifests.
Reference: #1034 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
        if manifest_type:
            resource.update(status=flag.APPLICATION_PACKAGE)

    return project.codebaseresources.filter(status=flag.APPLICATION_PACKAGE)
This is not efficient, as we run one query per detected manifest_type.
I'm not sure how to optimize this further. Could you suggest how to do it here?
"we do 1 query per detected manifest_type"
Yes, we update the status for each manifest present, and we can't optimize this by getting all the manifests at once, because we have to apply handler.is_datafile(location) to every location individually.
We could build a filter by looping through each handler type, collecting its path_patterns, and then filtering a queryset by these patterns, but this would not be correct. The default is_datafile implementation at https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/models.py#L974 does more than match path patterns, and it is also often overridden by the ecosystem classes that inherit from it.
Collecting all the resources in a list and then creating a queryset to update them at once does not seem ideal either? I couldn't find examples of this elsewhere: everywhere we have to update each resource/object, we run update() on it individually.
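For illustration, here is a minimal sketch of the "collect first, update once" idea discussed above. It is not the code from this PR: `Resource`, `NpmHandler`, `PypiHandler`, `collect_manifest_ids`, and `bulk_flag` are hypothetical stand-ins, and the status string is a placeholder. The `is_datafile()` check still runs once per location in Python, and only the matching ids are passed to a single bulk update.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Resource:
    """Hypothetical stand-in for a codebase resource row."""
    id: int
    location: str
    status: str = ""


class NpmHandler:
    """Hypothetical handler; real handlers may check more than path patterns."""
    path_patterns = ("*/package.json",)

    @classmethod
    def is_datafile(cls, location):
        return any(fnmatch(location, pattern) for pattern in cls.path_patterns)


class PypiHandler(NpmHandler):
    """Hypothetical handler that only changes the patterns it matches."""
    path_patterns = ("*/setup.py",)


def collect_manifest_ids(resources, handlers):
    """Run is_datafile() per location and return the ids of matching resources."""
    return [
        resource.id
        for resource in resources
        if any(handler.is_datafile(resource.location) for handler in handlers)
    ]


def bulk_flag(resources, ids, status):
    """In-memory stand-in for a single bulk UPDATE; returns the number updated."""
    wanted = set(ids)
    updated = 0
    for resource in resources:
        if resource.id in wanted:
            resource.status = status
            updated += 1
    return updated


resources = [
    Resource(1, "app/package.json"),
    Resource(2, "app/README.md"),
    Resource(3, "lib/setup.py"),
]
ids = collect_manifest_ids(resources, [NpmHandler, PypiHandler])
updated_count = bulk_flag(resources, ids, "application-package")
```

With the Django ORM, the `bulk_flag` step would presumably become one query along the lines of `project.codebaseresources.filter(id__in=ids).update(status=flag.APPLICATION_PACKAGE)`. One trade-off to keep in mind: `QuerySet.update()` does not call the model's `save()` method or send save signals, so this only fits if nothing depends on those.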
@AyanSinhaMahapatra could you resolve the merge conflict (following the pipeline renaming) and have a look at the comments?
Reference: #1037 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
@tdruez thanks for the improvement suggestions!
Reference: #1034
Updates the inspect_manifest pipeline to accept inputs other than manifests and to resolve packages for any manifests found in the input.