Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create purldb scan worker #1078

Merged
merged 30 commits into from
Mar 27, 2024
Merged

Create purldb scan worker #1078

merged 30 commits into from
Mar 27, 2024

Conversation

pombredanne
Copy link
Member

@pombredanne pombredanne commented Feb 15, 2024

This PR is the ScanCode.io client side of the PurlDB scanning queue as designed in nexB/purldb#236

@pombredanne pombredanne marked this pull request as draft February 15, 2024 10:02
@pombredanne pombredanne marked this pull request as ready for review February 15, 2024 10:03
@tdruez
Copy link
Member

tdruez commented Feb 19, 2024

  1. Tests are failing
  2. TODOs to be addressed
  3. Lack of unit tests for the new code

@tdruez tdruez marked this pull request as draft February 19, 2024 10:29
@tdruez
Copy link
Member

tdruez commented Feb 26, 2024

@JonoYang What's the latest status on this one?

@JonoYang
Copy link
Contributor

@tdruez

I was working on updates to the purldb and API. I've updated my management command in this PR to make sure that basic functionality works (get download url, scan package, send results back), and that we can report scan project failures to purldb. I still need to create tests for management command functions.

@pombredanne
Copy link
Member Author

This needs rebasing/merging once #1077 is merged ...because of the renamings.

Copy link
Member

@tdruez tdruez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonoYang See my comments.
Also, should #1077 be merged first or after this one?

scanpipe/management/commands/__init__.py Outdated Show resolved Hide resolved
scanpipe/management/commands/purldb-scan-queue-worker.py Outdated Show resolved Hide resolved
"0 means no limit. Used only for testing.",
)

def handle(self, *args, **options):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too complex and needs to be refactored in smaller pieces

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored the code in the handle() method. Is what I have still too complex?

scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
scanpipe/pipes/purldb.py Outdated Show resolved Hide resolved
Comment on lines +787 to +790
class CreateProjectCommand(
commands.CreateProjectCommandMixin, commands.AddInputCommandMixin, BaseCommand
):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setup seems too complex.

Copy link
Contributor

@JonoYang JonoYang Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdruez

Do you have any suggestions on how I should test the command methods? Should I have just called create-project using call_command instead of putting the project creation logic into a method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about having the functions related to package creation outside the Command class? Those could be at the module level, what's your take?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdruez I am on board with that idea. What should be done about the messages that are printed using styling added by the django Command class? For instance, the calls to self.stdout.write() and self.stderr.write()? I think that's the main reason why I kept this code as Command mixins

@JonoYang JonoYang force-pushed the scio-purldb-scan-worker branch 2 times, most recently from 1360a05 to aa4427d Compare March 23, 2024 00:36
@JonoYang JonoYang requested a review from tdruez March 23, 2024 00:48
@JonoYang
Copy link
Contributor

Also, should #1077 be merged first or after this one?

I think its alright to merge them in any order

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
     * Add sleep to main work loop

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Handle exceptions in package-scan-worker and send to purldb as errors

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Remove `scanscan_project_url` from scanpipe.pipes.purldb.update_status()

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Bump matchcode-toolkit version to 4.0.0

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
    * Sen both traceback and exception message to purldb

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Update management command test names

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Move lists of statuses in poll_until_success to their own variables
    * Remove unnecessary else statement
    * Use tuples as default values for `create_project`

Signed-off-by: Jono Yang <jyang@nexb.com>
    * Rename get_next_job to get_next_download_url
    * Update tests

Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
@tdruez tdruez marked this pull request as ready for review March 27, 2024 11:17
@tdruez tdruez merged commit 4f67f97 into main Mar 27, 2024
7 checks passed
@tdruez tdruez deleted the scio-purldb-scan-worker branch March 27, 2024 11:17
@JonoYang JonoYang mentioned this pull request Mar 28, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants