Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add management command to Sync RemoteRepositories and RemoteOrganizations #7803

Conversation

saadmk11
Copy link
Member

@saadmk11 saadmk11 commented Jan 6, 2021

This is the simple implementation of the management command that can be used in production to Sync RemoteRepositories and RemoteOrganizations for users. This will be useful to populate the new table created for the normalization.

Options

  • --queue name of the celery queue for the tasks to be triggered with. (Default: resync-oauth)
  • --skip-users Skip re-sync VCS provider data for specific users. (Default: [])
  • --users Re-sync VCS provider data for specific users only. (Default: [])
  • --max-users maximum number of users that should be synced. (Default: 100)
  • --force Force re-sync VCS provider data even if the users are already synced.
  • --dry-run Do not trigger tasks for VCS provider re-sync.

Example:
python manage.py sync_vcs_data --queue resync-oauth --max-users 500
python manage.py sync_vcs_data --users saadmk11 test_user --skip-users another_user

This management command triggers 1 sync_remote_repositories task per user.

closes #7791

@saadmk11 saadmk11 requested a review from humitos January 6, 2021 08:10
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me!

readthedocs/oauth/management/commands/sync_vcs_data.py Outdated Show resolved Hide resolved
self.stdout.write(
self.style.SUCCESS(
'Found %s user(s) with the given parameters' % users.count()
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also show the total number of users that need to be resync before taking the max_users slice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

users.count() shows the total number of users that needs to be re-synced according to the filters. The next count is the count after the slice. Did you mean something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wanted to know the total of users that needs to be synced without applying sync_users or skip_users. So, maybe those filters need to be applied to users_to_sync instead of users. Don't you think?

In that case we will have:

  • Found TOTAL user(s) to sync in total
  • Re-syncing VCS Providers for FILTERED_USERS user(s) with given parameters

This gives us a clear number about "where we are" when running this command. Every time we run it we know exactly how many users still need re-syncing.

@saadmk11 saadmk11 requested a review from humitos January 7, 2021 14:23
@saadmk11
Copy link
Member Author

saadmk11 commented Jan 8, 2021

I think we can add a --dry-run option which will not trigger any tasks but show us the number of users found with the given parameters. What are your thoughts on this?

@humitos
Copy link
Member

humitos commented Jan 12, 2021

I think we can add a --dry-run option which will not trigger any tasks but show us the number of users found with the given parameters. What are your thoughts on this?

💯

@saadmk11 saadmk11 requested a review from humitos January 12, 2021 12:50
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I just suggest to use --dry-run instead of --no-dry-run

@@ -40,13 +40,20 @@ def add_arguments(self, parser):
default=False,
help='Force re-sync VCS provider data even if the users are already synced.',
)
parser.add_argument(
'--no-dry-run',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more common to use the opposite here: --dry-run

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I added this so that we don't accidentally trigger tasks

@saadmk11 saadmk11 requested a review from humitos January 12, 2021 13:42
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Thanks for your help on this!

@humitos humitos merged commit dc40044 into readthedocs:remote-repository-normalization Jan 12, 2021
3 checks passed
@saadmk11 saadmk11 deleted the oauth-sync-command branch January 12, 2021 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants