Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable throttling for publish command #29

Closed
wants to merge 2 commits into from

Conversation

rbikar
Copy link
Member

@rbikar rbikar commented Feb 9, 2021

This change introduces --throttle option that
is usable for publish command.

This option allows to specify a number of repositories
that can be publish at one time. This may help
with not overloading pulp server with mass publishes
and leave some capacity of workers to do another tasks.

This change introduces --throttle option that
is usable for publish command.

This option allows to specify a number of repositories
that can be publish at one time. This may help
with not overloading pulp server with mass publishes
and leave some capacity of workers to do another tasks.
@rbikar rbikar marked this pull request as ready for review February 10, 2021 14:59
@rbikar rbikar requested a review from a team February 10, 2021 14:59
Copy link
Member

@rohanpm rohanpm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this isn't the ideal place to implement it:

If you have a look in pubtools-pulplib, it already has throttling support built-in. It just happens to use a built-in value which the caller is currently not allowed to control:

pubtools/pulplib/_impl/client/client.py-        self._task_executor = (
pubtools/pulplib/_impl/client/client.py-            Executors.thread_pool(max_workers=self._REQUEST_THREADS)
pubtools/pulplib/_impl/client/client.py-            .with_map(self._unpack_response)
pubtools/pulplib/_impl/client/client.py-            .with_map(self._log_spawned_tasks)
pubtools/pulplib/_impl/client/client.py-            .with_poll(poller, cancel_fn=poller.cancel)
pubtools/pulplib/_impl/client/client.py:            .with_throttle(self._TASK_THROTTLE)
pubtools/pulplib/_impl/client/client.py-            .with_retry(retry_policy=self._RETRY_POLICY)
pubtools/pulplib/_impl/client/client.py-        )

I think rather than implementing a separate throttling loop which is only going to work for publish, we should leverage the above, so the appropriate course here would be:

  • adjust pubtools-pulplib to make the throttle count a supported part of the API (e.g. it can be passed to constructor of the client)
  • just make pubtools-pulp pass the appropriate value when creating the client

Also, what do you think of allowing all tasks to set the throttle rather than just publish? This was requested for publish being the most significant use-case, but in reality just about every task using Pulp can benefit from having the ability to throttle it. If it doesn't complicate the current issue much, I'd say it'd be worthwhile to add the option to all of the tasks here.

Actually, from reviewing how the code in pubtools-pulp is set up, I'd say we almost get this "for free" - pulp client creation is abstracted by PulpClientService and if you add --throttle argument there and pass it into pubtools-pulplib, I think every task is going to gain support for throttling at once, consistently.

@rohanpm
Copy link
Member

rohanpm commented Feb 10, 2021

Also, what do you think of allowing all tasks to set the throttle rather than just publish?

Just to clarify, I only mean within pubtools-pulp itself here, I'm not proposing to make the same adjustment through all Pub commands.

@rbikar
Copy link
Member Author

rbikar commented Feb 11, 2021

@rohanpm
Yep, sounds reasonable and more usable - I'll have a look at it in pubtools-pulplib. It shouldn't complicate the issue much.
But I just wonder - if we have some more complicated tasks in pubtools-pulp in future - e.g. push advisory, shouldn't we throttle only some classes of pulp tasks that are expected to be long-running like publish tasks? It looks the with_throttle can be used dynamically, so it can be achieved quite easily. (just thinking out loud)

@rohanpm
Copy link
Member

rohanpm commented Feb 11, 2021

Yeah, in particular I would think that using a low throttle count for 'publish' and a high throttle count for 'associate' tasks, which usually take only a second or two, would make sense. This would probably have to be built-in to pubtools-pulplib since it knows which type of tasks are being created.

@rbikar
Copy link
Member Author

rbikar commented Mar 18, 2021

Closing in favor of #32

@rbikar rbikar closed this Mar 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants