Temporary files are an inefficient use of cloud storage #3827

Open
dralley opened this issue May 15, 2023 · 3 comments
@dralley
Contributor

dralley commented May 15, 2023

Version
All

Describe the bug

Pulp uses temporary files and, as far as I know, puts them on the same storage backend as everything else. Storage backends like S3 or Azure charge for usage, so using those services for short-lived and often small files is not an efficient use of resources.

@ggainey
Contributor

ggainey commented May 16, 2023

@ggainey
Contributor

ggainey commented May 23, 2023

@pedro-psb
Member

The Pulp team discussed this at the pulpcore meeting today.

I presented an initial idea to start the discussion:

See diagram ![image](https://github.com/pulp/pulpcore/assets/7907864/5316cdca-7ca5-4eae-838a-ab1a119f0ce6)

Some of the key takeaways from the discussion were:

  • Allowing the API and workers on the same host is not a good idea:
    • fixing SFTPStorage (from django-storages, which is not thread-safe) would probably be easier than supporting this
  • Is adding an extra temporary_storage, to be used by PulpTemporaryFile, worth it?
    • Since we can't use a shared FS in the cloud, we would end up using S3 for this anyway. Is there any advantage to that?
  • S3 transfers shouldn't cost anything on a cloud deployment in the same region. If so, this is more of a performance issue.
    • TODO: find reference
  • Can we make object storage traffic faster by leveraging configs, e.g. for parallel upload/download?
  • The tradeoffs for cloud storage transfers are too abstract. Can we get some real data?
    • set up an experiment to measure file transfers on cloud setups (e.g., instance->S3 vs EBS)
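
For the measurement experiment in the last item, a minimal sketch of a timing harness could look like the following. This is only an illustration, not something that exists in pulpcore: `measure_writes` and `local_writer` are hypothetical names, and the default file count and size are arbitrary assumptions standing in for "short-lived and often small" temporary files. It uses only the standard library and times writes through a pluggable callable, so the same loop could be pointed at a local or EBS-backed directory and at an object store.

```python
import os
import tempfile
import time
from typing import Callable


def measure_writes(write: Callable[[str, bytes], None],
                   n_files: int = 100, size: int = 4096) -> dict:
    """Time n_files writes of `size` random bytes through `write` (hypothetical helper)."""
    payload = os.urandom(size)
    start = time.perf_counter()
    for i in range(n_files):
        write(f"tmp-{i}", payload)
    elapsed = time.perf_counter() - start
    return {
        "files": n_files,
        "total_bytes": n_files * size,
        "seconds": elapsed,
        # Guard against a zero reading on very coarse clocks.
        "files_per_second": n_files / elapsed if elapsed > 0 else float("inf"),
    }


def local_writer(directory: str) -> Callable[[str, bytes], None]:
    """Return a writer that stores each file under `directory` (local disk or an EBS mount)."""
    def write(name: str, data: bytes) -> None:
        with open(os.path.join(directory, name), "wb") as f:
            f.write(data)
    return write


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(measure_writes(local_writer(d)))
```

To compare against S3, a writer with the same `(name, data)` signature could wrap an object storage client call, run from an instance in the same region as the bucket. That part is left out here because it depends on credentials and the deployment being tested.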
