Problem
The package metadata query in the PyPI Simple API retrieve method is a significant contributor to database load in production. During peak load on 2026-05-20 at 16:00 UTC, 4 concurrent instances of this query pattern consumed 2.73 avg active sessions on the RDS instance (db.m7g.4xlarge, 16 vCPUs). Total DB load reached 21.9 avg active sessions, exceeding the CPU count.
Source
pulp_python/app/pypi/views.py lines 376-386:
packages = local_packages.annotate(
repo_added_time=Subquery(repo_added_subquery)
).values(
"filename",
"sha256",
"metadata_sha256",
"requires_python",
"size",
"repo_added_time",
"version",
)
SQL pattern observed (from RDS Performance Insights)
SELECT "python_pythonpackagecontent"."filename" AS "filename",
"python_pythonpackagecontent"."sha256" AS "sha256",
"python_pythonpackagecontent"."metadata_sha256" AS "metadata_sha256",
"python_pythonpackagecontent"."requires_python" AS "requires_python",
"python_pythonpackagecontent"."size" AS "size",
"core_content"."pulp_created" AS "pulp_created",
"python_pythonpackagecontent"."version" AS "version"
FROM "python_pythonpackagecontent"
INNER JOIN "core_content" ON (...)
The wait event breakdown during the spike was primarily CPU (17.9 avg active sessions), indicating the queries are compute-bound rather than I/O-bound.
Suggestions
- Add database indexes on commonly filtered/joined columns
- Consider caching results for frequently-accessed packages
- Evaluate whether the
Subquery annotation for repo_added_time can be optimized (e.g., via a JOIN instead of a correlated subquery)
Problem
The package metadata query in the PyPI Simple API
retrievemethod is a significant contributor to database load in production. During peak load on 2026-05-20 at 16:00 UTC, 4 concurrent instances of this query pattern consumed 2.73 avg active sessions on the RDS instance (db.m7g.4xlarge, 16 vCPUs). Total DB load reached 21.9 avg active sessions, exceeding the CPU count.Source
pulp_python/app/pypi/views.pylines 376-386:SQL pattern observed (from RDS Performance Insights)
The wait event breakdown during the spike was primarily CPU (17.9 avg active sessions), indicating the queries are compute-bound rather than I/O-bound.
Suggestions
Subqueryannotation forrepo_added_timecan be optimized (e.g., via a JOIN instead of a correlated subquery)