on-demand content results in unclosed, idle, database connections #2816
Working in a core/3.16 env, I have narrowed the problem down to the commit where we moved from loop.run_in_executor() to sync_to_async(): 5dc7314. Reverting this commit gets the reproducer back to 1 connection post-run. Investigation continues.
Ooof.
@dralley no worries. Something Magic was happening in run_in_executor() that we are (apparently) losing with sync_to_async(). Calling django.db.connection.close() at the End Of Things gets us 1 connection back; I am unclear on where the others are hiding. Investigation continues.
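Getting exactly one connection back is consistent with per-thread connections: Django keeps a separate connection per thread, so a `connection.close()` issued from the calling thread only closes that thread's connection, and anything opened inside executor threads stays open. The stdlib-only sketch below illustrates the effect under that assumption. `FakeConnection`, `get_connection`, `fetch_remote` and `serve` are hypothetical stand-ins, not pulpcore or Django code, and no real database is involved.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a DB connection; counts how many are open."""

    open_count = 0
    _lock = threading.Lock()

    def __init__(self):
        self.closed = False
        with FakeConnection._lock:
            FakeConnection.open_count += 1

    def close(self):
        if not self.closed:
            self.closed = True
            with FakeConnection._lock:
                FakeConnection.open_count -= 1


# One connection per thread, opened lazily (similar in spirit to Django).
_local = threading.local()


def get_connection():
    conn = getattr(_local, "conn", None)
    if conn is None or conn.closed:
        conn = _local.conn = FakeConnection()
    return conn


def fetch_remote(n):
    get_connection()  # runs in a pool thread: opens (or reuses) that thread's connection
    time.sleep(0.01)  # simulate work so several pool threads get used
    return n


def serve(requests, workers):
    get_connection()  # the calling thread also used the DB
    pool = ThreadPoolExecutor(max_workers=workers)  # kept alive while we count
    try:
        list(pool.map(fetch_remote, range(requests)))
        get_connection().close()  # "End Of Things": closes only THIS thread's connection
        return FakeConnection.open_count  # worker-thread connections are still open
    finally:
        pool.shutdown(wait=True)
```

For example, `serve(20, 4)` returns somewhere between 1 and 4: one lingering connection per pool thread that handled a request, even though the caller closed its own.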
How many stale connections are we looking at? Prior to this patch we used a threadpool of size 2; afterwards we use the default-sized threadpool, which is something like 2x $num_cores. I think there should be 1 database connection per thread. 5dc7314#diff-453fdf3a6e81f92ff5391cf65f9c6154852133a656d82082021e3beaa887776fL50
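If each pool thread holds one connection, the pool size is an upper bound on how many can be left idle. The sketch below counts the distinct threads a `ThreadPoolExecutor` actually uses; `distinct_worker_threads` is a hypothetical helper, not pulpcore code. For reference, recent CPython versions document the default `max_workers` as roughly `min(32, cpu_count + 4)`. Whether the post-5dc7314 code path really runs on a default-sized pool is not verified here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def distinct_worker_threads(max_workers, tasks):
    """Return how many distinct pool threads ran `tasks` jobs.

    If every worker thread keeps one thread-local DB connection, this is
    also how many connections the pool can leave open afterwards.
    """
    seen = set()
    lock = threading.Lock()

    def job(_):
        with lock:
            seen.add(threading.get_ident())
        time.sleep(0.01)  # keep threads busy so the pool grows toward max_workers

    # max_workers=None means "use the executor's default size"
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(job, range(tasks)))
    return len(seen)
```

`distinct_worker_threads(2, 35)` can never exceed 2, while `distinct_worker_threads(None, 35)` is capped by the default pool size instead, which on a many-core box can be far larger.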
|
A good place to start might also be to just throw |
One per RPM requested. See 2062526#c21; that's where we started trying to debug this issue (apologies for this and a number of the ensuing comments being private; they contain non-public info).
That's the default, so we shouldn't have to make it explicit. |
And just to be explicit: when I run on core/3.14, I end the reproducer run above with one database connection open, the one I'm using to query the number of open connections. On 3.16+, it's one per RPM.
|
I am not able to reproduce on Fedora 36. Here is what I did: Then I ran the script above and it ended with: And I can see that the artifacts got created: I have the following packages in my virtualenv: I am running
|
On pulpcore/3.16, on pulp3-source-f36, with postgresql-14.1-3.fc36.x86_64, I still end up with 36 open connections. Diff between the versions in the previous comment and the output of
|
Running pulp3-source-fedora36 and pulpcore/main, I was not able to reproduce the problem. The difference between f36/core-3.16 and f36/main is the following: Investigation continues.
|
And I thought Django made
|
Independently confirmed - cherry-picking the commits from the above PRs into core/3.16 fixes the problem. |
Version
Describe the bug
A client asking for content from an "on_demand" remote results in a database connection remaining open for each piece of RemoteArtifact content that was streamed back to the client. This can run a Pulp instance out of database connections if, for example, a downstream Pulp attempts to sync 'immediate' from an on_demand upstream.
The idle connections all show the same last query: the lookup of the remote for the not-yet-streamed content. Here's an example:
If multiple clients ask for the same content at the same time, you can end up with more idle connections than content units, due to occasional concurrency collisions.
To Reproduce
The following bash script will result in one connection to the database for each piece of content in the /rpm-signed fixture:
Notes
Expected behavior
The database connection must be closed as soon as the streamed Artifact has been successfully saved.
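A minimal sketch of that expectation, assuming per-thread connections: close the connection inside the same worker thread, right after the work finishes (even on error), rather than later from a different thread. `closes_connection`, `save_streamed_artifact`, `get_connection` and `FakeConnection` are hypothetical names; in Django the close call would presumably be `django.db.connection.close()` run inside the synced function, but this is not necessarily how the actual fix works.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()
_open = 0
_lock = threading.Lock()


class FakeConnection:
    """Hypothetical stand-in for a per-thread DB connection."""

    def __init__(self):
        global _open
        self.closed = False
        with _lock:
            _open += 1

    def close(self):
        global _open
        if not self.closed:
            self.closed = True
            with _lock:
                _open -= 1


def get_connection():
    conn = getattr(_local, "conn", None)
    if conn is None or conn.closed:
        conn = _local.conn = FakeConnection()
    return conn


def closes_connection(func):
    """Close the current thread's connection once `func` returns or raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            conn = getattr(_local, "conn", None)
            if conn is not None:
                conn.close()  # runs in the same thread that opened it
    return wrapper


@closes_connection
def save_streamed_artifact(n):
    get_connection()  # e.g. look up the remote, then save the Artifact
    return n


def open_connections():
    return _open
```

Usage: after `list(ThreadPoolExecutor(max_workers=4).map(save_streamed_artifact, range(35)))`, `open_connections()` is `0`, because every worker closed its own connection before picking up the next job.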
Additional context
See the discussion starting at https://bugzilla.redhat.com/show_bug.cgi?id=2062526#c19 for the investigation that led us here. The BZ says "deadlock"; this problem interferes with verifying that fix and will get its own BZ "soon". Some of the discussion on '2526 contains machines and access info, and is alas private. I have made as much of it public as I reasonably can.