Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve the prepared statement automatic-release difficulty #159

Closed
Abscissa opened this issue Feb 4, 2018 · 5 comments
Closed

Solve the prepared statement automatic-release difficulty #159

Abscissa opened this issue Feb 4, 2018 · 5 comments

Comments

@Abscissa
Copy link

Abscissa commented Feb 4, 2018

Note: This is not an especially high priority unless someone discovers a real need to release prepared statements from the server. Since the server treats prepared statements as being specific to a connection (and connections eventually end), this seems unlikely.


Suppose we want to use automatic resource management (ie, a scoped/RAII/refcounted/etc. wrapper) for Prepared to automatically release the statement from a connection when it's done being used (again, AFAIK, this shouldn't be necessary, but suppose we want to do it). This poses a problem:

What if the prepared statement becomes ready for release while there is already data pending on the connection? That is, while a result set input range has not yet been completely read. (This has been demonstrated as a realistic possibility, by @schveiguy.)

Commands cannot be sent to the server while data is pending for retreival. The usual solution mysql-native takes (as of v1.1.4) is to auto-purge any pending data when a command needs to be issued. For the most part, this has worked out well, because:

  1. Issuing a command is explicit in user code, and
  2. Attempting to read from a resultset which has been purged results in an exception, not a surprise end-of-data.

But the problem is, if we use automatic resource management to release a prepared statement, then issuing the command "release this prepared statement" becomes implicit and hidden. In situations such as the demonstration mentioned above, this hidden auto-purge would cause surprising problems that can be difficult or messy to deal with.

In an attempt to avoid this problem, ever since v1.1.4, mysql-native has implemented a "queued for release" system. When mysql-native is told to release a prepared statement, it doesn't release it immediately. Instead, it saves it in a queue (but mysql-native still treats it as having been released, from the user code's perspecive). Then, mysql-native waits until the next time an explicit command is issued. At that point, any pending data is auto-purged and any prepared statements queued for release are finally released for real.

Unfortunately, this create a new problem when the cleanup happens to be triggered during a GC cycle. We don't know ahead of time how many prepared statements might need to be released. So queuing one for release involves appending an item (ie which statement to release) to an array. This may trigger an allocation, but...

Allocations cannot be performed during garbage collection (it will throw an InvalidMemoryOperationError). If the cleanup (and thus, queueing it for release) happens to be triggered during a GC cycle, boom: InvalidMemoryOperationError.

So this cycle needs to be broken. I'm hesistant to hardcode the size of the "to be released" queue, because for anything short of soaking up several gigabytes (ie, not realistic) it still leaves the difficult question of: What should happen when the hardcoded limit's exceeded? I'm not sure I like any of the possible ways to answer that question.


I was just going to post this as a "to be mulled over", but in the process of writing this I think I may have actually come up with a solution:

As long as we never queue duplicates, nor statements that haven't been registered on the given connection, then we're quaranteed the release queue will never exceed the number of statements actually registered. So if the queue is resized upon registration of prepared statements, then it should never need to allocate when queueing for release. Thus, marking it @nogc should be possible, thereby enabling safe, reliable automatic cleanup. Also, switching the array to an AA should prevent O(n) complexity when checking the queue for duplicates.(no good: "indexing an associative array...may cause GC allocation")

Hooray for pausing to take a step back.

Have I missed anything?

@Abscissa Abscissa changed the title Solve the prepared statement refcounted-release paradox Solve the prepared statement automatic-release paradox Feb 4, 2018
@Abscissa Abscissa changed the title Solve the prepared statement automatic-release paradox Solve the prepared statement automatic-release problem Feb 4, 2018
@Abscissa
Copy link
Author

Abscissa commented Feb 4, 2018

Ok, so this is unfortunately stalled again for now.

I think I could fix this, as I indicated above, by using a plain array, rejiggering things so allocations only occur upon registration or actual release, keeping duplicates out, and doing a bunch of string comparisons. But in terms of big-O, I don't like the algorithmic complexity that involves. Especially for something that seems to have questionable merit anyway.

As above, associative arrays are out, because apperently, merely indexing them can somehow cause GC allocation (from automatic re-hashing, maybe?).

So before I can move forward with this I'll need one of two things:

A. The right data structure. It must have good time complexity for this use-case and never allocate on lookup.

or better yet:

B. Compelling evidence that releasing prepared statements within a connection's lifetime actually does have practical benefit (and that closing and re-opening the connection isn't good enough).

@schveiguy
Copy link
Collaborator

Hm... you don't need the GC to store an array of integers (which is really all this is). Why not C malloc/realloc/free?

In any case, there is a separate problem besides the nogc problem, the GC can be running in any thread, not just the one that allocated the connection. So you potentially could be running into race conditions.

The way phobos solves this is... Don't put RC structs in heap objects :) Yeah, it's that dumb. But without help from the GC (setting thread affinity for a heap object for example), there's no good way to solve this without resorting to defensive locking for everything.

My recommendation: just use a C array, and avoid the GC altogether, and warn people about the dangers of putting the RC object in a GC destructed heap object.

indexing an associative array...may cause GC allocation

It's not the way it currently works, only adding or removing elements may use the GC. I suppose that's to allow for future implementation details. But in any case, using an AA I think is overkill. I can't imagine cases where you are going to have more than a few stale prepared statements.

@Abscissa
Copy link
Author

Abscissa commented Feb 5, 2018

the GC can be running in any thread, not just the one that allocated the connection. So you potentially could be running into race conditions.

The way phobos solves this is... Don't put RC structs in heap objects :)

Ah, ok, I see. I guess in some ways that does simplify the problem, then.

But in any case, using an AA I think is overkill. I can't imagine cases where you are going to have more than a few stale prepared statements.

True, but generally speaking, I don't like leaving open the possibility of pathological cases when I can avoid it. They just increase the number of dank, dark corners in a design.

But, it may not matter anyway: With the Prepared redesign in #157, I already need to use an AA anyway to lookup per-connection prepared statement info (such as the statement ID handle). So I just tossed another field into the payload to indicate "queued for release?", and ripped out all the old statement queue stuff. Based on everything you've said above, it sounds like that (along with the "don't put RC structs in heap objects" warning) should be suficcient.

@schveiguy
Copy link
Collaborator

So I just tossed another field into the payload to indicate "queued for release?"

Ooh, nice idea. You can store a pointer to the actual bool in the prepared statement object, then you don't need to bother with the AA from the Prepared side, and it becomes almost thread-safe (you could be truly thread-safe with atomics, but the worst I can see happening is that the connection skips releasing the first time and gets it the next time). Still not safe to call within the GC (it's possible the connection is already collected, and the bool is part of some other allocated memory).

@Abscissa Abscissa changed the title Solve the prepared statement automatic-release problem Solve the prepared statement automatic-release difficulty Feb 19, 2018
@schveiguy
Copy link
Collaborator

I'm pretty sure this is a wontfix. The current system I have been using for a while, and it works awesomely. I don't think there's a good reason to proactively release prepared statements, but even if there is, it's possible to do it synchronously without RAII help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants