Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gen_statem in Postgrex.SimpleConnection #643

Merged
merged 9 commits into from Feb 13, 2023

Conversation

whatyouhide
Copy link
Contributor

First pass, but seems to be quite promising. 🙃

@josevalim
Copy link
Member

LGTM too.

Comment on lines 367 to 374
# We have to do a hack here to carry the actual caller PID over to the handle_call/3
# callback, because gen_statem uses a proxy process to do calls with timeout != :infinity.
# This results in the caller PID not being the same as the PID in the "from" tuple,
# so things like Postgrex.Notifications cannot use that "from"'s PID to register
# notification handlers. This approach is paired with reconstructing the proper
# "from" tuple in the reply/2 function in this module.
{from_pid, from_ref} = from
callback_from = {from_pid, {from_ref, caller_pid}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, the second element of from must be opaque and I think we are going to be in trouble if we ask users to look into it. I think we should rather tell users of this library to not rely on from and instead explicitly pass the caller too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, I don't think the from should be passed on the caller here. Rather the .listen function in notifications should explicitly pass the pid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sent a similar PR to Oban which may also be affected by this: sorentwo/oban#843

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josevalim yes, the second element of from is opaque here outside of SimpleConnection. This is to be backwards compatible for folks using the SimpleConnection behaviour, so that they're unlikely to have to change their callbacks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the point. Your fix has handle_call looking at the second element of from, but there was no guarantee that would be of any shape whatsoever before this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you meant to write this:

Suggested change
# We have to do a hack here to carry the actual caller PID over to the handle_call/3
# callback, because gen_statem uses a proxy process to do calls with timeout != :infinity.
# This results in the caller PID not being the same as the PID in the "from" tuple,
# so things like Postgrex.Notifications cannot use that "from"'s PID to register
# notification handlers. This approach is paired with reconstructing the proper
# "from" tuple in the reply/2 function in this module.
{from_pid, from_ref} = from
callback_from = {from_pid, {from_ref, caller_pid}}
# We have to do a hack here to carry the actual caller PID over to the handle_call/3
# callback, because gen_statem uses a proxy process to do calls with timeout != :infinity.
# This results in the caller PID not being the same as the PID in the "from" tuple,
# so things like Postgrex.Notifications cannot use that "from"'s PID to register
# notification handlers. This approach is paired with reconstructing the proper
# "from" tuple in the reply/2 function in this module.
{from_pid, from_ref} = from
callback_from = {caller_pid, from}

That would be a bit more okayish (and backwards compatible).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah jeez, that's exactly what I wanted to do. I mixed things up 🙈 Thanks for catching that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and fixed)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this change after erlang/otp#7081 ?

@whatyouhide
Copy link
Contributor Author

Aaaand all tests pass 🎉

@@ -146,7 +146,7 @@ defmodule Postgrex.Notifications do

* `:timeout` - Call timeout (default: `#{@timeout}`)
"""
@spec listen(GenServer.server(), String.t(), Keyword.t()) ::
@spec listen(:gen_statem.server_ref(), String.t(), Keyword.t()) ::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@type server :: :gen_statem.server_ref() at the top?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done ✅

sorentwo pushed a commit to sorentwo/oban that referenced this pull request Feb 11, 2023
@josevalim
Copy link
Member

@whatyouhide it seems we have some heisen-failures now in the test suite. :(

@josevalim
Copy link
Member

You can try elixir --erl "+T 9" -S mix test to see if you have better luck reproducing it locally.

@whatyouhide
Copy link
Contributor Author

@josevalim some race conditions in the tests themselves I think, not in the gen_statem code. I sprinkled some tracing in there, and it seems like the Heisenfailingtest is gone.

# Simulate a small delay, because it used to cause flaky test failures
# in the past.
# https://github.com/elixir-ecto/postgrex/actions/runs/4151824827/jobs/7182370012
Process.sleep(10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to sleep if we have the trace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes as a regression. I added a comment for it too 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am skeptical about relying on a sleep. Perhaps tracing SC.call is the wrong thing, especially if SC.call uses another process? Perhaps it is best to assert that the connection has received a message with "SELECT 123" (picking 123 so it is distinct enough?), this way we guarantee that request has come first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my tests tracing SC.call was enough. I triggered CI with just that and if it passes I think we're ok to merge that, otherwise yeah I can trace the query getting to the connection.

@whatyouhide
Copy link
Contributor Author

I rebased on #644 and remove the direct dependency on :connection.

@josevalim josevalim merged commit 83caf4a into elixir-ecto:master Feb 13, 2023
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@whatyouhide whatyouhide deleted the al/gen_statem branch February 13, 2023 11:31
sorentwo pushed a commit to sorentwo/oban that referenced this pull request Feb 17, 2023
tom-con pushed a commit to tom-con/oban that referenced this pull request May 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants