Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pg:keepalive() should not invalidate the socket reference #46

Closed
wants to merge 6 commits into from

Conversation

ttfkam
Copy link
Contributor

@ttfkam ttfkam commented Dec 20, 2016

pg:keepalive() calls down to the OpenResty socket connection for
reuse in the connection pool, however the reference should still
be made available for subsequent calls to pg:connect.

pg:keepalive() calls down to the OpenResty socket connection for
reuse in the connection pool, however the reference should still
be made available for subsequent calls to pg:connect.
@thibaultcha
Copy link
Contributor

If accepted, maybe this could warrant a regression test to avoid it appearing again?

@ttfkam
Copy link
Contributor Author

ttfkam commented Dec 20, 2016

On the bright side, it turns out the failing tests I was seeing were due to my build environment. The rockspec file really should have more dependencies defined, e.g., cjson.

On the downside, there are no tests for keepalive I can see. This probably makes sense since the connection behavior is different for OpenResty environments than for straight Lua usage. Do you know the differences offhand? (I don't.)

@thibaultcha
Copy link
Contributor

LuaSocket does not implement any connection pooling mechanism, hence does not implement a setkeepalive() method.

2 approaches:

Implement a fallback for it in the pgmoon.socket module used for interoperability between cosockets/LuaSocket/cqueues. Such a fallback should probably close the socket.

You could also rely on the @sock_type attribute and only call setkeepalive() when supported by the underlying socket implementation.

@leafo
Copy link
Owner

leafo commented Dec 20, 2016

keepalive is symmetrical to disconnect, the socket reference is deleted to ensure it's not used. I think a better implementation would be to create the socket on the call to connect instead of new. If you feel this is incorrect then we should match the behavior in disconnect, and make sure that attempting to use the socket again returns an appropriate error in all socket implementations.

@ttfkam
Copy link
Contributor Author

ttfkam commented Dec 20, 2016

I was seeing the same thing in the code. When looking more closely at other cosocket projects like redis and memcache, the socket reference is not set to nil.

@thibaultcha
Copy link
Contributor

the socket reference is deleted to ensure it's not used

I feel like we should rely on the behavior of the underlying socket implementation instead. If setkeepalive/close are called but subsequent connect are successful, that's great (this should be the case with cosockets, as per other lua-resty libraries). If the subsequent connect fails, let the underlying socket implementation return that error instead of trying to prevent it and limit the behavior with other implementations that would support it.

Wouldn't that make sense?

@ttfkam
Copy link
Contributor Author

ttfkam commented Dec 20, 2016

This means that common code that is used both inside and outside of OpenResty would work sometimes and not others vs. having a bunch of

  if sock_type == "nginx"
    -- do something
  else
    -- do something else

in your code. That or a bunch of people rolling their own abstraction layer just for this.

It might be more prudent (thinking from the user perspective) to have keepalive() fallback to disconnect() since, as you said @leafo, they have algorithmic symmetry. From the user's perspective, they are interchangeable. The worst that happens is that new socket connections are created instead of being reused, yes?

Edit: no, the worst is that an error is returned and a new pg object must be created

@ttfkam
Copy link
Contributor Author

ttfkam commented Dec 20, 2016

Thank you both for your patience. I'm an old coder, but new to Lua/Moonscript. Still learning the idioms and tooling.

@ttfkam
Copy link
Contributor Author

ttfkam commented Dec 20, 2016

From the failed tests, I see that cosockets work as expected are not in the test suite. A disconnected socket fails to reconnect with luasocket, and cqueues fails when keepalive() is called at all. Is cqueues another variant of cosockets, because I would have thought it would have falsified the conditional

  if @sock_type == "nginx"

@ttfkam
Copy link
Contributor Author

ttfkam commented Dec 20, 2016

I removed the keepalive() test for now. It's only useful for the cosockets case, but the test suite doesn't actually test the cosockets case. Testing the luasocket and cqueues cases seemed odd since they are implementation-specific to another project and not relevant to the use case where keepalive() is actually relevant. Thoughts?

@thibaultcha
Copy link
Contributor

but the test suite doesn't actually test the cosockets case

Yes, hence my previous suggestion of implementing a fallback for LuaSocket. This would have allowed you to:

  1. Test that pgmoon's setkeepalive() calls the underlying socket's setkeepalive() (see busted stubs and spies)
  2. Test that the @sock attribute is not nil after calling it

A disconnected socket fails to reconnect with luasocket,

This is expected of LuaSocket, but shouldn't be the case with cosockets (to be tested). Typical lua-resty libraries try to stay low-level and not abstract too much away from their underlying cosocket(s).

cqueues fails when keepalive() is called at all

Not familiar with cqueuess which are quite new, but you'd have to implement another fallback or come up with a workaround if that is the case. I don't even know if they implement a connection pool, but if they don't, you can close the connection as well (like LuaSocket).

Just my 2c on how I would have done this.

@ttfkam
Copy link
Contributor Author

ttfkam commented Dec 21, 2016

The cqueues.socket API is markedly different from the others. I can either make them all act roughly the same but hide the some implementation-dependent error types or I can accept that each implementation of sockets will behave differently and give different errors depending on the environment pgmoon is run from.

In addition, cqueues.socket always succeeds (in this context) from pg:connect(...) due to hwo its API is handled. By moving the socket connection reference to the connect(...) method, the behavior can be made more uniform, again at the expense of hiding some implementation details/errors.

My preference would be following @leafo's suggestion of moving from :new to :connect and making the whole thing more uniform even at the risk of masking what would perhaps be some implementation-specific errors. Thoughts from the veterans since I'm just a newcomer here? I'm happy to follow your lead.

@thibaultcha
Copy link
Contributor

thibaultcha commented Dec 21, 2016

The problem with this approach (beyond inefficiency) is that you are preventing pgmoon users from the opportunity to sett a granular timeout. The settimeout() function recently added allows one to update the current timeout of the underlying socket at any time, including before a connection, hence why the socket creation was moved to new(). This is also the reason for such a paradigm to exist among all those lua-resty-* libraries.

Sure, you could go with new properties such as connect_timeout and read_write_timeout and manually call the socket's settimeout() method under the hood, but you are building an abstraction and limiting the granularity and the flexibility of the library. You can come up with a workaround, providing a connect_timeout attribute and allowing further settimeout() only when a socket exists, but you would be complicating the library's API and, even if slightly, its efficiency.

This is why all those lua-resty-* libraries are usable like simple cosockets, but provide additional methods once they are connected: they stay flexible and allow abstractions to be built on top of them, not within.

@thibaultcha
Copy link
Contributor

Regarding the timeout granularity thing, it is worth noting the pending PR at openresty/lua-nginx-module#480, that could allow pgmoon to be instantiated with those 3 timeouts attributes, as previously discussed. However, you still need an exposed settimeout() to eventually change the timeout for a given query or after a given scenario, so overall, adding timeout attributes on top of that still complicates the library, imho.

@ttfkam
Copy link
Contributor Author

ttfkam commented Dec 21, 2016

Sounds reasonable to me. So each socket implementation will react differently. With that in mind, keepalive fail for implementations that don't support it? Just return nil with an error for luasocket/cqueue? Seems a bit disingenuous to close the socket without failure message.

…mplementation doesn't support them.

Explicitly test that luasocket and cquque fail this test.
@ttfkam
Copy link
Contributor Author

ttfkam commented Jan 17, 2017

Ping

@ttfkam
Copy link
Contributor Author

ttfkam commented Jan 26, 2017

@thibaultcha? @leafo? Hello?

@ttfkam
Copy link
Contributor Author

ttfkam commented Feb 8, 2017

@thibaultcha, it's been a month now

@thibaultcha
Copy link
Contributor

With that in mind, keepalive fail for implementations that don't support it? Just return nil with an error for luasocket/cqueue? Seems a bit disingenuous to close the socket without failure message.

My 2c: the issue with returning nil + an error message is that now, libraries built on top of this module that want to support all underlying sockets must handle this case on their own; aka not calling setkeepalive() if host.sock_type ~= "cosocket". To me, it appears to be clear that calling setkeepalive() has the same intent as calling close() in such contexts, and that this behavior can safely be assumed.

@ttfkam
Copy link
Contributor Author

ttfkam commented Feb 8, 2017

That's true, but the underlying implementations do not allow for seamless swap out. It's not something a simple proxy can fix. For example, cqueues can use the same reference to connect() again after calling disconnect() while the others cannot. Also, the upstream C queues API can use keepalive while the Lua cqueues implementation does not surface it. And all that's before it gets to pgmoon.

The truth is that there is no common path, no safe assumption. Not only is keepalive optional but you can't even be uniform with the definition of db info and the moment of connection—it all varies. The user must know the context the library's used in, and that's leaving aside whether there's anyone out there actually planning on using pgmoon in more than one environment and making the same calls with the same code chunks.

@silverwind
Copy link

Bump! I've also been hit by this issue. Without this fix, it seems impossible to recycle a pgmoon instance for multiple connections (neither sockets disconnected via .disconnect or .keepalive seem to be reopenable without getting the error in #44) and creating a new connection on each request seems rather wasteful to me.

@yangm97
Copy link

yangm97 commented May 3, 2017

Bump. Just got bitten by this issue while writing a fairly simple openresty application.

@yangm97
Copy link

yangm97 commented May 4, 2017

Just tried your branch, instead of getting the error from #44 I now got “attempt to receive data on a closed socket”. Any idea?

@ttfkam
Copy link
Contributor Author

ttfkam commented May 4, 2017

Make sure you're only calling keepalive() or disconnect(), never both.

@ttfkam
Copy link
Contributor Author

ttfkam commented May 4, 2017

Alternatively @yangm97, you can install and use pgbouncer for connection pooling. It's what I use.

@yangm97
Copy link

yangm97 commented May 5, 2017

So… I got this even without calling keepalive(), disconnect() or settimeout(), leafo:master gives the error from the issue even without calling any of these too. It looks like this only happens when lua code cache is enabled (which explains why I haven’t noticed earlier).

The application I’m messing with is an open source fork from GroupButler hosted at https://gitlab.com/Synko/GroupButler/tree/2-webhook-support .

content_by_lua requires database.lua, which is a collection of functions made to abstract the SQL complexity, should have SQL injection protections, etc, while not repeating code all over.
Plugins are loaded by init_by_lua, and because of that looks like they can’t call pg functions, but that’s another issue…
All in all, the first requisition is served just fine, the second and the next ones face the mentioned problem.

content_by_lua

local api = require 'methods' -- Load Telegram API
local db = require 'database' -- Load database helper functions
local u = require 'utilities' -- Load miscellaneous and cross-plugin functions

…

-- Init postgres connection
pg = pgmoon.new({
	host = config.db_host,
	port = config.db_port,
	user = config.db_user,
	database = config.db_db,
	password =  config.db_pass
})
assert(pg:connect())

…

pg:keepalive() -- Or disconnect(), or nothing or…

database.lua

…
-- Generic functions
function database.setval(table, col, val, col2, val2)
	local res = pg:query('INSERT INTO '..table..' ('..col..', '..col2..') values ('..val..', '..val2..') ON CONFLICT ('..col..') DO UPDATE SET '..col2..' = ‘..val2)
end
… 

@yangm97
Copy link

yangm97 commented May 5, 2017

Turns out moving database connection to database.lua fixed the problems from both branches.

@yangm97
Copy link

yangm97 commented Jul 3, 2018

Will this ever be merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants