Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single roundtrip mode for Queryer #209

Closed
johto opened this issue Dec 14, 2013 · 13 comments
Closed

Single roundtrip mode for Queryer #209

johto opened this issue Dec 14, 2013 · 13 comments

Comments

@johto
Copy link
Contributor

johto commented Dec 14, 2013

I saw some numbers which suggested that pq's Queryer interface could use some performance improvements when using parameters. I had a look at how much the numbers improve if we do a single roundtrip to the server with an unnamed prepared statement (crude PoC patch can be found here). As written, this would require effectively reverting 6235e1b, which would be a huge backwards compatibility breakage, but I think there's a way around that if we use the binary interface for strings and bytea; that would eliminate the need to look at the types of the bind parameters, as far as I can tell.

I got around 40% performance improvement out of this on the benchmark in the patch when running over TCP against localhost. With a remote server the numbers would probably be even bigger, so this might be worth pursuing.

@johto
Copy link
Contributor Author

johto commented Dec 14, 2013

Oh, the binary mode for strings and bytea is probably worth looking into even if we don't decide to go through with this. The binary/text flag can be set on a per-parameter basis in the Bind message.

@msakrejda
Copy link
Contributor

Nice, I think this is definitely worth looking into. A lot of drivers do this kind of thing since that extra round trip can be a killer for latency.

@johto
Copy link
Contributor Author

johto commented Dec 15, 2013

Indeed.

Oh, btw, I opened this as an issue instead of a pull request since I think this would be a good patch for someone who wanted something interesting to work on (and I won't be looking into this anymore before Christmas, at least). The first thing to research would be if the binary mode can truly be used for sending byte and string values in all cases, and then go from there.

@johto
Copy link
Contributor Author

johto commented Feb 2, 2014

I've been thinking about this recently, and it appears that there actually is a problem with using the binary interface. Currently you can pass data for any column using a "string" type. But if the data is sent in a binary format, it's not enough to send it in the binary format of a "text" value, we'd need to send it in the type's native format, which we obviously can't know unless we do a round-trip to the server first :-(

The best I can come up with is:

  • If the user passes a string, we always send it in the textual format. That way you can insert data into columns of arbitrary type by passing the input text format as a string. This would break applications which pass bytea values as "string".
  • If the user passes a []byte, we always send it in the binary format. This would break applications which pass e.g. uuid values in a byte slice containing hex-encoded data for the UUID value. However, it would save an extra encode step for bytea values, and would allow the user to send any data in the binary format, perhaps saving some cycles for other data types as well. Caveats include e.g. sending a 4-byte value for a parameter which PG deduces to be an int4, and the user expects it to be bytea; it would silently give the wrong answers under binary mode.

I think with these changes we could both avoid the extra round-trip and the (sometimes expensive) encode step, but users would still be able to work with arbitrary data. However, given the caveats and the fact that this could break some real applications, I would like to suggest adding a new pseudo parameter, allow_binary_format, which would enable this behaviour for applications designed from ground-up to work with these limitations in exchange for less round-trips and wasted CPU cycles.

Thoughts?

@johto
Copy link
Contributor Author

johto commented Oct 7, 2014

Since apparently nobody wants to pick this up, I'm going to start working on this next.

@johto johto mentioned this issue Mar 11, 2015
@cgilling
Copy link

Reading this over, if I understand this correctly, this will not only save a round trip, but will also play nicely with pgbouncer when in transaction mode, which is something that I am very interested in as well. I testing PQexecParams and that worked, and from you proof of concept implementation and looking at the libpq source, the PoC seems to be doing pretty much the same thing.

I do have one question though. If the goal is to do a single round trip, then it doesn't seem like using the binary protocol is necessary here. I'm wondering if I am missing something or if you just added that in as well for another optimization.

I'm not sure how far you've gotten on this, but if you haven't invested too much time, I could see how much time I have in the next couple of weeks to try and put something together.

@johto
Copy link
Contributor Author

johto commented Mar 15, 2015

Reading this over, if I understand this correctly, this will not only save a round trip, but will also play nicely with pgbouncer when in transaction mode, which is something that I am very interested in as well. I testing PQexecParams and that worked, and from you proof of concept implementation and looking at the libpq source, the PoC seems to be doing pretty much the same thing.

Oh. I wasn't aware that this would help with that. That's good, I guess.

I do have one question though. If the goal is to do a single round trip, then it doesn't seem like using the binary protocol is necessary here. I'm wondering if I am missing something or if you just added that in as well for another optimization.

The basic problem is deciding how to encode the values being sent to the database. If the user gives the driver a []byte, you can't just send it over if it contains binary data. One possible strategy would be to always encode byte slices as bytea data, but that seems really short-sighted, when we could kill two birds with the stone and support sending any data types over in binary.

I couldn't really measure any significant performance improvements from using the binary mode on bytea values (in the order of ~5%, at best), so every can keep their pants on. The only significant optimization is the round trip reduction.

I'm not sure how far you've gotten on this, but if you haven't invested too much time, I could see how much time I have in the next couple of weeks to try and put something together.

I am, actually, almost completely done. The code just needs some polishing and splitting it up into separate commits, but it passes all the tests and the design is what I wanted. If you want, I could have a look at it today and either fix it up or submit the WIP code so you can finish it.

@cgilling
Copy link

That's great to hear that you're almost done. I think this is going to be a great win to reduce the number of round trips necessary for getting a query done. I think if you have the time it makes the most sense for you to finish it up because you're very familiar with the code, but if you don't have time for the next couple of weeks I could try my hand at it. Either way I'd love to be able to take a look at it and also be able to test it with pgbouncer to prove to myself whether this is going to fix that issue as well.

@cgilling
Copy link

Any update on this one?

@johto
Copy link
Contributor Author

johto commented Mar 30, 2015

Any update on this one?

Procrastination is a hell of a drug.

I'll open a new PR for this later today.

@johto
Copy link
Contributor Author

johto commented Mar 30, 2015

I've published my work in #357, but I'm less convinced that this is the best way to solve this issue.

@cgilling
Copy link

Sweet, thanks. I'll take a look and move any further discussion over there.

@johto
Copy link
Contributor Author

johto commented Jul 16, 2015

Solved in #357.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants