New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance vs. Ruby #83
Comments
First of all, did you truncate the table before running the benchmark? When running benchmarks, ideally you want the environment to be as similar as possible. Note that you don't actually use a socket with this database URL. With With that done, the conditions should be similar for both versions. Try comparing the benchmark result now! But we're not done yet, the Rust version can still be improved! 😄 Instead of formatting the query string each iteration, make it constant! Replace Finally, you can use a prepared statement to avoid sending the query text each iteration. // outside the loop:
let stmt = conn.prepare("...");
// inside the loop:
stmt.execute(&[&user_id, &web_login, &cycle]).unwrap(); |
It looks like not reusing the same prepared statement is the main culprit. On my machine, your sample runs in ~35 seconds, but this runs in ~22, which matches up with the times you're seeing in Ruby: extern crate postgres;
use std::rand;
use postgres::{Connection, SslMode};
fn main() {
println!("Hello, world!");
let conn = Connection::connect("postgres://postgres@localhost", &SslMode::None).unwrap();
let cycles = 101i32;
let users = 1001i32;
for cycle in range(1, cycles) {
let query = format!("insert into user_daily_statistics (user_id, created_on, web_login) values ($1, now() + '{} days'::interval, $2)", cycle);
let stmt = conn.prepare(query.as_slice()).unwrap();
for user_id in range(1, users) {
let web_login = ((rand::random::<uint>() % 1000u) + 1u) as i32;
stmt.execute(&[&user_id, &web_login]).unwrap();
}
println!("Done: {}", &cycle);
}
} My guess is that Ruby's Postgres driver has a built in statement cache, so it's transparently avoiding the extra calls to the database. r2d2-postgres has a connection wrapper that pools statements that you may find useful, though it's entirely undocumented at the moment :(. |
I'd guess the problem isn't really with not reusing prepared statements; it's in the fact that execute() does two round-trips instead of one. Ruby probably uses the libpq C library which knows how to do a query in one round-trip without prepared statements. I've measured easily ~40% improvements on simple queries in the past by halving the number of round-trips. |
Hmm, that could be true as well, yeah. It shouldn't be that hard to do the same for rust-postgres. The implementation right now is as naive as possible: |
Actually, on second thought, two round trips for |
Thank you for the quick answers. When I started learning Rust one week ago I’ve become amazed by the language. Now I am amazed by the community too. I think I’ve just found my next language. As for the details. I didn’t use prepare because I almost never use it for real, and I wanted to write the same code in Ruby and Rust. However, if you say that Ruby caches the statement, I think it’s fair to add it. I know that localhost is not using the socket, I tested both versions but posted only one. I took your advices, and now it runs much faster. I am glad. extern crate postgres;
use std::rand;
use postgres::{Connection, SslMode};
fn main() {
println!("Hello, world!");
let conn = Connection::connect("postgres://mage@%2Ftmp", &SslMode::None).unwrap();
let cycles = 101i32;
let users = 1001i32;
let query = "insert into user_daily_statistics (user_id, created_on, web_login) values ($1, now() + ($3::int * interval '1 day'), $2)";
let stmt = conn.prepare(query).unwrap();
for cycle in range(1, cycles) {
for user_id in range(1, users) {
let web_login = ((rand::random::<uint>() % 1000u) + 1u) as i32;
stmt.execute(&[&user_id, &web_login, &cycle]).unwrap();
}
println!("Done: {}", &cycle);
}
} I had to add ::int to the interval $3 part otherwise the compiler complained about a Float. The results are 10-11s with socket and 14 with localhost. I am happy. I will test the Ruby version with prepare too just to be fair. (Side note: the Go version quits after inserting 95 rows). |
Yeah, I am a bit conflicted about adding statement caching to the connection itself. On one hand, doing that seems pretty common (JDBC drivers do it almost universally for example) and makes straightforward stuff like this faster. On the other hand, it seems kind of weird for the database driver itself to be doing that. shrug |
That's a bit unfortunate. Prepared statements are a huge PITA, but running 40% slower all the time isn't necessarily fun either. It's been a while since I've read the source code; is that because of bytea? Or is there another reason it's required? |
I happen to be one of the maintainers of lib/pq. Do you have an error? |
This is the Ruby version with prepare: require 'pg'
conn = PG.connect(dbname: 'mage')
stmt = conn.prepare('hoi', "insert into user_daily_statistics (user_id, created_on, web_login) values ($1, now() + ($3::int * interval '1 day'), $2)")
1.upto(100) do |cycle|
1.upto(1000) do |user_id|
conn.exec_prepared('hoi', [user_id, rand(1000), cycle])
end
puts "Done: #{cycle}"
end It runs for 8.2-8.8s. Pretty close to the improved Rust + socket version. |
johto: The Go version always inserts 95 records which might be the limit of the max_connections (or my non-existent Go coding skills). package main
import (
_ "github.com/lib/pq"
"database/sql"
"log"
"fmt"
"math/rand"
)
func main() {
db, err := sql.Open("postgres", "user=mage dbname=mage sslmode=disable host=localhost")
if err != nil {
log.Fatal(err)
}
for cycle := 1; cycle < 101; cycle++ {
query := fmt.Sprintf("insert into user_daily_statistics (user_id, created_on, web_login) values ($1, now() + '%d days'::interval, $2)", cycle)
for user_id := 1; user_id < 1001; user_id++ {
_, err := db.Query(query, user_id, rand.Intn(1000))
if err != nil {
log.Fatal(err)
}
}
fmt.Println("Done: ", cycle)
}
} $ go build postgres.go With fresh src. |
Yeah, you're not closing the Rows object, which leaves the connection open. Use Exec() if you don't care about the resulting rows (or there are none, such as in this case), or QueryRow() if you want the results but you know there'll only be one. |
johto: thank you. This is the fixed version, using prepare and socket. 8.4s. package main
import (
_ "github.com/lib/pq"
"database/sql"
"log"
"fmt"
"math/rand"
)
func main() {
db, err := sql.Open("postgres", "user=mage dbname=mage sslmode=disable host=/tmp")
if err != nil {
log.Fatal(err)
}
stmt, err := db.Prepare("insert into user_daily_statistics (user_id, created_on, web_login) values ($1, now() + ($3::int * interval '1 day'), $2)")
if err != nil {
log.Fatal(err)
}
for cycle := 1; cycle < 101; cycle++ {
for user_id := 1; user_id < 1001; user_id++ {
_, err := stmt.Exec(user_id, rand.Intn(1000), cycle)
if err != nil {
log.Fatal(err)
}
}
fmt.Println("Done: ", cycle)
}
} I hope I can contribute more than micro-benchmarks soon. |
Whenever converting a Rust type to a Postgres type, the driver passes the expected Postgres type to the |
The issue at hand seems to be resolved, so I'm going to close this. I've filed #84 to figure what to do about a statement cache in |
Avoiding the statment close by using the unnamed statment improves the lower bound perf by ~1/3: test bench_execute ... bench: 425429 ns/iter (+/- 24970) test bench_naiive_execute ... bench: 607967 ns/iter (+/- 53434) cc #83
This is my very first Rust code ever, it might be completely wrong, however, I wonder why it is slower than Ruby.
The Ruby version runs for 16s when using socket. The Rust version runs for 30 when using socket.
The text was updated successfully, but these errors were encountered: