
Conversation

@Robert-Steiner Robert-Steiner commented Nov 4, 2020

Description

I recently noticed that instead of creating one reqwest::Client and reusing it for further requests, we currently create a new one for every single request. reqwest::Client internally manages a connection pool, which allows the underlying connections to be reused. Reusing the client can significantly improve the performance of read/write requests.

This PR revises the influx client so that it creates a single reqwest::Client up front and reuses it for all subsequent requests.
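
To illustrate the idea at the reqwest level, here is a minimal sketch (not the actual influx client code; the function name and URL handling are made up):

use reqwest::Client;

// Building the reqwest::Client once lets its internal connection pool keep
// connections alive across requests; building a new Client per request opens
// a fresh TCP (and TLS) connection every time.
async fn send_all(urls: &[&str]) -> Result<(), reqwest::Error> {
    let client = Client::new(); // created once, reused for every request
    for url in urls {
        // Each request can pick up an idle connection from the pool.
        client.get(*url).send().await?.error_for_status()?;
    }
    Ok(())
}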

I did a small test on my machine and the results look promising:

New client for each request
10k write requests
done in: 6.825374934s

Reuse Client
10k write requests
done in: 903.934066ms

Here is the test script if you want to try it yourself:

use chrono::{DateTime, Utc};
use futures::stream::{FuturesUnordered, StreamExt};
use influxdb::InfluxDbWriteable;
use influxdb::{Client, Query};
use std::time::Instant;

#[tokio::main]
async fn main() {
    let client = Client::new("http://localhost:8086", "metrics");
    let query = format!("CREATE DATABASE {}", "metrics");
    // Create the database; ignore the result (it may already exist).
    let _ = client.query(&Query::raw_read_query(query)).await;

    #[derive(InfluxDbWriteable, Clone)]
    struct WeatherReading {
        time: DateTime<Utc>,
        humidity: i32,
        #[tag]
        wind_direction: String,
    }

    let mut pending = FuturesUnordered::new();

    for _ in 0..5000 {
        let weather_reading = WeatherReading {
            time: Utc::now(),
            humidity: 30,
            wind_direction: String::from("north"),
        };
        let client_task = client.clone();
        let future = async move {
            let res = client_task
                .query(&weather_reading.into_query("weather"))
                .await;
            if res.is_err() {
                println!("{:?}", res);
            }
        };
        pending.push(future);
    }

    let now = Instant::now();
    println!("start");

    // Drain the queue, driving all write requests to completion concurrently.
    while pending.next().await.is_some() {}

    let new_now = Instant::now();
    println!("done in {:?}", new_now.duration_since(now));
}

Note
I have restricted the visibility of the Client fields. This is a breaking API change for anyone who is currently using the old public fields.
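
Roughly, the revised struct looks something like this (field names here are illustrative; see the diff for the exact definition):

// Sketch only -- the real field names/types may differ.
#[derive(Clone)]
pub struct Client {
    url: String,             // was pub, now private
    database: String,        // was pub, now private
    client: reqwest::Client, // created once in Client::new() and reused for every query
}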

Checklist

  • Formatted code using cargo fmt --all
  • Linted code using clippy cargo clippy --all-targets --all-features -- -D warnings
  • Updated README.md using cargo readme -r influxdb -t ../README.tpl > README.md
  • Reviewed the diff. Did you leave any print statements or unnecessary comments?
  • Any unfinished work that warrants a separate issue captured in an issue with a TODO code comment

fix

remove clone

clean up

fix
@Robert-Steiner
Contributor Author

I updated the test script. It contained a small bug that caused the futures to be executed inside the loop rather than concurrently.
However, this time I could only execute 5k requests concurrently without errors; more than that seems to be too much for my setup.

New client for each request
5k write requests
done in: 3.492351122s

Reuse Client
5k write requests
done in: 576.364192ms

I also wrapped the parameters in an Arc. This gives a slight performance improvement when the client is cloned a lot, though I am not sure how often that is the case. Maybe it is better to remove it to keep the code simple.
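
For reference, this is the effect of the Arc (sketch with illustrative field names): cloning then only bumps reference counts instead of copying the strings.

use std::sync::Arc;

// Illustrative sketch: with Arc-wrapped parameters, cloning only increments
// reference counts; with plain Strings every clone would copy each buffer.
#[derive(Clone)]
struct Params {
    url: Arc<String>,
    database: Arc<String>,
}

fn main() {
    let params = Params {
        url: Arc::new("http://localhost:8086".to_string()),
        database: Arc::new("metrics".to_string()),
    };
    let _per_task = params.clone(); // two atomic increments, no heap copies
}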

@Empty2k12
Collaborator

Thanks for this PR. Great addition and time improvement!

Maybe we can integrate your test script into the test suite to prevent performance regressions in the future? Although it also greatly depends on InfluxDB's performance. What do you think?

@Robert-Steiner
Contributor Author

I think it is difficult to design the script in such a way that it outputs meaningful/reliable values. The performance depends on many factors: the host system (hardware), the network latency, the InfluxDB configuration and, as you already mentioned, InfluxDB itself. Therefore, the result of the script would not be suitable for comparison against fixed values. But I agree, the script could at least give an indication of performance regressions. However, I would have to adjust it a bit: it seems that sometimes the script generates too many requests too quickly, which leads to connection errors. Maybe something like backpressure could help here. I think I will implement something tomorrow.
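
One possible approach (a sketch using buffer_unordered from the futures crate; the limit of 100 is an arbitrary example value):

use futures::stream::{self, StreamExt};

// Sketch: instead of pushing all futures into a FuturesUnordered at once,
// cap how many requests are in flight at the same time.
async fn run_writes<F, Fut>(total: usize, make_request: F)
where
    F: Fn() -> Fut,
    Fut: std::future::Future<Output = ()>,
{
    stream::iter((0..total).map(|_| make_request()))
        .buffer_unordered(100) // at most 100 concurrent requests
        .collect::<Vec<_>>()
        .await;
}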

Do you prefer to have the test script in this or in a separate PR?

@Empty2k12
Collaborator

Maybe we should then just add the script to the project so it can be used manually to test performance.
I agree that the results might not be meaningful, but I'd also hate for the work to be lost. Maybe it would suffice if the script were added to the repo for manual execution?
Your call!

@Robert-Steiner
Contributor Author

Alright, I have adjusted the script. I took some ideas from this blog post. I put the script into benches. I'm not sure if this is the right place. You can run it with cargo bench. Let me know what you think 🙂

@Empty2k12
Collaborator

Awesome! Thanks a lot!

@Empty2k12 Empty2k12 merged commit 7ee95f4 into influxdb-rs:master Nov 7, 2020
@Empty2k12
Collaborator

@Robert-Steiner

Your PR is now included in 0.3.0 on crates.io: https://crates.io/crates/influxdb

Thanks again for your contribution!
