Pool::acquire times out #622

Closed
skuzins opened this issue Aug 10, 2020 · 20 comments · Fixed by #629 or #1211

@skuzins commented Aug 10, 2020

I'm using a connection pool with a large number of tokio tasks; see the example below. The tasks run a small query, do some work without holding on to a connection, then run another query. In the example, the second call to acquire quickly starts failing with PoolTimedOut, or it takes too long, roughly connect_timeout seconds.

Any ideas would be appreciated.

use sqlx::mysql::*;
async fn run_queries(i: u32, pool: &sqlx::Pool<MySql>) {
    {
        let mut conn = pool.acquire().await.unwrap();

        let _ = sqlx::query("SELECT 1").fetch_one(&mut conn).await.unwrap();

        // conn gets dropped here and should be returned to the pool
    }

    // (do some other work here without holding on to a connection)

    {
        let start = std::time::Instant::now();

        // this often fails (PoolTimedOut) or takes connect_timeout seconds
        let mut _conn = pool.acquire().await.unwrap();

        println!("{} pool.acquire() took {:?}", i, std::time::Instant::now() - start);

        // (run another query...)

    }
}

#[tokio::main]
async fn main() {
    let pool = MySqlPoolOptions::new()
        .max_connections(1) // also fails with higher counts, e.g. 5
        .connect(&std::env::var("DATABASE_URL").unwrap())
        .await
        .unwrap();

    let mut handles = vec![];
    for i in 0..100 {
        let pool = pool.clone();

        handles.push(tokio::spawn(async move {
            run_queries(i, &pool).await;
        }));
    }

    futures::future::join_all(handles).await;
}
Cargo.toml:

[dependencies]
sqlx = { version = "0.4.0-beta.1", default-features = false, features = [ "runtime-tokio", "mysql" ] }
tokio = "0.2.22"
futures = "0.3.5"

Output snippets:

thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: PoolTimedOut'
pool.acquire() took 30.001958452s

MySQL Engine version 8.0.17. Tested with stable-x86_64-unknown-linux-gnu and stable-x86_64-pc-windows-msvc.

@cetra3 commented Aug 12, 2020

I've run into the same issue. I am not seeing connections recycled when they are dropped, which is causing issues when multiple queries happen on one web request/future.

I have worked around it by using deadpool for connection pooling, which appears to be more reliable in my limited tests.

Here's my (very simple) implementation. (also note: I needed to use DerefMut to coax the compiler, as I am not sure what combination of * and & is needed here.)

use std::ops::DerefMut;
use sqlx::{Connect, Connection, Error as SqlxError, Executor, PgConnection};

use async_trait::async_trait;
use deadpool::managed::{Manager, RecycleResult};
use log::*;
type Pool = deadpool::managed::Pool<PgConnection, SqlxError>;

struct DbPool {
    url: String,
}

impl DbPool {
    fn new(url: String, size: usize) -> Pool {
        Pool::new(DbPool { url }, size)
    }
}

#[async_trait]
impl Manager<PgConnection, SqlxError> for DbPool {
    async fn create(&self) -> Result<PgConnection, SqlxError> {
        PgConnection::connect(&self.url).await
    }
    async fn recycle(&self, obj: &mut PgConnection) -> RecycleResult<SqlxError> {
        Ok(obj.ping().await?)
    }
}



async fn main() {
    // ...
    let pool = DbPool::new(url, 16);
    // ...
    sqlx::query_as("select * from users where email = $1")
        .bind(&email)
        .fetch_one(pool.get().await?.deref_mut())
        .await?;
    // ...
}

@skuzins (Author) commented Aug 12, 2020

Thanks for sharing your workaround! deadpool looks great.

@abonander (Collaborator) commented Aug 12, 2020

Currently the pool doesn't behave well if the acquire() future is cancelled: it fails to remove the waiter from the list, which holds up the queue. This can have a knock-on effect, with other acquire() calls timing out due to the same problem. Each time a connection is returned to the pool, it only attempts to wake the task at the head of the queue, so if there are more dead tasks than connections in the queue, the pool will stall.
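
To illustrate the failure mode, here's a deliberately simplified sketch of the idea (hypothetical types, not sqlx's actual internals): releasing a connection only wakes the waiter at the head of the queue, so if that waiter belongs to a cancelled acquire(), the connection sits idle while every other waiter runs into its timeout.

use std::collections::VecDeque;
use std::task::Waker;

struct Connection;

// Hypothetical, simplified pool state for illustration only.
struct PoolState {
    idle: Vec<Connection>,
    // Cancelled acquire() futures currently leave stale wakers behind here.
    waiters: VecDeque<Waker>,
}

impl PoolState {
    fn release(&mut self, conn: Connection) {
        self.idle.push(conn);
        // Only the head of the queue is woken. If that waker belongs to a
        // task that has already been cancelled, nobody picks up the idle
        // connection and every remaining waiter eventually times out.
        if let Some(waker) = self.waiters.pop_front() {
            waker.wake();
        }
    }
}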

@skuzins in your example, could you please try debug-printing the pool after the second .acquire() call times out, instead of panicking:

if let Err(e) = pool.acquire().await {
    println!("acquire error: {}, pool state: {:?}", e, pool);
}

@skuzins (Author) commented Aug 13, 2020

I'm not sure any future would/should normally be cancelled in this example - I believe they should all just run to completion.

Here's the debug message, please let me know if you need more information.

acquire error: pool timed out while waiting for an open connection, pool state: 
Pool { size: 1, num_idle: 0, is_closed: false, options: PoolOptions { 
max_connections: 1, min_connections: 0, connect_timeout: 30s, 
max_lifetime: Some(1800s), idle_timeout: Some(600s), test_before_acquire: true } }

@abonander (Collaborator)

The timeout itself cancels the internal future that's actually waiting in the queue.

However, I'd expect to see num_idle: 1 if that was the case. How about with .max_connections(5)?

@cetra3 commented Aug 13, 2020

I am not seeing cancellations in my code and the example posted doesn't have them either. It feels more like an issue with waking up after the pool has stopped using a connection.

@skuzins (Author) commented Aug 13, 2020

@abonander The thing is, there shouldn't be any timeouts. The task acquires the connection, performs a short query, drops the connection and does the same thing once more. The whole program should finish within milliseconds without any timeouts.

Here's the output with .max_connections(5):

acquire error: pool timed out while waiting for an open connection, pool state: Pool 
{ size: 5, num_idle: 0, is_closed: false, options: PoolOptions { 
max_connections: 5, min_connections: 0, connect_timeout: 30s, 
max_lifetime: Some(1800s), idle_timeout: Some(600s), test_before_acquire: true } }

abonander added a commit that referenced this issue Aug 13, 2020
fixes #622

Signed-off-by: Austin Bonander <austin@launchbadge.com>
abonander added a commit that referenced this issue Aug 16, 2020
fixes #622

Signed-off-by: Austin Bonander <austin@launchbadge.com>
@jgrund (Contributor) commented Aug 18, 2020

FWIW, I think I'm hitting this (or something like it) when using try_acquire with Postgres.

I have a loop which waits 5 seconds and calls try_acquire to see if any connections are available. The loop also does some work with the pool in a spawned task.

Once the 5 seconds have elapsed, the pool shows all connections as idle, but try_acquire still returns None.
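
A minimal sketch of the kind of loop described above (the names, the interval, and the use of tokio 1.x's sleep are illustrative; try_acquire returns Option<PoolConnection<_>>):

use std::time::Duration;
use sqlx::postgres::PgPool;

async fn poll_for_connection(pool: &PgPool) {
    loop {
        tokio::time::sleep(Duration::from_secs(5)).await;

        match pool.try_acquire() {
            Some(_conn) => {
                // A connection was free; do some work with it here.
            }
            None => {
                // Observed symptom: the pool reports idle connections,
                // yet try_acquire still returns None.
                println!("no connection available, pool state: {:?}", pool);
            }
        }
    }
}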

jgrund added a commit to whamcloud/integrated-manager-for-lustre that referenced this issue Aug 18, 2020
We're hitting an issue that looks similar to
launchbadge/sqlx#622

I was able to work around it by disabling fairness, but we can just
avoid the lookup entirely if active workers is >= the pool size.

Signed-off-by: Joe Grund <jgrund@whamcloud.io>
@bbqsrc commented May 5, 2021

I'm not certain this bug has been fixed, as I've had to use the deadpool workaround, which has entirely solved the problems we've been having. In seemingly simple cases, PgPool seems to break down even under very basic load.

@abonander (Collaborator)

Yeah I think adding this periodic wake was a mistake: https://github.com/launchbadge/sqlx/blob/master/sqlx-core/src/pool/inner.rs#L180

After a single wait period it effectively makes the pool unfair again as it becomes dependent on the timing of when tasks wake and poll the future.

Something I realized in 0.6 that I want to backport: if a task is "woken" (as in, its woken flag is set) but it doesn't actually try to acquire, then on drop of the Waiter it needs to wake the next task in the list so that task can attempt to acquire. Without that, I think there's a knock-on effect where a connection sits in the idle queue with tasks waiting, but none of them get woken, because the first task was woken but never polled (either because it timed out or was cancelled).

The idle reaper might also be unintentionally preempting waiting tasks, I need to look into that.
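
A rough sketch of the wake-on-drop idea (hypothetical, simplified types, not the pool's actual internals): when a waiter that was flagged as woken is dropped without having acquired, it hands the wakeup to the next waiter in line.

use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::task::Waker;

struct WaiterState {
    woken: bool,
    waker: Option<Waker>,
}

struct WaitQueue {
    waiters: Mutex<VecDeque<Arc<Mutex<WaiterState>>>>,
}

struct Waiter {
    state: Arc<Mutex<WaiterState>>,
    queue: Arc<WaitQueue>,
}

impl Drop for Waiter {
    fn drop(&mut self) {
        // If this task was handed a wakeup but never used it (it timed out
        // or was cancelled before polling again), forward the wakeup so an
        // idle connection doesn't go unclaimed.
        if self.state.lock().unwrap().woken {
            if let Some(next) = self.queue.waiters.lock().unwrap().pop_front() {
                let mut next_state = next.lock().unwrap();
                next_state.woken = true;
                if let Some(waker) = next_state.waker.take() {
                    waker.wake();
                }
            }
        }
    }
}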

@abonander (Collaborator)

Just spotted another bug, too; since waking a waiting task involves popping it from the SegQueue, if the task wakes and doesn't get a connection, it'll just... go back to sleep without putting itself back in the queue.

Using SegQueue was probably a bad idea anyway since waiting tasks can't retain their spot in line and have to get returned to the back of the queue. In 0.6 we're switching to an intrusive linked list where waiting tasks aren't removed until acquire() completes or is cancelled, so that should avoid this issue.

The idle reaper is also definitely preempting some waiting tasks since it directly pops connections from the idle queue without checking if any tasks are waiting.
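
And a sketch of the "put itself back in the queue" fix under the current SegQueue design (again hypothetical and simplified; it assumes crossbeam-queue 0.3, where SegQueue::pop returns Option, and glosses over the race between checking for an idle connection and re-registering):

use crossbeam_queue::SegQueue;
use std::task::{Context, Poll, Waker};

struct Connection;

struct SharedPool {
    idle: SegQueue<Connection>,
    waiters: SegQueue<Waker>,
}

fn poll_acquire(pool: &SharedPool, cx: &mut Context<'_>) -> Poll<Connection> {
    if let Some(conn) = pool.idle.pop() {
        return Poll::Ready(conn);
    }
    // This task's waker was already popped from `waiters` when it was woken,
    // so if it loses the race for the connection it must push itself back
    // onto the queue; otherwise it sleeps until the acquire timeout with no
    // way to be woken again.
    pool.waiters.push(cx.waker().clone());
    Poll::Pending
}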

@abonander reopened this May 6, 2021
abonander added a commit that referenced this issue May 6, 2021
* a task that is marked woken but didn't actually wake will instead wake the next task
in the queue

* a task that wakes but doesn't get a connection will put itself back in the queue instead of waiting until it times out with no way to be woken

* the idle reaper won't run if there are tasks waiting for a connection, and also uses
the proper `SharedPool::release()` to return validated connections to the pool so waiting tasks get woken

closes #622, #1210

(hopefully for good this time)

Signed-off-by: Austin Bonander <austin@launchbadge.com>
abonander added a commit that referenced this issue May 6, 2021
* a task that is marked woken but didn't actually wake before being cancelled will instead wake the next task in the queue

* a task that wakes but doesn't get a connection will put itself back in the queue instead of waiting until it times out with no way to be woken

* the idle reaper now won't run if there are tasks waiting for a connection, and also uses
the proper `SharedPool::release()` to return validated connections to the pool so waiting tasks get woken

closes #622, #1210

(hopefully for good this time)

Signed-off-by: Austin Bonander <austin@launchbadge.com>
@slhmy commented May 6, 2021

I've hit a similar problem, using MySqlPool in an Actix-web 4.0-beta server.

The code is in #1199 (comment).

I make async queries by using the pool directly for each small query, and run queries over 512 connections concurrently. Then I hit the PoolTimedOut error. The pool shouldn't get blocked, since every query on its own runs successfully.

@abonander (Collaborator)

@slhmy if you would, please try #1211 to see if that fixes the issue.

@slhmy commented May 6, 2021

@slhmy if you would, please try #1211 to see if that fixes the issue.

😍 Fixed; I checked on the Tokio runtime.

@abonander (Collaborator)

That's awesome to hear, thanks!

@slckl commented May 11, 2021

I'm also running into this issue, with a somewhat silly use case of hundreds to low thousands of tokio tasks running small write queries against a PgPool. It works for a couple of minutes, and then buckles with a PoolTimedOut.
#1211, unfortunately, does not seem to have any effect in my case.
I will try the deadpool version suggested above.

Edit: the deadpool version works just fine.

@altanozlu (Contributor)

#1211 fixed my issue, thanks @abonander.

abonander added a commit that referenced this issue May 18, 2021
glommer pushed a commit to chiselstrike/chiselstrike that referenced this issue Jan 31, 2022
Our external cloud database doesn't seem to be able to make do
with just one. It does seem that there is a sqlx bug making this worse:
launchbadge/sqlx#622

But not a lot of options for now, and a large connection pool is good
anyway.
@fjoanis-legion commented May 20, 2022

I have a minimal repro of the timeout issue in async code. This might be incorrect usage on my part; please tell me if that's the case!

Cargo.toml

[package]
name = "repro_sqlx_timeout"
version = "0.1.0"
edition = "2021"

[dependencies]
sqlx = { version = "0.5", features = [
    "sqlite",
    "any",
    "runtime-tokio-native-tls",
] }

tokio = { version = "1", features = ["full"] }
futures = "0.3"

[dev-dependencies]
tempfile = "3.2.0"

main.rs

fn main() {}

#[cfg(test)]
mod tests {
    use futures::future::join_all;
    use sqlx::{migrate::MigrateDatabase, Executor};
    use std::{thread, time::Duration};

    #[tokio::test]
    async fn sqlx_timeout() {
        let dir = tempfile::tempdir().unwrap();
        let path = dir.path();
        let db_path = path.join("database.db3");
        let database_uri = format!("sqlite://{}", db_path.to_str().unwrap().replace('\\', "/"));

        sqlx::Any::create_database(&database_uri).await.unwrap();

        let connection = sqlx::any::AnyPoolOptions::new()
            .max_connections(1)
            .connect_timeout(Duration::from_millis(300))
            .connect(&database_uri)
            .await
            .unwrap();

        connection
            .execute("CREATE TABLE some_table(some_number BIGINT);")
            .await
            .unwrap();

        let mut vec_future = Vec::new();

        for _ in 1..3 {
            vec_future.push(async {
                connection
                    .execute(sqlx::query("SELECT * FROM some_table"))
                    .await
                    .unwrap(); // thread 'tests::sqlx_timeout' panicked at 'called `Result::unwrap()` on an `Err` value: PoolTimedOut'
                thread::sleep(Duration::from_secs(1));
            });
        }

        let _results = join_all(vec_future).await;
    }
}

@abonander (Collaborator)

@fjoanis-legion That thread::sleep() blocks the Tokio runtime and doesn't let the connection cleanup tasks execute. You want tokio::time::sleep(Duration::from_secs(1)).await; instead.
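
For example, the loop in the repro could become the following (only the sleep call changes, reusing the imports from the test above):

for _ in 1..3 {
    vec_future.push(async {
        connection
            .execute(sqlx::query("SELECT * FROM some_table"))
            .await
            .unwrap();
        // Yield back to the runtime instead of blocking the worker thread,
        // so dropped connections can actually be returned to the pool.
        tokio::time::sleep(Duration::from_secs(1)).await;
    });
}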

@abonander (Collaborator)

Locking this issue to prevent further necroing. If anyone is encountering sporadic timeouts on SQLx 0.6.0, please open a new issue.

launchbadge locked this issue as resolved and limited conversation to collaborators on Jun 16, 2022