-
Notifications
You must be signed in to change notification settings - Fork 231
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
database | user | cl_active | cl_waiting | cl_cancel_req | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode
-------------------+-----------+-----------+------------+---------------+-----------+---------+---------+-----------+----------+---------+------------+-------------
shard_0_primary | main_user | 0 | 57466 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | transaction
shard_0_replica_0 | main_user | 6 | 8 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | transaction
shard_0_replica_1 | main_user | 1 | 57258 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | transaction
(3 rows)
To Reproduce
Create a pool with one primary and one replica then kill the replica
docker-compose.yml
version: '3'
services:
pgp:
image: postgres:12-bullseye
environment:
POSTGRES_HOST_AUTH_METHOD: trust
POSTGRES_DB: main_db
POSTGRES_USER: main_user
pgr0:
image: postgres:12-bullseye
environment:
POSTGRES_HOST_AUTH_METHOD: trust
POSTGRES_DB: main_db
POSTGRES_USER: main_user
pgcat:
build: https://github.com/levkk/pgcat.git#main
restart: always
command: ["pgcat", "/etc/pgcat/pgcat.toml"]
environment:
- RUST_LOG=debug
ports:
- 6000:5432
links:
- pgp
- pgr0
volumes:
- "./config.toml:/etc/pgcat/pgcat.toml"
app:
image: ruby
links:
- pgcat
volumes:
- "./entrypoint.rb:/app/entrypoint.rb"
command: bash -c "gem install pg && ruby /app/entrypoint.rb"
config.toml
[general]
host = "0.0.0.0"
port = 5432
pool_size = 2
pool_mode = "transaction"
connect_timeout = 5000
healthcheck_timeout = 500
ban_time = 1 # Seconds
statsd_address = ''
[user]
name = "main_user"
password = ""
[shards]
[shards.0]
servers = [
[ "pgp", 5432, "primary" ],
[ "pgr0", 5432, "replica" ]
]
database = "main_db"
[query_router]
default_role = "replica"
query_parser_enabled = false
primary_reads_enabled = true
sharding_function = "pg_bigint_hash"
entrypoint.rb
require 'pg'
$stdout.sync = true
def work
conn = PG::connect("postgres://main_user:@pgcat:5432/main_db")
conn.async_exec("SET SERVER ROLE TO 'replica'")
conn.async_exec("SELECT 1")
conn.close
rescue Exception => e
puts "Encountered #{e}"
end
sleep 5
loop { 10.times.map { Thread.new { work } }.map(&:join) }
After that you can run
docker compse stop pgp
and run
watch -n 1 'psql "postgres://main_user:@localhost:6000/pgbouncer" -c "SHOW POOLS;"'
to watch the waiting_clients explode
I think this is because we report the client as waiting early in the codepath and exit the codepath without reporting the client as disconnecting or active
https://github.com/levkk/pgcat/blob/main/src/pool.rs#L212
Expected behavior
We shouldn't count a client as waiting on an instance unless we actually consider it for the query.
levkk
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working