Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR: DISCARD ALL cannot run inside a transaction block when connection closed uncleanly #64

Closed
drdrsh opened this issue May 9, 2022 · 4 comments

Comments

@drdrsh
Copy link
Collaborator

drdrsh commented May 9, 2022

Describe the bug

I am working on a test bench to test some edge cases against pgcat. Opening a transaction and then closing the client connection causes pgcat to issue ROLLBACK; DISCARD ALL, however this behavior triggers ERROR: DISCARD ALL cannot run inside a transaction block error on the server. Might be something wrong with my setup or an issue with pgcat

To Reproduce
Put the following 3 files in the same directory and run docker compose up

docker-compose.yml

version: '3'
services:
  pg:
    image: postgres:12-bullseye
    environment:
      POSTGRES_HOST_AUTH_METHOD: trust
      POSTGRES_DB: main_db
      POSTGRES_USER: main_user

  pgcat:
    build: https://github.com/levkk/pgcat.git#main
    restart: always
    command: ["pgcat", "/etc/pgcat/pgcat.toml"]
    links:
      - pg
    volumes:
      - "./config.toml:/etc/pgcat/pgcat.toml"

  app:
    image: ruby
    links:
      - pgcat
    volumes:
      - "./entrypoint.rb:/app/entrypoint.rb"
    command: bash -c "gem install pg && ruby /app/entrypoint.rb"

entrypoint.rb

require 'pg'
$stdout.sync = true

def poorly_behaved_client
    conn = PG::connect("postgres://main_user:@pgcat:5432/main_db")
    conn.async_exec("BEGIN")
    conn.async_exec("SELECT 1")
    conn.close
rescue Exception => e
    puts "Encountered #{e}"
end

sleep 5
loop { poorly_behaved_client }

config.toml

[general]
host = "0.0.0.0"
port = 5432
pool_size = 5
pool_mode = "transaction"
connect_timeout = 5000
healthcheck_timeout = 1000000
ban_time = 1 # Seconds
[user]
name = "main_user"
password = ""
[shards]
[shards.0]
servers = [[ "pg", 5432, "primary" ]]
database = "main_db"
[query_router]
default_role = "primary"
query_parser_enabled = false
primary_reads_enabled = false
sharding_function = "pg_bigint_hash"

Expected behavior
No server errors.

@levkk
Copy link
Contributor

levkk commented May 9, 2022

@drdrsh Hey, tried to fix it in #65, hope it helps. I couldn't reproduce it locally, as is tradition with Postgres connection abnormalities :)

Worst case we can just close the connection instead of re-using it, but that takes us back to Pgbouncer connection thrashing which we don't want, so if you come up with the root cause and solution, it would be awesome.

@drdrsh
Copy link
Collaborator Author

drdrsh commented May 9, 2022

That fix worked. I am not able to reproduce this anymore.
I still do not understand how it is possible that the DISCARD ALL ends up running in a transaction.

@drdrsh drdrsh closed this as completed May 9, 2022
@drdrsh
Copy link
Collaborator Author

drdrsh commented Jun 3, 2022

@levkk I think I understand what happened here. Docs for PQexec state that

The command string can include multiple SQL commands (separated by semicolons). Multiple queries sent in a single PQexec call are processed in a single transaction, unless there are explicit BEGIN/COMMIT commands included in the query string to divide it into multiple transactions.

This behavior can be demonstrated in ruby by doing the following

require 'pg'
conn = PG::connect("some_url")
conn.async_exec("BEGIN")
conn.async_exec("ROLLBACK;DISCARD ALL")

This will fail with ERROR: DISCARD ALL cannot run inside a transaction block (PG::ActiveSqlTransaction)

@levkk
Copy link
Contributor

levkk commented Jun 3, 2022

Very nice find! Thank you.

levkk added a commit that referenced this issue Oct 25, 2023
…ation (#618)

* Initial commit

* Cleanup and add stats

* Use an arc instead of full clones to store the parse packets

* Use mutex instead

* fmt

* clippy

* fmt

* fix?

* fix?

* fmt

* typo

* Update docs

* Refactor custom protocol

* fmt

* move custom protocol handling to before parsing

* Support describe

* Add LRU for server side statement cache

* rename variable

* Refactoring

* Move docs

* Fix test

* fix

* Update tests

* trigger build

* Add more tests

* Reorder handling sync

* Support when a named describe is sent along with Parse (go pgx) and expecting results

* don't talk to client if not needed when client sends Parse

* fmt :(

* refactor tests

* nit

* Reduce hashing

* Reducing work done to decode describe and parse messages

* minor refactor

* Merge branch 'main' into zain/reimplment-prepared-statements-with-global-lru-cache

* Rewrite extended and prepared protocol message handling to better support mocking response packets and close

* An attempt to better handle if there are DDL changes that might break cached plans with ideas about how to further improve it

* fix

* Minor stats fixed and cleanup

* Cosmetic fixes (#64)

* Cosmetic fixes

* fix test

* Change server drop for statement cache error to a `deallocate all`

* Updated comments and added new idea for handling DDL changes impacting cached plans

* fix test?

* Revert test change

* trigger build, flakey test

* Avoid potential race conditions by changing get_or_insert to promote for pool LRU

* remove ps enabled variable on the server in favor of using an option

* Add close to the Extended Protocol buffer

---------

Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants