Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[postgres] handle errors #10

Closed
rmannibucau opened this issue May 1, 2024 · 14 comments · Fixed by #11
Closed

[postgres] handle errors #10

rmannibucau opened this issue May 1, 2024 · 14 comments · Fixed by #11
Assignees
Labels
bug Something isn't working

Comments

@rmannibucau
Copy link

Hi,

I tested the runner with postgres executor to proxy oracle database to use it in Grafana.
Globally it works well while there is no execution error, when it happens - in my case Grafana triggers a discovery query (to get postgres version during configuration and tables/columns during query typing) which fails on oracle since tables are different - then the connection hangs and it seems it is no more working at all.
Patching the postgres executor to use valid queries solves the issue.
Indeed a better fix is to ensure the error is sent back to postgres using the wire protocol properly - in my case it helps to replace the postgres version query by hardcoding it but it is a particular, while the failures are properly forwarded I should be able to use the project without any patching at a very acceptable additional configuration cost.

Hope it makes sense

@kendarorg
Copy link
Owner

Finally someone else's use case :D Using the " -v DEBUG " could give some more information. In the meantime, is there some exception thrown (hopefully)? Can you try with the "10-postgres-handle-errors" where i catch something earlier. If none of this work i'll setup an Oracle environment to simulate the error (PS what oracle version and what oracle driver version) Thanks :)

@kendarorg kendarorg self-assigned this May 2, 2024
@kendarorg kendarorg added the bug Something isn't working label May 2, 2024
@kendarorg kendarorg linked a pull request May 2, 2024 that will close this issue
@rmannibucau
Copy link
Author

@kendarorg I suspect it can be an issue with what grafana does also, for example it does a SELECT 1 or ; where SELECT 1 from dual should be used - or any "check" query.
There are also queries like these ones which can't work if the proxied database is not of the same type: https://github.com/grafana/grafana/blob/f900098cc9f5771c02b6189ba5138547b4f5e6c2/public/app/plugins/datasource/postgres/postgresMetaQuery.ts .

The error is generally something like "invalid query" - missing from for ; for example, invalid table name for the discovery query - the previous link.

What is weird is that the grafana postgres client does not see the error message the protocol runner sends and then it just hangs cause the connection "exchange" is considered as not completed whereas the error was sent.

The used driver is https://github.com/lib/pq - I didn't get much issues using a jdbc client so I suspect it can be something more precise.

@kendarorg
Copy link
Owner

kendarorg commented May 2, 2024

Seems like the behaviour of a not sent ReadyForQuery message is, so the client keep waiting till the timeout (thinking). At the beginning i thought that somehow not even the error was sent

@kendarorg
Copy link
Owner

kendarorg commented May 2, 2024

@rmannibucau Or -maybe- (dang :( ) the SQLSTATE "Code: the SQLSTATE code for the error (see Appendix A). Not localizable. Always present." but ignored by the JDBC driver if not present. Analyzing the pq driver seemed mandatory (correctly) https://github.com/lib/pq/blob/master/error.go With the last version on the branch should work... #11 (if you can give it a try)

@kendarorg
Copy link
Owner

@rmannibucau New release built, let's see

@rmannibucau
Copy link
Author

@kendarorg tested the new main and still the same behavior

testing running: docker run --name=grafana --network host grafana/grafana:9.3.2 and docker run --network host -e ORACLE_PASSWORD=password gvenzl/oracle-free and configuring -l 5432 -p postgres -v INFO -xw password -xl system -xc jdbc:oracle:thin:@localhost:1521/FREEPDB1. Then I create a datasource in grafana and do a query using explore view.

@kendarorg
Copy link
Owner

kendarorg commented May 2, 2024

trying now with the following compose

version: "3"
services:
  grafana:
    image: grafana/grafana:9.3.2
    container_name: grafana 
    restart: unless-stopped
    ports:
      - "3000:3000"
  oracle:
    image: gvenzl/oracle-free
    container_name: oracle 
    restart: unless-stopped
    environment:
      - ORACLE_PASSWORD=password
    ports:
      - "1521:1521"

@kendarorg
Copy link
Owner

I was able to test a connection from grafana to oracle..but of course it fails on creating the dashaboard

java -cp "ojdbc11.jar;protocol-runner.jar" \
    org.springframework.boot.loader.JarLauncher \
    -p postgres -l 5432 \
    -xl system -xw password -xc jdbc:oracle:thin:@192.168.1.96:1521/FREEPDB1 \
    -xd test -v DEBUG

And i notice that it requires the table names to fill the list of postgres table, plus, i think will follow the same thing for columns aggregation and aliases

select quote_ident(table_name) as "table" from information_schema.tables
    where quote_ident(table_schema) not in ('information_schema',
                             'pg_catalog',
                             '_timescaledb_cache',
....

Probably a good approach could be adding a kind of config file to intercept requests doing some kind of regexp search/replace. Or a way to add hooks to the runner

@rmannibucau
Copy link
Author

@kendarorg i think the fakeQueries could be configurable to help/map the queries but still, a failure shouldn't corrupt the connection, isnt it?

@kendarorg
Copy link
Owner

Should not create problems in theory :( I noticed now that a CancelRequest is issued, let me check how it is handled (even by the driver)

2024-05-02 12:56:29,799 [DEBUG] CID:2 [SERVER][TX]: ErrorResponse Tags:  [ProtoContext]
2024-05-02 12:56:29,800 [DEBUG] CID:2 [SERVER][TX]: ReadyForQuery Tags:  [ProtoContext]
2024-05-02 12:56:29,805 [INFO] CID: [SERVER] Accepted connection from /192.168.1.96:41132 [TcpServer] 
2024-05-02 12:56:29,868 [DEBUG] CID:3 [SERVER][RX]: CancelRequest Tags:  [ProtoContext] 

@kendarorg
Copy link
Owner

What evidence you have of the connection closing abruptly? Because every time i try to add a query on explore, it connects correctly (albeit failing for the missing metadata tables)

@kendarorg
Copy link
Owner

...and can you send me the queries you patched so i can put all attention on them :D

@rmannibucau
Copy link
Author

What evidence you have of the connection closing abruptly?

None obvious, just that I restarted the database manually - for "maintenance" purposes - and the keep alive in the client was long enough to still have a connection.

...and can you send me the queries you patched so i can put all attention on them :D

  • ; -> replaced by select 1 from dual (:warning: it is oracle specific)
  • SELECT current_setting('server_version_num')::int/100 as version -> select 15 from dual
  • if sql contains from information_schema.tables -> return an empty resultset (ignore discovery, didnt take time to impl this part)

the link I sent before on grafana sources has a few more statements but it was the ones breaking my use case.

@kendarorg
Copy link
Owner

kendarorg commented May 2, 2024

Ok, now i think i get it

  • Check the keepalive (actually is set internally to ...hem.. a lot) to avoid giving the impression to never close the connections
  • Add the "patch queries" file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants