
context canceled Error when using cassandra database plugin #20169

Open
Albert-W opened this issue Apr 14, 2023 · 17 comments

Comments

@Albert-W

Describe the bug

[ERROR] UnexpectedError: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>
, on post https://{baseurl}/v1/database/config/hydro-cassandra-db

Vault is logging the below error:

[ERROR] core: forward request error: error=\"error during forwarding RPC request\"      
[ERROR] core: error during forwarded RPC request: error="rpc error: code = Canceled desc = context canceled"

Vault configuration

  - type: database
    path: database
    description: General Vault Database secrets engine
    options:
      default_lease_ttl: 720h
      max_lease_ttl: 720h
    config:
      cassandra_auth: {{CASSANDRA_AUTH}}
      engine_options:
        name: {{VAULT_CASSANDRA_DB_PLUGIN_NAME}}
        plugin_name: cassandra-database-plugin
        hosts: {{CASSANDRA_HOST}}
        username: ***
        password: ***
        protocol_version: 4
        allowed_roles:
          - "*"
        connect_timeout: 120s
        skip_verification: true
        consistency: LOCAL_QUORUM
        local_datacenter: {{CASSANDRA_DATACENTER}}
        tls: {{CASSANDRA_TLS}}
        insecure_tls: false
        tls_server_name: {{CASSANDRA_TLS_SERVER_NAME_TO_VERIFY}}
        pem_json: "{\"ca_chain\":{{CA_CHAIN}}}"

Expected behavior
Expected the action to succeed.

Environment:
Vault 1.13.0: failed.
Vault 1.12.2: failed.
Vault 1.6.0: succeeded.

@Albert-W changed the title from "504 Error when using cassandra database plugin" to "context canceled Error when using cassandra database plugin" on Apr 14, 2023
@hghaf099
Contributor

Welcome to HashiCorp Vault, and thanks for filing this issue. Could you please provide the Vault configuration and the steps to reproduce the issue? Also, could you verify that Cassandra was reachable at the time of the issue?

@Albert-W
Author

Hi @hghaf099, here is my use case.
I have a Cassandra cluster running in one AWS account behind a public load balancer,
and a Vault server running in a different AWS account behind an API gateway. We get this error when we try to execute a command similar to:

vault write database/config/my-cassandra-database \
    plugin_name="cassandra-database-plugin" \
    hosts=<host> \
    port=<port> \
    protocol_version=4 \
    username=<username> \
    password=<password>\
    allowed_roles=* \
    consistency=LOCAL_QUORUM \
    connect_timeout=30s

It works with Vault 1.6.0, but fails with Vault 1.12.2 and 1.13.0.

@heatherezell
Contributor

Can you change the connect timeout to 60s or 90s to see if that helps at all?

@Albert-W
Author

Sure. I updated it to 3000s, but it still failed:
Error writing data to database/config/yichang-cassandra-database: context deadline exceeded

Using vault monitor -log-level=debug, I get the following message:
2023-05-10T15:34:45.327Z [DEBUG] secrets.database.database_f4c1337b: got database plugin instance: type=cassandra

That is the only related line.
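Since raising connect_timeout to 3000s did not change the outcome, it may help to record how long the write actually takes before it fails. A minimal sketch, assuming VAULT_ADDR and VAULT_TOKEN are already exported; time_write is a hypothetical helper name, not part of the vault CLI.

```shell
#!/usr/bin/env bash
# Sketch: time how long a "vault write" takes to return, so the elapsed time
# can be compared with the connect_timeout that was configured.
# time_write is a hypothetical helper; VAULT_BIN can point at a specific binary.
set -euo pipefail

VAULT_BIN="${VAULT_BIN:-vault}"

# Runs "vault write <path> <args...>" and then prints
# "status=<exit code> elapsed=<seconds>s" on its own line.
time_write() {
  local path="$1"; shift
  local start end status=0
  start=$(date +%s)
  "$VAULT_BIN" write "$path" "$@" || status=$?
  end=$(date +%s)
  echo "status=${status} elapsed=$((end - start))s"
}
```

For example, run time_write database/config/yichang-cassandra-database with the same full set of parameters as before, once with connect_timeout=60s and once with connect_timeout=3000s (varying nothing else, since the write replaces the stored config). If the elapsed time is about the same in both runs, the deadline being hit is probably something other than connect_timeout.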

@maxb
Contributor

maxb commented May 10, 2023

Try adding verify_connection=false to your vault write database/config/my-cassandra-database command.

If that makes it "work", you will have proved that the problem is that Vault is unable to successfully connect to your Cassandra database, and you will have to debug your network setup and Cassandra to determine why that is the case.
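A minimal sketch of that check, assuming VAULT_ADDR and VAULT_TOKEN are already exported. CASSANDRA_HOSTS, CASSANDRA_USER and CASSANDRA_PASSWORD are hypothetical environment variables standing in for your own values (Vault itself does not read them), and write_config_unverified is a hypothetical helper name. It stores the config with verify_connection=false and then reads it back.

```shell
#!/usr/bin/env bash
# Sketch: store the Cassandra connection config WITHOUT Vault verifying the
# connection, then read the stored config back.
# The CASSANDRA_* variables and the helper name are hypothetical placeholders.
set -euo pipefail

VAULT_BIN="${VAULT_BIN:-vault}"

write_config_unverified() {
  local name="$1"
  "$VAULT_BIN" write "database/config/${name}" \
    plugin_name="cassandra-database-plugin" \
    hosts="${CASSANDRA_HOSTS:?export CASSANDRA_HOSTS}" \
    protocol_version=4 \
    username="${CASSANDRA_USER:?export CASSANDRA_USER}" \
    password="${CASSANDRA_PASSWORD:?export CASSANDRA_PASSWORD}" \
    allowed_roles="*" \
    verify_connection=false
  # Confirm the config was stored.
  "$VAULT_BIN" read "database/config/${name}"
}
```

If this write returns promptly while the same write without verify_connection=false times out, the time is being spent in the connection check against Cassandra, which is what the suggestion is meant to isolate.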

@Albert-W
Author

Albert-W commented May 10, 2023

The connection is definitely working, because I can read the creds:

Key                Value
---                -----
lease_id           database/creds/my-role/GeSBplzOxjuknJzvNo3J0srr
lease_duration     8m
lease_renewable    true
password           oWToydMwaZc1qD-9rrTn
username           v_root_my_role_8oxsqvg6ytjlknq1khrq_xxxxx

But the error persists when I update the configuration.

Adding verify_connection=false lets the configuration update go through, but then I get an error when reading creds from the role:

sh-4.2$ vault read database/creds/my-role
Error reading database/creds/my-role: context deadline exceeded

@maxb
Contributor

maxb commented May 10, 2023

OK, so, the connection was working using the configuration that had previously been set in Vault:

> the connection is definitely working, because I can read the creds.
>
> Key                Value
> ---                -----
> lease_id           database/creds/my-role/GeSBplzOxjuknJzvNo3J0srr
> lease_duration     8m
> lease_renewable    true
> password           oWToydMwaZc1qD-9rrTn
> username           v_root_my_role_8oxsqvg6ytjlknq1khrq_xxxxx
>
> but the error persists when I update the configuration.

But the new configuration you were trying to apply is in some way broken, such that Vault times out connecting to Cassandra when using it.

> Add "verify_connection=false" will improve the configuration, but I will get an error when read roles.

By adding verify_connection=false you were able to update the configuration...

> sh-4.2$ vault read database/creds/my-role
> Error reading database/creds/my-role: context deadline exceeded

... and so now, generating new credentials no longer works, because Vault times out connecting to Cassandra.

Everything you've shared is pointing at the configuration you're trying to set in Vault being incorrect. You need to investigate that. It doesn't look like an issue with Vault itself.

@heatherezell
Contributor

Thank you, @maxb - I agree completely. This may be a question better suited to our Discuss forum. @Albert-W, please consider closing this issue and posting it there. Thanks! :)

@Albert-W
Author

Albert-W commented May 10, 2023

Let me lay it out as a sequence of steps so it's easier to follow.

  1. I create a configuration:

       vault write database/config/yichang-cassandra-database \
           plugin_name="cassandra-database-plugin" \
           hosts="cassandra-us-east-1.cerberus.io" \
           port=9042 \
           protocol_version=4 \
           username="vault" \
           password="ioWN9mdYcDbh9y39EviEriX/JmO7rOAqxxxxxxxxxx" \
           allowed_roles=my-role \
           consistency="LOCAL_QUORUM" \
           connect_timeout="60s" \
           tls=true \
           pem_json="{\"ca_chain\":$ca_chain}"

  2. It fails with the error "Error writing data to database/config/yichang-cassandra-database: context deadline exceeded", but the configuration is created.
  3. I create a role.
  4. I can read creds from the role: vault read database/roles/my-role
  5. I update connect_timeout to "70s":

       vault write database/config/yichang-cassandra-database \
           plugin_name="cassandra-database-plugin" \
           hosts="cassandra-us-east-1.cerberus.io" \
           port=9042 \
           protocol_version=4 \
           username="vault" \
           password="ioWN9mdYcDbh9y39EviEriX/JmO7rOAqxxxxxxxxxx" \
           allowed_roles=my-role \
           consistency="LOCAL_QUORUM" \
           connect_timeout="70s" \
           tls=true \
           pem_json="{\"ca_chain\":$ca_chain}"

The same thing happens.

The point is that the configuration works, but the command returns an error.

@heatherezell
Contributor

The best guess I can make with this information is that the connection to Cassandra is working, but the return trip reporting that back to Vault is not. I would start with packet tracing and other network troubleshooting on the connections between your Vault server and your Cassandra instance.

@Albert-W
Author

The same code has been running for a while (more than a year).
With Vault 1.6.0 everything is fine; when I upgrade Vault to 1.12.2 or 1.13.0, it starts to fail.
I have tested this a couple of times.

@heatherezell
Contributor

Okay, thanks for that info. I'll check with some folks who are more experienced with Cassandra. :)

@maxb
Contributor

maxb commented May 10, 2023

There's something really odd going on here... If you're still seeing the error when writing the config with verify_connection=false, then that operation isn't connecting to Cassandra at all. It should be just a simple write to Vault storage.

With that extra information, my understanding of the problem completely changes... The problem is, it's now firmly into "this shouldn't be possible" territory.

The only things I can think of to suggest are general "something weird is happening" debugging options:

  • Turn the log level of Vault all the way up to trace, and see if anything useful is logged.
  • Send Vault a SIGUSR2 whilst the config write operation is hanging, before it times out, to get a dump of goroutine stacks written to the Vault logs - perhaps that way we can figure out what the long running operation is.
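The two debugging steps above might be scripted roughly as follows. This is a sketch under assumptions: VAULT_ADDR and VAULT_TOKEN are exported for the CLI, dump_goroutines is run on the Vault server host, and you supply the server's PID yourself (for example from pgrep -x vault). The function names and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the two debugging steps. Function names are hypothetical;
# VAULT_BIN can point at a specific vault binary.
set -euo pipefail

VAULT_BIN="${VAULT_BIN:-vault}"

# Step 1: stream the server's log output at trace level into a file in the
# background, and print the monitor's PID so it can be stopped later.
start_trace_monitor() {
  local logfile="${1:-vault-trace.log}"
  "$VAULT_BIN" monitor -log-level=trace >"$logfile" 2>&1 &
  echo "$!"
}

# Step 2: while the config write is still hanging (before it times out),
# send SIGUSR2 to the Vault *server* process so it writes its goroutine
# stacks to the Vault logs.
dump_goroutines() {
  local server_pid="$1"
  kill -USR2 "$server_pid"
}
```

In practice: start the monitor, run the hanging vault write database/config/... in another terminal, call dump_goroutines with the server PID before the write times out, then stop the monitor by running kill on the PID that start_trace_monitor printed.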

@Albert-W
Author

Hi @maxb, thanks. Setting verify_connection=false makes the write succeed.
Without that setting, I dumped the stacks and got these traces:

2023-05-11T08:08:07.383Z [DEBUG] secrets.database.database_0e42fb14: created database object: name=yichang-cassandra-database plugin_name=cassandra-database-plugin
2023-05-11T08:08:42.321Z [DEBUG] secrets.database.database_0e42fb14: got database plugin instance: type=cassandra
2023-05-11T08:09:00.055Z [DEBUG] core.cluster-listener: performing server cert lookup
2023-05-11T08:09:00.122Z [DEBUG] core.request-forward: got request forwarding connection

The creds generated with this configuration work.

@Albert-W
Author

When reading the creds, it first fails with "context deadline exceeded", but this can be fixed by running vault write -force /sys/leases/revoke-force/database/creds/my-role. Here are the full commands:

sh-4.2$ vault read database/creds/my-role
Error reading database/creds/my-role: context deadline exceeded
sh-4.2$ vault write -force /sys/leases/revoke-force/database/creds/my-role
Success! Data written to: sys/leases/revoke-force/database/creds/my-role
sh-4.2$ vault read database/creds/my-role
Key                Value
---                -----
lease_id           database/creds/my-role/TBYEhJ0Wm6O20jXbCxTksik8
lease_duration     8h
lease_renewable    true
password           PkZjRaHRylIB47-UCtWL
username           v_root_my_role_tocakq3ip9ris7kzlbbs_1683794198

@maxb
Contributor

maxb commented May 11, 2023

> Hi @maxb , thanks, set verify_connection=false will succeed.
> without the setting, I dump the stacks, I get the traces

This seems to contradict what you said earlier.

I am sorry, but due to too much conflicting information given, I no longer have any idea what the actual problem is, and don't expect to be able to help further.

@Albert-W
Author

Sorry for the confusion.
When I pasted the commands, I accidentally pasted the wrong one. Here is the updated version:
#20169 (comment)
