HA vault cluster failing to communicate with MySQL db endpoint from another cluster #8458

Open
pkoritala opened this issue Mar 3, 2020 · 7 comments
Labels: bug (used to indicate a potential bug), core/ha (specific to high-availability)

Comments

@pkoritala

pkoritala commented Mar 3, 2020

Describe the bug
I'm using the HashiCorp Vault Helm chart (image 1.3.2) to install Vault in HA mode on a Kubernetes cluster (v1.13.10) with Helm (v3.0.3). I created a new Amazon RDS Aurora database, and the first installation with the Vault chart worked fine.
But when I install from a different Kubernetes cluster against the same Amazon RDS Aurora endpoint with a different table name:
vault operator init - successful
vault login - failed with the following error. The failure is consistent.

Error authenticating: error looking up token: Error making API request.

URL: GET https://127.0.0.1:8200/v1/auth/token/lookup-self
Code: 500. Errors:

* sql: no rows in result set

I also created a new database and a new table on the same db endpoint, and I see the same issue.
To debug, I logged into the MySQL database.
The _lock table has no records.

To Reproduce
Steps to reproduce the behavior:

  1. Download hashicorp vault chart

  2. Package the chart.

  3. Create values.yaml and configure as per the below content.

server:
  image:
    repository: "vault"
    tag: "1.3.2"
    pullPolicy: IfNotPresent

  ha:
    enabled: true
    replicas: 3
    config: |
      ui = true
      listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-server-tls/tls.crt"
          tls_key_file  = "/vault/userconfig/vault-server-tls/tls.key"
          tls_client_ca_file = "/vault/userconfig/vault-server-tls/ca.crt"
        }
 
      storage "mysql" {
          ha_enabled = "true"
          address = "{{ .Values.mysql.address }}"
          username = "{{ .Values.mysql.username }}"
          password = "{{ .Values.mysql.password }}"
          database = "{{ .Values.mysql.database }}"
          table = "{{ .Values.mysql.table }}"
      }
 
      seal "awskms" {
        region     = "{{ .Values.awskms.region }}"
        kms_key_id = "{{ .Values.awskms.kms_key_id }}"
      }
  4. Install the chart, overriding the defaults with values.yaml:
    helm install vault <package_name> --values values.yaml

For the 1st cluster with the db endpoint:

  1. Run vault operator init
  2. Run vault login <roottoken> (works fine)

For the 2nd cluster with the same db endpoint and a different table name:

  1. Run vault operator init
  2. Run vault login <roottoken>
  3. See error

After this, whenever I repeat the Vault deployment in either the first or the second cluster, I always see the error.

Expected behavior
Should be able to perform vault operations using existing db endpoint with a different table name.

Environment:

  • Vault Server Version (retrieve with vault status): Vault v1.3.2
  • Vault CLI Version (retrieve with vault version): Vault v1.3.2
  • Server Operating System/Architecture: centos 7 / kubernetes cluster (v1.13.10)

CC: @michaeljs1990, I saw your commits on the MySQL Go files. Could you please help me here? Is my HA configuration right? Any recommendations?

@catsby added the bug and core/ha labels Mar 4, 2020
@catsby
Member

catsby commented Mar 4, 2020

Hello - what method are you using to login? Are you using the root token? My mistake, I see that in your steps you're using the root token(s).

@catsby
Member

catsby commented Mar 4, 2020

Can you confirm that the tables each Vault cluster is using are different, and they are not using the same lock table? Can you repeat this error if you explicitly set the lock table to something unique per Vault cluster?
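
A minimal sketch of what "unique per Vault cluster" could look like in practice, written as a shell helper that prints a storage stanza. The function name and the vault_<cluster> naming scheme are hypothetical and not taken from this thread; lock_table is the MySQL storage parameter that names the HA lock table, and the connection settings (address, username, password, database) are left out on purpose.

```shell
# Hypothetical helper: print a MySQL storage stanza whose data table and lock
# table are both derived from a per-cluster name, so that two Vault clusters
# pointed at the same database endpoint never share either table.
# Connection settings are intentionally omitted from the output.
render_mysql_storage() {
  cluster="$1"   # e.g. "cluster_a"; the naming scheme is an assumption
  cat <<EOF
storage "mysql" {
  ha_enabled = "true"
  table      = "vault_${cluster}"
  lock_table = "vault_${cluster}_lock"
}
EOF
}
```

For example, render_mysql_storage cluster_a and render_mysql_storage cluster_b would produce stanzas with disjoint table and lock_table values, which is the condition being asked about.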

@pkoritala
Author

pkoritala commented Mar 4, 2020

Yes, I tried a different table and also a different database on the same db endpoint. I can repeat the error.

Even with a new db endpoint, everything worked fine for the first cluster that used it as storage. When the same db endpoint was used for a different cluster, the error happened.

@pkoritala
Author

pkoritala commented Mar 4, 2020

Can you confirm that the tables each Vault cluster is using are different, and they are not using the same lock table? Can you repeat this error if you explicitly set the lock table to something unique per Vault cluster?

@catsby, yes, I explicitly set the lock table. I can still reproduce the error.

@pkoritala
Author

@catsby,
I found a workaround: if we perform the steps below, we no longer see the error.

  1. vault operator init
  2. vault operator unseal
  3. vault operator unseal
  4. vault operator unseal

After that, I can see the leader record in the lock table.
I can also log in:
  5. vault login <root_token>

But this is still a bug: on the first installation, Vault should auto-unseal, and leader election should happen automatically and update the leader record in the lock table.
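
The manual unseal sequence above can be sketched as a small shell helper. This is only an illustration: the function name is hypothetical, it assumes VAULT_ADDR is exported and the vault CLI is on PATH, and it assumes the unseal keys come from the vault operator init output. Because an individual unseal call may itself return an error, the sketch ignores those exit codes and relies on the exit code of vault status instead (0 when unsealed, 2 when sealed, 1 on error).

```shell
# Hypothetical helper (not from this thread): submit unseal keys in order,
# then report the resulting seal state.
# Assumes VAULT_ADDR is exported and the `vault` CLI is on PATH.
unseal_and_check() {
  for key in "$@"; do
    # Same command as the unseal steps above; keep going even if a call errors.
    vault operator unseal "$key" >/dev/null 2>&1 || true
  done

  # `vault status` exits 0 when unsealed, 2 when sealed, and 1 on error.
  rc=0
  vault status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}
```

Usage would look like unseal_and_check <key1> <key2> <key3>, followed by vault login <root_token> only if the helper prints "unsealed".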

@pandeybk

pandeybk commented Mar 4, 2020

Looks like you are running into the following problem.

Check step 6 (there is a glitch)

Error checking leader status: Error making API request.

URL: GET http://127.0.0.1:8200/v1/sys/leader
Code: 500. Errors:

* sql: no rows in result set

Steps to reproduce:

  1. Setup local MySQL server
  2. Create config file config.hcl
disable_mlock = true    

api_addr = "http://127.0.0.1:8201" 
listener "tcp" {
  tls_disable = 1

} 

storage "mysql" {
  ha_enabled = "true"
  address = "localhost"
  username = "root"
  password = "root"
  database = "vault"
  table = "tb4"
}
  3. Start vault server: vault server -config=./config.hcl

  4. Initialize vault: vault operator init

$ export VAULT_ADDR="http://127.0.0.1:8200"
$ vault operator init
Unseal Key 1: /4JTXLRoH879wvBAJ9Np0kXwXrY9y/zF+r00Uuc1xGPd
Unseal Key 2: htArtbmK3Ojnjwo7PW1mbQGFq1d5U9F+g/dFfh7A7D9S
Unseal Key 3: C4IeYiJJC+3CERC+h7tBd26veCbo5GRG2iNhzPH1PmfZ
Unseal Key 4: LHpQiE4Rb8QQDbx2q4ZbSjIolnf4jXINHCG3ZURJPpch
Unseal Key 5: +DVTBiyyuAOUNPoiuxJuhI8F1JP0089TMkVxgEt6bf7l

Initial Root Token: s.l6Y05rygjbrqVxChhtgxBxcg

Vault initialized with 5 key shares and a key threshold of 3. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 3 of these keys to unseal it
before it can start servicing requests.

Vault does not store the generated master key. Without at least 3 key to
reconstruct the master key, Vault will remain permanently sealed!

It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See "vault operator rekey" for more information.
  5. Connect to MySQL and check the lock table. There are 0 records.
mysql> select * from tb4_lock;
Empty set (0.00 sec)
  6. Now start the vault unsealing process
$ vault operator unseal /4JTXLRoH879wvBAJ9Np0kXwXrY9y/zF+r00Uuc1xGPd
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       5
Threshold          3
Unseal Progress    1/3
Unseal Nonce       354044bc-6cc0-267c-4ab0-d7dcb969a077
Version            1.4.0-beta1
HA Enabled         true

$ vault operator unseal htArtbmK3Ojnjwo7PW1mbQGFq1d5U9F+g/dFfh7A7D9S
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       5
Threshold          3
Unseal Progress    2/3
Unseal Nonce       354044bc-6cc0-267c-4ab0-d7dcb969a077
Version            1.4.0-beta1
HA Enabled         true

$ vault operator unseal C4IeYiJJC+3CERC+h7tBd26veCbo5GRG2iNhzPH1PmfZ
Error checking leader status: Error making API request.

URL: GET http://127.0.0.1:8200/v1/sys/leader
Code: 500. Errors:

* sql: no rows in result set
  7. Now check the MySQL lock table again.
mysql> select * from tb4_lock;
+----------+--------------------------------------+
| node_job | current_leader                       |
+----------+--------------------------------------+
| leader   | bf9172f6-60f0-26c3-1d22-64899cbb9fe3 |
+----------+--------------------------------------+
1 row in set (0.00 sec)
  8. Do any other vault operation
$ vault status
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    5
Threshold       3
Version         1.4.0-beta1
Cluster Name    vault-cluster-3e9bc053
Cluster ID      74b9b05b-bbf0-8df1-c7fd-c701c1aca00b
HA Enabled      true
HA Cluster      https://127.0.0.1:8202
HA Mode         active

$ vault login s.l6Y05rygjbrqVxChhtgxBxcg
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.

Key                  Value
---                  -----
token                s.l6Y05rygjbrqVxChhtgxBxcg
token_accessor       fZ7q26LKQEqul6FMmviO9Bp3
token_duration       ∞
token_renewable      false
token_policies       ["root"]
identity_policies    []
policies             ["root"]

Everything seems fine after that. My guess is that, since you are using the following configuration, Vault is supposed to auto-unseal without any vault operator unseal call, and in that path it is possibly unable to add the leader record to the MySQL lock table. Because of that, all of your operations are failing.

seal "awskms" {
  region     = "{{ .Values.awskms.region }}"
  kms_key_id = "{{ .Values.awskms.kms_key_id }}"
}
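
To test that guess on an auto-unsealed cluster, the lock-table check from the transcript can be scripted. A minimal sketch, assuming the mysql client is installed and picks up its credentials from somewhere like ~/.my.cnf; the function name is hypothetical, while the node_job column and the 'leader' value are taken from the query output shown earlier.

```shell
# Hypothetical helper: count the leader rows in a Vault MySQL lock table.
# Assumes the `mysql` client can authenticate on its own (e.g. via ~/.my.cnf).
leader_row_count() {
  db="$1"          # database name, e.g. "vault"
  lock_table="$2"  # lock table name, e.g. "tb4_lock"
  # -N drops the column header, -B gives plain batch output, -e runs the query.
  mysql -N -B -e "SELECT COUNT(*) FROM \`${lock_table}\` WHERE node_job = 'leader'" "$db"
}
```

With the setup above, leader_row_count vault tb4_lock should print 0 for the empty lock table seen before unsealing and 1 once the leader row has been written.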

@pkoritala
Author

@pandeybk @catsby

I started testing with a different cluster now. The error happens during the unseal operation itself, right after vault operator init.
Leader election does not happen.

/ $ vault operator unseal <unseal_key>
Error checking leader status: Error making API request.

URL: GET https://127.0.0.1:8200/v1/sys/leader
Code: 500. Errors:

* sql: no rows in result set

db lock table:

select * from k8s164b2dev1400_lock ;
Empty set (0.00 sec)

FYI, I am now unable to deploy Vault in the previously working cluster as well.
