Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connect: ensure all vault connect CA tests use limited privilege tokens #15669

Merged
merged 2 commits into from Dec 6, 2022

Conversation

rboyer
Copy link
Member

@rboyer rboyer commented Dec 5, 2022

Description

All of the current integration tests where Vault is the Connect CA now use non-root tokens for the test. This helps us detect privilege changes in the vault model so we can keep our guides up to date.

One larger change was that the RenewIntermediate function got refactored slightly so it could be used from a test, rather than the large duplicated function we were testing in a test which seemed error prone.

@rboyer rboyer added theme/consul-vault Relating to Consul & Vault interactions backport/1.12 Changes are backported to 1.12 backport/1.13 Changes are backported to 1.13 backport/1.14 Changes are backported to 1.14 labels Dec 5, 2022
@rboyer rboyer requested a review from kisunji December 5, 2022 18:09
@rboyer rboyer self-assigned this Dec 5, 2022
@github-actions github-actions bot added the theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies label Dec 5, 2022
SkipIfVaultNotPresent(t)

provider, testVault := testVaultProviderWithConfig(t, false, nil)
defer testVault.Stop()
t.Parallel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these safe to run parallel now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They always were I believe. Each test gets its own vault process.

I added this b/c it shaved the runtime down by a factor of ~5. Locally it was ~28s -> ~6s.

tests := CASigningKeyTypeCases()

for _, tc := range tests {
tc := tc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually need this now that we're running t.Parallel(), right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the run(t, tc) thing here instead.

agent/consul/leader_connect_ca.go Outdated Show resolved Hide resolved
for _, tc := range KeyTestCases {
tc := tc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be preserved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did the run(t, tc) approach for the rest of the test table tests so it shouldn't be necessary.

}
}

func CreateVaultTokenWithAttrs(t testing.T, client *vaultapi.Client, attr *VaultTokenAttributes) string {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test helper for use outside of the package too.

@@ -87,7 +87,7 @@ func (s *Server) checkBindingRuleUUID(id string) (bool, error) {
}

func (s *Server) InPrimaryDatacenter() bool {
return s.config.PrimaryDatacenter == "" || s.config.Datacenter == s.config.PrimaryDatacenter
return s.config.InPrimaryDatacenter()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convenience for a test in leader_connect_ca_test.go

@@ -415,6 +415,10 @@ type Config struct {
*EnterpriseConfig
}

func (c *Config) InPrimaryDatacenter() bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convenience for a test in leader_connect_ca_test.go

@@ -778,37 +826,62 @@ func TestCAManager_Initialize_Vault_WithExternalTrustedCA(t *testing.T) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record TestCAManager_Initialize_Vault_WithExternalTrustedCA was a test I invested a lot in because it surfaced a bunch of strange things such as #15661

rawConfig: map[string]interface{}{},
name: "DefaultConfig",
rawConfig: map[string]any{
"RootPKIPath": "pki-root/",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid accidentally triggering two providers to share some vault mount paths in tests, I changed it so that test authors need to explicitly set the paths all the time.

@@ -1118,23 +1326,8 @@ func getIntermediateCertTTL(t *testing.T, caConf *structs.CAConfiguration) time.
return dur
}

func testVaultProviderWithConfig(t *testing.T, isPrimary bool, rawConf map[string]interface{}) (*VaultProvider, *TestVaultServer) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I inlined testVaultProviderWithConfig so that we can create the token in between the two steps.

RootPath: "pki-root",
IntermediatePath: "pki-intermediate",
ConsulManaged: true,
WithSudo: withSudo,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validates the continued need for the sudo capability now.

assertCorrectKeyType(t, tc.CSRKeyType, intPEM)

if expectFailure {
testCrossSignProvidersShouldFail(t, provider1, provider2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we repeat the test without sudo we expect it to fail.

@@ -224,6 +215,9 @@ func (v *TestVaultServer) Stop() error {
// wait for the process to exit to be sure that the data dir can be
// deleted on all platforms.
if err := v.cmd.Wait(); err != nil {
if strings.Contains(err.Error(), "exec: Wait was already called") {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases when I was repeatably executing the tests it would hit this double-wait issue which crashed the test and flaked it. Now those get ignored to make double-Stop ok.


case a.VaultManaged:
// Vault-managed PKI root.
t.Fatal("TODO: implement this and use it in tests")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stub is here as a reminder to add more tests for the other policy in our docs.

@@ -1069,7 +1068,7 @@ func (c *CAManager) secondaryRequestNewSigningCert(provider ca.Provider, newActi
}

if err := setLeafSigningCert(newActiveRoot, intermediatePEM); err != nil {
return err
return fmt.Errorf("Failed to set the leaf signing cert to the intermediate: %w", err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapped the error here to help figure out which line caused the error (all other returns from this function wrap the errors).

@rboyer rboyer marked this pull request as ready for review December 5, 2022 20:28
@rboyer rboyer requested a review from skpratt December 5, 2022 20:29
Base automatically changed from fix-vault-ca-secondary-renewal to main December 5, 2022 21:39
@vercel
Copy link

vercel bot commented Dec 5, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

2 Ignored Deployments
Name Status Preview Comments Updated
consul ⬜️ Ignored (Inspect) Dec 6, 2022 at 3:34PM (UTC)
consul-ui-staging ⬜️ Ignored (Inspect) Dec 6, 2022 at 3:34PM (UTC)

@rboyer rboyer merged commit 900584c into main Dec 6, 2022
@rboyer rboyer deleted the vault-tests-use-tokens branch December 6, 2022 16:06
rboyer added a commit that referenced this pull request Dec 6, 2022
…flakes (#15691)

It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in #15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.
rboyer added a commit that referenced this pull request Dec 6, 2022
…flakes (#15691)

It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in #15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.
rboyer added a commit that referenced this pull request Dec 6, 2022
…flakes (#15691)

It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in #15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.
rboyer added a commit that referenced this pull request Dec 6, 2022
…flakes (#15691)

It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in #15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.
rboyer added a commit that referenced this pull request Dec 6, 2022
…flakes (#15691) (#15694)

It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in #15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
rboyer added a commit that referenced this pull request Dec 6, 2022
…flakes (#15691) (#15693)

It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in #15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
rboyer added a commit that referenced this pull request Dec 6, 2022
…flakes (#15691) (#15692)

It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in #15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.12 Changes are backported to 1.12 backport/1.13 Changes are backported to 1.13 backport/1.14 Changes are backported to 1.14 theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies theme/consul-vault Relating to Consul & Vault interactions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants