Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix KMS encryption context handling #435

Merged
merged 3 commits into from
Mar 21, 2019

Conversation

jpsrn
Copy link
Contributor

@jpsrn jpsrn commented Mar 6, 2019

It turns out that sops doesn't send correct KMS encryption context key-value pairs to KMS service during encryption and decryption operations. If an encryption context contains at least two key-value pairs and more than one unique value among the keys' values, sops will incorrectly call KMS API with an encryption context where all correct keys are present, but they all have the same value. This pull request fixes that and adds a regression test for KMS encryption context handling in keyservice/server.go, where the bug occurs.

Example of the bug

Suppose sops is asked to encrypt a new file with the following encryption context.

"first key": "first value"
"second key": "second value"
"third key": "third value"

In that case, sops sends to KMS an encryption context where every key has exactly the same value, which is chosen randomly from the values of the actually requested encryption context above. This random choice happens on every invocation of sops KMS encrypt or decrypt command and can lead to sops attempting to encrypt and decrypt with different encryption contexts between successive invocations!

"first key": "second value"
"second key": "second value"
"third key": "second value"

Consequences

Due to the nature of the bug, it looks like the data key in every KMS encryption context using sops file has been incorrectly encrypted, provided that the following conditions are fulfilled.

  • KMS encryption context is used
  • At lest two key-value pairs are provided in the encryption context
  • The encryption context's key-value pairs have more than one unique value

As a result, this pull request's fix will break decryption of any such KMS-backed data keys!

Workaround suggestion

This pull request only includes the immediate bug fix, but it would be worthwhile for the sops team to consider implementing a compatibility feature that can detect the presence of KMS encryption context configuration that is impacted by this bug. Sops could then prompt the user if sops should be allowed to perform multiple KMS decrypt requests to find out which unique encryption context value was actually used for all context keys when the data key was previously successfully encrypted while the bug was present. It should be possible to detect such KMS encryption context configurations, because the bug only impacts what is sent to the KMS API, meaning that the serialised sops file should still have to the correct encryption context configuration.

@codecov-io
Copy link

codecov-io commented Mar 6, 2019

Codecov Report

Merging #435 into master will decrease coverage by 6.51%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #435      +/-   ##
==========================================
- Coverage   42.94%   36.42%   -6.52%     
==========================================
  Files          16       20       +4     
  Lines        2254     2715     +461     
==========================================
+ Hits          968      989      +21     
- Misses       1197     1637     +440     
  Partials       89       89
Impacted Files Coverage Δ
keyservice/server.go 6.28% <83.33%> (ø)
keyservice/client.go 0% <0%> (ø)
keyservice/keyservice.pb.go 4.29% <0%> (ø)
keyservice/keyservice.go 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c3c3b7c...0d4af3f. Read the comment docs.

@ajvb
Copy link
Contributor

ajvb commented Mar 8, 2019

Thank you for the PR and the detailed write-up, and great find. I will do a quick CR here shortly, but also definitely want @autrilla to take a look when he time.

As well, as you stated, we will need to have a migration path for users who are affected by this bug. I'm going to get started on that here shortly. We won't want to merge this until that is ready to be merged as well.

Role: svcKey.Role,
EncryptionContext: ctx,
AwsProfile: svcKey.AwsProfile,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run go fmt, will remove this semi-colon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad! I'll fix the formatting with go fmt later today.

Copy link
Contributor

@autrilla autrilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the very comprehensive description of the bug and especially for submitting a patch to fix it.

I fully agree with @ajvb that we want the upgrade path to be part of the codebase before we merge this.

My main concern with this patch is it increases the code complexity a bit just to make it testable. Since we already mock the AWS KMS interface (e.g. in https://github.com/mozilla/sops/blob/c968ce2e3b13fac738d1d0c144dca3c9c7cfacb8/kms/keysource_test.go#L16), I think we could check that that interface is called with the expected parameters, instead of adding an additional layer of abstraction. We could still have tests for this bug, but reduce the impact on code complexity. What do you think?

@@ -31,19 +52,19 @@ func (ks *Server) encryptWithPgp(key *PgpKey, plaintext []byte) ([]byte, error)
func (ks *Server) encryptWithKms(key *KmsKey, plaintext []byte) ([]byte, error) {
ctx := make(map[string]*string)
for k, v := range key.Context {
ctx[k] = &v
value := v // Allocate a new string to prevent the pointer below from referring to only the last iteration value
ctx[k] = &value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do ctx[k] = &key.Context[k] too to avoid this as well, but what you did works just fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if there's any linters that catch or warn against this, since I know for a fact this has triggered a bug at least one other time in sops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make sense to try to use &key.Context[k] to get pointers to the map values, but that doesn't work, because apparently Go map values are not addressable. So the simplest fix I could find was to just assign the value to a new variable causing a new string to be allocated.

Testing that the encryption context is correctly passed all the way to the KMS SDK would be a good idea and I was initially considering that approach for the test cases. I didn't go with that approach, because I was a little reluctant exposing the kmsSvc kmsiface.KMSAPI global variable to the keyservice package, which I thought to be the best location for these tests. If it's okay to reuse that global variable to also enable KMS API mocking in keyservice package, I don't see why I couldn't change the implementation to use that approach and remove the provider indirection I introduced.

Finally, using a linter to catch these types of bugs would be awesome. I'm not knowledgeable enough about the Go ecosystem to suggest a linter that would detect this problem. I did do some quick googling, but all I was able to find was this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now pushed changes that use the already existing KMS mock. These changes have the aforementioned downside that they expose kmsiface.KMSAPI variable in kms package in order to make it accessible from keyservice package. It would be simpler to have this new encryption context test located within kms package to allow it to access that kmsiface.KMSAPI mock without publicly exporting the variable, but this would lead to a circular dependency between the packages (kms -> keyservice -> kms).

In my opinion, it would be preferable to consider getting rid of the use of a global variable for mock injection purposes and instead refactor the kms package to allow tests to provide the kmsiface.KMSAPI implementation directly to an instance of kms.MasterKey. This would remove the need to make the variable public for mocking purposes. Additionally, I'm wondering if the global variable might cause any kind of concurrency issues between tests. I'm not that well-versed in Go to know how concurrency might influence these testing scenarios, but I can see that if two tests were running concurrently and attempted to modify the global kmsiface.KMSAPI, there would be trouble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that is somewhat ugly too. I think we're taking the wrong approach to testing this. This bug was caused by the conversion from a keyservice.KmsKey to a kms.MasterKey being buggy. We should split that out into a function and test it. I feel like the kind of testing you're doing right now is better suited for our "functional tests" (end-to-end tests). Unfortunately we don't have the infrastructure right now to use KMS keys during tests, but I feel like setting that up and running integration tests would be the way forward. Testing Go code across packages can get very messy, and mocking in Go is not great. I'm not really sure the tests we currently have in the kms package (before this PR) are actually very useful. It'd be much better to have a real end-to-end test instead of using a mock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the global variable public is indeed ugly. On the other hand, your suggestion of extracting a helper method is a sound choice. I considered that initially, but opted to go with a mock-based test, because by using a mock I could check that the actual call path would provide the wanted encryption context. In contrast calling the helper method directly in a test will not guarantee that the actual call path will actually use that helper function, even if the test case shows the helper function to be working fine. That said, I'm fine with your suggestion, especially if a more comprehensive functional test is added in the future. I pushed new code changes that extract a helper function and test that function directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these changes again. It looks good to me now. If my information is still correct, @ajvb will work on setting up integration tests and an upgrade path for this PR, at which point, it can be merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem! I rebased the changes, dropping irrelevant commits.

The code copying encryption context value strings to a map
containing string pointers was incorrectly getting a pointer to a
string variable which is being re-used by the for loop, causing
all keys to point to the same value string.
@ajvb
Copy link
Contributor

ajvb commented Mar 19, 2019

@jpsrn Can you change this PR to be merged into dev/fix-aws-kms-enc-ctx instead of master? We are going to use this as a staging branch for this PR, #444, and an soon-to-be-created PR with the migration code (edit: now created - #445).

@jpsrn jpsrn changed the base branch from master to dev/fix-aws-kms-enc-ctx March 20, 2019 07:39
@jpsrn
Copy link
Contributor Author

jpsrn commented Mar 20, 2019

@jpsrn Can you change this PR to be merged into dev/fix-aws-kms-enc-ctx instead of master? We are going to use this as a staging branch for this PR, #444, and an soon-to-be-created PR with the migration code (edit: now created - #445).

Done

@ajvb ajvb requested a review from autrilla March 21, 2019 16:51
@ajvb ajvb merged commit f2e48b1 into getsops:dev/fix-aws-kms-enc-ctx Mar 21, 2019
@ajvb
Copy link
Contributor

ajvb commented Mar 21, 2019

@jpsrn Again, thank you so much for your extremely detailed analysis and contributions on this issue.

If you have the capacity, would love your eyes on #448 which is a PR into master including this PR, the functional tests, and the autofix path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants