New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-24546 Support for Hashicorp vaults and kubernetes secrets #14122
Conversation
https://track.hpccsystems.com/browse/HPCC-24546 |
@richardkchapman @ghalliday please review. The helm/examples/secrets/README.md file is meant to be a very easy to use walk through of setting up and using vault and kubernetes secrets.. please include usability comments in your review. I've kept the secret and vault functions in jutil.cpp for now, but can move them once we've agreed on the approach. I've added open source cpp-http code for my REST calls. The nice thing is it's all in one header file. But when I move the code I could use our CHttpClient code. I didn't want to link esphttp to jlib. And this library is a bit more low level. |
Another option for REST calls could be boost.. I didn't want to deal with large dependencies though. |
b7ccf82
to
718e763
Compare
Removed tests that were only meant for my environment, and updated what seems to be arbitrary cpp-httplib openssl version deprecation. |
I am sure the schedule2 failure isn't related to this PR. |
718e763
to
0039b49
Compare
Updated caching logic to check all viable caching locations before going disk or vaults. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afishbeck the changes look good to me. A few comments, but none of them very serious.
system/jlib/jutil.cpp
Outdated
if (created && (created < timeoutThreshold)) | ||
{ | ||
tree->removeTree(envelope); | ||
puts("\nremoved from vault cache\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comments about tracing with puts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to scrub these personal debug statements out.
system/jlib/jutil.cpp
Outdated
return false; | ||
} | ||
}; | ||
class CVaultSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: newline between classes.
system/jlib/jutil.cpp
Outdated
//MORE: cache the secret for up to secretTimeoutMs | ||
bool getCachedSecret(CVaultKind &kind, StringBuffer &content, const char *secret, const char *version) | ||
{ | ||
std::map<std::string, std::unique_ptr<CVault>>::iterator it = vaults.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly simpler to use "auto" instead of the explicit type (and elsewhere)
system/jlib/jutil.cpp
Outdated
@@ -3056,19 +3175,481 @@ extern jlib_decl void setSecretMount(const char * path) | |||
secretDirectory.set(path); | |||
} | |||
|
|||
extern jlib_decl StringBuffer & getSecret(StringBuffer & result, const char * name, const char * key) | |||
enum class CVaultKind { kv_v1, kv_v2 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth splitting this code into a jsecret.hpp/cpp? (I don't have a strong opinion either way.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'll split it out. Wanted an initial sanity check on the approach first.
system/jlib/jutil.cpp
Outdated
static StringBuffer secretDirectory; | ||
static CriticalSection secretCS; | ||
static unsigned secretTimeoutMs = UINT_MAX; | ||
static unsigned secretTimeoutMs = 60 * 60 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this typical for k8s? (I never researched what the standard default was.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if there's some consensus somewhere.
helm/examples/secrets/README.md
Outdated
Setup vault auth policy granting access to the ecl secrets locations we plan to use: | ||
|
||
```bash | ||
vault policy write hpcc-kv-ro hpcc_vault_policies.hcl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users will need to give the absolute path to hpcc_vault_policies.hcl or be located in the directory.
vault policy write hpcc-kv-ro HPCC-Platform/helm/examples/secrets/hpcc_vault_policies.hcl
helm/examples/secrets/README.md
Outdated
Install the HPCC helm chart with the secrets just defined added to all components that run ECL. | ||
|
||
```bash | ||
helm install myhpcc ../../hpcc/ --set global.image.version=latest -f val |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolute path:
helm install myhpcc HPCC-Platform/helm/hpcc/ --set global.image.version=latest -f HPCC-Platform/helm/examples/secrets/values-secrets.yaml
0039b49
to
ab0ead0
Compare
Updated to support binary secrets, moved secret code to jsecrets.cpp, cleaned up based on review comments. helm install myhpcc hpcc/ --set global.image.root=afishbeck --set global.image.version=latest -f examples/secrets/values-secrets.yaml |
@AttilaVamos not sure what "Conflicting files, should skip build and test" means. I'll try rebasing. |
ab0ead0
to
51e8855
Compare
rebased |
Also changed the vault yaml format a bit. |
51e8855
to
a4d5724
Compare
rebased again to resolve conflict with my other PR which was just merged. |
@AttilaVamos I doubt those roxie timeouts had much to do with this change, can we re-run the test? |
@afishbeck The schedule2 is definitely not related to this PR. The workflow test is new, I should check it. However I think it isn't related, as well based on that test has been successful on in the another Smoketest environment (Host: ip-10-20-0-221.ca-central-1.compute.internal). |
Added HPCC-24707 to cover the last issue - not likely to be related to tony's change, but worth investigating. |
@ghalliday - to make sure I review it next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afishbeck this looks like it is very close to being ready. A few questions/comments.
common/thorhelper/thorsoapcall.cpp
Outdated
@@ -801,10 +805,13 @@ class CWSCHelper : implements IWSCHelper, public CInterface | |||
unsigned done; | |||
Owned<IPropertyTree> xpathHints; | |||
Linked<ClientCertificate> clientCert; | |||
bool customClientCert = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
packing: I don't think it matters in this case, so I probably wouldn't change since there are not likely to be large numbers of the objects, but if this boolean was moved to follow timeLimitExceeded the object would be 8 bytes smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if I need to make any other changes I will include this change as well.
"ecl": { | ||
"$ref": "#/definitions/vaultCategory" | ||
}, | ||
"ecl-user": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: How does ecl-user differ from ecl? Are there some components that will have ecl available, but no ecl-user? I assume it is to avoid ecl code from accessing implicit secrets. Is there corresponding protection for k8s secrets?
The secret categories were not previously required by the caller, but I think they are now. Is that because you need to have potentially different vaults/security for the different categories to prevent them leaking when they shouldn't. I think it generally makes sense, but want to be sure I understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider publishing the k8s secrets in subdirectories corresponding to the category - I think that would avoid the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The categories allow the various types of secrets to be organized and secured differently. To prevent leaking and usage for unintended purposes. I haven't made use of "ecl-user" yet, but that is going to be the least secure category, ecl code will have direct access to the contents of the secret. For example, if I need a key for use with some plugin or to pass in some non standard way to a 3rd party. Because the ecl can see the contents it could return it, write it, or send it anywhere. I don't want gateway or storage plane credentials to be accessible in that way.
When it comes to the organization of the actual vault it could be a simple as:
http://${env.VAULT_SERVICE_HOST}:${env.VAULT_SERVICE_PORT}/v1/secret/data/storage/${secret}
http://${env.VAULT_SERVICE_HOST}:${env.VAULT_SERVICE_PORT}/v1/secret/data/ecl/${secret}
http://${env.VAULT_SERVICE_HOST}:${env.VAULT_SERVICE_PORT}/v1/secret/data/ecl-user/${secret}
But those different sections can have different access rights, and allow the secrets to be organized based on how and where they are used.
Over time we may want to segment the vault access by components. So certain roxies can call some gateways but hthor can't for example. But I'd rather get a feel for how these things are used in the wild a bit first. Our users can give us some feedback on how they need to further restrict.
I definitely agree on the idea of the k8s secret subdirectories. That might mean I no longer have to mangle the names of the HTTP-CONNECT secrets. That was for example to prevent them being used in the way the ecl-user category will be in the future.
system/jlib/jsecrets.cpp
Outdated
|
||
static void splitUrlAuthority(const char *authority, size_t authorityLen, StringBuffer &user, StringBuffer &password, StringBuffer &host, StringBuffer *port) | ||
{ | ||
if (isEmptyString(authority)||authorityLen==0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it should be calling isEmptyString - very unlikely change of accessing invalid memory. Main objection is it suggests it is a null terminated string.
system/jlib/jsecrets.cpp
Outdated
|
||
static inline void extractUrlProtocol(const char *&url, StringBuffer *scheme) | ||
{ | ||
if(!url || strlen(url) <= 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial - I think the strlen() test is redundant.
if (token.length()) | ||
return; | ||
StringBuffer login_token; | ||
login_token.loadFile("/var/run/secrets/kubernetes.io/serviceaccount/token"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this require the pods to have any access rights? Some recent PRs of Richard's have tied down both the access rights and the network access. Does that introduce problems here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on a quick comment from @richardkchapman I think the line in the README where I do:
vault write auth/kubernetes/role/hpcc-vault-access
bound_service_account_names=hpcc-default
bound_service_account_namespaces=default
policies=hpcc-kv-ro
ttl=24h
may be enough to grant access to our default account and can be changed to reflect whatever the user may change the account to.
It's more of an operational thing on the vault side.
I also support passing in a direct access token as a k8s secret. Hopefully that will cover some other authentication schemes the user may set up at least until we determine what other authentication modes we want to support.
Vault supports many authentication schemes and we can add more over time based on what scenarios users need, or we want to support in general.
@ghalliday updated
|
@afishbeck I like the changes - please can you squash and I will scan it one more time before merging. |
Signed-off-by: Anthony Fishbeck <anthony.fishbeck@lexisnexisrisk.com>
7689bf4
to
6b9063a
Compare
@ghalliday squashed changes |
Automated Smoketest: ✅ Unit tests result:
Regression test result:
HPCC Stop: OK
|
Automated Smoketest: ✅ Unit tests result:
Regression test result:
HPCC Stop: OK
|
Signed-off-by: Anthony Fishbeck anthony.fishbeck@lexisnexisrisk.com
Type of change:
Checklist:
Smoketest:
Testing: