Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Secret-store implementation #11232

Merged
merged 36 commits into from
Dec 8, 2022
Merged

Conversation

srebhan
Copy link
Contributor

@srebhan srebhan commented Jun 1, 2022

resolves #3124
replaces: #9775

This PR implements a secret-store for storing sensitive data such as usernames, passwords, tokens, etc. Those secrets can then be referenced in the Telegraf config with a @{store_id:secret key} pattern. This effectively allows to remove secrets from config file(s) and store them in a "safe" place.

The secret-store can be configured using [[secretstores.xyz]] sections where xyz references the secret-store plugin to use. Currently, implementations to use OS native stores (kernel keyrings for linux, Windows credential manager and keychain for MacOS) and Javascript Object Signing and Encryption files are available. Further implementation may be added later.

The stored secrets can be easily referenced via an @{<secretstore id>:<secret identifier>} pattern, with <secretstore id> being the unique identified you give to the secret-store and secret identifier being the unique key for the secret in that store. Multiple secret-stores can be defined and referenced by the id specified in the secret-store config.
To enable secret references in a config option you need to change the type to config.Secret, which will also make static secrets stored encrypted in memory. This has be done in this PR for inputs.http and inputs.sql as examples.

Secrets can be either managed using the tools native to the backend (e.g. using keyctl) or using Telegraf. Commands for getting/setting and listing secrets have been added (e.g. ./telegraf secret set <secretstore name> sqlthing "incredible"). When using Telegraf, the config is parsed and must contain an uncommented [[secretstores.xyz]] section.

Limitations

  • There is no access control, i.e. if a plugin references a secret it will get that secret no matter if it was intended by the user as secrets are simply injected into the structure. We might work out an access-control regime here, but it is not yet in place (suggestions welcome).
  • Currently, all @{\w+:[\w ]+} patterns are replaced (in the tagged fields). There is currently no way to escape that pattern.
  • tested with Linux kernel user-session keyring and file keyring ONLY

Suggestions, comments and critique is welcome!

@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Jun 1, 2022
@srebhan srebhan added the security raise security concerns or improve the security of Telegraf label Jun 1, 2022
@srebhan srebhan mentioned this pull request Jun 1, 2022
3 tasks
@srebhan srebhan requested review from sspaink and reimda June 2, 2022 08:00
@jdstrand
Copy link
Contributor

I skimmed the PR and may have missed a detail, but how are secrets meant to survive reboots? I saw the section on JOSE, but then the kernel keyring seemed separate from that. Perhaps put another way: can you provide user journeys on how this is meant to be used along with lifecycle (create, modify, use, revoke, reboot, etc) discussion (all with pointers into implementation details)? Perhaps you have a spec that outlines all of this already. Thanks!

Copy link
Contributor

@reimda reimda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial questions

cmd/telegraf/telegraf.go Outdated Show resolved Hide resolved
config/config.go Show resolved Hide resolved
@srebhan
Copy link
Contributor Author

srebhan commented Jun 28, 2022

I skimmed the PR and may have missed a detail, but how are secrets meant to survive reboots? I saw the section on JOSE, but then the kernel keyring seemed separate from that.

The secrets in the linux secret-store will not survive a reboot. You need to re-inject them after boot-up. However, this is a limitation of the kernel keyring and out-of-scope for Telegraf.

Perhaps put another way: can you provide user journeys on how this is meant to be used along with lifecycle (create, modify, use, revoke, reboot, etc) discussion (all with pointers into implementation details)?

I'm not sure what you want me to do here, but I'm a bit hesitant to document how to setup and maintain a kernel keyring as it a general Linux thing, tools might change, etc...

@jdstrand
Copy link
Contributor

I skimmed the PR and may have missed a detail, but how are secrets meant to survive reboots? I saw the section on JOSE, but then the kernel keyring seemed separate from that.

The secrets in the linux secret-store will not survive a reboot. You need to re-inject them after boot-up. However, this is a limitation of the kernel keyring and out-of-scope for Telegraf.

Perhaps put another way: can you provide user journeys on how this is meant to be used along with lifecycle (create, modify, use, revoke, reboot, etc) discussion (all with pointers into implementation details)?

I'm not sure what you want me to do here, but I'm a bit hesitant to document how to setup and maintain a kernel keyring as it a general Linux thing, tools might change, etc...

I wasn't looking for you to add to the docs how to manage all this stuff. I was trying to understand the feature in the first place. I'm aware that the kernel keyring doesn't survive reboots, but the feature seems to imply removing secrets out of the config files and so I wondered who wants this feature and why. I'm unclear on what the expected experience is intended to be. Your answer implies that it is up to the user to put things where they need to be and to tell telegraf where they are, but I didn't glean that from this PR, which is why I asked for user journeys.

@jdstrand
Copy link
Contributor

I'm aware that the kernel keyring doesn't survive reboots, but the feature seems to imply removing secrets out of the config files and so I wondered who wants this feature and why. I'm unclear on what the expected experience is intended to be. Your answer implies that it is up to the user to put things where they need to be and to tell telegraf where they are, but I didn't glean that from this PR, which is why I asked for user journeys.

Replying to myself: I spoke with someone out of band and indeed, this is the requested feature. Specifically, the request is not for a complete solution in telegraf itself for secure storage, but instead to have a building block for secure secrets storage that the user can use as part of a larger solution where telegraf simply fetches secrets from a configured-outside-of-telegraf secure location (like the kernel keyring, etc).

@srebhan
Copy link
Contributor Author

srebhan commented Jun 29, 2022

Sorry for the late answer. Indeed, the idea is to be able to connect to other secret stores like Vault, AWS, etc, the current PR provides the infrastructure and two simple bare-minimum plugins for an MVP.
The other target point is to move secrets out of the telegraf config to be able to check it into code-versioning systems as usually policies of some users might not allow those files to contain secrets/credentials.

Hope this makes more sense now @jdstrand!?

Copy link
Contributor

@jdstrand jdstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not completed my review but came across a few things that I wanted to highlight right away. See inline (in addition to the documentation updates, note that I wasn't able to successfully use @{id:secret_key} in my telegraf config TOML).

plugins/secretstores/jose/README.md Show resolved Hide resolved
plugins/secretstores/jose/README.md Outdated Show resolved Hide resolved
docs/COMMANDS_AND_FLAGS.md Outdated Show resolved Hide resolved
docs/COMMANDS_AND_FLAGS.md Outdated Show resolved Hide resolved
docs/COMMANDS_AND_FLAGS.md Outdated Show resolved Hide resolved
docs/COMMANDS_AND_FLAGS.md Outdated Show resolved Hide resolved
config/secret.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jdstrand jdstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished my review of this PR. Please note, I am not a cryptographer and I have not done a full source code audit of the added dependencies, but I have spot-checked a few things.

The github.com/99designs/keyring project is used by telegraf's secret store as a standard interface for different credential storage services. For this PR, the keyring package is used to implement backends for file-backed, JOSE credential storage as well as native OS credential storage (keyctl on Linux, keychain for OSX and wincred for Windows). These backends are implemented as secret store telegraf plugins, and these plugins implement the new SecretStore interface and these implementations use keyring internally. This project appears to be healthy and used by a few folks, with 361 stars, 56 watchers and 98 forks. It is used by the popular github.com/99designs/aws-vault project (429k downloads, 6,500 stars, 123 watchers, 637 forks). github.com/99designs/keyring has < 2500 lines of code with reasonable documentation.

The github.com/awnumar/memguard project is used in an effort to securely handle secrets obtained from the secret store. The project appears to be healthy and fairly popular with 2,200 stars, 50 watchers and 102 forks. It has < 4000 lines of code. It has good documentation and between that and spot checking the code, the author has written this with security in mind. https://github.com/awnumar/memguard#features in particular discusses a number of features of the project.

At a high level, telegraf config initializes any configured secret stores and stores and retrieves secrets from the aforementioned keyring backends into protected memory via memguard. This PR initially sets up the http and sql input plugins to optionally use a configured secret store.

Please see inline for a discussion of how config/secret.go:read() improperly copies a protected secret into standard memory. For memguard to have its intended effectiveness, this should be addressed.

Please see inline in the go.sum diff for a discussion of the added dependencies. Nothing of great concern, but worth understanding.

Please see inline for a discussion of desired missing test cases. Eg, secret.go lacks test coverage. People understandably have different opinions on 100% coverage and I'm not advocating for that per se (I wouldn't object to it though!), however, I think with something as sensitive as secrets, it might make sense to expand tests to various error conditions/etc where possible since sometimes weird things can happen in these edge cases.

I did manual functional testing for the JOSE and OS/Linux secret-store usage and gave feedback in my initial, abbreviated review of this PR.

JOSE

The JOSE backend works well enough with telegraf secret set secretstore-jose myname mysecret creating a file-backed secret store at the configured directory path with file at /path/for/config/myname. The contents of the file are a base64 encoded JOSE file and encrypted to the password defined in the telegraf configuration (or via telegraf prompt) using PBES2_HS256_A128KW / A256GCM. Inspecting the file directly, we are unable to see the secret:

$ cat /dev/shm/secrets/myname
eyJhbGciOiJQQkVTMi1IUzI1NitBMTI4S1ciLCJjcmVhdGVkIjoiMjAyMi0wOC0wOSAxOTozMDoxNC4yMTY2NDU0NjIgKzAwMDAgVVRDIG09KzAuMTAzNDgwNDMzIiwiZW5jIjoiQTI1NkdDTSIsInAyYyI6ODE5MiwicDJzIjoibGw4M0M4VzJfRWFCclZ4eCJ9.Fkwwup5DCL6HN98dLqyhJpNYqlNtwfA6dPCRqhpPUzBBPJJ892qCWA.YTRpBuU0bk6PMG4K.nXw0W1dBRnmjq-YAA68XJ4SICqTO7NTJXvjMmSVR9_XsMwLJfNQ9qSmqmgMpdjXblVIV4nY6f0EbkNQodzCDrd-uBqLewNs7XeteeCbyevIz-A6QQxpFy-4uU0Ai4rlm_WbcOnSJZrkBkAzjtOVIpWLwvtx-k5ZzFbuIkz5VIdgUOFCI.4BuDxZk0c_7n9fhcliTKWg

$ for s in $(cat /dev/shm/secrets/myname | tr '.' '\n') ; do echo "# $s" ; echo "$s" | base64 -d ; echo ; echo ; done
# eyJhbGciOiJQQkVTMi1IUzI1NitBMTI4S1ciLCJjcmVhdGVkIjoiMjAyMi0wOC0wOSAxOTozMDoxNC4yMTY2NDU0NjIgKzAwMDAgVVRDIG09KzAuMTAzNDgwNDMzIiwiZW5jIjoiQTI1NkdDTSIsInAyYyI6ODE5MiwicDJzIjoibGw4M0M4VzJfRWFCclZ4eCJ9
{"alg":"PBES2-HS256+A128KW","created":"2022-08-09 19:30:14.216645462 +0000 UTC m=+0.103480433","enc":"A256GCM","p2c":8192,"p2s":"ll83C8W2_EaBrVxx"}

# Fkwwup5DCL6HN98dLqyhJpNYqlNtwfA6dPCRqhpPUzBBPJJ892qCWA
L0����7�.��&�X�Sm��:t��OS0A<�|�j�Xbase64: invalid input

# YTRpBuU0bk6PMG4K
a4i�4nN�0n

# nXw0W1dBRnmjq-YAA68XJ4SICqTO7NTJXvjMmSVR9_XsMwLJfNQ9qSmqmgMpdjXblVIV4nY6f0EbkNQodzCDrd-uBqLewNs7XeteeCbyevIz-A6QQxpFy-4uU0Ai4rlm_WbcOnSJZrkBkAzjtOVIpWLwvtx-k5ZzFbuIkz5VIdgUOFCI
�|4[WAFy�base64: invalid input

# 4BuDxZk0c_7n9fhcliTKWg
�ř4base64: invalid input

Obtaining the keys with telegraf secret get secretstore-jose works as expected (requires password in the telegraf config to match the password that was used to encrypt) and returns mysecret.

The JOSE backend supports multiple secrets (though see my prior review for edge cases).

OS

I only manually tested the Linux keyctl backend and looked at the keyring implementation for Linux as a sort of spot check for the keyring package. The OS backend works well enough with telegraf secret set secretstore-os myname mysecret2 creating entries in the kernel 'user keyring':

$ telegraf secret set secretstore-os myname mysecret2
2022-08-09T20:27:03Z I! Using config file: ../tg-secret.conf

$ keyctl show @u
Keyring
 361220595 --alswrv   1000 65534  keyring: _uid.1000
 814727799 --alswrv   1000  1000   \_ keyring: telegraf
 195945450 --alswrv   1000  1000       \_ user: myname

$ keyctl request user myname
195945450

$ keyctl print 195945450
mysecret2

Obtaining the keys with telegraf secret get secretstore-os works as expected and returns mysecret2 (note, no password is required, see below).

The OS/Linux backend supports multiple secrets.

Management of keys is done via syscalls and therefore mediated by various kernel mechanisms. Keys have an owning user id, owning group and security label. Each key also has a set of permissions which is beyond normal UNIX file permissions. Keys require unswappable kernel memory and are quota controlled. Keys may have an expiration.

Keys may be of different types: keyring, user, etc.

  • "user" is a general purpose key type and always kept in kernel memory. It is accessible for read and write from userspace.
  • "keyring" is a special key which stores links to other keys, similar to a directory holding links to files. Keys need to be anchored to not be garbage collected and keyrings are used to anchor keys. Types of anchor keyrings:
    • process keyrings: process credentials reference keyrings with specific semantics. 3 keyrings with different inheritance/shared: session-keyring(7) (inherited/shared by all child processes), process-keyring(7) (shared by all threads in a process) and thread-keyring(7) (specific to thread)
    • user keyrings: each UID has two keyrings: user-keyring(7) and user-session-keyring(7) and they exist for as long as the kernel knows about the UID record
    • persistent keyrings: a persistent-keyring(7) is available to each UID known to the system and may persist beyond the life of the UID record, but with an expiration
    • special keyrings: used by the kernel for special purposes, et, a system keyring for holding encryption keys for module signature verification. Typically not alterable by user space
    • unimplemented 'group keyring'

Keys have 'possession' rules and permissions. (see man keyrings)

With the telegraf implementation, it anchors a user key into a user keyring that is specific to the user id, so telegraf running as the telegraf user storing keys in the user keyring would have the key protected from other users on the system. It has view, read, write, search, link and setattr permissions for possessor and owner:

$ keyctl show @u
Keyring
 361220595 --alswrv   1000 65534  keyring: _uid.1000
 814727799 --alswrv   1000  1000   \_ keyring: telegraf
 195945450 --alswrv   1000  1000       \_ user: myname

# the 'alswrv' corresponds to setattr, link, search, write, read, view. There
# are 4 groups of these 6 characters that correspond to: possessor, owner, group
# and other.
$ keyctl describe 195945450
195945450: alswrvalswrv------------  1000  1000 user: myname

This setup is similar to the DAC protections given via /proc/<pid>/environ (which has 0400 permissions and mediated by the PTRACE_MODE_READ_FSCREDS capability).

Being in the kernel keyring, the kernel has additional access controls:

  • there are LSM hooks for keyctl
  • changing the ownership (uid and gid) of a key (man 2 keyctl, KEYCTL_CHOWN) requires CAP_SYS_ADMIN
  • changing the permissions of the key (man 2 keyctl, KEYCTL_SETPERM) requires CAP_SYS_ADMIN for keys that the user does not own.

/proc/keys is 0444 and shows all keyring names that the user can access (based on internal kernel access controls). /proc/key-users is 0444 and show statistics for different keyring users.

In general, telegraf when configured to use OS on Linux is not prescribing a methodology to securely put keys into the kernel keyring for the telegraf user as part of a larger solution. Instead, it provides a means for telegraf to access them (and tooling to insert). This portion of the implementation alone is not significantly different from putting a variable in the environment for telegraf to use. What makes this PR interesting from a security perspective is the addition of memguard for these secrets (an improvement over today) and by virtue of using the kernel keyring, it allows users to populate in the manner they choose (such as trusted keys via TPM support, etc). It might be interesting to consider 'encrypted keys' in future work depending on requirements.

References:

Summary

On the whole, this looks good overall. The choice of software seems reasonable and adding JOSE along with OS/Linux, OS/OSX and OS/Windows in the initial implementation of this PR seems useful to telegraf users.

I'd like to see more test coverage and consideration for secret handling after leaving memguard. See inline and my previous review for other comments.

cmd/telegraf/telegraf.go Outdated Show resolved Hide resolved
config/secret.go Show resolved Hide resolved
config/secret.go Outdated Show resolved Hide resolved
config/secret.go Show resolved Hide resolved
config/secret.go Show resolved Hide resolved
go.sum Outdated Show resolved Hide resolved
config/secret.go Outdated
var buf bytes.Buffer
_, err = io.Copy(&buf, lockbuf.Reader())

return buf.Bytes(), err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This puts the sensitive key material into standard, unprotected (dumpable, swappable, garbage collected) memory.

https://pkg.go.dev/github.com/awnumar/memguard says: "The general workflow is to store sensitive information in Enclaves when it is not immediately needed and decrypt it when and where it is. After use, the LockedBuffer should be destroyed."

This API does not guarantee this as it simply returns the secret in []byte which, as mentioned, is unprotected. The caller could end up holding onto it, which would defeat the purpose of using memguard. Even if it didn't hold onto, it is going to stick around until the GC recycles it, which isn't ideal since it wouldn't be securely cleared (AIUI) and could be dumped or swapped out before the recycling.

Perhaps it makes sense to adjust the API to return a *bytes.Reader instead? Maybe you could even do something fancy like return a Reader that when read from once, Destroy()s the underlying LockedBuffer (eg, perhaps the API is oneTimeReader() or similar). Doing just the former will keep the secret in protected memory whereas the latter more closely adheres to upstream's recommended API usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning was the following. The read function shown above tries to guarantee the

After use, the LockedBuffer should be destroyed.

part (see defer lockbuf.Destroy()). It is used in two places, the Link part and the Get part. In both places, we need the unencrypted secret to either link it to the secret stores (or replace static parts) or to return it to the plugin for further use (e.g. to add it to a query or authentication call). Unfortunately, we cannot return the lockbuf Reader, as it is not valid after destruction. Returning it to the plugin or Link caller is not an option, as we cannot guarantee the lockbuf is actually destroyed, running into instance limitations.

This leaves us with copying the read code to the two calling functions (Link and Get). For both functions, this has the gain to safe a placement on the stack, but the lockbuf destruction might be delayed and we have code duplication. Anyway, if you prefer to have it done this way, I can change the code.

In all cases, we, at some point, need to make the unencrypted secret accessible to the plugin for use. As most of the secrets are embedded into authentication calls requiring the plain-text password as string, this is unavoidable and will always lead to places where the secret is stored in unprotected memory. The memguard protection just pushes this part a bit down the pipe at least _enabling _ plugins to handle secrets safely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning was the following. The read function shown above tries to guarantee the

After use, the LockedBuffer should be destroyed.

part (see defer lockbuf.Destroy()). It is used in two places, the Link part and the Get part. In both places, we need the unencrypted secret to either link it to the secret stores (or replace static parts) or to return it to the plugin for further use (e.g. to add it to a query or authentication call). Unfortunately, we cannot return the lockbuf Reader, as it is not valid after destruction. Returning it to the plugin or Link caller is not an option, as we cannot guarantee the lockbuf is actually destroyed, running into instance limitations.

Yes, I saw this which was why I mentioned that perhaps there was a pattern you could employ that would return an implementation of an interface that would have a method that would both read the secret out and then destroy the LockedBuffer, which would satisfy this recommendation and make that method sorta 'one time use'. Note, I was spit-balling here.

This leaves us with copying the read code to the two calling functions (Link and Get). For both functions, this has the gain to safe a placement on the stack, but the lockbuf destruction might be delayed and we have code duplication. Anyway, if you prefer to have it done this way, I can change the code.

In all cases, we, at some point, need to make the unencrypted secret accessible to the plugin for use. As most of the secrets are embedded into authentication calls requiring the plain-text password as string, this is unavoidable and will always lead to places where the secret is stored in unprotected memory. The memguard protection just pushes this part a bit down the pipe at least _enabling _ plugins to handle secrets safely.

Reading the code, I don't think this is on the stack since a Buffer is a variable-sized buffer of bytes with Read and Write methods. Even if it were on the stack, while it would be safe from the GC, it could still be dumpable and swapped (at least on Linux).

It's understood that the unencrypted secret must be made available to the plugin to use. What I was trying to think through was the lifetime of the unencrypted secret. memguard's goals are to "reduce the likelihood of sensitive data being exposed when in memory" with the idea that you leave it in memguard's enclave as often as is possible and only pull it out when absolutely needed.

The way that I thought about this wrt telegraf was that once the secret was fetched from secure storage and placed under memguard, the plugins would ask memguard to decrypt the secret every time they needed it and then immediately discard it which could have a meaningful improvement of security iff the plugins handled the decrypted secret safely (which might include a guarded allocation but perhaps simply on the stack (to avoid GC; perhaps overwriting with 0s as part of a defer could be interesting as well...)). This is a meaningful improvement since the unencrypted secret is rarely in memory (in terms of CPU time) and therefore far less likely to be leaked unencrypted to disk. It is not perfect since there might be brief moments where it is exposed and of course if a debugger is attached (though, on Linux PR_SET_DUMPABLE is an option to prevent ptrace).

Looking at the code:

  • plugins/inputs/http/http.go:gatherURL() fetches the username and password from the secure enclave when the URL is constructed, and it looks like it is constructed each time telegraf runs the plugin (ie, the URL (and therefore the username and password) doesn't seem to be cached; please correct me. This is good!). However, plugins/inputs/http/http.go:gatherURL() holds onto the memory for the rest of the function instead of wiping it, etc after h.client.Do(request). This is a missed opportunity, not to mention the allocation is on the heap (see above)
  • plugins/inputs/sql/sql.go:Init() fetches the DSN from the secure enclave and doesn't cache it (good!) and then holds onto it for the remainder of the function. This is a missed opportunity, not to mention the allocation is on the heap (see above)
  • plugins/inputs/sql/sql.go:Start() fetches the DSN from the secure enclave and then holds onto it via s.db, err = dbsql.Open(s.driverName, dsn) until plugins/inputs/sql/sql.go:Stop() is called (I didn't chase down if s.db.Close() (called from Stop()) let go of it; presumably it would). It appears that Start() and Stop() are called around s.startGathering() (based on plugins/common/shim/input.go); I didn't confirm this further. The DSN is in memory for quite some time with this plugin. If I'm reading correctly, for the entire time that all input plugins are running (please correct me; this was something I inferred)
  • config/secret.go:Link() fetches the secret and holds onto it for long enough to check to see if the secret currently stored in memguard is the same as the one fetched from secret storage (this is good). The allocation is on the heap (see above) and therefore could be improved

The http plugin, sql:Init() and Link() are quite close to what I think is desired in that they are fetching when they need it, though I think there are improvements on the timing of when they 'let go' of the unencrypted secret and more importantly, that (I believe) the allocation is on the heap and will linger in the GC. Ensuring it is on the stack, scribbling over it with 0s/non-blocking random data immediately after use or (gasp) using CGO with calloc (then scribble 0s/non-blocking random data and free after use) might be options.

The architecture for Start() input plugins that need to start, gather all the inputs, then Stop() all the input plugins that need to stop is a bit problematic and is at odds with how we'd like to use memguard. I'm not sure what to suggest here (Start()ing, gathering and Stop()ing all the secret storage-using input plugins one at a time and then doing all the non-secret storage-using input plugins like today could be a way forward; there are likely other choices). If nothing is done with this plugin, it should probably at a minimum be called out in documentation as a limitation of this input plugin's use of secret storage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all I moved the read() code into the two places where it was called and reduced the number of copies flying around. I cannot think of a way to reduce this further in those functions.
Regarding an interface function, I would rather not want that. While it might slightly extend the range of the encrypted secret it has multiple drawbacks making this very fragile as we give-up the control over the lockbuf

  1. Only a limited number of lockbuf objects can exist at the same time (see documentation), so if the receiver holds a reference for too long we might run out of buffers.
  2. As we do not have control over the potential implementation, we cannot guarantee that lockbuf.Destroy() is actually called, e.g. if the secret is never read.
  3. A plugin can still copy the content of the lockbuf, reducing the overall gain at the cost of a higher complexity.

Regarding the plugins, I really appreciate your analysis and we should definitively look into those. However, I think the current PR cannot and should not try to solve all problems at once. Usually the plugins will call external function which usually do not make any attempt to secure credentials (database drivers, API implementations, etc). So let's push this down to the plugin level for now and allow for plugins which handle stuff securely, which is already a leap to the current state.

plugins/secretstores/os/os.go Show resolved Hide resolved
plugins/secretstores/jose/jose.go Show resolved Hide resolved
plugins/secretstores/jose/jose.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jdstrand jdstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates to the PR! I looked at the changes for addressing my feedback and things are looking really good. I did a cursory look over the whole PR and nothing jumped out, but am relying on the telegraf team to review those parts.

I also looked at the revamped CLI since my initial review and it works nicely.

I couldn't remember if I verified the file permissions with JOSE, so I did this time:

$ ls -l /dev/shm
total 0

$ telegraf set-secret secretstore myname mypass
2022/10/14 15:45:31 I! Using config file: ./tg-secret.conf
Enter passphrase to unlock "/dev/shm/secrets":

$ ls -lR /dev/shm
/dev/shm:
total 0
drwx------ 2 ubuntu ubuntu 60 Oct 14 15:45 secrets

/dev/shm/secrets:
total 4
-rw------- 1 ubuntu ubuntu 468 Oct 14 15:45 myname

I left feedback for a couple of things in-line.

Also, regarding your comment:

Regarding the plugins, I really appreciate your analysis and we should definitively look into those. However, I think the current PR cannot and should not try to solve all problems at once. Usually the plugins will call external function which usually do not make any attempt to secure credentials (database drivers, API implementations, etc). So let's push this down to the plugin level for now and allow for plugins which handle stuff securely, which is already a leap to the current state.

That's a fair point. I do think that these plugins are going to provide a template for future plugins use of secrets, so if pushing down to the plugin level, then lets make them as good as we can and:

  • write all zeros to the variable after use of the secret in the plugin. That should be straightforward for the http plugin (eg, in plugins/inputs/http/http.go:gatherURL() but also verify anywhere else in the plugin). For the sql plugin, we can only do this in Stop() I think since with the current architecture, it needs to hold onto it
  • add a TODO comment in a strategic location for addressing my previous comment: "The architecture for Start() input plugins that need to start, gather all the inputs, then Stop() all the input plugins that need to stop is a bit problematic and is at odds with how we'd like to use memguard." I'm not sure what you want to say, but the idea is just mention something about how the architecture needs some tweaking to better support secret-store, perhaps adding a URL to comments in this PR
  • add a comment in the sql plugin documentation about this limitation of its usage of secret storage (holding onto the unencrypted secret in memory during runtime)

config/secret.go Outdated Show resolved Hide resolved
plugins/inputs/sql/sql.go Outdated Show resolved Hide resolved
docs/COMMANDS_AND_FLAGS.md Outdated Show resolved Hide resolved
docs/COMMANDS_AND_FLAGS.md Outdated Show resolved Hide resolved
plugins/secretstores/os/README.md Outdated Show resolved Hide resolved
plugins/secretstores/os/sample_darwin.conf Outdated Show resolved Hide resolved
@srebhan
Copy link
Contributor Author

srebhan commented Oct 19, 2022

Thanks @jdstrand for your review! I addressed your comments and would like to keep the list-secrets command for convenience...

[...] I did a cursory look over the whole PR and nothing jumped out, but am relying on the telegraf team to review those parts.

Yes, that's the plan. After resolving all security issues between us ;-) I'll ask the Telegraf team to review the PR.

[...] I do think that these plugins are going to provide a template for future plugins use of secrets, so if pushing down to the plugin level, then lets make them as good as we can and:

write all zeros to the variable after use of the secret in the plugin. That should be straightforward for the http plugin (eg, in plugins/inputs/http/http.go:gatherURL() but also verify anywhere else in the plugin). For the sql plugin, we can only do this in Stop() I think since with the current architecture, it needs to hold onto it

I could not find a reliable way to overwrite the string with zeros as in Golang strings are immutable. Do you know of a way? The only thing I could think of is to return the secret as []byte which can be overwritten and then pass the secret on using
someFunction(...,string(secret),...). Do you think this is a better way? If so, I can modify the code accordingly.

add a TODO comment in a strategic location for addressing my previous comment: "The architecture for Start() input plugins that need to start, gather all the inputs, then Stop() all the input plugins that need to stop is a bit problematic and is at odds with how we'd like to use memguard." I'm not sure what you want to say, but the idea is just mention something about how the architecture needs some tweaking to better support secret-store, perhaps adding a URL to comments in this PR
add a comment in the sql plugin documentation about this limitation of its usage of secret storage (holding onto the unencrypted secret in memory during runtime)

I don't think this is an issue. As even for the sql plugin, we could "delete" the plain-text secret after passing it on to the respective library function. I guess that works for most, if not all, plugins, which brings us back to the question above on how to best "delete/erase" the plain-text stuff... I appreciate any idea!

@srebhan
Copy link
Contributor Author

srebhan commented Oct 19, 2022

@jdstrand the last commit converts the returned secrets as []byte and erases them ASAP in the plugin functions. I also lock the memory (where available) to prevent swapping of the memory section. Do you like it?

Copy link
Contributor

@jdstrand jdstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] I do think that these plugins are going to provide a template for future plugins use of secrets, so if pushing down to the plugin level, then lets make them as good as we can and:

  • write all zeros to the variable after use of the secret in the plugin. That should be straightforward for the http plugin (eg, in plugins/inputs/http/http.go:gatherURL() but also verify anywhere else in the plugin). For the sql plugin, we can only do this in Stop() I think since with the current architecture, it needs to hold onto it

I could not find a reliable way to overwrite the string with zeros as in Golang strings are immutable. Do you know of a way? The only thing I could think of is to return the secret as []byte which can be overwritten and then pass the secret on using
someFunction(...,string(secret),...). Do you think this is a better way? If so, I can modify the code accordingly.

I may not have been clear. config/secret.go:Get() is defined as func (s *Secret) Get() ([]byte, error) and, for example, plugins/inputs/http/http.go does:

func (h *HTTP) setRequestAuth(request *http.Request) error {
        username, err := h.Username.Get()
        if err != nil {
                return fmt.Errorf("getting username failed: %v", err)
        }
        defer config.ReleaseSecret(username)
        password, err := h.Password.Get()
        if err != nil {
                return fmt.Errorf("getting password failed: %v", err)
        }
        defer config.ReleaseSecret(password)
        if len(username) != 0 || len(password) != 0 {
                request.SetBasicAuth(string(username), string(password))                                 
        }
        return nil
}

I'm only suggesting that you do something like:

...
                request.SetBasicAuth(string(username), string(password))
                // These local slices are backed by dynamically allocated arrays, so zero them out
                // manually so they don't persist in GC. Note: slices can also be on the heap based on
                // what the Go compiler decides and if what was append()ed to them exceeded the
                // original allocation. This also makes the window smaller for dumpable and swapped
                // memory.
                for i, _ := range username {
                        username[i] = '0'
                }
                for i, _ := range password {
                        password[i] = '0'
                }

Feel free to make this a helper function since the slice has an internal pointer to the backing array and this would access the backing array directly.

Hopefully this helps!

config/util.go Show resolved Hide resolved
@srebhan
Copy link
Contributor Author

srebhan commented Oct 20, 2022

@jdstrand regarding zeroing the secret's byte array, this is what the defer config.ReleaseSecret(...) lines do. They read

func ReleaseSecret(secret []byte) error {
	memguard.WipeBytes(secret)
	return nil
}

and on Linux the byte-array is mlocked between Get() and ReleaseSecret() to prevent swapping it out. Sorry, but I do not get what you want me to do beyond that!? :-(

Copy link
Contributor

@powersj powersj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments, should be quick and we land

docs/COMMANDS_AND_FLAGS.md Outdated Show resolved Hide resolved
docs/COMMANDS_AND_FLAGS.md Outdated Show resolved Hide resolved
Copy link
Contributor

@powersj powersj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@powersj powersj merged commit c98115e into influxdata:master Dec 8, 2022
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Dec 8, 2022

Copy link
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bummer I missed this PR while it was still open, but here some remarks anyway. I hope I can still get a reply on them or they are addressed in a future PR.

First of all, there isn't a docs/SECRETSTORES.md to explain what is required when creating a new plugin just like there is for any other type of plugin. Beware to also reference this in README.md#documentation.

plugins/secretstores/jose/README.md Show resolved Hide resolved
if err != nil {
return fmt.Errorf("getting username failed: %v", err)
}
defer config.ReleaseSecret(username)
Copy link
Contributor

@Hipska Hipska Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone explain why this release is needed to be in this place? It seems easy to overlook and forget to implement this in other plugins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (clear-text) secret given to you is a byte-array. To protect this clear-text to be e.g. swapped-out to disk, we protect the memory region and want to release this protection. Furthermore, we want to safely wipe the secret from memory once it is not used anymore. Therefore, we need some indication when to do this (i.e. the release). We could of course also clean this up on garbage-collection, but this would make the plain-text secret to stay in memory for an unforeseeable time...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's all clear, but then those clear-text bytes are copied to a string and passed to request.SetBasicAuth where they also will be readable in memory. And also the username byte array will be removed when the current method is returned? So that will not stay longer in memory than the execution time of this specific method, or is this not the case in GO?

plugins/secretstores/jose/jose.go Show resolved Hide resolved
plugins/secretstores/jose/jose.go Show resolved Hide resolved
plugins/secretstores/jose/sample.conf Show resolved Hide resolved
plugins/secretstores/os/README.md Show resolved Hide resolved
@srebhan srebhan mentioned this pull request Dec 15, 2022
3 tasks
powersj added a commit to powersj/telegraf that referenced this pull request Jan 3, 2023
PR influxdata#12127 added the ability to reload on files in configuration
directories. However, this broke any and all reloading. The PR assumed
that the list of config files was initialized and set up, however it is
always empty as-is.

To populate the config files variable, the secret-store PR, influxdata#11232,
added the loadConfiguraiton call. This needs to be run first before
attempting to watch any files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. security raise security concerns or improve the security of Telegraf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hide passwords in Telegraf config
6 participants