Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement tbot init subcommand and ACL management #10289

Merged
merged 82 commits into from
Mar 10, 2022
Merged

Conversation

timothyb89
Copy link
Contributor

@timothyb89 timothyb89 commented Feb 10, 2022

This adds a new CLI subcommand to initialize a tbot destination directory by creating required files ahead of time and assigning proper permissions (and ACLs, where possible). It also adds a number of UX improvements as a result of some significant identity refactoring.

ACLs are used to allow the bot and end-user workflow to run as separate UNIX users. The bot can be assigned a dedicated UNIX user and given exclusive access to /var/lib/teleport/bot where the renewable certificate is stored. The end user then uses tbot init to create the secondary (non-renewable) certificate directory with only their own UNIX permissions, but uses ACLs to allow the bot user to deposit the non-renewable certificates to their own directory securely. Should an attacker compromise the workflow user, they may be able to steal the secondary certificates, but should not be able to access the bot's renewable certificates. This should limit the longevity and (depending on configuration) degree of access an attacker might gain on a compromised system.

Summary of changes

  1. Identity refactoring: save and load now accept kind tags and only attempt to store/load artifacts matching the specified tags
  2. Implement kinds in the config which was previously not honored, so now only artifacts of the given kind are stored and loaded (i.e. ssh, tls, or both)
    • Also includes compatibility fixes to support one identity type (SSH or TLS) where previously quite a lot of code assumed a fully realized identity always existed.
  3. Implements tbot init subcommand to configure file ACLs automatically
  4. Add configurable ACL checking and symlink attack mitigation
    • ACL checking is warn-only by default on the bot's end, but can be configured to quit on error by setting acls: on in the directory YAML config.
    • Symlink attack mitigation makes writes fail if a path contains any symlinks. It's enabled by default on Linux 5.6+, but can be disabled by setting symlinks: insecure in the directory YAML config.
  5. Test destinations before attempting a renewal.
    • Previously, if certificates couldn't be written, subsequent renewals would trigger a generation counter lock. The bot now attempts a write first to help ensure the destination is usable, and if not, fails early without locking the bot user.
  6. Store a hash of the join token and throw away the old identity when it changes, fixing a common edge case
  7. Create files directories automatically if they don't exist (to support the non-ACL use-case)
  8. Add hint to use tbot init as part of the tctl bots add ... help message

Usage

For simple deployments where file permissions aren't critical, the standard tbot instructions still work, i.e. just running the example command printed by tctl bots add .... For users that wish to run the bot as a separate Linux user and restrict other users on the system from accessing the renewable certs, they can do the following (assuming their OS and filesystem support ACLs):

# on the cluster: create the bot user
$ tctl -c ~/.local/share/teledev/foo.example.com/teleport.yaml bots add test --roles=access
The bot token: 7ff4407e924cec953f679be719a93fdb
This token will expire in 59 minutes.

Run this on the new bot node to join the cluster:

> tbot start \
   --destination-dir=./tbot-user \
   --token=7ff4407e924cec953f679be719a93fdb \
   --ca-pin=sha256:634470e113d96a18ec1d76057fff0921f43347397fc107ccabe597dbbe07fdc8 \
   --auth-server=auth.example.com

Please note:

  - This invitation token will expire in 59 minutes
  - 192.168.122.10:3025 must be reachable from the new node

# on the bot node: create a user and storage dir for the bot
alice $ sudo useradd -r tbot
alice $ sudo mkdir -p /var/lib/teleport/bot
alice $ sudo chown tbot:tbot /var/lib/teleport/bot

# on the bot node as root, initialize the destination and grant read/write to the bot and end user
alice $ sudo tbot init --destination-dir=./tbot-user --bot-user=tbot --reader-user=alice

# now that permissions have been configured, we can start the bot like usual
# on the bot node as the bot user (using sudo for simplicity)
alice $ sudo -u tbot tbot start \
   --destination-dir=./tbot-user \
   --token=7ff4407e924cec953f679be719a93fdb \
   --ca-pin=sha256:634470e113d96a18ec1d76057fff0921f43347397fc107ccabe597dbbe07fdc8 \
   --auth-server=auth.example.com

# now certificates have been issued to ./tbot-user: owned by nobody, but written by tbot and readable by ailce

This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.

This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
 * `CreateBotJoinToken` creates a join token for a bot user
 * `GenerateInitialRenewableUserCerts` exchanges a token for a set of
   certificates with a new `renewable` flag set

A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.
* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup
This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.

In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.
* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers
* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
  `BotGenerationLabel`
* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers
Fixes the majority of smaller issues caught by reviewers, thanks all!
Issuing initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).

This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.
This adds a new CLI subcommand to initialize a tbot destination
directory by creating required files ahead of time and assigning
proper permissions (and ACLs, where possible).
This was referenced Feb 10, 2022
@timothyb89
Copy link
Contributor Author

I've now reworked the flow per the above comment. tbot init is now run by a (ideally) secondary user or root and grants permissions to the reader user via ACLs, which bypasses OpenSSH's permissions check. There's a few valid configurations for this:

  • Run as root, owned by nobody:nobody (or any other permissionless user), with read/write granted via ACL to the bot and reader. tbot init will correct ownership and permissions automatically. This is the default.
  • Run as the file owner, with separate bot and reader users.
  • Run as the bot, with a separate reader user.
  • These do work but generate a warning as they will break OpenSSH:
    • Bot, reader, and owner are all the same. This is allowed but weird, and will break OpenSSH. ACLs should just not be used (the warnings say to turn them off), and for that matter tbot init is superfluous.
    • Reader and owner are the same. This is the original behavior and, while valid for some use-cases, breaks OpenSSH.

tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
// supported on all platforms and will return a trace.NotImplemented in that
// case.
func GetOwner(fileInfo fs.FileInfo) (*user.User, error) {
if runtime.GOOS == constants.WindowsOS {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the Windows check necessary, or will the type assertion below also fail on Windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can probably rely on that.

tool/tbot/botfs/fs_linux.go Outdated Show resolved Hide resolved
defer file.Close()

// Blatantly stolen from go's os/file.go since there's no stdlib function
// for reading everything from an _already open_ file, sigh.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about io.ReadFull(f)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think io.ReadAll() is even better. Honestly perplexed how my googling failed to find that one, I guess I'm losing my touch.

tool/tbot/botfs/fs_linux.go Outdated Show resolved Hide resolved
tool/tbot/config/destination_directory.go Outdated Show resolved Hide resolved
tool/tbot/main.go Outdated Show resolved Hide resolved
tool/tbot/main.go Outdated Show resolved Hide resolved
- Rename ACLOn -> ACLRequired
- Simplify fs_linux.Read()
- Add missing fs_other.Read()
- Hoist renewal loop logic into its own function
- A few misc bugfixes
tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
tool/tbot/botfs/fs_linux.go Show resolved Hide resolved
return trace.Wrap(err)
}

// TODO: this will be very noisy on older systems. Maybe flip a global
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also wrap this in a sync.Once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems nice enough, thanks!

// permissions, appropriate for secret data.
ModeHintSecret
)

// Destination can persist renewable certificates.
type Destination interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: destination.Destination stutters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like it, but it needs to live in its own package to avoid import cycles and will most likely remain alone in there. I'm failing to come up with a package name that makes more sense, but am open to suggestions!

tool/tbot/identity/identity.go Outdated Show resolved Hide resolved
tool/tbot/init.go Outdated Show resolved Hide resolved
Comment on lines +479 to +481
if err := botfs.Create(path, isDir, destDir.Symlinks); err != nil {
return trace.Wrap(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails should we clean up any files already created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's idempotent, so I think the workflow for failure here is to correct the issue and re-run anyway. Cleaning up in general is a little bit scary (we only do it with --clean right now) and likely to fail anyway if a create just failed.

tool/tbot/init.go Outdated Show resolved Hide resolved
timothyb89 and others added 3 commits March 8, 2022 11:41
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
- Only log syscall warning once
- Formatting and wording changes
- Improve error handling for `--clean`
aggregate := trace.NewAggregate(errors...)
if dd.ACLs == botfs.ACLRequired {
// Hard fail if ACLs are specifically requested and there are errors.
return aggregate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregate could be nil here though, right?

I'm wondering if you want to move the Warnf below because it looks like there's at least one possible code path that could exit with error and not log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thinking is that it will log the error by virtue of exiting with it (assuming there are any errors).

@r0mant r0mant mentioned this pull request Mar 9, 2022
38 tasks
Copy link
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm if you've made sure this builds on macOS/Windows (do we build tbot in Windows?)

tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
tool/tbot/botfs/botfs.go Outdated Show resolved Hide resolved
tool/tbot/botfs/fs_linux.go Outdated Show resolved Hide resolved
return trace.Wrap(err)
}

switch dd.Symlinks {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd consider implementing CheckAndSetDefaults on SymlinksMode and ACLMode that do this.

tool/tbot/identity/artifact.go Show resolved Hide resolved
tool/tbot/identity/identity.go Outdated Show resolved Hide resolved
now := time.Now().UTC()
if now.After(validBefore) {
log.Errorf(
"Identity appears to have expired. The renewal is likely to fail. (expires: %s, current time: %s)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think it's pretty definitive, no? :)

Suggested change
"Identity appears to have expired. The renewal is likely to fail. (expires: %s, current time: %s)",
"Identity has expired. The renewal will fail. (expires: %s, current time: %s)",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hedging since it's also possible the system time is off, but I think this is more clear anyway.

@timothyb89 timothyb89 enabled auto-merge (squash) March 10, 2022 04:30
@timothyb89 timothyb89 merged commit 32e4801 into master Mar 10, 2022
@timothyb89 timothyb89 deleted the timothyb89/tbot-init branch March 10, 2022 06:09
timothyb89 added a commit that referenced this pull request Mar 10, 2022
---

Implement `tbot init` subcommand and ACL management (#10289)

* Add certificate renewal bot

This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.

This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
 * `CreateBotJoinToken` creates a join token for a bot user
 * `GenerateInitialRenewableUserCerts` exchanges a token for a set of
   certificates with a new `renewable` flag set

A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.

* Cert bot refactoring pass

* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup

* Do not pass through `renewable` flag when role requests are set

This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.

In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.

* Various tbot UX improvements; render SSH config

* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers

* Add stubs for tbot init and watch commands

* Add gRPC endpoints for managing bots

* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
  `BotGenerationLabel`

* Fix outdated destination flag in example tbot command

* Bugfix pass for demo

* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers

* Address first wave of review feedback

Fixes the majority of smaller issues caught by reviewers, thanks all!

* Add doc comments for bot.go functions

* Return the token TTL from CreateBot

* Split initial user cert issuance from `generateUserCerts()`

Issuing initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).

This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.

* Set bot traits to silence log messages

* tbot log message consistency pass

* Implement `tbot init` subcommand

This adds a new CLI subcommand to initialize a tbot destination
directory by creating required files ahead of time and assigning
proper permissions (and ACLs, where possible).

* Resolve lints

* Add config tests

* Remove CreateBotJoinToken endpoint

Users should instead use the CreateBot/DeleteBot endpoints.

* Create a fresh private key for every impersonated identity renewal

* Hide `config` subcommand

* Rename bot label prefix to `teleport.internal/`

* Use types.NewRole() to create bot roles

* Clean up error handling in custom YAML unmarshallers

Also, add notes about the supported YAML shapes.

* Fetch proxy host via gRPC Ping() instead of GetProxies()

* Update lib/auth/bot.go

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Fix some review comments

* Add renewable certificate generation checks (#10098)

* Add renewable certificate generation checks

This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match

When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.

* Add unit tests for the generation counter

This adds new unit tests to exercise the generation counter checks.

Additionally, it fixes two other renewable cert tests that were
failing.

* Remove certRequestGeneration() function

* Emit audit event when cert generations don't match

* Fully implement `tctl bots lock`

* Show bot name in `tctl bots ls`

* Lock bots when a cert generation mismatch is found

* Make CompareFailed respones from validateGenerationLabel() more actionable

* Update lib/services/local/users.go

Co-authored-by: Nic Klaassen <nic@goteleport.com>

* Backend changes for tbot IoT and AWS joining (#10360)

* backend changes

* add token permission check

* pass ctx from caller

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* fix comment typo

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event

* Client changes for tbot IoT joining (#10397)

* client changes

* delete replaced APIs

* delete unused tbot/auth.go

* add license header

* don't unecessarily fetch host CA

* log fixes

* s/tunnelling/tunneling/

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* auth server addresses may be proxies

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* comment typo fix

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* move *Server methods out of auth_with_roles.go (#10416)

Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Address another batch of review feedback

* Addres another batch of review feedback

Add `Role.SetMetadata()`, simplify more `trace.WrapWithMessage()`
calls, clear some TODOs and lints, and address other misc feedback
items.

* Fix lint

* Add missing doc comments to SaveIdentity / LoadIdentity

* Remove pam tag from tbot build

* Update note about bot lock deletion

* Another pass of review feedback

Ensure all requestable roles exist when creating a bot, adjust the
default renewable cert TTL down to 1 hour, and check types during
`CompareAndSwapUser()`

* Remove ModeHint

* Rename Identity.Cert and Identity.XCert

* Add `symlinks` flag to tbot config

The optional symlinks flag for directory destinations allows users to
opt in / out of whichever symlink attack hardening mode is selected
by default.

* Add mostly-working secure implementation of botfs.Create/Write

This adds symlink mode selection (secure, try-secure, insecure) and
Linux `Create()`/`Write()` implementations to open files safely.

* Add configurable ACL modes and verify ACL support in tbot init

* Initialize destinations at startup and test before renewal

This initializes destinations at startup (to create directories if
not using `tbot init`) and tests them to ensure the bot can write
_before_ attempting to renew certificates; this should prevent most
accidental generation counter locks.

* Hide watch for now

* Issue a new identity if a token change is detected

* Warn if identity appears to be expired on startup

* Fully implement ACL Verify and Configure

 - Fully implements ACL support for Linux
 - Adds bot-side verification support to ensure ACLs are configured
   properly at runtime.
 - Gracefully falls back to no ACLs if the platform / filesystem
   doesn't support them
 - Clear up outstanding lints

* Make `tbot init` work without a config file

* Show init instructions in tctl bots add

Also:
 - Make --bot-user a flag in init (the tctl instructions were
   confusing otherwise)
 - Handle IsOwnedBy sanely on unsupported platforms
 - Add Bold colorizing support

* Clear some TODOs and rephrase tctl help

* Fix typo

* Fix token hash detection bug

* Actually read and write certs with symlink enforcement

Also, fix a config loading bug where CheckAndSetDefaults() wasn't
being called in all cases with CLI destinations.

* Add workaround for OpenSSH permissions check with ACLs

OpenSSH has an overly-paranoid permissions check that forces key
files to be exclusively owner-readable. Unfortunately, for POSIX
compatibility purposes, when ACLs are set, the ACL mask is set as
the group permissions. This effectively makes any ACL incompatible
with OpenSSH.

However, OpenSSH's check does have an escape hatch: it only applies
if the current user is the owner of the file. Therefore, this change
tweaks the `tbot init` flow to create files as root, owned by a
separate user (either `nobody` or even the bot user), with ACL
permissions granting both the bot and reader user access to the
certificates. This effectively bypasses OpenSSH's permissions check
and should preserve our security boundaries.

* Fix lints

* Fix an improper directory chmod to 0600 if ACL test fails

* First pass of tbot init unit tests

* Add symlink tests and fix bug with resolving the default owner

* Fix err misuse

* Fix an ACL error if the bot or reader user is the owner.

* Fix typo

* Fix missing error case in VerifyACL causing unreadable directories

* Address review feedback

- Rename ACLOn -> ACLRequired
- Simplify fs_linux.Read()
- Add missing fs_other.Read()
- Hoist renewal loop logic into its own function
- A few misc bugfixes

* Apply suggestions from code review

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Address review feedback

- Only log syscall warning once
- Formatting and wording changes
- Improve error handling for `--clean`

* Fix lint error

* Fix imports in fs_other

* Fix possible nil pointer deref if storage is unset

* Use the bot user as default owner

This is more likely to be a safe owner choice than `nobody:nobody`.

* Apply suggestions from code review

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* Code review fixes

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
timothyb89 added a commit that referenced this pull request Mar 10, 2022
---

Implement `tbot init` subcommand and ACL management (#10289)

* Add certificate renewal bot

This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.

This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
 * `CreateBotJoinToken` creates a join token for a bot user
 * `GenerateInitialRenewableUserCerts` exchanges a token for a set of
   certificates with a new `renewable` flag set

A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.

* Cert bot refactoring pass

* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup

* Do not pass through `renewable` flag when role requests are set

This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.

In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.

* Various tbot UX improvements; render SSH config

* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers

* Add stubs for tbot init and watch commands

* Add gRPC endpoints for managing bots

* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
  `BotGenerationLabel`

* Fix outdated destination flag in example tbot command

* Bugfix pass for demo

* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers

* Address first wave of review feedback

Fixes the majority of smaller issues caught by reviewers, thanks all!

* Add doc comments for bot.go functions

* Return the token TTL from CreateBot

* Split initial user cert issuance from `generateUserCerts()`

Issuing initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).

This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.

* Set bot traits to silence log messages

* tbot log message consistency pass

* Implement `tbot init` subcommand

This adds a new CLI subcommand to initialize a tbot destination
directory by creating required files ahead of time and assigning
proper permissions (and ACLs, where possible).

* Resolve lints

* Add config tests

* Remove CreateBotJoinToken endpoint

Users should instead use the CreateBot/DeleteBot endpoints.

* Create a fresh private key for every impersonated identity renewal

* Hide `config` subcommand

* Rename bot label prefix to `teleport.internal/`

* Use types.NewRole() to create bot roles

* Clean up error handling in custom YAML unmarshallers

Also, add notes about the supported YAML shapes.

* Fetch proxy host via gRPC Ping() instead of GetProxies()

* Update lib/auth/bot.go

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Fix some review comments

* Add renewable certificate generation checks (#10098)

* Add renewable certificate generation checks

This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match

When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.

* Add unit tests for the generation counter

This adds new unit tests to exercise the generation counter checks.

Additionally, it fixes two other renewable cert tests that were
failing.

* Remove certRequestGeneration() function

* Emit audit event when cert generations don't match

* Fully implement `tctl bots lock`

* Show bot name in `tctl bots ls`

* Lock bots when a cert generation mismatch is found

* Make CompareFailed respones from validateGenerationLabel() more actionable

* Update lib/services/local/users.go

Co-authored-by: Nic Klaassen <nic@goteleport.com>

* Backend changes for tbot IoT and AWS joining (#10360)

* backend changes

* add token permission check

* pass ctx from caller

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* fix comment typo

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event

* Client changes for tbot IoT joining (#10397)

* client changes

* delete replaced APIs

* delete unused tbot/auth.go

* add license header

* don't unecessarily fetch host CA

* log fixes

* s/tunnelling/tunneling/

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* auth server addresses may be proxies

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* comment typo fix

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* move *Server methods out of auth_with_roles.go (#10416)

Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Address another batch of review feedback

* Addres another batch of review feedback

Add `Role.SetMetadata()`, simplify more `trace.WrapWithMessage()`
calls, clear some TODOs and lints, and address other misc feedback
items.

* Fix lint

* Add missing doc comments to SaveIdentity / LoadIdentity

* Remove pam tag from tbot build

* Update note about bot lock deletion

* Another pass of review feedback

Ensure all requestable roles exist when creating a bot, adjust the
default renewable cert TTL down to 1 hour, and check types during
`CompareAndSwapUser()`

* Remove ModeHint

* Rename Identity.Cert and Identity.XCert

* Add `symlinks` flag to tbot config

The optional symlinks flag for directory destinations allows users to
opt in / out of whichever symlink attack hardening mode is selected
by default.

* Add mostly-working secure implementation of botfs.Create/Write

This adds symlink mode selection (secure, try-secure, insecure) and
Linux `Create()`/`Write()` implementations to open files safely.

* Add configurable ACL modes and verify ACL support in tbot init

* Initialize destinations at startup and test before renewal

This initializes destinations at startup (to create directories if
not using `tbot init`) and tests them to ensure the bot can write
_before_ attempting to renew certificates; this should prevent most
accidental generation counter locks.

* Hide watch for now

* Issue a new identity if a token change is detected

* Warn if identity appears to be expired on startup

* Fully implement ACL Verify and Configure

 - Fully implements ACL support for Linux
 - Adds bot-side verification support to ensure ACLs are configured
   properly at runtime.
 - Gracefully falls back to no ACLs if the platform / filesystem
   doesn't support them
 - Clear up outstanding lints

* Make `tbot init` work without a config file

* Show init instructions in tctl bots add

Also:
 - Make --bot-user a flag in init (the tctl instructions were
   confusing otherwise)
 - Handle IsOwnedBy sanely on unsupported platforms
 - Add Bold colorizing support

* Clear some TODOs and rephrase tctl help

* Fix typo

* Fix token hash detection bug

* Actually read and write certs with symlink enforcement

Also, fix a config loading bug where CheckAndSetDefaults() wasn't
being called in all cases with CLI destinations.

* Add workaround for OpenSSH permissions check with ACLs

OpenSSH has an overly-paranoid permissions check that forces key
files to be exclusively owner-readable. Unfortunately, for POSIX
compatibility purposes, when ACLs are set, the ACL mask is set as
the group permissions. This effectively makes any ACL incompatible
with OpenSSH.

However, OpenSSH's check does have an escape hatch: it only applies
if the current user is the owner of the file. Therefore, this change
tweaks the `tbot init` flow to create files as root, owned by a
separate user (either `nobody` or even the bot user), with ACL
permissions granting both the bot and reader user access to the
certificates. This effectively bypasses OpenSSH's permissions check
and should preserve our security boundaries.

* Fix lints

* Fix an improper directory chmod to 0600 if ACL test fails

* First pass of tbot init unit tests

* Add symlink tests and fix bug with resolving the default owner

* Fix err misuse

* Fix an ACL error if the bot or reader user is the owner.

* Fix typo

* Fix missing error case in VerifyACL causing unreadable directories

* Address review feedback

- Rename ACLOn -> ACLRequired
- Simplify fs_linux.Read()
- Add missing fs_other.Read()
- Hoist renewal loop logic into its own function
- A few misc bugfixes

* Apply suggestions from code review

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Address review feedback

- Only log syscall warning once
- Formatting and wording changes
- Improve error handling for `--clean`

* Fix lint error

* Fix imports in fs_other

* Fix possible nil pointer deref if storage is unset

* Use the bot user as default owner

This is more likely to be a safe owner choice than `nobody:nobody`.

* Apply suggestions from code review

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* Code review fixes

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
@webvictim webvictim mentioned this pull request Apr 19, 2022
@webvictim webvictim mentioned this pull request Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-required machine-id tctl tctl - Teleport admin tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants