Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: add support for journal-gatewayd #967

Merged
merged 4 commits into from
Jan 11, 2019
Merged

*: add support for journal-gatewayd #967

merged 4 commits into from
Jan 11, 2019

Conversation

crawford
Copy link
Contributor

@crawford crawford commented Dec 20, 2018

This allows the user to stream logs from the bootstrap node without the
need for SSH. This read-only view into the bootstrap process is
convenient for those who wish not to use SSH in their cluster. Right
now, this requires the user to manually invoke curl to fetch the logs,
but this could be done automatically by the installer in the future.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 20, 2018
@@ -234,23 +234,68 @@ func (a *Bootstrap) addSystemdUnits(uri string, templateData *bootstrapTemplateD
}

for _, childInfo := range children {
name := childInfo.Name()
file, err := data.Assets.Open(path.Join(uri, name))
dir := path.Join(uri, childInfo.Name())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a unit test to test this part of the code?

[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-journal-gatewayd \
--key=/opt/openshift/tls/journal-gatewayd-ca.key \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we are using the same set of cert/key pair for server,client-authn and client-authz can we drop the -ca

@@ -357,6 +359,10 @@ func (a *Bootstrap) addParentFiles(dependencies asset.Parents) {
dependencies.Get(asset)
a.Config.Storage.Files = append(a.Config.Storage.Files, ignition.FilesFromAsset(rootDir, 0600, asset)...)
}

journalCertKey := &tls.JournalCA{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels out of place in addSystemdUnits it seems like it belongs here 907a22d#diff-2d9b655c8052ca2e3bdd9004778c75bcR357

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't go there because the key and cert need to be readable by journal-gatewayd (which doesn't run as root). I should probably also change the owner of these files... I hope that works on RHCOS...

@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 4, 2019
@wking
Copy link
Member

wking commented Jan 4, 2019

Can you pull 0bc0117 out into it's own PRs (and maybe split the auto-generated SVG rebuild into a separate commit?)? Those look like orthogonal changes that we can land without further discussion (once CI gets working again).

unit := igntypes.Unit{Name: name, Contents: string(data)}
if _, ok := enabled[name]; ok {
unit.Enabled = util.BoolToPtr(true)
if info.IsDir() {
Copy link
Member

@wking wking Jan 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a check here that says "if the directory name doesn't end with .d we ignore it" (possibly with a debug-level log)? From the systemd docs, it looks like they require all drop-in directories to have a .d suffix. But I'm not all that familiar with systemd, maybe I'm just misreading the docs.

Copy link
Contributor Author

@crawford crawford Jan 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to add a trace-level log line for that case. Trace should really only be used by developers which seems appropriate for this kind of message. Should I mention the "trace" level in the help output, or just keep that goodie for ourselves?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I mention the "trace" level in the help output, or just keep that goodie for ourselves?

I'd mention it. Users should recognize it as "the level after debug" even if they're non-devs, so I don't expect it will be too distracting.

journalctl --unit=bootkube.service
```
1. If SSH is available, the following command can be run on the bootstrap node: `journalctl --unit=bootkube.service`
2. Regardless of whether or not SSH is available, the following command can be run: `curl --insecure --cert ${INSTALL_DIR}/tls/journal-gatewayd-client.crt --key ${INSTALL_DIR}/tls/journal-gatewayd-client.key 'https://${BOOTSTRAP_IP}:19531/entries?follow&_SYSTEMD_UNIT=bootkube.service'`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the installer connect to this automatically and stream it into the installer logs (like we do for Terraform) instead of polling for the API? I'm fine leaving that to follow-up work, but I don't think we need to document this manual approach if we're planning on adding a built-in approach in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. I'd prefer to always stream these logs, but I'm not sure how to handle the case where a network connection is not available. Do we just optimistically stream them and ignore it otherwise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but I'm not sure how to handle the case where a network connection is not available.

We currently rely on network connections to the load balancers for our Kubernetes API and event waits. When those work, I expect this access to the bootstrap journal will usually work as well. When it doesn't (access from inside a restrictive firewall?), I'm fine logging a warning and falling back to the Kubernetes API wait.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I once wrote a simple package to talk to gatewayd, in case you ever do want to do that: coreos/go-systemd#86

Also: Having an installer sub-command to that does the equivalent curl would be pretty nice.

@crawford
Copy link
Contributor Author

crawford commented Jan 4, 2019

Can you pull 0bc0117 out into it's own PRs

@wking done in #993.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 4, 2019
@crawford crawford added this to the Freeze milestone Jan 10, 2019
@crawford crawford changed the title WIP: add support for journal-gatewayd *: add support for journal-gatewayd Jan 10, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 10, 2019
@crawford
Copy link
Contributor Author

Let's see if the registry gateway issue has been resolved...
/retest

@@ -205,7 +208,7 @@ func (a *Bootstrap) addStorageFiles(base string, uri string, templateData *boots
} else {
mode = 0600
}
ign := ignition.FileFromBytes(strings.TrimSuffix(base, ".template"), mode, data)
ign := ignition.FileFromBytes(strings.TrimSuffix(base, ".template"), "root", mode, data)
if filename == ".bash_history" {
ign.User = &igntypes.NodeUser{Name: ignitionUser}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could set a variable for the user name and override for .bash_history before feeding into FileFromBytes. But certainly not a blocker; we can always do something like that later if we want ;).

@crawford
Copy link
Contributor Author

/retest

@crawford
Copy link
Contributor Author

2019/01/11 05:24:22 Build installer succeeded after 30m50s
2019/01/11 05:24:23 Ran for 37m45s
error: could not run steps: could not copy stable imagestreamtag: Timeout: request did not complete within allowed duration

That's not one I've seen before, but it sounds related to the recent registry failures we've seen.

/retest

@sttts
Copy link
Contributor

sttts commented Jan 11, 2019

This looks awesome!

@crawford
Copy link
Contributor Author

level=error msg="Error: Error applying plan:"
level=error
level=error msg="1 error occurred:"
level=error msg="\t* aws_route53_record.etcd_a_nodes[2]: 1 error occurred:"
level=error msg="\t* aws_route53_record.etcd_a_nodes.2: [ERR]: Error building changeset: timeout while waiting for state to become 'accepted' (timeout: 5m0s)"

/retest

This allows systemd drop-ins to be included in the bootstrap data
directory. The drop-ins must be located under an appropriately-named
directory, the same way that systemd requires on the host.
This gives the ability to specify the owner of the files. So far, only
the root user is used, but support for other users will be needed in
order to support journal-gatewayd. This service doesn't run as root, so
in order to read its TLS assets, they either need to be world-readable
or they need to be owned by the correct user.
The key for the root CA should not end up in the cluster. This is most
easily prevented by ensuring that it doesn't end up on the bootstrap
node at all. This change also makes the root CA certificate
world-readable.
This allows the user to stream logs from the bootstrap node without the
need for SSH. This read-only view into the bootstrap process is
convenient for those who wish not to use SSH in their cluster. Right
now, this requires the user to manually invoke curl to fetch the logs,
but this could be done automatically by the installer in the future.
@crawford
Copy link
Contributor Author

Rebased onto master. @wking can I get another one of those /lgtms?

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,crawford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 7880721 into openshift:master Jan 11, 2019
@crawford crawford deleted the gatewayd branch January 11, 2019 23:33
@wking
Copy link
Member

wking commented Jan 12, 2019

@sferich888, there were some troubleshooting-doc changes here.

@@ -0,0 +1,6 @@
[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-journal-gatewayd \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this service defined? Where is it's source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this service defined? Where is it's source.

The service this is a drop-in for? I'd guess in a package RHCOS pulls in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was offered by the kubelet? And it's debug port.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is journal-gatewayd, which is included (at my request) in RHCOS.

@@ -0,0 +1,6 @@
[Service]
ExecStart=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be dropped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All systemd directives act as lists and each entry adds another item to the list. Simple services (like journal-gatewayd) can only have a single item for ExecStart, so we have to clear the list with an empty entry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants