[kots]: configure a log collector for ephemeral containers #10679

Merged: 2 commits merged into main from sje/kots-log-collector on Jun 23, 2022

mrsimonemms
Contributor

@mrsimonemms mrsimonemms commented Jun 15, 2022

Description

This is in KOTS for two reasons - simplicity, and because there's no namespace defined in the Fluent Bit chart, so it always goes to the default namespace. We can put it into the Installer if we want, but I think that's a 🚲 not a 🛹, as we'd need to consider how it would affect SaaS.

Add a Fluent Bit Helm chart to the installation and configure it to listen for logs as they are created. It then stores the logs on the node under /gitpod/log-collector (a second directory level is used in case we need to store anything else there in future). We can then use the KOTS log selector to pull these into the support bundle (included).

As the only logs copied are from the Installer, image builders and workspaces, this shouldn't ever get so big as to need removal/rotation - in any case, Fluent Bit doesn't support this, so I would imagine it's an edge case anyway.

When a user generates a support bundle, this then pulls these files in off every node (this is a single-node instance, but I've tried it on multi-node instances and they're put in a separate folder under /log-collector/gitpod/fluent-bit-xxxxx).

This is the support bundle page in the KOTS dashboard:

image

Related Issue(s)

Fixes #10399

How to test

Deploy via KOTS and raise a support bundle.

Release Notes

[kots]: configure a log collector for ephemeral containers

Documentation

@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.4 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.5 because the annotations in the pull request description changed
(with .werft/ from main)

@mrsimonemms mrsimonemms marked this pull request as ready for review June 16, 2022 15:26
@mrsimonemms mrsimonemms requested a review from a team June 16, 2022 15:26
@github-actions github-actions bot added the team: delivery Issue belongs to the self-hosted team label Jun 16, 2022
@mrsimonemms mrsimonemms marked this pull request as draft June 16, 2022 15:28
@mrsimonemms mrsimonemms changed the title [kots]: configure a log collector [kots]: configure a log collector for ephemeral containers Jun 16, 2022
@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.7 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.8 because the annotations in the pull request description changed
(with .werft/ from main)

@mrsimonemms mrsimonemms marked this pull request as ready for review June 16, 2022 21:41
@mrsimonemms mrsimonemms marked this pull request as draft June 16, 2022 21:50
@mrsimonemms mrsimonemms changed the title [kots]: configure a log collector for ephemeral containers [installer]: configure a log collector for ephemeral containers Jun 16, 2022
@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.9 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.10 because the annotations in the pull request description changed
(with .werft/ from main)

@mrsimonemms mrsimonemms changed the title [installer]: configure a log collector for ephemeral containers [kots]: configure a log collector for ephemeral containers Jun 16, 2022
@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.11 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.12 because the annotations in the pull request description changed
(with .werft/ from main)

@mrsimonemms mrsimonemms marked this pull request as ready for review June 16, 2022 22:08
@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.13 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-sje-kots-log-collector.14 because the annotations in the pull request description changed
(with .werft/ from main)

@adrienthebo
Contributor

adrienthebo commented Jun 20, 2022

/werft run no-preview publish-to-kots

👍 started the job as gitpod-build-sje-kots-log-collector.15
(with .werft/ from main)

@mrsimonemms
Contributor Author

mrsimonemms commented Jun 21, 2022

/werft run no-preview publish-to-kots

👍 started the job as gitpod-build-sje-kots-log-collector.17
(with .werft/ from main)

@mrsimonemms
Contributor Author

mrsimonemms commented Jun 21, 2022

/werft run no-preview

👍 started the job as gitpod-build-sje-kots-log-collector.18
(with .werft/ from main)

@corneliusludmann
Contributor

corneliusludmann commented Jun 22, 2022

/werft run no-preview publish-to-kots

👍 started the job as gitpod-build-sje-kots-log-collector.19
(with .werft/ from main)

@mrsimonemms
Contributor Author

/werft run no-preview publish-to-kots

+1 started the job as gitpod-build-sje-kots-log-collector.19 (with .werft/ from main)

This will fail because of a change in this PR for Werft - suggest either finding the command in Notion for making it use the .werft folder in this branch or running make create_dev_release in /install/kots

@corneliusludmann
Contributor

werft run github -j .werft/build.yaml -a no-preview=true -a publish-to-kots=true

did the trick

@adrienthebo
Contributor

adrienthebo commented Jun 22, 2022

I've run into some errors while verifying this. It looks like the /gitpod/log-collector path isn't being fully created:

❯ tar -O - -xf support-bundle-2022-06-22T13_58_13.tar.gz support-bundle-2022-06-22T13_58_13/log-collector/gitpod/fluent-bit-f7lb2/kots/gitpod/log-collector-errors.json
{
  "/gitpod/log-collector/error": "failed to stream command output: command terminated with exit code 1",
  "/gitpod/log-collector/stderr": "tar: log-collector: No such file or directory\ntar: error exit delayed from previous errors\n"
}

I manually created /gitpod/log-collector on my cluster and re-ran the installer and was able to remove this error.

gcloud compute instances list | sed -e '1d' | rargs gcloud compute ssh '{1}' --zone '{2}' -- 'sudo mkdir -p /gitpod/log-collector; sudo chmod 1777 /gitpod/log-collector'

I see that we have mkdir enabled for these log outputs, so I'm pretty puzzled as to why this is failing. I'm running this on a cluster based on the gitpod-gke-guide, where are you running it?
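
To check every node's collector result in one pass, rather than extracting a single file by its full path, something like the following could be used. This is only a sketch: the function name `show_collector_errors` is hypothetical, and it assumes the bundle keeps the `log-collector-errors.json` naming seen in the error report above.

```shell
# Sketch: print every log-collector-errors.json found in a support bundle.
# Assumption: error files follow the naming seen in the report above.
show_collector_errors() {
  bundle=$1
  # List the archive members, keep only the collector error files,
  # then stream each one to stdout with a header line naming the member.
  tar -tf "$bundle" | grep '/log-collector-errors\.json$' | while IFS= read -r member; do
    echo "== $member"
    tar -xOf "$bundle" "$member"
    echo
  done
}
```

Usage would be along the lines of `show_collector_errors support-bundle-2022-06-22T13_58_13.tar.gz`. The header line includes the fluent-bit pod name, which identifies the node the error came from.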

@corneliusludmann
Contributor

In my 3-node-setup, I see the error @adrienthebo mentioned as well but only on the 2 worker nodes. The main node shows me a few lines of the installer job.

And this makes total sense:

  • I haven't started any workspaces yet. That's why the log-collector couldn't collect any logs on the worker nodes yet.
  • Unfortunately, I don't see the beginning of the installer job on the main node. I guess it's because the installation started before the log-collector was up.

After another re-install (config change), I see the whole installer logs. After starting a workspace, I see the workspace logs. For me, this PR looks good.

FYI: Tested this on an air-gap setup just to make sure it works there as well. It does.

Contributor

@corneliusludmann corneliusludmann left a comment

Tested in an air-gap setup and it works pretty well. Adding hold to give @adrienthebo the chance to react to the issue above. Feel free to remove the hold label if everything looks good.

/hold

@mrsimonemms
Contributor Author

My feeling is that this may not work in every scenario but, so long as it doesn't cause catastrophic failures for users, we can iterate on those problems - my understanding of support bundles is that they can fail and just report the error, rather than causing any problems.

@adrienthebo I wonder if it's a permissions issue when creating the underlying directory. I tried this on an Azure AKS instance and that seemed to work ok, so it may be a GCP-specific problem

@adrienthebo
Contributor

Thanks @corneliusludmann @mrsimonemms, running an image build/workspace got things working for me!

tar -tf support-bundle-2022-06-23T08_12_39.tar.gz|grep -i log-collector
support-bundle-2022-06-23T08_12_39/log-collector/gitpod/fluent-bit-bw2gp/kots/gitpod/log-collector-errors.json
support-bundle-2022-06-23T08_12_39/log-collector/gitpod/fluent-bit-gv8k9/kots/gitpod/log-collector/imagebuild-5663cdb6-1363-442b-bc64-67859038fffa.workspace
support-bundle-2022-06-23T08_12_39/log-collector/gitpod/fluent-bit-m59lp/kots/gitpod/log-collector-errors.json
support-bundle-2022-06-23T08_12_39/log-collector/gitpod/fluent-bit-8c5k8/kots/gitpod/log-collector-errors.json
support-bundle-2022-06-23T08_12_39/log-collector/gitpod/fluent-bit-xfptv/kots/gitpod/log-collector/imagebuild-421ff17f-df10-4d15-bb2b-f61f1124f1ab.workspace
support-bundle-2022-06-23T08_12_39/log-collector/gitpod/fluent-bit-4jwl9/kots/gitpod/log-collector-errors.json
support-bundle-2022-06-23T08_12_39/log-collector/gitpod/fluent-bit-gv8k9/kots/gitpod/log-collector/ws-f5888ebf-533d-4f27-9ad2-b9af12f563c8.workspace
support-bundle-2022-06-23T08_12_39/log-collector/gitpod/fluent-bit-xfptv/kots/gitpod/log-collector/ws-1d65b1e5-002e-4290-98bd-4ebd197c5f6d.workspace
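
A listing like the one above can be condensed into a per-pod summary to show which nodes have collected logs and which still report only the error file. The sketch below assumes the path layout shown in that listing (`<bundle>/log-collector/gitpod/fluent-bit-xxxxx/kots/gitpod/...`) and that log files end in `.workspace`; the function name `summarise_log_collector` is hypothetical.

```shell
# Sketch: summarise, per fluent-bit pod, how many log files and error files
# a support bundle contains. Reads a `tar -tf` listing on stdin.
# Assumption: paths follow the layout in the listing above.
summarise_log_collector() {
  awk -F/ '
    # Field 2 is the collector name, field 4 is the fluent-bit pod name.
    $2 == "log-collector" && $4 ~ /^fluent-bit-/ {
      pod = $4
      seen[pod] = 1
      if ($NF == "log-collector-errors.json") errors[pod]++
      else if ($NF ~ /\.workspace$/) logs[pod]++
    }
    END {
      for (pod in seen)
        printf "%s logs=%d errors=%d\n", pod, logs[pod], errors[pod]
    }
  ' | sort
}
```

It could be run as `tar -tf support-bundle-2022-06-23T08_12_39.tar.gz | summarise_log_collector`. A pod with `logs=0 errors=1` would match the case described earlier in this thread, where a node has not yet run a workspace or image build.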

@corneliusludmann
Contributor

/hold (due to build error)

@corneliusludmann
Contributor

corneliusludmann commented Jun 23, 2022

/werft run no-preview=true publish-to-kots=false

👍 started the job as gitpod-build-sje-kots-log-collector.21
(with .werft/ from main)

@corneliusludmann
Contributor

corneliusludmann commented Jun 23, 2022

/werft run no-preview

👍 started the job as gitpod-build-sje-kots-log-collector.22
(with .werft/ from main)

@corneliusludmann
Contributor

/unhold

@roboquat roboquat merged commit e1dea35 into main Jun 23, 2022
@roboquat roboquat deleted the sje/kots-log-collector branch June 23, 2022 16:05
Labels
release-note size/L team: delivery Issue belongs to the self-hosted team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support bundle to include logs from ephemeral containers
4 participants