[Onboarding] k8s quickstart flow #186380

@mykolaharmash mykolaharmash commented Jun 18, 2024

Depends on: elastic/elastic-agent#4754
Depends on: #186106
Closes: #182407

Summary

Adds a Kubernetes onboarding quick start flow using kubectl kustomize command.
[Screenshot: CleanShot 2024-06-18 at 15 10 27@2x]

How to test

  1. Run Kibana and ES locally. Make sure to expose ES on 0.0.0.0 so that Elastic Agent can reach it from within a container. I use this command: `yarn es snapshot --license trial -E xpack.security.authc.api_key.enabled=true -E http.host=0.0.0.0`
  2. Set up a test cluster with minikube
  3. Open Kibana and navigate to the Onboarding screen
  4. Make sure the Kubernetes quick start card is visible under the infrastructure category and click on it
  5. Copy the command snippet
  6. Paste the command into a terminal, but don't run it yet
  7. Replace `localhost` in the command with your local IP (`ipconfig getifaddr en0`)
  8. If elastic/elastic-agent#4754 has not been merged yet, you also need to clone the elastic-agent repo and replace the template URL with a local path to the `elastic-agent-kustomize/default/elastic-agent-standalone` folder.
  9. Run the command and make sure all resources were created
  10. Go back to Kibana. After ~1 minute the UI should detect that the data was ingested
  11. Click on the cluster overview link and make sure it works
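
Step 7 can be scripted instead of edited by hand. The following is a minimal TypeScript sketch and not part of this PR: `replaceLocalhost` is a hypothetical helper, and the sample command string and IP address are made up for illustration.

```typescript
// Hypothetical helper (not from the PR): swaps every standalone `localhost`
// in a copied command for a host that is reachable from inside the cluster.
function replaceLocalhost(command: string, reachableHost: string): string {
  // \b leaves words that merely contain "localhost" (e.g. "mylocalhostname") untouched.
  return command.replace(/\blocalhost\b/g, reachableHost);
}

// Illustrative input only; the real snippet generated by Kibana looks different.
const sampleCommand = 'ES_HOST=http://localhost:9200';
console.log(replaceLocalhost(sampleCommand, '192.168.1.20'));
```

The IP would come from `ipconfig getifaddr en0` (macOS) or an equivalent command on other systems.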

@mykolaharmash mykolaharmash requested review from a team as code owners June 18, 2024 13:24
@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label Jun 18, 2024
@mykolaharmash mykolaharmash marked this pull request as draft June 18, 2024 13:24
@thomheymann thomheymann left a comment

Looking great!

Code review

I like how much you simplified the REST APIs. I've got a couple of questions below about how that ties into the other flows and into non-functional requirements like telemetry, but nothing major.

Let me know if you want to discuss synchronously or do the merging of #186106 together in case there are conflicts.

Testing feedback

I could not get the end-to-end to work. My elastic-agent-standalone pod is in a failed state so there might be an issue with the kustomize template:

root@minikube:/usr/share/elastic-agent# elastic-agent status
State: FAILED
Message: could not create the map from the configuration: missing field accessing 'outputs'
Fleet State: STOPPED
Fleet Message: Not enrolled into Fleet
Components: (none)

UX feedback

I appreciate that, from a technical perspective, we start polling for data as soon as the page is opened. From a user perspective, though, it's confusing to see a loading spinner in step 2 before having even read the first step.

I would suggest hiding the loading spinner until a window.blur event has been detected. The user has to switch to the terminal to run the command, so in the absence of any other signal this should be a good indicator that they have copied and pasted the command and are in the process of running it.
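
The blur-based gating suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code that ended up in the PR: `BlurSource` and `createBlurGate` are hypothetical names, and the narrow interface exists only so the logic can be exercised outside a browser.

```typescript
// Minimal shape of what the sketch needs from `window`.
interface BlurSource {
  addEventListener(
    type: 'blur',
    listener: () => void,
    options?: { once?: boolean }
  ): void;
}

// Hypothetical helper: remembers whether the page has lost focus at least once.
function createBlurGate(source: BlurSource, onFirstBlur?: () => void) {
  let hasBlurred = false;
  source.addEventListener(
    'blur',
    () => {
      hasBlurred = true;
      if (onFirstBlur) {
        onFirstBlur();
      }
    },
    { once: true } // only the first blur matters for this signal
  );
  return {
    // The spinner stays hidden until the user has left the page once.
    shouldShowSpinner: (): boolean => hasBlurred,
  };
}
```

In the onboarding page, the object passed in would presumably be `window`, and polling for data can keep running in the background while only the spinner is held back.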

<EuiLink
  data-test-subj="observabilityOnboardingDataIngestStatusViewDashboardLink"
  href={dashboardLocator?.getRedirectUrl({
    dashboardId: CLUSTER_OVERVIEW_DASHBOARD_ID,
Can this be hard coded or could the ID change?

Contributor Author

At first I implemented logic to search for the dashboard ID by its title, as there seemed to be no other way to reference it reliably. But then I saw that dashboard IDs are hardcoded in a few places in the Fleet plugin. Those IDs seem to be part of the dashboard definition and don't change: in the integration code there are multiple iterations on a single dashboard without the ID being changed, so I assume it's safe to use it on our side as well.

That's interesting, I assumed these need to be loaded in dynamically.

@kpollich Are the installed dashboard IDs returned by Fleet packages static? We need a way of determining which dashboard should be the main one we direct users to, after log data has been successfully ingested, so hard coding an ID for each integration would make that easy.

They are defined via the integration package, so theoretically they could change on package update. Changing this ID would also affect bookmarks etc., so they should normally stay stable.

It feels OK to rely on this, especially if we already do it in other places. If it breaks, we could consider it a bug in the integration. But I'm also interested in @kpollich's opinion here.

Actually, there might be a case where the id is different when multiple spaces are involved - not sure how this case is normally handled...
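
The two approaches discussed in this thread (a hard-coded ID versus a lookup by title) can be combined. The sketch below is only an illustration of that idea: `InstalledDashboard` and `resolveDashboardId` are hypothetical names, the IDs and titles are made up, and no Fleet or Kibana API is used.

```typescript
interface InstalledDashboard {
  id: string;
  title: string;
}

// Hypothetical helper: prefers the hard-coded ID and falls back to a title
// match when that ID is not among the installed dashboards.
function resolveDashboardId(
  hardCodedId: string,
  fallbackTitle: string,
  installed: InstalledDashboard[]
): string | undefined {
  if (installed.some((dashboard) => dashboard.id === hardCodedId)) {
    return hardCodedId;
  }
  const byTitle = installed.filter((dashboard) => dashboard.title === fallbackTitle);
  return byTitle[0]?.id;
}

// Made-up data for illustration.
const installedExample: InstalledDashboard[] = [
  { id: 'example-id-in-other-space', title: 'Example Cluster Overview' },
];
console.log(
  resolveDashboardId('example-hard-coded-id', 'Example Cluster Overview', installedExample)
);
```

The fallback only helps if titles are stable and unique, which this thread does not confirm.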

@mykolaharmash

mykolaharmash commented Jun 19, 2024

> I would suggest hiding the loading spinner until a window.blur event has been detected. The user has to switch to the terminal to run the command, so in the absence of any other signal this should be a good indicator that they have copied and pasted the command and are in the process of running it.

That's a great suggestion, thank you!

@mykolaharmash

> I could not get the end-to-end to work. My elastic-agent-standalone pod is in a failed state so there might be an issue with the kustomize template

Could you please share the kubectl command you ran, and any errors from the agent logs (kubectl logs <pod_id> -n kube-system)?

@mykolaharmash mykolaharmash changed the title from "182407 observability onboarding k8s quickstart flow" to "[Onboarding] k8s quickstart flow" Jun 21, 2024
@mykolaharmash mykolaharmash force-pushed the 182407-observability-onboarding-k8s-quickstart-flow branch from b158a7f to 11cca5f Compare June 24, 2024 13:36
@mykolaharmash mykolaharmash marked this pull request as ready for review June 24, 2024 14:42
@mykolaharmash mykolaharmash added v8.15.0 release_note:skip Skip the PR/issue when compiling release notes labels Jun 24, 2024
@mykolaharmash

@elasticmachine merge upstream

@flash1293

Trying to run this gave me the following error: failed to run '/opt/homebrew/bin/git fetch --depth=1 https://github.com/elastic/elastic-agent 8.14.1': fatal: couldn't find remote ref 8.14.1 : exit status 128

Is there something else we need to do on the kustomize side?

@mykolaharmash mykolaharmash force-pushed the 182407-observability-onboarding-k8s-quickstart-flow branch from fe227ce to dbe4503 Compare July 1, 2024 12:58
@kibana-ci

kibana-ci commented Jul 1, 2024

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| observabilityOnboarding | 215 | 220 | +5 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| observabilityOnboarding | 209.1KB | 215.5KB | +6.4KB |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@flash1293

flash1293 commented Jul 8, 2024

@mykolaharmash I tried to run this from the created serverless instance. It doesn't start shipping data. The container comes up, but it logs this:

Failed to connect to backoff(elasticsearch(https://ceb770a52ae04850888f47ddf966f125.es.eu-west-1.aws.qa.elastic.cloud:443)): Get "https://ceb770a52ae04850888f47ddf966f125.es.eu-west-1.aws.qa.elastic.cloud:443": decode 'ca_trusted_fingerprint': encoding/hex: invalid byte: U+0025 '%'

and also

Error fetching data for metricset kubernetes.controllermanager: error getting metrics: error making http request: Get "https://192.168.49.2:10257/metrics": dial tcp 192.168.49.2:10257: connect: connection refused

Could you give some instructions to test this? I used a local minikube v1.32.0

@flash1293

cc @gizas, maybe you have an idea about these errors. The first one could be specific to serverless, but we still need to figure out what it's about. The second one seems to be some configuration error? kube-state-metrics is running fine.

@flash1293

flash1293 commented Jul 8, 2024

OK, one part of the problem is definitely this:

data:
  ca_trusted: '%CA_TRUSTED%'
  host: 'https://ceb770a52ae04850888f47ddf966f125.es.eu-west-1.aws.qa.elastic.cloud:443'

I ran the kustomize command without applying it and this is in there.

Do we need another search/replace to make this work?

Still not sure about the second error...
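
A leftover placeholder like the one above can be caught before applying the manifest. The following sketch assumes placeholders always follow the `%UPPER_SNAKE_CASE%` pattern seen in this thread; `findUnreplacedPlaceholders` is a hypothetical helper, not something the PR or kustomize provides.

```typescript
// Hypothetical helper: lists every distinct %PLACEHOLDER% still present in a
// rendered manifest. Assumes placeholders look like %UPPER_SNAKE_CASE%.
function findUnreplacedPlaceholders(manifest: string): string[] {
  const matches = manifest.match(/%[A-Z][A-Z0-9_]*%/g) ?? [];
  // Keep only the first occurrence of each placeholder.
  return matches.filter((name, index) => matches.indexOf(name) === index);
}

const renderedSnippet = [
  'data:',
  "  ca_trusted: '%CA_TRUSTED%'",
  "  host: 'https://example.invalid:443'",
].join('\n');

console.log(findUnreplacedPlaceholders(renderedSnippet));
```

Running such a check on the output of `kubectl kustomize` before piping it to `kubectl apply` would surface a missing search/replace early.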

@mykolaharmash

%CA_TRUSTED% is for adding a custom fingerprint to the certificate chain. It is supposed to be used only for cases like running HTTPS on localhost. @gizas would it be possible to optionally add this variable to the config only if it was provided?

@gizas

gizas commented Jul 9, 2024

Hello @flash1293 , @mykolaharmash , I was running some tests locally.

> would it be possible to optionally add this variable to the config only if it was provided?

Unfortunately not, as kustomize does not support conditions or logic to apply something based on checks. What we could do is consider creating another folder with a specific configuration and apply something like kubectl kustomize https://github.com/elastic/elastic-agent/deploy/kubernetes/elastic-agent-kustomize/default/elastic-agent-standalone-withoutca

I don't like this idea much, but if we think a new folder would cover many of our cases, we can consider creating it.

@flash1293

> Do we need another search/replace to make this work?

kubectl kustomize https://github.com/elastic/elastic-agent/deploy/kubernetes/elastic-agent-kustomize/default/elastic-agent-standalone | sed -e "s/JUFQSV9LRVkl/YlV3eVlteHdRVUpEVVZwcGEzQlpTWEZNVFhJNlV6bFdlV05CVDJwUldVTXRRVGR0ZFd3NGJWbDZRUT09/g" -e "s/%ES_HOST%/https:\/\/elasticsearch:9200/g" -e  "/{CA_TRUSTED}/c\ "  | kubectl apply -f-

The "/{CA_TRUSTED}/c\ " removes the matching lines, so basically it removes the ssl.ca_trusted_fingerprint: ${CA_TRUSTED}
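
For readers less familiar with sed, the intent of the three expressions above can be sketched in TypeScript. This is an illustration of the described behavior, not code from the PR: `renderManifest` and `replaceEvery` are hypothetical helpers, and the sketch drops the matching lines outright, as the comment describes. `JUFQSV9LRVkl` is the base64 encoding of the literal string `%API_KEY%`, which is why the sed command searches for it.

```typescript
// Base64 of the literal string "%API_KEY%"; this is what the sed command searches for.
const ENCODED_API_KEY_PLACEHOLDER = 'JUFQSV9LRVkl';

// split/join avoids regex escaping and special `$` patterns in the replacement.
function replaceEvery(text: string, search: string, replacement: string): string {
  return text.split(search).join(replacement);
}

// Hypothetical helper mirroring the three sed expressions:
// 1. swap the encoded API key placeholder, 2. swap %ES_HOST%,
// 3. drop every line that references {CA_TRUSTED}.
function renderManifest(template: string, encodedApiKey: string, esHost: string): string {
  const withoutCaLines = template
    .split('\n')
    .filter((line) => line.indexOf('{CA_TRUSTED}') === -1)
    .join('\n');
  const withApiKey = replaceEvery(withoutCaLines, ENCODED_API_KEY_PLACEHOLDER, encodedApiKey);
  return replaceEvery(withApiKey, '%ES_HOST%', esHost);
}

// Made-up template lines for illustration.
const templateExample = [
  'api_key: JUFQSV9LRVkl',
  "host: '%ES_HOST%'",
  'ssl.ca_trusted_fingerprint: ${CA_TRUSTED}',
].join('\n');

console.log(renderManifest(templateExample, 'ZXhhbXBsZQ==', 'https://elasticsearch:9200'));
```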

> Error fetching data for metricset kubernetes.controllermanager:

As you can see here, we enable the controllermanager dataset in the Elastic Agent standalone manifest. There are clusters where access to the controller manager is not allowed by default and has to be configured. So I suspect that access to the controller manager is not configured in your minikube.

@mykolaharmash, @flash1293 maybe we should also consider removing controllermanager and kube-scheduler from the default standalone manifests? Those data streams are disabled by default in the managed agent policy.

As a workaround for this error for now:

kubectl kustomize https://github.com/elastic/elastic-agent/deploy/kubernetes/elastic-agent-kustomize/default/elastic-agent-standalone > manifest.yaml

Then edit the manifest and remove kubernetes.controllermanager.

@flash1293

> The "/{CA_TRUSTED}/c\ " removes the matching lines, so basically it removes the ssl.ca_trusted_fingerprint: ${CA_TRUSTED}

@mykolaharmash could you look into this?

> As you can see here, we enable the controllermanager dataset in the Elastic Agent standalone manifest. There are clusters where access to the controller manager is not allowed by default and has to be configured. So I suspect that access to the controller manager is not configured in your minikube.

As long as it doesn't break the rest, it might be OK. I don't know how common this problem is, you are the expert here.

@gizas

gizas commented Jul 9, 2024

This error should not break the rest. The remaining data streams should continue receiving metrics and logs.

FYI, see AKS, EKS, and GKE, where the kube-scheduler and kube-controller-manager components are not available. So basically this applies to all major cloud providers.

On the other hand, that is the idea of standalone: users need to adjust it per cluster. We enable everything and they should tune it.
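
The per-cluster tuning described here can be pictured as a simple filter over the enabled datasets. This is a sketch under assumptions, not how the standalone manifest is actually processed: `datasetsForCluster` is a hypothetical helper, `kubernetes.scheduler` is an assumed dataset name for the kube-scheduler metrics, and `kubernetes.pod` is used only as an example of a dataset that stays enabled.

```typescript
// Datasets that need access to control plane components. The scheduler name
// is an assumption; only kubernetes.controllermanager appears in this thread.
const CONTROL_PLANE_DATASETS = ['kubernetes.controllermanager', 'kubernetes.scheduler'];

// Hypothetical helper: drops control plane datasets when those components
// cannot be reached, as on the managed offerings mentioned above.
function datasetsForCluster(enabled: string[], controlPlaneReachable: boolean): string[] {
  if (controlPlaneReachable) {
    return enabled;
  }
  return enabled.filter((dataset) => CONTROL_PLANE_DATASETS.indexOf(dataset) === -1);
}

console.log(
  datasetsForCluster(
    ['kubernetes.pod', 'kubernetes.controllermanager', 'kubernetes.scheduler'],
    false
  )
);
```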

@flash1293

flash1293 commented Jul 9, 2024

Tested again on serverless and it works as expected 🎉

@flash1293

One nit I noticed: When going to the dashboard, it shows data from all connected Kubernetes clusters (if there are multiple). We could pre-filter the dashboard by the onboarding id to only show the data from the current onboarding. Not a blocker though for sure.

@mykolaharmash

@elasticmachine merge upstream

@mykolaharmash mykolaharmash enabled auto-merge (squash) July 10, 2024 07:32
@elasticmachine

elasticmachine commented Jul 10, 2024

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| observabilityOnboarding | 218 | 223 | +5 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| observabilityOnboarding | 234.3KB | 241.2KB | +6.9KB |

History


@flash1293 flash1293 left a comment

LGTM

@mykolaharmash mykolaharmash merged commit 141e619 into elastic:main Jul 10, 2024
23 checks passed
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jul 10, 2024
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit 141e619)
@kibanamachine

💚 All backports created successfully

Status Branch Result
8.15

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

@flash1293 flash1293 removed the v8.15.0 label Jul 10, 2024
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jul 10, 2024
pgayvallet pushed a commit to pgayvallet/kibana that referenced this pull request Jul 11, 2024
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Labels
backport:skip This commit does not require backporting ci:project-deploy-observability Create an Observability project release_note:skip Skip the PR/issue when compiling release notes v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Observability Onboarding] K8s quickstart flow
9 participants