This repository has been archived by the owner on Jul 23, 2020. It is now read-only.

Providing che.openshift.io status and monitoring tools #4730

Open
slemeur opened this issue Jan 24, 2019 · 0 comments

Goals

The idea of this epic is to improve the different mechanisms we have in place to report the state of che.openshift.io. We would add solutions to monitor the state, report the metrics, and expose them in a way that could be leveraged to provide a better user experience.

Example: When we detect that the platform is having trouble starting workspaces, we should inform the user that their workspace might take longer than usual to get ready.

User Story 1: As a user, I should be able to know the status of the platform

Today, we already measure some aspects of how the platform is behaving:

  • Workspace startup time
  • PVC mount time with an empty PVC
  • PVC mount time with a big PVC

These metrics are currently not exposed, and not everybody can access them.

With those metrics, we want to set up the basis for a status page and our monitoring tools:

  • Have an agent that runs the tests at a defined interval
  • Expose the metrics in the Prometheus format
  • Add a status page that displays the information (like status.io)
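
As a rough illustration of the "expose the metrics in the Prometheus format" step, here is a minimal sketch that renders probe results in the Prometheus text exposition format. The metric names, the `HELP` table, and `render_prometheus` are hypothetical and not existing Che code; a real agent would more likely rely on an official Prometheus client library.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical metric names and help strings; the issue does not define them.
HELP: Dict[str, str] = {
    "che_workspace_startup_seconds": "Time to start a workspace.",
    "che_pvc_mount_seconds": "Time to mount a PVC.",
}

Sample = Tuple[str, Mapping[str, str], float]


def _escape(value: str) -> str:
    # Label values must escape backslashes, double quotes, and newlines.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_prometheus(samples: List[Sample]) -> str:
    """Render (name, labels, value) samples as Prometheus text exposition."""
    # Group samples by metric name so each metric forms one contiguous block.
    grouped: Dict[str, List[Tuple[Mapping[str, str], float]]] = {}
    for name, labels, value in samples:
        grouped.setdefault(name, []).append((labels, value))

    lines: List[str] = []
    for name, entries in grouped.items():
        lines.append(f"# HELP {name} {HELP.get(name, name)}")
        lines.append(f"# TYPE {name} gauge")
        for labels, value in entries:
            pairs = ",".join(
                f'{key}="{_escape(val)}"' for key, val in sorted(labels.items())
            )
            suffix = f"{{{pairs}}}" if pairs else ""
            lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"
```

The agent would call `render_prometheus` with the latest probe results and serve the returned text over HTTP for Prometheus to scrape.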

Many online services provide information about the state of their platform:

User Story 2: As an admin, or ops of the system, I should be notified/alerted when the platform is not behaving properly.

Once we have the metrics reported in Prometheus format and the status available to end users, we need to put an alerting system in place so that when something goes wrong with the platform, the information is reported to the right people.
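
To make the alerting idea concrete, here is a minimal sketch of one possible rule: alert only when several consecutive measurements exceed a threshold, so a single slow run does not page anyone. The function name, the threshold, and the number of consecutive samples are assumptions for illustration; in practice this logic would probably live in the alerting rules of the monitoring stack rather than in custom code.

```python
from typing import Sequence


def should_alert(
    samples: Sequence[float], threshold: float, consecutive: int = 3
) -> bool:
    """Return True when the last `consecutive` samples all exceed `threshold`."""
    if consecutive <= 0 or len(samples) < consecutive:
        # Not enough data to decide; stay quiet rather than raise a false alarm.
        return False
    return all(sample > threshold for sample in samples[-consecutive:])
```

For example, with a hypothetical 60-second threshold on workspace startup time, `should_alert([10, 70, 80, 90], 60)` would fire, while `should_alert([10, 70, 50, 90], 60)` would not.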

User Story 3: As an ops or admin of the platform, I'd like to get more insights about how the platform is behaving.

Once the basics are set up, we would enrich the set of metrics we track:

  • time to pull images
  • time to pull images that are already cached
  • time to create routes
  • time to clone a repository
  • time spent initializing a Language Server
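
One way the agent could collect these per-step timings is a small timing helper wrapped around each step. This is only a sketch: `timed`, the `results` dictionary, and the metric name in the usage note are hypothetical names, not part of Che.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def timed(
    name: str,
    results: Dict[str, float],
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[None]:
    """Record the elapsed seconds of the wrapped block under `name`."""
    start = clock()
    try:
        yield
    finally:
        # Record the duration even if the step fails, so slow failures show up.
        results[name] = clock() - start
```

A probe would then wrap each step, for example `with timed("clone_repository_seconds", results): ...`, and report the contents of `results` alongside the existing metrics.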

User Story 4: As a user of the product, I should be notified in my environment if something is misbehaving on the platform.

User Story 5: As someone deploying Che, I'd like to benefit from this tooling.

We should be able to provide those tools to anyone who sets up Che on their own.

User Story 6: As a user, I want to get in-context feedback about the state of the platform.

There are multiple places where we could provide information about the state of the platform:

  • When starting the workspace, if the state of the platform does not allow a fast workspace start, we should display a message such as "The platform is currently under load, your workspace may take longer than usual to get ready."
  • When in the IDE, we could have a small status widget in the status bar that shows different indicators about the state of the platform
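
The workspace-start message could be driven by the startup-time metric. The sketch below assumes a simple rule, namely that the median of recent startup times exceeds the usual time by some factor; the function name, the factor, and the rule itself are illustrative assumptions, not an agreed design.

```python
import statistics
from typing import Optional, Sequence

DEGRADED_MESSAGE = (
    "The platform is currently under load, your workspace may take "
    "longer than usual to get ready."
)


def startup_notice(
    recent_startup_seconds: Sequence[float],
    usual_seconds: float,
    factor: float = 2.0,
) -> Optional[str]:
    """Return the message to show the user, or None if startup looks normal."""
    if not recent_startup_seconds:
        # No recent measurements: do not warn the user without evidence.
        return None
    recent = statistics.median(recent_startup_seconds)
    return DEGRADED_MESSAGE if recent > usual_seconds * factor else None
```

The same check could feed the status widget in the IDE, which would show a warning indicator whenever `startup_notice` returns a message.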