[RFE] Display problems from pods #2919

Closed
marusak opened this issue Mar 27, 2018 · 8 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

marusak commented Mar 27, 2018

We on the ABRT team focus on catching, processing, and reporting problems. So far our main focus has been servers and workstations. Some time ago we started supporting the capture of core-dump problems from containers, and we have now shown that we can catch exceptions from interpreted languages running in containers as well [1]. That means we can now give a full picture of what is misbehaving in a container. A universal tool was created for that purpose [2].

With that done, we want to surface this information in a more discoverable way. One step, aimed mainly at servers, was the integration into Cockpit [3]. Now we want to continue and help make OpenShift even better.
Therefore I propose this RFE. It could look something like this:
(Mockup screenshots: origin_s1, origin_s2, origin_s3)

What do you think?

[1] http://post-office.corp.redhat.com/archives/aos-devel/2018-February/msg00402.html
[2] https://github.com/abrt/container-exception-logger
[3] https://abrt.github.io/abrt/cockpit/2017/06/29/ABRT-in-cockpit/

spadgett (Member) commented

Are the problems captured as events in Kubernetes? Or some other way? I think that would change how we present them. It's not clear to me from the mocks since sometimes they're displayed with events and sometimes displayed separately.

@openshift/team-ux-review

marusak commented Mar 27, 2018

The problems are available in the logs. Other tools (for example node-problem-detector, ABRT, etc.) can parse these logs. I had the same approach in mind for the console.

The mock-up with events was just an idea; I am not sure how those work. I am fairly confident the problems can be displayed on pods as shown in mock-ups 2 and 3.
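To make the log-parsing idea concrete, here is a minimal sketch using the official Python kubernetes client. The one-line JSON problem record is a hypothetical format chosen for illustration, not what container-exception-logger actually emits:

```python
import json

from kubernetes import client, config


def find_problems(namespace, pod_name):
    """Scan a pod's log for problem records left by an exception logger.

    Assumes each problem is a single JSON line with a "type" field set to
    "problem" -- a hypothetical format used for this sketch only.
    """
    config.load_kube_config()  # or load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    log = v1.read_namespaced_pod_log(name=pod_name, namespace=namespace)
    problems = []
    for line in log.splitlines():
        try:
            record = json.loads(line)
        except ValueError:
            continue  # ordinary log line, not a problem record
        if isinstance(record, dict) and record.get("type") == "problem":
            problems.append(record)
    return problems
```

The console could run the same kind of scan over the pod log it already fetches and surface the matches on the pod detail page.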

ncameronbritt commented

I'm not sure what the correct level to surface this kind of information is. We have logs for the pods, so that level makes sense to me. How would users be made aware of these problems, or how would they know they need to look at their pods?

Currently events and warnings are more at the platform level, and a user can take action through the platform to do something about these problems. That doesn't seem to be the case if the problem is with the code running inside of my container. From OpenShift's perspective, everything could be fine--the container is running. Given that, surfacing every individual problem as an event does not seem like the right level. But maybe aggregating errors in an event that says something like "there are runtime errors in pod-x" could make sense?
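For illustration, here is a minimal sketch of what such an aggregated event could look like with the Python kubernetes client; the reason and message strings are made up for the example, not a proposed final design:

```python
from kubernetes import client, config


def emit_runtime_errors_event(namespace, pod):
    """Create one aggregated Warning event for a problematic pod.

    Note: V1Event is named CoreV1Event in newer releases of the client.
    """
    config.load_kube_config()
    v1 = client.CoreV1Api()
    event = client.V1Event(
        metadata=client.V1ObjectMeta(generate_name=f"{pod.metadata.name}."),
        involved_object=client.V1ObjectReference(
            kind="Pod",
            name=pod.metadata.name,
            namespace=namespace,
            uid=pod.metadata.uid,
        ),
        reason="RuntimeErrors",  # illustrative reason string
        message=f"There are runtime errors in pod {pod.metadata.name}",
        type="Warning",
        count=1,
    )
    v1.create_namespaced_event(namespace=namespace, body=event)
```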

Is there a need for different levels/views of monitoring, something like application and platform, depending on your persona, or what you're interested in?

beanh66 commented Mar 29, 2018

@marusak I'm also curious if you have seen the notification drawer or interacted with it at all? Right now the drawer is not configurable but I wonder if it would be possible in the future to allow users to configure the types of events they want to be notified about.

(Screenshot: the notification drawer, 2018-03-29)

marusak commented Mar 31, 2018

Hi @ncameronbritt

> I'm not sure what the correct level to surface this kind of information is. We have logs for the pods, so that level makes sense to me.

Agreed. Pods are the correct place to show this, no doubt about that.

> How would users be made aware of these problems, or how would they know they need to look at their pods?

Great question. I thought about that and came up with showing it in Monitoring, both in events and next to the pod name (a warning triangle indicating that something is wrong; see the first mockup). However, I am not very satisfied with that; I was only sketching ideas. And as you explained, events do not seem to be the correct place.

> But maybe aggregating errors in an event that says something like "there are runtime errors in pod-x" could make sense?

Possibly, but it would only make sense to have one event per problematic pod, and even that can still be a lot.
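One way to keep that bounded (a sketch under the same assumptions as the event example above): look up the pod's existing RuntimeErrors event and bump its count instead of creating another one, so each problematic pod contributes at most one event:

```python
from kubernetes import client, config


def bump_runtime_errors_event(namespace, pod_name):
    """Increment the count on a pod's existing RuntimeErrors event.

    Returns False when no such event exists yet, in which case the
    caller would create one (as sketched earlier in this thread).
    """
    config.load_kube_config()
    v1 = client.CoreV1Api()
    selector = (f"involvedObject.kind=Pod,"
                f"involvedObject.name={pod_name},reason=RuntimeErrors")
    events = v1.list_namespaced_event(namespace, field_selector=selector).items
    if not events:
        return False  # nothing to bump; create the event instead
    ev = events[0]
    v1.patch_namespaced_event(
        ev.metadata.name, namespace, {"count": (ev.count or 1) + 1})
    return True
```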

> Is there a need for different levels/views of monitoring, something like application and platform, depending on your persona, or what you're interested in?

I would rather see this integrated into the existing environment. It is hard to imagine an admin who scrolls through separate monitoring views in their free time. I think the best solution would be one tab in pods as suggested, then possibly a warning triangle on the pods overview, plus notifying the user somehow. I really like the notification drawer that @beanh66 suggested; I was not aware of it, and I believe it is a more suitable place for this than events.

marusak commented Apr 9, 2018

ping @spadgett @ncameronbritt Do you agree with my last comment? Can I start working on this?

ncameronbritt commented

IMO this should be discussed at a higher level in terms of how/whether/where this feature gets integrated into the console, particularly as we look to integrate with the Tectonic console. Thoughts, @jwforres?

openshift-bot commented

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Aug 20, 2020