New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] auto-refresh implementation #1459

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
10 participants
@batikanu
Copy link
Contributor

batikanu commented Nov 21, 2016

  • This is a sample auto-refresh implementation for dashboard resources which was requested at #1296
  • Currently it is implemented only for replication controller list and node list but can be used for every other resources and views.
  • It is not complete yet.
  • It requires improvement and discussion before further development if it is needed at all.

Auto-refresh feature is added to desired list or detail page as a directive with parameters of data resource delay and etc. It can be also used as a service but it has some pros and cons.

It is based on polling action in particular intervals for retrieving data from back-end.

Auto-refresh service automatically stops when user changes the states to avoid unnecessary polling unrelated to the selected view. It also breaks if any pop up menu is open to avoid disappearing menus involuntarily on update and it continues polling after the pop up menu closes.

Multiple polling is possible for one view also with different delays.

The graph component wasn't suitable for auto-refresh, therefore I made some small changes there to enable it.

@bryk

This comment has been minimized.

Copy link
Collaborator

bryk commented Nov 21, 2016

Wow, this is cool! Checked the behavior and it works amazingly well.

Let's start a discussion regarding polling versus watching. You proposed polling here, which is nice and easy to implement. But can we do watching for changes (with throttling perhaps)?

This would be much faster and responsive system, as the delay between a change and population into the UI would be instantaneous. Also, the reason I push for watching is that I've seen and worked on UIs that do polling, and it never worked well. It was either spamming backend with requests or having huuge lag in the UI. Watching is harder to implement at first, but user benefit is great.

For watching we could have one-endpoint-per-page to listen to. E.g., RC detail and list pages would have a websocket endpoit /api/v1/replicationcontroller/default/watch that UI can attach to. Whenever backend thinks something has changed, it'd send a new version of the resource there. Backend can use watch endpoints from the apiserver or send new values in intervals (for graphs).

What do you think? cc @rf232 @cheld

@rf232

This comment has been minimized.

Copy link
Contributor

rf232 commented Nov 21, 2016

Let's start a discussion regarding polling versus watching. You proposed
polling here, which is nice and easy to implement. But can we do watching
for changes (with throttling perhaps)?

I agree with the preference for watching, although it can indeed be quite a
PITA to implement at first.

For watching we could have one-endpoint-per-page to listen to. E.g., RC
detail and list pages would have a websocket endpoit
/api/v1/replicationcontroller/default/watch that UI can attach to.
Whenever backend thinks something has changed, it'd send a new version of
the resource there. Backend can use watch endpoints from the apiserver or
send new values in intervals (for graphs).

The graphs I have to think about, but indeed for the more 'vanilla'
resources getting these updates pushed from the backend sounds like the way
to go.

On a more project management kind of view. I rather get this feature in
after we did the 1.5 release because it is quite a big change and might be
a bit tricky to get right in one go.

@cheld

This comment has been minimized.

Copy link
Member

cheld commented Nov 22, 2016

It seems that kubernetes API server supports watching...I didn't know that. So we should try to use this functionality.

@bryk if I understood your proposal correctly, you suggest to watch for changes, but only send a hint to the UI to indicate if it is worthwhile to refresh. Seems a good approach to me.

Of course we do not have to watch for changes for the graphs - we know they update periodically. But it would be good to synchronize the updates of UI with heapster backend. Not sure if this is possible.

BTW: a couple of UX problems arise as indicated by Batikan. e.g.

  • the user selects item (pop up menu open) and item disappears because the item is deleted or 'paginated' to other page
  • the user opens a modal dialog and the page in the background disappears, because item is deleted
  • the user has one resource (e.g. deployment) selected and the resource is deleted.

I assume any solution that is simple and somehow reasonable is good enough...

@bryk

This comment has been minimized.

Copy link
Collaborator

bryk commented Nov 23, 2016

BTW: a couple of UX problems arise as indicated by Batikan. e.g.

the user selects item (pop up menu open) and item disappears because the item is deleted or 'paginated' to other page

the user opens a modal dialog and the page in the background disappears, because item is deleted

I think we can stop autorefresh when a modal or menu is opened. Should prevent from things disappearing at the time you modify them.

the user has one resource (e.g. deployment) selected and the resource is deleted.
I assume any solution that is simple and somehow reasonable is good enough...

Yes, agreed :) I guess when on details page of an object, we'll stop updating it when it is deleted.

@batikanu

This comment has been minimized.

Copy link
Contributor

batikanu commented Nov 23, 2016

@bryk Thank you for the feedback :) The websocket proposal is more promising for me too. I will check how can I implement it in the backend.
@cheld

the user selects item (pop up menu open) and item disappears because the item is deleted or 'paginated' to other page

the user opens a modal dialog and the page in the background disappears, because item is deleted

in the current solution, polling just waits if any menu is open and continues after the menu is closed. So these problems shouldn't occur.

@marcacohen

This comment has been minimized.

Copy link

marcacohen commented Dec 21, 2016

Hello! I'm new to this project (thanks to @ianlewis for pointing me here). The lack of auto-refresh was one thing I noticed right away so was happy to learn about this PR. Using this approach, I've experimented with adding auto-refresh to podlist and job list (commits on forked repo here) and it seems to work quite well. I see above the comments about observing vs. polling and agree that would be ideal. Is anyone actively working on this? (adding support for auto-refresh via backend notifications?) If not, I'd be happy to give it some time but didn't want to dup effort if someone else is already working on it.

@batikanu

This comment has been minimized.

Copy link
Contributor

batikanu commented Dec 21, 2016

Hi @marcacohen welcome. Actually I was working on this. I implemented basic websocket implementation by upgrading existing webservice and using watch for a sample resource. Frontend implementation is missing. Unfortunately, I have another tasks right now to complete therefore I can only work on it in spare time. You can do it much faster I think.

Things to consider:

  • Websockets don't work with browsersync therefore you can only work with production as far as I know. There was a discussion about dropping browsersync but it requires additional effort to make it work because we use it to serve frontend in dev modus. It is also relevant to #1455. I think it should be an additional pull request and has to be solved anyway in my opinion.

  • Current graphs are not refresh-able. I fixed it in this pool request.

  • Opening menus, popups should be taken into attention before making any update in the page to avoid magical disappearing menus or item in the list etc. I implemented a workaround in this pull request to avoid such scenarios.

  • The watches may trigger too many updates from backend. There might be some filter or throttling implementation needed.

I can support you if you need anything.

@marcacohen

This comment has been minimized.

Copy link

marcacohen commented Dec 31, 2016

Hi @batikanu, thanks for the guidance! In reviewing your thoughts (esp. the concerns about testing and throttling), I have the sense that this may turn into a relatively complex undertaking, requiring subtle changes to both FE & BE. I'm wondering if it might be advisable to move ahead with your polling approach for now, to get something, even if suboptimal, working soon.

The way you inhibit polling from all but the active view is relatively efficient, works with some simple client-only changes, and can be easily reused/extended to other pages. The main downside seems to be suboptimal responsiveness but slightly delayed refresh still beats none/manual refresh.

So, I'm thinking let's finish this PR and get some improvements now (I'm happy to run with it if you're too busy) and then take on the more systemic notification approach as a follow on. Wdyt?

(cc @ianlewis)

@batikanu

This comment has been minimized.

Copy link
Contributor

batikanu commented Jan 9, 2017

Hi @marcacohen thanks for the feedback. Sure I can update the pull request to make it able to merge.
What do others think? @bryk @rf232

@bryk

This comment has been minimized.

Copy link
Collaborator

bryk commented Jan 18, 2017

Hi @marcacohen thanks for the feedback. Sure I can update the pull request to make it able to merge.
What do others think? @bryk @rf232

That would be awesome! :)

@jwforres

This comment has been minimized.

Copy link
Member

jwforres commented Jan 25, 2017

websockets are great and we use them almost exclusively in the openshift console, but a few warnings:

  1. IE has a limitation that you can only have 6 websockets open for the same domain. kube today only gives you one websocket per resource type, so if you need to keep an eye on more than 6 resource types, you end up still having to poll some of them. We opted to only fall back to polling some resources and only in IE. We make decisions on what to watch vs poll based on frequency of updates. There is an open kube issue to support a multi-resource watch.

  2. To watch a kube resource it is generally recommended to do a list, get the resourceVersion from that, and then open a watch from that resourceVersion. OpenShift console has a lot of code in place already for handling list then watch, and handling of watch retries. You will need retries because in a lot of environments there is a limited time a connection is allowed to remain open with no data transfer before it will get closed. If you are interested this is our DataService https://github.com/openshift/origin-web-console/blob/master/app/scripts/services/data.js

Another advantage to doing list then watch, if websockets fail to establish for whatever reason, you still have the initial load of data. (3) below is an example where this has been helpful. But also environments where people have proxies or LBs and haven't set them up to work correctly with websockets. Helps to diagnose things to see the List call work and then the Watch fail.

  1. In Safari websockets + self-signed certs is problematic. If you just accept the cert using the cert exception dialog that pops up, it does not extend that acceptance to websocket connections (for some strange reason). You have to have the cert accepted at the system level.
@ianlewis

This comment has been minimized.

Copy link
Contributor

ianlewis commented Jan 26, 2017

I agree that we will want to move to some kind of websocket interface in the future rather than polling. OpenShift's console does pretty well here. That said, I think we should get the current implementation in first given limited bandwidth.

  1. IE has a limitation that you can only have 6 websockets open for the same domain. kube today only gives you one websocket per resource type, so if you need to keep an eye on more than 6 resource types, you end up still having to poll some of them. We opted to only fall back to polling some resources and only in IE. We make decisions on what to watch vs poll based on frequency of updates. There is an open kube issue to support a multi-resource watch.

Dashboard has a server side component and clients don't directly hit the Kubernetes API so we could relatively easily handle the individual watch API websockets on the server side and only need to have 1 websocket going to the client. We could solve the IE issue that way.

  1. In Safari websockets + self-signed certs is problematic. If you just accept the cert using the cert exception dialog that pops up, it does not extend that acceptance to websocket connections (for some strange reason). You have to have the cert accepted at the system level.

This sounds gnarly. I guess that, as a practical matter, Safari won't work with websockets at all without a proper cert then? No terminal etc. Is there anything that can be done to mitigate it?

@jwforres

This comment has been minimized.

Copy link
Member

jwforres commented Jan 26, 2017

@batikanu batikanu force-pushed the batikanu:auto-refresh branch 3 times, most recently from 286dc49 to a9df338 Feb 2, 2017

@batikanu

This comment has been minimized.

Copy link
Contributor

batikanu commented Feb 3, 2017

@marcacohen @bryk
It is rebased now.
I've activated auto-refresh polling for replicationcontrollerlist, replicasetlist, nodelist and podlist. It can be done for other list pages easily.
For detail pages it can be a bit tricky because of reused podlists but I think it is doable with some changes.
Don't forget to remove '::' from metrics parameter assignments in templates like here (metrics="::$ctrl.replicationControllerList.cumulativeMetrics"). Otherwise graphic components can't get model update.
The tests are missing currently.
Let me know if any change is needed.

@olekzabl
Copy link
Contributor

olekzabl left a comment

Hello! I like this PR, it's great that you've done this! :)

Still, I had a few doubts about details (and a few other comments). Feel free to disagree :)

BTW, are you going to provide tests for this?

export default function autoRefreshDirective(kdPoll) {
return {
scope: {},
bindToController: {

This comment has been minimized.

@olekzabl

olekzabl Feb 6, 2017

Contributor

Can you comment on the meaning of parameters? E.g.

  • what is the time unit of "delay"
  • what are the accepted values of "type"
  • maybe also what is the meaning of "source" and "target"

This comment has been minimized.

@batikanu

batikanu Feb 6, 2017

Contributor

Ok I write short desc.

* Initialized from the scope.
* @private {!angular.$resource}
*/
this.source;

This comment has been minimized.

@olekzabl

olekzabl Feb 6, 2017

Contributor

Could you mark private fields with an underscore? (Here and in other files).

This comment has been minimized.

@batikanu

batikanu Feb 6, 2017

Contributor

Sure

* @param delay {number}
* @returns {!angular.$q.Promise}
*/
poll(resource, params, delay) {

This comment has been minimized.

@olekzabl

olekzabl Feb 6, 2017

Contributor

I don't understand how it works. Moreover, I think that I see a scenario how it might break:

  1. Visit the list of Pods.
    Add a new Pod using kubectl.
    So far, this will work - within 10 seconds, the new Pod appears on the page.
  2. Visit the list of ReplicaSets.
    At this point, this.currentOptions regards ReplicaSets, and this.optionList contains also PollOptions regarding Pods.
  3. Visit again the list of Pods. Then, poll() is called again for Pods, and inside that:
    • lines 59-62 are skipped because Pods are already involved in this.optionList.
      In particular, this.currentOptions is still about ReplicaSets.
    • start() is called, and all it does (i.e. lines 87-88) is about ReplicaSets, not Pods (because of this.currentOptions).
      In particular, the promise returned by poll() involves ReplicaSets.
  4. Add a new ReplicaSet (including some Pods) using kubectl.

Result:

  • the new Pods will not show on the page, even after a minute. (I checked this).
  • the page partially breaks, in that the resource graphs disappeared:
    g8ttr3evb5v
    I guess this might be because the list of ReplicaSets sneaked into the page where list of Pods was expected.

This comment has been minimized.

@batikanu

batikanu Feb 6, 2017

Contributor

Thank you for spotting the bug! It wasn't working as intended. I've created new method for pollService to remove all optionsList on state change. It was designed to keep multiple running polls on the same state. In this case it was holding the options from previous state which had to be removed from the list.

(err) => {
options.deferred.reject(err);
// Resource can't retrieve any record. It is not necessary to continue polling anymore.
this.stop();

This comment has been minimized.

@olekzabl

olekzabl Feb 6, 2017

Contributor

This will cancel the interval associated with this.currentOptions.
OTOH, the currently considered options are not necessarily this.currentOptions.
Is this desired?

This comment has been minimized.

@batikanu

batikanu Feb 6, 2017

Contributor

Correct.
In this case the failing options is not necessarily current one. I will modify it to stop and remove the failing one instead of current one.

isRestartInProgress = true;
};
if (isRestartInProgress && !timeout) {
clearTimeout(timeout);

This comment has been minimized.

@olekzabl

olekzabl Feb 6, 2017

Contributor

What's the result of clearTimeout(timeout) when "!timeout" is true?
(Is it legal to make such call?)

This comment has been minimized.

@batikanu

batikanu Feb 6, 2017

Contributor

it is failure thanks.
It should be

let timeout;
// Continue polling if any stopped after any menu is open.
/** @type {function()} */
let menuCloseEvent = scope.$root.$on('$mdMenuClose', () => {
...
if (isRestartInProgress && timeout) {
clearTimeout(timeout);
startDelayForRestart();
return;
}
startDelayForRestart();
});

@@ -34,3 +34,9 @@
</kd-content>
</kd-content-card>
<kd-zero-state ng-if="$ctrl.shouldShowZeroState()"></kd-zero-state>

<kd-auto-refresh source="::$ctrl.replicaSetListResource"

This comment has been minimized.

@olekzabl

olekzabl Feb 6, 2017

Contributor

Another problematic scenario (here, I've no idea why it behaves so):

  1. Open a new Dashboard window; go to the pod list.
  2. Change the namespace to "kube-system".
  3. After 10 seconds, the list of pods will change back to that for namespace "default".
    (However, the namespace selector will still show "kube-system").

This comment has been minimized.

@batikanu

batikanu Feb 6, 2017

Contributor

I can't reproduce it in my version. Maybe related to optionList bug which I've solved in my copy.

This comment has been minimized.

@olekzabl

olekzabl Feb 7, 2017

Contributor

Indeed, now I can't reproduce it as well.

let stateChangeEvent = scope.$root.$on('$stateChangeStart', () => {
kdPoll_.stopAll();
});
scope.$on('$destroy', stateChangeEvent);

This comment has been minimized.

@olekzabl

olekzabl Feb 6, 2017

Contributor

I don't understand what's happening here.
Why are we hitting as high as scope.$root, and doing this as late as on $destroy?
I'm not claiming it's wrong, but could you explain?

This comment has been minimized.

@batikanu

batikanu Feb 6, 2017

Contributor

I can't catch $mdMenuOpen event in scope. I need to remove it on scope.$destroy to avoid triggering in unregistered pages as well.
For $stateChangeStart I will fix it as sope.$on and $destroy event won't be needed for it because it is removed with the scope already.

Any other suggestion for mdMenuOpen ?

@batikanu batikanu force-pushed the batikanu:auto-refresh branch from a9df338 to 3c536db Feb 6, 2017

@batikanu

This comment has been minimized.

Copy link
Contributor

batikanu commented Feb 6, 2017

@olekzabl Thank you for the testing. I made quick fixes. PTAL.
I think I can't deliver tests very soon bcs I work on that only in free time. What about creating a different pull request for the tests?

}, delay);
isRestartInProgress = true;
};
if (isRestartInProgress && timeout) {

This comment has been minimized.

@olekzabl

olekzabl Feb 7, 2017

Contributor

Ok, now I roughly understand how it works :) Let me just ask out of curiosity:

  • Why actually are you using timeout here?
    I mean: what would go wrong with simply ...$on('$mdMenuClose', () => kdPoll_.startAll()) ?
  • In this line, do I think correctly that "&& timeout" is not needed?
    (Whenever isRestartInProgress == true, this is because of line 74. And if so, then timeout is truthy because of line 70, right?)

This comment has been minimized.

@batikanu

batikanu Feb 7, 2017

Contributor

Previously If the user opens and closes menu several times in a row, then it was polling faster than delay parameter. I implemented this to avoid that situation but I realized that later I modified the logic in poll service to avoid this case. I can't reproduce it anymore. So it can be omitted :)

@@ -34,3 +34,9 @@
</kd-content>
</kd-content-card>
<kd-zero-state ng-if="$ctrl.shouldShowZeroState()"></kd-zero-state>

<kd-auto-refresh source="::$ctrl.replicaSetListResource"

This comment has been minimized.

@olekzabl

olekzabl Feb 7, 2017

Contributor

Indeed, now I can't reproduce it as well.


// Cancel all polls in case of state change to avoid accumulated polls.
scope.$on('$stateChangeStart', () => {
kdPoll_.removeAll();

This comment has been minimized.

@olekzabl

olekzabl Feb 7, 2017

Contributor

So this is the place where you fixed the failure scenario, right?
Indeed, now it works for me :)
Still, I have a feeling that the code would be a little safer if you just passed the "options" thing from poll() to start() and stop() as a parameter, rather than introducing "currentOptions" as a de-facto global variable. (I'm fine about "optionsList"). Or am I missing something? Anyway, I won't insist on that.

This comment has been minimized.

@batikanu

batikanu Feb 7, 2017

Contributor

Honestly I can't remember exactly the scenario. It is long time ago but they were intended to be public methods can be called outside without knowing about options. However, I see that they are not used publicly anywhere. I will remove them completely. They can be implemented later in a suitable way according to the need.

This comment has been minimized.

@batikanu

batikanu Feb 7, 2017

Contributor

Yes it was the solution. It also solved namespaces issue I think.

@batikanu batikanu force-pushed the batikanu:auto-refresh branch 2 times, most recently from 9cdab4a to 6e93161 Feb 7, 2017

@batikanu batikanu force-pushed the batikanu:auto-refresh branch from 6e93161 to 1a06bed Feb 7, 2017

@batikanu

This comment has been minimized.

Copy link
Contributor

batikanu commented Feb 7, 2017

@olekzabl thanks for the improvements. PTAL

@olekzabl
Copy link
Contributor

olekzabl left a comment

Looks nice! Let me just click through it once more.

@olekzabl
Copy link
Contributor

olekzabl left a comment

Unfortunately, one more issue, found by @bryk:
In a list of pods, let's have 50 items in total, shown as 10 per page, and navigate to the 2nd or 3rd page. Then, the polling requests sent to the backend still have &page=1.

@batikanu

This comment has been minimized.

Copy link
Contributor

batikanu commented Feb 8, 2017

@olekzabl It is easy to fix it but I've found bigger issue with pagination lists.
If user is in page 2 and all the pods in page 2 are deleted before polling then user gets "There is nothing to display here" page. Other way around, if user has less than 10 pods which means single page and if new pods are created which require 2 pages, pagination arrows are not displayed for navigation. Currently I am thinking about how to update resourcecardlistpagination on each poll. Any ideas are more than welcome :)

@maciaszczykm

This comment has been minimized.

Copy link
Member

maciaszczykm commented Sep 25, 2017

It will be done in a different way. Closing it as stale. We can still check it when needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment