Make single API call to get events and filter them on backend #587

floreks · 2016-03-29T13:53:12Z

This is my proposal to solve issue #443 as discussed with @bryk here: floreks@f050cf3

This change will improve performance of both replication controllers list page and replication controller detail page. Now instead of making N api calls to get pod events there is only 1 API call that gets all events in given namespace and then filters events based on given list of pods.

This should be extended to all places where N api calls are made based on some resource i.e. RCs.

This also fixes #402. When last pod on the list didn't have error, card state was shown as pending.

@maciaszczykm @bryk @cheld PTAL

This change is

codecov-io · 2016-03-29T14:07:17Z

Current coverage is `86.89%`

Merging #587 into master will not affect coverage as of 184f8aa

@@            master    #587   diff @@
======================================
  Files          102     102       
  Stmts          870     870       
  Branches         0       0       
  Methods          0       0       
======================================
  Hit            756     756       
  Partial          0       0       
  Missed         114     114

Review entire Coverage Diff as of 184f8aa

Powered by Codecov. Updated on successful CI builds.

bryk · 2016-03-30T08:59:40Z

Can we have one events list request per page refresh? Right now it is O(number or cards). In normal case it takes long time to get all events. I did the timing:

2016/03/30 10:57:22 Outcoming response to [::1]:42433 with 201 status code
2016/03/30 10:57:35 Incoming HTTP/1.1 GET /api/v1/replicationcontrollers request from [::1]:42434
2016/03/30 10:57:35 Getting list of all replication controllers in the cluster
2016/03/30 10:57:36 Get took 321.305297ms
2016/03/30 10:57:36 Filter took 1.442698ms
2016/03/30 10:57:36 Get took 184.528953ms
2016/03/30 10:57:36 Filter took 1.551455ms
2016/03/30 10:57:36 Get took 129.390919ms
2016/03/30 10:57:36 Filter took 3.112µs
2016/03/30 10:57:36 Get took 129.93973ms
2016/03/30 10:57:36 Filter took 4.145µs
2016/03/30 10:57:37 Get took 130.349689ms
2016/03/30 10:57:37 Filter took 4.43µs
2016/03/30 10:57:37 Get took 129.520676ms
2016/03/30 10:57:37 Filter took 4.229µs
2016/03/30 10:57:37 Get took 129.582782ms
2016/03/30 10:57:37 Filter took 3.962µs

Reviewed 3 of 5 files at r1.
Review status: 3 of 5 files reviewed at latest revision, 1 unresolved discussion.

src/app/backend/eventscommon.go, line 67 [r1] (raw file):
How about filtering by UID rather than the name? event.InvolvedObject.UID exists. It is more precise than the name.

Comments from the review on Reviewable.io

floreks · 2016-03-30T09:33:06Z

It's done this way to narrow number of events that we get for filtering to single namespace. I was tempted to do single call but also it may be dangerous for big clusters with many namespaces. Filtering takes O(n + m) where n is number of pods and m is number of events we need to filter.

For now we can make single call but keep in mind that for some corner cases it may be actually slower.

Review status: 3 of 5 files reviewed at latest revision, 1 unresolved discussion.

src/app/backend/eventscommon.go, line 67 [r1] (raw file):
Sure, that makes sense. I've used name as it's unique for the namespace and I couldn't think of a case where we get events for list of pods located in different namespaces.

Comments from the review on Reviewable.io

bryk · 2016-03-30T09:39:13Z

I think this is slower in typical cases (many valid cards) and faster in corner cases (there's a card with many pods).

In the case of RC list we get events from all namespaces anyway, so we could get them in one call.

Or, how about making one call for each namespace? Instead of a call for each card?

Review status: 3 of 5 files reviewed at latest revision, 1 unresolved discussion.

Comments from the review on Reviewable.io

floreks · 2016-03-30T10:12:41Z

I agree. Ideally we should get events per namespace only once and then use already retrieved data without making API call again.

Review status: 3 of 5 files reviewed at latest revision, 1 unresolved discussion.

Comments from the review on Reviewable.io

floreks · 2016-03-31T09:06:52Z

Review status: 2 of 6 files reviewed at latest revision, 2 unresolved discussions.

src/app/backend/replicationcontrollerlist.go, line 101 [r2] (raw file):
I'm thinking about solution design for this problem. We need some kind of storage to keep events per namespace at least for the time when they are used in single place. Now responsibility of freeing the memory and invalidating 'cached' events is on dev. We could always make some go routine and set timeout to some short period of time to make sure that it will be cleared. Other possibility is to use some 3rd party or built in caching library for go.

Simple solution that would not need any caching would be to get all events in all namespaces and just filter them.

@maciaszczykm @bryk What do you think?

Comments from the review on Reviewable.io

bryk · 2016-03-31T09:15:02Z

Review status: 2 of 6 files reviewed at latest revision, 2 unresolved discussions.

src/app/backend/replicationcontrollerlist.go, line 101 [r2] (raw file):
Do not do global caching in the UI code for now. We should keep the backend stateless for as long as possible. When we decide to add state to the backend, this should be carefully designed, not just added. Do you agree?

How about you do caching on per-request basis? I.e., you create a cache (e.g., Map) at the start of the request and then pass it to methods that use/fill it. When the request is done, the cache is gone.

Other way is to get all events for the list page and from one namespace for details page. Basically two implementations.

Does this help?

Comments from the review on Reviewable.io

maciaszczykm · 2016-03-31T09:19:02Z

Review status: 2 of 6 files reviewed at latest revision, 2 unresolved discussions, some commit checks broke.

src/app/backend/replicationcontrollerlist.go, line 101 [r2] (raw file):
I'd also try to avoid global caching. It can bring many issues and we are planning to do many changes in the application now. Of course it would improve performance, but at the moment I'd use maps or something like that.

Comments from the review on Reviewable.io

floreks · 2016-03-31T09:20:03Z

Review status: 2 of 6 files reviewed at latest revision, 2 unresolved discussions, some commit checks broke.

src/app/backend/replicationcontrollerlist.go, line 101 [r2] (raw file):
Creating map at the beginning of the rquest was my first thought but then I'd need method which accepts map, client and namespace. It just isn't very intuitive. I don't like global cache also because it introduces other complex issues. What I need is to keep state of object at least per method call, but what I want is to keep code simple and not force devs to pass map to the method.

Should we just stick with getting all events for list page and ignore namespaces for now?

Comments from the review on Reviewable.io

floreks · 2016-03-31T09:46:41Z

Getting all events should be sufficient for most use cases. Now for 1000 pods and 650 events it takes roughly ~500us to filter them.

2016/03/31 11:45:09 Events: 654
2016/03/31 11:45:09 Pods: 1000
2016/03/31 11:45:09 Filterting took: 421.13µs
2016/03/31 11:45:09 Events: 654
2016/03/31 11:45:09 Pods: 45
2016/03/31 11:45:09 Filterting took: 218.857µs
2016/03/31 11:45:09 Events: 654
2016/03/31 11:45:09 Pods: 88
2016/03/31 11:45:09 Filterting took: 112.969µs
2016/03/31 11:45:09 Events: 654
2016/03/31 11:45:09 Pods: 100
2016/03/31 11:45:09 Filterting took: 121.288µs
2016/03/31 11:45:09 Events: 654
2016/03/31 11:45:09 Pods: 0
2016/03/31 11:45:09 Filterting took: 35.015µs

To not overcomplicate things I'll prepare simpler solution.

Review status: 2 of 6 files reviewed at latest revision, 2 unresolved discussions, some commit checks broke.

Comments from the review on Reviewable.io

floreks · 2016-03-31T09:51:19Z

PTAL

bryk · 2016-03-31T11:52:21Z

Nice! I see huge speedup. Thanks for fixing this!

Reviewed 1 of 4 files at r2, 4 of 5 files at r3.
Review status: all files reviewed at latest revision, 1 unresolved discussion.

src/app/backend/replicationcontrollerlist.go, line 101 [r2] (raw file):
Sounds really good!

Comments from the review on Reviewable.io

bryk · 2016-03-31T11:52:46Z

@maciaszczykm Please also review and if all is OK merge.

Review status: all files reviewed at latest revision, 1 unresolved discussion.

Comments from the review on Reviewable.io

floreks · 2016-03-31T12:13:05Z

Review status: 2 of 5 files reviewed at latest revision, 1 unresolved discussion.

src/app/backend/replicationcontrollerlist.go, line 101 [r2] (raw file):
Done.

Comments from the review on Reviewable.io

bryk · 2016-03-31T12:16:13Z

Reviewed 3 of 3 files at r4.
Review status: all files reviewed at latest revision, 1 unresolved discussion.

Comments from the review on Reviewable.io

maciaszczykm · 2016-03-31T12:22:52Z

Review status: all files reviewed at latest revision, all discussions resolved, all commit checks successful.

Comments from the review on Reviewable.io

googlebot added the cla: yes label Mar 29, 2016

floreks force-pushed the concurrent-api-calls branch from 86f10ec to 47ba892 Compare March 31, 2016 08:59

maciaszczykm mentioned this pull request Mar 31, 2016

Release v1.0.1 #601

Merged

floreks force-pushed the concurrent-api-calls branch from 47ba892 to 1fb4905 Compare March 31, 2016 09:51

floreks force-pushed the concurrent-api-calls branch from 1fb4905 to 0d6e5bc Compare March 31, 2016 11:57

Make single API call to get events and filter them on backend

0d6e5bc

maciaszczykm merged commit f0d0ee1 into kubernetes:master Mar 31, 2016

floreks deleted the concurrent-api-calls branch March 31, 2016 12:23

Make single API call to get events and filter them on backend #587

Make single API call to get events and filter them on backend #587

Uh oh!

Conversation

floreks commented Mar 29, 2016

Uh oh!

codecov-io commented Mar 29, 2016

Current coverage is 86.89%

Uh oh!

bryk commented Mar 30, 2016

Uh oh!

floreks commented Mar 30, 2016

Uh oh!

bryk commented Mar 30, 2016

Uh oh!

floreks commented Mar 30, 2016

Uh oh!

floreks commented Mar 31, 2016

Uh oh!

bryk commented Mar 31, 2016

Uh oh!

maciaszczykm commented Mar 31, 2016

Uh oh!

floreks commented Mar 31, 2016

Uh oh!

floreks commented Mar 31, 2016

Uh oh!

floreks commented Mar 31, 2016

Uh oh!

bryk commented Mar 31, 2016

Uh oh!

bryk commented Mar 31, 2016

Uh oh!

floreks commented Mar 31, 2016

Uh oh!

bryk commented Mar 31, 2016

Uh oh!

maciaszczykm commented Mar 31, 2016

Uh oh!

Uh oh!

Current coverage is `86.89%`