Backport: UI/API Slowness #19443

cjellick · 2019-04-08T13:52:33Z

Backport #18870

Fixes
NOTE: These fixes should improve performance for all types of clusters whether RKE, non-RKE, imported or not.

A. Filter by namespaces:
What: When a project is selected, norman retrieves its associated namespaces. Rancher then uses those namespaces to query workloads and other project resources.
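A minimal sketch of the idea, not Rancher's or norman's actual code: it assumes the pre-fix behavior was to read resources cluster-wide and discard those outside the project, which the "Before fix" note below implies but does not state. All names and data here (`NAMESPACE_TO_PROJECT`, `list_workloads_filtered`, the project and namespace IDs) are hypothetical.

```python
# Hypothetical in-memory stand-ins for a cluster's namespaces and workloads.
NAMESPACE_TO_PROJECT = {
    "ns-big": "project-1",
    "ns-small": "project-2",
    "ns-empty": "project-3",
}

WORKLOADS_BY_NAMESPACE = {
    "ns-big": [f"wl-{i}" for i in range(220)],
    "ns-small": ["wl-a", "wl-b"],
    "ns-empty": [],
}


def namespaces_for_project(project_id):
    """Return the namespaces that belong to the selected project."""
    return [ns for ns, p in NAMESPACE_TO_PROJECT.items() if p == project_id]


def list_workloads_unfiltered(project_id):
    """Assumed 'before' behavior: scan every workload, then filter."""
    scanned = 0
    result = []
    for ns, workloads in WORKLOADS_BY_NAMESPACE.items():
        for wl in workloads:
            scanned += 1
            if NAMESPACE_TO_PROJECT[ns] == project_id:
                result.append(wl)
    return result, scanned


def list_workloads_filtered(project_id):
    """'After' behavior: only query the project's own namespaces."""
    scanned = 0
    result = []
    for ns in namespaces_for_project(project_id):
        workloads = WORKLOADS_BY_NAMESPACE.get(ns, [])
        scanned += len(workloads)
        result.extend(workloads)
    return result, scanned
```

In this sketch the small project returns the same two workloads either way, but the filtered version touches 2 items instead of 222, which is why small projects would stop paying for large ones.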

B. Properly respond to the -1 pagination flag:
What: When -1 is passed to the API, the max limit is used. The max limit has been raised to 10k. This matters because pagination is applied on the way out of the backend, so if the UI has to request X pages, loading takes roughly X times as long as a single non-paginated response.
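The arithmetic behind this can be sketched as follows. This is an illustration only: `resolve_limit` and `requests_needed` are hypothetical helpers, clamping oversized limits to the cap is an assumption, and the old page size of 1000 is an assumption that is not stated in this issue (it happens to be consistent with the 3 calls observed for 2050 workloads in the verification comment).

```python
import math

MAX_LIMIT = 10_000         # raised maximum, per the description above
ASSUMED_OLD_LIMIT = 1_000  # assumption: not stated anywhere in this issue


def resolve_limit(requested, max_limit=MAX_LIMIT):
    """Map the -1 flag to the maximum limit.

    Clamping values above the cap is an assumption of this sketch.
    """
    if requested == -1 or requested > max_limit:
        return max_limit
    return requested


def requests_needed(total_items, limit):
    """Number of paged requests the UI must make to fetch every item."""
    return max(1, math.ceil(total_items / limit))


if __name__ == "__main__":
    print(requests_needed(2050, ASSUMED_OLD_LIMIT))   # paged in chunks
    print(requests_needed(2050, resolve_limit(-1)))   # single response
```

Because each page costs about as much as a full response, cutting the number of requests is what reduces the load time.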

C. Increase size of parent cache:
What: The cache used to map pods to their parents has had its size increased to 100k. When it was 1k, loading more than 1k resources would cause all entries to shift out before they could be reused. This rendered the cache nearly useless if a user had many resources.
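The "shift out" effect can be reproduced with a small simulation. This sketch assumes least-recently-used eviction, which this issue does not confirm; `LRUCache` and `second_pass_hits` are hypothetical names, and the parent lookup is a stand-in for whatever the real cache stores.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache (assumed eviction policy) that counts hits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()
        self.hits = 0

    def get_or_load(self, key, loader):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
        return value


def second_pass_hits(capacity, n_pods):
    """Load every pod's parent twice; return cache hits on the second pass."""
    cache = LRUCache(capacity)

    def lookup_parent(pod):
        return f"parent-of-{pod}"  # stand-in for an expensive lookup

    for pod in range(n_pods):
        cache.get_or_load(pod, lookup_parent)
    cache.hits = 0
    for pod in range(n_pods):
        cache.get_or_load(pod, lookup_parent)
    return cache.hits


if __name__ == "__main__":
    print(second_pass_hits(1_000, 2_050))    # every entry evicted before reuse
    print(second_pass_hits(100_000, 2_050))  # every entry still cached
```

Under these assumptions, scanning more keys than the cache can hold evicts each entry before it is requested again, so the smaller cache never produces a hit while the larger one serves the whole second pass.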

How to test: Check the "Steps to reproduce" section for steps corresponding to each fix.

Steps to reproduce (fewest steps possible):
I have been able to reproduce the behavior with the following steps:
NOTE: I mention the type of cluster I used; however, any type of cluster should work for reproducing the issues, as mentioned above.

A:

Use RKE on 8gb DO instance
Import to Rancher
Create new project and namespace.
Create additional project and namespace.
Launch 220 workloads with a large environment variable (https://github.com/rmweir/many-workloads/) in the first project/namespace pair.
Launch a couple of workloads without the large environment variable in the second project/namespace pair.
Create one more empty project with one empty namespace.
Switch between projects.

Before fix: Both projects, and their project-related API calls, take similar times to load.

After fix: Projects containing fewer resources should load faster.

B:

Use GKE default instance.
Launch 2050 workloads in a single project and namespace. You can use the script from the previous steps, but do NOT include the environment variable.
Navigate to the project and open the browser inspector (networking tab).
Refresh the page. Observe the number of calls made to project-related resources such as pods and workloads.

Before fix: multiple requests of varying yet similar length
After fix: 1 request for each project related resource
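One way to tally the calls observed in the networking tab is to copy the request URLs and count them per resource. This is only a sketch: `count_calls_by_resource` is a hypothetical helper, it simply treats the last path segment as the resource name, and the URLs in the usage example are made-up placeholders rather than captured traffic.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_calls_by_resource(urls):
    """Count requests per resource, keyed by the last URL path segment."""
    counts = Counter()
    for url in urls:
        path = urlsplit(url).path.rstrip("/")
        counts[path.rsplit("/", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    # Placeholder URLs for illustration only; substitute the ones you copied.
    sample = [
        "https://rancher.example/v3/projects/p-1/workloads?page=1",
        "https://rancher.example/v3/projects/p-1/workloads?page=2",
        "https://rancher.example/v3/projects/p-1/pods",
    ]
    for resource, n in count_calls_by_resource(sample).items():
        print(resource, n)
```

A count greater than 1 for pods or workloads would correspond to the "before" behavior described above.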

C:

Follow steps from B.
Navigate to the project containing a large number of pods.

Before fix: Project-related resource API calls, particularly pods, take a long time to load (if hosting Rancher locally, this time will likely be >50 seconds).

After fix: API calls complete and the page is usable in a fraction of the time (if Rancher is hosted locally, API calls should take <10 seconds).

cjellick · 2019-04-08T14:01:13Z

@rmweir I need you to enumerate each enhancement/fix you made and how they can be tested

sowmyav27 · 2019-04-09T19:58:30Z

Verified in rancher:v2.2.2-rc5.
A.
Steps taken:

Created an Amazon EC2 cluster.
Created a project with 220 workloads, with large environment variable.
Created a project/namespace with 3 workloads without large environment variable.
Created a project/namespace without any workloads.
Created a project with 100 namespaces with one workload each.
Verified the below:
Projects containing fewer or no resources loaded faster.
Switching between projects did not take much time; it was significantly faster than in rancher:v2.2.1.
Verified in v2.2.1 that projects with fewer or no resources took longer to load, similar to the project with 220 workloads.

B.
Steps taken:

Created an Amazon EC2 Cluster.
Launched 2050 workloads in a single project/namespace.
Navigated to the project and inspected (networking tab).
Refreshed the page.
Verified:
The number of calls made for workloads and pods was 1 each.
Verified in rancher:v2.2.1 that the number of calls made for workloads and pods was 3 each.
Switching between projects did not take much time; it was significantly faster than in rancher:v2.2.1.

C.
Steps taken:

Created an Amazon EC2 Cluster.
Launched 2050 workloads in a single project/namespace.
Went to the Workloads tab.
Verified:
On loading the Workloads tab, the API call took less than 30 seconds to fetch the workloads and around 25 seconds to load the pods.
In rancher:v2.2.1, the action took more than 3 minutes to load the workloads.
Switching between projects did not take much time; it was significantly faster than in rancher:v2.2.1.

cjellick added the kind/bug label Apr 8, 2019

cjellick added this to the v2.2.2 milestone Apr 8, 2019

cjellick mentioned this issue Apr 8, 2019

Backport performance improvements #19442

Merged

cjellick added status/resolved labels Apr 8, 2019

cjellick assigned sangeethah and rmweir Apr 8, 2019

sangeethah assigned sowmyav27 Apr 8, 2019

sowmyav27 closed this as completed Apr 9, 2019
