[FEATURE] Consolidate Instance Manager Engine & Replica for resource consumption reduction #5208

innobead · 2023-01-05T02:56:06Z

Is your improvement request related to a feature? Please describe (👍 if you like this request)

Currently, each node runs an instance manager engine pod and an instance manager replica pod, and each has a configurable Guaranteed CPU request that defaults to 12%. After a Longhorn upgrade, two new instance manager pods start as well, so each node runs four instance manager pods until all volumes are migrated to the new version. This raises the resource requirements for both fresh installs and upgrades.
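As a rough illustration of the arithmetic above, the sketch below totals the guaranteed CPU reserved by instance manager pods on one node. It uses the default 12% request quoted above. The 4-core node and the `reserved_millicpu` helper are hypothetical. It also assumes that a consolidated pod keeps the same 12% request, which this issue does not state.

```python
# Hypothetical helper: CPU (in millicores) reserved by instance manager pods on one node.
def reserved_millicpu(node_cores: int, pods: int, percent_per_pod: int = 12) -> int:
    """Return the total guaranteed CPU request, in millicores, for `pods` pods."""
    allocatable = node_cores * 1000  # 1 core = 1000 millicores
    return allocatable * percent_per_pod * pods // 100

NODE_CORES = 4  # assumption for illustration only

# Number of instance manager pods per node in each layout.
layouts = {
    "engine + replica pods (steady state)": 2,
    "engine + replica pods (during upgrade)": 4,
    "single consolidated pod (assumed same 12% request)": 1,
}

for name, pods in layouts.items():
    print(f"{name}: {reserved_millicpu(NODE_CORES, pods)}m")
```

Under these assumptions, the four pods present during an upgrade reserve nearly half of the node's CPU before any workload is scheduled.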

Engine and replica instances run as individual processes inside the instance manager, so they do not affect the instance manager container itself. To simplify the architecture and reduce resource usage, the goal is to consolidate the instance manager engine and replica into one pod while continuing to serve all data plane operations on each node without any change. Volume migration follows the same flow as before.

Describe the solution you'd like

- Introduce the new architecture of the instance manager.
- Revamp the current automatic engine upgrade implementation.
- Benchmark the resource consumption based on the number and size of volumes, and provide a theoretical explanation for the instance manager's resource usage.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

rancher/image-mirror#329 (comment)

cc @longhorn/dev

c3y1huang · 2023-01-05T04:16:41Z

Note: #1691 (comment)

c3y1huang · 2023-03-07T01:26:18Z

Initial test results:

core tests: https://ci.longhorn.io/job/private/job/longhorn-tests-regression/3112/
core tests (upgrade): https://ci.longhorn.io/job/private/job/longhorn-tests-regression/3120/

longhorn-io-github-bot · 2023-03-15T07:58:04Z

Pre Ready-For-Testing Checklist

c3y1huang · 2023-05-03T05:11:32Z

@longhorn/qa, for this feature, please also run some tests through the UI to see whether the frontend needs adjusting. Thanks!

cc @smallteeths

yangchiu · 2023-05-04T07:03:30Z

Tested on master-head, although there are volumes created and attached, the Ref Count Volume in Instance Manager Image page is always 0, and the Volumes modal of each instance manager is always empty:

Also the Instance Manager Image in Components modal of each node is N/A:

Not sure if it's normal. @c3y1huang

innobead · 2023-05-04T07:08:46Z

This is not right.

@yangchiu btw, do you know if we have any e2e test cases related to the reference count of instance image?

yangchiu · 2023-05-04T08:59:39Z

> This is not right.
>
> @yangchiu btw, do you know if we have any e2e test cases related to the reference count of instance image?

We only have test cases for engine image reference count, and there's no test case for instance manager reference count.

c3y1huang · 2023-05-08T01:09:13Z

> Tested on master-head, although there are volumes created and attached, the Ref Count Volume in Instance Manager Image page is always 0, and the Volumes modal of each instance manager is always empty:
>
> Also the Instance Manager Image in Components modal of each node is N/A:
>
> Not sure if it's normal. @c3y1huang

Thanks @yangchiu, this should be a UI issue, as the UI does not yet support the new aio instance managers. For example: https://github.com/longhorn/longhorn-ui/blob/master/src/models/host.js#L122-L127.

Let's track the UI implementation in #5876
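To illustrate why a lookup keyed on the instance manager type could come back empty, here is a minimal sketch. It is not the actual longhorn-ui code. The record fields (`node`, `type`, `image`), the image name, and the `find_image` helper are hypothetical; the only detail taken from this thread is that consolidated managers are of the `aio` type.

```python
from typing import Optional

# Hypothetical records; the field names and image value are assumptions for illustration.
instance_managers = [
    {"node": "node-1", "type": "aio", "image": "instance-manager:new"},
]

def find_image(managers: list[dict], node: str, accepted_types: set[str]) -> Optional[str]:
    """Return the image of the first manager on `node` whose type is accepted, else None."""
    for im in managers:
        if im["node"] == node and im["type"] in accepted_types:
            return im["image"]
    return None

# A lookup that only knows the legacy types finds nothing, which a UI might render as N/A.
legacy = find_image(instance_managers, "node-1", {"engine", "replica"})

# Accepting the consolidated type as well resolves the image.
updated = find_image(instance_managers, "node-1", {"engine", "replica", "aio"})

print(legacy, updated)
```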

yangchiu · 2023-06-02T01:18:30Z

Verified passed on master-head (longhorn-manager 6631855, longhorn-ui 51e912d). Operations work well in the Longhorn UI, and it presents the correct information about the new aio type of instance manager. The UI issue mentioned above has been fixed.

- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart - Related to longhorn#5208 Signed-off-by: Yarden Shoham <git@yardenshoham.com>


- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart - Related to #5208 Signed-off-by: Yarden Shoham <git@yardenshoham.com> (cherry picked from commit 339e501)


innobead added component/longhorn-instance-manager Longhorn instance manager (interface between control and data plane) priority/0 Must be fixed in this release (managed by PO) kind/improvement Request for improvement of existing function labels Jan 5, 2023

innobead added this to the v1.5.0 milestone Jan 5, 2023

innobead assigned c3y1huang Jan 5, 2023

innobead added area/performance System, volume performance highlight Important feature/issue to highlight labels Jan 5, 2023

derekbit mentioned this issue Jan 13, 2023

[FEATURE] Local volume data path pass-through #4935

Open

innobead added the area/storage-network Storage network for control plane or data plane label Mar 29, 2023

This was referenced Apr 11, 2023

[IMPROVEMENT] Remove proxy server for instance manager replica pod #3968

Closed

[IMPROVEMENT] Upgrade with limited resources #3897

Closed

c3y1huang mentioned this issue May 3, 2023

[IMPROVEMENT] Deprecate instance manager type #5843

Closed

innobead changed the title ~~[IMPROVEMENT] Consolidate Instance Manager Engine & Replica for resource consumption reduction~~ [FEATURE] Consolidate Instance Manager Engine & Replica for resource consumption reduction May 3, 2023

innobead added kind/feature Feature request, new feature and removed kind/improvement Request for improvement of existing function labels May 3, 2023

innobead mentioned this issue May 3, 2023

[TASK] Remove deprecated instances field and instance type from instance manager CR #5844

Open

2 tasks

innobead assigned yangchiu May 3, 2023

yangchiu mentioned this issue May 4, 2023

test: fix instance manager consolidation fails pod updating related test cases longhorn/longhorn-tests#1345

Merged

c3y1huang mentioned this issue May 8, 2023

[TASK][UI] support new aio typed instance managers #5876

Closed

1 task

c3y1huang mentioned this issue May 11, 2023

[TASK] Remove Guaranteed Engine Manager CPU, Guaranteed Replica Manager CPU, and Guaranteed Engine CPU settings. #5917

Closed

yangchiu closed this as completed Jun 2, 2023

roger-ryao mentioned this issue Jun 2, 2023

Add manual test for Test Node Drain Policy Setting AND remove deprecated allow-node-drain-with-last-healthy-replica setting longhorn/longhorn-tests#1395

Merged

yardenshoham mentioned this issue Aug 6, 2023

chart: Update settings based on the instance managers consolidation #6458

Merged

innobead pushed a commit that referenced this issue Aug 7, 2023

chart: Update settings based on the instance managers consolidation

339e501

- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart - Related to #5208 Signed-off-by: Yarden Shoham <git@yardenshoham.com>

roger-ryao mentioned this issue Nov 1, 2023

[TEST] Update Test System Upgrade with New Instance Manager #7013

Closed

1 task
