Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Consolidate Instance Manager Engine & Replica for resource consumption reduction #5208

Closed
2 of 3 tasks
innobead opened this issue Jan 5, 2023 · 9 comments
Closed
2 of 3 tasks
Assignees
Labels
area/performance System, volume performance area/storage-network Storage network for control plane or data plane component/longhorn-instance-manager Longhorn instance manager (interface between control and data plane) highlight Important feature/issue to highlight kind/feature Feature request, new feature priority/0 Must be fixed in this release (managed by PO)
Milestone

Comments

@innobead
Copy link
Member

innobead commented Jan 5, 2023

Is your improvement request related to a feature? Please describe (馃憤 if you like this request)

Currently, we will have an instance manager engine & replica pods on each node, and each of them will have a default 12% Guaranteed CPU request which is configurable. After upgrading Longhorn, there will be another two new instance manager pods running, so there will be four instance managers pods before all volumes are migrated to the new version. This causes high resource requirements for the fresh install and upgrade as well.

Engine/Replica processes running in the instance manager are process-based, so they will not impact the instance manager container. To simplify the architecture to decrease resource usage, the goal here is to consolidate the instance manager engine & replica to one pod, but continue serving all data plane operations on each node without any change. For volume migration, the same flow is as usual.

Describe the solution you'd like

  • Introduce the new architecture of the instance manager
  • Revamp the current automatic engine upgrade implementation
  • Benchmark the resource consumption based on the number/size of volume. Have a theoretical explanation for the resource usage by instance manager.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

rancher/image-mirror#329 (comment)

cc @longhorn/dev

@innobead innobead added component/longhorn-instance-manager Longhorn instance manager (interface between control and data plane) priority/0 Must be fixed in this release (managed by PO) kind/improvement Request for improvement of existing function labels Jan 5, 2023
@innobead innobead added this to the v1.5.0 milestone Jan 5, 2023
@innobead innobead added area/performance System, volume performance highlight Important feature/issue to highlight labels Jan 5, 2023
@c3y1huang
Copy link
Contributor

Note: #1691 (comment)

@c3y1huang
Copy link
Contributor

c3y1huang commented Mar 7, 2023

@longhorn-io-github-bot
Copy link

longhorn-io-github-bot commented Mar 15, 2023

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at:

  • Which areas/issues this PR might have potential impacts on?
    Area manager, instance manager
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at:

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at:

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at:

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@innobead innobead added the area/storage-network Storage network for control plane or data plane label Mar 29, 2023
@innobead innobead changed the title [IMPROVEMENT] Consolidate Instance Manager Engine & Replica for resource consumption reduction [FEATURE] Consolidate Instance Manager Engine & Replica for resource consumption reduction May 3, 2023
@innobead innobead added kind/feature Feature request, new feature and removed kind/improvement Request for improvement of existing function labels May 3, 2023
@c3y1huang
Copy link
Contributor

c3y1huang commented May 3, 2023

@longhorn/qa , for this feature, please also do some tests using the UI to see if the frontend needs to adjust, thanks!

cc @smallteeths

@yangchiu
Copy link
Member

yangchiu commented May 4, 2023

Tested on master-head, although there are volumes created and attached, the Ref Count Volume in Instance Manager Image page is always 0, and the Volumes modal of each instance manager is always empty:

Screenshot from 2023-05-04 13-52-25

Screenshot from 2023-05-04 13-52-41

Also the Instance Manager Image in Components modal of each node is N/A:

Screenshot from 2023-05-04 13-51-53

Not sure if it's normal. @c3y1huang

@innobead
Copy link
Member Author

innobead commented May 4, 2023

This is not right.

@yangchiu btw, do you know if we have any e2e test cases related to the reference count of instance image?

@yangchiu
Copy link
Member

yangchiu commented May 4, 2023

This is not right.

@yangchiu btw, do you know if we have any e2e test cases related to the reference count of instance image?

We only have test cases for engine image reference count, and there's no test case for instance manager reference count.

@c3y1huang
Copy link
Contributor

c3y1huang commented May 8, 2023

Tested on master-head, although there are volumes created and attached, the Ref Count Volume in Instance Manager Image page is always 0, and the Volumes modal of each instance manager is always empty:

Screenshot from 2023-05-04 13-52-25

Screenshot from 2023-05-04 13-52-41

Also the Instance Manager Image in Components modal of each node is N/A:

Screenshot from 2023-05-04 13-51-53

Not sure if it's normal. @c3y1huang

Thanks @yangchiu , this should be related to the UI as we have not yet implemented support for the new aio instance managers, for example: https://github.com/longhorn/longhorn-ui/blob/master/src/models/host.js#L122-L127.

Let's track the UI implementation in #5876

@yangchiu
Copy link
Member

yangchiu commented Jun 2, 2023

Verified passed on master-head (longhorn-manager 6631855, longhorn-ui 51e912d). Operations work well on Longhorn ui, and it presenta the correct information about the new aio type of instance manager. The above mentioned ui issue has been corrected.

@yangchiu yangchiu closed this as completed Jun 2, 2023
yardenshoham added a commit to yardenshoham/longhorn that referenced this issue Aug 6, 2023
- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart
- Related to longhorn#5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
yardenshoham added a commit to yardenshoham/longhorn that referenced this issue Aug 6, 2023
- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart
- Related to longhorn#5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
yardenshoham added a commit to yardenshoham/longhorn that referenced this issue Aug 6, 2023
- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart
- Related to longhorn#5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
innobead pushed a commit that referenced this issue Aug 7, 2023
- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart
- Related to #5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
mergify bot pushed a commit that referenced this issue Aug 7, 2023
- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart
- Related to #5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
(cherry picked from commit 339e501)
innobead pushed a commit that referenced this issue Aug 7, 2023
- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart
- Related to #5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
(cherry picked from commit 339e501)
derekbit pushed a commit to derekbit/longhorn that referenced this issue Sep 3, 2023
- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart
- Related to longhorn#5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
derekbit pushed a commit to derekbit/longhorn that referenced this issue Dec 18, 2023
- Add the setting added in longhorn/longhorn-manager#1731 in the helm chart
- Related to longhorn#5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance System, volume performance area/storage-network Storage network for control plane or data plane component/longhorn-instance-manager Longhorn instance manager (interface between control and data plane) highlight Important feature/issue to highlight kind/feature Feature request, new feature priority/0 Must be fixed in this release (managed by PO)
Projects
None yet
Development

No branches or pull requests

4 participants