
HA dpservice #656

Merged
guvenc merged 9 commits into main from feature/dpservice_ha on Apr 1, 2025

@PlagueCZ PlagueCZ commented Mar 18, 2025

This PR adds the ability to run multiple dpservice instances in parallel, each with its own hugepage area. Only one instance is ever active; the others stand by. This is enforced by a flock() on a common file.
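The flock()-based active/standby arrangement described above can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `ha_try_become_active`, `ha_wait_until_active` and `ha_release`, as well as any lock file path passed to them, are hypothetical. It only relies on the standard behaviour of flock(2), where an exclusive lock is tied to the open file description and is dropped by the kernel when the holder closes the file or exits.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: try to become the active instance without blocking.
 * Returns a descriptor that holds the exclusive lock, or -1 with errno set
 * (EWOULDBLOCK means another instance is currently the active one). */
int ha_try_become_active(const char *lock_path)
{
	int fd = open(lock_path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;

	if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
		int saved_errno = errno;

		close(fd);
		errno = saved_errno;
		return -1;
	}
	return fd;
}

/* Hypothetical helper: a standby instance blocks here until the active
 * instance releases the lock (or dies), then takes over. */
int ha_wait_until_active(const char *lock_path)
{
	int fd = open(lock_path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;

	while (flock(fd, LOCK_EX) != 0) {
		if (errno != EINTR) {
			int saved_errno = errno;

			close(fd);
			errno = saved_errno;
			return -1;
		}
		/* interrupted by a signal: retry */
	}
	return fd;
}

/* Closing the descriptor releases the lock so a standby can take over. */
void ha_release(int fd)
{
	close(fd);
}
```

In such a scheme every instance would be started with the same lock path; the one that obtains the lock serves traffic, and the rest wait in `ha_wait_until_active()`. Because the kernel drops the lock when the holder's descriptor is closed, a crashed active instance does not leave a stale lock behind.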

To orchestrate them so that both have the same configuration, underlay addresses need to be generated externally (in OSC, by two metalnets). The gRPC protocol now supports this as an optional feature.
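The idea of an optional, externally supplied underlay address can be sketched as follows. This is only an illustration under stated assumptions: the `ha_iface_request` struct, its fields, the 16-byte address length and `ha_choose_underlay()` are all hypothetical and do not reflect the actual gRPC messages or dpservice internals.

```c
#include <stdbool.h>
#include <stdint.h>

#define HA_UNDERLAY_LEN 16	/* assumption: a 16-byte (IPv6-sized) address */

/* Hypothetical request: the orchestrator may or may not supply an address. */
struct ha_iface_request {
	bool has_underlay;			/* true if the caller supplied one */
	uint8_t underlay[HA_UNDERLAY_LEN];	/* valid only if has_underlay */
};

/* Prefer the externally supplied address; otherwise fall back to the
 * address the service generated on its own. */
const uint8_t *ha_choose_underlay(const struct ha_iface_request *req,
				  const uint8_t *locally_generated)
{
	return req->has_underlay ? req->underlay : locally_generated;
}
```

With this shape, two instances that receive the same request end up with the same underlay address, while callers that do not supply one keep the previous, locally generated behaviour.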

The last change is to the Mellanox/DPDK configuration, so that no default rule is created; that rule needlessly duplicates packets when two instances are running. Previously this only required changing the configuration of PF0, but now a DPDK patch is needed as well (see #643).

Fixes #643

Of course, proper HA is not possible without changes to the metalnet logic, but all the changes needed in dpservice are done.

@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request size/XXL labels Mar 18, 2025
@PlagueCZ PlagueCZ force-pushed the feature/dpservice_ha branch from 828324f to 3d665ed Compare March 26, 2025 23:29
@PlagueCZ PlagueCZ force-pushed the feature/dpservice_ha branch from 3d665ed to 1b99d51 Compare March 26, 2025 23:37
@PlagueCZ

This is now running in OSC on multiple clusters and seems to be working fine.

I suggest looking at the commits of this PR separately, as they are actually simple changes that are just verbose due to the nature of what was changed. For example, the gRPC protocol change always shows up as 6 blocks of changes (6 changed calls), but in many places.

Also, everything already works before the DPDK-23.11.3 commit (except for the number of VMs), so looking at that part separately can simplify the review as well.

@PlagueCZ PlagueCZ marked this pull request as ready for review March 27, 2025 16:13
@PlagueCZ PlagueCZ requested a review from a team as a code owner March 27, 2025 16:13
Comment thread src/grpc/dp_grpc_impl.c Outdated
@byteocean

I am able to successfully run the tests after applying all patches to DPDK on a server.

@PlagueCZ PlagueCZ force-pushed the feature/dpservice_ha branch from b3d10b6 to 6602ec3 Compare April 1, 2025 11:31

@guvenc guvenc left a comment

Thanks.

@guvenc guvenc merged commit 4395739 into main Apr 1, 2025
5 of 6 checks passed
@guvenc guvenc deleted the feature/dpservice_ha branch April 1, 2025 17:59
@hardikdr hardikdr added this to Roadmap Jun 26, 2025
@hardikdr hardikdr moved this to Done in Roadmap Oct 15, 2025

Labels

area/networking documentation Improvements or additions to documentation enhancement New feature or request size/XXL

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Investigate how two dpservice instances can be run in parallel

5 participants