Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Maglev hashing to scheduler options in config.proto #89

Merged
merged 1 commit into from Jun 12, 2020

Conversation

unicell
Copy link
Contributor

@unicell unicell commented Jun 11, 2020

Maglev hashing became available since v4.18 kernel. This change adds the
support to make it a choice with Seesaw.

@liuyuan10
Copy link
Member

mh is more useful to avoid using ipvs connection sync daemon. Have you tried to use mh to do a seamless failover? Last time I check it, you need net.ipv4.vs.sloppy_tcp to keep existing connection going. It's a global config so I'm not sure what's the impact of that for other non mh services.

@unicell
Copy link
Contributor Author

unicell commented Jun 11, 2020

Yes, you're right. Let me explain our use case a little bit. I'm currently running an in-house control plane with some pieces from seesaw. And it is based on an in-house consistent hashing kernel module ipvs. And yes, sloppy_tcp is required in that setup, so seamless failover can happen without conntrack sync.

The plan my side is to migrate over to Maglev hashing our side. Through that process, I'm trying to sync some patches back to seesaw.

So in short, I'm hoping to shape it in the way that:

  • seesaw can still be leveraged in our use cases (Not as a single full piece. For example, there's no conntrack sync in our setup)
  • upstream in-house patches as much as possible
  • not to break how seesaw being used in Google or in the community

This PR is one of them. It does not break existing use cases while making it possible for someone (like me) to fully switch to Maglev without having conntrack sync. Does that make sense to you? Let me know. Thanks.

@liuyuan10
Copy link
Member

Your change is completely fine. I'm just curious what's the impact of turning on sloppy_tcp in a use case where both mh and other scheduling method is used (schedule method is a per service config while sloppy_tcp is global). I think the impact is that for other scheduling method, a failover session will not be reset immediately by seesaw machine but instead by real backends which seems OK to me.

Could you add some comment int the proto so that people know what it is. I think we need these in the comment:

  1. this is only for 4.18+
  2. you need net.ipv4.vs.sloppy_tcp to have seamless failover (why can't I find a ipv6 counterpart)
  3. impact of sloppy_tcp to other non-mh services.

Maglev hashing became available since v4.18 kernel. This change adds the
support to make it a choice with Seesaw.
@unicell
Copy link
Contributor Author

unicell commented Jun 11, 2020

Your change is completely fine. I'm just curious what's the impact of turning on sloppy_tcp in a use case where both mh and other scheduling method is used (schedule method is a per service config while sloppy_tcp is global). I think the impact is that for other scheduling method, a failover session will not be reset immediately by seesaw machine but instead by real backends which seems OK to me.

I think it's not just about turning on sloppy_tcp but also turning off contrack sync entirely, which probably doesn't make sense if on other scheduler settings.

Could you add some comment int the proto so that people know what it is. I think we need these in the comment:

  1. this is only for 4.18+

Done

  1. you need net.ipv4.vs.sloppy_tcp to have seamless failover (why can't I find a ipv6 counterpart)

Done. I think the same sysctl setting works for ipv6 as well per ipvs implementation in net/netfilter/ipvs/ip_vs_proto_tcp.c

  1. impact of sloppy_tcp to other non-mh services.

As above, I think conntrack sync is the key here not sloppy_tcp. It probably doesn't make sense to have Maglev + conntrack sync + sloppy_tcp. And probably the same for using Non-Maglev + turning off conntrack sync.

So updated comments in proto to reflect the thoughts above and assume actual deployer understand the implication.

Let me know if that makes any sense.

@liuyuan10 liuyuan10 merged commit 3e9b09e into google:master Jun 12, 2020
@unicell unicell deleted the maglev branch June 12, 2020 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants