Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make local VM host IPs '10.0.2.2' configurable #1468

Merged
merged 1 commit into from Jul 29, 2020

Conversation

okurz
Copy link
Member

@okurz okurz commented Jul 12, 2020

@codecov
Copy link

codecov bot commented Jul 12, 2020

Codecov Report

Merging #1468 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1468   +/-   ##
=======================================
  Coverage   56.13%   56.13%           
=======================================
  Files          54       54           
  Lines        6362     6362           
=======================================
  Hits         3571     3571           
  Misses       2791     2791           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd050f8...93d6cf3. Read the comment docs.

@kalikiana
Copy link
Member

Consider this highlight in lieu of requesting a review from @AdamWill which is conceputally impossible.

@okurz
Copy link
Member Author

okurz commented Jul 13, 2020

Sorry, I can't parse your english :) Could you rephrase for me please?

@kalikiana
Copy link
Member

Sorry, I can't parse your english :) Could you rephrase for me please?

I can't request a review from Adam for this pull request. So I used a highlight.

@AdamWill
Copy link
Contributor

Hum. So let's see...

  • Most importantly, I think the os-autoinst-openvswitch bit here doesn't quite work, at least if I'm understanding it right. The existing code is essentially taking traffic from 10.0.2.* to 10.0.2.2 and rewriting it to appear to come from 10.0.1.something , using the MAC address of the interface to choose the 'something' in order to try and make sure there are not conflicts. This is so if we have a test hard coded to use, say, 10.0.2.10, and we get two instances of that test running at the same time, they don't conflict - we rewrite their traffic to 10.0.1.something, where something will be different for each test. This is also why the subnet mask has to cover 10.0.1. and 10.0.2. . AFAICS, your code here doesn't change the rewrite target from 10.0.1.something to anything else, so if you set OS_AUTOINST_BRIDGE_LOCAL_IP to 172.16.2.2 or something and have all the tests use 172.16.2.x , we'll still rewrite their traffic to appear to come from 10.0.1.x , and the bridge's netmask won't cover that range and it won't work. In my patch to avoid having to calculate this stuff I didn't allow the bridge IP and netmask to be arbitrary, I just added one additional choice and its appropriate 'rewrite target' to the logic.

Smaller notes:

  • OS_AUTOINST_BRIDGE_LOCAL_IP and _NETMASK should probably be documented somewhere / how? Including a note that the netmask can't be arbitrary but has to cover the range of IPs the bridge and workers will take, plus the range the worker traffic will be rewritten to appear to come from.

  • Couldn't we set the qemu options appropriately for QEMU_HOST_IP if they are not explicitly specified, rather than requiring the admin to specify them?

@okurz
Copy link
Member Author

okurz commented Jul 13, 2020

Well, the os-autoinst-openvswitch part is probably independant of the qemu usermode networking bits so I think it's better if I leave these changes out for now. About setting according qemu parameters, sure we could do this but I thought we could try one thing at a time. After all I did not really test this and would rely on you ;)

@AdamWill
Copy link
Contributor

"Well, the os-autoinst-openvswitch part is probably independant of the qemu usermode networking bits"

Hmm, I'm kinda struggling with this. You can kinda see it both ways. But I'm not sure how you can really use this as-is, at least if you have any tap tests, because - where are you going to set QEMU_HOST_IP where it will only be used for non-tap tests? You can't set it anywhere a tap test will pick it up, because then the test will try to talk to the 'host' at the specified IP but the bridge will still be at 10.0.2.2...

@okurz
Copy link
Member Author

okurz commented Jul 14, 2020

You can't set it anywhere a tap test will pick it up

yes, you are right. I haven't considered this. I misinterpreted the line return check_var('BACKEND', 'qemu') ? … as "if qemu-usermode-network ? $use_qemu_host_ip : $other_stuff_also_applicable_for_tap". So this becomes more complicated then :) Not sure if I can come up with a good idea

@AdamWill
Copy link
Contributor

Well, I don't think it's impossible. I think your initial proposal would work if we deal with the rewrite issue (perhaps by adopting my version which only allows for one specific alternate configuration?) and documented it properly ("you have to set QEMU_HOST_IP and OS_AUTOINST_BRIDGE_LOCAL_IP to the same value"). Alternatively, host_ip could check the value of NICTYPE and return QEMU_HOST_IP with a default of 10.0.2.2 if it's anything but "tap", but return OS_AUTOINST_BRIDGE_LOCAL_IP with a default of 10.0.2.2 if it's "tap"...

I think any solution here is gonna have some slightly rough edges, but we're working towards something that's manageable as long as we document it properly, and make sure the simple case (just leave everything at 10.0.2.2) remains simple.

WDYT?

@AdamWill
Copy link
Contributor

oh, I just remembered, in your proposal OS_AUTOINST_BRIDGE_LOCAL_IP is an environment variable, not an openQA test variable. So, I guess just document it. :P Unless we want to do something clever and have os-autoinst set the test variable during startup if the environment variable is set...

@okurz
Copy link
Member Author

okurz commented Jul 14, 2020

I agree with both your latest proposal. In general I would also foresee environment variables + test variables together with a clear precedence (test > env > default). So, where can we see your version then? :)

@AdamWill
Copy link
Contributor

AdamWill commented Jul 14, 2020

I should've seen that one coming :P

I'll try and work something up combining your basic framework with my half-done version, with BONUS ADDED DOCS, this week.

@okurz
Copy link
Member Author

okurz commented Jul 14, 2020

I would not fine with just very rough, hacky, dirty, incomplete code snippets

@AdamWill
Copy link
Contributor

AdamWill@010fdb4 is what I had so far. I haven't tested it yet, but that's where I was going.

@okurz okurz force-pushed the feature/ip branch 2 times, most recently from d0904d1 to f78218c Compare July 16, 2020 10:15
@okurz
Copy link
Member Author

okurz commented Jul 16, 2020

Ok, so what I have done now is simply make the "10.1.0.0" rewrite target configurable by environment variable as well and mentioned that in doc/networking.md . I would prefer to be not too smart in the scripts and not try to guess the rewrite target based on ip and netmask but rather ask for explicit configuration. Would this work for you?

@AdamWill
Copy link
Contributor

It should be workable, yeah. If we're going to make the admin do all the work it'll be good to document it carefully I guess. I'll try this out today and see how it goes - I still didn't actually test any version of this yet so maybe there are bear traps we haven't spotted yet :P

@AdamWill
Copy link
Contributor

AdamWill commented Jul 17, 2020

So I tried this out briefly today but it didn't go entirely smoothly. It did seem like two tests that took 172.16.2.x IPs could talk to each other, but tests didn't seem to be able to reach the public internet or, perhaps, the host IP. May well be errors on my part in the local config rather than anything in the PR, but thought I'd mention it. It's very Friday right now, so I'm gonna try it again on Monday.

@okurz
Copy link
Member Author

okurz commented Jul 18, 2020

Ok, good to know. I am awaiting further test results of yours

@AdamWill
Copy link
Contributor

So I found at least one problem in this: there's a sneaky hidden hardcoded IP address we missed. In the ARP rule, load:0xa010000->NXM_OF_ARP_SPA - the 0xa010000 there is 10.1.0.0 in hex. Because who doesn't love hexadecimal IP addresses, huh?

So, here's my fix for that:

my $rewrite_target = $ENV{OS_AUTOINST_BRIDGE_REWRITE_TARGET} // '10.1.0.0';
# we also need a hex-converted form of the rewrite target, thanks
# https://www.perlmonks.org/?node_id=704295
my $rewrite_target_hex = unpack('H*', pack('C*', split ('\.', $rewrite_target)));

...

    # arp e.g. 10.0.2.2 - learn rule for handling replies, rewrite ARP sender IP to e.g. 10.1.x.x range and send >
    # the learned rule rewrites ARP target to the original IP and sends the packet to the original port
    "table=0,priority=100,dl_type=0x0806,nw_dst=$local_ip,actions=" .                                            >
    'learn(table=1,priority=100,in_port=LOCAL,dl_type=0x0806,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_OF_ARP_SP>
    "load:0x$rewrite_target_hex->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[0..$netmask]->NXM_OF_ARP_SPA[0..$netmask],">
    'local',

I'm testing this out now.

@AdamWill
Copy link
Contributor

OK, I have my modified version deployed to Fedora production and it appears to be working. Yay.

@okurz
Copy link
Member Author

okurz commented Jul 24, 2020

cool, updated my commit accordingly.

@AdamWill
Copy link
Contributor

Thanks, I think you missed a bit though - it needs to be load:0x$rewrite_target_hex not load:$rewrite_target_hex. Well, I didn't actually test whether the latter works, but the former definitely does :)

@okurz
Copy link
Member Author

okurz commented Jul 24, 2020

Yeah, sorry. Got that wrong. Should have used a verbatim copy :D

@okurz
Copy link
Member Author

okurz commented Jul 25, 2020

ready

Copy link
Contributor

@Martchus Martchus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it is good to merge. (I haven't followed the entire conversation and all the details of the progress issue.)

@okurz okurz merged commit f9254b2 into os-autoinst:master Jul 29, 2020
@okurz okurz deleted the feature/ip branch July 29, 2020 12:21
@AdamWill
Copy link
Contributor

AdamWill commented Aug 1, 2020

yeah it was good to go. our instance is still working fine with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants