Fix ownership bug in OPTE port management that can lead to deadlock #1695
I was able to reproduce the original deadlock with the following test. To repro, place the following code in
#[cfg(test)]
mod test {
    use super::PortManager;
    use super::Logger;
    use slog::o;
    use super::Uuid;
    use super::NetworkInterface;
    use super::SourceNatConfig;
    use slog::Drain;
    use crate::illumos::dladm::MockDladm;
    use crate::illumos::dladm::PhysicalLink;
    use std::os::unix::process::ExitStatusExt;
    use std::thread;
    use std::time::Duration;

    fn run_deadlock_repro() {
        let decorator = slog_term::TermDecorator::new().build();
        let drain = slog_term::FullFormat::new(decorator).build().fuse();
        let drain = slog_async::Async::new(drain).chan_size(0x2000).build().fuse();
        let log = Logger::root(drain, o!());
        let underlay_ip = "fd00::1".parse().unwrap();
        let mac = "a8:25:40:01:01:01".parse().unwrap();
        let manager = PortManager::new(log, underlay_ip, mac);
        let ctx = MockDladm::get_vnics_context();
        ctx.expect().return_once(|| Ok(vec![]));
        let ctx = MockDladm::create_vnic_context();
        ctx.expect().return_once(|_: &PhysicalLink, _, _, _| Ok(()));
        let ctx = MockDladm::set_linkprop_context();
        ctx.expect().return_once(|_, _, _| Ok(()));
        let ctx = crate::illumos::execute_context();
        ctx.expect().times(..).returning(|_| Ok(std::process::Output {
            status: std::process::ExitStatus::from_raw(0),
            stdout: vec![],
            stderr: vec![],
        }));
        let id = Uuid::new_v4();
        let nic = NetworkInterface {
            ip: "172.30.0.5".parse().unwrap(),
            mac: "a8:25:40:01:01:02".parse().unwrap(),
            name: "net0".parse().unwrap(),
            primary: true,
            slot: 0,
            subnet: "172.30.0.0/22".parse().unwrap(),
            vni: omicron_common::api::external::Vni::random(),
        };
        let port = manager.create_port(
            id,
            &nic,
            Some(SourceNatConfig {
                ip: "10.0.0.1".parse().unwrap(),
                first_port: 0,
                last_port: 1 << 14,
            }),
            None,
        ).unwrap();
        let ticket = port.inner.ticket;

        // Drop the port before the ticket, which causes a deadlock in the
        // original implementation.
        //
        // Dropping the Port first means that we drop the Arc<PortInner>, which
        // just decrements the refcount. But we still have
        //
        // - A reference to the PortManagerInner, in the PortTicket
        // - A reference to the Port, PortInner, and PortTicket in the
        //   PortManager.
        //
        // The critical thing here is that the PortTicket in the manager is
        // _different_ from the port ticket we're holding here in `ticket`.
        drop(port);

        // Now drop the ticket itself. This does the following:
        //
        // - Call `ticket.release()`
        // - Take out of the `ticket.manager` option, and acquire the lock on
        //   the `PortManager::ports` field.
        // - Call `ports.remove()` to pull that `Port` out of the `ports` map,
        //   which immediately drops it.
        // - That drops the Port, PortInner, and a _different_ PortTicket.
        // - That calls `release` on that contained ticket, which tries to
        //   acquire the lock we've taken in step 2.
        drop(ticket);
    }

    #[test]
    fn test_port_ticket_drop_does_not_deadlock() {
        let runner = thread::spawn(run_deadlock_repro);
        thread::sleep(Duration::from_secs(5));
        assert!(
            runner.is_finished(),
            "Apparent deadlock while dropping `PortTicket`"
        );
    }
}

The main issue is described in the comments, but basically we try to acquire the lock in the

The fix here is to...not do that. Specifically, we don't need the manager to maintain the ports at all. It's required as part of the external IP address workaround in OPTE that we maintain the MAC address, but not the port itself. Instead of a bunch of smart pointers and buggy ownership, I opted to maintain just a set of MAC addresses in the

All of that is much better than what we had, but will also go away entirely when the OPTE external IP address hack is removed. At that point, there is no shared state at all between the port and manager, since the manager just needs to create ports and move them into the zone. No callbacks, no tickets, no smart pointers. That'll be nice.
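The re-entrant acquisition described in the test comments can be reduced to a few lines of plain Rust. This is a minimal sketch, not the omicron code: `Ticket`, `Port`, and `Ports` are hypothetical stand-ins, and the nested release uses `try_lock` so that the sketch reports the re-entry instead of hanging the way a blocking `lock()` would.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the manager's port map and its contents.
type Ports = Arc<Mutex<HashMap<u32, Port>>>;

struct Ticket {
    id: u32,
    ports: Ports,
    // Set when a release finds the map lock already held.
    reentered: Arc<AtomicBool>,
}

struct Port {
    // Each stored port carries its own ticket for the same id.
    _ticket: Ticket,
}

impl Drop for Ticket {
    fn drop(&mut self) {
        // With a blocking `lock()` the nested call below would never return.
        match self.ports.try_lock() {
            Ok(mut ports) => {
                // Removing the entry drops the stored Port, and with it a
                // second Ticket, while the guard `ports` is still alive.
                ports.remove(&self.id);
            }
            Err(_) => self.reentered.store(true, Ordering::SeqCst),
        }
    }
}

/// Returns true if dropping the outer ticket caused a nested release to
/// attempt the lock that the outer release was still holding.
fn nested_release_reenters_lock() -> bool {
    let ports: Ports = Arc::new(Mutex::new(HashMap::new()));
    let reentered = Arc::new(AtomicBool::new(false));
    let make_ticket = || Ticket {
        id: 1,
        ports: Arc::clone(&ports),
        reentered: Arc::clone(&reentered),
    };

    // The "manager" keeps a port that owns one ticket...
    ports.lock().unwrap().insert(1, Port { _ticket: make_ticket() });
    // ...and the caller holds a different ticket for the same port.
    let ours = make_ticket();
    drop(ours);

    reentered.load(Ordering::SeqCst)
}
```

The nested drop happens inside the outer guard's scope, which is the same ordering as the last two steps listed in the test comments.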
d5724e2 to
316aad8
For completeness, I've verified that I can create three instances with external IP addresses; SSH into each of them; and ping 1.1.1.1 from them. Also, stopping and restarting the instances in a bunch of ways shows that the guest VNICs and OPTE ports are all cleaned up when the zone goes away (since it owns the
316aad8 to
ec2474d
So, in #1636 I was going to use the

It seems that the sole owner of the

@bnaecker, was this change intentional? It seems like this would require firewall-modifying APIs to act on specific instances, which would probably necessitate a lookup in Nexus.
        vnic,
    );
    self.add_secondary_mac(mac)?;
    /*
Should this dead code be deleted?
Yes, this was intentional. No, it was not the right change. I didn't think about needing access to port details in the context of firewall rules, or really anything else. I'll revert the relevant parts of this.
ec2474d to
1df1dab
@smklein @plotnick I've reworked this. It's now much smaller, and much more similar to the original implementation. One of the main annoyances of the previous implementation was that the

Anyway, let me know if y'all have questions. @plotnick You should be able to use the ports as you had planned in the firewall rule work now.
1df1dab to
a00350c
- Make the `PortTicket` non-cloneable, and _not_ owned by the `Port` itself. Having the port own a ticket is what caused the deadlocks.
- Dropping the port from the zone cleans up the resources, and the ports are removed from the manager (via the singleton tickets) in the `Instance::stop` method.
- Make sure to try to clean up all ports for an instance, even if early ones fail.
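The last point, attempting every cleanup even when an early one fails, can be sketched generically. The helper name `release_all` and the choice to report only the first error are assumptions for illustration; the actual `Instance::stop` code may collect or log errors differently.

```rust
/// Hypothetical helper: run `release` on every item, never stopping early,
/// and return the first error seen (if any) once all have been attempted.
fn release_all<T, E>(
    items: Vec<T>,
    mut release: impl FnMut(T) -> Result<(), E>,
) -> Result<(), E> {
    let mut first_err = None;
    for item in items {
        if let Err(e) = release(item) {
            // Keep going; remember only the earliest failure.
            first_err.get_or_insert(e);
        }
    }
    first_err.map_or(Ok(()), Err)
}
```

A plain `for` loop with `?` would return at the first failure and leave the remaining ports registered, which is the behaviour this change avoids.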
a00350c to
1618788
smklein
left a comment
LGTM - I think this brings the implementation more in line with the instance usage. Thanks for patching!
- Dropping the ticket removes membership in the port manager
- Tickets are not cloneable
So tickets represent exclusive membership in the port manager.
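As a rough illustration of these two properties, the sketch below models a ticket as a non-`Clone` value whose `Drop` removes its id from a shared set. `Manager`, `Ticket`, `join`, and `contains` are hypothetical names chosen for this example, not the types or methods in the PR.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Manager {
    members: Mutex<HashSet<u32>>,
}

// Deliberately does not derive Clone: one ticket per membership.
struct Ticket {
    id: u32,
    manager: Arc<Manager>,
}

impl Manager {
    /// Hands out a ticket only if `id` is not already a member.
    fn join(self: &Arc<Self>, id: u32) -> Option<Ticket> {
        let inserted = self.members.lock().unwrap().insert(id);
        inserted.then(|| Ticket { id, manager: Arc::clone(self) })
    }

    fn contains(&self, id: u32) -> bool {
        self.members.lock().unwrap().contains(&id)
    }
}

impl Drop for Ticket {
    fn drop(&mut self) {
        // The set holds plain ids, so removing one cannot run another
        // destructor that tries to take this lock again.
        if let Ok(mut members) = self.manager.members.lock() {
            members.remove(&self.id);
        }
    }
}
```

Because the ticket cannot be cloned, at most one value can ever remove a given id, so membership ends exactly once, when that value is dropped.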
`PortInner` and `PortTicket` objects, making `Port` the sole owner of information about the OPTE port.

`Port`s that the `PortManager` had previously, obviating all the double-ownership possibilities. The port was only stored here so that secondary MAC addresses can be updated as ports are added / removed. That's entirely part of the external IP address hack in OPTE. Now, the manager stores only the MAC itself. The `Port` has a reference to its manager, and removes its MAC from the manager's list when the port is dropped. But there is only a single owner and copy of the `Port` itself, owned by the zone and dropped when that is torn down.