Skip to content

Conversation

@bnaecker
Copy link
Collaborator

@bnaecker bnaecker commented Sep 8, 2022

  • Remove the PortInner and PortTicket objects, making Port the sole owner of information about the OPTE port.
  • Remove the copy of Ports that the PortManager had previously, obviating all the double-ownership possibilities. The port was only stored here so that secondary MAC addresses can be updated as ports are added / removed. That's entirely part of the external IP address hack in OPTE. Now, the manager stores only the MAC itself. The Port has a reference to its manager, and removes its MAC from the managers list when the port is dropped. But there is only a single owner and copy of the Port itself, owned by the zone and dropped when that is torn down.

@bnaecker bnaecker requested a review from smklein September 8, 2022 23:38
@bnaecker
Copy link
Collaborator Author

bnaecker commented Sep 8, 2022

I was able to reproduce the original deadlock with the following test. To repro, place the following code in sled-agent/src/opte/illumos/port_manager.rs on the current main (d1fc7da). This will panic, which indicates that there's a deadlock when trying to release the port.

#[cfg(test)]
mod test {
    use super::PortManager;
    use super::Logger;
    use slog::o;
    use super::Uuid;
    use super::NetworkInterface;
    use super::SourceNatConfig;
    use slog::Drain;
    use crate::illumos::dladm::MockDladm;
    use crate::illumos::dladm::PhysicalLink;
    use std::os::unix::process::ExitStatusExt;
    use std::thread;
    use std::time::Duration;

    fn run_deadlock_repro() {
        let decorator = slog_term::TermDecorator::new().build();
        let drain = slog_term::FullFormat::new(decorator).build().fuse();
        let drain = slog_async::Async::new(drain).chan_size(0x2000).build().fuse();
        let log = Logger::root(drain, o!());
        let underlay_ip = "fd00::1".parse().unwrap();
        let mac = "a8:25:40:01:01:01".parse().unwrap();
        let manager = PortManager::new(log, underlay_ip, mac);

        let ctx = MockDladm::get_vnics_context();
        ctx.expect().return_once(|| Ok(vec![]));

        let ctx = MockDladm::create_vnic_context();
        ctx.expect().return_once(|_: &PhysicalLink, _, _, _| Ok(()));

        let ctx = MockDladm::set_linkprop_context();
        ctx.expect().return_once(|_, _, _| Ok(()));

        let ctx = crate::illumos::execute_context();
        ctx.expect().times(..).returning(|_| Ok(std::process::Output {
            status: std::process::ExitStatus::from_raw(0),
            stdout: vec![],
            stderr: vec![],
        }));

        let id = Uuid::new_v4();
        let nic = NetworkInterface {
            ip: "172.30.0.5".parse().unwrap(),
            mac: "a8:25:40:01:01:02".parse().unwrap(),
            name: "net0".parse().unwrap(),
            primary: true,
            slot: 0,
            subnet: "172.30.0.0/22".parse().unwrap(),
            vni: omicron_common::api::external::Vni::random(),
        };
        let port = manager.create_port(
            id,
            &nic,
            Some(SourceNatConfig {
                ip: "10.0.0.1".parse().unwrap(),
                first_port: 0,
                last_port: 1 << 14,
            }),
            None,
        ).unwrap();
        let ticket = port.inner.ticket;

        // Drop the port before the ticket, which causes a deadlock in the
        // original implementation.
        //
        // Dropping the Port first means that we, drop the Arc<PortInner>, which
        // just decrements the refcount. But we still have
        //
        // - A reference to the PortManagerInner, in the PortTicket
        // - A reference to the Port, PortInner, and PortTicket in the
        // PortManager.
        //
        // The critical thing here is that the PortTicket in the manager is
        // _different_ from the port ticket we're holding here in `ticket`.
        drop(port);

        // Now drop the ticket itself. This does the following:
        //
        // - Call `ticket.release()`
        // - Take out of the `ticket.manager` option, and acquire the lock on
        // the `PortManager::ports` field.
        // - Call `ports.remove()` to pull that `Port` out of the `ports` map,
        // which immediately drops it.
        // - That drops the Port, PortInner, and a _different_ PortTicket.
        // - That calls `release` on that contained ticket, which tries to
        // acquire the lock we've taken in step 2.
        drop(ticket);
    }

    #[test]
    fn test_port_ticket_drop_does_not_deadlock() {
        let runner = thread::spawn(run_deadlock_repro);
        thread::sleep(Duration::from_secs(5));
        assert!(
            runner.is_finished(),
            "Apparent deadlock while dropping `PortTicket`"
        );
    }
}

The main issue is described in the comments, but basically we try to acquire the lock in the PortManagerInner twice: once when calling PortTicket.release() from PortTicket::drop, and once when that calls Port::drop after removing the Port from the map maintained by the manager.

The fix here is to...not do that. Specifically, we don't need the manager to maintain the ports at all. It's required as part of the external IP address workaround in OPTE that we maintain the MAC address, but not the port itself. Instead of a bunch of smart pointers and buggy ownership, I opted to maintain just a set of MAC addresses in the PortManagerInner. The Port has no ticket and no "inner" version -- just a copy of the PortManager, which points to a single PortManagerInner and thus a single set of MAC addresses. When the Port is dropped (as it is when the running zone object goes out of scope), that MAC is removed, and the manager updates the secondary-macs property of the "underlay" interface net0.

All of that is much better than what we had, but will also go away entirely when the OPTE external IP address hack is removed. At that point, there is no shared state at all between the port and manager, since the manager just needs to create ports and move them into the zone. No callbacks, no tickets, no smart pointers. That'll be nice.

@bnaecker bnaecker force-pushed the fix-port-ticket-ownership branch from d5724e2 to 316aad8 Compare September 8, 2022 23:50
@bnaecker
Copy link
Collaborator Author

bnaecker commented Sep 8, 2022

For completeness, I've verified that I can create three instances with external IP addresses; SSH into each of them; and ping 1.1.1.1 from them. Also, stopping and restarting the instances in a bunch of ways shows that the guest VNICs and OPTE ports are all cleaned up when the zone goes away (since it owns the Ports now), and that net0's secondary-macs property accurately reflects the extant OPTE ports.

@bnaecker bnaecker force-pushed the fix-port-ticket-ownership branch from 316aad8 to ec2474d Compare September 9, 2022 00:04
@plotnick
Copy link
Contributor

plotnick commented Sep 9, 2022

Remove the copy of Ports that the PortManager had previously, obviating all the double-ownership possibilities. The port was only stored here so that secondary MAC addresses can be updated as ports are added / removed.

So, in #1636 I was going to use the Ports managed by PortManager to determine the port names for updating OPTE firewall rules by matching MAC address & VNI from the rule target (see PortManager::firewall_rules_ensure). What would be your recommendation for how to access them (or the equivalent information) after this change?

@smklein
Copy link
Collaborator

smklein commented Sep 9, 2022

Remove the copy of Ports that the PortManager had previously, obviating all the double-ownership possibilities. The port was only stored here so that secondary MAC addresses can be updated as ports are added / removed.

So, in #1636 I was going to use the Ports managed by PortManager to determine the port names for updating OPTE firewall rules by matching MAC address & VNI from the rule target (see PortManager::firewall_rules_ensure). What would be your recommendation for how to access them (or the equivalent information) after this change?

It seems that the sole owner of the Port objects after this PR would be the InstalledZone object. Although previously you'd be able to act on all ports managed by the sled through the PortManager, it seems like now you'll need to know the instances to which they belong before you can operate on them (to go from "instance UUID" to Instance object , and from Instance object to RunningZone, which contains the InstalledZone).

@bnaecker , was this change intentional? It seems like this would require firewall-modifying APIs to act on specific instances, which would probably necessitate a lookup in Nexus.

vnic,
);
self.add_secondary_mac(mac)?;
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this dead code be deleted?

@bnaecker
Copy link
Collaborator Author

bnaecker commented Sep 9, 2022

Yes, this was intentional. No, it was not the right change. I didn't think about needing access to port details in the context of firewall rules, or really anything else. I'll revert the relevant parts of this.

@bnaecker bnaecker force-pushed the fix-port-ticket-ownership branch from ec2474d to 1df1dab Compare September 9, 2022 19:41
@bnaecker
Copy link
Collaborator Author

bnaecker commented Sep 9, 2022

@smklein @plotnick I've reworked this. It's now much smaller, and much more similar to the original implementation. One of the main annoyances of the previous implementation was that the PortInner object actually contained a PortTicket. That only works if the ticket is Clone, which is part of where we get into trouble. If you make it a singleton, it also can't be owned by the PortInner, without deadlocks.

Anyway, let me know if y'all have questions. @plotnick You should be able to use the ports as you had planned in the firewall rule work now.

@bnaecker bnaecker force-pushed the fix-port-ticket-ownership branch from 1df1dab to a00350c Compare September 9, 2022 21:33
- Make the PortTicket non-clonable, and _not_ owned by the Port itself.
  This causes deadlocks.
- Dropping the port from the zone cleans up the resources, and the ports
  are removed from the manager (via the singleton tickets) in the
  `Instance::stop` method
- Make sure to try to clean up all ports for an instance, even if early
  ones fail
@bnaecker bnaecker force-pushed the fix-port-ticket-ownership branch from a00350c to 1618788 Compare September 9, 2022 21:42
@bnaecker bnaecker requested a review from smklein September 9, 2022 21:43
Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I think this brings the implementation more in-line with the instance usage. Thanks for patching!

  • Dropping the ticket removes membership in the port manager
  • Tickets are not cloneable

So tickets represent exclusive membership in the port manager.

@bnaecker bnaecker merged commit ae7ee81 into main Sep 9, 2022
@bnaecker bnaecker deleted the fix-port-ticket-ownership branch September 9, 2022 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants