nexus crate shattering: extract nexus-authn #1457

jgallagher · 2022-07-18T20:20:28Z

This is a first step toward #1154. I reran the same timing tests from #1441 and saw negligible differences, which I don't think is surprising given authn shouldn't be a big compile time hog. However, it is a stepping stone toward extracting some or all of nexus/db/*, which is. I'm not 100% sold that breaking nexus up is going to be an overall win; it should help compile times, but there may be some requirements in breaking it up (to break circular dependencies) that are annoying. If this PR is distasteful or you have concerns about the direction it's moving, please bring them up!

A bikeshed question if we want to move forward with the shattering is where the shattered crates should live. I'm inclined to say they should live under nexus/ to make it clear they're conceptually part of nexus (as opposed to top-level nexus, nexus-authn, nexus-db, etc.). I left this one under nexus/src/authn/, but maybe that's misleading in other ways since it's no longer a module? I could see arguments for nexus/authn, nexus/nexus-authn, nexus/SOMETHING/authn, etc., but have no strong arguments for any of them.

The changes other than renaming files/imports in this PR are:

The raw UUIDs for built in and test users/silos now live under authn::fixed_data, but the DB model lazy_statics constructed with them still live under db::fixed_data.
Some method / trait implementations moved from authn back to nexus, to avoid authn trying to (circularly) depend on nexus itself for things like OpContext. In a few cases it seemed natural for the methods to still exist on types defined in authn, and in those cases I used "extension traits" defined in nexus to avoid running afoul of orphan rules. I think the full list of things that moved is:
- SiloUserSilo, SessionStore, and TokenContext now have generic implementations over Arc<T> in authn so that nexus can implement them for ServerContext instead of Arc<ServerContext>.
- authn::Context::{silo_required(), silo_or_builtin()} are now provided by the new AuthnContextExt trait.
- ConsoleSessionWithSiloId is now defined in db::model:: next to ConsoleSession.
- authn::saga::Serialized::for_opctx() is now ::for_ctx() and takes an authn::Context instead of a nexus::OpContext.
- impl TryFrom<model::SamlIdentityProvider> for authn::SamlIdentityProvider now lives in db::model instead of authn.
- IdentityProviderType::lookup() is now provided by the new IdentityProviderLookup trait.
- Actor::actor_type() is removed in favor of impl From<&Actor> for IdentityType.

smklein

This PR LGTM - I think the trait-based approach you took makes sense, and lets Nexus provide extensions to the authn crate without imposing circular dependencies.

I'm holding off on actually giving the formal approval because I'd also like eyes from either @davepacheco or @jmpesp , who have poked at the authn code a lot more than me.

Cargo.toml

smklein · 2022-07-19T13:25:25Z

nexus/src/authn/Cargo.toml

@@ -0,0 +1,34 @@
+[package]


(no action) great to see that this does not depend on Neuxs

smklein · 2022-07-19T13:36:30Z

nexus/src/db/model/console_session.rs

+/// A console session with the silo id of the authenticated user
+#[derive(Clone, Debug)]
+pub struct ConsoleSessionWithSiloId {
+    pub console_session: ConsoleSession,
+    pub silo_id: Uuid,
+}


I think it's fine for this to be defined in Nexus - maybe even here - but I want to point out that it's a little quirky to have this struct contained within db::model, when it's not actually used as a database model.

Agreed, but I wasn't sure where a logical place to put it was. It's used in 3 places, none of which really seem ideal:

nexus/src/app/session.rs: ) -> LookupResult<db::model::ConsoleSessionWithSiloId> { nexus/src/app/session.rs: Ok(db::model::ConsoleSessionWithSiloId { nexus/src/app/session.rs: ) -> UpdateResult<db::model::ConsoleSessionWithSiloId> { nexus/src/context.rs:use crate::db::model::ConsoleSessionWithSiloId; nexus/src/context.rs: type SessionModel = ConsoleSessionWithSiloId; nexus/src/context.rs:impl Session for ConsoleSessionWithSiloId { nexus/src/db/datastore/console_session.rs:use crate::db::model::ConsoleSessionWithSiloId; nexus/src/db/datastore/console_session.rs: ) -> UpdateResult<ConsoleSessionWithSiloId> { nexus/src/db/datastore/console_session.rs: Ok(ConsoleSessionWithSiloId {

Do any of those seem better than under db::model::console_session to you?

eh... I guess this current location is fine. It is being returned from the datastore functions, so it probably still makes sense to live in db somewhere, and db::model::console_session is probably equivalent to having it exist in db::datastore::console_session - that's the only other place I'd consider placing it, since that's where the struct is created.

I agree this feels like the wrong place, and the lack of another place to put it makes me wonder if this is the right direction.

We discussed this when ConsoleSessionWithId came in. I don't think it's as exceptional as it looks -- I think it's common that there's a low level db::model struct that represents what's in the database, and a higher-level (application-level?) structure that we expect is more widely used in Nexus. At the time I said I think it makes sense to put such abstractions in the subsystems they're associated with (in this case, authn). With the current structure, I'm not sure where to put it. I'd suggested earlier that maybe db::model should depend on authn, but then it still wouldn't be possible to put this into authn I think.

davepacheco

This is a first step toward #1154. I reran the same timing tests from #1441 and saw negligible differences, which I don't think is surprising given authn shouldn't be a big compile time hog. However, it is a stepping stone toward extracting some or all of nexus/db/*, which is. I'm not 100% sold that breaking nexus up is going to be an overall win; it should help compile times, but there may be some requirements in breaking it up (to break circular dependencies) that are annoying. If this PR is distasteful or you have concerns about the direction it's moving, please bring them up!

First, thank you for looking into this. Incremental builds are a pain point and I know it takes a lot of effort to dig into them. I appreciate the effort you spent here both in the change itself and also the summary in the PR -- that helped make sense of it.

As I think you figured, yeah, I'm not sure about this particular direction. I've raised some specific issues inline. Maybe they're resolvable? But my sense is there are a lot of downsides here.

At the top level, I don't feel like I understand the underlying principles of organization here. My mental model has been:

where possible, have self-contained, isolated subsystems (currently modules, could be crates)
have low-level modules like db::model that to the extent possible only know about themselves
build higher-level (application-level) subsystems atop these. I think of authn (and authz) as atop db::model in that authn and authz necessarily uses the database structures provided by db::model

(some previous discussions: here, here, which I mention only for context, not a plan of record or anything)

I agree with what I think you and @smklein are getting at that some of these module boundaries aren't as clean as they could be, and that if we clean them up we may be able to split some out into separate crates, and that may improve incremental build time as well as code quality. What's missing for me is a clearer picture of the DAG and why -- why should db::model depend on authn for example? How will this generalize to other subsystems?

@jgallagher wrote:

* Some method / trait implementations moved from `authn` back to `nexus`, to avoid `authn` trying to (circularly) depend on `nexus` itself for things like `OpContext`.

This worries me. The whole idea of OpContext is to provide a single place where we can hang functionality that necessarily permeates the program: authz checks, logging, tracing, etc. That functionality also depends on context that the subsystem doesn't know about. e.g., I think it's really valuable to be able to emit log entries from authn that include the current HTTP request URL even though authn doesn't know anything about that.

In a few cases it seemed natural for the methods to still exist on types defined in authn, and in those cases I used "extension traits" defined in nexus to avoid running afoul of orphan rules. I think the full list of things that moved is:

  * `SiloUserSilo`, `SessionStore`, and `TokenContext` now have generic implementations over `Arc<T>` in `authn` so that nexus can implement them for `ServerContext` instead of `Arc<ServerContext>`.
  * `authn::Context::{silo_required(), silo_or_builtin()}` are now provided by the new `AuthnContextExt` trait.
  * `ConsoleSessionWithSiloId` is now defined in `db::model::` next to `ConsoleSession`.
  * `authn::saga::Serialized::for_opctx()` is now `::for_ctx()` and takes an `authn::Context` instead of a `nexus::OpContext`.
  * `impl TryFrom<model::SamlIdentityProvider> for authn::SamlIdentityProvider` now lives in `db::model` instead of `authn`.
  * `IdentityProviderType::lookup()` is now provided by the new `IdentityProviderLookup` trait.
  * `Actor::actor_type()` is removed in favor of `impl From<&Actor> for IdentityType`.

I commented on a bunch of these inline but I think the summary is:

It's not clear to me when you need these traits and what goes in them (since it's not all functionality that goes into them). This could be addressed with documentation maybe?
It seems like a bunch of these moved authn-private implementation details outside the authn module. This feels like a pretty big downside?
More minor and maybe just personal preference: I find it annoying to have the cognitive overhead and the time overhead associated with a trait when there's only one impl and we're not using it for polymorphism or abstraction. Like, we'd be fine with the struct, but we have to use a trait. Admittedly the Ext trait pattern is common. These examples feel different from the way I've usually seen that used, but I can't put my finger on why.

Anyway, I feel like that's all pretty negative and I'm sorry about that. It's possible I'm missing some small thing that would make it all click for me.

davepacheco · 2022-07-19T17:36:46Z

nexus/authn/src/external/mod.rs

 }

+#[async_trait]
+impl<T> SiloUserSilo for std::sync::Arc<T>


Sorry for the basic question, but why is this needed?

If I have x: T where T: SiloUserSilo, I can call x.silo_user_silo(). If I have x: Arc<T>, can't I already call x.silo_user_silo() because of Arc's Deref behavior?

I see there are several others of these (e.g, in session_cookie.rs) and I'm wondering under what conditions you need an impl like this.

davepacheco · 2022-07-19T17:46:44Z

nexus/authn/src/fixed_data.rs

@@ -0,0 +1,101 @@
+// This Source Code Form is subject to the terms of the Mozilla Public


Hmm. My first thought is that it's weird to have database fixed data outside the db::model module. The module-level comment here still refers to the database and that's really what this was initially intended for.

Maybe we're saying that there are really two purposes here: one is "fixed data that gets put into the database at startup" (which remains in db::fixed_data) and "well-known identifiers for specific resources [that happen to correspond with things that get inserted into the database]". Maybe it's okay that the well-known values are here and the database structures are in fixed_data? On the other hand, maybe db::model should be agnostic to any specific data (just provides model types and associated functions)? That would invert the dependency here: authn would depend on the model but not the other way around.

This gets at a deeper question for me around what principles drive the organization here -- I'll mention that elsewhere though.

davepacheco · 2022-07-19T17:50:02Z

nexus/authn/src/saga.rs

-    pub fn for_opctx(opctx: &OpContext) -> Serialized {
-        Serialized { kind: opctx.authn.kind.clone() }
+    pub fn for_ctx(ctx: &Context) -> Serialized {
+        Serialized { kind: ctx.kind.clone() }


This feels significant, although I guess it doesn't actually change the serialized data. Is there any reason we might want other elements from the OpContext here in the future?

davepacheco · 2022-07-19T17:55:32Z

nexus/src/authz/api_resources.rs

    type AllowedRoles = SiloRole;
 }

+pub trait AuthnContextExt {


I'm struggling with this use of traits...

What are the conditions where we need traits like these? How do we decide what goes in them? A bunch of other public stuff in authn::Context didn't wind up in this trait, for example.

We're also moving authn-specific functionality outside of the authn module, in that the implementations of silo_required() and silo_or_builtin() are now here instead of in authn. But they're authn concerns operating on authn module data structures. Why would they be here?

This one that really bothered me, but the short answer is that despite these methods previously living in authn, Silo is defined in this module as an authz_resource!, so these aren't (at least in terms of the namespacing of the types) operating on authn module data structures.

The conditions where I used a trait like this are: A type in a dependent crate (authn::Context in this case) has one or more methods that have a dependency (either directly by the types they take/return, Silo in this case, or indirectly by their implementation details) on a crate they cannot depend on. An extension trait allows moving the methods out of the dependent crate while still leaving them callable on that type (albeit now requiring call sites to import the extension trait). It feels like a bandaid though, and might be a sign that we should rethink the organization more carefully.

davepacheco · 2022-07-19T18:07:23Z

nexus/src/db/fixed_data/mod.rs

 //! Fixed (hardcoded) data that gets inserted into the database programmatically
 //! either when the rack is set up or when Nexus starts up.

-// Here's a proposed convention for choosing uuids that we hardcode into


I get why you moved this...but now it doesn't feel like there's a great place to put this. (It's unfortunate to lose it here because we have uuids here.) Before, this spot was the top-level of all the fixed data and so all the fixed uuids. Now those are potentially all over the place.

davepacheco · 2022-07-19T18:33:18Z

nexus/src/db/model/console_session.rs

+/// A console session with the silo id of the authenticated user
+#[derive(Clone, Debug)]
+pub struct ConsoleSessionWithSiloId {
+    pub console_session: ConsoleSession,
+    pub silo_id: Uuid,
+}


I agree this feels like the wrong place, and the lack of another place to put it makes me wonder if this is the right direction.

We discussed this when ConsoleSessionWithId came in. I don't think it's as exceptional as it looks -- I think it's common that there's a low level db::model struct that represents what's in the database, and a higher-level (application-level?) structure that we expect is more widely used in Nexus. At the time I said I think it makes sense to put such abstractions in the subsystems they're associated with (in this case, authn). With the current structure, I'm not sure where to put it. I'd suggested earlier that maybe db::model should depend on authn, but then it still wouldn't be possible to put this into authn I think.

davepacheco · 2022-07-19T18:34:37Z

nexus/src/db/model/role_assignment.rs


+impl From<&'_ Actor> for IdentityType {
+    fn from(actor: &'_ Actor) -> Self {
+        match actor {


It feels like this is an implementation detail of Actor and could even use private details so it seems problematic to put it here.

jgallagher · 2022-07-19T19:36:52Z

Anyway, I feel like that's all pretty negative and I'm sorry about that. It's possible I'm missing some small thing that would make it all click for me.

No apology necessary - this is excellent feedback, and important to get it in and resolved before I keep pushing down a road that may not make sense. I had some vague uneasy feelings about the direction I was going, hence my solicitation in the PR description. I am (both in this PR and in looking at what comes next) struggling with a lot of the cross-dependencies currently present in nexus, and I wonder if the concerns you've outlined here are applicable (to some degree) to that property of the existing codebase?

I'll respond to some of the individual comments below, but it might make sense to sync offline on this too.

jgallagher · 2022-07-19T20:24:18Z

@davepacheco and I chatted extensively about this offline. We came to the conclusion that splitting out authn as the first leaf crate that other crates could then depend on is probably not right, even if it was mechanically relatively easy. I'm going to close this PR and take a shot at extracting nexus-types crate (pulled mostly from nexus::external_api::params and related), follow up by extracting nexus-db-model, and see if that lands in a better place.

nexus crate shattering: extract nexus-authn

ee42417

jgallagher requested review from davepacheco and smklein July 18, 2022 20:20

smklein reviewed Jul 19, 2022

View reviewed changes

smklein requested review from jmpesp and smklein July 19, 2022 13:45

Move nested crates from nexus/src/ to nexus/

e309665

davepacheco reviewed Jul 19, 2022

View reviewed changes

jgallagher closed this Jul 19, 2022

jgallagher mentioned this pull request Jul 19, 2022

Move crates nested under nexus out of src/ #1462

Merged

		@@ -0,0 +1,101 @@
		// This Source Code Form is subject to the terms of the Mozilla Public

nexus crate shattering: extract nexus-authn #1457

nexus crate shattering: extract nexus-authn #1457

Uh oh!

Conversation

jgallagher commented Jul 18, 2022

Uh oh!

smklein left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davepacheco left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jgallagher commented Jul 19, 2022

Uh oh!

jgallagher commented Jul 19, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants