Discussion of alternative crates is not publicly available #56

Closed
mikeyhew opened this issue Oct 7, 2020 · 5 comments
@mikeyhew
Contributor

mikeyhew commented Oct 7, 2020

Hey Oxiders!

I heard about this project from your post on Twitter. Since we're creating a public API at my work, I was interested to know what you thought of the alternative crates in the ecosystem, and why you decided to roll your own. I was excited to see that in the documentation for this crate, you wrote:

For a discussion of alternative crates considered, see Oxide RFD 10.

Unfortunately it looks like that's an internal discussion that I can't see. I was wondering if you could share any of the specifics about what other crates you looked at and what you thought about them?

@davepacheco
Collaborator

Hi @mikeyhew, sorry for the link to the non-public doc. (We weren't really sure whether anybody else would be interested in dropshot!) I'll summarize the key points here. I really don't want to knock the other crates out there -- many of them look quite good. Also, keep in mind that we did this research back in January, and some of this is definitely out of date.

Our primary goal was to find or build a foundational component for serving APIs over HTTP in the control plane that we're building. This is a little unusual for a few reasons. We have a small team of generalists working on the whole product. We don't expect to have a dedicated API team. One person might do the work of implementing SSD firmware upgrade in the OS, in the control plane, and in the public API. So we wanted the programming interface for this crate to be easy to use and hard to misuse, with as much checked at compile-time as possible. Also, the components we're building will be running on-prem in somebody else's datacenter. We won't have an SRE team familiar with the components to debug them. (We'll have support, of course, but that's a different kind of engagement.) This is another reason we want stuff to fail at compile-time more than runtime, and it also means we're prioritizing good debuggability and predictable behavior.

As we mentioned in the talk, another important goal for us was to build something with strong OpenAPI support, and particularly where the code could be the source of truth, with the spec generated from the code so that it could not diverge from the implementation. We weren't sure that would be possible, but it seemed plausible and worthwhile if we could do it. None of the crates we found had anything like this. If I recall correctly, there was once some code (middleware, maybe?) for generating OpenAPI from Rocket, but it wasn't as integrated as we wanted and I believe the code fell out of date.


I spent a while developing a prototype atop actix-web, and Dropshot's interface is definitely inspired by it. But I found several things frustrating about working with actix-web. These may well have been my own issues. I was very new to Rust at the time.

Here's one small example. I accidentally wrote:

    config.service(actix_web::web::resource("/instances")
        .route(actix_web::web::get()).to(demo_api_instances_get));

The correct code would have been:

    config.service(actix_web::web::resource("/instances")
        .route(actix_web::web::get().to(demo_api_instances_get)));

(The difference is where the close parenthesis around the call to get() goes.) I believe the first chunk creates a route with no handler, which results in a default 404 handler. Then it separately attaches a second handler that calls demo_api_instances_get for all methods. The net result is that GET /instances reports a 404, though something like PUT /instances actually does invoke that handler. This is kind of a silly example, but it was one of several cases where I felt: we chose Rust for this component so that we could leverage the type system to validate this sort of thing at compile time. Why use an interface that's so easy to misuse in a way that only fails at runtime?
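The compile-time enforcement Dave is asking for here is possible in Rust. One common technique is a typestate builder, where a route without a handler simply has no `register` method. This is an illustrative, std-only sketch (all names are made up; this is neither Dropshot's nor actix-web's actual design):

```rust
// Illustrative typestate sketch: `Route<NoHandler>` has no `register`
// method, so forgetting `.to(...)` is a compile error, not a runtime 404.
use std::marker::PhantomData;

struct NoHandler;
struct HasHandler(fn() -> &'static str);

struct Route<S> {
    path: &'static str,
    state: S,
}

impl Route<NoHandler> {
    fn get(path: &'static str) -> Self {
        Route { path, state: NoHandler }
    }

    // Attaching a handler moves the route into the `HasHandler` state.
    fn to(self, handler: fn() -> &'static str) -> Route<HasHandler> {
        Route { path: self.path, state: HasHandler(handler) }
    }
}

impl Route<HasHandler> {
    // Only a fully-configured route can be registered.
    fn register(self) -> (&'static str, fn() -> &'static str) {
        (self.path, self.state.0)
    }
}

fn main() {
    let (path, handler) = Route::get("/instances")
        .to(|| "instance list")
        .register();
    assert_eq!(path, "/instances");
    assert_eq!(handler(), "instance list");
    // Route::get("/instances").register();  // would not compile: no handler
}
```

The mistake in the first snippet above becomes unrepresentable: there is no way to hand an unfinished route to the registration step.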

I ran into a similar problem trying to provide shared, mutable state for request handlers. The Actix Guide documents how to provide this shared state to request handlers. The example there invokes App::app_data(). The documentation for App::app_data() indicates that it can be used with the Data<T> extractor, which is what you see in the Actix Guide example. And the Data<T> struct is documented as "application data" that "could be added during application configuration process with App::data() method". That's a different method from App::app_data(). But the example under the Data docs uses app_data(). I couldn't figure out which one to use. The App::data() docs say:

Set application data. Application data could be accessed by using Data extractor where T is data type... Note: http server accepts an application factory rather than an application instance. Http server constructs an application instance for each thread, thus application data must be constructed multiple times. If you want to share data between different threads, a shared object should be used, e.g. Arc. Internally Data type uses Arc so data could be created outside of app factory and clones could be stored via App::app_data() method.

Again, I found (disappointingly) that this code appears to work -- as long as you're at low levels of concurrency:

    let app_state = api_server::setup_server_state();
    let server = HttpServer::new(move || {
        App::new()
            .app_data(app_state.clone())
            ...

but that's not correct -- we need to be wrapping app_state in a web::Data pointer so that it only clones the smart pointer, not the whole application state. But when I tried:

    let app_state = web::Data::new(api_server::setup_server_state());

then I found Actix emitted this error to the HTTP client:

    App data is not configured, to configure use App::data()

I gave up before resolving this.
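The distinction Dave draws above — cloning a smart pointer versus cloning the whole application state — is the standard Arc pattern (web::Data is documented as wrapping Arc internally). A minimal std-only sketch, with a made-up AppState type standing in for the server state:

```rust
use std::sync::Arc;

struct AppState {
    greeting: String,
}

fn main() {
    // Analogous to web::Data::new(...): wrap the state in an Arc once,
    // outside the app factory.
    let state = Arc::new(AppState { greeting: String::from("hello") });

    // Analogous to .app_data(state.clone()) inside the factory closure:
    // this clones the pointer and bumps a refcount; the AppState itself
    // is not duplicated.
    let per_worker = state.clone();

    assert!(Arc::ptr_eq(&state, &per_worker)); // same allocation
    assert_eq!(Arc::strong_count(&state), 2);  // two handles, one state
    assert_eq!(per_worker.greeting, "hello");
}
```

Cloning the bare state (the first, "appears to work" version) would instead give each worker thread its own independent copy, which is why it only seems fine at low concurrency.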

Relatedly, Actix-Web's pattern of emitting runtime errors to the HTTP client was a bit off-putting -- I know that's convenient for developers, but it's not the experience we want for our customers when we make a mistake.

There were a few other API quirks like not being able to name the type of an HttpServer, so I couldn't, say, return it from a function.

Overall, actix-web seemed to work fine, but I kept feeling like it was easy to use wrong in a way that would only fail at runtime, and that was likely to bite other team members too as they came on board. This was also during the period of uncertainty about Actix-Web's future. I've heard the situation is much healthier now.


I spent some time looking at Rocket, too. Rocket does have a lot of nice features. We weren't jazzed about the fact that it depended on nightly Rust. They've since fixed that (not yet released). I was surprised that the logging system was not easy to customize and little progress had been made there. (In the past I've found structured HTTP server logs pretty critical for post hoc debugging.) It was a similar story with configuration. Dropshot adopted an approach similar to what's suggested in that issue, where the application defines its own config, of which Dropshot's is just one part, and you can cons up a Dropshot config from your config file format via serde.
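The config composition described at the end of that paragraph — the application owns its config format, and Dropshot's config is just one section of it, deserialized via serde — might look roughly like this. All names here are made up for illustration (Dropshot's real type is dropshot::ConfigDropshot, whose exact fields may differ):

```rust
// Sketch: the application defines its own config, embedding the server
// library's config as one field, and serde handles the whole file.
use serde::Deserialize;

#[derive(Deserialize)]
struct ConfigDropshot {
    bind_address: String,
    request_body_max_bytes: usize,
}

#[derive(Deserialize)]
struct AppConfig {
    // Application-specific settings live alongside...
    database_url: String,
    // ...the server library's config, as just one section of the file.
    dropshot: ConfigDropshot,
}

// The app picks the file format; here, TOML via the `toml` crate.
fn load(toml_text: &str) -> Result<AppConfig, toml::de::Error> {
    toml::from_str(toml_text)
}
```

The point of the design is that the library never dictates a config file format or location; it just exposes a Deserialize-able struct for the application to embed.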

While thinking about fault tolerance, I looked into how to configure TCP keep-alive, as well as HTTP keep-alive for performance. I was pretty confused by the docs -- I couldn't tell if this was HTTP or TCP keep-alive, and it didn't seem to work (see the related issues, too). At the time it struck me that this wasn't heavily tested around partitions. I found this comment that suggests that Rocket is designed to sit behind some reverse proxy (i.e., that it's not designed to be public-facing itself), which really wasn't our disposition for this component.


I also looked briefly at Tower, Iron, Warp, Nickel, and Gotham. At the time, most of these seemed light on docs or not actively developed. Tower and Gotham seemed potentially promising.

None of this is to disparage any of these crates. We were inspired by their work! We just came to the conclusion that if we built atop hyper instead of these higher-level frameworks, there wasn't that much we'd have to build in order to get what we wanted and we'd have good control over details like configuration, logging, keep-alive, hardening, and anything else we needed. Again, we knew this was going to be a pretty core component for us and we were willing to invest a bit more than one might expect.

@mikeyhew
Contributor Author

@davepacheco thanks for getting back to me about this! Having learned a bit more about OpenAPI since I opened this issue, I now realize that it describes your entire API, right down to the HTTP headers, and because it goes down to that level, it makes sense to implement this library on top of hyper instead of a higher-level framework.

What I was really asking about were the other crates that use some form of code generation to help you implement an OpenAPI server. The only other two I've found are paperclip and swagger. They take a different approach from Dropshot: whereas Dropshot generates an OpenAPI spec from your Rust code, they generate Rust code from an OpenAPI spec.

  • paperclip is a work in progress: it only supports OpenAPI v2 right now, and it looks like the server it generates is based on actix-web? I haven't used it, but it looks interesting. It seems a bit too early-stage at this point, though.
  • The swagger crate is actually just a runtime library for a server that you generate using the OpenAPI generator tools, located at https://github.com/OpenAPITools/openapi-generator. It looks like they recently moved to that from the swagger-codegen tools. The code for the code generator is in a big Java repo with generators for lots of other languages, and uses a set of template files to generate the code. It seems to generate the pet-store example OK, but when I tried to generate my own hello-world example, something in the Java code was null and it ended up spitting out code that didn't work.

Anyway, this crate seems pretty cool! I like how you took advantage of the existing crates in the ecosystem, notably schemars for generating JSON schemas, and serde.

@davepacheco
Collaborator

Sorry for misunderstanding your question! Yes, we decided early on in favor of generating the spec from the code, and that's largely what the talk you linked to is about. With a code generator, it's not clear to me how the server can evolve when the spec needs to change, since regenerating the code from the new spec will clobber anything you've filled in. @ahl has more experience with different approaches to OpenAPI and might have more thoughts here.

@mikeyhew
Contributor Author

No worries, the question wasn't very clear. Plus, it may have evolved a bit after I asked it 😄

The way the Java-based generator works, or is supposed to work, is that it generates a crate containing an Api trait, with an async method for each endpoint in your schema, and a Server struct. You create your own type in another crate, implement Api for it, and create the server with it. At least, that's what I could glean from looking at the generated code for the pet store example.
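That trait-plus-server pattern can be sketched in a few lines. This is a simplified, synchronous illustration (the real generated trait is async, and every name here is invented, not the openapi-generator's actual output):

```rust
// Sketch of the generated-crate pattern: one trait method per endpoint,
// and a server generic over any implementation of that trait.
trait Api {
    fn get_pet(&self, pet_id: u64) -> Result<String, String>;
}

// The "Server" the generator would emit: generic over your Api impl.
struct Server<T: Api> {
    api: T,
}

impl<T: Api> Server<T> {
    // Dispatches a request for GET /pets/{pet_id} to the trait method.
    fn handle_get_pet(&self, pet_id: u64) -> Result<String, String> {
        self.api.get_pet(pet_id)
    }
}

// In your own crate, you implement Api for your type...
struct MyApi;

impl Api for MyApi {
    fn get_pet(&self, pet_id: u64) -> Result<String, String> {
        if pet_id == 1 {
            Ok(String::from("rex"))
        } else {
            Err(String::from("no such pet"))
        }
    }
}

fn main() {
    // ...and hand it to the generated server.
    let server = Server { api: MyApi };
    assert_eq!(server.handle_get_pet(1), Ok(String::from("rex")));
    assert!(server.handle_get_pet(2).is_err());
}
```

This also makes Dave's concern above concrete: the trait and Server live in generated code, so regenerating from a changed spec rewrites them, while your Api impl in the other crate has to be reconciled by hand.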

Perhaps the advantage of that approach is it (in theory, if there weren't any bugs) lets you quickly generate an API server in Rust from an existing schema. Since it generates from the schema, it also means (again, in theory) that you have complete control over the schema, assuming it can handle any schema that you throw at it.

Since we don't have an existing API schema, the first point is moot for us. For the second point, I'm hoping this crate will evolve over time to give a lot of control over the generated schema, if needed. I may be able to help with that, depending on whether it becomes a business need for us and we can justify the time spent.

@davepacheco
Collaborator

Makes sense. Thanks for that context! Are there specific areas where you have an idea that you'll want to provide more control over the generated schema?
