Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducing Configuration Representation Boilerplate #346

Open
XAMPPRocky opened this issue Jul 22, 2021 · 7 comments
Open

Reducing Configuration Representation Boilerplate #346

XAMPPRocky opened this issue Jul 22, 2021 · 7 comments
Labels
area/filters Related to Quilkin's filter API. area/user-experience Pertaining to developers trying to use Quilkin, e.g. cli interface, configuration, etc kind/design Proposal discussing new features / fixes and how they should be implemented priority/medium Issues that we want to resolve, but don't require immediate resolution.

Comments

@XAMPPRocky
Copy link
Collaborator

Currently most of the boilerplate in writing a filter, is writing a configuration that is statically and dynamically configurable. Right now, you essentially have to write twice, once in Rust and another in Protobuf. I'd like to propose that we try to generate one these representations from the other, so that we can much more easily iterate and add to a filter's configuration. This would come in the form of a tool that is either run separately, or as part of the build process.

  • Protobuf -> Rust
    • The issue with this route is that the protobuf parsers in Rust, aren't at the same level quality as the Rust parser libraries. So we'd have to put more work in the parser than I think would be worth it.
  • Rust -> Protobuf
    • I think this the most realistic both in terms of the implementation as well as flexibility, as we can do a lot more in Rust than we can in Protobuf.

One way we could do this is have a procedural macro for the configuration struct, that generates protobuf files from their type definitions, and then dump the generated definition into the folder, so you'd have to run cargo build after changing a configuration struct to generate the update protobuf, but that seems reasonable and in return we wouldn't have to manually edit protobuf definitions.

// src/lib.rs
quilkin::generate_config_protobufs!("proto/quilkin");

// src/filters/capture_bytes.rs

#[derive(quilkin::ConfigEnum)]
enum Strategy {
    #[quilkin(rename = "PREFIX")]
    /// Looks for the set of bytes at the beginning of the packet
    Prefix = 0,
    #[quilkin(rename = "SUFFIX")]
    /// Look for the set of bytes at the end of the packet
    Suffix = 1,
}

#[derive(quilkin::Config)]
struct Config {
    #[quilkin(default)]
    strategy: Strategy,
    /// the number of bytes to capture
    #[quilkin(rename = "size")]
    size: usize,
    /// the key to use when storing the captured bytes in the filter context
    #[quilkin(rename = "metadataKey")]
    #[quilkin(default = "default_metadata_key")]
    metadata_key: String,
    /// whether or not to remove the set of the bytes from the packet once captured
    #[quilkin(default = "default_remove")]
    remove: bool,
}
@XAMPPRocky XAMPPRocky added kind/design Proposal discussing new features / fixes and how they should be implemented area/filters Related to Quilkin's filter API. priority/low Issues that don't need to be addressed in the near term. labels Jul 22, 2021
@markmandel
Copy link
Member

markmandel commented Jul 26, 2021

I love this idea, because the double definition of models for each config type annoys me no end. 🥳

I'm also on the side of Rust -> Protobuf - the other thing we could do is have some opinionated defaults on how the protobuf gets generated (or maybe even start with not even letting people have the option).

For example, above we have #[quilkin(rename = "metadataKey")] - but if we have strong opinions on how the protobuf should be generated, and we would be doing code reviews to ensure that people align with that standard - we might not want to give them the option to do anything other than convert metadata_key ➡️ metadataKey ... or maybe a better idea is that we could build in linting tools into the macro that would let you write snake_case -- or something like that 😄 .

🤔 I might lean towards this being part of our build.rs than a tool that is manually run, since it will get rebuilt automatically on each compile, and there is less for new developers to do -- but we could also build it into our CI pipeline to run the tool and see if there is any difference in the .proto files (which helps us, but not others that use this as a library 🤔 ).

What are your thoughts on build.rs vs manual tool?

@XAMPPRocky
Copy link
Collaborator Author

Thinking on it more, I think it makes sense to run on every build. Also to clarify, I don't think this would run in build.rs, we should run this as a proc macro, as it makes it easier to catch all of the configuration in the crate. Like the above syntax should work as-is with this macro IMO.

the other thing we could do is have some opinionated defaults on how the protobuf gets generated (or maybe even start with not even letting people have the option).

I don't have any opinions on Protobuf style (yet 😄) as this is my first production project using it. But I do think we should try to use or have some convention that we follow and make that the easy path. I don't see a problem with having a rename attribute as an escape hatch if you need it but it should be the uncommon case.

@XAMPPRocky XAMPPRocky added priority/medium Issues that we want to resolve, but don't require immediate resolution. and removed priority/low Issues that don't need to be addressed in the near term. labels Aug 11, 2021
@XAMPPRocky
Copy link
Collaborator Author

XAMPPRocky commented Aug 11, 2021

Working #358 I've realised that this problem is actually a bit bigger than the filter configuration, as there are types we want to use in configurations as well as use in filters such as the Endpoint type. With the addition of #360 now requiring a Go representation of the configuration, as well as #13 requiring a representation that can safely used across a VM/language barrier.

This brings us up to five potential different representations of a type (Rust, Go, Protobuf, YAML/JSON, FFI). Implementing all of those for each type would be a lot of work, and is not really sustainable to maintain. What I'd like to propose is instead that we create a Quilkin data model layer between the types and the formats (similar to how serde works), that essentially represents the same types as JSON/YAML, but we can use this layer to have all types that implement Quilkin's data model have a single implementation and generate the five different representations from this data model. This would also make it easier to replace, add, or remove formats in the future as new formats only need to implement translating from Quilkin's data model to their format.

@XAMPPRocky XAMPPRocky changed the title Reducing Filter Configuration Boilerplate Reducing Configuration Representation Boilerplate Aug 12, 2021
@XAMPPRocky XAMPPRocky added the area/user-experience Pertaining to developers trying to use Quilkin, e.g. cli interface, configuration, etc label Aug 12, 2021
@markmandel
Copy link
Member

What I'd like to propose is instead that we create a Quilkin data model layer between the types and the formats (similar to how serde works), that essentially represents the same types as JSON/YAML, but we can use this layer to have all types that implement Quilkin's data model have a single implementation and generate the five different representations from this data model.

To confirm - I'm assuming this would still be a series of macros (much like serde/what is above) that would be applied to our config data models as well as our Filter config, and users that write their own custom models as well.

This makes a lot of sense to me.

Small thing:

This brings us up to five potential different representations of a type (Rust, Go, Protobuf, YAML/JSON, FFI)

I think the Go representation would come from the generated proto, so I don't think we need to concern ourselves with that layer in Quilkin.

@iffyio Would this affect the xDS layer at all? Make that more complicated / easier / the same? I can't think of any concerns, but figured I would ask.

@markmandel markmandel mentioned this issue Aug 12, 2021
3 tasks
@iffyio
Copy link
Collaborator

iffyio commented Aug 13, 2021

@XAMPPRocky could you give some pseudocode+config examples of what this would look like? I'm not too sure I understand what's being proposed beyond the rust->protobuf side of things unfortunately. My current assumption is that we still specify a config as a struct then the macro generates pretty much all formats or maybe only some formats (if e.g the go would still come from protobuf)? Then where does the quilkin data model layer and translation between layers fit in (is it the macro also generating code that translate between any two layers)? Or is this something else entirely? I don't think I have a picture of it yet unfortunately.

@XAMPPRocky
Copy link
Collaborator Author

XAMPPRocky commented Aug 13, 2021

I think the Go representation would come from the generated proto, so I don't think we need to concern ourselves with that layer in Quilkin.

Looking at #360 there's types here that aren't part of the protobuf that we need to generate.

// Endpoint represents an xds endpoint
type Endpoint struct {
IP string
Port int
Metadata map[string]interface{}
}
// Cluster represents an xds cluster
type Cluster struct {
Endpoints []Endpoint
}
// FilterConfig represents a filter's config
type FilterConfig struct {
Name string
Config interface{}
}
// Resources represents an xds resource.
type Resources struct {
Clusters map[string]Cluster
FilterChain []FilterConfig
}
type debugFilterConfig struct {
Id string
}

could you give some pseudocode+config examples of what this would look like?

Sure, the configuration as it stands today should be entirely the same, the only things we're changing is what code is generated, and how we encode/decode to a format like YAML or Protobuf. At compile-time, the macro generates three representations of the config. Protobuf schema, Go, and Rust. The Rust generated code uses a layer between the type and the runtime format, so that we can decouple the types from the format.

I've written a basic example of what this could look like for boolean types, in reality we'd have probably at least the same kind of types as YAML. While this pattern has a bit of boilerplate it's incredibly powerful, and hopefully it shows enough to demonstrate how we can take advantage of this kind of layout going forward. We only implement formats in terms of Quilkin's layer, which is then transformed into the desired format. This design means that now whenever we have any bool like type, we just call serialise_bool on it, and through the traits we now have a single implementation of that type that works with JSON, YAML, and Protobuf. This also means we can add and remove formats as we want without also having to refactor the types in the codebase, you just add or remove the Serializer/Deserializer implementations and they support all types that implement the traits.

// Abstract traits that represent encoding and decoding a type into Quilkin's "data layer".
pub trait Serializer {
    fn serialize_bool(&mut self, value: bool) -> Result<()>;
}

pub trait Deserializer {
    fn deserialize_bool(&mut self) -> Result<bool>;
}

pub trait Serialize {
    fn encode<S>(&self, encoder: &mut S) -> Result<()> where S: Serializer;
}

pub trait Deserialize {
    fn decode<D>(decoder: &mut D) -> Result<Self> where D: Deserializer;
}

// Implementations for types

impl Deserialize for bool {
    fn decode<D>(decoder: &mut D) -> Result<Self> where D: Deserializer {
        decoder.decode_bool()
    }
}

impl Serialize for bool {
    fn encode<S>(&self, encoder: &mut S) -> Result<()> where S: Serializer {
        encoder.encode_bool(self)
    }
}

// Serializer formats

struct JsonEncoder {
    ser: serde_json::Serializer,
}

struct YamlEncoder {
    ser: serde_yaml::Serializer,
}

struct ProtobufEncoder {
    buf: Vec<u8>
}

impl Serializer for Json {
    fn serialize_bool(&mut self, value: bool) -> Result<()> {
         self.ser.encode_bool(value)
    }
}

impl Serializer for Yaml {
    fn serialize_bool(&mut self, value: bool) -> Result<()> {
         self.ser.encode_bool(value)
    }
}

impl Serializer for Protobuf {
    fn serialize_bool(&mut self, value: bool) -> Result<()> {
         prost::Message::encode(value, &mut self.buf)
    }
}

// Deserializer formats

struct JsonDecoder {
    de: serde_json::Deserializer,
}

struct YamlDecoder {
    de: serde_yaml:: Deserializer,
}

struct ProtobufDecoder<'input> {
    buf: &'input mut [u8]
}

impl Deserializer for JsonDecoder {
    fn deserialize_bool(&mut self) -> Result<bool> {
         self.de.decode_bool()
    }
}

impl Deserializer for YamlDecoder {
    fn deserialize_bool(&mut self) -> Result<bool> {
         self.de.decode_bool()
    }
}

impl Deserializer for ProtobufDecoder<'_> {
    fn serialize_bool(&mut self, value: bool) -> Result<()> {
         prost::Message::decode_length_delimited::<bool>(value, self.buf)
    }
}

@markmandel
Copy link
Member

This definitely makes a lot of sense to me.

On the point of Go types, I'm going to go back to #360 - because either (a) those specific types (like Endpoint), don't changes, they are set to the specifications of xDS or (b) we can generate the Filter specific configs from protobuf - but I think that's a minor point to the overall strategy, which makes perfect sense to me 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/filters Related to Quilkin's filter API. area/user-experience Pertaining to developers trying to use Quilkin, e.g. cli interface, configuration, etc kind/design Proposal discussing new features / fixes and how they should be implemented priority/medium Issues that we want to resolve, but don't require immediate resolution.
Projects
Development

No branches or pull requests

3 participants