Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trait rework: Variables should not be an associated type, but a marker trait #329

Open
Ten0 opened this issue May 27, 2020 · 6 comments
Open

Comments

@Ten0
Copy link
Contributor

Ten0 commented May 27, 2020

GraphQLQuery::Variables is the way currently used to represent that some type is, once serialized, a correct variables set for the given query.

However we could imagine that several structures, once serialized, would be valid for that query, e.g. GraphQLQuery::Variables and &GraphQLQuery::Variables, or the same structure but with &strs instead of Strings.

So what we see from that is that fundamentally, "being a valid variable set for query Q once serialized" is a property of some structures, and there may be several of them, so a single associated type is not the proper way to represent them. Instead, this property should be represented using a marker trait.

The same applies to Response, with an additional lifetime in the trait (and would help when trying to implement #30). In that case though, we may still want to provide a DefaultResponseData as an associated type, because it may still be useful when writing generic code running the query and returning deserialized owned data. (fn send_some_query<Q: GraphQLQuery>() -> Q::DefaultResponseData)

Here is what I think the traits should look like:

pub trait GraphQLQuery: Sized {
    /// The top-level shape of the response data (the `data` field in the GraphQL response). In practice this should be generated, since it is hard to write by hand without error.
    type DefaultResponseData: for<'de> IsValidResponseFor<'de, Self>;

    /// Produce a GraphQL query struct that can be JSON serialized and sent to a GraphQL API.
    fn build_query<V>(variables: V) -> QueryBody<V>
    where
        V: IsValidVariablesFor<Self>;
}

/// This struct is, once serialized, a valid `variables` parameter for this GraphQL query
/// 
/// `struct`s implementing this should be generated `struct`s most of the time.
pub trait IsValidVariablesFor<Q: GraphQLQuery>: serde::Serialize {}

/// This struct can be deserialized from the response data (the `data` field in the GraphQL response)
///
/// `struct`s implementing this should be generated `struct`s, since they are hard to write by hand without error.
pub trait IsValidResponseFor<'de, Q: GraphQLQuery>: serde::Deserialize<'de> {}

// With this new trait system, one could preserve some variables from one query to the other if needed
impl<'a, Q: GraphQLQuery, V: IsValidVariablesFor<Q>> IsValidVariablesFor<Q> for &'a V {}

With this new trait system, we could also generate non-owned versions of the structures using &str instead of Strings and serialize/deserialize from/to them. (#30 🎉), or better yet, replace the current generated structures with generic ones where you could specify the types for each of the fields, and:

//  ------ IN graphql_client ------
pub trait StringSer: Serialize {}
impl StringSer for String {}
impl<'a> StringSer for &'a str {}
// + std::borrow::Cow, etc, and lib users could add their own
// Also, maybe there's already a lib doing that kind of thing?

pub trait StringDe<'de>: Deserialize<'de> {}
impl<'de> StringDe<'de> for String {}
impl<'de> StringDe<'de> for &'de str {}
// + std::borrow::Cow, etc, and lib users could add their own
// Also, maybe there's already a lib doing that kind of thing?

// ------ CODEGEN ------
pub struct Query;
impl GraphQLQuery for Query {
    type DefaultResponseData = Response<String, String>;
    fn build_query<V>(variables: V) -> QueryBody<V>
    where
        V: IsValidVariablesFor<Self>,
    {
        todo!() // Same as currently implemented
    }
}

#[derive(Serialize)]
pub struct Variables<T1, T2> {
    pub field1: T1,
    pub field2: T2,
}
impl<T1, T2> IsValidVariablesFor<Query> for Variables<T1, T2>
where
    T1: StringSer,
    T2: StringSer,
{
}

#[derive(Deserialize)]
pub struct Response<T1, T2> {
    pub field1: T1,
    pub field2: T2,
}
impl<'de, T1, T2> IsValidResponseFor<'de, Query> for Response<T1, T2>
where
    T1: StringDe<'de>,
    T2: StringDe<'de>,
{
}

Obviously this would imply a breaking change but given the current download count this seems reasonable, and better do it now than later! 😉

What do you think ?

@mathstuf
Copy link
Contributor

I think this would massively complicate things. What would IsValidVariablesFor be? Do we lose type safety because of this? Same with the response trait.

Other than Cow<'a, str>, what types do you forsee being valid here? Would it instead be sufficient to just use Cow<'a, str> (maybe optionally using 'static if wanted?) instead of String?

@Ten0
Copy link
Contributor Author

Ten0 commented Jun 18, 2020

Hello @mathstuf, I hope you are having a great day.

I will try to address your concerns here.

I think this would massively complicate things.

It's important to note that what is really important here is the first (small) code block, and the second one with StringDe and the like are just examples of additional user-friendliness improvements that could then be made in a semver-compatible way.

Now, although this would indeed complicate things a bit as it introduces new traits, I think it would not complicate them too much: it's just a slightly different way to express the notion "SomeStruct is a valid Variables for Query Q". (You would use f(variables: V) where V: IsValidVariablesFor<Q> instead of f(variables: Q::Variables).) I also beleive the benefit to doing it this way is actually very high as it is actually the correct way to represent what we need to be able to represent (I will explain what I mean by "correct way" a few lines below).

What would IsValidVariablesFor be? Do we lose type safety because of this?

I will try to explain in more detail than I did in my previous message:

  • IsValidVariablesFor<Q> is a trait that provides type safety for the notion that "SomeStruct, once serialized is a valid Variables for Query Q".
  • It does not lose type safety: if you provide SomeOtherStruct which does not implement IsValidVariablesFor<Q>, it will be rejected by the constraint on build_query: V: IsValidVariablesFor<Self>, in the same way as it would currently be rejected by the constraint build_query(variables: Self::Variables). And of course, generated code would implement IsValidVariablesFor<Query> on the Variables struct in the generated module of query Query and any pair Query/struct for which it would work (that is, the server would accept it), and not for all pairs of Query/struct.
  • The main advantage compared to the current representation is that it allows the typesystem to represent when there are several (eg. SomeStruct1 and SomeStruct2) that, once serialized, are valid json Variables for Query Q.
  • A good example of that last scenario is: if currently we have Q: GraphQLQuery<Variables=SomeStruct1> with this system we would have Q: Query and SomeStruct1: IsValidVariablesFor<Q>, but also &SomeStruct1: IsValidVariablesFor<Q> (thanks to the generic impl at the bottom of my first code block). Our current implementation does not allow to represent that we could call build_query with &SomeStruct1, but if it did, as &SomeStruct1 serializes in the exact same way as SomeStruct1 (enforced by serde), the generated json would also be valid, so why not allowing the user to call build_query passing a reference ? That would enable them to reuse the contents of the query they sent later, which may be useful in some scenarios.
  • Another example of several structures being valid is if you write SomeStruct2 that is basically the same as SomeStruct1 (the current Variables struct for a given query), but you replace a type String inside by a type &str. Again, once serialized, it would be considered valid by the server, as String serializes in the exact same way as &str (again, enforced by serde). But it would also remove the need for the user to write code that reallocates a String when they already have an &str, when building Variables ! 😃
  • These two needs (and I'm sure we could find more) cannot be represented with the current system, even though they would be valid, because we are constrained by our representation that uses an associated type, hence enforcing that a single type can be used as variables. The representation described in the previous post however, allows for this possibility that conceptually exists, making it the exact correct way to represent our concepts.

Same with the response trait.

  • The IsValidResponseFor trait is basically the mirror of IsValidVariablesFor, for responses. There may be several structs we could deserialize to (e.g. owned with Strings, or borrowed when implementing Zero-copy response deserialization #30), but thanks to the IsValidResponseFor constraint, when deserializing a response one knows they can only attempt to deserialize it in a struct that implements IsValidResponseFor<TheirQuery>, in the same way as currently they would know they can only attempt to deserialize it in TheirQuery::Response. Again, we do not lose type safety, but we allow for more flexibility by properly representing any scenario where a response for a particular query could be successfully deserialized into several different types, making it, here again, the correct way to represent that a response to the query Q could be successfully deserialized into Response.

Other than Cow<'a, str>, what types do you forsee being valid here? Would it instead be sufficient to just use Cow<'a, str> (maybe optionally using 'static if wanted?) instead of String?

I could indeed imagine that one could want to build their query for Cows when generating part of their data and borrowing another part. However, I think most common use-cases would be:

  • Replacing Strings by &strs, as you can just borrow the data from elsewhere and do not need to clone it inside the Variables. Creating the Cow would introduce some boilerplate and runtime overhead.
  • Sometimes you'd still need the Strings when you need your Variables struct to be 'static, because you want e.g. to send it across threads to send the query later. (hence the need for both String and &str representations). Again, creating the Cow would introduce some boilerplate and runtime overhead.
  • One could also want to create newtypes that represent IDs for a particular object of a particular graphql service, that would propagate the Deserialize implementation of the inner object and hence still serialize as a json string. One may want to use these directly, instead of having to unpack the &str/String inside everytime they build a Variables struct. Cow would restrict you to only String and &'a str.
  • As a conclusion regarding that last point, I think simply replacing the Strings by Cows would introduce a lot of boilerplate for the user (need to create the Cow), as well as runtime overhead (need to handle the Cow serialization/deallocation), still require some form of genericity (need to be generic on 'a or give up on not cloning when using 'static), while also allowing for less use-cases (no newtype) compared to the solution with generic presented in the second code block of the initial post.

As a general conclusion, I would like to underline that these are only a few examples I could think of now, but there could be many more I didn't think of: I beleive this trait system is simply the one that perfectly matches the reality of our concepts (by the fact the concept of trait is the notion of "some property/capability satisfied by some structure", which is precisely what we're trying to represent here, where the property is "once serialized, will be accepted as a valid variables by the graphql server for this query"), making it the correct way to represent them. And once designed, as that is always the case when you're designing a computer system and your representation matches your concepts perfectly, it naturally comes with tons of advantages, among which user-friendliness (generic Variables structs, allowing to build query by reference on Variables), performance (#30), better type safety (allowing for newtype patterns), etc.

I hope I was able to answer all of your concerns as well as better explain what I tried to design and the philosophy behind it, and would love to read more feedback.

Do you (anybody else welcome of course) have any other question, comment, remark ?

@mathstuf
Copy link
Contributor

It does not lose type safety: if you provide SomeOtherStruct which does not implement IsValidVariablesFor, it will be rejected by the constraint on build_query: V: IsValidVariablesFor, in the same way as it would currently be rejected by the constraint build_query(variables: Self::Variables). And of course, generated code would implement IsValidVariablesFor on the Variables struct in the generated module of query Query and any pair Query/struct for which it would work (that is, the server would accept it), and not for all pairs of Query/struct.

What enforces that IsValidVariablesFor<Q> is actually true? What is stopping me from doing (the moral equivalent of):

impl<Q> IsValidVariablesFor<Q> for () {
    fn as_variables(&self) -> serde_json::Value {
        json!({})
    }
}

@Ten0
Copy link
Contributor Author

Ten0 commented Jun 18, 2020

What enforces that IsValidVariablesFor<Q> is actually true? What is stopping me from doing (the moral equivalent of):

impl<Q> IsValidVariablesFor<Q> for () {
    fn as_variables(&self) -> serde_json::Value {
        json!({})
    }
}

As a side note which may be useful for easily understanding the rest of this message, IsValidVariablesFor simply inherits and relies on Serialize and does not need an additional as_variables function.

Saying () is valid a valid Variables for any query Q is prevented by the fact we can implement a trait on a type only if either the trait or the type is local to our crate, which will not be the case for any user of this crate.

For other moral equivalents (typically, where you own the query or the type you're trying to implement it on), a user could specify by themselves that they know that some type is a valid Variables for a given query, but they could know that for very good reasons, e.g. because that type forwards the Serialize impl to that of the original code-generated Variables that already implemented IsValidVariablesFor<TheirQuery>, by building it in the Serialize impl.
It seems to be a legitimate use case (especially since it would be zero-cost if we had non-owned versions of Variables).

If a user implements that trait themselves, they take responsibility that they're saying "I know this type is valid for that query" (which they can only do if they own the query or the given type). It provides them flexibility and allows again for more user-friendly interfaces.

I'm not sure I understand in which scenario a user would mistakenly manually impl IsValidVariablesFor<Q> for one of their types that does not satisfy the predicate this trait is meant for, hence breaking some of their type safety, if they don't have a legitimate scenario for trying to manually implement it for their type/query pair in the first place.

@mathstuf
Copy link
Contributor

Saying () is valid a valid Variables for any query Q is prevented by the fact we can implement a trait on a type only if either the trait or the type is local to our crate, which will not be the case for any user.

Replace () with MyLocalZST. It's isomorphic.

If a user implements that trait themselves, they take responsibility that they're saying "I know this type is valid for that query" (which they can only do if they own the query or the given type). It provides them flexibility and allows again for more user-friendly interfaces.

That's not how Rust traits work in practice. At least it's not how I'd want traits I'm relying on to ensure that I got my types right to work. It's certainly not an improvement to the status quo in my eyes. I'd find it far more interesting to replace String usages with Cow<'a, str> if you're really concerned about unnecessary allocations. Cow wrapping is likely going to be trivial overhead compared to the JSON building that is going to happen at the end of this chain anyways.

(Also, just as a side note, IsValidVariablesFor simply inherits Serialize and does not need as_variables or anything like it.)

I saw that. It means that I can't actually use this trait in practice because it doesn't actually guarantee what I'd want. And it wouldn't work anyways because how to I signal that in Query1 I skip some field while the same struct as a variables that Query2 needs? Why am I not ending up with exactly one struct per Query in this new pattern? And why do I now have to write the code and manually keep in sync with my query code.

I find it far more useful for the compiler to tell me "you forgot to add this field you added to your query" than the GraphQL server to tell me after I deploy it. We may as well just take raw JSON for the variables parameter for how much it actually helps me get my types right.

Maybe this could be an option for a way to generate code, but I certainly don't think it should be the default.

@Ten0
Copy link
Contributor Author

Ten0 commented Jun 18, 2020

Oh, I beleive I found where the misunderstanding is coming from.

And why do I now have to write the code and manually keep in sync with my query code.

As I tried to explain in my first post, the IsValidVariablesFor<Query> implementation for the automatically generated Variables struct would still be automatically generated in the same way the associated type Variables = Variables currently is, so you wouldn't have anything more to manually write than you currently do.

I find it far more useful for the compiler to tell me "you forgot to add this field you added to your query" than the GraphQL server to tell me after I deploy it.

As a consequence, that's precisely what it still does.

Replace () with MyLocalZST. It's isomorphic.

A user could define a type, and declare that they want it to typecheck with any query, but again, people would have to do that on purpose and use that type on purpose, so they wouldn't do it unless they have a legitimate use case for that. So as a prerequisite to doing that they would need that feature. So "somebody does that" implies "this feature is needed for him". (Though I don't know what use case that would be since it kinda defeats the whole point of this library).

That's not how Rust traits work in practice.

I'm afraid I don't see how they work any differently from that. The Send trait means "your type can be sent from a thread to another without breaking everything". The FusedIterator trait means "calling next on a fused iterator that has returned None once is guaranteed to return None again". And the IsValidVariablesFor<Query> trait means "calling .serialize() will produce a variable the server will consider as valid for Query". You can find tons of other examples of that in the great library that is diesel.

At least it's not how I'd want traits I'm relying on to ensure that I got my types right to work.

I beleive (as per the examples I've pointed out) that's unfortunately the idiomatic way Rust works.
Why do you think that is not desirable ?

I saw that. It means that I can't actually use this trait in practice because it doesn't actually guarantee what I'd want.

I beleive I've addressed this: it does.

But if I'm mistaken and that wasn't the misunderstanding I thought, can you further explain what property you think is not guaranteed unless you specifically write additional code for removing these guarantees ?

Why am I not ending up with exactly one struct per Query in this new pattern?

As I said, because several different structures could be valid. What's the point of chosing a single one of them for your user, when you could allow him to use any that is valid (while still preventing the ones that are not valid) ? We could for instance automatically generate, through the macro, the owned and non-owned versions of the structures. Or we could generate a single structure, but generic (second code of first post), which would allow for maximum flexibility.

Neither of these require you to manually implement IsValidVariablesFor<Query> at any point, nor to write your structures yourself, as these would also be automatically implemented by the codegen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants