Create GraphQL to SQL compiler #61

Closed
Horusiath opened this Issue Jul 20, 2016 · 1 comment

Projects

None yet

1 participant

@Horusiath
Contributor
Horusiath commented Jul 20, 2016 edited

Motivation

Right now we can execute any kind of query. However one of the common scenarios would be to create a reply by making SQL query to a database and then returning the computation result back to requester. Here we have a problem - since we are defining resolution of per-field basis, any nested calls will result in N+1 SELECT problems, eg. given a GraphQL query:

query Example {
    users {
        id
        name
        pets {
            name
        }
    }
}

and schema defined like:

Define.Object("User", [
    Define.Field("id", String, fun _ u -> u.ID)
    Define.Field("name", String, fun _ u -> u.ID)
    Define.Field("pets", ListOf Pet, (fun _ u -> 
        query {
            for pet in u.Pets do
            select pet }))
]);

we would end up with making a one query to retrieve user, and then for each user retrieved this way, one additional query to get his pets. This is a common problem in O/R mapping and something we'd like to avoid.

Possible solution

Because we have already implemented planning phase, we have a convenient representation of GraphQL queries in form of ExecutionPlan data structure attached to context. What we can do from here is to create a compiler, which would translate GraphQL query into SQL queries.

Currently, beside data about things like field name, alias, output type, arguments or containing object data, we can also represent a structure of the query itself by using ExecutionPlanKind discriminated union, ie. GraphQL query above would look close to the following:

SelectFields [             // users
    ResolveValue           // users.id
    ResolveValue           // users.name
    ResolveCollection (    // users.pets
        ResolveValue       // users.pets.name
    )
]

We could translate those into their SQL query equivalent.

How?

ResolveValue

This can be translated directly to matching SQL table column. Given an execution plan info structure the most straightforward translation would look like: info.ParentDef.Name + "." + info.Definition.Name + " AS " + info.Identifier . Remark: this is not actual code, we would need some table aliasing etc.

SelectFields

We have two cases here:

  1. For top level fields we can make direct SELECT .. FROM clause.
  2. For set of consecutively nested SelectFields we can use selection with JOIN clause between inner and outer SelectFields (because SelectFields [ SelectFields [ .. ] ] is always 1-1 relation. Example:

GraphQL query:

{
    product {
        id 
        customer {
             name
        }
    }
}

into SQL:

SELECT product.id, customer.name
FROM Product product 
JOIN Customer customer ON product.customer_id = customer.id

ResolveCollection

Since this kind always describes 1-many relationship, realizing it with joins may not be a good idea i.e.: given customers.orders.products - where we have 10 customers, each of them has 50 orders and each order has 20 products, we would end up with a SQL query returning 10 000 rows, where actual number of entities we care about is 80.

Solution:

Make a separate sub select - as SQL command can take more than one SELECT statement, once we find ResolveCollection we can translate its subtree into separate SELECT FROM statement and join data for it directly from the result set. Remark: this will require and additional field to correlate rows from a separate selects.

Example:

GraphQL query:

{
    customers {
        customer_id
        name
        orders {
             order_id
        }
    }
}

into SQL queries:

SELECT customer_id, name FROM customers;

SELECT customer_id, order_id FROM orders;

ResolveAbstraction

This kind is used for both GraphQL interfaces and unions and carry the typeMap info (Map<string, ExecutionPlanInfo list>), which describes a list of fields to be resolved for each type target abstraction refers to. We basically need to use two cases here:

Union

Union should be pretty easy to translate. Example - given type union Pet = Dog | Cat we could assume there are some Dog and Cat tables in the SQL schema, we could translate following GraphQL query:

{
    animals {
        ... on Dog { barks }
        ... on Cat { meows }
    }
}

into SQL looking like that:

SELECT 'Dog' AS TypeName, Barks AS barks, NULL AS meows FROM Dogs
UNION
SELECT 'Cat' AS TypeName, NULL AS barks, Meows AS meows FROM Cats

Then just use TypeName as discriminator for creating result objects.

Interface

Dunno yet. They are basically unrepresentable in SQL.

Security issues

One of the most obvious problems is to apply some constrains on the queries, so that not everyone would be able to traverse through the whole database. Two possibilities I had in mind:

Tree cutting

Given an exectuion plan tree, we could introduce some operators to split it in two before executing. ie given user query:

{
    user {
        credentials { id }
        privateField
    }
}

split it into twp:

{
    user {
        credentials { id }
    }
}
{
    user {
        privateField
    }
}

Once first query will be executed, the result can be authorized and then (if auth was successful) the second query would be executed. We could add some methods to the API to make execution plan tree cutting easier.

Resolve value exceptions

The second option - more costly but also easier to achieve - would be to simply apply authorization of field resolution level, i.e:

Define.Field("PrivateField", String, fun ctx user ->
    let auth = ctx.RootObject :> AuthIdentity
    if auth.HasAccess "PrivateField"
    then user.PrivateField
    else raise (UnathorizedException()))

This won't save from retrieving that data from the database, but it won't return it outside.

Upgrade F# compiler dependency

Other thing that could be taken into account withing that task is to think about switching FieldDef.Resolve from function to expression - this could give us possibility to interpret the body of resolution function. This will be the breaking change for F# compiler < 4.0 - we could raise that requirement and mark function passed as parameter with ReflectedDefinition attribute to make it work in transparent way.

@Horusiath Horusiath self-assigned this Aug 8, 2016
@Horusiath
Contributor
Horusiath commented Aug 11, 2016 edited

As mentioned on Slack, I've decided first to make GraphQL-to-LINQ interpreter as this is more flexible approach, and can be potentially used for more scenarios and solve problems such as selects over many-to-many joints for free. Downside of it is performance cost - another level of indirection that requires additional computations and may result in slower code.

Right now I've managed to refactor FieldDef.Resolve field localy to use F# quotation expressions instead of delegates - this will allow us to interpret resolvers to construct output LINQ queries. To visualize, lets use following example:

GraphQL query:

{
    users(id: 1) {
        id
        info {
            firstName
            lastName
        }
        contacts {
            firstName
        }
    }
}

This would produce following execution plan:

ResolveCollection: users
    SelectFields: users
        ResolveValue: id
        SelectFields: info
            ResolveValue: firstName
            ResolveValue: lastName
        ResolveCollection: contacts
            SelectFields: contacts
                ResolveValue: firstName

By applying that execution plan we would expect following LINQ query:

users.Where(u => u.Id = 1).Select(u =>      //  ResolveCollection: users
    new User {                              //      SelectFields: users
        Id = u.Id,                          //          ResolveValue: id
        Info = new UserInfo {               //          SelectFields: info
            FirstName = u.FirstName         //              ResolveValue: firstName
            LastName = u.LastName },        //              ResolveValue: lastName
        Contacts = u.Contacts.Select(i =>   //          ResolveCollection: contacts
            new UserInfo {                  //              SelectFields: contacts
                FirstName = i.FirstName     //                  ResolveValue: firstName
            })
    })

Therefore:

  • ResolveCollection - applies LINQ query methods (i.e. Select). It could also apply other methods, when field is contains some arguments we could use for additional operations (like idWhere method with equality comparison).
  • SelectFields - constructs expression for new returned object. This is problematic part, as GraphQL is highly dynamic and any combination of fields should be allowed. Therefore it seems, that necessary limitation would be to set requirement on the materialized type to contain default constructor and introduce mutable fields, that could be filled later. I'm afraid, this is a must, but it results in non-idiomatic F# code.
  • ResolveValue - this would create an assignment expression, based on matching object property to expression from FieldDef.Resolve.
  • ResolveAbstraction - no idea yet.

LINQ queries should also respect arguments, at least the ones defined in Relay specification:

  • id which describes an element by specific Id. This is hard part, as in Relay ID could be int or string (how to avoid accidental casting to string within LINQ queries). Also with this attribe we should use FirstOrDefault (for nullable definitions) or First (for non-nullable) instead of Select.
  • Combination of first (required) and after (optional) - this works instead of standard pagination. Its actually better, as it takes into account possibility of changes in data set between two page requests. first is equal to LINQ Take, while after sets a cursor for the last known id (make sure you know how to read Relay cursors ;) ). Therefore after in terms of LINQ looks like .OrderBy(x => x.Id).Where(x => x.Id > <cursor.id>).
  • Combination of last (required) and before (optional) - just like previous one, but in reversed direction. last is equal to LINQ Take, but to make it work before should work like .OrderByDescending(x => x.Id).Where(x => x.Id > <cursor.id>).

All of those arguments work on SelectFields (id is supposed to return object implementing GraphQL Node interface, rest returns a special Connection object).

@Horusiath Horusiath closed this Sep 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment