[EPIC] Graphql schema refactor #4261

Open
pieh opened this Issue Feb 27, 2018 · 38 comments

10 participants

pieh (Contributor) commented Feb 27, 2018

Who will own this?

What Area of Responsibility does this fall into? Who will own the work, and who needs to be aware of the work?

Area of Responsibility:

Select the Area of Responsibility most impacted by this Epic

  • Admin

  • Cloud

  • Customer Success

  • Dashboard

  • Developer Relations

  • OSS

  • Learning

  • Marketing

  • Sales

  • AoR owner: @KyleAMathews
  • Domain owner: TBD
  • Project manager: TBD
  • Tech lead: TBD
  • Contributors: TBD

Summary

Make the GraphQL schema generation code more maintainable and easier to extend with new features, such as allowing user-specified types on fields instead of relying on automatic inference.

How will this impact Gatsby?

Domains

List the impacted domains here

Components

List the impacted Components here

Goals

What are the top 3 goals you want to accomplish with this epic? All goals should be specific, measurable, actionable, realistic, and timebound.

How will we know this epic is a success?

What changes must we see, or what must be created, for us to know the project was a success? How will we know when the project is done? How will we measure success?

User Can Statement

  • User can...

Metrics to Measure Success

  • We will see an increase/decrease in...

Additional Description

In a few sentences, describe the current status of the epic, what we know, and what's already been done.

What are the risks to the epic?

In a few sentences, describe what high-level questions we still need to answer about the project. How could this go wrong? What are the trade-offs? Do we need to close a door to go through this one?

What questions do we still need to answer, or what resources do we need?

Is there research to be done? Are there things we don’t know? Are there documents we need access to? Is there contact info we need? Add those questions as bullet points here.

How will we complete the epic?

What are the steps involved in taking this from idea through to reality?

How else could we accomplish the same goal?

Are there other ways to accomplish the goals you listed above? How else could we do the same thing?

--- This is a stub epic - the old description needs to be converted to the new format

The main issue I'm trying to solve is that type inference will not create fields/types for source data that:

  • has conflicting types (sometimes source plugins have no way of knowing what the correct type is and can't correct it themselves),
  • has no data for some fields (an optional field/node will not be inferred if it is absent from the source data, and queries will fail because the schema doesn't contain that field).
    This is not an issue with the inference implementation - the approach simply can't handle such cases.
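A toy sketch (not Gatsby's actual inference code, just an illustration) of why both cases defeat automatic inference: when nodes disagree on a field's type, the only safe automatic choice is to drop the field, and a field that never appears in the data can't be inferred at all.

```javascript
// Illustrative sketch: infer a primitive "type" for each field across
// a set of nodes, discarding any field whose observed types conflict.
function inferFieldTypes(nodes) {
  const types = {}
  for (const node of nodes) {
    for (const [field, value] of Object.entries(node)) {
      const t = Array.isArray(value) ? "list" : typeof value
      if (types[field] === undefined) types[field] = t
      else if (types[field] !== t) types[field] = null // conflict: discard
    }
  }
  // keep only fields with a single consistent observed type
  return Object.fromEntries(
    Object.entries(types).filter(([, t]) => t !== null)
  )
}
```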

My approach is to allow defining field types by:

  • using the GraphQL schema definition language (for data with static types - e.g. File nodes will always have the same data structure),
  • exposing a function in the Gatsby Node API (for data whose types are dynamic but whose structure we can discover - for example, a Contentful content model).
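For the second entry point, a source plugin that already knows its content model could translate it into type definitions instead of relying on inference. A minimal sketch, where the model shape is made up (loosely imitating Contentful's content types):

```javascript
// Hypothetical translation of a source plugin's content model into
// GraphQL SDL type definitions. The model shape here is illustrative.
const contentModelToTypeDefs = model =>
  model.contentTypes
    .map(ct => {
      const fields = ct.fields
        .map(f => `  ${f.id}: ${f.required ? f.type + "!" : f.type}`)
        .join("\n")
      return `type ${ct.name} implements Node {\n${fields}\n}`
    })
    .join("\n\n")
```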

Problem:

The current implementation of schema creation looks something like this:

[diagram: current]

Input/output type creation is not abstracted, so the implementation has to be duplicated for each source of information.

In my proof of concept (repository) I added another source (the GraphQL schema definition language) and implemented just a subset of the functionality:

[diagram: poc]

As a testing ground I used this barebones repository. Things to look for:

Implementing it this way is fine for a proof of concept, but it's unmaintainable in the long term. So I want to introduce a common middleman interface:

[diagram: proposed]

Goals:

  • This should be backward compatible
  • Make it easy to add new sources of field types and/or schema features
  • Remove code duplication
  • The initial change should only introduce the middleman interface and provide feature parity with the current implementation, to ease review

Questions:

  1. What are potential features / use cases to take into consideration when designing the details of this (not features of the schema - how it could be used)? I see one potential case where this might be important (so we don't need another big refactor later):
    • Live previews - right now Gatsby can't modify the schema while running in develop mode, but it can refresh data (built-in refresh for the filesystem source + the __refresh hook to refresh all source data) - it might be worth looking into being able to refresh the schema too.
  2. How would schema stitching fit into this (merging external remote GraphQL endpoints with the Gatsby GraphQL layer)? Basic schema stitching would not interact with the Gatsby GraphQL part (for example, we have our Gatsby queries for markdown etc. and then we have fields from the GitHub GraphQL API - if there's no connection between them, this would be out of scope for this RFC). But if we want to add a connection - for example, allowing a frontmatter field to link to a GitHub repository - then this needs to be thought out ahead of time. I was looking at graphql-tools schema stitching; it has some nice tooling for merging schemas and an option to add resolvers between schemas - is this something that was planned to be used for that?

MarcCoet (Contributor) commented Feb 27, 2018

Thanks a lot for the research @pieh.
Maybe I am completely off track here, but couldn't we use schema stitching to add missing fields?
But maybe it couldn't solve the first issue you raise (conflicting types on a field), and it probably is a weaker solution overall in the long run...?
I love your idea about live preview refresh! That would be a super solid feature to add to Gatsby IMHO.
To be honest, my main concern is the time such a refactor will take...

pieh (Contributor) commented Feb 27, 2018

@MarcCoet
Not sure at what level you would want to stitch the schema - this is not magic that would make it work automatically :) . There are multiple "side effects" for a single field in the data - it produces an output type, and it produces input types for filtering, sorting, and grouping. This would still suffer from the same problem - it would need to be implemented in multiple places.

There is currently not much distinction between fields with no data and fields with conflicting types in terms of creating the schema - Gatsby discards fields that have conflicting types (so they become fields with no data at the stage of creating the schema). The distinction matters more for website/app developers - they have the data, but the field is not in the schema.

You can use my proof of concept branch (at least for testing things out) - it has all the basic features of getting fields: it can resolve local files, linked nodes (of both single and multiple types - unions) and of course inline fields. But to get the full feature set I would have to implement this three more times in different places (filtering, sorting, grouping).

Or you can now use the setFieldsOnGraphQLNodeType function ( https://www.gatsbyjs.org/docs/node-apis/#setFieldsOnGraphQLNodeType ) to add/overwrite "inline" field types (fields that aren't linked to other nodes). It's not especially easy to use and can't reference other types that are available in the schema.
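A simplified sketch of what using setFieldsOnGraphQLNodeType looks like. Note that the real API expects graphql-js type objects rather than the string placeholder used here, and the readingTimeMinutes field is purely illustrative:

```javascript
// gatsby-node.js (sketch): add a computed "inline" field to one node type.
const setFieldsOnGraphQLNodeType = ({ type }) => {
  // Only extend MarkdownRemark nodes; other types get no extra fields.
  if (type.name !== "MarkdownRemark") return {}
  return {
    readingTimeMinutes: {
      type: "Int", // placeholder; real code would pass GraphQLInt
      resolve: node => Math.ceil(node.wordCount / 200),
    },
  }
}
```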

I totally get your time concern and frustration about this issue - I have this problem too with some of my sites - it's hard to explain to your content editors why things suddenly stopped working when they cleared some optional field (this is why I started working on this!). But this has to be done right sooner rather than later, as features that would need to be refactored will pile up.

KyleAMathews (Contributor) commented Feb 27, 2018

How would schema stitching fit into it

It wouldn't — the schema stitching process basically takes two entirely separate schemas and lets you query both of them at the same time. Unless people name their types the same as the default Gatsby ones, there'd be no interaction between the two schemas.

KyleAMathews (Contributor) commented Feb 27, 2018

Love the direction you're going here! This feels like the right approach and direction for a refactor and will unlock a lot of really nice capabilities!

pieh (Contributor) commented Feb 27, 2018

About schema stitching - I was researching this a bit earlier, and graphql-tools provides a way to add resolvers between schemas - https://www.apollographql.com/docs/graphql-tools/schema-stitching.html#adding-resolvers - as part of their schema stitching toolkit. So hypothetically we could create a custom resolver (or rather, a user at the project level or a plugin would) that could transform repository: "https://github.com/gatsbyjs/gatsby" (<- that's frontmatter) into the response from a repository query against the GitHub GraphQL API (similar to how we link/map to nodes currently). This doesn't have to land in the initial version of schema stitching, but it's worth keeping in mind.
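Sketched in code, such a user-level resolver might look like this; fetchRepository is a hypothetical stand-in for a real client against the GitHub GraphQL API:

```javascript
// Hypothetical resolver factory: turn a frontmatter URL string into a
// remote query. The owner/name pair is parsed from the GitHub URL path.
const makeRepositoryResolver = fetchRepository => frontmatter => {
  const [owner, name] = new URL(frontmatter.repository).pathname
    .slice(1)
    .split("/")
  return fetchRepository({ owner, name })
}
```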

KyleAMathews (Contributor) commented Feb 27, 2018

Huh! That'd be amazing! Yeah, there's a ton of possibilities here - you could link to tweets, Flickr images, Facebook profiles, etc. Anything accessible via an API: as long as you have the right source plugin installed, everything would be linked up. That'd be crazy powerful.

jlengstorf (Member) commented Feb 27, 2018

@pieh @KyleAMathews This is something I've got a bit of experience with. When I was at IBM, we needed to keep data sources discrete, but allow them to be combined in queries to avoid complex data processing on the front-end. I ended up creating and open sourcing GrAMPS to address this. (I wrote up a "hello world" example in this post.)

One of the goals of GrAMPS is to allow what I've clumsily dubbed "asynchronous stitching", where a data source can define how it extends another data source if that data source exists in the schema. This would allow plugins to build on each other when possible, but wouldn't require them to be peerDependencies. From an open/shareable standpoint, this seems like a way to have our cake and eat it, too: we optionally upgrade the GraphQL schema, rather than hard-coding fragile relationships.

The logic behind this wouldn't require GrAMPS to function; it's basically checking the schema for a given type before applying the mergeSchemas call.

I'm not sure how well this fits into the overall goal of this RFC, but I think it could help us implement the "Schema Builder" with more flexibility and extendability.

Happy to help out on this however I can. Let me know if you want me to expand on any of this.

pieh (Contributor) commented Feb 27, 2018

@jlengstorf Wow, GrAMPS is cool! Not sure if it will fit, but I will definitely read up more on it (either to use it or at least steal some ideas!) I will for sure reach out to you for your insight.

I'd like to keep this RFC from focusing (too much 😄) on implementation details. I want this to serve as a requirements-gathering place so we can later design APIs that could be extended if needed (to not over-engineer the initial refactor) but not changed (wishful thinking 😄). I think we could expose the same internal APIs to plugins, but to do that they need to be well designed and not be subject to breaking changes in the near future.

i8ramin (Contributor) commented May 16, 2018

Hi. Has there been any update on this issue? Just wondering. I really wanna use Contentful + graphql ... but this issue makes it very hard to do so :(

niklasravnsborg (Contributor) commented Jul 30, 2018

Just reading into this concept. I wrote a custom source plugin where a field from my JSON API can be null. These fields don't end up in my schema as I would expect. Are there any updates on this?

@pieh Awesome work! Keep it up 😊

@calcsam calcsam changed the title [RFC] graphql schema refactor [RFC] Allow user to define GraphQL schemas Aug 11, 2018

calcsam (Member) commented Aug 11, 2018

@i8ramin -- it's definitely in our backlog!

@pieh -- I've renamed this issue for clarity

@pieh pieh changed the title [RFC] Allow user to define GraphQL schemas Graphql schema refactor Aug 24, 2018

@pieh pieh changed the title Graphql schema refactor [EPIC] Graphql schema refactor Aug 24, 2018

@pieh pieh assigned pieh and KyleAMathews and unassigned pieh Aug 24, 2018

KyleAMathews (Contributor) commented Sep 8, 2018

Was talking to @sgrove today with @calcsam and he had a really interesting idea that could apply here. Basically it was about how to estimate when you've sufficiently inferred a GraphQL type from data. He said you could assign a "novelty" score to each type you're inferring, i.e. how novel you expect each new item to be. You evaluate sample data item by item. Each time you "learn" something new, e.g. a new field, the expected novelty score goes up. Whenever an item matches the existing inferred type, the expected novelty score drops. After evaluating enough new items without learning anything new, you can quit.

This could speed up processing of large data sets, as we could pull out random samples, and often (especially for data that has representative values for each field on every object) we could stop the inference process quite a bit sooner than we do now.
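A toy sketch of the idea. The stopping rule here just counts consecutive uninformative samples; a real implementation would presumably maintain a weighted novelty score as described:

```javascript
// Sample items one by one, track which fields we have seen, and stop
// early once "patience" consecutive samples teach us nothing new.
function inferKeys(items, { patience = 3 } = {}) {
  const seen = new Set()
  let unchanged = 0
  for (const item of items) {
    let learned = false
    for (const key of Object.keys(item)) {
      if (!seen.has(key)) {
        seen.add(key)
        learned = true
      }
    }
    unchanged = learned ? 0 : unchanged + 1
    if (unchanged >= patience) break // novelty flattened out; quit early
  }
  return [...seen]
}
```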

KyleAMathews (Contributor) commented Sep 28, 2018

stefanprobst (Contributor) commented Nov 29, 2018

Maybe as a first step we could simply expose graphql-compose's schemaComposer to setFieldsOnGraphQLNodeType, which would give access to all registered types and their corresponding findOne and findMany resolvers.

freiksenet (Contributor) commented Nov 29, 2018

So we definitely shouldn't expose graphql-compose APIs directly to Gatsby users, because that way we lose control over those APIs. We can expose a wrapped function to get resolvers, though.

KyleAMathews (Contributor) commented Jan 1, 2019

On externalizing schemas, my recent thinking is that each new field should always be added. So when a user adds a new markdown frontmatter field, we add it. But if the user removes that field, we leave the field in the schema. We only ask the user what they want to do when there's a field conflict (at which point we should show the user how the field is used - we can be smart and say "it looks like all current usages of this field align with the new field type; you should probably upgrade the field").

stefanprobst (Contributor) commented Jan 2, 2019

Quick update on where I'm at with this:

  • it is now possible to merge inferred types with explicit type definitions
  • third-party schemas are now correctly merged in. I had trouble getting this to work with graphql-tools, so I'm using a much simplified approach
  • added an addResolvers API. I'd still love some input on what to expose to the resolver context here. Personally I'm not sure any more if we even have to pass on anything extra: we already have info.schema, so you can query with info.schema.getQueryType().getFields().allMarkdownRemark.resolve()

Apart from that, some questions (more to follow):

  • do we want to keep SitePlugin in the schema? What would be a use case for it?
  • do we want to infer strings as File type even when gatsby-source-filesystem is not present to handle File types?
  • in the File type: what should relativePath be relative to? I found it confusing that it is relative to the sourceInstance, so I've changed it to be interpreted as relative to the page or StaticQuery.

stefanprobst (Contributor) commented Jan 2, 2019

Also: much better at type reuse. In a default starter, from 592 types to 96.

pieh (Contributor) commented Jan 2, 2019

Also: much better at type reuse. In a default starter, from 592 types to 96.

Yup, we currently create way too many input types - for each string field we create a separate type with eq, ne, regex, etc. operator fields.

That's one easy win we can fix to make the schema more readable.
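The fix pieh describes can be sketched as a simple cache: one shared operator input type per scalar, instead of a fresh one per field. The plain object here is a stand-in for a real GraphQLInputObjectType:

```javascript
// Cache one operator input type per scalar name and reuse it for every
// field of that scalar, instead of minting a new type each time.
const operatorTypeCache = new Map()

function getOperatorInputType(scalarName) {
  if (!operatorTypeCache.has(scalarName)) {
    operatorTypeCache.set(scalarName, {
      name: `${scalarName}QueryOperatorInput`,
      fields: ["eq", "ne", "in", "nin", "regex", "glob"],
    })
  }
  return operatorTypeCache.get(scalarName)
}
```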

freiksenet (Contributor) commented Jan 4, 2019

@stefanprobst We should default to keeping old functionality, the schema refactor should be 100% backwards compatible, so we should keep the types like SitePlugin that are currently in the schema.

added addResolvers API. I'd still love some input what to expose to resolver context here. Personally I'm not sure any more if we even have to pass on anything extra: we already have info.schema, so you can query with info.schema.getQueryType().getFields().allMarkdownRemark.resolve()

What I meant is that we should expose gatsby's "model layer", which is the resolvers we use ourselves to do things. That would be the way for people to reuse Gatsby functionality when they are rewriting Gatsby types.

stefanprobst (Contributor) commented Jan 4, 2019

100% backwards compatible

Oh, I didn't realize - I thought of this more as a v3.0 thing, so there would be an opportunity to change some things. It's easy enough to change back to the current behavior. I just modified the two things that confused me when I started out with Gatsby: what relativePath referred to, and that query results were by default wrapped in a pagination object (edges/nodes). I made the allTypeName endpoints return results directly, and added a pageTypeName field that returns { items, count, pageInfo }.
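Side by side, the two query shapes look like this; the field name pageMarkdownRemark is illustrative, derived from the { items, count, pageInfo } description above:

```graphql
# current: Relay-style connection wrapper
query {
  allMarkdownRemark {
    edges {
      node { id }
    }
  }
}

# the experiment: results returned directly, pagination on a separate field
query {
  allMarkdownRemark { id }
  pageMarkdownRemark {
    items { id }
    count
    pageInfo { hasNextPage }
  }
}
```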

What I meant is that we should expose gatsby's "model layer", which is the resolvers we use ourselves to do things.

Hmm, what you get with info.schema.getQueryType().getFields().allMarkdownRemark.resolve is exactly what is being used internally (namely findMany('MarkdownRemark')). Maybe I'm still misunderstanding?

rexxars (Member) commented Jan 5, 2019

Excellent work, @stefanprobst - I've been battling for a month trying to find a good way to declare fields and schema types without finding this issue. This looks like a dream come true.

What's the plan going forward - do you have a specific roadmap you're working towards? Is there anything other contributors could do to help?

stefanprobst (Contributor) commented Jan 6, 2019

@rexxars Thanks! I think there are two issues for this to move forward.

First, the API: there hasn't been a whole lot of discussion about whether the proposed API makes sense and covers all use cases. To summarize:

  • addTypeDefs to register type definitions. Inferred types are merged in when possible, while explicit definitions trump inferred ones.
  • addResolvers to add custom field resolvers.
  • @link directive to declare foreign-key fields. This would replace the ___NODE convention and mappings in gatsby-config.
  • keep setFieldsOnGraphQLNodeType, but allow adding nested fields.
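Putting the first three pieces together, a user-supplied definition might look something like this (type names and directive arguments are illustrative):

```graphql
type Frontmatter {
  title: String!
  # @link replaces the author___NODE convention and mappings in gatsby-config
  author: AuthorYaml @link(by: "email")
  date: Date
}

type MarkdownRemark implements Node {
  frontmatter: Frontmatter
}
```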

Second, implementation: all of the above should work, but there are other changes mixed in - some of which might be useful (like query operators for Dates), while others are just my personal preference (like getting rid of edges, or requiring query args to always be on a filter field). I'm motivated enough to bring this into a more mergeable state, but I don't know what the planning is at Gatsby HQ.

rexxars (Member) commented Jan 6, 2019

I don't work at Gatsby HQ and have not been involved in Gatsby for long, so I can't comment on whether it covers all usecases, but it certainly addresses all the issues I've been battling with (see #10856).

Couple of questions:

  • Does it handle unions? Both in fields, in lists, and as the target of foreign-key fields?
  • If you specify a Date field, does it get the date formatting arguments one would get through inference? This is one of the issues I've run into with every approach I've tried so far: inference gets you the formatting arguments, while explicitly defining the field misses them.

As for the changes you've introduced, I wholeheartedly agree that they are a good change, but I don't have the background on why things are not modeled this way currently. I think the edges approach is Relay-inspired, but I'm not sure it serves any specific purpose within the Gatsby ecosystem - I don't think anyone is using Gatsby with Relay after all.

stefanprobst (Contributor) commented Jan 6, 2019

Excellent questions!

Does it handle unions?

Unions are currently not supported at all (but interfaces are). This has to do with a limitation in graphql-compose, the library used for schema construction. I'll take a look at how much it would take to add this.

If you specify a Date field, does it get the date formatting arguments one would get through inferring?

The way date formatting is implemented is sort of the other way around, namely with a @dateformat directive, which lets you define field defaults that can be overridden in the selection set:

```graphql
type Foo {
  date: Date @dateformat(defaultFormat: "yyyy/MM/dd", defaultLocale: "en-GB")
}
```

and

```graphql
query {
  foo {
    date(locale: "en-US")
  }
}
```

One advantage is that when constructing the InputObjectType from the ObjectType, the field still has the correct Date type, which is only converted to a String type when the directive is processed. This is why you get Date query operators like $gt, $lt, and not the String operators.
What's missing is adding the field args to the inferred Date fields - I'll look into this next week.
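A minimal sketch of the resolve-time half of this: the stored value stays a date for filtering, and is only formatted to a string when the field resolves, with the directive's defaults overridable by query arguments. Token handling is simplified; the real implementation presumably uses a date library:

```javascript
// Format a date value with yyyy/MM/dd-style tokens (simplified).
const formatDate = (value, format) => {
  const d = new Date(value)
  const pad = n => String(n).padStart(2, "0")
  return format
    .replace("yyyy", String(d.getUTCFullYear()))
    .replace("MM", pad(d.getUTCMonth() + 1))
    .replace("dd", pad(d.getUTCDate()))
}

// Resolver factory: directive defaults, overridable per query via args.
const makeDateResolver = defaults => (source, args = {}) =>
  formatDate(source.date, args.format || defaults.defaultFormat)
```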

pieh (Contributor) commented Jan 6, 2019

100% backwards compatible

Oh, I didn't realize, I thought of this more as a v3.0 thing, so there would be an opportunity to change some things. It's easy enough to change back to current behavior.

For the most part this is because we are likely far away from 3.0, and if this is implemented in a backward-compatible way, we can roll it out during ^2.0. I'm not against potential changes in the future, even behind feature flags (i.e. shallow_connection_type or whatever), but those would need to be discussed and researched - a single person's preference is a bit anecdotal and not enough to justify a breaking change.

freiksenet (Contributor) commented Jan 9, 2019

@stefanprobst We definitely do want to fix the issues with the Gatsby schema. I'm currently responsible for this issue at Gatsby and I'm very interested in helping more with it. Maybe we should sync up on it? We can chat e.g. in Discord (Gatsby Discord https://discord.gg/jUFVxtB) or set up a voice/video call.

Hmm, what you get with info.schema.getQueryType().getFields().allMarkdownRemark.resolve is exactly what is being used internally (namely findMany('MarkdownRemark')). Maybe I'm still misunderstanding?

So resolve functions depend on the resolver type signature (parent, args, context, info). That means resolvers that do the same thing but get their data from different places wouldn't be reusable. E.g. there can be two resolvers that basically get the same node by id, but one gets it from parent and the other gets it from args. My idea is that we'll expose a Gatsby "model layer" that will have the functions to e.g. operate on nodes. Those will be used inside Gatsby's own resolvers and also be available to users in their custom resolvers. Plugins would be able to add more functions to the model layer, so users can write custom resolvers with their functionality, e.g. for remark transformations. This is a pretty typical way to do it in GraphQL servers.
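A sketch of the idea; names and shapes are illustrative, not Gatsby's actual API:

```javascript
// A tiny "model layer": plain functions over the node store that both
// internal and user resolvers can call, independent of where a given
// resolver gets its inputs (parent vs args).
const createModel = nodes => ({
  findById: id => nodes.find(n => n.id === id),
  findMany: type => nodes.filter(n => n.internal.type === type),
})

// Two resolvers that fetch "the same node by id" from different places,
// sharing the model instead of duplicating the lookup logic:
const resolveFromParent = model => parent => model.findById(parent.authorId)
const resolveFromArgs = model => (parent, args) => model.findById(args.id)
```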

@freiksenet freiksenet self-assigned this Jan 9, 2019

@freiksenet freiksenet moved this from To do to In progress in OSS Roadmap Jan 9, 2019

stefanprobst (Contributor) commented Jan 10, 2019

@freiksenet Excellent, let's chat early next week! Monday? I'm in GMT+1 timezone.

there can be two resolvers that basically get same node by id, but one gets it from parent and other gets it from args

Ah, I get your point now. However: how consequential is this given how things work in Gatsby (at least currently)? Node resolvers always query with args and don't use parent, no? In any case, I have now put link, findMany, findOne and findById on context.resolvers.

Plugins would be able to add more functions to model layer

Interesting!

freiksenet (Contributor) commented Jan 10, 2019

@stefanprobst Most of Gatsby is having a company gathering in Barcelona next week, so I'm afraid I won't have much time :( I'm available tomorrow or any day after next week. I'm in GMT+2, so timing shouldn't be a problem. Could you pick a good time for you in my Calendly? https://calendly.com/freiksenet/60min

stefanprobst (Contributor) commented Jan 10, 2019

@freiksenet Tomorrow won't work for me unfortunately; I have scheduled Tue the 22nd in your calendar. Thanks!

freiksenet (Contributor) commented Jan 11, 2019

@stefanprobst Can you make a PR, mark it as WIP, and tick "allow maintainers to edit"? That would make it much easier to work together on this.

stefanprobst (Contributor) commented Jan 11, 2019

Done. #10995
