1.0 new GraphQL data layer #420

Closed
KyleAMathews opened this Issue Sep 3, 2016 · 29 comments

KyleAMathews commented Sep 3, 2016

Pull data into components instead of pushing

Data in Gatsby currently is pushed into templates to be rendered into HTML (like pretty much every static site generator). This is a simple pattern and works great for many use cases. But when you start working on more complex sites, you really start to miss the flexibility of building a database-driven site. With a database, all your data is available to query against in any fashion that you'd like. Whatever bits of data you need to assemble a page, you can pull in. You want to create author pages showing their bio & last 5 posts? It's just a query away. I want this same flexibility for Gatsby. I want to be able to query my markdown (or picture or data, etc) files and treat them as a database of sorts.

This is especially important for Gatsby because, unlike traditional static site generators, the data used to build a page is also loaded into the client. Currently Gatsby loads all data for the site into the client. This is both wasteful (your site doesn't use all that data) and costly. Time-to-interactivity is an important web performance metric. The larger your JavaScript bundle, the longer it takes to download and evaluate. This is especially noticeable on low-end phones on poor networks.

With this change in Gatsby 1.0, both code and data will be split on a per-route basis. When a user visits a page, they will load just the JavaScript & data that page needs and then lazy-load more once the first page is initialized.

Now a site can easily have "heavy" pages (in terms of data and/or code) without affecting other parts of the site. E.g. a search page or a page with data visualizations.

New GraphQL data layer

Gatsby uses Webpack right now for everything. JavaScript, CSS, images, Markdown, JSON, YAML, etc. are all handled using Webpack's rather brilliant system of treating everything as JS modules.

Using Webpack has worked out really well for Gatsby. It gives us a ton out of the box. A lovely hot-reloading development experience. Easy interoperability with all the latest and greatest web tools. And fast, optimized production builds. It's truly a Swiss Army knife of tools.

But Webpack has some problems with data.

First, it only understands files. If you want to integrate data from any other source, e.g. external APIs, you first have to convert that data into files.

Webpack can get weird if you try to reference files from outside of the webroot. I've been bitten by this several times as have others.

Another big problem is you can't use just some data from a file. What if you wanted to use data in your site from a 1 gigabyte CSV file? There's no way to get around loading the entire file unless again you first preprocess the file.

The last problem is data splitting. Ideally each route can load only the data it needs. But how? Often a route will want a bit of data from a number of files or other data sources. How can a route both easily specify what data it needs and tell Webpack to package that minimal data set together to be shipped to the browser to power the React component(s) for that route?

I've thought through a number of different possibilities (this issue explores one of those) but could never quite figure out how to make Webpack do what I wanted it to.

So eventually I concluded the simplest thing would be to split the data layer off and remove it from Webpack's control. Let Webpack do what it does best and build a data system tailor-made for Gatsby's needs.

I've been prototyping this new data layer the past few weeks with GraphQL and am really really pleased with how well it's working.

How it'll work

When you set up a site, you'll add one or more source plugins. These source plugins can be file-based, e.g. a markdown source plugin which you point at a directory of markdown files, or network-based, e.g. for consuming an internal API or a 3rd-party API like GitHub.

Each source plugin defines types which get composed together to form a schema for your site.

This combined schema is consumed by GraphQL and made available to query against.
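
As a rough illustration of the composition step, here is a minimal sketch in plain JavaScript. The plugin names, the shape of the `types` objects, and the `composeSchema` helper are all hypothetical; the real plugin API had not been released at the time of this post, and real plugins would contribute GraphQL types rather than plain objects.

```javascript
// Hypothetical source plugins: each one contributes named types to the site schema.
const markdownPlugin = {
  name: 'source-markdown',
  types: { allMarkdown: { fields: ['path', 'frontmatter'] } },
}
const authorsPlugin = {
  name: 'source-authors-yaml',
  types: { allAuthors: { fields: ['firstName', 'lastName'] } },
}

// Merge the types from every plugin into one schema object,
// refusing to let two plugins silently claim the same type name.
function composeSchema(plugins) {
  const schema = {}
  for (const plugin of plugins) {
    for (const [typeName, type] of Object.entries(plugin.types)) {
      if (schema[typeName]) {
        throw new Error(`Type "${typeName}" from ${plugin.name} is already defined`)
      }
      schema[typeName] = type
    }
  }
  return schema
}

const siteSchema = composeSchema([markdownPlugin, authorsPlugin])
```

Failing loudly on a name clash is one possible design choice; a real implementation might namespace types per plugin instead.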

That's fairly straightforward. What was tricky though was figuring out how to integrate the new data layer with React components. The pattern which I eventually settled on for my initial prototype is pleasingly simple.

All routes are powered by React.js components. A route component can power either a single path (e.g. about.js) or many paths (e.g. blog-post.js for all blog posts). Route components need data. To get data, they can export a GraphQL query. This query is run during bootstrap and the result is written out as a JSON file which is inserted into the route component as props. During development the "query runner" watches both route components and source files for changes and re-runs queries, overwriting the JSON files, which Webpack then hot-reloads.

So a very minimal example. Say you have a blog and you want to create an index page listing your blog posts. In your /pages directory you'd create an index.js which would look something like:

import React from 'react'
import get from 'lodash/get'
import Link from 'react-router/lib/Link'

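const BlogIndex = ({ data }) => {
  // Default to an empty list so the page still renders when the query returns no posts.
  const blogPosts = get(data, 'allMarkdown.edges', [])
  const postList = blogPosts.map((post) => {
    // Each list item needs a unique key; the post path serves as one here.
    return (
      <li key={post.node.path}>
        <Link
          to={post.node.path}
        >
          {post.node.frontmatter.title}
        </Link>
      </li>
    )
  })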
  return (
    <div>
      <h1>Blog posts</h1>
      <ul>{postList}</ul>
    </div>
  )
}

export default BlogIndex

export const routeQuery = `
{
  allMarkdown {
    edges {
      node {
        path
        frontmatter {
          title
        }
      }
    }
  }
}
`
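
To see why a query like this keeps the per-route data small, here is a sketch that stands in for GraphQL execution with a plain field selection. It is not how GraphQL itself runs a query; the `select` helper and the sample `posts` data are assumptions made up for illustration.

```javascript
// Keep only the fields named in `selection` (a nested object whose leaves are `true`).
function select(value, selection) {
  if (Array.isArray(value)) return value.map((item) => select(item, selection))
  if (selection === true || value === null || typeof value !== 'object') return value
  const picked = {}
  for (const key of Object.keys(selection)) {
    if (key in value) picked[key] = select(value[key], selection[key])
  }
  return picked
}

// Hypothetical source data: the index page needs neither `tags` nor `body`.
const posts = [
  { path: '/hello/', frontmatter: { title: 'Hello', tags: ['intro'] }, body: '...long markdown...' },
]

// Only the selected fields end up in the JSON written for the route.
const indexData = select(posts, { path: true, frontmatter: { title: true } })
const indexJson = JSON.stringify(indexData)
```

The post body never reaches the index page's JSON file, which is the data-splitting effect described above.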

You can now think of the various content/data files you have as a "database" to query against however you want. E.g. to create a page listing tags you could export this query:

export const routeQuery = `
{
  allMarkdown {
    edges {
      node {
        frontmatter {
          tags
        }
      }
    }
  }
}
`

I created a page like this on my blog (which is running Gatsby-1.0-alpha1) https://www.bricolage.io/tags/

Stuff like pagination, tag pages, and other "meta" pages are now pretty straightforward.

Going with GraphQL also gives us access to fantastic tooling. Facebook uses GraphQL heavily and one of the most useful internal GraphQL tools they've released is GraphiQL, an IDE for GraphQL.

Here's a gif of me exploring my blog's GraphQL schema.

[gif: graphiql]

I'm super duper excited about all the possibilities the new GraphQL layer opens up. Here's a sampling of some ideas I've had.

  • Use React Docgen to make
    PropType or Flow information from your React components queryable.
    Create a living styleguide.
  • Do something similar for other JS docs systems e.g. JSDoc. Imagine
    writing code documentation while the documentation hot-reloads your
    changes.
  • Programmable data. GraphQL fields can take arguments. Query for images
    and pass a width value as an argument and have the image source plugin
    resize the image on the fly. Pass a format string to a date field and
    get back a formatted date (no more loading moment.js into the client).
  • Connect to 3rd party APIs e.g. GitHub, Twitter, Facebook, etc.
  • Build sites using hosted CMSs e.g. Contentful, DatoCMS, or Prismic.
  • Validate data e.g. require that all Markdown files have a title
    field that's of a minimum length.
  • Connect data. GraphQL lets you easily connect types together e.g. the
    author field in the frontmatter of a markdown file can be connected to
    data from an authors.yaml file which lets you write queries like:
{
  markdown {
    frontmatter {
      author {
        firstName
        lastName
      }
    }
  }
}
  • Query Markdown AST for advanced use cases. E.g. custom footnote
    rendering.
  • Pulling data from legacy systems e.g. use a WordPress source plugin
    and rebuild an old site on Gatsby while still maintaining content in
    WordPress.
  • Extend source plugin schemas with custom fields for your site.
  • Add standard query operators to schema so you can easily sort, filter,
    search, glob, regex, groupBy, sum, etc. data.
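
The "programmable data" idea from the list above can be sketched as a field resolver that accepts arguments. This is only an illustration: the `dateField` object, the `formatString` argument name, and the tiny token formatter are hypothetical, and a real plugin would likely support far more tokens.

```javascript
// Zero-pad a number to two digits.
function pad(n) {
  return String(n).padStart(2, '0')
}

// Minimal formatter supporting only the YYYY, MM and DD tokens (UTC).
function formatDate(isoString, formatString) {
  const d = new Date(isoString)
  const tokens = {
    YYYY: String(d.getUTCFullYear()),
    MM: pad(d.getUTCMonth() + 1),
    DD: pad(d.getUTCDate()),
  }
  return formatString.replace(/YYYY|MM|DD/g, (token) => tokens[token])
}

// Hypothetical field definition: formatting happens at build time,
// so no date library has to be shipped to the client.
const dateField = {
  resolve: (node, args) =>
    args.formatString ? formatDate(node.date, args.formatString) : node.date,
}

const formatted = dateField.resolve(
  { date: '2016-09-03T00:00:00Z' },
  { formatString: 'DD/MM/YYYY' }
)
```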

With the coming source plugin architecture, getting data into your site will soon be straightforward. Identify the sources of data, compose source plugins, play in GraphiQL to create queries, drop queries in route components, write components.

SachaG commented Sep 5, 2016

Wow, this is super exciting! But just to clarify, these GraphQL queries would only be running at bootstrap time, right? Or would there also be a GraphQL client in the app bundle?

KyleAMathews commented Sep 5, 2016

@SachaG yup! The queries are run during bootstrap (or when the query or source data changes). The results of the queries are written out as JSON files. I'll be writing this up as part of describing the idea for code/data splitting, but basically you'll get a big directory of JSON files. Gatsby then writes out a child-routes file which, for each route, requires its JSON file and calls the component using the result of the query as props.

With this system, as far as Webpack and your code know, it's just a normal Webpack module, so there's no need for special handling or a GraphQL client.
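
A sketch of what generating such a child-routes file might look like is below. The `buildChildRoutes` helper, the file layout, and the `path`/`component`/`data` field names are assumptions for illustration only, not Gatsby's actual output.

```javascript
// Generate the source of a module that pairs each route's component
// with the JSON file holding its query result.
function buildChildRoutes(routes) {
  const entries = routes.map((route) =>
    [
      '  {',
      `    path: ${JSON.stringify(route.path)},`,
      `    component: require(${JSON.stringify(route.component)}),`,
      `    data: require(${JSON.stringify(route.json)}),`,
      '  },',
    ].join('\n')
  )
  return `module.exports = [\n${entries.join('\n')}\n]\n`
}

// Hypothetical routes; the paths are made up.
const childRoutesSource = buildChildRoutes([
  { path: '/', component: './pages/index.js', json: './json/index.json' },
  { path: '/tags/', component: './pages/tags.js', json: './json/tags.json' },
])
```

Because the generated file uses ordinary `require` calls, Webpack can treat the JSON like any other module, which is the point made above.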

@KyleAMathews KyleAMathews referenced this issue in syntax-tree/mdast Sep 5, 2016

Closed

Creating new node types from inline JSX #13

SachaG commented Sep 5, 2016

Makes sense, thanks for the details!

alizain commented Sep 14, 2016

This is really great stuff!

I was recently looking into gatsby for a couple projects, but decided to build my own solution because the data model wasn't flexible enough - essentially I needed exactly what you've described above - a filesystem as a database.

The project is called catalyst. It also uses React to render views. Since gatsby is a much more mature project, and the 1.0 roadmap seems to be really fantastic, I wanted to share the catalyst data model with you as feedback, and at minimum, as a source plugin for gatsby. The GraphQL stuff is also exactly where I was thinking of going as well; I'm currently in the process of separating the filesystem stuff into a separate project called fsdb, and implementing the GraphQL compatibility layer.

High level summary of fsdb

  • all files are nodes, with data and content
  • folders can be nodes too, with data and content
  • there's inheritance built in, for common properties amongst siblings
  • there's references built in, but with GraphQL, this shouldn't be necessary

Details

All files are nodes, with data and content

The line between data and content is really thin, especially when using fsdb to build websites for artists and other, non-blog projects. So with that in mind, the data model is not tied to markdown documents, and treats key-value stores as first-class citizens. When loaded and transformed into memory, fsdb combines the multiple declarations into one atomic data object. Content cannot be merged, only data properties.

Folders can be nodes too, with data and content

By using either a data/authors/index.yaml file or a sibling file with the exact same name as the folder, data/authors.yaml, we can open up so many more possibilities for convenient data modelling.

There's inheritance built in, for common properties amongst siblings

By declaring a (configurable) data/authors/common.yaml file, all nodes inside data/authors will inherit from the common.yaml file. Again, like the previous point, this is something that I needed for my own work, and found to be quite wonderful in building complex data structures using just the filesystem. Additionally, there's an option to take advantage of the prototype chain to build inherited data structures using the nested relationships of files and folders, so that all properties of a folder can be inherited by its children. Content cannot be inherited, only data properties.

There's references built in, but with GraphQL, this shouldn't be necessary

GraphQL is definitely the superior option here, but I had set it up so that the parser would look for a special string sequence and reference the required file to reduce duplication.

Example

data/authors/muju.yaml

title: Muju

data/books/common.md

---
type: book
---

data/books/shasekishu/index.yaml

title: Shasekishū
author: "*/authors/muju"
published: 1283

data/books/shasekishu/common.yaml

type: koan

data/books/shasekishu/a-cup-of-tea.md

---
title: A Cup of Tea
---

Twenty monks and one nun, who was named Eshun, were practicing meditation with a certain Zen master.

Eshun was very pretty even though her head was shaved and her dress plain. Several monks secretly fell in love with her. One of them wrote her a love letter, insisting upon a private meeting.

Eshun did not reply. The following day the master gave a lecture to the group, and when it was over, Eshun arose. Addressing the one who had written to her, she said: "If you really love me so much, come and embrace me now."

In memory

{
  "authors": {
    slug: "authors",
    path: [],
    parent: undefined,
    children: {
      "muju": {
        slug: "muju",
        sources: [
          "data/authors/muju.yaml"
        ],
        path: [ "authors" ],
        data: {
          title: "Muju"
        },
        parent: { /* authors */ },
        children: {}
      }
    }
  },
  "books": {
    slug: "books",
    path: [],
    parent: undefined,
    children: {
      "shasekishu": {
        slug: "shasekishu",
        sources: [
          "data/books/common.md",
          "data/books/shasekishu/index.yaml"
        ],
        path: [ "books" ],
        data: {
          title: "Shasekishū",
          author: { /* authors.children.muju */ },
          type: "book",
          published: 1283
        },
        parent: { /* books */ },
        children: {
          "a-cup-of-tea": {
            slug: "a-cup-of-tea",
            sources: [
              "data/books/shasekishu/common.yaml",
              "data/books/shasekishu/a-cup-of-tea.md"
            ],
            path: [ "books", "shasekishu" ],
            data: {
              title: "A Cup of Tea",
              type: "koan"
            },
            contentRaw: "Twenty monks and one nun, who was named Eshun, were practicing meditation with a certain Zen master.\n\nEshun was very pretty even though her head was shaved and her dress plain. Several monks secretly fell in love with her. One of them wrote her a love letter, insisting upon a private meeting.\n\nEshun did not reply. The following day the master gave a lecture to the group, and when it was over, Eshun arose. Addressing the one who had written to her, she said: \"If you really love me so much, come and embrace me now.\"",
            contentFormat: "markdown",
            parent: { /* books.children.shasekishu */ },
            children: {}
          }
        }
      }
    }
  }
}

The data is output both as a tree and as a flat hash like so:

{
    "authors": {},
    "authors/muju": {},
    "books": {},
    "books/shasekishu": {},
    "books/shasekishu/a-cup-of-tea": {}
}

Since we're always referencing objects, it's really easy to move around from one node to another.
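
For illustration, reference resolution over the flat hash could look roughly like the sketch below. It assumes the special string sequence is a leading `*/` followed by the target's key in the flat hash, as in the `author: "*/authors/muju"` example; the `resolveReferences` helper is hypothetical and not taken from fsdb.

```javascript
// Replace "*/some/key" strings in each node's data with the node they point at.
function resolveReferences(flat) {
  for (const node of Object.values(flat)) {
    for (const [key, value] of Object.entries(node.data)) {
      if (typeof value === 'string' && value.startsWith('*/')) {
        const target = flat[value.slice(2)]
        // Leave the string untouched if the target does not exist.
        if (target) node.data[key] = target
      }
    }
  }
  return flat
}

const flatNodes = resolveReferences({
  'authors/muju': { data: { title: 'Muju' } },
  'books/shasekishu': { data: { title: 'Shasekishū', author: '*/authors/muju' } },
})
```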

Queries

// with parent prototypical inheritance and common data files enabled!
books.children["shasekishu"].children["a-cup-of-tea"].published === 1283
books.children["shasekishu"].children["a-cup-of-tea"].author.data.title === "Muju"
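
The inherited lookup in the first query can be sketched with plain prototype chains. This is a simplified illustration, not fsdb's implementation: the `createNodeData` helper is hypothetical, and it models only the data objects rather than full nodes.

```javascript
// Create a data object whose prototype is the parent's data,
// so missing properties are looked up on the parent.
function createNodeData(ownData, parentData) {
  return Object.assign(Object.create(parentData || null), ownData)
}

const booksData = createNodeData({ type: 'book' })
const shasekishuData = createNodeData({ title: 'Shasekishū', published: 1283 }, booksData)
const cupOfTeaData = createNodeData({ title: 'A Cup of Tea', type: 'koan' }, shasekishuData)

// Own properties win; everything else falls through to the ancestors.
const inheritedYear = cupOfTeaData.published
const ownType = cupOfTeaData.type
```

Here `inheritedYear` comes from the parent folder while `ownType` is the node's own value, and `Object.keys(cupOfTeaData)` still lists only the node's own properties.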

Thoughts

If you're interested, I'd love to get your thoughts, and the Gatsby.js community's, on using this. I'd also be happy to finalize the GraphQL layer for use in the 1.0 release!

KyleAMathews commented Sep 14, 2016

@alizain oh very cool! Good to see we're thinking along the same lines.

My plan right now is that there'll be a very thin contract between source plugins and Gatsby. Basically the source plugin will give Gatsby GraphQL types to add to the schema and then Gatsby in turn will ask the source plugin to resolve queries as needed.

I'd been using Relay in a product so am building my source plugins with https://github.com/graphql/graphql-relay-js which has some handy helpers plus good ideas. But I'd really love to see other ideas explored and your idea of auto-linking folders and files w/ some conventions is really interesting and would lend itself nicely to GraphQL/Gatsby.

Basically this stuff is super duper brand new so yes, please explore and build a source plugin or three (once the plugin system is released — hopefully the next alpha) and we'll all learn together what works.

And also you could have multiple src plugins over the same data which would let you query the data in multiple ways depending on your use case.

alizain commented Sep 14, 2016

This sounds great, let me know if I can help 😄

KyleAMathews commented Sep 14, 2016

Awesome! You'll be super helpful as we work out the APIs needed for the data layer. I'll post here once the plugin system plus a handful of source plugins are released so you (and others) can test and try building your own. Super excited to see all the directions this can go.

KyleAMathews commented Nov 9, 2016

Some updates.

A basic version of the GraphQL data layer has been implemented and I'm feeling really happy with it. GraphQL makes it very easy to specify, in each component, exactly what data that component requires. This ensures that we're shipping only the bits to the browser that are necessary. On my blog for example, the vast majority of page data bundles are < 5kb.

A few things that are in progress.

Data transformation expressed through GraphQL is something I'm really excited about. I spoke on this recently at the GraphQLSummit (video here: https://www.youtube.com/watch?v=y588qNiCZZo, slides here: https://graphql-gatsby-slides.netlify.com).

I built a simple image gallery using Gatsby 1.0 and some experimental image manipulation graphql types (not shipped yet). You can see the code here: https://github.com/gatsbyjs/gatsby/tree/14b0320379dee196a182ce8f6d3db5087fb419b2/examples/image-gallery

Really happy with how expressive it is. The index page of the gallery including the react component and graphql query is all of 54 lines of code. This is what the query looks like:

export const pageQuery = `
query allImages {
  allImages {
    edges {
      node {
        path
        regular: image(height: 290, width: 387) {
          src
          height
          width
        }
        retina: image(height: 580, width: 794) {
          src
        }
      }
    }
  }
}
`

What's really fun is that queries hot reload so you can modify the image sizes for example and see changes almost immediately.

I'm also R&Ding the best way to dynamically build a GraphQL schema from files. It'd be a very poor user experience if every Gatsby user had to manually create their own GraphQL schema. I've always been very impressed when I use Elasticsearch as you can just send data to them and they generally do a very good job of inferring your data types for you so the db immediately feels useful. I'd like that same experience with Gatsby & GraphQL. You point Gatsby at a bunch of files and you should be shocked by how much Gatsby knows already about your content.

But at the same time — similar to Elasticsearch — users should retain full ability to control the schema as they wish.
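
To make the inference idea concrete, here is a minimal sketch, assuming each file has already been reduced to a plain object of fields. `inferType` and `inferFieldTypes` are hypothetical names, and the widening rules are an assumption for illustration, not how Gatsby or Elasticsearch actually infer types.

```javascript
// Hypothetical helper: guess a GraphQL-style type name for one value.
function inferType(value) {
  if (typeof value === "boolean") return "Boolean";
  if (typeof value === "number") return Number.isInteger(value) ? "Int" : "Float";
  if (typeof value === "string") return "String";
  if (Array.isArray(value)) {
    // Assume homogeneous lists; default empty lists to strings.
    return value.length > 0 ? "[" + inferType(value[0]) + "]" : "[String]";
  }
  return "Object";
}

// Hypothetical helper: merge the inferred types of many records.
function inferFieldTypes(records) {
  const fields = {};
  records.forEach(function (record) {
    Object.keys(record).forEach(function (key) {
      const value = record[key];
      if (value === null || value === undefined) return; // no evidence
      const type = inferType(value);
      if (!Object.prototype.hasOwnProperty.call(fields, key)) {
        fields[key] = type;
      } else if (fields[key] !== type) {
        const numeric = ["Int", "Float"];
        const bothNumeric =
          numeric.indexOf(fields[key]) !== -1 && numeric.indexOf(type) !== -1;
        // Widen Int to Float; otherwise fall back to String.
        fields[key] = bothNumeric ? "Float" : "String";
      }
    });
  });
  return fields;
}
```

A user-supplied mapping could then override any entry of the returned object, which is one way to keep full control over the schema.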

What I've been stuck on for the past while is deciding on the data structure to represent files and the various ways they can be parsed and extended. For example, a markdown file has various file-level attributes; the file is then parsed into markdown, which has various parts including the frontmatter, which is parsed into a JSON object; and one of those fields could point to a file which happens to be an image.

The data structure would need to represent this while allowing Gatsby plugins to modify and extend the data structure in arbitrary ways while also supporting being able to eventually convert the structure into a GraphQL schema.

After going back and forth on a number of ways of representing this, it occurred to me that what I was doing was very similar to a compiler.

Take a compiler/transpiler like Babel. Babel takes a javascript file, parses it into an Abstract Syntax Tree (AST), allows plugins to modify the tree in various ways, and finally generates the final resulting JS file.

We could do the same thing for Gatsby and GraphQL. We "parse" the files for a site into an AST, allow plugins to extend or modify the tree, and then finally use this to generate the GraphQL schema.

A bit odd perhaps but I think it'll work :-)
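
The parse, transform, generate pipeline described above could be sketched roughly like this. Everything here is hypothetical (`parseFiles`, `extensionPlugin`, `generateQueryFields`, `compile`, and the `contentType` property); it only illustrates the compiler analogy with plain objects and is not the prototype itself.

```javascript
// "Parse": turn a list of file paths into a tree of file nodes.
function parseFiles(paths) {
  return {
    type: "root",
    children: paths.map(function (path) {
      const name = path.slice(path.lastIndexOf("/") + 1);
      const dot = name.lastIndexOf(".");
      return {
        type: "file",
        path: path,
        extension: dot === -1 ? "" : name.slice(dot + 1).toLowerCase()
      };
    })
  };
}

// "Transform": build a plugin that tags files with a content type.
function extensionPlugin(extensions, contentType) {
  return function (tree) {
    tree.children.forEach(function (node) {
      if (extensions.indexOf(node.extension) !== -1) {
        node.contentType = contentType;
      }
    });
    return tree;
  };
}

// "Generate": list the query fields a schema would need for this tree.
function generateQueryFields(tree) {
  const seen = {};
  tree.children.forEach(function (node) {
    if (node.contentType) seen[node.contentType] = true;
  });
  return Object.keys(seen).sort();
}

// Run every plugin over the parsed tree, then generate.
function compile(paths, plugins) {
  const tree = plugins.reduce(function (current, plugin) {
    return plugin(current);
  }, parseFiles(paths));
  return generateQueryFields(tree);
}
```

For instance, `compile(["posts/hello.md", "images/cover.JPG", "notes.txt"], [extensionPlugin(["md"], "markdown"), extensionPlugin(["jpg", "png"], "image")])` returns `["image", "markdown"]`; the text file contributes nothing because no plugin claimed it.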

There's an excellent generic library that I think will work for this https://github.com/wooorm/unist

It's the basis for the excellent Markdown parser http://remark.js.org/

I'll be building a prototype on this idea next week so more then.

@SachaG

SachaG Nov 10, 2016

Contributor

Exciting stuff! I'm coincidentally also working a lot with GraphQL these days (porting http://telescopeapp.org to Apollo), it's nice to see two of my favorite open-source projects converge :)

@wooorm

wooorm Nov 10, 2016

Cool!


So some background on the things I’m doing. I’m doing it as lots of little projects so you can pick and choose what you do or don’t want.

unist is the “node” format, describing that objects have a type set to a descriptive string; possibly children, with a list of child nodes; or a value, with string content.

mdast, hast, nlcst are “namespaces” of unist, respectively for markdown, HTML, and natural language.

vfile is a very small virtual file format, focussing on storing messages (linting is a big part of the ecosystem). vfiles can be used for binary data too.

unified is a middleware stack for processing (parse/transform/compile) syntax trees through plugins. There are parse plugins (read markdown to syntax tree), transform plugins (add a table of contents), and stringify plugins (write markdown to man pages).

remark, rehype, retext are unified processors which come with a parser/compiler plugin packaged.

The ecosystem consists of utilities and plugins. The former work with unist/mdast/hast/nlcst nodes and are prefixed with -util-, e.g., unist-util- and hast-util-. No need to use unified/remark/rehype/retext with them.

The plugins, prefixed with their processor name, often do bigger things: remark-, retext-.
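
To make the unist node shape described above concrete, here is a small hand-written tree using mdast-style node types, plus a walker. `visit` and `collectText` are hypothetical helpers written for this sketch, not the published unist utilities.

```javascript
// Every node has a `type`; parents carry `children`, leaves carry `value`.
const tree = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 1,
      children: [{ type: "text", value: "Hello" }]
    },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Some " },
        { type: "emphasis", children: [{ type: "text", value: "emphasized" }] },
        { type: "text", value: " text." }
      ]
    }
  ]
};

// Hypothetical helper: depth-first walk calling `visitor` on each node.
function visit(node, visitor) {
  visitor(node);
  (node.children || []).forEach(function (child) {
    visit(child, visitor);
  });
}

// Hypothetical helper: concatenate the values of all text nodes.
function collectText(node) {
  const parts = [];
  visit(node, function (current) {
    if (current.type === "text") parts.push(current.value);
  });
  return parts.join("");
}
```

Because the walker only relies on `type`, `children`, and `value`, the same function would work on any tree that follows this shape, whatever the source language.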

The essence, or the future, kinda looks like “Gulp for syntax tree transformations”:

var unified = require('unified');
var markdown = require('remark-parse');
var toc = require('remark-toc');
var remark2rehype = require('remark-rehype');
var document = require('rehype-document');
var minify = require('rehype-preset-minify');
var html = require('rehype-stringify');

process.stdin
  .pipe(unified())
  .use(markdown)
  .use(toc)
  .use(remark2rehype)
  .use(document)
  .use(minify)
  .use(html)
  .pipe(process.stdout);

In the example above we take stdin, read it as markdown, add a table of contents, transform it to an HTML syntax tree, wrap it in a valid document (doctype, etc.), minify, compile as HTML, and write to stdout.


Disclaimer: I have no experience with GraphQL.

I’m wondering, what languages do you have in mind to connect to Gatsby? How would binary files work?
Where does compilation to a string happen? On the client? Server? What gets shipped over the wire? The syntax tree? Where do plugins come in? What do plugins do?

👋

@KyleAMathews

KyleAMathews Nov 14, 2016

Contributor

Thanks for the tour! I didn't know about all these things so very helpful to get the big picture view.

What I'm proposing to use Unist for in Gatsby is a bit different.

Instead of parsing a "file" from one format to another, e.g. Markdown to HTML, Gatsby will parse "file directories" and compile them to a GraphQL schema.

The focus will be on the file metadata. For example, that a file is a markdown file is the important point, not what's in the file, because it means we should add support for querying markdown to our GraphQL schema.

The intention of the parsing phase is to explore the latent possibilities within the files. Parsing plugins can add support for Markdown, Asciidoctor, images, PDFs, CSVs, YAML, etc. These "possibilities", now expressed within Unist, will then be compiled to a GraphQL schema, against which someone can write queries to actually perform various file transformations, etc.

For example, a markdown file could be discovered to have frontmatter, which is transformed into a JSON structure, one of whose fields is discovered to point to another file, an image. Once this is compiled to a GraphQL schema, you could write a query against the schema to get a URL to the image transformed to 1000px wide.

{
  markdown(filePath: "path/to/markdown/file") {
    frontmatter {
      coverImage { # this is a frontmatter field that links to file, in this case an image that's intended as a cover image.
        image(width: 1000) {
          src
        }
      }
    }
  }
}

All this would be discovered automatically during the parse step without any needed intervention by the user.
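
One way the link discovery could work, as a minimal sketch: compare frontmatter string values against the set of file paths found while parsing. `linkFrontmatter` and the `fileLink` node type are hypothetical names chosen for illustration, not part of Gatsby or Unist.

```javascript
// Hypothetical helper: replace frontmatter strings that match a known
// file path with a link node that later steps could resolve.
function linkFrontmatter(frontmatter, knownPaths) {
  const linked = {};
  Object.keys(frontmatter).forEach(function (key) {
    const value = frontmatter[key];
    const isFile =
      typeof value === "string" && knownPaths.indexOf(value) !== -1;
    linked[key] = isFile ? { type: "fileLink", path: value } : value;
  });
  return linked;
}

// Example input, made up for this sketch.
const linkedExample = linkFrontmatter(
  { title: "Hello", coverImage: "images/cover.jpg" },
  ["images/cover.jpg", "posts/hello.md"]
);
```

Here `linkedExample.coverImage` becomes a `fileLink` node while `title` stays a plain string, which is the information a schema generator would need to expose `coverImage` as an image field.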

Why I think Unist is a perfect fit is a) the tree data structure of connected nodes fits nicely and b) the Unist utilities will really simplify compiling the AST into a GraphQL schema, e.g. creating a GraphQL type that lets you query against only markdown will be trivial with https://github.com/eush77/unist-util-select

Make sense?

@wooorm

wooorm Nov 16, 2016

Very cool! I like having Unist used this new way. Let me know if I can help answer questions or provide more background!

@KyleAMathews

KyleAMathews Nov 18, 2016

Contributor

@wooorm cool! My initial prototyping is looking very promising :-) will definitely have some questions about the right way to do things. Thanks!

@carlsverre

carlsverre Dec 14, 2016

@KyleAMathews when we talked the other day it sounded like this was ready to go... What's the ETA on this project?

@KyleAMathews

KyleAMathews Dec 16, 2016

Contributor

@carlsverre most of the graphql stuff described above should come out in the next alpha whenever that is. Have several client projects I'm developing with it.

@KyleAMathews

KyleAMathews Dec 16, 2016

Contributor

Polished 1.0 + docs should be coming out in Jan/Feb.

@carlsverre

carlsverre Dec 16, 2016

Thanks for the update - how stable is your latest alpha? We were hoping to start working with it over the next 2 weeks

@KyleAMathews

KyleAMathews Dec 16, 2016

Contributor

Alphas should be considered experimental, so not stable. The plugin system hasn't landed yet, and it will change how much of the core Gatsby code is arranged and hence how your site is structured. Also, everything is undocumented, so I wouldn't use them yet unless you feel like reading a lot of code.

@intermundos

intermundos Dec 21, 2016

Thank you for your efforts, Kyle. Gatsby looks very interesting. I have a question whose answer I couldn't find or perhaps missed. Is it possible to update page contents after a build? Say I create an admin page to manage all the site's content, forcing the pages that were updated to rebuild? A GatsbyJs CMS of a kind...

Thanks in advance.

@KyleAMathews

KyleAMathews Dec 21, 2016

Contributor

@intermundos great question! But it deserves its own issue — could you click the green new issues button at top and post your question there?

@vning93

vning93 Jan 7, 2017

Contributor

Hi @KyleAMathews love the work here marrying Gatsby with GraphQL! Do you know when/if there will be support to make GraphQL requests with something like Apollo to an external GraphQL API to fetch data?

@KyleAMathews

KyleAMathews Jan 7, 2017

Contributor

@vning93

vning93 Jan 7, 2017

Contributor

Ideally dynamically fetched. I'm trying to build a site with a couple forms, so being able to run mutations and re-render components dynamically would be awesome!

@KyleAMathews

KyleAMathews Jan 7, 2017

Contributor

@vning93

vning93 Jan 7, 2017

Contributor

Got it, that makes a lot of sense! Thanks 👍

@plandem plandem referenced this issue in andreypopp/sitegen Mar 17, 2017

Closed

so...is project dead?! #22

@aje4u2i

aje4u2i May 5, 2017

Hi Kyle,
Could you please briefly explain how you integrated GraphQL into Gatsby? I mean the step-by-step procedure.
When I run the gatsbygram app in my local environment, I get stuck with the following error:

calling api handler in D:/alpha13 for api createPages
[ { GraphQLError: Cannot query field "allPosts" on type "RootQueryType".
at Object.Field (D:\alpha13\node_modules\gatsby\node_modules\graphql\validation\rules\FieldsOnCorrectType.js:66:31)
at Object.enter (D:\alpha13\node_modules\gatsby\node_modules\graphql\language\visitor.js:296:29)
at Object.enter (D:\alpha13\node_modules\gatsby\node_modules\graphql\language\visitor.js:338:25)
at visit (D:\alpha13\node_modules\gatsby\node_modules\graphql\language\visitor.js:228:26)
at visitUsingRules (D:\alpha13\node_modules\gatsby\node_modules\graphql\validation\validate.js:75:22)
at validate (D:\alpha13\node_modules\gatsby\node_modules\graphql\validation\validate.js:60:10)
at Promise.then.error.errors (D:\alpha13\node_modules\gatsby\node_modules\graphql\graphql.js:54:51)
at graphql (D:\alpha13\node_modules\gatsby\node_modules\graphql\graphql.js:51:10)
at graphqlRunner (D:\alpha13\node_modules\gatsby\dist\bootstrap\index.js:364:43)
at Promise (D:\alpha13\gatsby-node.js:27:5)
at Promise._execute (D:\alpha13\node_modules\bluebird\js\release\debuggability.js:300:9)
at Promise._resolveFromExecutor (D:\alpha13\node_modules\bluebird\js\release\promise.js:483:18)
at new Promise (D:\alpha13\node_modules\bluebird\js\release\promise.js:79:10)
at Object.exports.createPages (D:\alpha13\gatsby-node.js:16:10)
at runAPI (D:\alpha13\node_modules\gatsby\dist\utils\api-runner-node.js:94:33)
at D:\alpha13\node_modules\gatsby\dist\utils\api-runner-node.js:136:33
message: 'Cannot query field "allPosts" on type "RootQueryType".',
locations: [ [Object] ],
path: undefined } ]
UNHANDLED REJECTION TypeError: Cannot read property 'allPosts' of undefined
at graphql.then.result (D:\alpha13\gatsby-node.js:51:25)
at process._tickCallback (internal/process/next_tick.js:103:7)

@alexbassy

alexbassy Aug 5, 2017

Contributor

Is it possible to run a standalone graphql server? So as to have a static site with a search box. I had a look at the develop script but it gets a bit hazy in the bootstrap 😛

@thundernixon thundernixon referenced this issue in thundernixon/blog2017 Aug 11, 2017

Closed

Fix TypeErrors #1

@PabloLeon PabloLeon referenced this issue in PabloLeon/model-driven-journalism Aug 16, 2017

Closed

JSON specs for articles #12

@KyleAMathews

KyleAMathews Nov 29, 2017

Contributor

Shipped in v1!

@MaralS

MaralS Jan 15, 2018

@KyleAMathews I have a question about the date format in GraphQL. For my blog I need to show the months in French. How can I change this parameter?
