1.0 new GraphQL data layer #420

Closed
@KyleAMathews

Pull data into components instead of pushing

Data in Gatsby currently is pushed into templates to be rendered into HTML (like pretty much every static site generator). This is a simple pattern and works great for many use cases. But when you start working on more complex sites, you really start to miss the flexibility of building a database-driven site. With a database, all your data is available to query against in any fashion that you'd like. Whatever bits of data you need to assemble a page, you can pull in. You want to create author pages showing their bio & last 5 posts? It's just a query away. I want this same flexibility for Gatsby. I want to be able to query my markdown (or picture or data, etc) files and treat them as a database of sorts.

This is especially important for Gatsby because, unlike traditional static site generators, all data used to build a page is loaded into the client. Currently Gatsby loads all data for the site into the client. This is both wasteful (your site doesn't use all that data) and costly. Time-to-interactivity is an important web performance metric. The larger your JavaScript bundle, the longer it takes to download and evaluate. This is especially noticeable on low-end phones on poor networks.

With this change in Gatsby 1.0 both code and data will be split on a per-route basis. When a user visits a page, they will load just the JavaScript & data it needs and then lazy-load more once the first page is initialized.

Now a site can easily have "heavy" pages (in terms of data and/or code) without affecting other parts of the site. E.g. a search page or a page with data visualizations.

New GraphQL data layer

Gatsby uses Webpack right now for everything. Javascript, CSS, images, Markdown, JSON, YAML, etc. are all handled using Webpack's rather brilliant system of treating everything as JS modules.

Using Webpack has worked out really really well for Gatsby. It gives us a ton out of the box. A lovely hot-reloading development experience. Easy interoperability with all the latest and greatest web tools. And fast, optimized production builds. It's truly a swiss-army knife of tools.

But Webpack has some problems with data.

First it only understands files. If you want to integrate data from any other source e.g. external APIs you have to first convert that data into files.

Webpack can get weird if you try to reference files from outside of the webroot. I've been bitten by this several times as have others.

Another big problem is you can't use just some data from a file. What if you wanted to use data in your site from a 1 gigabyte CSV file? There's no way to get around loading the entire file unless again you first preprocess the file.

The last problem is data splitting. Ideally each route can load only the data it needs. But how? Often a route will want a bit of data from a number of files or other data sources. How can a route both easily specify what data it needs and tell Webpack to package that minimal data set together to be shipped to the browser to power the React component(s) for that route?

I've thought through a number of different possibilities (this issue explores one of those) but could never quite figure out how to make Webpack do what I wanted it to.

So eventually I concluded the simplest thing would be to split the data layer off and remove it from Webpack's control. Let Webpack do what it does best and build a data system tailor-made for Gatsby's needs.

I've been prototyping this new data layer the past few weeks with GraphQL and am really really pleased with how well it's working.

How it'll work

When you set up a site, you'll add one or more source plugins. These source plugins can be file-based, e.g. a markdown source plugin which you point at a directory of markdown files, or network-based, e.g. for consuming an internal API or a 3rd-party API like GitHub.

Each source plugin defines types which get composed together to form a schema for your site.

This combined schema is consumed by GraphQL and made available to query against.
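
A rough sketch of how that composition might look, in plain JavaScript rather than a real GraphQL library. The plugin names and the `{ name, types }` shape are assumptions made up for illustration, not Gatsby's actual plugin API.

```javascript
// Hypothetical source plugins: each one reports the types it contributes.
const markdownSource = {
  name: 'source-markdown',
  types: {
    Markdown: { path: 'String', frontmatter: 'Frontmatter' },
    Frontmatter: { title: 'String', tags: '[String]' },
  },
}

const authorsSource = {
  name: 'source-authors-yaml',
  types: {
    Author: { firstName: 'String', lastName: 'String' },
  },
}

// Merge every plugin's types into one site-wide schema, refusing to let
// two plugins silently define the same type name.
function composeSchema(plugins) {
  const schema = {}
  for (const plugin of plugins) {
    for (const [typeName, fields] of Object.entries(plugin.types)) {
      if (schema[typeName]) {
        throw new Error(`Type "${typeName}" is defined by more than one plugin`)
      }
      schema[typeName] = fields
    }
  }
  return schema
}

const siteSchema = composeSchema([markdownSource, authorsSource])
console.log(Object.keys(siteSchema))
```

Throwing on a duplicate type name is just one possible policy; a real implementation might instead namespace types per plugin.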

That's fairly straightforward. What was tricky though was figuring out how to integrate the new data layer with React components. The pattern which I eventually settled on for my initial prototype is pleasingly simple.

All routes are powered by React.js components. A route component can either power one path, e.g. about.js, or can power many paths, e.g. blog-post.js for all blog posts. Route components need data. To get data, they can export a GraphQL query. This query is run during bootstrap and the result is written out as a JSON file which is inserted into the route component as props. During development the "query runner" watches both route components and source files for changes and re-runs queries, overwriting the JSON files, which Webpack then hot-reloads.
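
To make the "run the query, write JSON" step concrete, here is a minimal sketch. It is not the prototype's code: `select` is a toy field picker standing in for a real GraphQL executor, and `runRouteQuery`, the sample data, and the output location are all assumptions for illustration.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Toy stand-in for a GraphQL executor: `selection` mirrors the shape of the
// query, with `true` marking a leaf field to copy.
function select(source, selection) {
  if (Array.isArray(source)) {
    return source.map((item) => select(item, selection))
  }
  const result = {}
  for (const [field, sub] of Object.entries(selection)) {
    result[field] = sub === true ? source[field] : select(source[field], sub)
  }
  return result
}

// Run one route's query and write the result as a JSON file in outputDir.
function runRouteQuery(routeName, selection, data, outputDir) {
  const result = select(data, selection)
  const file = path.join(outputDir, `${routeName}.json`)
  fs.writeFileSync(file, JSON.stringify({ data: result }, null, 2))
  return file
}

// Hypothetical site data, as a markdown source plugin might provide it.
const siteData = {
  allMarkdown: {
    edges: [
      {
        node: {
          path: '/hello/',
          body: '...',
          frontmatter: { title: 'Hello', tags: ['intro'] },
        },
      },
    ],
  },
}

const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'route-data-'))
const jsonFile = runRouteQuery(
  'index',
  { allMarkdown: { edges: { node: { path: true, frontmatter: { title: true } } } } },
  siteData,
  outputDir
)
console.log(fs.readFileSync(jsonFile, 'utf8'))
```

Only the fields the route asked for (`path` and `title`) end up in the JSON file; `body` and `tags` never reach the browser for this route.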

So a very minimal example. Say you have a blog and you want to create an index page listing your blog posts. In your /pages directory you'd create an index.js which would look something like:

import React from 'react'
import get from 'lodash/get'
import Link from 'react-router/lib/Link'

const BlogIndex = ({ data }) => {
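  // Default to an empty list so the page still renders when no posts match.
  const blogPosts = get(data, 'allMarkdown.edges', [])
  const postList = blogPosts.map((post) => {
    return (
      <li key={post.node.path}>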
        <Link
          to={post.node.path}
        >
          {post.node.frontmatter.title}
        </Link>
      </li>
    )
  })
  return (
    <div>
      <h1>Blog posts</h1>
      <ul>{postList}</ul>
    </div>
  )
}

export default BlogIndex

export const routeQuery = `
{
  allMarkdown {
    edges {
      node {
        path
        frontmatter {
          title
        }
      }
    }
  }
}
`

You can now think of the various content/data files you have as a "database" to query against however you want. E.g. to create a page listing tags you could export this query.

export const routeQuery = `
{
  allMarkdown {
    edges {
      node {
        frontmatter {
          tags
        }
      }
    }
  }
}
`

I created a page like this on my blog (which is running Gatsby-1.0-alpha1) https://www.bricolage.io/tags/

Stuff like pagination, tag pages, and other "meta" pages are now pretty straightforward.
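
As a sketch of what a tags page might do with that query's result, the helper below tallies how many posts use each tag. The `countTags` name and the sample data are made up for illustration; the result shape follows the tags query above.

```javascript
// Count how many posts use each tag, given a result shaped like the
// tags query. Posts without a tags field are skipped.
function countTags(data) {
  const counts = {}
  for (const edge of data.allMarkdown.edges) {
    for (const tag of edge.node.frontmatter.tags || []) {
      counts[tag] = (counts[tag] || 0) + 1
    }
  }
  return counts
}

// Made-up sample result for illustration.
const sampleResult = {
  allMarkdown: {
    edges: [
      { node: { frontmatter: { tags: ['react', 'gatsby'] } } },
      { node: { frontmatter: { tags: ['gatsby'] } } },
      { node: { frontmatter: {} } },
    ],
  },
}

const tagCounts = countTags(sampleResult)
console.log(tagCounts)
```

A route component could map over `Object.keys(tagCounts)` to render one link per tag.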

Going with GraphQL also gives us access to fantastic tooling. Facebook uses GraphQL heavily and one of the most useful internal GraphQL tools they've released is GraphiQL, an IDE for GraphQL.

Here's a gif of me exploring my blog's GraphQL schema.

[GIF: exploring the blog's GraphQL schema in GraphiQL]

I'm super duper excited about all the possibilities the new GraphQL layer opens up. Here's a sampling of some ideas I've had.

  • Use React Docgen to make PropType or Flow information from your React
    components queryable. Create a living styleguide.
  • Do something similar for other JS docs systems e.g. JSDoc. Imagine
    writing code documentation while the documentation hot-reloads your
    changes.
  • Programmable data. GraphQL fields can take arguments. Query for images
    and pass a width value as an argument and have the image source plugin
    resize the image on the fly. Pass a format string to a date field and
    get back a formatted date (no more loading moment.js into the client).
  • Connect to 3rd party APIs e.g. GitHub, Twitter, Facebook, etc.
  • Build sites using hosted CMSs e.g. Contentful, DatoCMS, or Prismic.
  • Validate data e.g. require that all Markdown files have a title
    field that's of a minimum length.
  • Connect data. GraphQL lets you easily connect types together e.g. the
    author field in the frontmatter of a markdown file can be connected to
    data from an authors.yaml file which lets you write queries like:
{
  markdown {
    frontmatter {
      author {
        firstName
        lastName
      }
    }
  }
}
  • Query Markdown AST for advanced use cases. E.g. custom footnote
    rendering.
  • Pull data from legacy systems e.g. use a WordPress source plugin
    and rebuild an old site on Gatsby while still maintaining content in
    WordPress.
  • Extend source plugin schemas with custom fields for your site.
  • Add standard query operators to schema so you can easily sort, filter,
    search, glob, regex, groupBy, sum, etc. data.
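
Two of the ideas in the list above, "programmable data" and "connect data", can be sketched with resolver-style functions. This is only an illustration under assumptions: the `frontmatterResolvers` object, the `authors` lookup, and the three-token date formatter are hypothetical, not part of any existing plugin.

```javascript
// Hypothetical data: authors as they might be loaded from authors.yaml.
const authors = {
  jane: { firstName: 'Jane', lastName: 'Doe' },
}

// Tiny formatter supporting only YYYY, MM and DD tokens (an assumption;
// a real plugin would likely support far more).
function formatDate(isoString, format) {
  const date = new Date(isoString)
  const pad = (n) => String(n).padStart(2, '0')
  return format
    .replace('YYYY', String(date.getUTCFullYear()))
    .replace('MM', pad(date.getUTCMonth() + 1))
    .replace('DD', pad(date.getUTCDate()))
}

// Resolver-style functions: each receives the parent object and the
// arguments passed to the field in the query.
const frontmatterResolvers = {
  // Field argument: format the date at build time, not in the browser.
  date: (frontmatter, args) =>
    args.format ? formatDate(frontmatter.date, args.format) : frontmatter.date,
  // Connection: swap the author id for the full author record.
  author: (frontmatter) => authors[frontmatter.author] || null,
}

const frontmatter = {
  title: 'Hello',
  date: '2017-01-05T00:00:00Z',
  author: 'jane',
}

console.log(frontmatterResolvers.date(frontmatter, { format: 'DD/MM/YYYY' }))
console.log(frontmatterResolvers.author(frontmatter))
```

Because the formatting happens when the query runs, the browser only ever receives the finished string.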

With the coming source plugin architecture, getting data into your site will soon be straightforward. Identify the sources of data, compose source plugins, play in GraphiQL to create queries, drop queries in route components, write components.
