Update normalized form to follow JSON API #157

Closed
dgeb opened this issue Jun 19, 2015 · 17 comments

@dgeb
Member

dgeb commented Jun 19, 2015

I'd like to update Orbit's normalized form to follow the document structure defined by JSON API.

Instead of flattening attributes, relationships, and keys at a single level, I'd like to structure them as follows:

{
  type: 'planet',
  id: '29e127ea-fb2b-4f7d-9f49-3d502f31d74a',
  attributes: {
    name: 'Jupiter',
    classification: 'gas giant'
  },
  relationships: {
    moons: {
      data: [
        {type: 'moon', id: '13567508-f2a9-4db4-b895-49273ea42009'},
        {type: 'moon', id: '76897508-f2a9-4db4-b895-49273ea42123'}
      ]
    }
  }
}

Some details:

  • type - the record type (i.e. model name)
  • id - the value of the primary key (note that this is Orbit's primary key)
  • attributes - attributes, previously stored at the record root
  • relationships - relationships, previously stored as __rel
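As a rough illustration of the restructuring described above, the sketch below converts a flattened record into the proposed form. The toJsonApiForm helper and the shape of the schema argument are hypothetical, and the old __rel layout (a single id for hasOne, a map of ids to true for hasMany) is an assumption rather than something stated in this issue.

```javascript
// Hypothetical schema: attribute names and relationship definitions per model.
const schema = {
  planet: {
    attributes: ['name', 'classification'],
    relationships: { moons: { type: 'hasMany', model: 'moon' } }
  }
};

// Convert a flat record (attributes at the root, relationships under __rel)
// into the { type, id, attributes, relationships } structure.
function toJsonApiForm(schema, type, flat) {
  const model = schema[type];
  const record = { type, id: flat.id, attributes: {}, relationships: {} };

  for (const name of model.attributes) {
    if (flat[name] !== undefined) record.attributes[name] = flat[name];
  }

  const rels = flat.__rel || {};
  for (const [name, def] of Object.entries(model.relationships)) {
    const value = rels[name];
    if (value === undefined) continue; // leave uninitialized relationships out

    if (def.type === 'hasMany') {
      // Assumed old shape: a map of related ids to `true`.
      const data = Object.keys(value).map(id => ({ type: def.model, id }));
      record.relationships[name] = { data };
    } else {
      // Assumed old shape: a single related id, or null.
      const data = value === null ? null : { type: def.model, id: value };
      record.relationships[name] = { data };
    }
  }
  return record;
}

const jupiter = toJsonApiForm(schema, 'planet', {
  id: 'p1',
  name: 'Jupiter',
  classification: 'gas giant',
  __rel: { moons: { m1: true, m2: true } }
});
```

Note that the conversion itself still needs the schema; the benefit claimed for the new form is that code reading the converted record should need it less often.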

Advantages

Moving to this normalized form should provide the following benefits:

  • The structure allows for much more to be determined in sources semantically without requiring constant references to the schema.
  • The structure is extensible and allows for links and meta information.
  • It should pave the way for supporting heterogeneous collections and polymorphic relationships.
  • The JSONAPISerializer should largely be a noop.

TBD

Secondary keys (e.g. remote IDs) could be stored either:

  • in a separate keys object for each resource, or
  • exclusively in a map shared centrally by sources.

If stored in a map, the map's contents will need to be extractable so it can be kept in browser storage.

I'm leaning towards using a separate keys object for simplicity and consistency. Obviously, this element would only be needed for schema that use secondary keys.

@opsb
Contributor

opsb commented Jun 19, 2015

Yup, I can see the advantages. Polymorphic relationships would be useful and, as we discussed before, having meta data on the relationships will be useful for tracking the load status (not-loaded, partially-loaded, fully-loaded).

We don't use secondary keys, so that doesn't affect us. One question though: is there any way to determine the type of a relationship (hasOne vs hasMany) without going back to the schema?

@dgeb
Member Author

dgeb commented Jun 20, 2015

One question though: is there any way to determine the type of a relationship (hasOne vs hasMany) without going back to the schema?

I've been puzzling over this myself. The one obvious method would be to require a default value (null or []) even when uninitialized. This goes against my preference to use undefined for uninitialized values, but maybe it's a strong argument against this preference.
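A minimal sketch of that idea, assuming hasOne relationships default to null and hasMany relationships default to [] as suggested above. The relationshipKind helper is hypothetical and is not part of Orbit.

```javascript
// Classify a relationship by the shape of its `data` member.
// Only works if every initialized relationship carries a default value.
function relationshipKind(relationship) {
  const data = relationship ? relationship.data : undefined;
  if (Array.isArray(data)) return 'hasMany';
  if (data === null || typeof data === 'object') return 'hasOne';
  return undefined; // no `data` at all: the schema is still needed
}

const planet = {
  type: 'planet',
  id: 'p1',
  relationships: {
    sun: { data: null }, // empty hasOne
    moons: { data: [] }  // empty hasMany
  }
};

const sunKind = relationshipKind(planet.relationships.sun);
const moonsKind = relationshipKind(planet.relationships.moons);
const unknownKind = relationshipKind(planet.relationships.rings);
```

The last call shows the cost of keeping undefined for uninitialized values: the kind cannot be inferred and the caller has to fall back to the schema.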

@opsb
Contributor

opsb commented Jun 20, 2015

Having thought about this more, perhaps we're trying to squeeze schema information into the operation where really we should just accept that the schema is necessary to interpret it? We have a schema so we might as well lean on it. The operation encoder is already acting as a convenience layer for dispatching based on the operation's properties. Perhaps we consolidate more here? i.e.

operationEncoder.identify(operation) // most fine-grained dispatching, based on specific operation type
operationEncoder.isHasOneOperation(operation) // more general
operationEncoder.isLinkOperation(operation) // more general still

With the operation encoder refactor though I did find that explicitly listing handlers by operation type did really help to clarify what was going on. It also avoided erroneous handling of operations due to misidentification. I think the dispatching could be a little prettier (the switch statement isn't great, particularly when you have multiple cases with the same handler) but it does the job.
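One way the dispatching might be made prettier is a lookup table from operation type to handler, sketched below. The createDispatcher helper, the operation type names, and the simplified identify function are all hypothetical; in Orbit the identify step would presumably be operationEncoder.identify.

```javascript
// Build a dispatcher from a map of operation type -> handler.
// Handlers stay explicitly listed by type, as with the switch statement.
function createDispatcher(identify, handlers) {
  return function dispatch(operation) {
    const type = identify(operation);
    if (!Object.prototype.hasOwnProperty.call(handlers, type)) {
      // Fail loudly rather than handle a misidentified operation.
      throw new Error(`No handler for operation type: ${type}`);
    }
    return handlers[type](operation);
  };
}

// Several operation types can share one handler without stacked `case` labels.
const handleLink = operation => `link:${operation.path.join('/')}`;
const handleRecord = operation => `record:${operation.path.join('/')}`;

const dispatch = createDispatcher(
  operation => operation.type, // stand-in for identify(): type is precomputed here
  {
    addHasOne: handleLink,
    addToHasMany: handleLink,
    addRecord: handleRecord
  }
);

const result = dispatch({ type: 'addToHasMany', path: ['planet', 'p1', 'moons', 'm1'] });
```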

So perhaps we don't need to worry about hasOne vs hasMany within the operation. The main benefits I see here are handling of polymorphic relationships and the extensible structure.

@dgeb
Member Author

dgeb commented Jun 22, 2015

@opsb I agree with everything you've said.

There's no hard requirement to completely avoid usage of the schema to analyze operations (although minimizing schema references should help performance).

And I agree that we probably should expand the operation encoder with more convenience methods.

@opsb
Contributor

opsb commented Jul 4, 2015

This is looking like an extremely useful change at the moment. I'm now seeing that it would be really useful to track in-progress state for loading of links, i.e. loadStatus: not loaded / partially loaded / >>loading<< / fully loaded

@dgeb
Member Author

dgeb commented Sep 1, 2015

I've been rethinking this proposal and would like to consider using a slightly modified version of the JSON API format for Orbit's normalized form.

Here are the proposed changes and reasoning:

  • Express collections as maps instead of arrays - Although the JSON API format is ideal for transport and for expressing order, it is not ideal for fast, repeated access to elements because it relies on arrays for collections. Maps can be accessed by key more quickly than arrays, which require inspection of members. Furthermore, maps can be modified cleanly with JSON Patch operations, while arrays require messy positional access.
  • Require that every primary ID in the system be a UUID - By requiring primary UUIDs, all resources can be referenced uniquely only by ID (not type / ID pairs). This allows for polymorphic data while still enabling fast access to elements in a map keyed by ID. If appropriate, these IDs can be Orbit-generated and mapped to secondary server-generated keys maintained in a new keys collection.
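To make the JSON Patch point concrete, the sketch below builds patch operations for a map-based relationship and compares them with the array case. The helper names are hypothetical, and the paths assume the cache layout proposed in this comment (records under data, keyed by id). Ids containing "/" or "~" would need JSON Pointer escaping, which is skipped here.

```javascript
// With a map keyed by id, the path names the member directly.
function addToMap(recordId, relationship, relatedId) {
  return {
    op: 'add',
    path: `/data/${recordId}/relationships/${relationship}/data/${relatedId}`,
    value: true
  };
}

function removeFromMap(recordId, relationship, relatedId) {
  return {
    op: 'remove',
    path: `/data/${recordId}/relationships/${relationship}/data/${relatedId}`
  };
}

// With an array, the removal path depends on the member's current index,
// which has to be found by scanning and can go stale if the array changes.
function removeFromArray(cache, recordId, relationship, relatedId) {
  const members = cache.data[recordId].relationships[relationship].data;
  const index = members.findIndex(member => member.id === relatedId);
  if (index === -1) return null;
  return {
    op: 'remove',
    path: `/data/${recordId}/relationships/${relationship}/data/${index}`
  };
}

const arrayCache = {
  data: {
    p1: {
      type: 'planet',
      id: 'p1',
      relationships: {
        moons: { data: [{ type: 'moon', id: 'm1' }, { type: 'moon', id: 'm2' }] }
      }
    }
  }
};

const mapPatch = removeFromMap('p1', 'moons', 'm2');
const arrayPatch = removeFromArray(arrayCache, 'p1', 'moons', 'm2');
```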

Here are the contents of a simple cache using this format (note: UUIDs have been simplified for illustration):

{
  data: {
    p1: {
      type: 'planet',
      id: 'p1',
      attributes: {
        name: 'Jupiter',
        classification: 'gas giant'
      },
      relationships: {
        sun: {
          data: 's1'
        },
        moons: {
          data: {
            m1: true,
            m2: true
          }
        }
      }
    },

    m1: {
      type: 'moon',
      id: 'm1',
      attributes: {
        name: 'Europa'
      }
    },

    m2: {
      type: 'moon',
      id: 'm2',
      attributes: {
        name: 'Io'
      }
    },

    s1: {
      type: 'star',
      id: 's1',
      attributes: {
        name: 'The Sun'
      }
    }
  }
}
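A short sketch of how related records might be resolved against a cache in this format. The related helper is hypothetical; it assumes a hasOne relationship stores a single id string and a hasMany relationship stores a map of ids to true, as in the example above.

```javascript
const cache = {
  data: {
    p1: {
      type: 'planet',
      id: 'p1',
      attributes: { name: 'Jupiter' },
      relationships: {
        sun: { data: 's1' },
        moons: { data: { m1: true, m2: true } }
      }
    },
    m1: { type: 'moon', id: 'm1', attributes: { name: 'Europa' } },
    m2: { type: 'moon', id: 'm2', attributes: { name: 'Io' } },
    s1: { type: 'star', id: 's1', attributes: { name: 'The Sun' } }
  }
};

// Resolve a relationship to the record(s) it points at.
// Every lookup is a direct key access; no array is scanned.
function related(cache, id, relationship) {
  const record = cache.data[id];
  const rel = record && record.relationships && record.relationships[relationship];
  if (!rel || rel.data === undefined) return undefined; // not loaded or not defined
  if (rel.data === null) return null;                    // empty hasOne
  if (typeof rel.data === 'string') return cache.data[rel.data]; // hasOne
  return Object.keys(rel.data).map(relatedId => cache.data[relatedId]); // hasMany
}

const sun = related(cache, 'p1', 'sun');
const moons = related(cache, 'p1', 'moons');
```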

Resource objects could represent server-generated non-UUID keys in keys:

{
  data: {
    p1: {
      type: 'planet',
      id: 'p1',
      keys: {
        serverId: '123'
      },
      attributes: {
        name: 'Jupiter',
        classification: 'gas giant'
      }
    }
  }
}

Resource and relationship objects could also contain links and meta:

{
  data: {
    p1: {
      type: 'planet',
      id: 'p1',
      attributes: {
        name: 'Jupiter',
        classification: 'gas giant'
      },
      relationships: {
        sun: {
          meta: {
            lastFetched: 1441143855702
          },
          links: {
            self: '/planets/p1/relationships/sun',
            related: '/planets/p1/sun'
          },
          data: 's1'
        },
        moons: {
          meta: {
            lastFetched: 1441143855702
          },
          links: {
            self: '/planets/p1/relationships/moons',
            related: '/planets/p1/moons'
          },
          data: {
            m1: true,
            m2: true
          }
        }
      },
      meta: {
        lastFetched: 1441143855702
      },
      links: {
        self: '/planets/p1'
      },
    }
  }
}

@opsb
Contributor

opsb commented Sep 1, 2015

The meta section certainly satisfies my current requirements (and keeping a timestamp for lastFetched is also a good idea). I wonder how querying against this structure would work (thinking of the MemorySource) and particularly fetchAll. As a structure for synchronising it looks great though. The only thing I'm wondering about is how sources would know when to ignore meta information. Potentially there could be information that makes sense to synchronise remotely whereas some would only make sense locally (e.g. lastFetched). Perhaps the sources will just need to have knowledge of meta information where it's relevant.

I see the relationship type isn't indicated anyway. Are you thinking we'd refer to the schema for this?

@opsb
Contributor

opsb commented Sep 1, 2015

Taking lastFetched as an example: given an app with JSON API, localStorage, and memory sources, when the app is loaded and the memory source synchronises with localStorage, would the lastFetched timestamp reflect the value stored in localStorage or the current timestamp?

@dgeb
Member Author

dgeb commented Sep 1, 2015

I wonder how querying against this structure would work (thinking of the MemorySource) and particularly fetchAll.

Good question. I've been thinking about introducing indices for caches, which would exist as a sibling of data, and would allow for maps between type and id, secondary keys and id, and other fields that need quick lookups. Currently, the secondary to primary key indices are dynamically generated, but it might be better to allow them to be stored statically, say in local storage.
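A rough sketch of what such indices could look like, assuming the cache layout proposed earlier in this thread (records under data, keyed by id). The byType and byKey names, the buildIndices helper, and this fetchAll signature are hypothetical. Secondary keys are indexed per type here on the assumption that a server id is only unique within one type.

```javascript
// Build lookup indices from a cache's `data` map.
function buildIndices(data) {
  const indices = { byType: {}, byKey: {} };
  for (const record of Object.values(data)) {
    // type -> { id: true }
    const ids = indices.byType[record.type] || (indices.byType[record.type] = {});
    ids[record.id] = true;

    // type -> keyName -> keyValue -> id
    for (const [keyName, keyValue] of Object.entries(record.keys || {})) {
      const forType = indices.byKey[record.type] || (indices.byKey[record.type] = {});
      const forKey = forType[keyName] || (forType[keyName] = {});
      forKey[keyValue] = record.id;
    }
  }
  return indices;
}

// fetchAll becomes a walk over the ids recorded for a type.
function fetchAll(cache, type) {
  const ids = cache.indices.byType[type] || {};
  return Object.keys(ids).map(id => cache.data[id]);
}

const data = {
  p1: { type: 'planet', id: 'p1', keys: { serverId: '123' }, attributes: { name: 'Jupiter' } },
  m1: { type: 'moon', id: 'm1', attributes: { name: 'Europa' } },
  m2: { type: 'moon', id: 'm2', attributes: { name: 'Io' } }
};

// `indices` sits beside `data`, so both can be serialized to local storage together.
const cache = { data, indices: buildIndices(data) };
const allMoons = fetchAll(cache, 'moon');
```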

@dgeb
Member Author

dgeb commented Sep 1, 2015

would the lastFetched timestamp reflect the value stored in localstorage or the current timestamp?

I was just throwing an example in here - thinking it would represent fetching from a remote server. We could segment meta - allowing an object per source - to make this clear. Sources could ignore metadata stored by other sources.
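One possible shape for that segmentation, sketched under the assumption that each source has a unique name to key its metadata by. The setSourceMeta and getSourceMeta helpers and the source names are hypothetical.

```javascript
// Write metadata under the calling source's own key in `meta`.
function setSourceMeta(target, sourceName, values) {
  const meta = target.meta || (target.meta = {});
  meta[sourceName] = Object.assign({}, meta[sourceName], values);
  return target;
}

// Read only the calling source's metadata; other sources' entries are ignored.
function getSourceMeta(target, sourceName) {
  return (target.meta && target.meta[sourceName]) || {};
}

const moons = { data: { m1: true, m2: true } };

// The JSON API source records when it last fetched from the server.
setSourceMeta(moons, 'jsonApiSource', { lastFetched: 1441143855702 });

// A local storage source sees nothing under its own name.
const remoteMeta = getSourceMeta(moons, 'jsonApiSource');
const localMeta = getSourceMeta(moons, 'localStorageSource');
```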

@opsb
Contributor

opsb commented Sep 1, 2015

Indices would certainly be more flexible. Indices for mappings between keys could go in localStorage. Indices on attributes, for instance, would need to be refreshed on load (though perhaps you weren't considering indices on anything except key and type mappings).

Feels like there's 3 different categories for meta data:

  • all sources
  • one type of source
  • an individual source

The first two make sense within the normalized form but not the third.

@dgeb
Member Author

dgeb commented Sep 2, 2015

I am backtracking on my proposal to identify resources only by UUIDs.

Although it would work well in theory to simply mandate unique IDs, in practice it could lead to some very thorny problems. For instance, a backend that uses UUIDs is probably only guaranteeing uniqueness across a single table. Shortcuts may have been taken to generate some of those IDs. If the IDs are only merged into a single namespace at the client, those shortcuts could finally come to light, leading to problems in production that are a nightmare to debug.

I'm back to believing that the best course is to only require type / id uniqueness.

@dgeb
Member Author

dgeb commented Sep 2, 2015

The following normalized structure could be used to accommodate the type / id uniqueness requirement without losing the expressiveness of the previous proposal:

data: {
  planet: {
    p1: {
      type: 'planet',
      id: 'p1',
      attributes: {
        name: 'Jupiter',
        classification: 'gas giant'
      },
      relationships: {
        sun: {
          data: 'star:s1'
        },
        moons: {
          data: {
            'moon:m1': true,
            'moon:m2': true
          }
        }
      }
    }
  },

  moon: {
    m1: {
      type: 'moon',
      id: 'm1',
      attributes: {
        name: 'Europa'
      },
      relationships: {
        planet: {
          data: 'planet:p1'
        }
      }
    },

    m2: {
      type: 'moon',
      id: 'm2',
      attributes: {
        name: 'Io'
      },
      relationships: {
        planet: {
          data: 'planet:p1'
        }
      }
    }
  },

  star: {
    s1: {
      type: 'star',
      id: 's1',
      attributes: {
        name: 'The Sun'
      },
      relationships: {
        planets: {
          data: {
            'planet:p1': true
          }
        }
      }
    }
  }
}

This again groups data by type, then by id, which eliminates the need for the type / id index discussed above.

It also introduces a condensed type:id string version of resource identifier objects for identifying resources in relationships while still allowing for polymorphism as well as quick map-based access.
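A minimal sketch of encoding and decoding the condensed identifier, and of using it against the type-then-id layout above. The helper names are hypothetical. Splitting on the first colon assumes type names never contain a colon, while ids may.

```javascript
// Encode a type / id pair as a condensed 'type:id' string.
function toIdentifier(type, id) {
  return `${type}:${id}`;
}

// Decode a 'type:id' string back into a resource identifier object.
function parseIdentifier(identifier) {
  const index = identifier.indexOf(':');
  if (index === -1) throw new Error(`Invalid identifier: ${identifier}`);
  return { type: identifier.slice(0, index), id: identifier.slice(index + 1) };
}

// Resolve an identifier with two direct map lookups: by type, then by id.
function lookup(data, identifier) {
  const { type, id } = parseIdentifier(identifier);
  return data[type] && data[type][id];
}

const data = {
  planet: {
    p1: {
      type: 'planet',
      id: 'p1',
      attributes: { name: 'Jupiter' },
      relationships: { sun: { data: 'star:s1' } }
    }
  },
  star: {
    s1: { type: 'star', id: 's1', attributes: { name: 'The Sun' } }
  }
};

const sun = lookup(data, data.planet.p1.relationships.sun.data);
```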

@opsb regarding per-source meta data: I'm leaning toward separating this into a parallel data structure that could be maintained only in the appropriate source. Per-source meta data won't be represented in the normalized form. meta objects within resources should be reserved for global meta data that might concern any source.

@opsb
Contributor

opsb commented Sep 3, 2015

It also introduces a condensed type:id string version of resource identifier objects for identifying resources in relationships while still allowing for polymorphism as well as quick map-based access.

Seems like a nice solution.

I'm leaning toward separating this into a parallel data structure that could be maintained only in the appropriate source.

Yeah, that's where I was getting to.

@dgeb
Member Author

dgeb commented Sep 26, 2015

WIP in the rethink branch.

opsb added this to the 0.7.x milestone Nov 27, 2015

@dgeb
Member Author

dgeb commented Nov 27, 2015

Having implemented the type:id identifier concept and found it to be cumbersome, @opsb and I would like to go ahead with the global id concept. If any implementations are hesitant about the validity of their UUIDs, they should simply set them as remote keys, and Orbit will retain a mapping between them and local ids.

@dgeb
Member Author

dgeb commented May 24, 2016

type:id identifiers have been introduced via #299

We will not be moving to a global (i.e. across all types) id concept yet.

dgeb closed this as completed May 24, 2016