REST API: Cypher result format #414

Closed
technige opened this Issue Jan 8, 2013 · 8 comments

Contributor

technige commented Jan 8, 2013

Following on from a mailing list discussion on the metadata available from a Cypher query made via REST, this issue puts forward some options for enhancements to the output.

Discussion: https://groups.google.com/forum/?fromgroups=#!topic/neo4j/p5Xh1249Wts

Currently the output from a Cypher query over REST contains a JSON object with two keys: columns and data. To my knowledge, these are guaranteed to be output in this order by the server despite the lack of order typically inherent within a JSON object. This of course helps client applications to gain access to the data in a consistent way.

Further to the specific metadata which should be made available, there is also a recurring question as to whether streaming results should be presented in a single JSON document. Doing so causes complications for client applications which want to parse the results incrementally since JSON parsers which support such incremental decoding are not generally available.

An example of the current output format can be seen below:

{
  "columns" : [ "n" ],
  "data" : [ [ {
    "paged_traverse" : "http://localhost:7474/db/data/node/1130/paged/traverse/{returnType}{?pageSize,leaseTime}",
    "outgoing_relationships" : "http://localhost:7474/db/data/node/1130/relationships/out",
    "data" : {
      "foo" : "bar"
    },
    "all_typed_relationships" : "http://localhost:7474/db/data/node/1130/relationships/all/{-list|&|types}",
    "traverse" : "http://localhost:7474/db/data/node/1130/traverse/{returnType}",
    "self" : "http://localhost:7474/db/data/node/1130",
    "all_relationships" : "http://localhost:7474/db/data/node/1130/relationships/all",
    "property" : "http://localhost:7474/db/data/node/1130/properties/{key}",
    "outgoing_typed_relationships" : "http://localhost:7474/db/data/node/1130/relationships/out/{-list|&|types}",
    "properties" : "http://localhost:7474/db/data/node/1130/properties",
    "incoming_relationships" : "http://localhost:7474/db/data/node/1130/relationships/in",
    "incoming_typed_relationships" : "http://localhost:7474/db/data/node/1130/relationships/in/{-list|&|types}",
    "extensions" : {
    },
    "create_relationship" : "http://localhost:7474/db/data/node/1130/relationships"
  } ] ]
}
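
For illustration, a client might consume the current format roughly as follows. This is only a sketch: the response is abbreviated to two node fields, and rows_as_dicts is a hypothetical helper name, not part of any existing library. Note that the whole document has to be decoded before the first row becomes available.

```python
import json

# Abbreviated response in the current format (most node fields trimmed).
response_text = """
{
  "columns" : [ "n" ],
  "data" : [ [ {
    "self" : "http://localhost:7474/db/data/node/1130",
    "data" : { "foo" : "bar" }
  } ] ]
}
"""


def rows_as_dicts(document):
    """Pair each row's values with the column names (hypothetical helper)."""
    columns = document["columns"]
    return [dict(zip(columns, row)) for row in document["data"]]


# json.loads needs the complete document, so nothing can be processed
# until the server has finished sending every row.
result = json.loads(response_text)
records = rows_as_dicts(result)
print(records[0]["n"]["data"])
```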

The specific requirements are:

  • To allow incremental delivery and parsing of result data (i.e. a summary cannot be determined up-front)
  • To provide a useful set of metadata including column names, a row count and an execution time
  • To handle server errors gracefully
  • To work with commonly available libraries (e.g. JSON parsers)

It is also clear that some metadata may only be available after the query results are complete (e.g. execution time) while other metadata is available and more useful up-front (e.g. column names).

I propose a result format which allows multiple JSON objects to be returned from the server, each separated by a single blank line. Such blank lines should not generally appear within result data itself and so should be easily identifiable by client applications. Each object returned may contain one or more top-level keys drawn from a reserved list, maintained and published as part of the server documentation. Currently, these keys would be columns and data but could also include time, count and so on.

Each JSON object would form a logical "block" of information and the general sequence would be:

[header] [data] [data] [data] ... [footer]

or, in the case of a server failure part way through:

[header] [data] [data] [data] [error]

The keys generally used by each block could be as follows:

  • [header] : {"columns" : ...}
  • [data] : {"data" : [ [...], [...] ], "count" : ...}
  • [footer] : {"count" : ... , "time": ...}
  • [error] : {"exception": ... , "message": ... , "stacktrace": ...}

The exact allocation of these keys would depend on both ease of server side implementation and the requirement for backward compatibility (i.e. a [data] block might allow columns to be specified). Similarly, it would be up to the server to decide on the number of rows returned within each data block and this could even be a figure customisable by the client request.
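
To make the block sequence concrete, here is a rough sketch of how a producer could emit it. It is written in Python purely for illustration and says nothing about how the server would actually implement this. The names emit_blocks, serialise and rows_per_block are assumptions, and the error block omits the stacktrace key for brevity.

```python
import json
import time


def emit_blocks(columns, rows, rows_per_block=2):
    """Yield one JSON string per block: header, data blocks, then footer or error."""
    started = time.time()
    yield json.dumps({"columns": columns})  # [header]
    total = 0
    batch = []
    try:
        for row in rows:
            batch.append(row)
            if len(batch) == rows_per_block:
                total += len(batch)
                yield json.dumps({"data": batch, "count": len(batch)})  # [data]
                batch = []
        if batch:  # final, possibly shorter, data block
            total += len(batch)
            yield json.dumps({"data": batch, "count": len(batch)})
    except Exception as exc:
        # [error] replaces the footer; rows buffered but not yet sent are dropped.
        yield json.dumps({"exception": type(exc).__name__, "message": str(exc)})
        return
    yield json.dumps({"count": total, "time": time.time() - started})  # [footer]


def serialise(blocks):
    """Join blocks with a single blank line between them."""
    return "\n\n".join(blocks) + "\n"


print(serialise(emit_blocks(["n"], [[1], [2], [3]])))
```

Because json.dumps produces compact single-line output by default, a blank line can only ever appear between blocks in this sketch, never inside one.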

The example output above could therefore instead look like the following:

{
  "columns" : [ "n" ]
}

{
  "data" : [ [ {
    "paged_traverse" : "http://localhost:7474/db/data/node/1130/paged/traverse/{returnType}{?pageSize,leaseTime}",
    "outgoing_relationships" : "http://localhost:7474/db/data/node/1130/relationships/out",
    "data" : {
      "foo" : "bar"
    },
    "all_typed_relationships" : "http://localhost:7474/db/data/node/1130/relationships/all/{-list|&|types}",
    "traverse" : "http://localhost:7474/db/data/node/1130/traverse/{returnType}",
    "self" : "http://localhost:7474/db/data/node/1130",
    "all_relationships" : "http://localhost:7474/db/data/node/1130/relationships/all",
    "property" : "http://localhost:7474/db/data/node/1130/properties/{key}",
    "outgoing_typed_relationships" : "http://localhost:7474/db/data/node/1130/relationships/out/{-list|&|types}",
    "properties" : "http://localhost:7474/db/data/node/1130/properties",
    "incoming_relationships" : "http://localhost:7474/db/data/node/1130/relationships/in",
    "incoming_typed_relationships" : "http://localhost:7474/db/data/node/1130/relationships/in/{-list|&|types}",
    "extensions" : {
    },
    "create_relationship" : "http://localhost:7474/db/data/node/1130/relationships"
  } ] ],
  "count" : 1
}

{
  "count" : 1,
  "time": 1.234
}
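
On the client side, output like the above could be read with an ordinary JSON parser, one block at a time. The following is a minimal sketch under the assumptions of this proposal (blocks separated by blank lines, and no blank lines inside a block); iter_blocks and classify are hypothetical names, and the node in the sample stream is abbreviated.

```python
import io
import json


def iter_blocks(lines):
    """Yield one decoded JSON object per blank-line-separated block."""
    buffer = []
    for line in lines:
        if line.strip():
            buffer.append(line)
        elif buffer:
            # A blank line closes the current block.
            yield json.loads("".join(buffer))
            buffer = []
    if buffer:  # last block may not be followed by a blank line
        yield json.loads("".join(buffer))


def classify(block):
    """Guess the block kind from its top-level keys."""
    if "exception" in block:
        return "error"
    if "data" in block:
        return "data"
    if "columns" in block:
        return "header"
    return "footer"


# Any iterable of text lines works here, e.g. a streamed HTTP response body.
stream = io.StringIO(
    '{\n  "columns" : [ "n" ]\n}\n'
    "\n"
    '{\n  "data" : [ [ { "data" : { "foo" : "bar" } } ] ],\n  "count" : 1\n}\n'
    "\n"
    '{\n  "count" : 1,\n  "time": 1.234\n}\n'
)
kinds = [classify(block) for block in iter_blocks(stream)]
print(kinds)  # ['header', 'data', 'footer']
```

Each data block can be handed to the application as soon as its closing blank line arrives, without waiting for the footer.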

This is just a rough plan but can hopefully serve as the basis for a discussion. Thoughts welcome :-)

Contributor

freeeve commented Jan 18, 2013

+1

One thing I'd like to see added while we're changing the REST Cypher result format, for people looking at these results from a typed language, would be the types of the result columns. The CypherType would be great.

Something like:

columns: ["n", "count(*)"],
column_types: ["NodeType", "IntegerType"]

This would make it easier for REST library authors to automatically type the results of the query in a smart way, mapping and converting from JSON to a native type. At least, that's what I was thinking of using it for.
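
As a rough sketch of what a client library could do with such a key: the column_types field and the type names come from the suggestion above and do not exist in the current output, while CONVERTERS and convert_row are made-up names for illustration.

```python
# Hypothetical converters, keyed by the type names from the example above.
CONVERTERS = {
    "IntegerType": int,
    "NodeType": lambda value: {"uri": value["self"], "properties": value["data"]},
}


def convert_row(column_types, row):
    """Convert each value using the converter for its column type, if one is known."""
    converted = []
    for type_name, value in zip(column_types, row):
        converter = CONVERTERS.get(type_name)
        converted.append(converter(value) if converter else value)
    return converted


# Assumed response shape with the proposed column_types key (node abbreviated).
response = {
    "columns": ["n", "count(*)"],
    "column_types": ["NodeType", "IntegerType"],
    "data": [
        [{"self": "http://localhost:7474/db/data/node/1130", "data": {"foo": "bar"}}, 1]
    ],
}

typed_rows = [convert_row(response["column_types"], row) for row in response["data"]]
print(typed_rows[0])
```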

Contributor

technige commented Jan 18, 2013

@wfreeman: great idea for the column types - there is no way for py2neo (and I imagine other client libs) to know what the return types of arbitrary queries would be, so they currently have to make a best guess based on the available keys!

Member

jexp commented Jan 18, 2013

+1 If they are exposed in the ExecutionResult we can easily make them available in the REST results too.

Member

jexp commented Jan 18, 2013

@wfreeman can you look into that? Should be just a tiny commit which can then also be backported to 1.8

Contributor

freeeve commented Feb 11, 2013

Just for the record, the column types idea isn't as trivial as expected (after discussing/reviewing actual implementation), because of the potential for there to be results with different types, and it being lazy so we don't want to need to scan all the results before sending them. Anyway, wishful thinking I guess. Other ideas for that welcome. But yeah, back to the original idea of this issue... improving the REST responses. How about clearing out all of the redundant stuff from the Node/Rel objects? Isn't node id sufficient to reconstruct all of those longer pointer URLs?
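
A toy illustration of the laziness problem, using plain Python values rather than anything from the actual Cypher implementation (lazy_rows and observed_types are invented for this sketch): the set of types seen in a column is only final once the last row has been produced, so it cannot go in a header sent before the rows.

```python
def lazy_rows():
    """Stand-in for a lazily evaluated result; the last row changes the column's type."""
    yield [1]
    yield [2]
    yield ["three"]


def observed_types(rows):
    """After each row, yield the type names seen so far in every column."""
    seen = []
    for row in rows:
        if not seen:
            seen = [set() for _ in row]
        for types, value in zip(seen, row):
            types.add(type(value).__name__)
        yield [sorted(types) for types in seen]


snapshots = list(observed_types(lazy_rows()))
for snapshot in snapshots:
    print(snapshot)
```

After the first two rows the column looks like a pure integer column; only the third row reveals otherwise.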

Contributor

technige commented Feb 11, 2013

I'm possibly channelling @jimwebber by saying this but should a RESTful URI not be opaque and therefore never be constructed from constituent parts?

That said, I think there's certainly scope for improvement here. How about looking at the "get relationship" URIs within a node:

"all_relationships" : "http://localhost:7474/db/data/node/47146/relationships/all",
"incoming_relationships" : "http://localhost:7474/db/data/node/47146/relationships/in",
"outgoing_relationships" : "http://localhost:7474/db/data/node/47146/relationships/out",
"all_typed_relationships" : "http://localhost:7474/db/data/node/47146/relationships/all/{-list|&|types}",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/47146/relationships/in/{-list|&|types}",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/47146/relationships/out/{-list|&|types}",

It seems these could instead be reduced to something like:

"relationships" : "http://localhost:7474/db/data/node/47146/relationships/{all|in|out}",
"typed_relationships" : "http://localhost:7474/db/data/node/47146/relationships/{all|in|out}/{-list|&|types}",

There are quite a few similar places where template URIs could reduce the need to list every conceivable combination. Also, I've not yet seen a use case for including the "extensions" key within every entity returned. This feels like it should sit at the service root only.
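
A client could resolve the {all|in|out} part of such a template along these lines. This is a sketch only: the {a|b|c} notation is the shorthand used in this comment rather than an established template syntax, and expand_choice is a hypothetical helper. The {-list|&|types} group is deliberately left untouched.

```python
import re

# Matches groups such as {all|in|out}: lowercase alternatives separated by "|".
# It does not match {-list|&|types}, which starts with "-".
CHOICE = re.compile(r"\{([a-z]+(?:\|[a-z]+)+)\}")


def expand_choice(template, choice):
    """Replace the first {a|b|c} group with `choice`, rejecting values not listed."""
    match = CHOICE.search(template)
    if match is None:
        raise ValueError("template has no {a|b|c} choice group")
    allowed = match.group(1).split("|")
    if choice not in allowed:
        raise ValueError("%r is not one of %r" % (choice, allowed))
    return template[:match.start()] + choice + template[match.end():]


relationships = "http://localhost:7474/db/data/node/47146/relationships/{all|in|out}"
typed = "http://localhost:7474/db/data/node/47146/relationships/{all|in|out}/{-list|&|types}"

print(expand_choice(relationships, "in"))
print(expand_choice(typed, "out"))
```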

Member

jexp commented Feb 11, 2013

I doubt that those URIs can be constructed automatically; too much intrinsic knowledge is needed to build them correctly. Also there is no information given about content-types and payload structure (which has changed in the past and broken drivers).

Besides the binary driver, I would still love to see a compact format for the REST API without exploratory links (a factor of 10 in size).

Contributor

technige commented Nov 13, 2013

Aside from an outstanding bug (#1406) this is now available in 2.0.0-M06 thanks to the resultDataContents option. Closing.
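
For anyone finding this later, a request body using that option might be built roughly like this. It is a sketch that assumes the option is passed per statement to the transactional Cypher endpoint (e.g. /db/data/transaction/commit) and that "row" and "graph" are accepted values; the query is just a placeholder and build_request_body is a made-up helper, so check the manual for your server version.

```python
import json


def build_request_body(statement, contents=("row",)):
    """Build a JSON body asking for the given result formats (hypothetical helper)."""
    return json.dumps(
        {
            "statements": [
                {
                    "statement": statement,
                    # Assumed values: "row" for plain property data,
                    # "graph" for nodes and relationships.
                    "resultDataContents": list(contents),
                }
            ]
        }
    )


# Placeholder query; the body would be POSTed to the transactional endpoint.
body = build_request_body("MATCH (n) RETURN n LIMIT 1", contents=("row", "graph"))
print(body)
```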

@technige technige closed this Nov 13, 2013
