Feature request: Cypher node return format with labels in the JSON #1931

Closed
freeeve opened this Issue · 20 comments

10 participants

@freeeve

I've currently got an ugly workaround in place to get the labels for a node returned in the transactional HTTP endpoint REST format that calls the URI for labels. It would be awesome if there were a format that had (all in one response):

  1. node id
  2. labels--one big list
  3. properties
  4. nothing else; no bloat!
@DirkMahler

+1 from my side: I'm planning to create a datastore binding for the transactional HTTP endpoint for the CDO project (https://github.com/buschmais/cdo-neo4j). It determines the Java types to map a node to from its labels. If labels are not returned by a query, this requires another GET request for each returned node - and that's not efficient.

@jakewins
Neo4j member

@wfreeman so we'll want to optimize for the use case of a client knowing what data it wants, and only return exactly what's been asked for (e.g. no labels or id if you did not tell Cypher to return them). However, it's true that there is a use case where a middle layer needs meta-data access for ad-hoc results.

You can do this today, although it's not pretty, by asking for the "graph" and the "rest" response formats, and pull the labels out of the "graph" result. However, it would be nice to have something akin to the rest format without the bells, agreed.
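
A minimal sketch of the request side of that workaround, assuming the transactional endpoint accepts a per-statement `resultDataContents` list as the Neo4j 2.x documentation describes. `GraphFormatRequest` and `buildBody` are hypothetical names, and `"row"` stands in for whichever row-style format the client already uses. The client would POST this body to the transactional endpoint and read each node's labels from the `graph` part of the result.

```java
class GraphFormatRequest {
    // Builds the JSON body for a single Cypher statement, asking for both
    // a row-style result and the "graph" result (which carries node labels).
    // Only backslashes and double quotes are escaped; a real client should
    // use a JSON library instead of string concatenation.
    static String buildBody(String cypher) {
        String escaped = cypher.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"statements\":[{"
                + "\"statement\":\"" + escaped + "\","
                + "\"resultDataContents\":[\"row\",\"graph\"]"
                + "}]}";
    }

    public static void main(String[] args) {
        System.out.println(buildBody("MATCH (n:Person) RETURN n"));
    }
}
```

The drawback raised later in the thread still applies: the client has to correlate the two parts of the response itself.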

We're currently crunching down in the kernel, so it'll be a while before we have time to look into this. If anyone wants to try their hand at a PR, have a look at RestRepresentationWriter and ResultDataContent for how to plug another format in.

@aseemk

Huge +1 to this. Especially to the fact that labels are needed in order for any "ORM"-type layer to do (part of) its job: map returned objects to instances of classes based on type.

You're right that it's technically possible by combining the REST and "graph" response formats. I think I played around with that idea briefly once, and I might have run into some issues, but I'll revisit.

@akollegger
Neo4j member

Please see the related Idea Board card. Vote there, and read about a proposed format linked from there.

@akollegger closed this

@cleishm
Neo4j member

I'm wondering why one wouldn't simply RETURN n, labels(n) to achieve this?

@thobe
Neo4j member

I agree with @cleishm, if what you are doing is "ORM" (or "OGM"), you really should be pushing as much of the mapping into the query logic as possible, and then have the consumer of the result know exactly what the structure is and be able to build the object structure from that. In fact the query should be structured in such a way that the result is such that building an object (structure) from it is trivial. That is the real job an "ORM" should be doing, constructing the queries in such a way that mapping to objects is trivial.

I do however accept that my opinion differs from most of the rest of the developer community, and am willing to accept that it might not be our role to educate the masses, and that we should just build something like this if people think that they want it.

@aseemk

@akollegger: awesome, didn't know there was an ideas board. =) So is that the place for feature requests, and GitHub Issues solely for bugs?

@aseemk

@cleishm @thobe: absolutely, expanding the query to RETURN labels(n) works... if you're the one writing the query.

But if I'm writing business logic queries, should I really be expected to RETURN labels(n) for every single node I'm returning, in every query? Even though I'm already specifying the labels in the MATCH? (E.g. MATCH (user:User) <-[:follows]- (follower:User))

Perhaps "ORM"/"OGM" was the wrong word to use, and has confused this issue.

We aren't programmatically generating Cypher queries. We're still writing regular Cypher by hand, because expressiveness is probably Cypher's biggest purpose and strength.

Our helper code is applied to the response of the query, to look at the returned results and map them (a) to a more usable format (a list of dictionaries, instead of separate lists of columns and data rows), and (b) to instances of classes with methods on them (instead of simple JSON objects). It's for the latter that it needs to know nodes' types. We achieve that today by having a type property on our nodes. We'd like to move to labels.
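
Step (a) can be sketched roughly as follows, in Java and assuming the response has already been parsed into a list of column names and a list of data rows. `RowMapper` and `toRecords` are hypothetical names, not part of any Neo4j driver. Step (b) is the part that needs type information, which is what this issue is about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowMapper {
    // Zips every data row with the column names, producing one map per row.
    static List<Map<String, Object>> toRecords(List<String> columns, List<List<Object>> rows) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (List<Object> row : rows) {
            if (row.size() != columns.size()) {
                throw new IllegalArgumentException("row width does not match column count");
            }
            // LinkedHashMap keeps the column order of the query.
            Map<String, Object> record = new LinkedHashMap<>();
            for (int i = 0; i < columns.size(); i++) {
                record.put(columns.get(i), row.get(i));
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("user", "follower");
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("alice", "bob"),
                Arrays.<Object>asList("alice", "carol"));
        // prints [{user=alice, follower=bob}, {user=alice, follower=carol}]
        System.out.println(toRecords(columns, rows));
    }
}
```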

Does that help clarify my context and use case?

@aseemk

I will add that writing an ORM/OGM to programmatically generate queries (a la ActiveRecord et al.) is something I've been thinking of exploring (for Node.js)... but I haven't yet gone down that road, precisely because Cypher is so expressive and useful today. I have a hard time believing that programmatic object methods would do a better job expressing graph queries than Cypher.

I agree that if we programmatically generated our Cypher queries, via this kind of ORM/OGM or something similar, then we don't need Neo4j to return our labels for us, because the code parsing the Cypher results will be the same code that generated the query. That's not our case today.

I just realized: should I be posting these thoughts on the ideas board and not here? Apologies if so!

@freeeve

It boils down to performance.

We can get the labels from the REST call in the REST format, but why make a second request when, in 99% of cases, the labels are a small list?

We can get the labels from the graph format, but in a stream of a large number of results, that information is far after the row information, which means we have to parse the whole thing before being able to tell the end user the labels.

I will probably submit a pull request targeted at 2.1 soon, with what I think is the best format for row-oriented drivers that want to support node/rel types. The good news is there is already the functionality for returning different formats.

@cleishm
Neo4j member

@wfreeman I agree that a solution requiring the REST API in addition to Cypher isn't really a stable approach for the long term.

The problem I see is that this isn't about OGM. It's about wanting to use Cypher for exploring the graph and returning the graph's native entities. That's not something it actually does well right now: its intended use is to return primitive content in a row form. That you can even do MATCH (n) RETURN n (i.e. without specifying a property to return) is a bit odd. So I think this needs a bit more holistic thinking, in the context of using Cypher to inspect and return graph entities - and not just the primitive content.

@jexp
Neo4j member

@thobe the problem is if you are doing an OGM you can do this for every query you generate yourself (although it gets really nasty with nested maps/collections and paths).

@cleishm the problem is mostly about OGMs, as the people who report this issue are writing OGMs.

The real problem comes in when you support users providing their own queries. There is no way you can force this additional cruft and boilerplate query constructs on them.

@boggle
Neo4j member

For now the labels can be obtained using the labels function, i.e. return labels(n) gives all labels of a node n.

@DirkMahler

@boggle This is possible but requires one additional query per node - this is inefficient for large results.

@systay
Neo4j member

@DirkMahler No need to do an extra query - you can just add the labels(n) to the one query you send over, can't you? If you share your query maybe we can help you figure out how you can accomplish that.

@DirkMahler

One of the first things I'm asked after introducing Neo4j to Java developers is whether OGM solutions exist; they don't want to deal with REST, JSON, etc. Thus it would be good to allow OGM users to work with Cypher also for their own queries without being forced to add additional metadata parts to the result clause, e.g. labels(n) - which, by the way, can become difficult if more than one result column returns nodes.
For an OGM it is necessary to be able to obtain relevant information from the result itself. First priority are id and type information (i.e. labels), second are properties and relations (as they could also be fetched lazily afterwards if required). Maybe an OGM could indicate during the request (e.g. via a special header) that it requires meta information?

@DirkMahler

@systay That's the problem: the OGM has no real chance to do that. Imagine the following situation: A developer uses an OGM (like eXtended Objects, which I'm currently creating for Java) to map nodes/relationships to objects and vice versa. There are nodes having a label "Person" and they can be specialized by other labels like "Adult" or "Child". This could map to Java interfaces or classes like this:

@Label
public interface Person {
  String getName();
  ...
}

@Label
public interface Adult extends Person {
}

@Label
public interface Child extends Person {
}

The OGM offers functionality to execute Cypher queries, which the developer can use to create arbitrary queries:

Iterable<Person> persons = ogm.executeQuery("match (p:Person) return p");
for (Person p : persons) {
  if (p instanceof Adult) {
    ...
  } else if (p instanceof Child) {
    ...
  }
}

Within the query and its result the developer expresses that he expects nodes representing persons - that's no problem so far. But he also expects to be able to determine the specialized types (Adult and Child) - that's common sense for mapping frameworks (e.g. JPA).

If the nodes returned by Neo4j do not contain labels there would be the following options left for the OGM to solve this:

1a. The OGM must parse the Cypher query and add "labels(p)" to the return clause. It slows things down and makes the OGM very dependent on specific Cypher versions.
1b. The OGM could offer its own query language which is translated to Cypher. But Cypher is very expressive and simply rocks, why do that? ;-)
2. The developer must be aware to always add "labels(p)" to the query. Besides the fact that this is difficult to explain to him, the query gets quite verbose and less comprehensible if more than one node is part of the result (e.g. "match (p1:Person)-[:KNOWS]->(p2:Person) return p1, labels(p1), p2, labels(p2)").
3. While iterating over the result returned by Neo4j the OGM would have to execute a "match (n) where id(n)={id} return labels(n)" for each node, passing the id as parameter. This causes a huge performance drop for larger results (known as the n+1 problem for object-relational mappers).

All these options are (from an OGM perspective) quite bad ideas. Things would be much simpler if nodes and relationships would provide a minimum set of meta information within the result. Keep in mind that people are actually asking for OGMs to map objects from their preferred programming language to the Neo4j artifacts.
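
For illustration, here is a rough sketch of how simple the mapping step could become if each node arrived with its labels. It assumes the labels and properties have already been extracted from the response. `LabelMapping`, `TYPES` and `toInstance` are hypothetical names, the `@Label` annotation from the example above is replaced by a hand-written registry, and a JDK dynamic proxy stands in for whatever a real OGM would generate.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LabelMapping {
    interface Person { String getName(); }
    interface Adult extends Person { }
    interface Child extends Person { }

    // Hypothetical registry: label name -> Java interface.
    static final Map<String, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put("Person", Person.class);
        TYPES.put("Adult", Adult.class);
        TYPES.put("Child", Child.class);
    }

    // Creates a proxy that implements every interface registered for the
    // node's labels and answers getXyz() calls from the property "xyz".
    static Object toInstance(List<String> labels, Map<String, Object> properties) {
        List<Class<?>> interfaces = new ArrayList<>();
        for (String label : labels) {
            Class<?> type = TYPES.get(label);
            if (type != null) {
                interfaces.add(type);
            }
        }
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // Delegate equals/hashCode/toString to the property map
                // to keep the sketch short.
                return method.invoke(properties, args);
            }
            String name = method.getName();
            if (name.startsWith("get") && name.length() > 3 && (args == null || args.length == 0)) {
                String key = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                return properties.get(key);
            }
            throw new UnsupportedOperationException(name);
        };
        return Proxy.newProxyInstance(LabelMapping.class.getClassLoader(),
                interfaces.toArray(new Class<?>[0]), handler);
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("name", "Alice");
        Object p = toInstance(Arrays.asList("Person", "Adult"), props);
        System.out.println(p instanceof Adult);      // prints true
        System.out.println(p instanceof Child);      // prints false
        System.out.println(((Person) p).getName());  // prints Alice
    }
}
```

With this in place, the `instanceof` checks in the loop above work without touching the developer's query, which is the behaviour none of the listed options deliver cheaply.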

I hope this example/explanation makes my motivation for commenting heavily on this ticket a bit clearer; I think the same holds for @aseemk.

@aseemk

I can't +1 @DirkMahler's comment enough. I hope I can succinctly summarize it to this:

If the layer doing the parsing of results ("OGM" or otherwise) is not the same as the layer writing Cypher queries (business logic layer), there's no good solution currently.

Either the OGM layer has to parse Cypher syntax, or the business logic layer has to add extra boilerplate to every Cypher query just for the OGM (which breaks abstraction/encapsulation and would be insane), or we have to invent our own new DSL for generating Cypher queries programmatically so that the OGM can amend it.

Edit: people have also mentioned the "just make a second request" solution, which is also @DirkMahler's option 3. Sorry I'm not even addressing this; this is absolutely not even on the table for a real app. Doubling the network requests for every query? Come on. =)

@aseemk

I have extensive code I am happy to share with Neo folks, but the code is proprietary and confidential. We are paying Enterprise customers, so I'm happy to do so via our support channel.

@cleishm
Neo4j member

I think @DirkMahler correctly identifies the concern here: an OGM and a query language are two separate things, which we're trying to conflate here.

Right now, the Cypher language returns row based primitive data. It doesn't return graph elements - the fact that MATCH (n) RETURN n works is simply because it coerces the node to a primitive map that can be returned in the row. This return format can be used to implement an OGM, much like SQL can be used to implement an ORM, as long as enough of the data (and metadata) is returned in each row.

But a query language that works with that OGM is a different beast. I'm not entirely sure of the use cases, but I'd bet they don't involve returning row based primitive data. I'm guessing they're for querying the Object Graph: which means the query language should be identifying graph elements (nodes/rels/paths) only - and probably returning a single one of them per result "row". That's a little different from what Cypher does today, which is why having some really good examples would be great.
