Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature exploration/request: nested data (nth-degree recursion) #237

Open
arxpoetica opened this issue Nov 2, 2016 · 29 comments
Open

Feature exploration/request: nested data (nth-degree recursion) #237

arxpoetica opened this issue Nov 2, 2016 · 29 comments

Comments

@arxpoetica
Copy link

arxpoetica commented Nov 2, 2016

Two prior conversations triggered this ticket. This (closed) ticket #91 and this conversation in Slack https://graphql.slack.com/archives/general/p1478088194004298. Unlike the closed ticket, this should explicitly be considered a feature exploration/request, not necessarily a question about how to do nesting/recursion.

Understanding that GraphQL is about being explicit and declarative, how would/could one be declarative and specify recursion. This is a known limitation that would need to be addressed: https://facebook.github.io/graphql/#sec-Fragment-spreads-must-not-form-cycles One suggestion from the Slack thread was to relax the requirement a bit by allowing @recursive limits. There may need to be a default limit too.

Potential example.

fragment PersonFields on Person {
  name
  primaryAddress { street }
  parents @recursive(maxDepth: 8)
}
fragment EmployeeFields on EmployeeInterface {
  primaryAddress { street}
  boss @recursive(maxDepth: 4)
}

{
  person(id: 1) {
  … PersonFields
  … EmployeeFields
  }
}

One strong aspect of graphs is deep nesting. Is this a possibly worthy feature request?

@arxpoetica
Copy link
Author

Example of the tedium that can be explicit recursion: https://github.com/graphql/graphql-js/blob/master/src/utilities/introspectionQuery.js#L74-L105 Family trees (32 nested deep? 64?) and hacker news/reddit-style comment threads are examples.

@benwilson512
Copy link

benwilson512 commented Nov 2, 2016

Minor note, since that was an example I originally sketched out (to be clear, I'm not actually for this idea). It needs to specify the fragment being recursed upon, IE

fragment PersonFields on Person {
  name
  primaryAddress { street }
  parents {
    ... PersonFields @recursive(maxDepth: 8)
  }
}
fragment EmployeeFields on EmployeeInterface {
  primaryAddress { street}
  boss  {
    ... EmployeeFields @recursive(maxDepth: 4)
  }
}

{
  person(id: 1) {
  … PersonFields
  … EmployeeFields
  }
}

@arxpoetica
Copy link
Author

arxpoetica commented Nov 2, 2016

Yeah, sorry I didn't h/t. ;)

@samuelhorwitz
Copy link

A bonus to a declarative syntax is the ability to write better resolvers. I'm already using DataLoader but with recursion, for example in a Reddit style comment thread, every layer is going to need to hit the database for each table that layer needs no matter what batching is set up.

If there was a way to declare recursion then one could write a resolver that is recursion aware and makes one database hit for everything in the tree.

@smolinari
Copy link

This might be my own inexperience talking here, but why does the data need to arrive in the nested form?

Scott

@arxpoetica
Copy link
Author

@smolinari Trees. Trees are messy data structures that don't really have very good working tools in the wild—this is a rash generalization, but think about it for a second. Is there a Google/Excel Docs equivalent for trees? Ironically, GraphQL is a declarative structure for (nested) trees (graphs), but it doesn't really do recursive nesting very well.

@smolinari
Copy link

That doesn't really answer my question. Let me ask a different way. How can the returned response of the data in the form of a tree help with the representation of that tree in the UI? Or asked another way, could the representation of the tree in the UI be just as simple (or hard), if the data for the tree is returned as a simple flat list?

Scott

@benwilson512
Copy link

This seems like an unrelated discussion. The return representations of the values are a general GraphQL thing when using JSON, this seems tangential at best to whether or not fragments should be allowed some form of limited recursion.

@smolinari
Copy link

smolinari commented Jan 3, 2017

Nothing tangential at all. Lee Byron made it fairly clear in the linked issue above that this suggestion won't be handled. If you want them to move on this at all, you'll have to come up with better reasoning than "GraphQL is hierarchical, thus it should also have fragment recursion on the server" or "GraphQL is explicit and declarative, thus recursion of fragments should be declarable". That reasoning isn't going to motivate the devs to work on this.

So, my intention is to get the answers to the questions, what is the real problem this suggestion is supposed to be solving? What is the pain you feel, when you don't get the fragment recursion?

Scott

@pcarrier
Copy link

pcarrier commented Jan 3, 2017

@smolinari it's a bit silly to write tree flattening on the server and tree grafting on the client, and whilst it provides a good opportunity to revisit textbook algorithms, app developers probably have better things to do.

The argument for including that feature as a directive in the graphql spec is the same as @skip: it affects the constraints on the shape of the response, which are strictly defined in the specification.

Should graphql have kept numerics out of the spec on the basis that one can serialize and deserialize them as strings without native support? That's easier to implement with [insert standard library of choice]...

@pcarrier
Copy link

pcarrier commented Jan 3, 2017

@smolinari I disagree with your assessment that @leebyron made "it fairly clear [that] this suggestion won't be handled".

He expressed concerns that, as far as I can tell, are specifically addressed by this request, and finished with "For now this limitation continues to exist for this balance of simplicity for the common case and protection from inadvertent infinite recursion. Perhaps we can revisit it in the future if a solid proposal to do what you suggest is compelling."

Shall we discuss whether this proposal is solid, and if not what would need to be addressed to move forward?

@smolinari
Copy link

@pcarrier -

it's a bit silly to write tree flattening on the server and tree grafting on the client

We are talking in the cases above about recursion on single fragments or data types. So to me, and correct me if I am wrong, there isn't any tree flattening being done on the server. It is flat, because the data is either in a single table or a single collection, like "comments" or "ancestors" or whatever. If we would be talking about tree flattening across different data types, then I'd agree with you. Because, doing that would be defeating the purpose of GraphQL.

Perhaps we can revisit it in the future if a solid proposal to do what you suggest is compelling.

Shall we discuss whether this proposal is solid, and if not what would need to be addressed to move forward?

You disagree with me, then add a question, that my questions are trying to accomplish? Nice. How about agreeing with me? 😄

So, to answer your question, yes. You all should be discussing what needs to be addressed to move this forward and as I keep saying, to do that, you need to explain what problems will this suggestion solve. If you had this suggestion done, what developer pain would be avoided? If you can make that clear, then Lee might feel your pain.

If you don't have any pain to avoid with this, then you aren't suggesting anything worthwhile to fix.

Scott

@pcarrier
Copy link

pcarrier commented Jan 4, 2017

We are talking in the cases above about recursion on single fragments or data types. So to me, and correct me if I am wrong, there isn't any tree flattening being done on the server. It is flat, because the data is either in a single table or a single collection, like "comments" or "ancestors" or whatever. If we would be talking about tree flattening across different data types, then I'd agree with you. Because, doing that would be defeating the purpose of GraphQL.

Not every storage system, or backend being queried by the graphql server, will expose tree structures as tables or collections. Even if they did, that's an implementation detail that my graphql server might very well want to abstract from its client.

Take the structure of comments in reddit. That's a tree structure. Imagine that my graphql server queries a backend service over HTTP+JSON that returns something like this:

{ text: "hello", author: "pierre",
  responses: [
    { text: "hello back", author: "jean",
      responses: [
        { text: "how are you?", author: "pierre"},
        { text: "what's up", author: "paul" }
      ]
    },
    { text: "hello back", author: "jacques" }
  ]
}

That's a simple example with a single datatype being nested; this issue does cover other cases, but it should suffice for our purposes. It's not unrealistic: nesting for replies is exactly how Reddit's API exposes comments (example).

Now let's say that I want my graphql server to expose this to the graphql client, which will then represent those in the DOM. For the client, the shape above is the most convenient representation.

What do you suggest the query look like? With @recurse the answer is easy. However if you want the graphql server to return a flat list of comments, the graphql server needs to flatten the tree coming from the backend (eg by exposing an ID and a parent ID for every comment), and the graphql client needs to graft that list back into a tree.

That's the general problem we're describing here and trying to solve. Having to flatten tree models in the server, and grafting them back in the client. I really can't think of a better phrasing.

And to be clear, yes I can do { text author responses { text author responses { text author responses { text author responses { text author responses { text author } } } } } }, but I don't know how many levels to include, and if I suddenly want my component to display a new field (say, the comment field), I have to make many edits.

@smolinari
Copy link

smolinari commented Jan 4, 2017

If the data source returns the data already structured as a tree or a graph, does the GraphQL resolver need recursion?

Scott

@pcarrier
Copy link

pcarrier commented Jan 4, 2017

@smolinari I'm not sure I understand the question, you might want to read #91 and this ticket again.

It might be helpful to refer specifically to the following parts of the spec (from this repo):

All GraphQL operations must specify their selections down to fields which return scalar values to ensure an unambiguously shaped response.

When querying an Object, the resulting mapping of fields are conceptually ordered in the same order in which they were encountered during query execution, excluding fragments for which the type does not apply and fields or fragments that are skipped via @Skip or @include directives. This ordering is correctly produced when using the CollectFields() algorithm.

So basically, the shape of the response will always match exactly the shape of the query, with a few exceptions (nullables being null, the effect of @skip & @include, fragments conditioned on types for selection sets on unions and interfaces), and some fields being arrays or arrays of arrays or [...] as exactly specified in the field definition (in the schema).

You can implement your own "JSON scalar" which returns arbitrary JSON, but then no traversal at execution, so no field selection, no aliases, no directives, no schema-based static typing, etc.; in short no GraphQL.

@smolinari
Copy link

smolinari commented Jan 4, 2017

I read the issue. Especially this answer from Lee

So, from my understanding this suggestion is asking for the avoidance of having to write out the levels of depth for whatever hierarchical data is needed.

Or is there more to it?

I am not against the suggestion, I am trying to get much more "compelling" reasoning for getting it implemented. This suggestion is basically coming up with a solution, without totally describing the real problems and pain that needs solving.

So ignore me and my ignorance. But do explain the problems and pain needing this solution.

I'll also explain where I often see this happen. I deal in customer service a lot. I've been doing it basically my whole career and often the customer comes to me with what he thinks is the solution to his problem. The first duty of good service person is to dig into the problem and ignore what the customer thinks is the solution. And often it turns out, the solution isn't what the customer really needs, but something totally different. I am not saying that is true here, but this situation is asking for the solution, without digging properly into the problem and Lee is correct is saying, "I'll call YAGNI here!".

So without a clear and compelling need, we'll go round in circles and waste time. And with that, I'll leave the discussion to you all to come up with the compelling problems and pain this problem is supposed to solve.

Scott

@pcarrier
Copy link

pcarrier commented Jan 4, 2017

@smolinari there was a response to Lee's answer explaining why this isn't the best solution, to which @leebyron agreed and suggested we revisited once we have a solid proposal, and this issue is an attempt at such a proposal.

"Having to write out the levels of depth", sure. What if I want to specify the depth as a query parameter / client-side option, my queries are pre-loaded into the server and public clients aren't allowed to send arbitrary queries?

This feature makes named fragments more generally useful. They could have been left out of GraphQL altogether: after all you can systematically substitute them with inline fragments. Having named fragments and being able to refer to them from other fragments seemed like a good compromise between complexity and usefulness.

Arguably, fragments being able to refer to other fragments and forming loops is also useful. The TraceNode type in https://optics-graphql.apollodata.com/ is such an example; we know that the traces cannot be more than X MB and that "infinite" recursion is impossible (and we can make everything nicely bounded if we don't expose a node's ancestors as fields, only its children). Now in the general case, really infinite loops sounds dangerous, so requiring that the maximum level of recursing be specified as an argument to the directive makes a lot of sense IMHO.

The schema we ended up building still allows for short yet very expensive queries (if anything it became easier in the absence of @recurse as we ended up exposing a bunch of convenience fields to work around the difficulties of querying whole subtrees: one can query {[...]{descendants{ancestors{descendants{ancestors{descendants{ancestors{id}}}}}}}} which returns exponentially more TraceNode objects than contained in the original trace).

To get back to a trace tree a client would have to either run a query like:

fragment traceNode on TraceNode {
  id
  parentId
  startTime
  endTime
  key
  alias
  type
}
[...]
      execute {
        ...traceNode
        descendants {
          ...traceNode
        }
      }
[...]

and graft through id/parentId, or repeat the list of field in the fragment N times inlined in the query.

Funnily meta, in the case of Optics, if the client queries up to N levels of trace nodes in a query, then its execution can trivially generate traces strictly deeper than N by recursively querying traces of itself ✨.

@serle
Copy link

serle commented May 11, 2017

I have a deeply nested configuration object where the nesting is the same structure repeated but in a ragged hierarchy i.e. it is not of constant depth on all the branches. I was happily streaming the whole thing over to the front end, before I decided to convert our communication layer to GraphQL, so as to avoid custom API creation. (Use Case)

I would be happy with an escape hatch (e.g. a "recursive type wrapper" with similar syntax to GraphQLList) to say to the GraphQL server, this is my recursive object type, stop when you hit it and let my resolver resolve the whole thing as one blob i.e. treat it as a primitive as far as your recursive resolution algorithm goes. The client should still be able to validate my result as it has received a well formed JSON object that is not of infinite length and knows that it contains a recursive type with a given set of fields.

@SanderMertens
Copy link

SanderMertens commented Jan 9, 2018

I find the discussion around recursion & the potential attack vector it represents confusing. IMO, unbounded recursion is just a type of query that potentially has a big resultset, and for that we can use pagination to slice up the result. I don't yet appreciate why large datasets from recursive queries should be treated differently from large datasets from non-recursive queries (when you can prevent infinite iteration over cycles- see below).

Unbounded recursion can make a lot of sense for acyclic graphs. Sometimes application logic dictates that I need the full (sub)tree, not just n levels. If I can't express that, I will have to potentially hit the backend multiple times which adds latency and increases load on the infrastructure, because of an arbitrary limitation of the query language.

In contrast, unbounded recursion never makes sense for cyclic graphs. A backend should ideally not even allow for unbounded recursion queries on cyclic graphs.

My suggestion would be annotate fields that are guaranteed acyclic in the GraphQL schema. Queries with unbounded recursion on fields that do not have this annotation should fail. That way the backend is safe from attacks by default, and can only allow unbounded recursion for parts of the schema where it makes sense.

The usecase that caused me to run into this is a package database that is managing dependencies. When a user needs to install a package, he/she needs to know what the dependencies of the package are. This application needs the full dependency tree in order to work. Dependencies are guaranteed to be acyclic, and there is thus a well-defined worst case size of the resulting dataset.

@arxpoetica
Copy link
Author

arxpoetica commented Jan 10, 2018

@smolinari a clear pain point is not knowing how deep the nesting lies. JSON is riddled with examples, but the two examples I outlined above are exactly along those lines. Family tree nesting (complete unknown of depth) and kind of nested forum-style data.

@ntkoopman
Copy link

ntkoopman commented Mar 18, 2018

I think that GraphQL lends itself very well for this and there should be a supported way. That said, I do think this proposal is missing a few things:

What would happen if the recursion limit is hit? I assume it will simply not be applied in that case and all the fields will be absent from the result. This would be similar to writing out the query by hand and would only really work on direct recursion (where you spread a Fragment inside itself).

What if there are multiple @recursion directives in a Fragment?

fragment PersonFields on Person {
  name
  parents {
    ... PersonFields @recursive(maxDepth: 8)
  }
  ancestors {
    ... PersonFields @recursive(maxDepth: 8)
  }
}

This would create a very large query if you would write it out. But I can imagine the author of the query only wanting to recurse both directions independently.

What if I want to set a limit on the total amount of returned nodes instead of a recursion limit? Could we add e.g. @recursive(maxNodes: 100) to have the server decide which branches to show? And I can think of another few strategies that are even more project dependent.

Do we even want this detail to be in the spec? The only thing stopping us from implementing any of this is 5.4.2.2 so would it be enough to amend that with a clause that you can opt-out of this validation if a @recursive directive is applied (and the actual implementation and arguments, if any, of that directive would be implementation specific)?

Is there maybe a GraphQL (POC) server that already implements recursion?

edit: That being said, all my current recursion needs are solved using tree flattening, so maybe we shoud look into tree grafting instead of truly recursive queries.

@pcarrier
Copy link

Is there maybe a GraphQL (POC) server that already implements recursion?

As far as I can tell, any by taking out some validation rules.

@arxpoetica
Copy link
Author

arxpoetica commented May 3, 2018

Looking forward to any movement on this one.

To throw down a little harder: the name GraphQL portends awesome support for Graph database structures, but I've noticed the GraphQL community is really heavy handed about actually supporting more complex nested edge/vertex structures.

Why should recursive support for nested fragment structures be any different for support for flattened (lists/arrays) fragment structures, which are supported out of the box and have the same limitations and concerns about size?

I'll revisit the use cases from before. Genealogical data (obvious recursive graph structures of unknown complexity and depth) and forum-style threads.


One more thing: if you look at graph databases in the wild, such as ArangoDB, limits on edge-vertex traversal is entirely left up to the person doing traversing. If something explodes, that onus should fall on the one not being wise about recursion and LIMIT style parameters, not necessarily the system. Exact same thing can be said about recursion and arrays in currently-existing GraphQL use cases.

@dylanjeffers
Copy link

I agree completely with @arxpoetica and others, the burden of proving acyclic recursion is for the back-end system, not the graphql validator. DBs like Datomic allow recursive queries anyhow, so let them do the work. The argument that this validation prevents an attack vector doesn't add up either, since a client could just create a query with a large number of nested fields. The solution to that problem, and the recursive one, belongs in the backend. Not having recursive types limits graphql's power greatly :/

@hnviradiya
Copy link

This is really frustrating that I implemented graphql framework and because of this I have to call it using rest api. If something goes infinite it should be problem of implementation. Or say when there is no children then one can simply return empty. But this REALLY not valid argument that because of infinite loop this should not be on spec.

@ArcaneEngineer
Copy link

ArcaneEngineer commented Oct 7, 2019

PAIN POINT: Aside from taxonomic (e.g. biological) data, common use case for us is tools, rich UIs, games and interactive apps (not that dissimilar from one another), which you might say is a broad case of scene graphs. These have an arbitrary depth in general, but a fixed depth per application spec.

These apps tend to consist of nested hierarchies of components all in one super-tree. We do data driven dev: design data tree first, top down, updating it iteratively during development, until we wind up with a fairly complex tree that describes the application, and subtrees that describe finer and finer grained aspects thereof. These (sub)trees are individually traceable / diffable. We would ideally push up full tree or a subtree to update, depending on need, to reduce number of queries; and pull down similarly.

Reassembling this from flat structures without greatly increasing the complexity of overall development, on client, for every project, while also paying attention to data types, methods on the model that operate on these data, etc., while those factors are also changing through development, is untenable.

PROPOSAL: If we could assure non-cyclic, limited depth graphs on the client, and turn off / ignore the necessity for certain checks on the server, i.e. have server assume that the incoming data has limited recursion depth and no cycles (and crash in the unlikely event that our data was not sanitised), then GraphQL's performance would be preserved, the authors need touch very little if anything in the spec and source, and we could move solidly to GraphQL for all our projects.

simon-wacker added a commit to building-envelope-data/api that referenced this issue Sep 7, 2020
…ly nested the tree is but still should be able to request the whole tree in one query (this shortcoming of GraphQL was and is discussed in graphql/graphql-spec#91 and graphql/graphql-spec#237)
@rivantsov
Copy link
Contributor

rivantsov commented Mar 10, 2021

I have implemented allowing circular-referencing in fragments, as long as they are not on top level, this includes self-referencing fragments. This actually neatly solves one big pain point with introspection in GraphQL itself - fully unfolding field type. Clients typically have to do an ugly query with 10-level nested logic to fully unfold the type ref; with a simple self-ref fragment TypeDetails it works perfectly. See example here, in the execution log, query and response:

https://github.com/rivantsov/ngraphql/blob/master/misc/UnitTestsLog.txt#L655

As for limiting the length of unfolding the recursive chain - this is moot question, all well-built servers should have protective limits on query nesting depth, query complexity, number of objects to return etc. Otherwise any rogue user can bring down the entire server. So this should be covered anyway, the server should reject the query if it reaches the limit.

Suggestion: allow circular refs in fragments on non-top level.

@seeruk
Copy link

seeruk commented May 6, 2021

@ArcaneEngineer is spot on there. I'd been thinking about this recently as I've hit a similar problem to what he's described.

I have a tree based structure. It isn't recursive; no items in the tree can refer to their parents. The depth is currently arbitrary, but it will never be infinite, but it may be deep enough to be impractical to query each layer individually.

When I first stumbled across this issue, I actually just made a scalar called JSON and returned the whole thing. This worked, and it was returned as an object, just un-typed. I realised recently that with GraphQL Codegen and TypeScript I can make a custom scalar specifically for this type, and assign a TypeScript type to it, giving that scalar proper type information in my own code. The problem is, this is a public API and I'd love to be able to have all of the documentation about fields available to types in one place - provided by the GraphQL introspection endpoint.

To that end, wouldn't it be possible to make something like a scalar, but called something else (struct maybe?) that lets you document the shape of something, that maybe even still validates that the documented shape is what is returned, but lets you get a big blob of stuff all in one go without asking for the fields? Effectively this is what you can do already, just without documentation or validation.

So in your schema (this is a snippet of the real thing I'm working with):

struct ContentType {
  slug: String!
  name: String!
  fields: [ContentTypeField!]
}

struct ContentTypeField {
  kind: ContentTypeFieldKind!
  fields: [ContentTypeField!]
}

type Query {
  contentType(slug: String!): ContentType
}

And you'd be able to make a query like:

{
  contentType(slug: "blog-post")
}

Which could return:

{
  "contentType": {
    "slug": "blog-post",
    "name": "Blog Post",
    "fields": [
      {
        "kind": "FIELD_GROUP",
        "fields": [
          {
            "kind": "TEXT"
          }
        ]
      }
    ]
  }
}

This provides a solution to one aspect of this recursive type problem, but does take away from a fairly fundamental aspect of GraphQL; choosing the fields you want to return. Whether that's a deal-breaker or not isn't up to me though I suppose! For now I'll probably go ahead with a custom scalar, as flattening and unflattening tree based structures is tedious and a nice way to introduce unnecessary bugs.

@benjie
Copy link
Member

benjie commented May 6, 2021

Somewhat off topic: I think an input/output symmetric composite type is an interesting idea (one we’ve discussed a fair few times). The lack of selection set would be odd, but necessary for it to be symmetrical and desired for your recursive use case (thanks for sharing). Using this type for both input and output would mean the type could not be evolved though - any change (adding field, changing nullability) could potentially be a breaking change. Definitely worth investigating IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests