Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Graph data structure to Godot. #15647

Closed
willnationsdev opened this issue Jan 13, 2018 · 23 comments
Closed

Add a Graph data structure to Godot. #15647

willnationsdev opened this issue Jan 13, 2018 · 23 comments

Comments

@willnationsdev
Copy link
Contributor

willnationsdev commented Jan 13, 2018

I think Godot could benefit from providing a Reference object type that gives access to a labeled vertex, directed and labeled edge, graph data structure. Like how TreeItems work with the Tree, we could have a DataGraph (Reference, facade), DataGraphVertex (Object 0+ labels & a Variant meta), and DataGraphEdge (Object with 1 label & a Variant meta). I believe this format would keep it as flexible as possible.

Part of the advantage here is that you'd be getting access to a graph or tree data structure in the script code without having to rely on Control nodes like Tree or GraphNode. I can also think of some modules that would work quite well with a built in C++ type for graphs like this.

The advantages of exposing this to scripting could be pretty awesome for making data structures in projects. You could create scripts that use it as a graph, or convert it into a binary search tree, a splay tree, whatever you want.

What do you all think?

@Zylann
Copy link
Contributor

Zylann commented Jan 13, 2018

I thought about this when looking the Astar class, which basically is a graph, made for the sole purpose of being used for pathfinding (use-case oriented), Though a graph data structure can have more uses.

@vnen
Copy link
Member

vnen commented Jan 13, 2018

Using Control nodes for this might be more expensive than rolling your own implementation in GDScript.

I'm pro adding data structures to GDScript.

@willnationsdev
Copy link
Contributor Author

@vnen I intend to make something deriving Reference, geared towards generalized usage, so it could be used for visualization as a Control or it could be for something else entirely.

@willnationsdev
Copy link
Contributor Author

Would this be something that I can add to the engine core, would this have to be introduced through a module, or could it even be something else?

@vnen
Copy link
Member

vnen commented Jan 13, 2018

Well, reduz is quite picky about the core. OTOH some types can benefit of this so I'm not sure.

It could also be done with GDNative, instead of a regular module.

@willnationsdev
Copy link
Contributor Author

willnationsdev commented Jan 13, 2018

Is there a particularly good reason to make it with GDNative to where it only has access to the scripting API? Seems to me like it doesn't really belong in that context, especially if it were ever integrated into an AStar refactoring or similar usage. It seems to me like it just needs an appropriate spot somewhere in the Godot folder that just isn't in core.

@poke1024
Copy link
Contributor

As a general solution, this would be great; it could deal with allocating, storing and cleaning up graph data much more efficient that Mono's GC or GDScript's manual strong/weak ref implementation.

@willnationsdev
Copy link
Contributor Author

As for the placement, how about we create a module that is dedicated to providing generic data structure and algorithm extensions to the core? That would open things up for successive additions of a similar nature and give the DataGraph a comfortable place to live outside of core.

@willnationsdev
Copy link
Contributor Author

Ok, I have created a module specifically for "data_structures" and included subdirectory for a DataGraph class and its associated classes. Would appreciate any feedback you guys would like to give regarding it. What sorts of features might you look for a in graph data structure? Including a shortest path method is probably one I'm thinking of adding. Including a method to return the number of connections with a certain number of "hops" to another vertex is probably another...any ideas?
https://github.com/willnationsdev/godot/commits/datagraph

@raymoo
Copy link
Contributor

raymoo commented Jan 20, 2018

It would be nice to support efficiently keeping track of the connected component a vertex is in dynamically, as in https://en.m.wikipedia.org/wiki/Dynamic_connectivity

@willnationsdev
Copy link
Contributor Author

willnationsdev commented Jan 21, 2018

I'm currently making some internal data structure changes to improve efficiency, so the following is based on the revised structure which will be uploaded soon.

@raymoo From what I can tell, I'm already supporting this. I've designed the graph data structure such that it will be easy to support Cypher queries in the future (I would like to add a Cypher interpreter to the DataGraph's directory in the new module at some point). This necessitates quickly answering connectivity questions.

The current structure is...

  • Vertices contain 0...* labels and one Dictionary of metadata.
  • Edges contain 1 label and one Dictionary of metadata.
  • Vertices are stored within VertexClusters that contain both the vertex and two lists of pointers to all related connections (1 for incoming, 1 for outgoing).
  • There exist Dictionaries that map the vertex or edge ID to the VertexCluster or DataGraphEdge respectively. This provides constant-time lookup of the associated object.
  • To reduce lookup times and to allow us to look up connections quickly, each of the aforementioned Dictionaries are mapped via StringNames for the labels associated with them in HashMaps.

Basically, this supplies constant-time lookups that improve when you increase the diversity of labels associated with a vertex / with the graph's edges. The VertexClusters ensure that I can look up a vertex and find its associated edges very quickly (as it gives me direct pointers to all of them, also in a label-based HashMap, which lets me quickly find, for example, "The :Character vertices that the :Player:Character vertex has :MET where the :Character in question :LIVES_IN a :Place :OWNED by the :Mayor:Character vertex."

Edit: oh, also, the "lists" in the VertexClusters are also being changed to HashMaps of labels pointing to Dictionaries.

@raymoo
Copy link
Contributor

raymoo commented Jan 21, 2018

@willnationsdev Are connectivity queries sublinear in graph size? Also I don't know what VertexClusters are. Are they required to be connected? When you delete a vertex / edge in a VertexCluster, and it causes the VertexCluster to become disconnected, is splitting it into new clusters an operation sublinear in the size of the cluster?

@willnationsdev
Copy link
Contributor Author

willnationsdev commented Jan 21, 2018

@raymoo As I explained, all of the data is being stored in HashMaps and Dictionaries (which are HashMaps internally), so you technically are getting constant time in every case. In order to reduce the number of collisions however, the hash tables are subdivided based on the labels you associate with vertices and edges.

The VertexClusters are, as stated, just a vertex paired with it's associated incoming and outgoing edges (the Vertex itself doesn't store that data). Deleting a vertex is equivalent to deleting a VertexCluster in this case. I refer to them as clusters only because a vertex with edges necessarily touches other vertices around it. Deleting a VertexCluster requires one constant time hashtable removal + another for each label it possesses. It also deletes every edge connected to it which in turn means that many constant time hash table lookups. As such, every operation should be sublinear in regards to the graph size and density, but the efficiency is further improved by maintaining a diversity of labels (which causes the hashtable lookups to result in fewer collisions).

Edit: Basically, the complexity of every direct vertex or edge interaction should involve only constant time operations, that way, of you do something that affects one vertex, at most you only have to deal with a number operations equivalent to the number of dependencies it had in the graph (the smallest possible time).

@raymoo
Copy link
Contributor

raymoo commented Jan 21, 2018

@willnationsdev Determining the connected component requires more than looking at the neighbors of one vertex. Even if accessing the edge / neighbor lists for a vertex is constant time, it is not necessarily constant time to determine which other vertices it is connected to.

@willnationsdev
Copy link
Contributor Author

@raymoo if you are referring to an N-length path from one vertex to another, and looking up whether it exists, then that should be a sublinear operation. As stated, the hashtable lookups are all subdivided based on label, so a path calculation would become linear if you disregarded labels entirely, but it becomes a log n operation (I believe, off the top of my head?) when you apply the more typical use-case of employing a diversity of labels to the graph's edges.

@raymoo
Copy link
Contributor

raymoo commented Jan 21, 2018

@willnationsdev Basically I want this:

  • There is an ID / object / whatever for each connected component of the graph.
  • Given a particular vertex, I can get the id of the connected component the vertex is in.
  • Given a particular connected component, I can get a list of vertices in that component with a specific label (Not absolutely necessary).
  • I can get a list of connected components.
  • All of these operations are not too expensive.

My hypothetical use-case is keeping track of a network of wires in some kind of engineering game. I want to keep track of which devices are connected to each other by wire so that I can simulate them together as one big group, for efficiency. I shouldn't have to walk over all the wires every time, in case there are a lot of wires. Since I am looking for connectivity through any wire, this use-case is "not typical" by your definition.

I am not sure exactly what your definition is, but I am assuming you mean something like the graph editors for Visualscript and AnimationTreePlayer etc. But even in that use-case, even if you assume that the types of nodes are distributed uniformly, the average time complexity for querying connectivity between two vertices may be worse than logarithmic if the branching factor is high enough (assuming the graph is a tree for simplicity). Worst-case complexity could always be linear, if all the vertices / edges of the label you care about are next to each other.

I have no idea though why you would make connectivity queries in something like a visual coding graph, or why you would want to restrict it along specific edges or vertices. Do you have an example of where someone might want to do that? My gut feeling is that any problem where connectivity queries are interesting and useful will also be problems where connection path lengths do not die out in the way you suggested.

@willnationsdev
Copy link
Contributor Author

willnationsdev commented Jan 21, 2018

@raymoo Ok, I googled "connected component" and finally realized exactly what you were talking about. So you essentially want to be able to designate one or more pre-existing graphs to use as components of a larger supergraph, and you would want to be able to quickly determine whether a given vertex or edge was in that subgraph?

That actually would make sense, and doesn't seem like it should be too difficult to incorporate into the design (great idea!). All I would need to do is create a tree hierarchy between graphs and set the graph pointer within each vertex/edge so that it points to only the most direct parent. Then each DataGraph can keep another Dictionary to store its subgraphs.

I have been designing this graph under the impression that I will eventually be able to efficiently execute Cypher queries against it. Cypher is a graph query language for the Neo4j Graph Database. My target use-case is maintaining a backend in-memory/flat-file database for an RPG, but I also want to ensure that the design is as broadly applicable as possible so that the same data structure maybe used for several other projects once it is exposed to the scripting API. So, for example, I might want to know whether "Bob" "LOVES" somebody who loves someone else, who also loves someone else (a love triangle, if you will). I could query this by asking for a 3-hop LOVES connection starting from Bob and ending at Bob:

MATCH (Bob)-[:LOVES*3]->(Bob)

To know if Bob loves someone with a reciprocal love, that would be a 2-hop. I could also ask who specifically it is with a return statement.

MATCH (Bob)-[:LOVES]->(p)-[:LOVES]->(Bob)
RETURN p.name

Ideally, a data structure that can support queries of this sort would also be effective at creating fast lookups for other graph / tree use cases. That's the reason for asking for feedback here.

Edit: I would create a derived version of this data structure that supports all the Cypher stuff with an interpreter and everything. This would just serve as a base class to organize the graph storage / functionality.

@poke1024
Copy link
Contributor

I believe the common term for VertexClusters is adjacency lists (even though they're hashed in this implementation, as I understand, which makes "is-in" queries constant time). Finding connected components can be done using a BFS or DFS, so the complexity is linear in number of nodes + edges, supposing you have a constant time data structure to remember which nodes you already visited.

@willnationsdev
Copy link
Contributor Author

willnationsdev commented Jan 21, 2018

@poke1024 Well, assuming someone actually does use a diversity of labels in the graph's vertices and edges, and assuming the average use case, a user should be able to find a path in the overall supergraph in logarithmic time since, at any given step, if you are searching for vertex or edge with a particular label, and if use of unique labels is kept to a minimum (best case scenario), you effectively can ignore all subsections of the graph that do not meet that label criteria.

So for example, you could represent a binary search tree by creating an acyclic graph that maintains only LEFT and RIGHT labels on edges. At any given step, selecting LEFT over RIGHT will remove the need to check any vertices within the RIGHT edge'd subgraph / subtree. The same holds true in this case, only you can create however many types of edges you want, and labels all have fast lookup if you want to directly look for vertices or edges that have a particular label.

Edit: Actually, only an acyclic graph would be logarithmic in this way, I think? A separate example that is cyclic would be an octogon with 100% density...

If you had a 8 vertices, and every vertex had a directed edge to every vertex (including itself), resulting in 64 edges, but if each of those vertices have its own label extending from it in its edges (so vertex1 has only extends outward V1 edges, vertex 2 extends only V2 edges, etc.), then a search will have 64 edges to search through, but at any given step, it is able to isolate (64 total edges) / (8 edge types) edges remaining that it needs to search through in the hash table (so it only searches through 8).

Also, I just realized that the whole "subgraph" thing from before could be simulated by simply creating a dedicated label for the subgraph with all of its vertices and edges, so there is really no need to make changes to the design (would actually damage efficiency in the long run).

@raymoo
Copy link
Contributor

raymoo commented Jan 21, 2018

@willnationsdev Connected components do not need to be be pre-designated, they are just a property of a graph. A connected component is a group of vertices and their edges where every vertex can be reached from every other vertex in the same component, and where the connected component does not exclude any vertex reachable from any vertex in the component. The list of connected components may grow or shrink, and membership may change, depending on how vertices / edges are removed / added.

My request was that the graph keeps track of what connected components exist in the graph and membership information. Finding connected components is linear in graph size without keeping extra connectivity data, like @poke1024, but the wikipedia page I linked describes some algorithms / data structures to make it more efficient. I am beginning to feel though that it shouldn't be included in the basic graph implementation since it does add an asymptotic overhead to all updates to the graph. Maybe it could be a subclass inheriting the basic graph structure in the future.

Your RPG example makes sense to me and I understand now why you might want connectivity in a heterogenous graph like that.

@Angular-Angel
Copy link

I would love some more thorough functionality for making skill and tech trees, and the like. A Graph class and it's components would go a long way towards that.

@Calinou
Copy link
Member

Calinou commented May 27, 2020

Feature and improvement proposals for the Godot Engine are now being discussed and reviewed in a dedicated Godot Improvement Proposals (GIP) (godotengine/godot-proposals) issue tracker. The GIP tracker has a detailed issue template designed so that proposals include all the relevant information to start a productive discussion and help the community assess the validity of the proposal for the engine.

The main (godotengine/godot) tracker is now solely dedicated to bug reports and Pull Requests, enabling contributors to have a better focus on bug fixing work. Therefore, we are now closing all older feature proposals on the main issue tracker.

If you are interested in this feature proposal, please open a new proposal on the GIP tracker following the given issue template (after checking that it doesn't exist already). Be sure to reference this closed issue if it includes any relevant discussion (which you are also encouraged to summarize in the new proposal). Thanks in advance!

@cIymax
Copy link

cIymax commented Jan 23, 2022

@willnationsdev I've reopened your proposal on the new GIP tracker and invite you to comment on the reopened proposal at least regarding some example intended game use-cases for your proposal (as requested by the GIP template) and regarding some of the additional info requested by the respondents. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants