Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate concerns in Path class #1727

Open
easbar opened this issue Sep 13, 2019 · 15 comments
Open

Separate concerns in Path class #1727

easbar opened this issue Sep 13, 2019 · 15 comments

Comments

@easbar
Copy link
Member

easbar commented Sep 13, 2019

This issue is very similar to #1110, but since that one addresses so many things I am opening a new one here.

The Path class really does too many thing and it leads to too much coupling between not necessarily related things. I took a look and so far I think Path should be split into three parts:

  1. Path as the result type of the routing algorithms. This should be a data-only class consisting of weight/distance/time/edges/nodes (and maybe a debugString). How this data/result is obtained should be entirely up to the algorithms.

  2. The path extraction (producing a result type Path from one (or two) SPTEntrys) that is currently done within Path (and leads to the Path class having multiple subclasses as well, making things even more complicated) is really a part of the routing algorithm(s) and therefore should be moved into the algorithm classes. It might be worth to move it into some Helper class that the algorithms can use instead of using inheritance.

  3. Any kind of path post processing (extracting EdgeIteratorStates, instructions details etc.) should be moved into a separate class as well, maybe this can be achieved using some static methods.

@michaz
Copy link
Member

michaz commented Sep 14, 2019

100%

@michaz
Copy link
Member

michaz commented Sep 14, 2019

Just in case you start on this: May I add something to the wish list?

While looking into CH-Alternatives, I noticed that it can also be helpful to navigate the CH-Path itself. The real path through the CH-augmented graph. With un-unpacked shortcuts and all. Because I may want to analyze it without fully unpacking it, e.g. binary-search for a certain travelled distance by recursively going down.

I think that's just a matter of doing less, namely not hide the non-unpacked Path. Just wanted to note. Of course, it could also go on Path4CH as special methods. But I'm pretty much for "data first", think about objects and such later.

@easbar
Copy link
Member Author

easbar commented Sep 14, 2019

e.g. binary-search for a certain travelled distance by recursively going down.

that's a particularly bad example because we just removed distances from CH shortcuts: #1719. Or maybe you want something like unpack one shortcut but not another ? Anyway, I think I get it, you want for example the list of edge/shortcut ids before the shortcuts are unpacked ? Its good you say this, because at one point I was asking this myself (but could not really come up with a use-case for it). Will keep it this in mind.

@michaz
Copy link
Member

michaz commented Sep 14, 2019

you want for example the list of edge/shortcut ids before the shortcuts are unpacked

Yes.

we just removed distances from CH shortcuts

Oh, that's fine. Yes, it was just an example. I think not even the real one.

@easbar
Copy link
Member Author

easbar commented Sep 17, 2019

Refactoring Path is quite a task and after thinking about and looking at it for a while I think we definitely should try to do it in small steps, because there are many factors involved. For the beginning I found its best to remove the path extraction stuff -> #1730. Not sure if this already helps with the (not)-unpacking you had in mind, but maybe this can be also improved further afterwards.

@michaz
Copy link
Member

michaz commented Aug 12, 2020

Almost done I think. Remaining issues I'm having:

  • The additional properties (description, maybe also weight, distance, time) seem a bit random on this class as it is now.
  • The start/end-node properties obfuscate a little the (very different) reasons why they are there:
    • start-node is there because otherwise the path isn't well-defined by only the list of edge-ids
    • end-node is there because a single node and no edge is also a Path.
  • It should be immutable instead of making property setters package-private (I already made them public -- since this is a basic data type that is the input to other procedures (Instructions, PathDetails, ..) I need to be able to create instances freely.
  • Looks like the self-rolled push-iterator/visitor thing isn't really necessary -- don't see anything that can't be replaced by a regular Iterator and mean the same thing.

@michaz
Copy link
Member

michaz commented Aug 12, 2020

The fundamental ugliness (to me) is that, to process it as a data type, you basically have to pattern-match:

  • if "not found", then do something
  • else if single-node, then get node number and do something with it
  • otherwise, iterate over edges

And that isn't immediately obvious.

@otbutz
Copy link
Contributor

otbutz commented Aug 13, 2020

Sounds like Path should be an abstract class than? e.g the list of edges doesn't make any sense for the first two use cases

@michaz
Copy link
Member

michaz commented Aug 13, 2020

Nah, list of edges + list of nodes with the invariant that they form a path is okay as representation. So if this is a Graph:

1 -- 2 -- 3 -- 4 -- 5
   e1   e2   e3   e4

this is a Path:

[], []

So is this:

[3], []

and this:

[3,2], [e2]

and this:

[2,3], [e2]

My question is more how best to consume / pattern-match this.
If the first thing I see is the prominent forAllEdges method, I think "great, let's just iterate over all edges, because every EdgeIteratorState gives me fromNode, edgeID and toNode" -- and then I forget that there are two types of Path with no edges: Empty paths and one-node paths.

Maybe just give up on this, let it just be a pure tuple of edge ids and node ids, remove the back-pointer to Graph, and let the consumer figure out what to do with it?

@michaz
Copy link
Member

michaz commented Aug 13, 2020

Sounds like Path should be an abstract class than?

Your solution would be a solution to that, of course.

I like it because it's almost like this, but not sure if it is the most "normal" thing to do in Java.

@michaz
Copy link
Member

michaz commented Aug 13, 2020

Let's remember the history of this class: It originally served all of these purposes:

  • Be the return type of the entire routing request, on the application level. (Later replaced by PathWrapper aka ResponsePath)
  • Be the intermediate thing the routing algorithm holds on to while routing
  • Be an abstraction where a path through contraction hierarchies really also is the path through the original graph it represents
  • Be a builder for itself

We've come a long way since then.

@otbutz
Copy link
Contributor

otbutz commented Aug 13, 2020

If it's going to be immutable, i'd go for the abstract class approach as it's the most natural one from the OOP perspective:

  • AbstractPath
  • EmptyPath(could even be a static instance)
  • NodePath
  • Path

@otbutz
Copy link
Contributor

otbutz commented Aug 13, 2020

A simple marker interface would also be an option if the AbstractPath boilds down to a "shared nothing"

@easbar
Copy link
Member Author

easbar commented Aug 13, 2020

Another issue I am having with the Path class is performance related: When the routing algorithms calculate a Path they iterate over all edges to calculate the accumulated distance and time. But often we later do this iteration again to calculate instructions, path details etc. So in my opinion calculating time and distance is an additional task on top of calculating the 'path' just like instructions and path details. It might be better to reduce Path to a list of edge keys (so the directions of the edges are already included) and calculating anything else would be done in a post-processing step (and ideally traversing the edges would only happen once). Maybe we need the list of nodes as well (not sure).

The additional properties (description, maybe also weight, distance, time) seem a bit random on this class as it is now

Yes, as far as I am concerned they can be removed and added to something that is built from a Path.

start-node is there because otherwise the path isn't well-defined by only the list of edge-ids

Maybe this problem is solved if we make Path a list of edge keys instead of edge ids?

if "not found", then do something

Hm, but checking for Path.isFound() does not seem to be too ugly to me. As a return type of the routing algorithms it seems very reasonable to me that there is a notion of a 'not found' path and one has to check for this. We could also go for Optional<Path> or use null if you ask me.

else if single-node, then get node number and do something with it

Is this really such a problem? In cases where one wants to iterate over all edges and the edge list is empty it makes sense that we iterate over an empty list? And if one wants to iterate over the nodes of a path or get the start/end node this can be done by storing the start/end nodes with Path.

tldr: I have not tried it yet, but I think Path (the result type of the routing algorithms) ideally should be (close to):

/**
 * Result type used by the routing algorithms. Additional information about the shortest path like time, distance, details and instructions can be calculated in a post-processing step
 */
public class Path {
  private IntArrayList edgeKeys;
  private int startNode;
  private int endNode;
  private double weight;
}

@easbar
Copy link
Member Author

easbar commented Sep 29, 2020

there are two types of Path with no edges: Empty paths and one-node paths.

But don't you have to do a minimum amount of pattern matching anyway when you check if the path is 'valid' (points are connected) or not (connection not found)? And if you do (and the path is valid) you already know that all paths with no edges are one-node paths?

I would like to stop 'automatically' calculating the distance and time when running the algorithms, because:

  • It is inefficient to get the edge iterator states to calculate time/distance while 'unpacking' the path and then do it all again when calculating points/nodes, path details and instructions.

  • It makes testing harder. When checking the correctness of different algorithms it would be sufficient to check the returned weights and the list of edge-ids (or keys, or nodes). Calculating the time and difference can be done completely independent from finding the optimal path and IMO should be done in a post-processing step (that can be tested in isolation).

Ideally, I would even like to support running algo.calcPath in 'weight-only' mode where we do not need to 'unpack' the path or backtrace the SPT parent pointers at all.

Can we agree on this:

/**
 * Result type used by the routing algorithms. Additional information about the shortest path like time, distance, details and instructions can be calculated in a post-processing step
 */
public class Path {
  private IntArrayList edgeKeys;
  private int startNode;
  private int endNode;
  private double weight;
}

? Note that the Path contains the edge keys here, and I think there should be a graph.getEdgeIteratorState(int edgeKey). Do we need the node IDs as well? It probably does not hurt? The isFound() method should be easy to implement by simply checking for infinite weight?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants