Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cycles detection: use cycles map of addresses to stack depth #79

Closed
wants to merge 5 commits into from

Conversation

posener
Copy link

@posener posener commented Mar 18, 2018

Enable cycles detection by default.
This solution gives O(1) additional complexity for detecting cycles.

Fixes #74
Instead of #78

The current comparison fails on stack overflow when given a cyclic struct
(such as a cyclic linked list for example).

This fix adds a new option: DecetctCycles, which finds cycles by adding information
to the path stack, when iterating over pointers.

When a cycle is checked (every time a pointer kind is compared) the path stack
is iterated to find cycles. The comparison of cycles is by they length.

Fixes: google#74
Enable cycles detection by default.
This solution gives O(1) additional complexity for detecting cycles.
@dsnet
Copy link
Collaborator

dsnet commented Mar 21, 2018

Thanks for prototyping this.

I have some high-level thoughts:

  1. I don't see the rationale for how the path-depth is the correct way to determine graph equality.
  2. We need to be extra careful since compareArray calls diff.Difference which may need to compare arbitrary elements with each other. I noticed that you push and pop from your pointer map, so that this may not be an issue, but it does require careful thought and review.
  3. It's insufficient to only check pointers. Cycles can occur through slices and maps (and possibly interfaces, but that may be a special case of one of the other 3). See this example.
  4. The reporter logic needs to be cycle tolerant. In the event of a inequality, the formatting logic needs to handle printing a cyclic data structure. Currently, the formatter will get into an infinite recursive loop. A reasonable output would to be just print the type and pointer.

@dsnet
Copy link
Collaborator

dsnet commented Mar 22, 2018

An additional thought. The algorithm must be agnostic to traversal order. That is, the result of cmp.Equal should be identical regardless of whether the values are traversed by depth-first-search or breadth-first-search or some other order. I haven't studied your approach enough to determine whether this is the case or not.

@posener
Copy link
Author

posener commented Mar 22, 2018

Thanks for the review,

1 .I don't see the rationale for how the path-depth is the correct way to determine graph equality.

This is the way to make difference between a 1 step cycle to a 2 step cycle to a 3 step cycle, as you gave in your example. So since it is a DFS we are doing, The stack length "represents" the length of the cycle - it was more accurate if we had a counter that counts only the pointer references (or the other case you mentioned) but I think that it is not necessary and the stack size does a good job for this comparison.

  1. We need to be extra careful since compareArray calls diff.Difference which may need to compare arbitrary elements with each other. I noticed that you push and pop from your pointer map, so that this may not be an issue, but it does require careful thought and review.

I must push and pop: for example, a diamond shape graph will be detected as a cycle without popping elements in a DFS traverse.

  1. It's insufficient to only check pointers. Cycles can occur through slices and maps (and possibly interfaces, but that may be a special case of one of the other 3). See this example.

Really nice example! I'll add hooks also for those kind of cycles too so we can detect them!

  1. The reporter logic needs to be cycle tolerant. In the event of a inequality, the formatting logic needs to handle printing a cyclic data structure. Currently, the formatter will get into an infinite recursive loop. A reasonable output would to be just print the type and pointer.

Can you please explain when I will meet this? in the current implementation it looks to works fine.

An additional thought. The algorithm must be agnostic to traversal order. That is, the result of cmp.Equal should be identical regardless of whether the values are traversed by depth-first-search or breadth-first-search or some other order.

Why should it? Currently we are doing a DFS and it works with that. The diff algorithm also saves a "stack" which is a DFS struct, and relies on the path to be correct by pushing and popping elements.

I haven't studied your approach enough to determine whether this is the case or not.

Do you want me to elaborate on the solution? create some sort of a doc for it? maybe add it to the cycles.go file?

@dsnet
Copy link
Collaborator

dsnet commented Mar 27, 2018

In order to reason through your PR, I took a very different approach. See #85 for my WIP.

My approach uses two maps of type map[pointer]pointer. For one map, the key is a pointer from x, and the value is the associated pointer from y. The other map is the opposite. For every pointer, we check both maps for whether any of the corresponding pointers have already been visited. If either side has a cycle, we report equal only if both pointers were first encountered together as a pair.

What's interesting is that my approach and yours are semantically equivalent. My definition of equality is if "both pointers were first encountered together as a pair". In order for this to be true, they must exist in the same path step, in which case they have the same depth. In fact, I think "depth" could be better thought as the index into the the Path for the corresponding PathStep for that given pointer dereference. Furthermore, both of our approaches prevent further recursion if a cycle ever exists on either x or y. Thus, we are guaranteed that there is never a situation where a address matches multiple path steps.

I find your approach easier to explain in documentation on cmp.Equal, but for the actual implementation, I actually prefer my approach as it only cares about the pointers themselves and not the path depth.

There are some nuanced differences though:

  • It is not enough to only use the slice pointer for cycle detection. Broadly speaking, a slice is a tuple of {pointer, length, capacity}. Thus, suppose you have a []T at address R of length N, it is not enough to only check R. You need to check R, R+1*sizeof(T), R+2*sizeof(T), ..., R+(N-1)*sizeof(T). The need to check every element was what caused me to be afraid that it wouldn't work well with the diffing algorithm. However, I've come to convince myself that the push/pop nature makes it okay.
  • It's incorrect to only store uintptr in the map. This assumes that the GC implementation does not use a moving collector. Instead, unsafe.Pointer should be used. I was surprised to discover that I had actually made value.Format handle cyclic values. However, that logic needed to be updated to use the correct pointer representation.
  • There is no need to check Arrays as arrays do not inherently have a pointer in them (unless converted to a slice).
  • My implementation uses typed pointers instead of untyped ones. This is a minor distinction, but handles the case where someone type casts a pointer to a different type. A cycle can still exist in that situation, but we should delay the detection of the cycle until later. In fact, the slice situation above is a case where you can have two identical address of different types (e.g., a []T and the first element have the same address but different types).

@posener
Copy link
Author

posener commented Mar 27, 2018

@dsnet nice code!

Good points that you made, and the code looks good.

It is not enough to only use the slice pointer for cycle detection. Broadly speaking, a slice is a tuple of {pointer, length, capacity}. Thus, suppose you have a []T at address R of length N, it is not enough to only check R. You need to check R, R+1sizeof(T), R+2sizeof(T), ..., R+(N-1)*sizeof(T). The need to check every element was what caused me to be afraid that it wouldn't work well with the diffing algorithm. However, I've come to convince myself that the push/pop nature makes it okay.

I'm not sure you are correct.
After comparing the slice pointer, you anyway traverse through the slice elements and check them. If they are also of a slice type, then you make the check that you are talking about there. Am I missing anything?

Another thing: I understand that you want this to be a single commit in the repo. But - I would appreciate if you leave my commits and push your changes on top (I can squash them to a single commit if it helps).

@dsnet
Copy link
Collaborator

dsnet commented Mar 27, 2018

Am I missing anything?

Consider the following examples.

Example 1:

v := reflect.ValueOf(make([]int32, 4))
PrintPointer(v)                 // 0x10414020 []int32
PrintPointer(v.Index(0).Addr()) // 0x10414020 *int32
PrintPointer(v.Index(1).Addr()) // 0x10414024 *int32
PrintPointer(v.Index(2).Addr()) // 0x10414028 *int32
PrintPointer(v.Index(3).Addr()) // 0x1041402c *int32

Notice from this snippet that the elements of a slice are functionally pointers to them from the parent slice. For this reason, the "address" in the slice is equal to the address of the first element.

Example 2:

v := reflect.ValueOf(make([]int32, 4))
PrintPointer(v.Slice(0, 0)) // 0x10414020 []int32
PrintPointer(v.Slice(0, 1)) // 0x10414020 []int32
PrintPointer(v.Slice(0, 2)) // 0x10414020 []int32
PrintPointer(v.Slice(0, 3)) // 0x10414020 []int32
PrintPointer(v.Slice(0, 4)) // 0x10414020 []int32

Notice from this snippet that all 5 sub-slices (i.e., v[:0], v[:1], v[:2], etc) all have the same address, but obviously different values. This actually shows that it is incorrect to use the slice pointer for equality.

Consider the following:

v := make([]int32, 4)
v0 := v[:0]
v1 := v[:1]
p0 := (*reflect.SliceHeader)(unsafe.Pointer(&v0)).Data // 0x10414020
p1 := (*reflect.SliceHeader)(unsafe.Pointer(&v1)).Data // 0x10414020

reflect.DeepEqual(v0, v1) // false
p0 == p1                  // true

Just because p0 == p1, that does not necessarily imply that v0 == v1.

Another thing: I understand that you want this to be a single commit in the repo. But - I would appreciate if you leave my commits and push your changes on top (I can squash them to a single commit if it helps).

I'm not sure I understand. My PR in #85 is an entirely different approach with no shared code.

dsnet added a commit that referenced this pull request Mar 27, 2018
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

In addition to adding cycle detection, we fix some other issues:
* Fix formatting of cyclic data structures. It is not safe simply storing
an uintptr as this assumes the runtime never has a moving GC. Instead, we should
use unsafe.Pointer, which the GC knows how to scan.
* Since AppEngine cannot use unsafe, we continue to use uintptr in the "purego"
environment. The pointer package creates an abstraction over opaque pointers.

Appreciation goes to Eyal Posener, who proposed a different (but semantically
equivalent) approach in #79.

Fixes #74
@posener
Copy link
Author

posener commented Mar 28, 2018

I'm not sure I understand. My PR in #85 is an entirely different approach with no shared code.

Sorry we are getting into this, and with all the respect, I can see a lot of similarities in the code and concepts. Also, quoting you from #74:

Properly supporting cyclic data structures is very challenging. Even though I added that TODO, over time I've become convinced that cmp shouldn't support cyclic structures by default in order to avoid further complicating its logic.
If someone has a good design for supporting graph equality (and fits in well with the ecosystem of cmp.Option), I'll consider it. For now, I have no intention of adding this functionality.

I think I deserve a credit here.

@dsnet
Copy link
Collaborator

dsnet commented Mar 28, 2018

I think I deserve a credit here.

FYI. You are already mentioned in the PR description of #85:

Appreciation goes to Eyal Posener, who proposed a different (but semantically
equivalent) approach in #79.

@posener posener closed this Mar 28, 2018
dsnet added a commit that referenced this pull request Mar 28, 2018
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

In addition to adding cycle detection, we fix some other issues:
* Fix formatting of cyclic data structures. It is not safe simply storing
an uintptr as this assumes the runtime never has a moving GC. Instead, we should
use unsafe.Pointer, which the GC knows how to scan.
* Since AppEngine cannot use unsafe, we continue to use uintptr in the "purego"
environment. The pointer package creates an abstraction over opaque pointers.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspirition.

Fixes #74
dsnet added a commit that referenced this pull request Mar 28, 2018
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

In addition to adding cycle detection, we fix some other issues:
* Fix formatting of cyclic data structures. It is not safe simply storing
an uintptr as this assumes the runtime never has a moving GC. Instead, we should
use unsafe.Pointer, which the GC knows how to scan.
* Since AppEngine cannot use unsafe, we continue to use uintptr in the "purego"
environment. The pointer package creates an abstraction over opaque pointers.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspirition.

Fixes #74
dsnet added a commit that referenced this pull request Mar 28, 2018
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspiration.

Fixes #74
dsnet added a commit that referenced this pull request Mar 28, 2018
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspiration.

Fixes #74
dsnet added a commit that referenced this pull request Dec 16, 2019
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspiration.

Fixes #74
dsnet added a commit that referenced this pull request Dec 16, 2019
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspiration.

Fixes #74
dsnet added a commit that referenced this pull request Dec 16, 2019
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspiration.

Fixes #74
dsnet added a commit that referenced this pull request Dec 16, 2019
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspiration.

Fixes #74
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants