Cycles detection: use cycles map of addresses to stack depth #79

posener · 2018-03-18T19:07:32Z

Enable cycles detection by default.
This solution gives O(1) additional complexity for detecting cycles.

Fixes #74
Instead of #78

The current comparison fails on stack overflow when given a cyclic struct (such as a cyclic linked list for example). This fix adds a new option: DecetctCycles, which finds cycles by adding information to the path stack, when iterating over pointers. When a cycle is checked (every time a pointer kind is compared) the path stack is iterated to find cycles. The comparison of cycles is by they length. Fixes: google#74

Enable cycles detection by default. This solution gives O(1) additional complexity for detecting cycles.

dsnet · 2018-03-21T22:29:31Z

Thanks for prototyping this.

I have some high-level thoughts:

I don't see the rationale for how the path-depth is the correct way to determine graph equality.
We need to be extra careful since compareArray calls diff.Difference which may need to compare arbitrary elements with each other. I noticed that you push and pop from your pointer map, so that this may not be an issue, but it does require careful thought and review.
It's insufficient to only check pointers. Cycles can occur through slices and maps (and possibly interfaces, but that may be a special case of one of the other 3). See this example.
The reporter logic needs to be cycle tolerant. In the event of a inequality, the formatting logic needs to handle printing a cyclic data structure. Currently, the formatter will get into an infinite recursive loop. A reasonable output would to be just print the type and pointer.

dsnet · 2018-03-22T01:27:36Z

An additional thought. The algorithm must be agnostic to traversal order. That is, the result of cmp.Equal should be identical regardless of whether the values are traversed by depth-first-search or breadth-first-search or some other order. I haven't studied your approach enough to determine whether this is the case or not.

posener · 2018-03-22T09:36:14Z

Thanks for the review,

1 .I don't see the rationale for how the path-depth is the correct way to determine graph equality.

This is the way to make difference between a 1 step cycle to a 2 step cycle to a 3 step cycle, as you gave in your example. So since it is a DFS we are doing, The stack length "represents" the length of the cycle - it was more accurate if we had a counter that counts only the pointer references (or the other case you mentioned) but I think that it is not necessary and the stack size does a good job for this comparison.

We need to be extra careful since compareArray calls diff.Difference which may need to compare arbitrary elements with each other. I noticed that you push and pop from your pointer map, so that this may not be an issue, but it does require careful thought and review.

I must push and pop: for example, a diamond shape graph will be detected as a cycle without popping elements in a DFS traverse.

It's insufficient to only check pointers. Cycles can occur through slices and maps (and possibly interfaces, but that may be a special case of one of the other 3). See this example.

Really nice example! I'll add hooks also for those kind of cycles too so we can detect them!

The reporter logic needs to be cycle tolerant. In the event of a inequality, the formatting logic needs to handle printing a cyclic data structure. Currently, the formatter will get into an infinite recursive loop. A reasonable output would to be just print the type and pointer.

Can you please explain when I will meet this? in the current implementation it looks to works fine.

An additional thought. The algorithm must be agnostic to traversal order. That is, the result of cmp.Equal should be identical regardless of whether the values are traversed by depth-first-search or breadth-first-search or some other order.

Why should it? Currently we are doing a DFS and it works with that. The diff algorithm also saves a "stack" which is a DFS struct, and relies on the path to be correct by pushing and popping elements.

I haven't studied your approach enough to determine whether this is the case or not.

Do you want me to elaborate on the solution? create some sort of a doc for it? maybe add it to the cycles.go file?

dsnet · 2018-03-27T07:06:55Z

In order to reason through your PR, I took a very different approach. See #85 for my WIP.

My approach uses two maps of type map[pointer]pointer. For one map, the key is a pointer from x, and the value is the associated pointer from y. The other map is the opposite. For every pointer, we check both maps for whether any of the corresponding pointers have already been visited. If either side has a cycle, we report equal only if both pointers were first encountered together as a pair.

What's interesting is that my approach and yours are semantically equivalent. My definition of equality is if "both pointers were first encountered together as a pair". In order for this to be true, they must exist in the same path step, in which case they have the same depth. In fact, I think "depth" could be better thought as the index into the the Path for the corresponding PathStep for that given pointer dereference. Furthermore, both of our approaches prevent further recursion if a cycle ever exists on either x or y. Thus, we are guaranteed that there is never a situation where a address matches multiple path steps.

I find your approach easier to explain in documentation on cmp.Equal, but for the actual implementation, I actually prefer my approach as it only cares about the pointers themselves and not the path depth.

There are some nuanced differences though:

It is not enough to only use the slice pointer for cycle detection. Broadly speaking, a slice is a tuple of {pointer, length, capacity}. Thus, suppose you have a []T at address R of length N, it is not enough to only check R. You need to check R, R+1*sizeof(T), R+2*sizeof(T), ..., R+(N-1)*sizeof(T). The need to check every element was what caused me to be afraid that it wouldn't work well with the diffing algorithm. However, I've come to convince myself that the push/pop nature makes it okay.
It's incorrect to only store uintptr in the map. This assumes that the GC implementation does not use a moving collector. Instead, unsafe.Pointer should be used. I was surprised to discover that I had actually made value.Format handle cyclic values. However, that logic needed to be updated to use the correct pointer representation.
There is no need to check Arrays as arrays do not inherently have a pointer in them (unless converted to a slice).
My implementation uses typed pointers instead of untyped ones. This is a minor distinction, but handles the case where someone type casts a pointer to a different type. A cycle can still exist in that situation, but we should delay the detection of the cycle until later. In fact, the slice situation above is a case where you can have two identical address of different types (e.g., a []T and the first element have the same address but different types).

posener · 2018-03-27T07:39:05Z

@dsnet nice code!

Good points that you made, and the code looks good.

It is not enough to only use the slice pointer for cycle detection. Broadly speaking, a slice is a tuple of {pointer, length, capacity}. Thus, suppose you have a []T at address R of length N, it is not enough to only check R. You need to check R, R+1sizeof(T), R+2sizeof(T), ..., R+(N-1)*sizeof(T). The need to check every element was what caused me to be afraid that it wouldn't work well with the diffing algorithm. However, I've come to convince myself that the push/pop nature makes it okay.

I'm not sure you are correct.
After comparing the slice pointer, you anyway traverse through the slice elements and check them. If they are also of a slice type, then you make the check that you are talking about there. Am I missing anything?

Another thing: I understand that you want this to be a single commit in the repo. But - I would appreciate if you leave my commits and push your changes on top (I can squash them to a single commit if it helps).

dsnet · 2018-03-27T20:38:50Z

Am I missing anything?

Consider the following examples.

Example 1:

v := reflect.ValueOf(make([]int32, 4))
PrintPointer(v)                 // 0x10414020 []int32
PrintPointer(v.Index(0).Addr()) // 0x10414020 *int32
PrintPointer(v.Index(1).Addr()) // 0x10414024 *int32
PrintPointer(v.Index(2).Addr()) // 0x10414028 *int32
PrintPointer(v.Index(3).Addr()) // 0x1041402c *int32

Notice from this snippet that the elements of a slice are functionally pointers to them from the parent slice. For this reason, the "address" in the slice is equal to the address of the first element.

Example 2:

v := reflect.ValueOf(make([]int32, 4))
PrintPointer(v.Slice(0, 0)) // 0x10414020 []int32
PrintPointer(v.Slice(0, 1)) // 0x10414020 []int32
PrintPointer(v.Slice(0, 2)) // 0x10414020 []int32
PrintPointer(v.Slice(0, 3)) // 0x10414020 []int32
PrintPointer(v.Slice(0, 4)) // 0x10414020 []int32

Notice from this snippet that all 5 sub-slices (i.e., v[:0], v[:1], v[:2], etc) all have the same address, but obviously different values. This actually shows that it is incorrect to use the slice pointer for equality.

Consider the following:

v := make([]int32, 4)
v0 := v[:0]
v1 := v[:1]
p0 := (*reflect.SliceHeader)(unsafe.Pointer(&v0)).Data // 0x10414020
p1 := (*reflect.SliceHeader)(unsafe.Pointer(&v1)).Data // 0x10414020

reflect.DeepEqual(v0, v1) // false
p0 == p1                  // true

Just because p0 == p1, that does not necessarily imply that v0 == v1.

Another thing: I understand that you want this to be a single commit in the repo. But - I would appreciate if you leave my commits and push your changes on top (I can squash them to a single commit if it helps).

I'm not sure I understand. My PR in #85 is an entirely different approach with no shared code.

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. In addition to adding cycle detection, we fix some other issues: * Fix formatting of cyclic data structures. It is not safe simply storing an uintptr as this assumes the runtime never has a moving GC. Instead, we should use unsafe.Pointer, which the GC knows how to scan. * Since AppEngine cannot use unsafe, we continue to use uintptr in the "purego" environment. The pointer package creates an abstraction over opaque pointers. Appreciation goes to Eyal Posener, who proposed a different (but semantically equivalent) approach in #79. Fixes #74

posener · 2018-03-28T06:35:19Z

I'm not sure I understand. My PR in #85 is an entirely different approach with no shared code.

Sorry we are getting into this, and with all the respect, I can see a lot of similarities in the code and concepts. Also, quoting you from #74:

Properly supporting cyclic data structures is very challenging. Even though I added that TODO, over time I've become convinced that cmp shouldn't support cyclic structures by default in order to avoid further complicating its logic.
If someone has a good design for supporting graph equality (and fits in well with the ecosystem of cmp.Option), I'll consider it. For now, I have no intention of adding this functionality.

I think I deserve a credit here.

dsnet · 2018-03-28T06:39:09Z

I think I deserve a credit here.

FYI. You are already mentioned in the PR description of #85:

Appreciation goes to Eyal Posener, who proposed a different (but semantically
equivalent) approach in #79.

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. In addition to adding cycle detection, we fix some other issues: * Fix formatting of cyclic data structures. It is not safe simply storing an uintptr as this assumes the runtime never has a moving GC. Instead, we should use unsafe.Pointer, which the GC knows how to scan. * Since AppEngine cannot use unsafe, we continue to use uintptr in the "purego" environment. The pointer package creates an abstraction over opaque pointers. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspirition. Fixes #74

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. In addition to adding cycle detection, we fix some other issues: * Fix formatting of cyclic data structures. It is not safe simply storing an uintptr as this assumes the runtime never has a moving GC. Instead, we should use unsafe.Pointer, which the GC knows how to scan. * Since AppEngine cannot use unsafe, we continue to use uintptr in the "purego" environment. The pointer package creates an abstraction over opaque pointers. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspirition. Fixes #74

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspiration. Fixes #74

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspiration. Fixes #74

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspiration. Fixes #74

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspiration. Fixes #74

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspiration. Fixes #74

@posener

Previously, trying to call Equal on a graph would result in a stack-overflow due to infinite recursion traversing cycles on a graph. While a vast majority of Go values are trees or acyclic graphs, there exist a small number of cases where graph equality is required. As such, we add cycle detection to Equal and define what it means for two graphs to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be equal so long any cycle were encountered, we require two graphs to have equivalent graph structures. Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2, E1 == E2, and both have the same root vertex (entry point into the graph). When traversing G1 and G2, we remember a stack of previously visited edges ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that a cycle exists. The graphs have the same structure when the previously encountered edge ep1 and ep2 were encountered together. Note that edges and vertices unreachable from the root vertex are ignored. Appreciation goes to Eyal Posener (@posener), who proposed a different (but semantically equivalent) approach in #79, which served as inspiration. Fixes #74

posener mentioned this pull request Mar 18, 2018

Add DetectCycles to detect cycles in compared structs #78

Closed

posener force-pushed the detect-cycles2 branch 2 times, most recently from 1d7f5f3 to ddbc44a Compare March 19, 2018 04:25

Cycles detection: use cycles map of addresses to stack depth

22b1483

Enable cycles detection by default. This solution gives O(1) additional complexity for detecting cycles.

posener force-pushed the detect-cycles2 branch from ddbc44a to 22b1483 Compare March 19, 2018 04:26

posener mentioned this pull request Mar 21, 2018

Use go-cmp.Equal instead of reflect.DeepEqual stretchr/testify#579

Open

posener added 3 commits March 23, 2018 14:46

Use map size instead of stack length

729e4e9

Treat different cases of cycles: self pointing arrays, slices and maps.

2f97504

cycles: add diamond trees tests

70980f2

dsnet mentioned this pull request Mar 27, 2018

Add support for comparing graphs #85

Merged

posener closed this Mar 28, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cycles detection: use cycles map of addresses to stack depth #79

Cycles detection: use cycles map of addresses to stack depth #79

posener commented Mar 18, 2018

dsnet commented Mar 21, 2018

dsnet commented Mar 22, 2018

posener commented Mar 22, 2018

dsnet commented Mar 27, 2018 •

edited

Loading

posener commented Mar 27, 2018

dsnet commented Mar 27, 2018 •

edited

Loading

posener commented Mar 28, 2018

dsnet commented Mar 28, 2018 •

edited

Loading

Cycles detection: use cycles map of addresses to stack depth #79

Cycles detection: use cycles map of addresses to stack depth #79

Conversation

posener commented Mar 18, 2018

dsnet commented Mar 21, 2018

dsnet commented Mar 22, 2018

posener commented Mar 22, 2018

dsnet commented Mar 27, 2018 • edited Loading

posener commented Mar 27, 2018

dsnet commented Mar 27, 2018 • edited Loading

posener commented Mar 28, 2018

dsnet commented Mar 28, 2018 • edited Loading

dsnet commented Mar 27, 2018 •

edited

Loading

dsnet commented Mar 27, 2018 •

edited

Loading

dsnet commented Mar 28, 2018 •

edited

Loading