Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add `Graph.contractNodes` #1380

Merged
merged 1 commit into from Sep 18, 2019
Merged

add `Graph.contractNodes` #1380

merged 1 commit into from Sep 18, 2019

Conversation

@decentralion
Copy link
Member

decentralion commented Sep 16, 2019

This commit adds Graph.contract, which allows collapsing certain
nodes in the graph into each other. This will enable the creation of a
SourceCred "identity" plugin, allowing identity resolution between users
different accounts on different services.

Test plan: Thorough unit tests have been added. yarn test passes.

Thanks to @wchargin for review feedback which significantly
improved this API.

@decentralion decentralion requested review from Beanow and wchargin Sep 16, 2019
src/core/graph.js Outdated Show resolved Hide resolved
src/core/graph.js Outdated Show resolved Hide resolved
* needed, we can improve the peformance by mutating the original graph
* instead of creating a new one.
*/
combineNodes(combinations: Map<NodeAddressT, NodeAddressT>): Graph {

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 17, 2019

Member

I’d prefer to call this contractNodes so that the meaning is
immediately obvious. If you object for some reason, the doc comment
should at least mention contraction for greppability.

* graph with just nodes B and C. The edges in the graph are re-written in concert;
* for example, an edge from A to B would become a loop edge from B to itself.
*
* combineNodes' motivating use cases is collapsing different user identities together.

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 17, 2019

Member

nit: s/cases/case/

* combineNodes' motivating use cases is collapsing different user identities together.
* For example, if a user is represented on both GitHub and Discourse, we can create
* a new node that represents that identity, and collapse their GitHub and Discourse user
* nodes into it. (If we chose, we could also collapse the GitHub node into the Discourse

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 17, 2019

Member

The form that always contracts into a new node rather than choosing one
of the nodes arbitrarily is certainly nicer in some ways. For example,
suppose that the GitHub plugin expects its authorship edges to always be
connected to users. If we contract arbitrarily, then there’s a “50/50
chance” that the GitHub plugin will break, and so an error might slink
by; if we always create a new node, we’ll be sure to discover the
problem immediately.

Was there any consideration for the signature

type Contraction = {|
  +old: $ReadOnlyArray<NodeAddressT>,
  +new: NodeAddressT,
|};

contractNodes(contractions: $ReadOnlyArray<Contraction>): Graph {
  // …
}

where new fresh?

This comment has been minimized.

Copy link
@decentralion

decentralion Sep 17, 2019

Author Member

With the caveat that it needs to be new: Node, this is actually a much nicer API. Specifically, in a followon commit I wrote a method with the signature:

/**
 * Outputs identity transformation information.
 *
 * This function takes a list of identities and (semi-optionally) a discourse
 * server url. The server url is required if any Discourse identities are
 * present.
 *
 * It returns the needed information for transforming the graph to have
 * consolidated identities. Specifically, it returns a list of identity nodes
 * that needed to be added to the graph, and a transformation map for use in
 * `graph.combineNodes`.
 */
export function transformation(
  identities: $ReadOnlyArray<Identity>,
  discourseUrl: string | null
): {|+nodes: Node[], +transforms: Map<NodeAddressT, NodeAddressT>|}

Using your API, the function can be called computeContractions and just output a list of contractions, which will be much cleaner to apply.

decentralion added a commit that referenced this pull request Sep 17, 2019
This commit adds Graph.contract, which allows collapsing certain
nodes in the graph into each other. This will enable the creation of a
SourceCred "identity" plugin, allowing identity resolution between users
different accounts on different services.

Test plan: Thorough unit tests have been added. `yarn test` passes.

Thanks to @wchargin for [review feedback][1] which significantly
improved this API.

[1]: #1380 (comment)
@decentralion decentralion force-pushed the combine-nodes branch from 82f86a7 to a93c5d3 Sep 17, 2019
@decentralion decentralion changed the title add `combineNodes` for collapsing nodes together add `contract` for collapsing nodes together Sep 17, 2019
@decentralion

This comment has been minimized.

Copy link
Member Author

decentralion commented Sep 17, 2019

I've refactored the code to take @wchargin's suggestion into account. The implementation is now simpler, and we no longer have unhandled edge cases. Goes to show how much it can help to have a better interface!

decentralion added a commit that referenced this pull request Sep 17, 2019
This commit adds Graph.contract, which allows collapsing certain
nodes in the graph into each other. This will enable the creation of a
SourceCred "identity" plugin, allowing identity resolution between users
different accounts on different services.

Test plan: Thorough unit tests have been added. `yarn test` passes.

Thanks to @wchargin for [review feedback][1] which significantly
improved this API.

[1]: #1380 (comment)
@decentralion decentralion force-pushed the combine-nodes branch from a93c5d3 to 3fa0898 Sep 17, 2019
@Beanow
Beanow approved these changes Sep 17, 2019
Copy link
Member

Beanow left a comment

As a vanilla JS guy, initially unnerved by the assumptions about const remap = new Map(); not having falsy destinations. But looks like it should be no problem :]

Copy link
Member

wchargin left a comment

Nice! I suspected that this would get rid of some of the edge cases,
too, though I hadn’t read the implementation carefully enough to be
sure. Glad to hear that it does.

for (const edge of this.edges({showDangling: true})) {
let {src, dst} = edge;
const srcRemap = remap.get(src);
if (srcRemap != null) {

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 17, 2019

Member
    for (const edge of this.edges({showDangling: true}) {
      const src = NullUtil.orElse(remap.get(src), edge.src);
      const dst = NullUtil.orElse(remap.get(dst), edge.dst);
      // …
    }

This comment has been minimized.

Copy link
@decentralion

decentralion Sep 17, 2019

Author Member

Right. I originally had that construction, except for a little while I had remap: Map<NodeAddressT, Node> which necessitated a different pattern. But now I can go back to that.

* improve the peformance by mutating the original graph instead of creating
* a new one.
*/
contract(contractions: $ReadOnlyArray<GraphContraction>): Graph {

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 17, 2019

Member

As written, it looks like new nodes are perimtted to already exist in
the graph as long as the data is consistent; is that intentional? (It’s
not documented.)

This comment has been minimized.

Copy link
@decentralion

decentralion Sep 17, 2019

Author Member

Yes, I think it's fine. I've updated the documentation to reflect this, and added test cases.

* improve the peformance by mutating the original graph instead of creating
* a new one.
*/
contract(contractions: $ReadOnlyArray<GraphContraction>): Graph {

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 17, 2019

Member

This doesn’t handle chained contractions, right? It looks like both

graph.contract([
  {old: [a.address], new: b},
  {old: [b.address], new: c},
]);

and

graph.contract([
  {old: [b.address], new: c},
  {old: [a.address], new: b},
]);

will turn a into two nodes b and c, with incident edges remapped
to b, and c an isolated vertex?

The semantics that you want, I think, are that you imagine applying the
contractions to the graph one at a time, but when you remap nodes and
edges in the graph you also remap later entries in the contractions
list. This conservatively extends the existing “take-last” semantics to
also handle chains.

This comment has been minimized.

Copy link
@decentralion

decentralion Sep 17, 2019

Author Member

I agree that the semantics you describe make sense. I spent at least 10 minutes thinking about how to get these semantics without either significantly complicating the implementation (keeping a backward map pointing every replacement to its origins), or making the runtime much worse O(n+e+k) -> O(nk + ek), and didn't find any good solutions. Since we don't actually have any need for chained contractions, I don't think it's worth the effort to implement and test the more complicated behavior. Therefore, I instead implemented logic to detect any chained contractions and throw an error.

I left a (commented) block of code with a test for the chained contractions, so if we decide we need it later, we'll have a bit of a head start, test-driven-development style. If you see an elegant way to implement contraction chaining, feel free to do so in a followon PR. (I considered skipping the test instead of commenting it, which would decrease the probability of code rot, but I feel like there's some benefit to having 0 skipped tests--it means that if we start seeing test skips in jest, it indicates some new kind of technical debt that we're accumulating.)

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 18, 2019

Member

The approach that comes immediately to mind is to use a union-find
structure to accumulate the connected components, plus an extra forward
pass to resolve roots to their final node names. This should be nearly
linear time (up to α(·)), but I’ll grant that it’s not straightforward
given that there’s no union-find structure in the “standard library”.

Detecting and failing sounds good to me.

.addEdge(edge("loop", a, a))
.addEdge(edge("dangle1", a, b))
.addEdge(edge("dangle2", b, a))
.contract([{old: [a.address], replacement: c}]);

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 17, 2019

Member

Maybe also test a normal (non-dangling, non-loop) edge, in- and out-?

This comment has been minimized.

Copy link
@decentralion

decentralion Sep 17, 2019

Author Member

Done.

decentralion added a commit that referenced this pull request Sep 17, 2019
This commit adds Graph.contract, which allows collapsing certain
nodes in the graph into each other. This will enable the creation of a
SourceCred "identity" plugin, allowing identity resolution between users
different accounts on different services.

Test plan: Thorough unit tests have been added. `yarn test` passes.

Thanks to @wchargin for [review feedback][1] which significantly
improved this API.

[1]: #1380 (comment)
@decentralion decentralion force-pushed the combine-nodes branch from 3fa0898 to 2c86798 Sep 17, 2019
decentralion added a commit that referenced this pull request Sep 17, 2019
This commit adds Graph.contract, which allows collapsing certain
nodes in the graph into each other. This will enable the creation of a
SourceCred "identity" plugin, allowing identity resolution between users
different accounts on different services.

Test plan: Thorough unit tests have been added. `yarn test` passes.

Thanks to @wchargin for [review feedback][1] which significantly
improved this API.

[1]: #1380 (comment)
@decentralion decentralion force-pushed the combine-nodes branch from 2c86798 to f3fa8a6 Sep 17, 2019
* improve the peformance by mutating the original graph instead of creating
* a new one.
*/
contract(contractions: $ReadOnlyArray<GraphContraction>): Graph {

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 18, 2019

Member

The approach that comes immediately to mind is to use a union-find
structure to accumulate the connected components, plus an extra forward
pass to resolve roots to their final node names. This should be nearly
linear time (up to α(·)), but I’ll grant that it’s not straightforward
given that there’s no union-find structure in the “standard library”.

Detecting and failing sounds good to me.

* improve the peformance by mutating the original graph instead of creating
* a new one.
*/
contract(contractions: $ReadOnlyArray<GraphContraction>): Graph {

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 18, 2019

Member

nit: You might consider contractNodes and NodeContraction, as
without further qualification contraction generally refers to an
operation on a single edge, but it’s clear enough either way.

This comment has been minimized.

Copy link
@decentralion

decentralion Sep 18, 2019

Author Member

Done.

@@ -209,6 +209,15 @@ type CachedOrder = {|
+modificationCount: number,
|};

/**
* Specifies how to contract the graph, collapsing several old nodes

This comment has been minimized.

Copy link
@wchargin

wchargin Sep 18, 2019

Member

s/the graph/a graph/: This is a type definition; there is no “this
context.

This comment has been minimized.

Copy link
@decentralion

decentralion Sep 18, 2019

Author Member

Done.

This commit adds Graph.contractNodes, which allows collapsing certain
nodes in the graph into each other. This will enable the creation of a
SourceCred "identity" plugin, allowing identity resolution between users
different accounts on different services.

Test plan: Thorough unit tests have been added. `yarn test` passes.

Thanks to @wchargin for [review feedback][1] which significantly
improved this API.

[1]: #1380 (comment)
@decentralion decentralion force-pushed the combine-nodes branch from f3fa8a6 to 7c88a95 Sep 18, 2019
@decentralion decentralion merged commit ac8ac70 into master Sep 18, 2019
1 of 2 checks passed
1 of 2 checks passed
ci/circleci: publish-1 CircleCI is running your tests
Details
ci/circleci: test Your tests passed on CircleCI!
Details
@decentralion decentralion deleted the combine-nodes branch Sep 18, 2019
@decentralion decentralion changed the title add `contract` for collapsing nodes together add `Graph.contractNodes` Sep 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.