Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(graph-sequencer): add package @pnpm/graph-sequencer #7168

Merged
merged 6 commits into from
Oct 15, 2023

Conversation

ahaoboy
Copy link
Contributor

@ahaoboy ahaoboy commented Oct 7, 2023

support Subgraph Sorting for Improved Performance

Faster than https://github.com/jamiebuilds/graph-sequencer
image

Subgraph Sort

In scenarios where we have a large graph but only need to execute commands on a few specific nodes, the graphSequencer function now offers enhanced support. The includeNodes parameter allows us to limit the result to only the nodes we're interested in.

graphSequencer(new Map([
    [0, [1]],
    [1, [2]],
    [2, [3]],
    [3, [0]],
]), [0, 1, 2])

For instance, if our graph is a big cycle but our subgraph is a simple linked list, we can now safely execute commands on the selected nodes.

{
    safe: true,
    chunks: [[2], [1], [0]],
    cycles: [],
}

Improved Behavior

In the previous behavior of the graphSequencer function, there were some peculiarities when handling single-node cycles.

graphSequencer({
  graph: new Map([
    ['a', ['a']],
    ['b', ['b']],
    ['c', ['c']],
  ]),
  groups: ['a', 'b', 'c']
})

This resulted in issues when dealing with single-node cycles, as demonstrated by the following output:

{
  safe: false,
  chunks: [ [ 'a' ], [ 'b' ], [ 'c' ] ],
  cycles: [
    [ 'a' ], [ 'a', 'a' ],
    [ 'b' ], [ 'b', 'b' ],
    [ 'c' ], [ 'c', 'c' ],
    [ 'b' ], [ 'c' ],
    [ 'c' ]
  ]
}

Now, with the improved behavior, each single-node cycle is treated independently, ensuring safe parallel execution:

{
  safe: true,
  chunks: [ [ 'a', 'b', 'c' ] ],
  cycles: [ [ 'a' ], [ 'b' ], [ 'c' ] ]
}

@ahaoboy ahaoboy requested a review from zkochan as a code owner October 7, 2023 07:30
@welcome
Copy link

welcome bot commented Oct 7, 2023

💖 Thanks for opening this pull request! 💖
Please be patient and we will get back to you as soon as we can.

@ahaoboy ahaoboy force-pushed the perf/sort-graph branch 2 times, most recently from 380a7a6 to a8bee9b Compare October 7, 2023 07:49
@zkochan
Copy link
Member

zkochan commented Oct 7, 2023

Make the change here https://github.com/pnpm/graph-sequencer instead of moving the project to the monorepo. This way I can't understand what the changes were.

@ahaoboy
Copy link
Contributor Author

ahaoboy commented Oct 9, 2023

This PR represents a reimplementation of the old repository, addressing previously unsupported scenarios and resolving issues related to unexpected behavior. It also includes performance enhancements. The changes have been successfully validated against the original tests and additional test cases. The reason for not submitting this PR to the old repository is due to its lack of support for TypeScript (ts), linting, and other modern tooling. Should I limit my modifications to the algorithmic aspects within the framework of the original repository?

Make the change here https://github.com/pnpm/graph-sequencer instead of moving the project to the monorepo. This way I can't understand what the changes were.

@zkochan
Copy link
Member

zkochan commented Oct 9, 2023

The tests are failing

@ahaoboy
Copy link
Contributor Author

ahaoboy commented Oct 10, 2023

@zkochan done

workspace/graph-sequencer/src/index.ts Outdated Show resolved Hide resolved
graph,
groups: [keys],
})
return graphSequencer(graph, keys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why has the API of the function changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the older version, the 'groups' parameter more like a concept of priority. In the case of cycles within the graph, if the nodes within a cycle have different execution priorities, it is safe to execute that cycle in the order of priority. For example:

graphSequencer({
  graph: new Map([
    ['a', ['b']],
    ['b', ['c']],
    ['c', ['a']],
  ]),
  groups: [['a', 'b', 'c']]
})

If we use the same priority, the cycle is unsafe:

{
  safe: false,
  chunks: [ [ 'a' ], [ 'c' ], [ 'b' ] ],
  cycles: [ [ 'a', 'b', 'c' ] ]
}

However, if we use different priorities:

  groups: [['a'], ['b', 'c']]

The execution becomes safe:

{ safe: true, chunks: [ [ 'a' ], [ 'c' ], [ 'b' ] ], cycles: [] }

Currently, pnpm does not have the concept of priorities. All nodes passed as parameters have the same priority when called. Priority is not supported at the moment, but support for subgraphs (currently also requiring all nodes to be passed) is available. If you wish to add priorities in the future, it's a straightforward process - you just need to implement priority checking when identifying cycles.

But we really want 'a' to be executed first.
*/
setOfKeys.has(d))]
d => d !== pkgPath && setOfKeys.has(d))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why were the comments removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because both scenarios are now supported

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still retained d => d !== pkgPath && setOfKeys.has(d))], this is because it reduces the number of unnecessary cycles and handles external nodes. If the 'includeNodes' parameter contains nodes that are not within the graph, we cannot determine whether to discard them or treat them as separate isolated nodes. This decision should be made by the caller. If they should be discarded, they should be removed from 'includeNodes,' and if they are to be considered as standalone nodes within the graph, they should be added to the graph.

workspace/graph-sequencer/test/index.ts Outdated Show resolved Hide resolved
Comment on lines 96 to 110
test('graph with multiple dependencies on one item', () => {
expect(graphSequencer(new Map([
[0, [3]],
[1, [3]],
[2, []],
[3, []],
]), [0, 1, 2, 3])).toStrictEqual(
{
safe: true,
chunks: [[2, 3], [0, 1]],
cycles: [],
}
)
})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to your changes but I guess our script runners are not as affective as they should be at the moment.

We don't really need to wait for both 2 and 3 to finish before running 0 an 1. We only need to wait for 3 to finish and then we can continue with 0 and 1 while 2 might be still in progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior is the same as the old version

graphSequencer({
  graph: new Map([
    ['a', ['c']],
    ['b', ['c']],
    ['c', []],
    ['d', []],
  ]),
  groups: [['a', 'b', 'c','d']]
})
// { safe: true, chunks: [ [ 'c', 'd' ], [ 'a', 'b' ] ], cycles: [] }

workspace/graph-sequencer/test/index.ts Outdated Show resolved Hide resolved
workspace/graph-sequencer/test/index.ts Outdated Show resolved Hide resolved
@ahaoboy
Copy link
Contributor Author

ahaoboy commented Oct 15, 2023

@zkochan All done, PTAL

CONTRIBUTING.md Outdated
@@ -36,7 +36,7 @@ sudo dnf install make automake gcc gcc-c++ kernel-devel
You can run the tests of the project that you modified by going to the project's directory and running:

```shell
pnpm test
pnpm test-main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be reverted.

Copy link
Contributor Author

@ahaoboy ahaoboy Oct 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no npm command named 'test' in root project

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not about the root project.

The text says "by going to the project's directory"

@zkochan
Copy link
Member

zkochan commented Oct 15, 2023

Let's move the graph-sequencer project from the workspace to the deps directory. Also, rename it to @pnpm/deps.graph-sequencer.

['c', []],
['d', ['a']],
['e', ['a', 'b', 'c']],
]), ['a', 'b', 'c', 'd', 'e'])).toStrictEqual(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make the second argument optional if all nodes are included?

)
})

test('graph with two self cycles and a edge link them', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test('graph with two self cycles and a edge link them', () => {
test('graph with two self cycles and an edge linking them', () => {

)
})

test('graph with connected to each other but not a cycle', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test('graph with connected to each other but not a cycle', () => {
test('graph with nodes connected to each other sequentially without forming a cycle', () => {

)
})

test('graph with 5 nodes and we just need 3 nodes', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test('graph with 5 nodes and we just need 3 nodes', () => {
test('graph sequencing with a subset of 3 nodes, ignoring 2 nodes, in a 5-node graph', () => {

)
})

test('graph with no edges and we need subgraph', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test('graph with no edges and we need subgraph', () => {
test('graph of isolated nodes with no edges, sequencing a subgraph of selected nodes', () => {

)
})

test('graph with resolved cycle subgraph', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test('graph with resolved cycle subgraph', () => {
test('graph with a cycle, but sequencing a subgraph that avoids the cycle', () => {

)
})

test('graph with full conn', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use "full conn" in any test. Use "fully connected"

Suggested change
test('graph with full conn', () => {
test('graph with fully connected subgraph and additional connected node', () => {

)
})

test('graph with self cycle', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test('graph with self cycle', () => {
test('graph with self-cycle', () => {

@@ -0,0 +1,40 @@
{
"name": "@pnpm/deps.graph-sequencer",
"version": "5.0.4",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"version": "5.0.4",
"version": "0.0.0",

@@ -0,0 +1,10 @@
---
"@pnpm/plugin-commands-rebuild": minor
"@pnpm/deps.graph-sequencer": minor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"@pnpm/deps.graph-sequencer": minor
"@pnpm/deps.graph-sequencer": major

* @param {T[]} includeNodes - An array of nodes that should be included in the sorting process. Other nodes will be ignored.
* @returns {Result<T>} An object containing the result of the sorting, including safe, chunks, and cycles.
*/
export function graphSequencer<T> (graph: Graph<T>, includeNodes: T[] = [...graph.keys()]): Result<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export function graphSequencer<T> (graph: Graph<T>, includeNodes: T[] = [...graph.keys()]): Result<T> {
export function graphSequencer<T> (graph: Graph<T>, includedNodes: T[] = [...graph.keys()]): Result<T> {

@zkochan zkochan merged commit 4246f41 into pnpm:main Oct 15, 2023
8 checks passed
@welcome
Copy link

welcome bot commented Oct 15, 2023

Congrats on merging your first pull request! 🎉🎉🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants