Store graph edges in SharedArrayBuffer #6922

Merged 134 commits into v2 on Nov 16, 2021
Conversation

@lettertwo (Member) commented Sep 14, 2021


This changes the low level storage mechanism for graph adjacency lists from multiple JS Maps to a custom hashmap (😅 ) implemented on top of SharedArrayBuffer.


Background: Why would we want to implement our own hashmap?

Parallelism

Parcel has the ability to run many operations in parallel, but Node has no way to share data between threads except by serializing it and sending it to each worker, where it must be deserialized before work commences. Even for modestly sized Parcel projects, this overhead is significant.

JavaScript does have support for sharing memory between threads via SharedArrayBuffer, but unfortunately, there is currently nothing higher level (like a SharedMap or something) in JS or Node that can take advantage of this in a significant way.

Thus, having a custom data structure built on top of SharedArrayBuffer means that Parcel can build a very large graph and share some significant portion of that graph with workers for free.
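
As a minimal sketch of what that sharing looks like in Node (using worker_threads directly; Parcel's actual worker API differs):

```js
// Minimal sketch: a Uint32Array view over a SharedArrayBuffer is visible
// to a worker without copying or serializing the underlying memory.
const {Worker, isMainThread, workerData} = require('worker_threads');

if (isMainThread) {
  const shared = new SharedArrayBuffer(1024 * 4); // room for 1024 uint32s
  const edges = new Uint32Array(shared);
  edges[0] = 42; // write an edge record on the main thread
  // Only a reference to the buffer is sent; the memory itself is shared.
  new Worker(__filename, {workerData: shared});
} else {
  const edges = new Uint32Array(workerData);
  console.log(edges[0]); // 42, with no serialization round trip
}
```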

Caching

Parcel caches build state frequently, which involves serializing data to be written to the cache. An ArrayBuffer of data serializes more quickly than a Map.
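
For example (a hypothetical comparison, not Parcel's cache code), a typed array's backing buffer can be written out as raw bytes, while a Map must be walked and encoded entry by entry:

```js
const v8 = require('v8');

// A Map has to be traversed and every entry encoded.
const asMap = new Map([[1, 2], [3, 4]]);
const mapBytes = v8.serialize(asMap);

// A typed array is already a contiguous run of bytes; "serializing" it
// can be a zero-copy view over its backing buffer.
const asArray = new Uint32Array([1, 2, 3, 4]);
const arrayBytes = Buffer.from(asArray.buffer);
```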

Other Solutions Considered

We evaluated other options, such as Protocol Buffers and FlatBuffers (either of which may still be viable for storing node data in the future), but ruled them out for this work for the following reasons:

  • Any transport format still requires serialization and deserialization, which is inherently less efficient than sharing memory
  • These solutions generally incur a build step to generate bindings for reading and writing data to the underlying transport format
  • The adjacency list only stores integers (node ids, edge types), which makes it a good candidate for storage in an ArrayBuffer (see the sketch below)
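
To sketch that last point (with a field layout invented for illustration, not the actual one): because every field is an integer, an edge can be stored as a fixed-width record of uint32 values in a flat buffer, with no per-edge object allocation:

```js
// Hypothetical layout: each edge is a fixed-width record of uint32 fields,
// so a given edge's fields live at a predictable offset in the buffer.
const FIELDS_PER_EDGE = 3; // [from, to, type] — invented for illustration
const edges = new Uint32Array(
  new SharedArrayBuffer(4 * FIELDS_PER_EDGE * 1000), // capacity: 1000 edges
);

function writeEdge(index, from, to, type) {
  const base = index * FIELDS_PER_EDGE;
  edges[base] = from;
  edges[base + 1] = to;
  edges[base + 2] = type;
}
```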

We've added a small set of unit tests to exercise the basic assumptions about our implementation, but we are mostly relying on the existing suites of tests to verify this work.

We have also done extensive benchmarking to measure the impact of these changes in a real-world, significantly sized app.


EDIT: Refactoring based on feedback seems to have improved the overall performance enough to eliminate most of the regressions!

Benchmarks: Testing the impact of these changes in a real app


These benchmarks were taken on a Parcel app with a bundle graph containing ~92,421 nodes and ~255,538 edges and a request graph containing ~163,633 nodes and ~1,351,038 edges.

Some definitions:

Cold Build
A parcel production build with no cache.
Warm Build
A parcel production build from cache.
Cold Start
A parcel dev start with no cache.
Warm Start
A parcel dev start from cache.
Total {Serialization,Deserialization} Time
How much time is spent serializing or deserializing the graphs (including node data)
Edge {Serialization,Deserialization} Time
How much time is spent serializing or deserializing the adjacency list (just the edge data)
| Cold Build | v2 | this PR | change | change (%) |
| --- | --- | --- | --- | --- |
| Total Memory | 3397.57 mb | 1755.89 mb | -1641.68 mb | 48% less |
| Total Serialization Time | 4941 ms | 3309 ms | -1632 ms | 33% faster |
| Edge Serialization Time | 2372 ms | 1456 ms | -916 ms | 39% faster |
| Total Deserialization Time | 5738 ms | 4123 ms | -1615 ms | 28% faster |
| Edge Deserialization Time | 2994 ms | 518 ms | -2476 ms | 83% faster |
| init | 1 ms | 1 ms | +0 ms | 16% slower |
| build | 163023 ms | 135714 ms | -27309 ms | 17% faster |
| shutdown | 2991 ms | 4517 ms | +1525 ms | 51% slower |
| ✨ Built in | 146.39 s | 135.69 s | -10.70 s | 7% faster |

| Warm Build | v2 | this PR | change | change (%) |
| --- | --- | --- | --- | --- |
| Total Memory | 1304.12 mb | 946.77 mb | -357.35 mb | 27% less |
| Total Serialization Time | 6469 ms | 5379 ms | -1090 ms | 17% faster |
| Edge Serialization Time | 3109 ms | 1438 ms | -1671 ms | 54% faster |
| Total Deserialization Time | 6223 ms | 4563 ms | -1660 ms | 27% faster |
| Edge Deserialization Time | 3028 ms | 495 ms | -2533 ms | 84% faster |
| init | 1994 ms | 744 ms | -1250 ms | 63% faster |
| build | 4007 ms | 3304 ms | -703 ms | 18% faster |
| shutdown | 2678 ms | 3901 ms | +1223 ms | 46% slower |
| ✨ Built in | 4.95 s | 3.79 s | -1.16 s | 23% faster |

| Cold Start | v2 | this PR | change | change (%) |
| --- | --- | --- | --- | --- |
| Total Memory | 2337.65 mb | 1384.07 mb | -953.58 mb | 41% less |
| Total Serialization Time | 4110 ms | 3049 ms | -1061 ms | 26% faster |
| Edge Serialization Time | 2358 ms | 1329 ms | -1029 ms | 44% faster |
| Total Deserialization Time | 6441 ms | 3311 ms | -3130 ms | 49% faster |
| Edge Deserialization Time | 3141 ms | 356 ms | -2785 ms | 89% faster |
| init | 1 ms | 2 ms | +1 ms | 41% slower |
| build | 79695 ms | 69485 ms | -10211 ms | 13% faster |
| save | 628 ms | 415 ms | -213 ms | 34% faster |
| shutdown | 2565 ms | 3490 ms | +925 ms | 36% slower |
| ✨ Built in | 76.64 s | 69.47 s | -7.17 s | 19% faster |

| Warm Start | v2 | this PR | change | change (%) |
| --- | --- | --- | --- | --- |
| Total Memory | 1195.51 mb | 807.47 mb | -388.04 mb | 32% less |
| Total Serialization Time | 6756 ms | 4198 ms | -2558 ms | 38% faster |
| Edge Serialization Time | 2326 ms | 1498 ms | -828 ms | 36% faster |
| Total Deserialization Time | 6254 ms | 4073 ms | -2181 ms | 35% faster |
| Edge Deserialization Time | 2897 ms | 501 ms | -2396 ms | 83% faster |
| init | 1934 ms | 907 ms | -1028 ms | 53% faster |
| build | 3382 ms | 2418 ms | -964 ms | 29% faster |
| save | 999 ms | 464 ms | -535 ms | 54% faster |
| shutdown | 2564 ms | 3067 ms | +503 ms | 20% slower |
| ✨ Built in | 2.57 s | 2.42 s | -0.15 s | 6% faster |

Summary

  • Total memory usage is down significantly across all tested conditions
  • Serialization and deserialization are both significantly faster in all tested conditions
  • With warm cache, builds are on par or faster in all tested conditions
  • Shutdown times are slower, which we did not expect; this requires further investigation.

We have seen no regression (and even some improvement!) in build times, along with significant improvements in memory usage and in serialization and deserialization times across all build scenarios. However, shutdown times are slower than expected.

The main reasons we think these are acceptable trade-offs are:

  • We believe we have not significantly regressed the most common DX scenarios
  • These results reflect one part of the graph improvements we are working toward. Our hope and expectation is that we will ultimately improve all of these metrics in the future by backing the graph nodes with shared buffers, as we have done for the edges here.

For this work, we did not address data stored in the graph nodes, focusing solely on the edges. Since there are generally many more edges than nodes in most of Parcel's graphs, this change represents a measurable improvement, even though there is still more to do to improve the efficiency of the graph structure.

thebriando and others added 30 commits April 12, 2021 16:19
This is not likely to stay here; it is probably better integrated
into some test or benchmarking utilities (like `dumpGraphToGraphviz`).
This is to allow truthy tests to pass when checking to see if
edges exist.
…raph

* bdo/buffer-backed-graph:
  Add generic TEdgeType to EfficientGraph
  change NullEdgeType back to 1, enforce edge types to be non-zero in Graph
  fix traversal test
  Use new EfficientGraph functions in Graph, update tests, fix edge type in removeEdge
  use EfficientGraph getNodes functions, update graph tests, change default edge type to 1
This adds `nodeAt` and `indexOfNode` utils to abstract away
the bookkeeping for storing and retrieving nodes in the byte array.

Also fixes a bug where subsequently added nodes would partially
overlap previous nodes in the byte array.
Base automatically changed from bdo/number-edgetypes to v2 September 16, 2021 01:05
lettertwo and others added 7 commits September 22, 2021 17:36
* v2:
  Upgrade Flow to 0.160.1 (#6964)
  Only use error overlay if there's a document (#6960)
  Don't fail when HTML tags are implied (#6752)
  Reorder resolveOptions() env priority (#6904)
  Change edge types to numbers (#6126)
  Bump swc (#6848)
  Print diagnostics for scope hoisting bailouts at verbose log level (#6918)
This looks like a big change because a lot of stuff moved around.
The edges are currently stored the same way they were before,
but the hash table implementation has been extracted for use
in the new node map.

The node map is where the real change is; instead of just linking
nodes to addresses in the edge map, it now stores multiple sets
of links per node, one for each type of edge.

This allows us to eliminate the need for a type map cache,
as we are now able to look up edges by type in constant time.
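
As a rough sketch of that idea (hash function and names invented here, not the actual implementation), keying the node map on the (node id, edge type) pair makes "edges of this type for this node" a single hash lookup:

```js
// Hypothetical sketch: entries are keyed by (nodeId, edgeType). Finding a
// node's edges of one type becomes an average-O(1) hash probe instead of a
// scan or a separately maintained type-map cache. Collision handling
// (bucket probing) is elided here.
function hashOf(nodeId, edgeType, capacity) {
  return (nodeId * 31 + edgeType) % capacity; // toy hash, not Parcel's
}
```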
@lettertwo lettertwo marked this pull request as ready for review September 28, 2021 23:49
```js
  return true;
}

*getAllEdges(): Iterator<{|
```

This is now the only scenario where we generate values for iteration. All other traversals now use arrays.

Comment on lines -295 to -307
```js
replaceNode(
  fromNodeId: NodeId,
  toNodeId: NodeId,
  type: TEdgeType | NullEdgeType = 1,
): void {
  this._assertHasNodeId(fromNodeId);
  for (let parent of this.inboundEdges.getEdges(fromNodeId, type)) {
    this.addEdge(parent, toNodeId, type);
    this.removeEdge(parent, fromNodeId, type);
  }
  this.removeNode(fromNodeId);
}
```


This method appears to be unused.

```js
 * Returns the id of the added node.
 */
addNode(): NodeId {
  let id = this.#nodes.getId();
```

Note that we don't actually add an entry to the node map until it is connected to another node id in addEdge.

Comment on lines +294 to +295
```js
if (toNode === null) toNode = this.#nodes.add(to, type);
if (fromNode === null) fromNode = this.#nodes.add(from, type);
```

If this edge is the first of this type for either of the nodes, we add a new entry to the node map.


The implementation described here was inspired by this dissection of the V8 Map internals.

```js
/** The number of items to accommodate per hash bucket. */
static BUCKET_SIZE: number = 2;

data: Uint32Array;
```
@lettertwo commented Sep 29, 2021:

There would be a significant memory saving if we used DataView instead, which would allow us to store the type field in 1 byte (instead of 4), but it would also complicate the implementation significantly.

The savings would come from the following:

  • each node adds an entry for every unique edge type it is connected to, so in the worst case (every node has at least one edge of every type), that's n * t * 3 bytes saved, where n is the number of nodes and t is the number of unique edge types in the graph.
  • each edge keeps its type in its entry, which would be e * 3 bytes saved, where e is the number of edges in the graph.

Additionally, typed arrays are technically not portable, since they use the originating system's endianness.

Some of the reasons we haven't implemented this yet:

  • DataView is likely not as well optimized as typed arrays
  • memory allocation is more complex, since we cannot rely on the uniform item size of a typed array
  • reads and writes must now be aware of endianness
  • in practice, most systems are LE, so the non-portable nature of typed arrays is probably a non-issue
  • memory savings from this implementation are already significant compared to v8 Maps
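
To make the trade-off concrete, here is a hypothetical record layout (offsets invented for illustration) contrasting the two access styles:

```js
const buffer = new SharedArrayBuffer(16);

// Uint32Array: uniform 4-byte fields, platform endianness, simple indexing.
const words = new Uint32Array(buffer);
words[2] = 1; // the type occupies a full 4-byte slot

// DataView: the same type fits in 1 byte, but offsets are tracked by hand
// and every multi-byte access names an endianness explicitly.
const view = new DataView(buffer);
view.setUint32(0, 12345, true); // e.g. a node id, little-endian
view.setUint8(8, 1); // the type in a single byte
```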

@thebriando (Member) left a comment:

```js
this._graph
  .getNodeIdsConnectedFrom(nodeId, bundleGraphEdgeTypes.references)
  .reverse(),
});
```

I wonder if creating a new function to do this would have any benefit since we can traverse backwards with the doubly linked list now.
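
A hypothetical sketch of what such a function could look like, assuming each edge record stores a `prev` link and the node map tracks the last edge of each type (all names and offsets here are invented):

```js
const PREV = 3; // offset of the `prev` link within an edge record (invented)
const TO = 1;   // offset of the `to` node id (invented)
const NONE = 0; // sentinel meaning "no edge"

// Yields node ids in reverse insertion order by walking `prev` links,
// avoiding the materialize-then-reverse step shown above.
function* getNodeIdsConnectedFromReverse(edges, lastEdgeOf, nodeId, type) {
  let edge = lastEdgeOf(nodeId, type); // address of the newest edge
  while (edge !== NONE) {
    yield edges[edge + TO];
    edge = edges[edge + PREV];
  }
}
```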

* v2: (68 commits)
  Fix RangeError in `not export` error with other file type (#7295)
  Apply sourcemap in @parcel/transformer-typescript-tsc (#7287)
  Fix side effects glob matching (#7288)
  Fix changelog headings
  v2.0.1
  Changelog for v2.0.1
  Resolve GLSL relative to the importer, not the asset (#7263)
  fix: add @parcel/diagnostic as dependency of @parcel/transformer-typescript-types (#7248)
  Fixed missing "Parcel" export member in Module "@parcel/core" (#7250)
  Add script to sync engines with core version (#7207)
  Bump swc (#7216)
  Make Webpack loader detection regex dramatically faster (#7226)
  swc optimizer (#7212)
  Update esbuild in optimizer (#7233)
  Properly visit member expressions (#7228)
  Update to prettier 2 (#7209)
  Fix serve mode with target override and target source fields (#7187)
  Update package.json to include the repository (#7184)
  fix #6730: add transformer-raw as dependency of config-webextension (#7193)
  Log warning instead of crash if image optimizer fails (#7119)
  ...
@devongovett devongovett merged commit cacdf67 into v2 Nov 16, 2021
@devongovett

🥳 🎉
