Store graph edges in SharedArrayBuffer #6922
Conversation
This is not likely to stay here; it is probably better integrated into some test or benchmarking utilities (like `dumpGraphToGraphviz`).
This is to allow truthy tests to pass when checking to see if edges exist.
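For illustration, a minimal sketch of why non-zero edge types matter here, assuming the edge map lives in a zero-initialized typed array (the names and layout below are illustrative, not the actual Parcel internals):

```js
// In a zero-initialized Uint32Array, a stored type of 0 is indistinguishable
// from "no edge", so the default/null edge type is 1 and types must be non-zero.
const edges = new Uint32Array(8); // every slot starts at 0 (empty)
const NULL_EDGE_TYPE = 1;         // default edge type; never 0

edges[3] = NULL_EDGE_TYPE;        // record an edge of the default type

if (edges[3]) console.log('edge exists');    // truthy: slot is occupied
if (!edges[5]) console.log('no edge here');  // falsy: slot was never written
```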
…ault edge type to 1
…nto bdo/buffer-backed-graph
…raph
* bdo/buffer-backed-graph:
  - Add generic TEdgeType to EfficientGraph
  - change NullEdgeType back to 1, enforce edge types to be non-zero in Graph
  - fix traversal test
  - Use new EfficientGraph functions in Graph, update tests, fix edge type in removeEdge
  - use EfficientGraph getNodes functions, update graph tests, change default edge type to 1
This adds `nodeAt` and `indexOfNode` utils to abstract away the bookkeeping for storing and retrieving nodes in the byte array. Also fixes a bug where subsequently added nodes would partially overlap previous nodes in the byte array.
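As a rough sketch of what helpers like these compute (the stride and names here are assumptions for illustration, not the actual `EfficientGraph` layout):

```js
// Hypothetical bookkeeping helpers, assuming each node occupies a fixed
// number of slots in the backing typed array. Centralizing this arithmetic
// is what guards against nodes partially overlapping in the array.
const NODE_SIZE = 4; // slots reserved per node (assumed value)

// Map a node id to its starting offset in the typed array.
function indexOfNode(id) {
  return id * NODE_SIZE;
}

// Map an array offset back to the node id stored there.
function nodeAt(index) {
  return Math.floor(index / NODE_SIZE);
}
```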
* v2:
  - Upgrade Flow to 0.160.1 (#6964)
  - Only use error overlay if there's a document (#6960)
  - Don't fail when HTML tags are implied (#6752)
  - Reorder resolveOptions() env priority (#6904)
  - Change edge types to numbers (#6126)
  - Bump swc (#6848)
  - Print diagnostics for scope hoisting bailouts at verbose log level (#6918)
This looks like a big change because a lot of stuff moved around. The edges are currently stored the same way they were before, but the hash table implementation has been extracted for use in the new node map. The node map is where the real change is; instead of just linking nodes to addresses in the edge map, it now stores multiple sets of links per node, one for each type of edge. This allows us to eliminate the need for a type map cache, as we are now able to look up edges by type in constant time.
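A simplified model of that idea, using plain Maps instead of the real typed-array layout (illustrative only, not the actual node map API):

```js
// Every node keeps one set of links per edge type, so fetching a node's
// edges of a particular type is a direct lookup, with no separate
// type-map cache to maintain.
const nodeLinks = new Map(); // nodeId -> Map(edgeType -> address of first edge)

function firstEdgeOfType(nodeId, type) {
  const byType = nodeLinks.get(nodeId);
  return byType ? byType.get(type) ?? null : null; // O(1) by (node, type)
}

function setFirstEdgeOfType(nodeId, type, edgeAddress) {
  let byType = nodeLinks.get(nodeId);
  if (!byType) nodeLinks.set(nodeId, (byType = new Map()));
  byType.set(type, edgeAddress);
}
```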
    return true;
  }

  *getAllEdges(): Iterator<{|
This is now the only scenario where we generate values for iteration. All other traversals are now using arrays.
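A hedged illustration of that remaining generator-based traversal, assuming a flat record layout of `[type, from, to]` per edge (not necessarily Parcel's actual layout):

```js
// Yielding edge records lazily avoids materializing one object per edge up
// front, which matters when the graph has hundreds of thousands of edges.
function* getAllEdges(data, EDGE_SIZE = 3) {
  for (let i = 0; i < data.length; i += EDGE_SIZE) {
    if (data[i] === 0) continue; // type 0 marks an empty slot (types are non-zero)
    yield {type: data[i], from: data[i + 1], to: data[i + 2]};
  }
}
```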
  replaceNode(
    fromNodeId: NodeId,
    toNodeId: NodeId,
    type: TEdgeType | NullEdgeType = 1,
  ): void {
    this._assertHasNodeId(fromNodeId);
    for (let parent of this.inboundEdges.getEdges(fromNodeId, type)) {
      this.addEdge(parent, toNodeId, type);
      this.removeEdge(parent, fromNodeId, type);
    }
    this.removeNode(fromNodeId);
  }
This method appears to be unused.
   * Returns the id of the added node.
   */
  addNode(): NodeId {
    let id = this.#nodes.getId();
Note that we don't actually add an entry to the node map until it is connected to another node id in `addEdge`.
    if (toNode === null) toNode = this.#nodes.add(to, type);
    if (fromNode === null) fromNode = this.#nodes.add(from, type);
If this edge is the first of this type for either of the nodes, we add a new entry to the node map.
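A rough sketch of that lazy insertion (the helper shapes here are hypothetical, not the actual `EfficientGraph` API):

```js
// addNode() only reserves an id; a node gets a node-map entry for a given
// edge type the first time it participates in an edge of that type.
function addEdge(nodes, edges, from, to, type = 1) {
  // `nodes.find` / `nodes.add` stand in for the real node-map lookups.
  if (nodes.find(to, type) === null) nodes.add(to, type);
  if (nodes.find(from, type) === null) nodes.add(from, type);
  edges.link(from, to, type); // record the edge itself
}
```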
    }
  }

  /**
The implementation described here was inspired by this dissection of the v8 Map internals.
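For readers unfamiliar with that layout, here is a deliberately simplified sketch of a bucketed hash lookup over a `Uint32Array`; the constants, slot order, and collision handling are illustrative only, not Parcel's actual implementation:

```js
// Assumed layout for this sketch: `capacity` buckets, each holding
// BUCKET_SIZE key/value pairs stored inline as [key, value, key, value, ...].
const BUCKET_SIZE = 2;

function get(data, capacity, key) {
  const bucket = key % capacity;          // hash the key to a bucket
  const start = bucket * BUCKET_SIZE * 2; // 2 slots (key, value) per item
  for (let i = 0; i < BUCKET_SIZE; i++) {
    const slot = start + i * 2;
    if (data[slot] === key) return data[slot + 1]; // found the key
    if (data[slot] === 0) break;          // 0 marks an empty slot
  }
  return null; // not present (overflow handling omitted for brevity)
}
```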
  /** The number of items to accommodate per hash bucket. */
  static BUCKET_SIZE: number = 2;

  data: Uint32Array;
There would be significant memory saved if we used `DataView` instead, which would allow us to store the `type` field in 1 byte (instead of 4), but it also would complicate the implementation significantly.
The savings would come from the following:
- each node adds an entry for every unique edge type it is connected to, so in the worst case (every node has at least one edge of every type), that's `n * t * 3` bytes saved, where `n` is the number of nodes and `t` is the number of unique edge types in the graph.
- each edge keeps its type in its entry, which would be `e * 3` bytes saved, where `e` is the number of edges in the graph.
Additionally, typed arrays are technically not portable, since they use the originating system's endianness.
Some of the reasons we haven't implemented this yet:
- `DataView` is likely not as well optimized as typed arrays
- memory allocation is more complex, since we cannot rely on the uniform item size of a typed array
- reads and writes must now be aware of endianness
- in practice, most systems are LE, so the non-portable nature of typed arrays is probably a non-issue
- memory savings from this implementation are already significant compared to v8 Maps
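As a rough worked example, using the bundle graph size quoted in the benchmarks below and an assumed edge-type count (the real `t` for that graph isn't stated in this thread):

```js
// Back-of-the-envelope estimate of the DataView savings described above.
const n = 92_421;  // nodes in the bundle graph (from the benchmark notes)
const e = 255_538; // edges in the bundle graph (from the benchmark notes)
const t = 10;      // assumed number of distinct edge types

const worstCaseNodeSavings = n * t * 3; // ≈ 2.77 MB
const edgeSavings = e * 3;              // ≈ 0.77 MB
console.log(worstCaseNodeSavings + edgeSavings); // ≈ 3.5 MB total, worst case
```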
parcel/packages/core/core/src/BundleGraph.js
Lines 1210 to 1213 in 8aa9c25
      this._graph
        .getNodeIdsConnectedFrom(nodeId, bundleGraphEdgeTypes.references)
        .reverse(),
    });
I wonder if creating a new function to do this would have any benefit since we can traverse backwards with the doubly linked list now.
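A hypothetical sketch of what such a function might look like (this API does not exist; the field names are assumptions about the doubly linked edge records):

```js
// Walking the edge list from its tail yields connected node ids already in
// reverse order, avoiding the intermediate array plus reverse() above.
function* getNodeIdsConnectedFromReversed(edges, nodeId, type) {
  // `lastOutgoing` and `prev` are assumed accessors on the edge store.
  for (let edge = edges.lastOutgoing(nodeId, type); edge != null; edge = edge.prev) {
    yield edge.to;
  }
}
```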
* v2: (68 commits)
  - Fix RangeError in `not export` error with other file type (#7295)
  - Apply sourcemap in @parcel/transformer-typescript-tsc (#7287)
  - Fix side effects glob matching (#7288)
  - Fix changelog headings
  - v2.0.1
  - Changelog for v2.0.1
  - Resolve GLSL relative to the importer, not the asset (#7263)
  - fix: add @parcel/diagnostic as dependency of @parcel/transformer-typescript-types (#7248)
  - Fixed missing "Parcel" export member in Module "@parcel/core" (#7250)
  - Add script to sync engines with core version (#7207)
  - Bump swc (#7216)
  - Make Webpack loader detection regex dramatically faster (#7226)
  - swc optimizer (#7212)
  - Update esbuild in optimizer (#7233)
  - Properly visit member expressions (#7228)
  - Update to prettier 2 (#7209)
  - Fix serve mode with target override and target source fields (#7187)
  - Update package.json to include the repository (#7184)
  - fix #6730: add transformer-raw as dependency of config-webextension (#7193)
  - Log warning instead of crash if image optimizer fails (#7119)
  - ...
4f219c8 to 73cf858
🥳 🎉
Store graph edges in SharedArrayBuffer
This changes the low level storage mechanism for graph adjacency lists from multiple JS Maps to a custom hashmap (😅 ) implemented on top of SharedArrayBuffer.
Background: Why would we want to implement our own hashmap?
Parallelism
Parcel has the ability to run many operations in parallel, but Node does not have any way to share data between threads except by serializing data and sending it to each worker, where it is then deserialized by each worker before work commences. Even for modestly sized Parcel projects, this overhead is significant.
JavaScript does have support for sharing memory between threads via `SharedArrayBuffer`, but unfortunately, there is currently nothing higher level (like a `SharedMap` or something) in JS or Node that can take advantage of this in a significant way. Thus, having a custom data structure built on top of `SharedArrayBuffer` means that Parcel can build a very large graph and share some significant portion of that graph with workers for free.
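As a minimal, self-contained illustration of the kind of sharing this relies on (ordinary Node `worker_threads`, not Parcel code):

```js
// A SharedArrayBuffer passed to a worker is not copied or serialized;
// both threads read and write the same underlying memory.
const {Worker, isMainThread, workerData} = require('worker_threads');

if (isMainThread) {
  const shared = new SharedArrayBuffer(4 * 1024);
  const view = new Uint32Array(shared);
  view[0] = 42; // pretend this is part of the graph's edge data

  new Worker(__filename, {workerData: shared});
} else {
  const view = new Uint32Array(workerData);
  console.log(view[0]); // 42 — read directly, with no deserialization step
}
```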
Caching
Parcel caches build state frequently, and that involves serializing data to be written to cache. An `ArrayBuffer` of data serializes more quickly than a `Map`.
Other Solutions Considered
We evaluated other options, such as Protocol Buffers and FlatBuffers (either of which may still be viable for storing node data in the future), but ruled them out for this work for the following reasons:
ArrayBuffer
We've added a small set of unit tests to exercise the basic assumptions about our implementation, but we are mostly relying on the existing suites of tests to verify this work.
We have also done extensive benchmarking to measure the impact of these changes in a real-world, significantly sized app.
Benchmarks: Testing the impact of these changes in a real app
These benchmarks were taken on a Parcel app with a bundle graph containing ~92,421 nodes and ~255,538 edges and a request graph containing ~163,633 nodes and ~1,351,038 edges.
Some definitions:
Summary
We have seen no regression (and even some improvement!) in build times, and we have also seen significant improvement in memory usage and serialization and deserialization times in all build scenarios. However, shutdown times are slower than expected. The main reasons we think these are acceptable trade-offs are:
For this work, we did not address data stored in the graph nodes, focusing solely on the edges. Since there are generally many more edges than nodes in most of Parcel's graphs, this change represents a measurable improvement, even though there is still more to do to improve the efficiency of the graph structure.