0.5.0 Performance, Reworked UI, New formats, Deno

@lahmatiy lahmatiy released this 10 May 02:43

This release of CPUpro brings major performance improvements, a redesigned user interface, and expanded format and runtime support. Loading and processing of extremely large profiles is now significantly faster, which makes CPUpro well suited to analyzing complex long-running scripts. The user interface has been thoroughly reworked to be more intuitive and responsive across features and views. New profile formats and support for the Deno runtime have been added, making the tool usable in more modern development environments.

Performance

CPUpro has been entirely re-engineered to optimize the preprocessing of profiles upon loading and for subsequent computations. This redesign enables it to handle massive profiles (exceeding 100MB) significantly faster than other tools. CPUpro is currently the best option for analyzing intense long-running scripts that generate extensive CPU profiles, such as webpack build profiles or prolonged browser sessions (that can last minutes or even tens of minutes).

The table below shows the time to load and first render profiles of varying sizes in different tools:

| Profile size | Profile type | CPUpro v0.5 | CPUpro v0.4 | Chromium DevTools | speedscope |
|---|---|---|---|---|---|
| 33MB (215k samples / 120k call tree) | V8 cpuprofile | 0.5s | 0.8s | 4.6s | 6.5s |
| 113MB (625k samples / 62k call tree) | Chromium Profile | 1.3s | 1.6s | 10.6s | 12.4s |
| 114MB (739k samples / 446k call tree) | V8 cpuprofile | 1.3s | 2.6s | 12.3s | 18.5s |
| 239MB (11.6M samples / 489k call tree) | V8 cpuprofile | 2.8s | 11.3s | 48s | Out of memory (after 23s) |
| 277MB (127k samples / 35k call tree) | Chromium Profile | 1.9s | 2.2s | 4.2s | Out of memory (after 30s) |
| 418MB (897k samples / 1.86M call tree) | V8 cpuprofile | 4.6s | 8.7s | Out of memory (after 36s) | Out of memory (after 49s) |
| 2GB (7.3M samples / 7.28M call tree) | V8 cpuprofile | 27.1s | Out of memory (after 57s) | Invalid string length (after 20s) | Out of memory (after 43s) |

Chrome 124 / MacBook Pro 13-inch, M1, 2020

As the table shows, the time depends not only on the profile size but also on its format, the number of samples, and the size of the call tree (note that some profiles contain millions of samples and nodes). Notably, a Chromium Profile, which includes extensive additional data besides the CPU profile, tends to load faster than a .cpuprofile file of the same size. Some tools struggle with large profiles: they hit the heap size limit (4GB) and crash with "Out of memory" errors, which is particularly frustrating when a lengthy load yields no result. Thanks to the new optimizations, CPUpro avoids these pitfalls and can now load and process even 2GB profiles.

Comparing total load times, the difference between CPUpro versions 0.4 and 0.5 does not look that impressive, because a significant portion of the time is spent loading and parsing JSON, which is unchanged. However, if we isolate the computations and initial rendering, where the main optimization effort was concentrated, the new version shows performance improvements ranging from 1.5 to 11 times:

| Profile size | Profile type | Load data & parse | CPUpro v0.5 (computations + render) | CPUpro v0.4 (computations + render) | Delta |
|---|---|---|---|---|---|
| 33MB (215k samples / 120k call tree) | V8 cpuprofile | 0.3s | 0.16s | 0.52s | 3.1x |
| 113MB (625k samples / 62k call tree) | Chromium Profile | 1.1s | 0.21s | 0.64s | 3.0x |
| 114MB (739k samples / 446k call tree) | V8 cpuprofile | 0.9s | 0.37s | 1.48s | 4.0x |
| 239MB (11.6M samples / 489k call tree) | V8 cpuprofile | 2.2s | 0.79s | 9.21s | 11.7x |
| 277MB (127k samples / 35k call tree) | Chromium Profile | 1.9s | 0.15s | 0.24s | 1.7x |
| 418MB (897k samples / 1.86M call tree) | V8 cpuprofile | 3.6s | 1.12s | 4.26s | 3.6x |
| 2GB (7.3M samples / 7.28M call tree) | V8 cpuprofile | 22.1s | 4.98s | | |

Chrome 124 / MacBook Pro 13-inch, M1, 2020

The speedup was achieved by switching to linear memory (TypedArrays) for the tree representation and for storing computation results, even though the number and complexity of computations has grown since v0.4. Most of the calculation algorithms are implemented as simple loops without recursion or complex branching. Experiments with WebAssembly for some calculations gave up to a 2x speed increase in JavaScriptCore (Safari) and SpiderMonkey (Firefox), bringing their execution times in line with V8, where performance did not change. Remarkably, the new algorithms allow V8 to optimize JavaScript execution to match the efficiency of WebAssembly, which was unexpected.
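To illustrate the idea of a tree stored in linear memory and processed with plain loops, here is a minimal sketch. It is not CPUpro's actual code: the column names (`parent`, `selfTime`), the `computeTotalTime` helper, and the assumption that every parent has a smaller index than its children are all hypothetical.

```typescript
// Hypothetical layout: node i's parent is parent[i]; the root is node 0 and
// every parent index is smaller than its child's index (pre-order numbering).
const parent = new Uint32Array([0, 0, 1, 1, 0]);
// Self time per node in microseconds (illustrative values).
const selfTime = new Uint32Array([5, 10, 20, 30, 40]);

// Total (self + nested) time for every node, computed with one backward loop:
// by the time node i is visited, all of its descendants are already folded into it.
function computeTotalTime(parent: Uint32Array, selfTime: Uint32Array): Uint32Array {
    const totalTime = Uint32Array.from(selfTime);

    for (let i = totalTime.length - 1; i > 0; i--) {
        totalTime[parent[i]] += totalTime[i];
    }

    return totalTime;
}

console.log(Array.from(computeTotalTime(parent, selfTime))); // [105, 60, 20, 30, 40]
```

Because the loop has no recursion and touches memory sequentially, a JavaScript engine can typically optimize it well, which is consistent with the observation above that V8 reached WebAssembly-level speed.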

Adopting TypedArrays has drastically reduced heap memory usage. While modern browsers typically offer up to 4GB of heap space, exceeding this limit can crash the browser tab (and, with it, the app). CPUpro uses the heap mainly for loading and parsing JSON and during the initial stages of data processing; after that, most data is kept in TypedArrays. Their buffers are stored in so-called "external memory", which is limited only by the system's available memory, significantly lowering the risk of "Out of memory" crashes. However, there is little reason to worry about that either, since CPUpro consumes memory sparingly:

| Profile size | Profile type | CPUpro v0.5 | CPUpro v0.4 | Chromium DevTools | speedscope |
|---|---|---|---|---|---|
| 33MB (215k samples / 120k call tree) | V8 cpuprofile | 8MB (External: 20MB) | 97MB | 752MB | 916MB |
| 113MB (625k samples / 62k call tree) | Chromium Profile | 7MB (External: 17MB) | 61MB | 1063MB | 466MB |
| 114MB (739k samples / 446k call tree) | V8 cpuprofile | 8MB (External: 155MB) | 324MB | 1803MB | 2001MB |
| 239MB (11.6M samples / 489k call tree) | V8 cpuprofile | 12MB (External: 92MB) | 463MB | 3877MB | Out of memory |
| 277MB (127k samples / 35k call tree) | Chromium Profile | 8MB (External: 9MB) | 34MB | 488MB | Out of memory |
| 418MB (897k samples / 1.86M call tree) | V8 cpuprofile | 18MB (External: 233MB) | 1387MB | Out of memory | Out of memory |
| 2GB (7.3M samples / 7.28M call tree) | V8 cpuprofile | 22MB (External: 866MB) | Out of memory | Invalid string length | Out of memory |

Data collected after loading the profile and calling the garbage collector
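As a rough illustration of why TypedArray storage is compact and predictable, the sketch below allocates several per-node columns as views over a single contiguous buffer. The column set and the `allocateTreeColumns` helper are hypothetical and only show the technique; CPUpro's real layout may differ.

```typescript
// Allocate three per-node columns as views over one contiguous ArrayBuffer.
// The column set (parent, selfTime, totalTime) is a hypothetical layout.
function allocateTreeColumns(nodeCount: number) {
    const bytesPerColumn = nodeCount * Uint32Array.BYTES_PER_ELEMENT; // 4 bytes per node
    const buffer = new ArrayBuffer(bytesPerColumn * 3);

    return {
        buffer,
        parent: new Uint32Array(buffer, 0, nodeCount),
        selfTime: new Uint32Array(buffer, bytesPerColumn, nodeCount),
        totalTime: new Uint32Array(buffer, bytesPerColumn * 2, nodeCount)
    };
}

const columns = allocateTreeColumns(1000000);
console.log(columns.buffer.byteLength); // 12000000 bytes for one million nodes
```

The cost is a fixed number of bytes per node, with no per-object overhead, and the buffer contents live outside the JavaScript heap.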

After loading the profile and performing the initial calculations, CPUpro is ready to recalculate timings and sample data rapidly on demand, e.g. when filters change. This made it possible to introduce new complex views that were previously impractical because calculations took many seconds and froze the UI, which broke the user experience. Most views have also been optimized to react almost instantly to filter changes, keeping the experience smooth even with large profiles.
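A recalculation of this kind can be sketched as a single pass over the samples. The data layout below (`samples` holding a node id per sample, `timeDeltas` holding the time attributed to each sample) and the `computeSelfTimes` helper are assumptions made for illustration, and the range is expressed in sample indexes for simplicity.

```typescript
// Hypothetical profile data: samples[i] is the call tree node on top of the
// stack for sample i, and timeDeltas[i] is the time attributed to it (µs).
const samples = new Uint32Array([1, 2, 2, 3, 1, 3]);
const timeDeltas = new Uint32Array([10, 20, 5, 15, 30, 25]);

// Recompute per-node self time for the samples in [from, to).
// The output buffer is reused, so a filter change allocates nothing.
function computeSelfTimes(
    samples: Uint32Array,
    timeDeltas: Uint32Array,
    selfTimes: Uint32Array,
    from: number = 0,
    to: number = samples.length
): Uint32Array {
    selfTimes.fill(0);

    for (let i = from; i < to; i++) {
        selfTimes[samples[i]] += timeDeltas[i];
    }

    return selfTimes;
}

const selfTimes = new Uint32Array(4); // one slot per node id (0..3)
console.log(Array.from(computeSelfTimes(samples, timeDeltas, selfTimes)));       // [0, 40, 25, 40]
console.log(Array.from(computeSelfTimes(samples, timeDeltas, selfTimes, 1, 4))); // [0, 0, 25, 15]
```

Since the pass is linear in the number of selected samples and reuses its output buffer, repeating it on every filter or range change stays cheap.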

cpupro-perf.mov

The speed and memory optimizations are not just about faster profile loading and a more responsive UI; they also unlock new capabilities. Notably, they are crucial for features such as profile comparison, which requires loading at least two profiles and can double both computation time and memory usage. With these challenges addressed, the stage is set for future enhancements, including profile comparison.

User interface

The user interface has undergone a significant redesign. The start page now appears more compact and provides a clearer overview of how the V8 engine operates. It features a timeline categorized by work type and function clustering tables, followed by a flamechart.

Demo


Other pages have also been reworked to be more informative. Each page now includes:

  • A timeline that not only displays self time but also nested time, with the distribution of nested time by categories.
  • A new section titled "Nested Time Distribution" that offers insights into the distribution of nested time in a hierarchical format, from a package to a function.
  • A basic flamechart displaying all frames related to the current subject (category, package, module, or function) as root frames.
image


The timeline has been enhanced with a tooltip that provides expanded details and with the ability to select a range, which was previously missing when focusing on a specific segment of work.

image


The Flamechart is now faster and smoother. It includes new selection capabilities and a detailed information block for the selected or zoomed frame.

image


The welcome page has been redesigned as well, and now offers example profiles in various formats to try:

image

New formats, runtimes, and registries

Support for new formats has been introduced:

  • V8 log converted into JSON with the --preprocess option (node --prof-process --preprocess v8.log > v8log.json). Although the filtered version of a V8 log file loses many details, it remains more informative than .cpuprofile files. This release includes basic support for V8 logs, with plans for expanded support in future releases.
  • Edge Enhanced Performance Traces (.devtools). Currently, using .devtools offers little benefit over Chromium Performance Profiles, but it eliminates the need to convert to other supported formats. Future releases may utilize additional data provided by this format.

Adding support for the V8 log format has enabled the analysis of profiles captured in Deno, where capturing a .cpuprofile is less convenient (if possible at all) than in Node.js or browsers. However, Deno supports V8 flags, allowing the capture and conversion of V8 logs into JSON (Deno manual):

> deno run --v8-flags=--prof script.ts
> node --prof-process --preprocess isolate-0x000000000-v8.log > v8log.json

Detection for Deno, Electron, and Edge runtimes has been added where feasible. Runtime icons were added as well:

image


Because Deno programs often load modules directly from CDNs, detection for the most popular CDNs and registries has also been included. The following screenshot shows the result before and after CDN detection was added:

image
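CDN and registry detection of this kind can be approximated by matching the host of a script URL against a known list. The sketch below is an illustration only: the `detectRegistry` helper and the exact matching rules are assumptions, and the hostnames are the commonly used public ones for each service rather than a list taken from CPUpro.

```typescript
// Hostname-to-registry table. The hostnames are the commonly used public ones;
// the matching logic is illustrative, not CPUpro's actual rules.
const registryByHost = new Map<string, string>([
    ["jsr.io", "JSR"],
    ["deno.land", "deno.land"],
    ["cdn.jsdelivr.net", "jsdelivr"],
    ["unpkg.com", "unpkg"],
    ["esm.sh", "esm.sh"],
    ["esm.run", "esm.run"],
    ["ga.jspm.io", "jspm"],
    ["jspm.dev", "jspm"],
    ["cdn.skypack.dev", "skypack"]
]);

// Returns the registry name for an http(s) script URL, or null when the
// URL is not http(s) or its host is not in the table.
function detectRegistry(scriptUrl: string): string | null {
    const match = /^https?:\/\/([^\/:?#]+)/i.exec(scriptUrl);

    if (match === null) {
        return null;
    }

    return registryByHost.get(match[1].toLowerCase()) ?? null;
}

console.log(detectRegistry("https://deno.land/std/path/mod.ts")); // "deno.land"
console.log(detectRegistry("file:///app/main.ts"));               // null
```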

Changelog

  • Changed the terminology: replaced "area" with "category"
  • Formats
  • Computations
    • Reworked the computations on profile loading from scratch with performance and memory usage in mind, achieving a 4-8 times speed increase and reduced memory consumption
    • Implemented GC nodes reparenting to the script node
    • Fixed the placement of bundle modules: they are now placed in the "script" category instead of the "bundle" category
    • Changed the handling of negative time deltas: they are now corrected by rearranging instead of being ignored
    • Resolved the issue with shortening paths to scripts when webpack/runtime is present in the CPU profile
    • Adjusted call frame reference computation by omitting line and column when they are not specified or less than zero
  • Runtimes & registries
    • Added Deno detection
    • Added Electron detection
    • Added detection for CDNs and registries: JSR, deno.land, jsdelivr, unpkg, esm.sh, esm.run, jspm, and skypack
  • Redesigned welcome page, added "Try an example" buttons
  • Reworked the layout and UX of the main page
    • Implemented permanent colors and a fixed timeline order for categories and module types
  • Improved the display of regular expressions, particularly long ones
  • Reworked subject pages; each page now includes:
    • A timeline that not only displays self time but also nested time, with the distribution of nested time by categories
    • A section "Nested time distribution"
    • A basic flamechart displaying all frames related to the current subject as root frames
  • Timeline
    • Added the capability to select a range
    • Added a tooltip that provides expanded details on a range
  • Flamechart
    • Added locking of vertical scrolling when the flamechart is not activated
    • Added a detailed information block for the selected or zoomed frame
    • Added the capability to select frames
    • Improved performance and reliability
    • Changed colors to match category colors and module types
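
The changelog item about negative time deltas does not describe the algorithm, so the following is only one possible reading of "corrected by rearranging": convert deltas to absolute timestamps, reorder the samples by timestamp, and derive new non-negative deltas. The `rearrangeNegativeDeltas` helper and its data layout are hypothetical.

```typescript
// One possible interpretation (an assumption, not CPUpro's confirmed method):
// reorder samples so that timestamps are monotonic, then rebuild the deltas.
function rearrangeNegativeDeltas(samples: number[], timeDeltas: number[]) {
    // Absolute timestamp of every sample, relative to the profile start.
    const timestamps: number[] = [];
    let time = 0;
    for (const delta of timeDeltas) {
        time += delta;
        timestamps.push(time);
    }

    // Sample indexes ordered by timestamp (Array.prototype.sort is stable).
    const order = timestamps.map((_, index) => index);
    order.sort((a, b) => timestamps[a] - timestamps[b]);

    // Deltas between consecutive samples in the new order are never negative.
    const fixedDeltas = order.map((index, pos) =>
        timestamps[index] - (pos === 0 ? 0 : timestamps[order[pos - 1]])
    );

    return {
        samples: order.map((index) => samples[index]),
        timeDeltas: fixedDeltas
    };
}

const fixed = rearrangeNegativeDeltas([1, 2, 3, 4], [10, 20, -5, 15]);
console.log(fixed.samples);    // [1, 3, 2, 4]
console.log(fixed.timeDeltas); // [10, 15, 5, 10]
```

Unlike ignoring the offending samples, this keeps every sample and preserves the total profile duration (40 in the example, both before and after).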