
Introduce instrumentation report utilities #197

Closed

AlexJones0 wants to merge 11 commits into lowRISC:master from AlexJones0:instrumentation_report_utils


Conversation

@AlexJones0
Contributor

Note: this PR is marked as a draft as it currently depends on #194 and #196. The first commit of this PR is from #194, and the 2nd-7th commits are from #196; these can be safely ignored, as only the 8th-11th commits are relevant to this PR. When #194 and #196 are merged, this PR will be updated and marked as ready to review, but it is otherwise ready for review.

This PR is the seventh of a series of PRs to introduce instrumentation reporting to DVSim.

This PR adds some utility functions & methods that will be used throughout the various instrumentation visualizations/graphs. These are not used yet, in an effort to split the changes up into multiple PRs that should be easier to review. Changes include:

  • Time string formatting utilities not already available through libraries like `datetime`.
  • Add some helper methods to the instrumentation report to get concrete timing information where only partial info may exist (as far as the existing typed models go).
  • Helper function for automatically rendering a plotly Figure as a PNG instead of HTML when a threshold of entities in the figure is exceeded. This supports scaling in a generic way so that we don't get 250MB+ reports for large runs :)
  • Helper functions for making job metadata and colour mappings to use for plotly figures, to ensure consistent presentation across various visualisations.

See the commit messages for more information.
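The PNG-fallback helper described above could look roughly like this. This is only a sketch of the idea: the helper name, signature, and threshold value are assumptions, not taken from the PR, and the actual figure serialization is abstracted behind callables.

```python
import base64

# Hypothetical threshold; the constant actually used in the PR may differ.
PNG_ENTITY_THRESHOLD = 10_000


def render_figure(num_entities, to_html, to_png_bytes,
                  threshold=PNG_ENTITY_THRESHOLD):
    """Return an HTML fragment for a figure.

    Below the threshold, return the interactive HTML render; above it,
    fall back to a base64-embedded PNG so huge figures don't bloat the
    report or hurt browser performance.
    """
    if num_entities <= threshold:
        return to_html()
    encoded = base64.b64encode(to_png_bytes()).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}"/>'
```

The callables would wrap whatever the plotting library provides for HTML and PNG export, keeping the threshold logic independent of the rendering backend.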

AlexJones0 added 11 commits May 14, 2026 12:06
Although it can currently be inferred/derived from the full name of the
job, having the block (& block variant) information explicitly available
in the recorded metadata is much more useful. This means that we can
make visualizations and calculations of regression results partitioned
by block or block variant.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Define a common protocol for instrumentation report visualisations with
a rendering interface. The idea is to produce a single HTML
instrumentation report with various different visualisations (which
themselves could be graphs, images, text, etc.) which can be switched
between. The rendered outputs themselves are just embedded HTML
fragments (or `None` on failure). This should be simple enough whilst
allowing the level of extensibility and flexibility that is needed for
the current level of instrumentation reporting.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Add a registry (like the existing registry/factory for the different
instrumentation types) that we can use to add new instrumentation
visualisations. In the same manner that we currently support plugins to
extend the instrumentation functionality of DVSim, this will allow users
extending DVSim to hook in and add their own custom instrumentation
visualisations as well.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This will allow reports to add custom content to the HTML head if
desired for e.g. styling purposes.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Whenever a DVSim scheduler run completes and we flush the JSON
instrumentation report, add additional logic to create the HTML
instrumentation report as well - under the same report directory that is
currently being used for the existing simulation HTML report outputs.

It is not 100% clear whether this is the best place to integrate the
instrumentation reporting yet. It seems like it might be more sensible
to do it alongside the simulation reporting to keep common functionality
grouped together, but this then means that we do not get the same
benefits for other flows (e.g. linting, formal) that do not have their
own custom HTML reports. So for now, we keep this in the generic
scheduler running logic.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This commit introduces the instrumentation report template itself, and a
function `render_html_report` for rendering the instrumentation report
with a list of given instrumentation report visualization
implementations.

Some key things to note:
- We use `plotly.offline` to get the version of the minified plotly.js
  that is packaged with the plotly Python package. Like vendored static
  files this will still work in an airgapped environment, but this way
  it should remain in sync even when the plotly dependency has its
  version updated.
- We only include plotly JS (which is around 4MB minified) as a
  dependency when we actually have renders that might use it.
- For now we include all visualizations as different tabs on the same
  page. This is to keep things simple as the goal is to keep all the
  report logic relatively small and self-contained, but this could be
  expanded upon in the future if it is deemed too restrictive of an
  interface.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Intended to be used for the instrumentation reporting logic, but this is
general enough that it can just be placed in the time utils.

The `format_time_as_hms` function gets a time format like
`12h 34m 56.79s`. As far as I could tell, there is no nice way to do
this using e.g. the existing datetime standard library, and this is
simple to implement manually, so add it as a utility here.

The `format_time_metric` function is an extension of this intended for the
display of time metrics in the instrumentation reports. It reports the
time in hours, minutes and seconds, but also puts the time in raw
seconds alongside it (e.g. `2h 15m 37.21s (8,137.21s)`).

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
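A rough sketch of the two helpers as described in the commit message above (the signatures are assumed; only the function names and output formats come from the commit message):

```python
def format_time_as_hms(seconds: float) -> str:
    """Format a duration in seconds as e.g. '12h 34m 56.79s'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{int(hours)}h")
    if hours or minutes:
        parts.append(f"{int(minutes)}m")
    parts.append(f"{secs:.2f}s")
    return " ".join(parts)


def format_time_metric(seconds: float) -> str:
    """Format a time metric with the raw seconds alongside,
    e.g. '2h 15m 37.21s (8,137.21s)'."""
    return f"{format_time_as_hms(seconds)} ({seconds:,.2f}s)"
```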
This commit adds some helper methods to the full instrumentation report
pydantic model to make generation of instrumentation report
visualizations easier.

Notably, it introduces a new `ConcreteJobTimingMetrics` model. A common
pattern within the instrumentation logic is to only get jobs for which
we definitely know the start and end time (and thus the duration), and
to discard other jobs. We must keep the partial model since this
guarantee does not hold during instrumentation, but we can make a helper
that can filter the jobs and return the concrete models. This saves a
lot of trouble with static type analysis and the need to constantly
re-assert the presence of timing information throughout instrumentation
logic.

The other helper, `get_run_time_info`, is a cautious addition.
Generally we would always expect the scheduler timing info to be
populated, but in case it is not for whatever reason, we can derive a
close approximation of these times by simply examining the start and end
times of the jobs themselves.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
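The partial-to-concrete filtering pattern can be sketched as follows. This uses plain dataclasses rather than the PR's pydantic models, and all names besides `ConcreteJobTimingMetrics` are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobTimingMetrics:
    """Partial model: times may still be unknown while jobs are running."""
    start_time: Optional[float]
    end_time: Optional[float]


@dataclass
class ConcreteJobTimingMetrics:
    """Concrete model: both times are guaranteed present."""
    start_time: float
    end_time: float

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time


def concrete_jobs(jobs: List[JobTimingMetrics]) -> List[ConcreteJobTimingMetrics]:
    """Keep only jobs whose start and end times are both known, so
    downstream visualization code never has to re-check for None."""
    return [
        ConcreteJobTimingMetrics(j.start_time, j.end_time)
        for j in jobs
        if j.start_time is not None and j.end_time is not None
    ]
```

Doing the None-check once, at a typed boundary, is what saves the repeated re-assertions in static analysis that the commit message mentions.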
The additional kaleido dependency is needed so that plotly is able to
render & export graphs to image formats (e.g. PNG) instead of directly
rendering some dynamic HTML graph. This is needed for graphs for large
runs (> 10K jobs), where the directly rendered graphs grow to enormous
sizes if not compressed via conversion to an image.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Add some utility functions and constant values to assist in
instrumentation report visualisation creation. This includes the ability
to make (cyclically) repeating colour mappings for datasets with a large
number of categories, generic logic to make a hover tooltip that gives
information about a job (optionally augmented by the presence of
metadata), and a helper for rendering large figures which dynamically
switches to rendering a PNG based on whether a threshold number of
displayed "entities" (points / bars / etc.) are being rendered.

Note that for now we just embed the PNG as a base64 encoded image in the
HTML directly. In the future it would probably be a good idea to save
these images separately and load them in via HTML src references - but
for now this complicates the abstraction interface to be able to handle
the edge case of very large graphs, so this is left as a TODO.

Also define some useful constants that should remain true across many
visualisations: target heights, thresholds & layout/presentation
configuration options.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
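The cyclically repeating colour mapping described above can be sketched as follows; the palette and helper name here are hypothetical, not the PR's actual constants:

```python
from itertools import cycle

# Hypothetical palette; the PR defines its own presentation constants.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def make_colour_mapping(categories, palette=PALETTE):
    """Map each category to a colour, cycling through the palette when
    there are more categories than colours, so every category always
    gets a deterministic colour across visualisations."""
    return dict(zip(categories, cycle(palette)))
```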
@AlexJones0
Contributor Author

The licensecheck issue causing a CI failure is minor and can be easily worked around (the licensecheck tool doesn't handle licenses with exceptions well). However, it has exposed some issues with the underlying approach, so I need to spend a bit of time redoing this.

Specifically:

  • For two of the visualizations that will be added (Gantt chart & parallelism chart), these grow very large for large runs (e.g. 50k+ jobs).
  • This both bloats report file size (e.g. from 7 MiB to 250+ MiB for ~50K jobs) and drastically hurts performance when trying to browse the graph.
  • My original solution: since you mostly care about the overall shape at this scale and interactivity isn't important (with the normal rendering profile), just export the figure as a PNG. This was working well for me locally.

However, it turns out that since Plotly (like many other big Python visualization frameworks) is based on JS, it needs a browser to render a figure before it can be converted to an image. Even worse, Plotly relies on the kaleido/choreographer dependencies to do this, which have a strict dependency on Chrome being available on the system.

Rather annoyingly, although Kaleido's README.md mentions the requirement for Chromium, the Plotly README.md says:

The kaleido package has no dependencies and can be installed using pip

(which feels a little misleading).

I'd really rather not make DVSim depend on a specific browser being installed if possible. One solution would be to simply not render these graphs at scale. Another, which I'm exploring and which looks quite feasible, is to render these graphs with matplotlib above a certain scale instead. Matplotlib can render directly to an image without going through a browser, and converting between the two frameworks doesn't seem too hard, especially since both problematic graphs share the same underlying implementation, so only one graph really needs translating. The main difficulty will be abstracting things nicely so as not to duplicate a lot of logic between the two implementations.
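For reference, a minimal sketch of browser-free image export via matplotlib's `Agg` backend (the helper name and embedding scheme are assumptions, not the eventual implementation):

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless raster backend: no browser or display needed
import matplotlib.pyplot as plt


def figure_to_embedded_png(fig) -> str:
    """Render a matplotlib figure straight to a base64-embedded <img> tag,
    entirely in-process, with no Chrome/kaleido dependency."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100)
    plt.close(fig)  # free the figure's memory once rendered
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}"/>'
```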

Since I'm exploring that avenue and I don't intend to add the kaleido dependency nor use my original render_large_image helper, I'm closing this PR for now.

AlexJones0 closed this May 15, 2026