Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept objects for encoded typedarrays in data_array valType #5230

Merged
merged 142 commits into from
Jan 5, 2024

Conversation

archmoj
Copy link
Contributor

@archmoj archmoj commented Oct 22, 2020

Architecture Details

The approach is to perform the typed array decoding in coerce (before the supplyDefaults logic in the layout and traces). This has the advantage that none of the supply defaults or calc logic needs to be aware of the typed array specification objects.

For performance, the ArrayBuffer instances resulting from the decoding process are cached by trace uid and property name. To keep cache bounded, only one ArrayBuffer is cached per trace per property. Playing with the isosurface example in #5230, building the typed array with this caching approach is about 10x faster than when performing the base64 decoding.

Implementation Details

This PR handles decoding N-dimensional arrays into a nested collection of Arrays. The inner most layer contains TypedArrays that are backed by the original decoded ArrayBuffer, so no copying is performed. So far, all the 2D traces I've played with have worked fine with this nesting arrangement.

More testing is needed, but hopefully nothing in supplyDefaults or calc or rendering will need to change in any of the traces (apart from potential bug fixes for TypedArray handling).

Latest demo

codepen

Previous attempt

The initial commit dc9f4e5 was inspired by @jonmmease's PR #2911.

@plotly/plotly_js

- move decoding step to the start of calc data
@archmoj
Copy link
Contributor Author

archmoj commented Nov 2, 2020

@jonmmease are you interested to review & possibly work on this PR?

@almarklein almarklein mentioned this pull request Nov 6, 2020
16 tasks
@almarklein
Copy link
Contributor

I'd love for this functionality to land, in particular to communicate images and volumetric data more efficiently. In addition to being more memory efficient, it would also improve the speed of the JSON encoding, as we found during our work on improving the performance of the JSON encoder in #2880.

I'm happy to get my hands dirty to help move this effort forwards. E.g. by implementing the encoding step for the Python JSON encoder. Though that should come after this. In the mean time, I could perhaps help implement support for other traces?

@almarklein
Copy link
Contributor

I just did some benchmarking on the encoding of ndarray data in Python. Perhaps this helps create some enthusiasm for this feature :)

I used a test array of 1000x1000 containing random data. I compared encoding the data in the form proposed in this PR against letting the PlotlyJSONEncoder converting it to lists. The timing includes the time of the base64 encoding.

  • The size is 28% (more than 3x smaller).
  • The speed is about 10x faster (from ~1 sec to ~0.1s for this data size).

(this is measured with the improved PlotlyJSONEncoder of #2880, before that, the speed difference was even a factor 25).

src/plots/plots.js Outdated Show resolved Hide resolved
@jonmmease
Copy link
Contributor

@jonmmease are you interested to review & possibly work on this PR?

Yeah, I'm interested in picking this up again. Probably early December.

@almarklein Thanks for chiming in and for trying out those mini benchmarks!

One not yet settled question, as far as I understand, is what multi-dimensional array buffers are going to be de-serialized into in JavaScript (since TypedArrays are 1D only). @archmoj, I believe we already have a dependnecy on ndarray(https://github.com/scijs/ndarray) through the WebGL stuff. Do you think we could make plotly.js accept these as input for 2D/3D arrays (heatmap, volume, etc.)?

@archmoj
Copy link
Contributor Author

archmoj commented Nov 18, 2020

@jonmmease are you interested to review & possibly work on this PR?

Yeah, I'm interested in picking this up again. Probably early December.

@almarklein Thanks for chiming in and for trying out those mini benchmarks!

One not yet settled question, as far as I understand, is what multi-dimensional array buffers are going to be de-serialized into in JavaScript (since TypedArrays are 1D only). @archmoj, I believe we already have a dependnecy on ndarray(https://github.com/scijs/ndarray) through the WebGL stuff. Do you think we could make plotly.js accept these as input for 2D/3D arrays (heatmap, volume, etc.)?

Right now one could pass 2D/3D arrays to volume, surface, etc. using 'shape' as mentioned in the modified attribute description.

src/plots/plots.js Outdated Show resolved Hide resolved
@jonmmease
Copy link
Contributor

Unlike #2911, the decoding process is done at calc step instead of supplyDefaults: 636e644

@archmoj Could you explain your motivation here? If I remember correctly, it seemed easier to me to do this in the top-level supplyDefaults so that all of the individual traces wouldn't have to be updated to deal with the base64 object.

Thanks again for resurrecting this!

@archmoj
Copy link
Contributor Author

archmoj commented Nov 28, 2020

Unlike #2911, the decoding process is done at calc step instead of supplyDefaults: 636e644

@archmoj Could you explain your motivation here? If I remember correctly, it seemed easier to me to do this in the top-level supplyDefaults so that all of the individual traces wouldn't have to be updated to deal with the base64 object.

Thanks again for resurrecting this!

supplyDefault is called during interactions. So basically we don't want to run slow processes on arrays in there.

@jonmmease
Copy link
Contributor

jonmmease commented Nov 28, 2020

One issue I'm running into with having decoding happen in calc. This does prevent the decoding from happening during interactions, but since supplyDefaults does run during interactions, the typed arrays get overridden by the buffer specification in fullData.

You can see the effect of this in your demo. When you click to change the modebar tool the isosurface disapears. This is because fullData ends up with the buffer specification object instead of the typed array.

Not sure the best way forward here. Is there some way to skip supplyDefaults on an attribute that isn't going to have calc run on it?

Works for initial render, but typed array is overwritten when
supplyDefaults is run again.
@jacksongoode
Copy link

Given the speed improvements, this would be great to get in!

Copy link
Collaborator

@alexcjohnson alexcjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💃 Let's do it! Great work, this is a long time coming!

@archmoj archmoj merged commit 7536e78 into master Jan 5, 2024
1 check passed
@archmoj archmoj deleted the handle-coded-typed-arrays branch January 5, 2024 16:15
@nicolaskruchten
Copy link
Contributor

Wooot! nice job folks! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants