Reduce calls to `getStringSize` when calculating visible ticks (#2589) #3953

bernardobelchior · 2023-11-12T16:57:26Z

Description

Reduce the number of times getStringSize is called when calculating which ticks are rendered.
This optimization is applied to getTicksEnd, getTicksStart and getEquidistantTicks, but the latter is unlikely to benefit from it since the algorithm does not iterate over the candidate ticks sequentially.

I have created this repository, based on the JSFiddle available in this issue to test the changes. It generates a chart with 10000 random data points.

Based on Recharts' master branch (pointing to commit 8cdfab7facc4d1d107c11a1fc3d4a26e716dddca at the time of writing), the Chrome DevTools' Performance chart is as follows:

You can see that 95% time is taken by getStringSize() and descendants, which account for 3.2 seconds on my M1 Macbook Pro from 2020. In fact, I counted and getStringSize() is being called around 40k times to render the chart.

Since we know that getStringSize is expensive, whatever we can do to reduce the number of times it's called will likely yield a performance improvement. As such, this PR is comprised of two changes:

Call getTickSize in getTicksEnd, getTicksStart and getEquidistantTicks only when absolutely necessary;
Update the isVisible calculation to check if the tick won't be visible even before we take size into account: this way we can skip calculating the string size if we already know that the tick won't be visible.

After this change, getStringSize still takes up most of the time (88%), but the absolute value is now much smaller (1.8 seconds):

getStringSize is now called around 18k times, which means there's still a lot of room for future improvements.

Related Issue

#2589
#1465

Motivation and Context

Explained above

How Has This Been Tested?

Added some unit tests. There were none, so it's hard to see if there's no regression, but I ran the new tests against the isVisible function before and after these changes and the results were the same.

If there's any edge case you'd like me to test, please let me know 😄

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
I have added a storybook story or extended an existing story to show my changes
All new and existing tests passed.

…rts#2589)

ckifer · 2023-11-12T17:08:15Z

Thanks for the analysis!

Yeah.. unfortunately the long term solution here requires quite a bit more work with a broader swath of changes. Some of the refactoring we're doing should enable us to reduce these calls quite a bit more I think!

bernardobelchior · 2023-11-12T17:21:10Z

Thanks you, @ckifer!
I have a couple question regarding testing:

Does Recharts have any automated visual tests? Or should I go through Storybook stories to check whether there were any regressions?
Do you think there's any additional unit test I should add in this PR? We had no tests for isVisible, so we might want to add some more tests there

Thanks!

ckifer · 2023-11-12T17:23:01Z

We run visual regression with chromatic - it isn't perfect but it will cover some things for sure
I haven't looked at the PR yet but will let you know

bernardobelchior · 2023-11-12T17:24:41Z

Also, @ckifer it seems like getStringSize's cache has a limit of 2k values and it completely clears it when it's over that.

There's several improvements we can think about:

Allow users to customize the number of entries in the cache: it'll affect memory usage but might be worth the tradeoff? The downside is that users shouldn't need to think about this when using a charting library
Evict older entries from the cache instead of clearing it altogether;

nikolasrieble · 2023-11-12T18:22:02Z

Also, @ckifer it seems like getStringSize's cache has a limit of 2k values and it completely clears it when it's over that.

There's several improvements we can think about:

Allow users to customize the number of entries in the cache: it'll affect memory usage but might be worth the tradeoff? The downside is that users shouldn't need to think about this when using a charting library

Evict older entries from the cache instead of clearing it altogether;

re: 1

Given that recharts is known specifically for its customisability, this could make sense. Really though the customer would not necessarily unterstand how the big the cache should be, because the amount of calls are a recharts implementation detail.

What would be the options? Rather, the cache size should be a function of the amount of ticks.
In which cases are we calling it?
For all ticks on all axes, right? We could have a cache option where the cacheSize = axisCount * dataLength. Would that make sense?

Not sure it would be a great experience to offer the user to set a number.

re: 2
Seems a good idea to try out.
I wonder if it helps though, simply because for example, for the tick positioning, we simply measure every tick. If we have 1001 data points, but the cache is of size 1000, then it makes no difference whether you evict all, or the latest one. In both cases you would never have the string in question in the cache.

That being said, this is tackling an important point, thanks a lot for your help here! ❤️

src/util/TickUtils.ts

nikolasrieble · 2023-11-12T18:27:03Z

src/cartesian/getTicks.ts

+    let size: number | undefined;
+    const getSize = () => size ?? getTickSize(entry, i);


This, I do not understand. When would this ever return size? When would size not be undefined?

Sorry, I made a mistake here. It was a leftover from some refactors I made.
Thanks, updated 👍

src/cartesian/getTicks.ts

nikolasrieble

I have a few question about some implementation details, but the idea is both simple as well as clear. Love the improvement. Lets figure out the details and land this.

Thank you for this great improvement.

nikolasrieble · 2023-11-12T18:52:41Z

src/cartesian/getEquidistantTicks.ts

@@ -35,7 +35,7 @@ export function getEquidistantTicks(

    const tickCoord = entry.coordinate;
    // We will always show the first tick.
-    const isShow = index === 0 || isVisible(sign, tickCoord, size, start, end);
+    const isShow = index === 0 || isVisible(sign, tickCoord, () => size, start, end);


Is the tick size not already computed in line 34?
If I understand correctly, this should be

// Check if the element collides with the next element const getSize = () => getTickSize(entry, index); const tickCoord = entry.coordinate; // We will always show the first tick. const isShow = index === 0 || isVisible(sign, tickCoord, getSize, start, end);

Now we would not call getTickSize directly.

Doing so triggers the following lint error:

But I think I managed to find a workaround. Pushing a new commit 😄

src/cartesian/getTicks.ts

bernardobelchior · 2023-11-12T18:55:42Z

Also, @ckifer it seems like getStringSize's cache has a limit of 2k values and it completely clears it when it's over that.
There's several improvements we can think about:

Allow users to customize the number of entries in the cache: it'll affect memory usage but might be worth the tradeoff? The downside is that users shouldn't need to think about this when using a charting library

Evict older entries from the cache instead of clearing it altogether;

re: 1

Given that recharts is known specifically for its customisability, this could make sense. Really though the customer would not necessarily unterstand how the big the cache should be, because the amount of calls are a recharts implementation detail.

What would be the options? Rather, the cache size should be a function of the amount of ticks. In which cases are we calling it? For all ticks on all axes, right? We could have a cache option where the cacheSize = axisCount * dataLength. Would that make sense?

That makes sense if charts are static. However, if we add 1 more data point (or change a data point), we may evict the whole cache because we'd be over the cache size. This might work, though, if we implement cache eviction by the oldest key.
In that case, in case a chart re-renders because of new data points, we'd only evict a subset of the cache, which would reduce cache misses.

Not sure it would be a great experience to offer the user to set a number.

Yeah, I'm not sure either 😅

re: 2 Seems a good idea to try out. I wonder if it helps though, simply because for example, for the tick positioning, we simply measure every tick. If we have 1001 data points, but the cache is of size 1000, then it makes no difference whether you evict all, or the latest one. In both cases you would never have the string in question in the cache.

That's right, but you would still hit the cache for all the potential ticks that didn't change. If we evict the whole cache, we know we'll have 100% cache misses; if we evict the oldest key, there's a chance we might have some cache hits.

Also, another problem is that the cache is global for Recharts. If we have two charts on the same page and they both have 2k potential ticks, they'll share the same cache and cause cache misses. It might make sense for the cache to be scoped to the chart.

That being said, this is tackling an important point, thanks a lot for your help here! ❤️

Thanks for the kind words 😄

nikolasrieble · 2023-11-12T18:58:10Z

All great points! Cache being shared across charts is a good reason to be able to set it separately. Then all my suggestions also do not make sense. 😓

Also, your point with non static charts makes a lot of sense. Would you like to create a follow up issue, and implement cache eviction in the FIFO strategy?

bernardobelchior · 2023-11-12T19:25:21Z

Also, your point with non static charts makes a lot of sense. Would you like to create a follow up issue, and implement cache eviction in the FIFO strategy?

Created an issue, but I don't want to commit to implementing since I don't know when I'll have some free time to do that.

nikolasrieble

I could not find any issue while manually testing in storybook. This is a great first step to improvement. Lets ship! 🚀

nikolasrieble · 2023-11-12T19:37:17Z

src/cartesian/getEquidistantTicks.ts

+    let size: number | undefined;
+    const getSize = () => {
+      if (size === undefined) {
+        size = getTickSize(entry, i);
+      }
+
+      return size;
+    };

    const tickCoord = entry.coordinate;
    // We will always show the first tick.
-    const isShow = index === 0 || isVisible(sign, tickCoord, size, start, end);
+    const isShow = index === 0 || isVisible(sign, tickCoord, getSize, start, end);


We are now redefining this in three places. I wonder: Could we instead extract this combination of getSize with isVisible into a new helper function?

I don't think we could because there might be other places where getSize is used.
For example, in getTicksEnd we use it in lines 37, 46 and 49. It isn't just used for isVisible, so it's hard to abstract that

ckifer · 2023-11-12T23:01:09Z

Didn't have a chance to look at this but great conversation here 🚀🚀🚀

…rts#2589) (recharts#3953)  ## Description Reduce the number of times `getStringSize` is called when calculating which ticks are rendered. This optimization is applied to `getTicksEnd, `getTicksStart` and `getEquidistantTicks`, but the latter is unlikely to benefit from it since the algorithm does not iterate over the candidate ticks sequentially. I have created [this repository](https://github.com/bernardobelchior/recharts-repro), based on the JSFiddle available in [this issue](recharts#1465 (comment)) to test the changes. It generates a chart with 10000 random data points. Based on Recharts' master branch (pointing to commit `8cdfab7facc4d1d107c11a1fc3d4a26e716dddca` at the time of writing), the Chrome DevTools' Performance chart is as follows: ![image](https://github.com/recharts/recharts/assets/12778398/52da2b7b-0cef-4ed4-9d47-a9b9fc2d23f2) You can see that 95% time is taken by `getStringSize()` and descendants, which account for 3.2 seconds on my M1 Macbook Pro from 2020. In fact, I counted and `getStringSize()` is being called around 40k times to render the chart. Since we know that `getStringSize` is expensive, whatever we can do to reduce the number of times it's called will likely yield a performance improvement. As such, this PR is comprised of two changes: 1. Call `getTickSize` in `getTicksEnd`, `getTicksStart` and `getEquidistantTicks` only when absolutely necessary; 1. Update the `isVisible` calculation to check if the tick won't be visible even before we take size into account: this way we can skip calculating the string size if we already know that the tick won't be visible. After this change, `getStringSize` still takes up most of the time (88%), but the absolute value is now much smaller (1.8 seconds): ![image](https://github.com/recharts/recharts/assets/12778398/10dd9067-a89f-4b95-b2d2-e22887345dde) `getStringSize` is now called around 18k times, which means there's still a lot of room for future improvements. ## Related Issue recharts#2589 recharts#1465     ## Motivation and Context Explained above ## How Has This Been Tested? Added some unit tests. There were none, so it's hard to see if there's no regression, but I ran the new tests against the `isVisible` function before and after these changes and the results were the same. If there's any edge case you'd like me to test, please let me know 😄    ## Types of changes  - [x] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to change) ## Checklist:   - [x] My code follows the code style of this project. - [ ] My change requires a change to the documentation. - [ ] I have updated the documentation accordingly. - [ ] I have added tests to cover my changes. - [ ] I have added a storybook story or extended an existing story to show my changes - [x] All new and existing tests passed.

Reduce calls to getStringSize when calculating visible ticks (recha…

4b4c776

…rts#2589)

Add unit tests

561a570

bernardobelchior marked this pull request as ready for review November 12, 2023 17:24

nikolasrieble reviewed Nov 12, 2023

View reviewed changes

src/util/TickUtils.ts Show resolved Hide resolved

nikolasrieble reviewed Nov 12, 2023

View reviewed changes

src/cartesian/getTicks.ts Show resolved Hide resolved

nikolasrieble reviewed Nov 12, 2023

View reviewed changes

Fix bug. Add comment.

d433d08

nikolasrieble reviewed Nov 12, 2023

View reviewed changes

src/cartesian/getTicks.ts Show resolved Hide resolved

Also cache getSize for getEquidistantTicks

5f16cf4

bernardobelchior mentioned this pull request Nov 12, 2023

Improve string size caching mechanism #3955

Open

1 task

nikolasrieble approved these changes Nov 12, 2023

View reviewed changes

nikolasrieble merged commit 77b42f4 into recharts:master Nov 12, 2023
5 checks passed

bernardobelchior deleted the improv/reduce-string-measurements branch November 19, 2023 21:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce calls to `getStringSize` when calculating visible ticks (#2589) #3953

Reduce calls to `getStringSize` when calculating visible ticks (#2589) #3953

bernardobelchior commented Nov 12, 2023 •

edited

ckifer commented Nov 12, 2023

bernardobelchior commented Nov 12, 2023

ckifer commented Nov 12, 2023

bernardobelchior commented Nov 12, 2023

nikolasrieble commented Nov 12, 2023

nikolasrieble Nov 12, 2023

bernardobelchior Nov 12, 2023

nikolasrieble left a comment

nikolasrieble Nov 12, 2023

bernardobelchior Nov 12, 2023

bernardobelchior commented Nov 12, 2023

nikolasrieble commented Nov 12, 2023

bernardobelchior commented Nov 12, 2023

nikolasrieble left a comment

nikolasrieble Nov 12, 2023

bernardobelchior Nov 12, 2023

ckifer commented Nov 12, 2023

		let size: number \| undefined;
		const getSize = () => size ?? getTickSize(entry, i);

Reduce calls to getStringSize when calculating visible ticks (#2589) #3953

Reduce calls to getStringSize when calculating visible ticks (#2589) #3953

Conversation

bernardobelchior commented Nov 12, 2023 • edited

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist:

ckifer commented Nov 12, 2023

bernardobelchior commented Nov 12, 2023

ckifer commented Nov 12, 2023

bernardobelchior commented Nov 12, 2023

nikolasrieble commented Nov 12, 2023

nikolasrieble Nov 12, 2023

Choose a reason for hiding this comment

bernardobelchior Nov 12, 2023

Choose a reason for hiding this comment

nikolasrieble left a comment

Choose a reason for hiding this comment

nikolasrieble Nov 12, 2023

Choose a reason for hiding this comment

bernardobelchior Nov 12, 2023

Choose a reason for hiding this comment

bernardobelchior commented Nov 12, 2023

nikolasrieble commented Nov 12, 2023

bernardobelchior commented Nov 12, 2023

nikolasrieble left a comment

Choose a reason for hiding this comment

nikolasrieble Nov 12, 2023

Choose a reason for hiding this comment

bernardobelchior Nov 12, 2023

Choose a reason for hiding this comment

ckifer commented Nov 12, 2023

Reduce calls to `getStringSize` when calculating visible ticks (#2589) #3953

Reduce calls to `getStringSize` when calculating visible ticks (#2589) #3953

bernardobelchior commented Nov 12, 2023 •

edited