Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce calls to getStringSize when calculating visible ticks (#2589) #3953

Conversation

bernardobelchior
Copy link
Contributor

@bernardobelchior bernardobelchior commented Nov 12, 2023

Description

Reduce the number of times getStringSize is called when calculating which ticks are rendered.
This optimization is applied to getTicksEnd, getTicksStart and getEquidistantTicks, but the latter is unlikely to benefit from it since the algorithm does not iterate over the candidate ticks sequentially.

I have created this repository, based on the JSFiddle available in this issue to test the changes. It generates a chart with 10000 random data points.

Based on Recharts' master branch (pointing to commit 8cdfab7facc4d1d107c11a1fc3d4a26e716dddca at the time of writing), the Chrome DevTools' Performance chart is as follows:

image

You can see that 95% time is taken by getStringSize() and descendants, which account for 3.2 seconds on my M1 Macbook Pro from 2020. In fact, I counted and getStringSize() is being called around 40k times to render the chart.

Since we know that getStringSize is expensive, whatever we can do to reduce the number of times it's called will likely yield a performance improvement. As such, this PR is comprised of two changes:

  1. Call getTickSize in getTicksEnd, getTicksStart and getEquidistantTicks only when absolutely necessary;
  2. Update the isVisible calculation to check if the tick won't be visible even before we take size into account: this way we can skip calculating the string size if we already know that the tick won't be visible.

After this change, getStringSize still takes up most of the time (88%), but the absolute value is now much smaller (1.8 seconds):

image

getStringSize is now called around 18k times, which means there's still a lot of room for future improvements.

Related Issue

#2589
#1465

Motivation and Context

Explained above

How Has This Been Tested?

Added some unit tests. There were none, so it's hard to see if there's no regression, but I ran the new tests against the isVisible function before and after these changes and the results were the same.

If there's any edge case you'd like me to test, please let me know 😄

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have added a storybook story or extended an existing story to show my changes
  • All new and existing tests passed.

@ckifer
Copy link
Member

ckifer commented Nov 12, 2023

Thanks for the analysis!

Yeah.. unfortunately the long term solution here requires quite a bit more work with a broader swath of changes. Some of the refactoring we're doing should enable us to reduce these calls quite a bit more I think!

@bernardobelchior
Copy link
Contributor Author

Thanks you, @ckifer!
I have a couple question regarding testing:

  1. Does Recharts have any automated visual tests? Or should I go through Storybook stories to check whether there were any regressions?
  2. Do you think there's any additional unit test I should add in this PR? We had no tests for isVisible, so we might want to add some more tests there

Thanks!

@ckifer
Copy link
Member

ckifer commented Nov 12, 2023

  1. We run visual regression with chromatic - it isn't perfect but it will cover some things for sure
  2. I haven't looked at the PR yet but will let you know

@bernardobelchior
Copy link
Contributor Author

Also, @ckifer it seems like getStringSize's cache has a limit of 2k values and it completely clears it when it's over that.

There's several improvements we can think about:

  1. Allow users to customize the number of entries in the cache: it'll affect memory usage but might be worth the tradeoff? The downside is that users shouldn't need to think about this when using a charting library
  2. Evict older entries from the cache instead of clearing it altogether;

@bernardobelchior bernardobelchior marked this pull request as ready for review November 12, 2023 17:24
@nikolasrieble
Copy link
Contributor

Also, @ckifer it seems like getStringSize's cache has a limit of 2k values and it completely clears it when it's over that.

There's several improvements we can think about:

  1. Allow users to customize the number of entries in the cache: it'll affect memory usage but might be worth the tradeoff? The downside is that users shouldn't need to think about this when using a charting library
  2. Evict older entries from the cache instead of clearing it altogether;

re: 1

Given that recharts is known specifically for its customisability, this could make sense. Really though the customer would not necessarily unterstand how the big the cache should be, because the amount of calls are a recharts implementation detail.

What would be the options? Rather, the cache size should be a function of the amount of ticks.
In which cases are we calling it?
For all ticks on all axes, right? We could have a cache option where the cacheSize = axisCount * dataLength. Would that make sense?

Not sure it would be a great experience to offer the user to set a number.

re: 2
Seems a good idea to try out.
I wonder if it helps though, simply because for example, for the tick positioning, we simply measure every tick. If we have 1001 data points, but the cache is of size 1000, then it makes no difference whether you evict all, or the latest one. In both cases you would never have the string in question in the cache.

That being said, this is tackling an important point, thanks a lot for your help here! ❤️

Comment on lines 27 to 28
let size: number | undefined;
const getSize = () => size ?? getTickSize(entry, i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, I do not understand. When would this ever return size? When would size not be undefined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I made a mistake here. It was a leftover from some refactors I made.
Thanks, updated 👍

Copy link
Contributor

@nikolasrieble nikolasrieble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few question about some implementation details, but the idea is both simple as well as clear. Love the improvement. Lets figure out the details and land this.

Thank you for this great improvement.

@@ -35,7 +35,7 @@ export function getEquidistantTicks(

const tickCoord = entry.coordinate;
// We will always show the first tick.
const isShow = index === 0 || isVisible(sign, tickCoord, size, start, end);
const isShow = index === 0 || isVisible(sign, tickCoord, () => size, start, end);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the tick size not already computed in line 34?
If I understand correctly, this should be

    // Check if the element collides with the next element
    const getSize = () => getTickSize(entry, index);

    const tickCoord = entry.coordinate;
    // We will always show the first tick.
    const isShow = index === 0 || isVisible(sign, tickCoord, getSize, start, end);

Now we would not call getTickSize directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing so triggers the following lint error:

image

But I think I managed to find a workaround. Pushing a new commit 😄

@bernardobelchior
Copy link
Contributor Author

Also, @ckifer it seems like getStringSize's cache has a limit of 2k values and it completely clears it when it's over that.
There's several improvements we can think about:

  1. Allow users to customize the number of entries in the cache: it'll affect memory usage but might be worth the tradeoff? The downside is that users shouldn't need to think about this when using a charting library
  2. Evict older entries from the cache instead of clearing it altogether;

re: 1

Given that recharts is known specifically for its customisability, this could make sense. Really though the customer would not necessarily unterstand how the big the cache should be, because the amount of calls are a recharts implementation detail.

What would be the options? Rather, the cache size should be a function of the amount of ticks. In which cases are we calling it? For all ticks on all axes, right? We could have a cache option where the cacheSize = axisCount * dataLength. Would that make sense?

That makes sense if charts are static. However, if we add 1 more data point (or change a data point), we may evict the whole cache because we'd be over the cache size. This might work, though, if we implement cache eviction by the oldest key.
In that case, in case a chart re-renders because of new data points, we'd only evict a subset of the cache, which would reduce cache misses.

Not sure it would be a great experience to offer the user to set a number.

Yeah, I'm not sure either 😅

re: 2 Seems a good idea to try out. I wonder if it helps though, simply because for example, for the tick positioning, we simply measure every tick. If we have 1001 data points, but the cache is of size 1000, then it makes no difference whether you evict all, or the latest one. In both cases you would never have the string in question in the cache.

That's right, but you would still hit the cache for all the potential ticks that didn't change. If we evict the whole cache, we know we'll have 100% cache misses; if we evict the oldest key, there's a chance we might have some cache hits.

Also, another problem is that the cache is global for Recharts. If we have two charts on the same page and they both have 2k potential ticks, they'll share the same cache and cause cache misses. It might make sense for the cache to be scoped to the chart.

That being said, this is tackling an important point, thanks a lot for your help here! ❤️

Thanks for the kind words 😄

@nikolasrieble
Copy link
Contributor

All great points! Cache being shared across charts is a good reason to be able to set it separately. Then all my suggestions also do not make sense. 😓

Also, your point with non static charts makes a lot of sense. Would you like to create a follow up issue, and implement cache eviction in the FIFO strategy?

@bernardobelchior
Copy link
Contributor Author

Also, your point with non static charts makes a lot of sense. Would you like to create a follow up issue, and implement cache eviction in the FIFO strategy?

Created an issue, but I don't want to commit to implementing since I don't know when I'll have some free time to do that.

Copy link
Contributor

@nikolasrieble nikolasrieble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find any issue while manually testing in storybook. This is a great first step to improvement. Lets ship! 🚀

Comment on lines +35 to +46
let size: number | undefined;
const getSize = () => {
if (size === undefined) {
size = getTickSize(entry, i);
}

return size;
};

const tickCoord = entry.coordinate;
// We will always show the first tick.
const isShow = index === 0 || isVisible(sign, tickCoord, size, start, end);
const isShow = index === 0 || isVisible(sign, tickCoord, getSize, start, end);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now redefining this in three places. I wonder: Could we instead extract this combination of getSize with isVisible into a new helper function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we could because there might be other places where getSize is used.
For example, in getTicksEnd we use it in lines 37, 46 and 49. It isn't just used for isVisible, so it's hard to abstract that

@nikolasrieble nikolasrieble merged commit 77b42f4 into recharts:master Nov 12, 2023
5 checks passed
@ckifer
Copy link
Member

ckifer commented Nov 12, 2023

Didn't have a chance to look at this but great conversation here 🚀🚀🚀

@bernardobelchior bernardobelchior deleted the improv/reduce-string-measurements branch November 19, 2023 21:20
GMer78 pushed a commit to GMer78/recharts-1 that referenced this pull request Nov 24, 2023
…rts#2589) (recharts#3953)

<!--- Provide a general summary of your changes in the Title above -->

## Description

Reduce the number of times `getStringSize` is called when calculating
which ticks are rendered.
This optimization is applied to `getTicksEnd, `getTicksStart` and
`getEquidistantTicks`, but the latter is unlikely to benefit from it
since the algorithm does not iterate over the candidate ticks
sequentially.

I have created [this
repository](https://github.com/bernardobelchior/recharts-repro), based
on the JSFiddle available in [this
issue](recharts#1465 (comment))
to test the changes. It generates a chart with 10000 random data points.

Based on Recharts' master branch (pointing to commit
`8cdfab7facc4d1d107c11a1fc3d4a26e716dddca` at the time of writing), the
Chrome DevTools' Performance chart is as follows:


![image](https://github.com/recharts/recharts/assets/12778398/52da2b7b-0cef-4ed4-9d47-a9b9fc2d23f2)


You can see that 95% time is taken by `getStringSize()` and descendants,
which account for 3.2 seconds on my M1 Macbook Pro from 2020. In fact, I
counted and `getStringSize()` is being called around 40k times to render
the chart.


Since we know that `getStringSize` is expensive, whatever we can do to
reduce the number of times it's called will likely yield a performance
improvement. As such, this PR is comprised of two changes:
1. Call `getTickSize` in `getTicksEnd`, `getTicksStart` and
`getEquidistantTicks` only when absolutely necessary;
1. Update the `isVisible` calculation to check if the tick won't be
visible even before we take size into account: this way we can skip
calculating the string size if we already know that the tick won't be
visible.


After this change, `getStringSize` still takes up most of the time
(88%), but the absolute value is now much smaller (1.8 seconds):


![image](https://github.com/recharts/recharts/assets/12778398/10dd9067-a89f-4b95-b2d2-e22887345dde)

`getStringSize` is now called around 18k times, which means there's
still a lot of room for future improvements.


## Related Issue

recharts#2589 
recharts#1465
<!--- This project only accepts pull requests related to open issues -->
<!--- If suggesting a new feature or change, please discuss it in an
issue first -->
<!--- If fixing a bug, there should be an issue describing it with steps
to reproduce -->
<!--- Please link to the issue here: -->

## Motivation and Context

Explained above

## How Has This Been Tested?

Added some unit tests. There were none, so it's hard to see if there's
no regression, but I ran the new tests against the `isVisible` function
before and after these changes and the results were the same.

If there's any edge case you'd like me to test, please let me know 😄 
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran
to -->
<!--- see how your change affects other areas of the code, etc. -->


## Types of changes

<!--- What types of changes does your code introduce? Put an `x` in all
the boxes that apply: -->

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)

## Checklist:

<!--- Go over all the following points, and put an `x` in all the boxes
that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->

- [x] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [ ] I have added tests to cover my changes.
- [ ] I have added a storybook story or extended an existing story to
show my changes
- [x] All new and existing tests passed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants