improve clone performance #2985

danillosl · 2020-02-17T15:42:16Z

Changed _clone to use a Map to cache and retrieve the cloned objects.
This is NOT a breaking change, and in my tests it reduces the time to clone this object from 93.2ms to 6.8ms on average after 100 iterations.
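
To make the caching idea concrete, here is a minimal sketch of a clone that remembers already-copied objects in a Map keyed by reference. It is not the actual `_clone` from this PR: the name `cloneWithCache` is hypothetical, and it handles only plain objects and arrays.

```javascript
// Hypothetical sketch, not Ramda's _clone: plain objects and arrays only.
function cloneWithCache(value, cache = new Map()) {
  // Primitives (and functions) are returned as-is; there is nothing to copy.
  if (value === null || typeof value !== 'object') {
    return value;
  }
  // If this exact object was already cloned, reuse its copy.
  // Map looks keys up by reference, so no array scan is needed.
  if (cache.has(value)) {
    return cache.get(value);
  }
  const result = Array.isArray(value) ? [] : {};
  // Register the copy before recursing so that cycles resolve to it.
  cache.set(value, result);
  for (const key of Object.keys(value)) {
    result[key] = cloneWithCache(value[key], cache);
  }
  return result;
}

const shared = { foo: 1, bar: '2' };
const original = { foo: shared, c: { bar: shared } };
original.self = original; // a cycle, to show the cache also terminates recursion
const cloned = cloneWithCache(original);
console.log(cloned.foo === cloned.c.bar); // true
console.log(cloned.foo === shared);       // false
```

The speedup reported above would come from replacing a linear search over previously seen objects with a constant-time lookup, assuming the older code searched linearly.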

CrossEye

This is great to hear! This performance has always been a bit of a bane.

However, until we hit 1.0 (which hasn't entirely stalled but has again slowed down), we're supporting older versions of the language, ones that do not include the global object Map.

While I might suggest doing an internal value-based _Map like our internal value-based _Set, that would likely have the same performance as the current code, except that it's possible that it would better catch shared nodes (because {x: 1} == {x: 1} //=> false but R.equals({x: 1}, {x: 1}) //=> true.)

However, the introductory primitive check certainly makes sense. Could you scale this PR back to that, and we'll look again after 1.0?

newyankeecodeshop · 2020-02-17T22:44:27Z

How about implementing an internal _Map that uses the global Map if available but falls back to the existing implementation for older runtimes?
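
A rough sketch of what such a fallback could look like, assuming the older approach amounts to a linear scan over previously seen objects. `ArrayRefMap` and `createRefMap` are hypothetical names, and the code sticks to ES5 syntax so the fallback itself would run on the older runtimes in question.

```javascript
// Hypothetical array-backed stand-in exposing the subset of the Map API
// that a clone cache needs. Lookups compare keys by reference and are O(n).
function ArrayRefMap() {
  this.keys = [];
  this.values = [];
}

ArrayRefMap.prototype.has = function (key) {
  return this.keys.indexOf(key) !== -1;
};

ArrayRefMap.prototype.get = function (key) {
  var idx = this.keys.indexOf(key);
  return idx === -1 ? undefined : this.values[idx];
};

ArrayRefMap.prototype.set = function (key, value) {
  var idx = this.keys.indexOf(key);
  if (idx === -1) {
    this.keys.push(key);
    this.values.push(value);
  } else {
    this.values[idx] = value;
  }
  return this;
};

// Use the native Map where the runtime provides one, else the fallback.
function createRefMap() {
  return typeof Map === 'function' ? new Map() : new ArrayRefMap();
}
```

Both branches compare keys by reference, so this sketch does not address the value-equality question raised in the reply below.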

CrossEye · 2020-02-18T00:45:08Z

@newyankeecodeshop:

That might work. There is a real tension, though, between trying to make an internal _Map that might gain some of this speedup and one which would work as expected in Ramda (that is, one focused on value equality rather than reference equality.)

danillosl · 2020-02-18T23:42:14Z

@CrossEye:

After thinking about the options, I implemented _ObjectMap, a private hash map that uses the object's values as the hash. This class is not as fast as Map, but we still get a significant improvement in performance.

These are the values that I get using the clone function now:
max/min/avg time in ms after 100 runs
ramda: new clone max: 71.066 min: 23.946 avg: 30.496
ramda: old clone max: 440.424 min: 411.802 avg: 421.692
lodash cloneDeep(): max: 26.248 min: 9.779 avg: 11.207
JSON.parse(JSON.stringify()): max: 16.261 min: 9.313 avg: 10.276

CrossEye

While I do like the idea of creating an internal _Map, it would still need to be based upon value equality rather than reference equality.

I also think it would be useful to pull it out into its own internal helper file so that we could reuse it elsewhere. (One day it might make sense to make this and _Set public.)

But the big issue is that any hash-based solution that's fast enough to make this useful is probably not robust enough. I'm afraid that any set or has call would have to involve scanning an array. And there's a very good chance that this won't be fast enough to solve the underlying problem.
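
For illustration, a value-equality map without a hash might look like the sketch below, where every `has`, `get`, or `set` scans an array. `ValueMap` and `valueEquals` are hypothetical names; `valueEquals` is a simplified structural comparison for plain objects and arrays, not Ramda's `equals`.

```javascript
// Simplified structural equality for plain objects, arrays and primitives.
// A stand-in for illustration only, not Ramda's equals.
function valueEquals(a, b) {
  if (a === b) return true;
  if (a === null || b === null) return false;
  if (typeof a !== 'object' || typeof b !== 'object') return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && valueEquals(a[k], b[k])
  );
}

// Hypothetical value-based map: each operation is a linear scan,
// and each comparison in that scan is itself a deep traversal.
class ValueMap {
  constructor() {
    this.entries = [];
  }
  find(key) {
    return this.entries.find((entry) => valueEquals(entry[0], key));
  }
  has(key) {
    return this.find(key) !== undefined;
  }
  get(key) {
    const entry = this.find(key);
    return entry ? entry[1] : undefined;
  }
  set(key, value) {
    const entry = this.find(key);
    if (entry) {
      entry[1] = value;
    } else {
      this.entries.push([key, value]);
    }
    return this;
  }
}

const byValue = new ValueMap().set({ x: 1 }, 'a');
console.log(byValue.get({ x: 1 }));                    // 'a'
console.log(new Map().set({ x: 1 }, 'a').get({ x: 1 })); // undefined
```

The cost concern is visible in `find`: with n stored keys, a lookup performs up to n deep comparisons.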

CrossEye · 2020-02-19T19:37:04Z

source/internal/_clone.js

+    hashedKey.push(key[value]);
+  }
+  return hashedKey.join();
+};


I think there's a problem with any generic hash here. For this simple one, note that

hash ({foo: 1, bar: 2}) == hash ({bar: 1, foo: 2}) //=> true
hash ({foo: {a: 42}}) == hash ({foo: {b: false}}) //=> true

That second one is especially problematic. We could easily end up with data that's all hash collisions.

But even a more sophisticated hash can still have such problems. I suppose if we get sophisticated enough, then we'd probably lose much of the benefit.
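
The collisions above can be reproduced with the following reconstruction. Only the tail of the function is visible in the diff excerpt, so the loop header and the function name `hash` are assumptions made for illustration.

```javascript
// Reconstruction for illustration: the diff shows only the push/join tail.
const hash = (key) => {
  const hashedKey = [];
  // `value` iterates over property NAMES here, which is why the diff
  // reads key[value]: only the property values end up in the hash.
  for (const value in key) {
    hashedKey.push(key[value]);
  }
  // Array.prototype.join stringifies each element, so every nested
  // plain object contributes the same text, "[object Object]".
  return hashedKey.join();
};

console.log(hash({ foo: 1, bar: 2 }));  // "1,2"
console.log(hash({ bar: 1, foo: 2 }));  // "1,2"
console.log(hash({ foo: { a: 42 } }));  // "[object Object]"
console.log(hash({ foo: { b: false } })); // "[object Object]"
```

Any object whose properties are all nested plain objects hashes to a string made only of `[object Object]` entries, so such data would land in very few buckets.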

Yes, the hash function is very simple, and if we made it more sophisticated we would end up losing performance. But I think this hash function is just good enough to split the data a little and make it performant. And yes, in a worst-case scenario we could end up with a collection full of collisions, but in that case the get function will just behave the way the old algorithm behaved.

I'm not convinced.

I'd love to see some benchmarks with different styles of input objects.

This hash seems too tuned to that input structure.

After further inspection, I realized I made a mistake in my hash function: instead of using just the value, I should use propertyName + value as the hash. Regardless, the idea is still the same.
As for the benchmarks with different inputs, I used the following array:

const obja = { foo: "foo1", bar: "bar1" };
const objb = { foo: "foo2", bar: "bar2" };
const objc = { foo: "foo3", bar: "bar3" };
const objd = { foo: "foo4", bar: "bar4" };
const arr = [
  { obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objc, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
  { obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objd, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
  { obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objc, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
  { obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objd, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
];

and got the following results:

max/min/avg time in ms after 100 runs
ramda: new clone: max: 0.815 min: 0.030 avg: 0.059
ramda: old clone: max: 0.816 min: 0.029 avg: 0.063
lodash cloneDeep(): max: 1.913 min: 0.038 avg: 0.114
JSON.parse(JSON.stringify()): max: 0.804 min: 0.013 avg: 0.030

Because it is a small object, this is the best-case scenario for the old clone(), yet the new one still shows better results, and the gap only grows as the object size goes up.
I am happy to test other example objects or to share the code I'm using to benchmark so the numbers can be confirmed.
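
The benchmark code itself is not shown in this thread. As a rough idea of how max/min/avg figures like these could be collected, here is a hypothetical harness; the names `benchmark` and `jsonClone` and the sample input are assumptions, and only the JSON round trip is timed so the block stays dependency-free.

```javascript
// Use the high-resolution timer where available, else fall back to Date.
const now = typeof performance !== 'undefined'
  ? () => performance.now()
  : () => Date.now();

// Hypothetical harness: runs fn(input) `runs` times and reports
// the max, min and average elapsed time in milliseconds.
function benchmark(fn, input, runs = 100) {
  let max = -Infinity;
  let min = Infinity;
  let total = 0;
  for (let i = 0; i < runs; i += 1) {
    const start = now();
    fn(input);
    const elapsed = now() - start;
    if (elapsed > max) max = elapsed;
    if (elapsed < min) min = elapsed;
    total += elapsed;
  }
  return { max, min, avg: total / runs };
}

const sample = [
  { foo: 'foo1', bar: 'bar1' },
  { foo: 'foo2', bar: 'bar2' },
];
const jsonClone = (x) => JSON.parse(JSON.stringify(x));

const stats = benchmark(jsonClone, sample);
console.log(
  'JSON.parse(JSON.stringify()): max: ' + stats.max.toFixed(3) +
  ' min: ' + stats.min.toFixed(3) +
  ' avg: ' + stats.avg.toFixed(3)
);
```

With inputs this small, individual runs sit close to timer resolution, so differences between averages should be read with some caution.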

CrossEye · 2020-02-19T19:38:01Z

source/internal/_clone.js

+
+      for (let i = 0; i < bucket.length; i += 1) {
+        const element = bucket[i];
+        if (element[0] === key) {return element[1];}


I think this would have to be if (equals (element [0], key)) ...

Ramda is all about value equality, not reference equality.

While I understand that Ramda is about value equality, I think the clone function should preserve the structure of the object. Let's say:

const d = {foo: 1, bar: "2"};
const a = {foo: d, c: {bar: d}};
a.foo === a.c.bar; //=> true
// The clone method should preserve the structure.
const clone = R.clone(a);
clone.foo === clone.c.bar; //=> should also be true

That's the way R.clone() behaves, lodash's cloneDeep() behaves the same way, and I believe it should behave like this.

I suppose you're right that this is how it's working now. I never worked closely with this code and was thinking that we had the stronger guarantee of preserving the value equality structure (as if the above with x === y replaced with equals (x, y)). Part of me would still like that. But that would further degrade performance, not enhance it!

But if x === y is true, then equals(x, y) must also be true. So by preserving reference equality we are also preserving value equality.

I was suggesting that we might go further than that, but on further reflection I realized that it's not only unnecessary but actively harmful to the goal here.

danillosl · 2020-02-19T22:18:36Z

Because _ObjectMap is very specific for this problem (for reasons that I explained above), I don't think this should be separated into its own file. If there is a need for a _Map we should create an "abstract class" where the user could extend and override specific pieces of code (like the equality function and the hash function) to suit specific needs, kind of like how Map works in Java.

CrossEye · 2020-02-20T01:45:26Z

Because _ObjectMap is very specific for this problem

That worries me. We have an internal _Set, written to solve one problem, but available to be used in other functions, and arguably something worth making public. Why would we not want to do the same here? If it's too tuned to clone, is it also perhaps too tuned to a specific sort of data?

If there is a need for a _Map we should create an "abstract class" where the user could extend and override specific pieces of code (like the equality function and the hash function)

The only reason I can think of for exposing these would be that they are useful alternatives, for those who like Ramda's value equality model, to the built in Set and Map types. I would not expect to offer users ways to tweak their behavior.

danillosl · 2020-02-21T21:16:14Z

Why would we not want to do the same here? If it's too tuned to clone, is it also perhaps too tuned to a specific sort of data?

I wouldn't say it is too tuned for specific data, but rather for this specific problem. _ObjectMap is focused on reference equality, which is ideal here, but I don't think it would be as useful as a separate file in Ramda because the framework has a focus on value equality.

I would not expect to offer users ways to tweak their behavior.

When I said user I meant the person who would use the code, not the framework user. What I'm proposing is to create an "abstract class", let's say _HashTable.js, and then create two classes that inherit from _HashTable, one focused on value equality and the other on reference equality.
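
One possible shape for that proposal is sketched below. All names (`HashTable`, `ReferenceTable`, `ValueTable`) are hypothetical, keys are assumed to be non-null objects, and the value-equality variant compares JSON strings as a crude, key-order-sensitive stand-in rather than using Ramda's `equals`.

```javascript
// Hypothetical base class: bucket handling lives here, while subclasses
// supply hash() and equal().
class HashTable {
  constructor() {
    // Null-prototype object so hashes like "constructor" are plain keys.
    this.buckets = Object.create(null);
  }
  hash(key) {
    throw new Error('hash() must be overridden');
  }
  equal(a, b) {
    throw new Error('equal() must be overridden');
  }
  set(key, value) {
    const h = this.hash(key);
    const bucket = this.buckets[h] || (this.buckets[h] = []);
    for (const entry of bucket) {
      if (this.equal(entry[0], key)) {
        entry[1] = value;
        return this;
      }
    }
    bucket.push([key, value]);
    return this;
  }
  get(key) {
    const bucket = this.buckets[this.hash(key)] || [];
    for (const entry of bucket) {
      if (this.equal(entry[0], key)) return entry[1];
    }
    return undefined;
  }
}

// Reference equality: a coarse hash narrows the search, === decides.
class ReferenceTable extends HashTable {
  hash(key) { return Object.keys(key).join(); }
  equal(a, b) { return a === b; }
}

// Value equality (crude stand-in): keys with equal JSON text also have
// equal property-name lists, so hash and equality stay consistent.
class ValueTable extends HashTable {
  hash(key) { return Object.keys(key).join(); }
  equal(a, b) { return JSON.stringify(a) === JSON.stringify(b); }
}
```

The only difference between the two subclasses is `equal`, which is where the reference-versus-value question debated in this thread would be decided.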

wojpawlik · 2020-03-09T16:27:38Z

Closes #2607.

nfantone · 2021-05-20T08:58:40Z

Any movements here? What's the hold up? Can I help in any way?

CrossEye · 2021-05-20T20:30:15Z

@nfantone: We are way overdue for a release. And we should get this into it, if it's ready. I will try to take a look soon.

adispring · 2021-05-20T23:11:28Z

@nfantone: We are way overdue for a release. And we should get this into it, if it's ready. I will try to take a look soon.

We should publish a new release; the last release (v0.27.1) was 10 months ago: https://www.npmjs.com/package/ramda

nfantone · 2021-05-24T16:31:51Z

@CrossEye @adispring Would be happy to contribute in any way I can to help make that happen.

CrossEye · 2022-01-23T21:09:05Z

When Ramda went through a period of little attention, this was inadvertently dropped. @danillosl: Are you interested in resolving the conflicts so that we can get this in before v1.0?

customcommander · 2022-01-23T21:50:47Z

@CrossEye Rebased and fixed the conflicts. However I'd prefer if you could review this again thoroughly as I'm not quite sure I got everything right. Some other pull requests merged after this one touched the same files and I wanted to preserve them too.

Waiting for #3224 to be merged.

customcommander · 2022-01-24T20:38:10Z

@CrossEye Merge, lint, and test errors fixed. Ready for re-review.

CrossEye · 2022-02-05T18:28:05Z

I'm going to merge this, but tag it for follow-up, as I think we still need to think about exposing _Set and it would make sense to have a version of _Map to go with it.

danillosl mentioned this pull request Feb 17, 2020

R.clone is really slow #2134

Closed

danillosl changed the title ~~improved _clone function~~ improve clone performance Feb 17, 2020

CrossEye requested changes Feb 17, 2020

CrossEye requested changes Feb 19, 2020

andrewscfc mentioned this pull request Sep 7, 2020

Improve Render Speed for CPS Assets bbc/simorgh#7752

Merged

6 tasks

CrossEye added the resolve-conflicts PR needs to be updated due to conflicts before merging label Jan 23, 2022

customcommander force-pushed the clone branch from 3a88bb5 to 17446d7 Compare January 23, 2022 21:47

danillosl added 2 commits January 24, 2022 20:34

improved _clone function

36ad027

adding _ObjectMap

bf6b250

customcommander force-pushed the clone branch from 17446d7 to bf6b250 Compare January 24, 2022 20:35

CrossEye self-assigned this Jan 27, 2022

CrossEye approved these changes Feb 5, 2022

CrossEye added Maturation Ideas to make Ramda a more mature library and removed resolve-conflicts PR needs to be updated due to conflicts before merging labels Feb 5, 2022

CrossEye merged commit d27c944 into ramda:master Feb 5, 2022

customcommander added this to the 1.0.0 milestone Feb 6, 2022

adispring mentioned this pull request Apr 7, 2023

0.29.0 Upgrade Guide #3369

Closed

HarveyPeachey mentioned this pull request Aug 29, 2023

NEWSWORLDSERVICE-1791 - Opera mini ramda fix bbc/simorgh#11028

Merged
