improve clone performance #2985
Conversation
This is great to hear! This performance has always been a bit of a bane.
However, until we hit 1.0 (which hasn't entirely stalled but has again slowed down), we're supporting older versions of the language, ones that do not include the global Map object.
While I might suggest doing an internal value-based _Map like our internal value-based _Set, that would likely have the same performance as the current code, except that it's possible that it would better catch shared nodes (because {x: 1} == {x: 1} //=> false but R.equals({x: 1}, {x: 1}) //=> true).
However, the introductory primitive check certainly makes sense. Could you scale this PR back to that, and we'll look again after 1.0?
How about implementing an internal _Map?
That might work. There is a real tension, though, between trying to make an internal _Map that might gain some of this speedup and one which would work as expected in Ramda (that is, one focused on value equality rather than reference equality).
After thinking about the options, I implemented _ObjectMap, a private hash map that uses the object's values as a hash. Using this class is not as fast as Map, but we still get a significant improvement in performance. These are the values that I get using the clone function now:
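For illustration, a minimal sketch of what such a hash-bucketed, reference-keyed map might look like (names and details hypothetical; this is not the actual PR code):

```javascript
// Hypothetical sketch of a reference-keyed map that buckets entries under a
// cheap hash of the key object's own property values.
function ObjectMap() {
  this.map = {};    // hash string -> bucket of [key, value] pairs
  this.length = 0;
}

// Join the object's own values into a string to use as the bucket key.
ObjectMap.prototype.hash = function (key) {
  const hashedKey = [];
  for (const value in key) {
    hashedKey.push(key[value]);
  }
  return hashedKey.join();
};

ObjectMap.prototype.set = function (key, value) {
  const hashedKey = this.hash(key);
  let bucket = this.map[hashedKey];
  if (!bucket) {
    this.map[hashedKey] = bucket = [];
  }
  bucket.push([key, value]);
  this.length += 1;
};

ObjectMap.prototype.get = function (key) {
  const bucket = this.map[this.hash(key)];
  if (!bucket) { return undefined; }
  // On collisions this degrades to a linear scan, i.e. to the old behavior.
  for (let i = 0; i < bucket.length; i += 1) {
    const element = bucket[i];
    if (element[0] === key) { return element[1]; }
  }
};
```

Note the lookup uses `===`, so only the exact same object reference is found; a structurally equal but distinct object misses.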
While I do like the idea of creating an internal _Map, it would still need to be based upon value equality rather than reference equality.
I also think it would be useful to pull it out into its own internal helper file so that we could reuse it elsewhere. (One day it might make sense to make this and _Set public.)
But the big issue is that any hash-based solution that's fast enough to make this useful is probably not robust enough. I'm afraid that any set or has call would have to involve scanning an array. And there's a very good chance that this won't be fast enough to solve the underlying problem.
    hashedKey.push(key[value]);
  }
  return hashedKey.join();
};
I think there's a problem with any generic hash here. For this simple one, note that
hash ({foo: 1, bar: 2}) == hash ({bar: 1, foo: 2}) //=> true
hash ({foo: {a: 42}}) == hash ({foo: {b: false}}) //=> true
That second one is especially problematic. We could easily end up with data that's all hash collisions.
But even a more sophisticated hash can still have such problems. I suppose if we get sophisticated enough, then we'd probably lose much of the benefit.
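Both collisions can be demonstrated with a naive value-joining hash of the kind under discussion (a sketch, not the PR's exact function):

```javascript
// Naive hash: join the object's own property values into one string.
const hash = (obj) => Object.values(obj).join();

// Key names don't participate, so swapping names leaves the hash unchanged:
hash({ foo: 1, bar: 2 }) === hash({ bar: 1, foo: 2 }); //=> true  ("1,2" both)

// Every plain object stringifies to "[object Object]", so all nested
// objects collide regardless of their contents:
hash({ foo: { a: 42 } }) === hash({ foo: { b: false } }); //=> true
```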
Yes, the hash function is very simple, and if we made it more sophisticated we would end up losing performance. But I think this hash function is just good enough to split the data a bit and make things performant. And yes, in a worst-case scenario we could end up with a collection full of collisions, but in that case the get function will just behave the way the old algorithm behaved.
I'm not convinced.
I'd love to see some benchmarks with different styles of input objects.
This hash seems too tuned to that input structure.
After further inspection, I realized I made a mistake in my hash function: instead of using just the value, I should use the propertyName + value as the hash. Regardless, the idea is still the same.
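Mixing the property name into each entry distinguishes the swapped-names case, though nested objects still collide (again a sketch, not the actual revision):

```javascript
// Revised hash sketch: pair each property name with its value before joining.
const hash = (obj) =>
  Object.keys(obj).map((k) => k + ':' + obj[k]).join();

// Names now participate, so these no longer share a hash:
hash({ foo: 1, bar: 2 }) === hash({ bar: 1, foo: 2 }); //=> false

// But nested objects still collapse to "[object Object]":
hash({ foo: { a: 42 } }) === hash({ foo: { b: false } }); //=> true
```

For a reference-keyed map this is acceptable: fewer collisions means faster lookups, and remaining collisions only cost a bucket scan, not correctness.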
For the benchmarks with different inputs, I used the following array:
const obja = { foo: "foo1", bar: "bar1" };
const objb = { foo: "foo2", bar: "bar2" };
const objc = { foo: "foo3", bar: "bar3" };
const objd = { foo: "foo4", bar: "bar4" };
const arr = [
{ obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objc, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
{ obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objd, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
{ obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objc, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
{ obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objd, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
];
and got the following results:
max/min/avg time in ms after 100 runs
ramda: new clone: max: 0.815 min: 0.030 avg: 0.059
ramda: old clone: max: 0.816 min: 0.029 avg: 0.063
lodash cloneDeep(): max: 1.913 min: 0.038 avg: 0.114
JSON.parse(JSON.stringify()): max: 0.804 min: 0.013 avg: 0.030
Because it is a small object being cloned, this is the best-case scenario for the old clone(), but the new one still shows better results, and it only gets better as the object size goes up.
I am happy to test other example objects, or even share the code that I'm using to benchmark, so the numbers can be confirmed.
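A minimal max/min/avg timing harness of the kind described might look like this (hypothetical; not the author's actual benchmark code):

```javascript
// Run fn repeatedly and report max/min/avg wall-clock time in milliseconds.
function bench(label, fn, runs = 100) {
  const times = [];
  for (let i = 0; i < runs; i += 1) {
    const start = process.hrtime.bigint();
    fn();
    const end = process.hrtime.bigint();
    times.push(Number(end - start) / 1e6); // nanoseconds -> milliseconds
  }
  const max = Math.max(...times);
  const min = Math.min(...times);
  const avg = times.reduce((a, b) => a + b, 0) / times.length;
  console.log(
    `${label}: max: ${max.toFixed(3)} min: ${min.toFixed(3)} avg: ${avg.toFixed(3)}`
  );
  return { max, min, avg };
}

// Example run against one of the clone strategies being compared:
const data = { foo: 'foo1', bar: 'bar1' };
bench('JSON.parse(JSON.stringify())', () => JSON.parse(JSON.stringify(data)));
```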
for (let i = 0; i < bucket.length; i += 1) {
  const element = bucket[i];
  if (element[0] === key) { return element[1]; }
I think this would have to be if (equals (element [0], key)) ...
Ramda is all about value equality, not reference equality.
While I understand that Ramda is about value equality, I think that the clone function should preserve the structure of the object. Let's say:
const d = {foo: 1, bar: "2"};
const a = {foo: d, c: {bar: d}};
a.foo === a.c.bar; //=> true
// the clone method should preserve the structure:
const clone = R.clone(a);
clone.foo === clone.c.bar; //=> should also be true
That's the way R.clone() is behaving now, and lodash's cloneDeep() also behaves like this; I believe it should behave like this.
I suppose you're right that this is how it's working now. I never worked closely with this code and was thinking that we had the stronger guarantee of preserving the value equality structure (as if the above with x === y replaced with equals (x, y)). Part of me would still like that. But that would further degrade performance, not enhance it!
But if x === y //=> true, then equals(x, y) must also be true. So by preserving reference equality we are also preserving value equality.
I was suggesting that we might go further than that, but on further reflection I realized that it's not only unnecessary but actively harmful to the goal here.
Because _ObjectMap is very specific to this problem (for the reasons I explain above), I don't think it should be separated into its own file. If there is a need for a _Map, we should create an "abstract class" that the user could extend, overriding specific pieces of code (like the equality function and the hash function) to suit specific needs, kind of like how Map works in Java.
That worries me. We have an internal _Set, written to solve one problem, but available to be used in other functions, and arguably something worth making public. Why would we not want to do the same here? If it's too tuned to
The only reason I can think of for exposing these would be that they are useful alternatives, for those who like Ramda's value equality model, to the built-in
I wouldn't say it is too tuned to specific data, but it is tuned to this specific problem: _ObjectMap is focused on reference equality, which is ideal here. But I think it wouldn't be as useful as a separate file in Ramda, because the framework focuses on value equality.
When I said "user" I meant the person who would use the code, not the framework user. What I'm proposing is to create an "abstract class", let's say a _HashTable.js, and then create two classes that inherit from _HashTable: one focused on value equality and the other focused on reference equality.
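A sketch of that proposal, with hypothetical names (a plain class here rather than Ramda files), where subclasses supply the hash and equality strategy:

```javascript
// "Abstract" hash table: subclasses override hash() and keyEquals() to pick
// a reference-equality or value-equality strategy, Java-Map style.
class HashTable {
  constructor() { this.buckets = {}; }
  hash(key) { throw new Error('subclass must implement hash'); }
  keyEquals(a, b) { throw new Error('subclass must implement keyEquals'); }
  set(key, value) {
    const h = this.hash(key);
    (this.buckets[h] = this.buckets[h] || []).push([key, value]);
  }
  get(key) {
    const bucket = this.buckets[this.hash(key)] || [];
    for (const [k, v] of bucket) {
      if (this.keyEquals(k, key)) { return v; }
    }
  }
}

// Reference-equality variant, suited to clone's structure preservation.
// (A value-equality variant would instead use something like R.equals.)
class ReferenceMap extends HashTable {
  hash(key) {
    return Object.keys(key).map((k) => k + ':' + key[k]).join();
  }
  keyEquals(a, b) { return a === b; }
}
```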
Closes #2607.
Any movement here? What's the hold-up? Can I help in any way?
@nfantone: We are way overdue for a release. And we should get this into it, if it's ready. I will try to take a look soon.
We should publish a new release; the last release (v0.27.1) was 10 months ago: https://www.npmjs.com/package/ramda
@CrossEye @adispring Would be happy to contribute in any way I can to help make that happen.
When Ramda went through a period of little attention, this was inadvertently dropped. @danillosl: Are you interested in resolving the conflicts so that we can get this in before
Force-pushed from 3a88bb5 to 17446d7
Force-pushed from 17446d7 to bf6b250
@CrossEye Merge conflicts, lint, and test errors fixed. Ready for re-review.
I'm going to merge this, but tag it for follow-up, as I think we still need to think about exposing
Changed _clone to use a Map to cache and retrieve the cloned objects.
This is NOT a breaking change, and in my tests it reduces the time to clone this object from 93.2ms to 6.8ms on average over 100 iterations.
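As an illustration of the technique, a Map keyed by object reference lets a deep clone reuse already-cloned nodes, preserving shared (and even cyclic) references (a simplified sketch, not Ramda's actual _clone):

```javascript
// Deep clone with a Map-based cache of already-cloned objects.
// Illustrative only: handles plain objects and arrays, not Dates, RegExps, etc.
function cloneDeep(value, cache = new Map()) {
  if (value === null || typeof value !== 'object') {
    return value; // primitives need no cloning (the fast-path check)
  }
  const cached = cache.get(value);
  if (cached !== undefined) {
    return cached; // already cloned: reuse it, preserving structure
  }
  const copy = Array.isArray(value) ? [] : {};
  cache.set(value, copy); // register before recursing so cycles terminate
  for (const k in value) {
    copy[k] = cloneDeep(value[k], cache);
  }
  return copy;
}

// Shared references survive the clone:
const d = { foo: 1, bar: '2' };
const a = { foo: d, c: { bar: d } };
const c = cloneDeep(a);
c.foo === c.c.bar; //=> true
```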