Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve clone performance #2985

Merged
merged 2 commits into from
Feb 5, 2022
Merged

improve clone performance #2985

merged 2 commits into from
Feb 5, 2022

Conversation

danillosl
Copy link
Contributor

@danillosl danillosl commented Feb 17, 2020

Changed _clone to use a Map to cache and retrieve the cloned objects.
this is NOT a breaking change and in my tests reduce the time to clone this object from 93.2ms to 6.8ms on average after 100 iterations.

@danillosl danillosl changed the title improved _clone function improve clone performance Feb 17, 2020
Copy link
Member

@CrossEye CrossEye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great to hear! This performance has always been a bit of a bane.

However, until we hit 1.0 (which hasn't entirely stalled but has again slowed down), we're supporting older versions of the language, ones that do not include the global object Map.

While I might suggest doing an internal value-based _Map like our internal value-based _Set, that would likely have the same performance as the current code, except that it's possible that it would better catch shared nodes (because {x: 1} == {x: 1} //=> false but R.equals({x: 1}, {x: 1}) //=> true.)

However, the introductory primitive check certainly makes sense. Could you scale this PR back to that, and we'll look again after 1.0?

@newyankeecodeshop
Copy link

How about implementing an internal _Map that uses the global Map if available but falls back to the existing implementation for older runtimes?

@CrossEye
Copy link
Member

@newyankeecodeshop:

That might work. There is a real tension, though, between trying to make an internal _Map that might gain some of this speedup and one which would work as expected in Ramda (that is, one focused on value equality rather than reference equality.)

@danillosl
Copy link
Contributor Author

@CrossEye:

After thinking about the options I implemented the _ObjectMap which is a private hashmap that uses the object values as a hash, using this class is not as fast as Map but still we get a significant improvement in performance.

this are the values that i get using the clone function now:
max/min/avg time in ms after 100 runs
ramda: new clone max: 71.066 min: 23.946 avg: 30.496
ramda: old clone max: 440.424 min: 411.802 avg: 421.692
lodash cloneDeep(): max: 26.248 min: 9.779 avg: 11.207
JSON.parse(JSON.stringify()): max: 16.261 min: 9.313 avg: 10.276

Copy link
Member

@CrossEye CrossEye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do like the idea of creating an internal _Map, it would still need to be based upon value equality rather than reference equality.

I also think it would be useful to pull it out into its own internal helper file so that we could reuse it elsewhere. (One day it might make sense to make this and _Set public,)

But the big issue is that any hash-based solution that's fast enough to make this useful is probably not robust enough. I'm afraid that any set or has call would have to involve scanning an array. And there's a very good chance that this won't be fast enough to solve the underlying problem.

hashedKey.push(key[value]);
}
return hashedKey.join();
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a problem with any generic hash here. For this simple one, note that

hash ({foo: 1, bar: 2}) == hash ({bar: 1, foo: 2}) //=> true
hash ({foo: {a: 42}}) == hash ({foo: {b: false}})  //=> true

That second one is especially problematic. We could easily end up with data that's all hash collisions.

But even a more sophisticated hash can still have such problems. I suppose if we get sophisticated enough, then we'd probably lose much of the benefit.

Copy link
Contributor Author

@danillosl danillosl Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the hash function is very simple and if we made it more sophisticated we end up losing performance, but I think this hash function is just good enough to split the data a little bit to make it performant, and yes we could in a worst-case scenario end up with a collection full of collisions, but in that case, the get function will just behave the way the old algorithm behaved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced.

I'd love to see some benchmarks with different styles of input objects.

This hash seems too tuned to that input structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further inspection, I realize I made a mistake on my hash function, instead of using just the value I should use the propertyName + value as the hash, but regardless, the idea still the same.
About the benchmarks with different inputs I used the following array:

const obja = { foo: "foo1", bar: "bar1" };
const objb = { foo: "foo2", bar: "bar2" };
const objc = { foo: "foo3", bar: "bar3" };
const objd = { foo: "foo4", bar: "bar4" };


const arr = [
  { obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objc, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
  { obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objd, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
  { obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objc, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
  { obj1: obja, obj2: { foo: "foo1", bar: "bar1" }, obj3: objd, obj4: { foo: "foo2", bar: "bar2" }, obj5: obja, obj6: objb },
];

and got the following results:

max/min/avg time in ms after 100 runs
ramda: new clone: max: 0.815 min: 0.030 avg: 0.059
ramda: old clone: max: 0.816 min: 0.029 avg: 0.063
lodash cloneDeep(): max: 1.913 min: 0.038 avg: 0.114
JSON.parse(JSON.stringify()): max: 0.804 min: 0.013 avg: 0.030

Because it is a small object to be cloned it is the best-case scenario for the old clone() but still, the new one shows better results, it only gets better as the object size goes up.
I am happy to test other examples of objects or even show the code that I'm using to benchmark so the numbers could be confirmed.


for (let i = 0; i < bucket.length; i += 1) {
const element = bucket[i];
if (element[0] === key) {return element[1];}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would have to be if (equals (element [0], key)) ...

Ramda is all about value equality, not reference equality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand that Ramda is about value equality I think that the clone function should preserve the structure of the object, let's say:

const d = {foo:1, bar:"2"};
const a = {foo: d, c:{ bar: d}};
a.foo === a.c.bar // == true;

//the clone method should preserve the structure.

const clone = R.clone(a);
clone.foo === clone.c.bar // == should also be true;

That's the way the R.clone() is behaving and lodash cloneDeep() also behave like this and I also believe it should behave like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you're right that this is how it's working now. I never worked closely with this code and was thinking that we had the stronger guarantee of preserving the value equality structure (as if the above with x === y replaced with equals (x, y)). Part of me would still like that. But that would further degrade performance, not enhance it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But (x === y); //=== true then equals(x,y) //must also be true. So by preserving reference equality we are also preserving value equality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting that we might go further than that, but on further reflection I realized that it's not only unnecessary but actively harmful to the goal here.

@danillosl
Copy link
Contributor Author

Because _ObjectMap is very specific for this problem (for reasons that I explain above) I don't think this should be separated into its own file. If there is a need for a _Map we should create an "abstract class" where the user could extend and override specific pieces of code (like the equality function and the hash function) to suit specific needs kind of how the Map works in java.

@CrossEye
Copy link
Member

Because _ObjectMap is very specific for this problem

That worries me. We have an internal _Set, written to solve one problem, but available to be used in other functions, and arguably something worth making public. Why would we not want to do the same here? If it's too tuned to clone, is it also perhaps too tuned to a specific sort of data?

If there is a need for a _Map we should create an "abstract class" where the user could extend and override specific pieces of code (like the equality function and the hash function)

The only reason I can think of for exposing these would be that they are useful alternatives, for those who like Ramda's value equality model, to the built in Set and Map types. I would not expect to offer users ways to tweak their behavior.

@danillosl
Copy link
Contributor Author

Why would we not want to do the same here? If it's too tuned to clone, is it also perhaps too tuned to a specific sort of data?

I wouldn't say it is too tuned for specific data but for this specific problem, _ObjectMap is focused on reference equality which is ideal for this problem, but I think it won't be as useful to have as a separate file in ramda because the framework has a focus on value equality.

I would not expect to offer users ways to tweak their behavior.

When I said user I meant the person that would use the code, not the framework user. What I'm proposing is to create an "abstract class" lest say a _HashTable.js and then create two classes that will inherit from _hashTable one focused on value equality and the other focused on reference equality.

@wojpawlik
Copy link
Contributor

Closes #2607.

@nfantone
Copy link
Contributor

Any movements here? What's the hold up? Can I help in any way?

@CrossEye
Copy link
Member

@nfantone: We are way overdue for a release. And we should get this into it, if it's ready. I will try to take a look soon.

@adispring
Copy link
Member

@nfantone: We are way overdue for a release. And we should get this into it, if it's ready. I will try to take a look soon.

We should publish a new release, the last release (v0.27.1) is 10 months ago: https://www.npmjs.com/package/ramda .

@nfantone
Copy link
Contributor

@CrossEye @adispring Would be happy to contribute in any way I can to help make that happen.

@CrossEye CrossEye added the resolve-conflicts PR needs to be updated due to conflicts before merging label Jan 23, 2022
@CrossEye
Copy link
Member

When Ramda went through a period of little attention, this was inadvertently dropped. @danillosl: Are you interested in resolving the conflicts so that we can get this in before v1.0?

@customcommander
Copy link
Member

customcommander commented Jan 23, 2022

@CrossEye Rebased and fixed the conflicts. However I'd prefer if you could review this again thoroughly as I'm not quite sure I got everything right. Some other pull requests merged after this one touched the same files and I wanted to preserve them too.

Waiting for #3224 to be merged.

@customcommander
Copy link
Member

@CrossEye Merge, lint and tests error fixed. Ready for re-review.

@CrossEye CrossEye self-assigned this Jan 27, 2022
@CrossEye CrossEye added Maturation Ideas to make Ramda a more mature library and removed resolve-conflicts PR needs to be updated due to conflicts before merging labels Feb 5, 2022
@CrossEye
Copy link
Member

CrossEye commented Feb 5, 2022

I'm going to merge this, but tag it for follow-up, as I think we still need to think about exposing _Set and it would make sense to have a version of _Map to go with it.

@CrossEye CrossEye merged commit d27c944 into ramda:master Feb 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Maturation Ideas to make Ramda a more mature library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants