New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize update fn #732

Merged
merged 1 commit into from Oct 14, 2016

Conversation

Projects
None yet
5 participants
@Skinney
Contributor

Skinney commented Oct 14, 2016

The current implementation performs a lookup in updatedFields for every key in oldRecord. As updatedFields is likely to contain fewer keys than oldRecord, the number of lookups can actually be reduced by iterating both objects, instead of just one.

According to the benchmark from the following gist, the performance increase can be quite noticeable (2-3x): https://gist.github.com/Skinney/51b42ac2c4df1776d7a77e401b7477c4

Benchmark numbers:

➜  node index.js
original x 936,987 ops/sec ±1.31% (85 runs sampled)
new x 2,813,038 ops/sec ±2.08% (79 runs sampled)

This benchmark shows the result of updating an object of six keys, with an object of one key. The benchmark still shows a significant speed increase when updating four out of six keys.

@process-bot

This comment has been minimized.

Show comment
Hide comment
@process-bot

process-bot Oct 14, 2016

Thanks for the pull request! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

process-bot commented Oct 14, 2016

Thanks for the pull request! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

@evancz evancz merged commit 352590b into elm:master Oct 14, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Oct 14, 2016

Member

Great find! I am curious though, what other things did you try to get to this faster version?

What about these things?

// inline
newRecord[key] = (key in updatedFields) ? updatedFields[key] : oldRecord[key];

// typeof
var value = updatedFields[key];
newRecord[key] = (typeof value !== 'undefined') ? value : oldRecord[key];

I know that typeof tends to be the fastest way to check if something exists (it is way faster than in usually) and we know that no Elm values will be undefined.

In other words, I am kind of surprised the way you do it is faster, and I'd like to see some alternatives!

Member

evancz commented Oct 14, 2016

Great find! I am curious though, what other things did you try to get to this faster version?

What about these things?

// inline
newRecord[key] = (key in updatedFields) ? updatedFields[key] : oldRecord[key];

// typeof
var value = updatedFields[key];
newRecord[key] = (typeof value !== 'undefined') ? value : oldRecord[key];

I know that typeof tends to be the fastest way to check if something exists (it is way faster than in usually) and we know that no Elm values will be undefined.

In other words, I am kind of surprised the way you do it is faster, and I'd like to see some alternatives!

@Skinney

This comment has been minimized.

Show comment
Hide comment
@Skinney

Skinney Oct 14, 2016

Contributor

I must admit I ruled out the "typeof" way of doing things because I completely forgot I was dealing with Elm and not Javascript.

I did, however, add it to the benchmark (gist updated), and it is actually the slowest of the alternatives. I also did a run on Safari. The "new" way is still faster, though not as much as in Node/Chrome. The "old" way and "typeof" are about the same performance.

The reason I believe it is faster, simply has to do with the number of lookups. In the original version, you would check if the key was in updatedFields (one lookup), and based on the result retrieve the key from the correct obj (another lookup). This means that every key in the original object is looked up twice. The typeof version still looks up a bunch of keys, many of them non existent, in the updatedFields obj, and then perform another lookup in the old record. In my version, every key is looked up once, except for the updated keys which are looked up twice.

So:
Old way: 2 lookups per key (12 for the benchmarked example)
Typeof: 2 lookups per key - 1 per updated key + typeof check (11 for the benchmarked example)
New: 1 lookup per original key + 1 per updated key (7 for the benchmarked example)

Safari doesn't show nearly as big an improvement as Node/Chrome. Maybe looking up a non-existent key prevents an optimization or something?

Contributor

Skinney commented Oct 14, 2016

I must admit I ruled out the "typeof" way of doing things because I completely forgot I was dealing with Elm and not Javascript.

I did, however, add it to the benchmark (gist updated), and it is actually the slowest of the alternatives. I also did a run on Safari. The "new" way is still faster, though not as much as in Node/Chrome. The "old" way and "typeof" are about the same performance.

The reason I believe it is faster, simply has to do with the number of lookups. In the original version, you would check if the key was in updatedFields (one lookup), and based on the result retrieve the key from the correct obj (another lookup). This means that every key in the original object is looked up twice. The typeof version still looks up a bunch of keys, many of them non existent, in the updatedFields obj, and then perform another lookup in the old record. In my version, every key is looked up once, except for the updated keys which are looked up twice.

So:
Old way: 2 lookups per key (12 for the benchmarked example)
Typeof: 2 lookups per key - 1 per updated key + typeof check (11 for the benchmarked example)
New: 1 lookup per original key + 1 per updated key (7 for the benchmarked example)

Safari doesn't show nearly as big an improvement as Node/Chrome. Maybe looking up a non-existent key prevents an optimization or something?

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Oct 15, 2016

Contributor

I tried benchmarking Object.assign({}, oldRecord, updatedFields) and it did worse than typeof. I didn't expect it to do so poorly given that it's a language function that does almost exactly what we want; maybe it's spending time handling some extra cases? Maybe it's not optimized in node?

Contributor

mgold commented Oct 15, 2016

I tried benchmarking Object.assign({}, oldRecord, updatedFields) and it did worse than typeof. I didn't expect it to do so poorly given that it's a language function that does almost exactly what we want; maybe it's spending time handling some extra cases? Maybe it's not optimized in node?

@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Oct 15, 2016

Member

Cool, thanks for looking into this! The theory on minimizing lookups makes sense to me.

Member

evancz commented Oct 15, 2016

Cool, thanks for looking into this! The theory on minimizing lookups makes sense to me.

@gustf

This comment has been minimized.

Show comment
Hide comment
@gustf

gustf Oct 15, 2016

This will change the semantics of the update function as it will add a new property even if it did not exist in the old object. This might be ok for this use case, just wanted to give a heads up.

Example:

var baseObj = {
    name: "Robin",
    age: 27,
    hobbies: ["football", "chess", "programming"],
    male: true,
    occupation: "student",
    phoneNumber: 12345678,
    email: "rob@ibit.no"
};

var updateObj = {
   age: 28,
   city: "Kalmar"
};

> updateOriginal(baseObj, updateObj);
{ name: 'Robin',
  age: 28,
  hobbies: [ 'football', 'chess', 'programming' ],
  male: true,
  occupation: 'student',
  phoneNumber: 12345678,
  email: 'rob@ibit.no' }

> updateNew(baseObj, updateObj);
{ name: 'Robin',
  age: 28,
  hobbies: [ 'football', 'chess', 'programming' ],
  male: true,
  occupation: 'student',
  phoneNumber: 12345678,
  email: 'rob@ibit.no',
  city: 'Kalmar' }

gustf commented Oct 15, 2016

This will change the semantics of the update function as it will add a new property even if it did not exist in the old object. This might be ok for this use case, just wanted to give a heads up.

Example:

var baseObj = {
    name: "Robin",
    age: 27,
    hobbies: ["football", "chess", "programming"],
    male: true,
    occupation: "student",
    phoneNumber: 12345678,
    email: "rob@ibit.no"
};

var updateObj = {
   age: 28,
   city: "Kalmar"
};

> updateOriginal(baseObj, updateObj);
{ name: 'Robin',
  age: 28,
  hobbies: [ 'football', 'chess', 'programming' ],
  male: true,
  occupation: 'student',
  phoneNumber: 12345678,
  email: 'rob@ibit.no' }

> updateNew(baseObj, updateObj);
{ name: 'Robin',
  age: 28,
  hobbies: [ 'football', 'chess', 'programming' ],
  male: true,
  occupation: 'student',
  phoneNumber: 12345678,
  email: 'rob@ibit.no',
  city: 'Kalmar' }
@gustf

This comment has been minimized.

Show comment
Hide comment
@gustf

gustf Oct 15, 2016

This one is almost as fast but don't change the semantics

function updateNew2(oldRecord, updatedFields)
{
    var newRecord = {};

    for (var key in oldRecord)
    {
        newRecord[key] = oldRecord[key];
    }

    for (var key in updatedFields)
    {
        if (typeof oldRecord[key] !== 'undefined') {
            newRecord[key] = updatedFields[key];
        }
    }

    return newRecord;
}
original x 992,576 ops/sec ±0.37% (91 runs sampled)
new2 x 5,420,905 ops/sec ±1.06% (82 runs sampled)
new x 5,681,016 ops/sec ±0.83% (90 runs sampled)
typeof x 774,048 ops/sec ±0.68% (91 runs sampled)

gustf commented Oct 15, 2016

This one is almost as fast but don't change the semantics

function updateNew2(oldRecord, updatedFields)
{
    var newRecord = {};

    for (var key in oldRecord)
    {
        newRecord[key] = oldRecord[key];
    }

    for (var key in updatedFields)
    {
        if (typeof oldRecord[key] !== 'undefined') {
            newRecord[key] = updatedFields[key];
        }
    }

    return newRecord;
}
original x 992,576 ops/sec ±0.37% (91 runs sampled)
new2 x 5,420,905 ops/sec ±1.06% (82 runs sampled)
new x 5,681,016 ops/sec ±0.83% (90 runs sampled)
typeof x 774,048 ops/sec ±0.68% (91 runs sampled)
@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Oct 15, 2016

Member

@gustf, the type system means you could never call the function with invalid data like that. We don't need to check because the type system already did.

Member

evancz commented Oct 15, 2016

@gustf, the type system means you could never call the function with invalid data like that. We don't need to check because the type system already did.

@gustf

This comment has been minimized.

Show comment
Hide comment
@gustf

gustf Oct 15, 2016

@evancz Ok, cool. 👍
I just stumbled across this pull-request and since i am a sucker for optimisations I just had to check it out, and then I noticed the semantics change :)

gustf commented Oct 15, 2016

@evancz Ok, cool. 👍
I just stumbled across this pull-request and since i am a sucker for optimisations I just had to check it out, and then I noticed the semantics change :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment