Skip to content
This repository has been archived by the owner on Mar 20, 2022. It is now read-only.

Improve performance of circular reference detection #420

Merged

Conversation

wurstbonbon
Copy link
Contributor

Problem

The support for circular references added with #335 comes with a significant performance cost.
For every entity the list of already parsed entities is searched. This leads to non-linear performance characteristics.

Solution

I’ve partitioned the visited entities lookup by ids. Entities with a different type that share the same id will still be added to a list, which shouldn't be a problem in practice. Using a Set would have been simpler, but that was ruled out earlier.

Performance comparison

Setup

const mySchema = new schema.Entity('tacos');
const data = [];
for (let i = 0; i < N; i++) {
  data.push({ id: i, type: `foo-${i}` });
}
normalize(data, [mySchema]);

Normalization times in milliseconds for 5 runs

N = 10,000 entities
current 157 141 139 148 142
optimized 38 37 39 38 36
N = 100,000 entities
current 6696 6738 6762 6772 6673
optimized 255 248 259 257 241

TODO

  • Add & update tests
  • Ensure CI is passing (lint, tests, flow)
  • Update relevant documentation

@coveralls
Copy link

coveralls commented Sep 28, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling 0cc29be on wurstbonbon:circular-reference-performance into 2524e66 on paularmstrong:master.

Copy link
Owner

@paularmstrong paularmstrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!

}
visitedEntities.push(input);
visitedEntities[id].push(input);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just another implementation thought, no need to act on it:

I wonder if visitedEntities could be a map structured by key, then id:

visitedEntities = {
  [key: string]: { [id: string]: entity }
};

As it is now, entities of the same type are jumbled together if they have the same id, which kind of just makes it weird if someone were to step through and inspect it with their debugger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I skipped the key for compactness, but you’re right that would be more explicit. I can make that change if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did it, I hope this can be merged soon now. This causes a big performance hit for us.

@ntucker ntucker merged commit df45954 into paularmstrong:master Oct 10, 2019
ntucker pushed a commit to ntucker/normalizr that referenced this pull request Oct 10, 2019
* Improve performance of circular reference detection

* Use type and id for circular reference lookup
MCluck90 pushed a commit to pluralsight/normalizr that referenced this pull request Oct 10, 2019
* Improve performance of circular reference detection

* Use type and id for circular reference lookup
@lock
Copy link

lock bot commented Apr 15, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the Outdated label Apr 15, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Apr 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
4 participants