Batch load entities when applying the entity cache #1388

Merged
leoyvens merged 1 commit into master from leo/batch-load-in-entity-cache on Dec 2, 2019


leoyvens
Collaborator

Part of #1381. Local benchmark with erc20:

[Image: Screenshot 2019-11-30 at 10.53.39]

Most of the store_set work moved into as_modifications. store_set no longer shows up in the top categories, and as_modifications is barely there, so I think we can claim that this, combined with #1384, really did improve things. I'd say we got at least a 2x speedup at fetching entities for merging. My local sync got to block 4844255 vs. 4729000 previously, which signals that things got faster overall.

I still need to check that it really works with JSON storage, but this is ready for review. I also switched the runtime tests to use the test-store so we could get test coverage for this.
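
To illustrate the batching idea: rather than calling `get` once per entity, the cache can first collect the keys it needs, group the ids by entity type, and hand that map to one bulk lookup. The following is a minimal sketch, not the code in this PR. It assumes a simplified `EntityKey` with only `entity_type` and `entity_id` fields, and `group_ids_by_type` is a hypothetical helper name.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the store's entity key (assumption: the real
/// key carries more information, such as the subgraph it belongs to).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct EntityKey {
    pub entity_type: String,
    pub entity_id: String,
}

/// Hypothetical helper: group ids by entity type so that a single
/// `get_many`-style call can replace one `get` call per key.
pub fn group_ids_by_type<'a>(
    keys: impl IntoIterator<Item = &'a EntityKey>,
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut ids_for_type: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for key in keys {
        // Create the per-type list on first use, then append the id.
        ids_for_type
            .entry(key.entity_type.as_str())
            .or_default()
            .push(key.entity_id.as_str());
    }
    ids_for_type
}
```

The resulting map has the same shape as the `ids_for_type` argument of `get_many`, so the whole set of keys can be resolved in one round trip to the store.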

@@ -603,6 +603,13 @@ pub trait Store: Send + Sync + 'static {
/// Looks up an entity using the given store key.
fn get(&self, key: EntityKey) -> Result<Option<Entity>, QueryExecutionError>;

/// Look up multiple entities. Returns a map of entities by type.
fn get_many(
Collaborator

Instead of a new method in the Store, why not just use an EntityQuery with an EntityFilter::In as the filter?

Collaborator Author

That'd be nice, but this can return entities of multiple types in one query, which EntityQuery can only do in a way that makes sense if they implement the same interface.

Collaborator

The storage layer actually has no idea about interfaces; all EntityQuery wants is that all the entity types mentioned have all the attributes that are used in the filter and the order. And since you are only filtering by id, that's true. Even when we query for entities that implement an interface, we get the entire concrete entity, where some attributes might only appear on one of the entity types (we just can't filter/sort by those).

Collaborator

Just to be clear: there's code in the relational schema that tracks which interfaces a table implements, but that information isn't used anywhere, and we should probably just delete it since it serves no real purpose.

Collaborator Author

I mean that if I use an EntityQuery as we do for interfaces, I think I'll get wrong results because I can't express which entities are associated with which ids. If there is Fred the Dog and Fred the Cat, I can either have both or neither; I can't have only one. Maybe I could use the EntityQuery to give me a superset of what I want and then filter, but that seemed hacky to me.
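
The Fred example can be made concrete with a small in-memory model. This is only a sketch: the `Row` struct and both functions are hypothetical stand-ins, not the actual `EntityQuery` or store code. An id-only `In` filter shared across several types cannot tell Fred the Dog from Fred the Cat, while a per-type id map can.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a stored entity: just a type and an id.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Row {
    pub entity_type: &'static str,
    pub id: &'static str,
}

/// Models a query over several types with one shared `id IN (...)` filter.
pub fn filter_by_ids(rows: &[Row], types: &[&str], ids: &[&str]) -> Vec<Row> {
    rows.iter()
        .filter(|r| types.contains(&r.entity_type) && ids.contains(&r.id))
        .cloned()
        .collect()
}

/// Models a lookup where every type carries its own list of ids.
pub fn filter_by_ids_per_type(
    rows: &[Row],
    ids_for_type: &BTreeMap<&str, Vec<&str>>,
) -> Vec<Row> {
    rows.iter()
        .filter(|r| {
            ids_for_type
                .get(r.entity_type)
                .map_or(false, |ids| ids.contains(&r.id))
        })
        .cloned()
        .collect()
}
```

In this model, asking for Fred the Dog and Tom the Cat through the shared filter also returns Fred the Cat, which is the superset problem described above. The per-type map returns exactly the two requested rows.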

Collaborator

You are right: EntityQuery isn't up to the job. I was hoping we could avoid adding another method to the Store trait, but it is needed.

Collaborator Author

leoyvens commented Dec 2, 2019

I tried this on Moloch with the JSON and relational schemas, and it synced correctly on both.

&self,
subgraph_id: &SubgraphDeploymentId,
ids_for_type: BTreeMap<&str, Vec<&str>>,
) -> Result<BTreeMap<String, Vec<Entity>>, StoreError>;
Collaborator

One minor nit: if this returned a Result<Vec<Entity>, StoreError>, we could save ourselves building a BTreeMap that the caller is not really going to use, anyway. But I doubt it will change things all that much.

Collaborator Author

We need the map key as well when iterating over the value returned.
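
To illustrate why the map key matters: a caller that rebuilds cache keys from the result has to know each entity's type, which the `BTreeMap` key supplies. A rough sketch, in which `Entity` is reduced to a hypothetical struct holding only an `id`, and `index_by_key` is an illustrative helper rather than code from this PR:

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, stripped-down entity: only the id attribute is modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct Entity {
    pub id: String,
}

/// Flatten a `get_many`-style result into a map keyed by (entity type, id).
/// In this simplified model the entity does not record its own type, so
/// the type has to come from the outer map key.
pub fn index_by_key(
    by_type: BTreeMap<String, Vec<Entity>>,
) -> HashMap<(String, String), Entity> {
    let mut cache = HashMap::new();
    for (entity_type, entities) in by_type {
        for entity in entities {
            cache.insert((entity_type.clone(), entity.id.clone()), entity);
        }
    }
    cache
}
```

Under these assumptions, a flat `Vec<Entity>` would leave two entities with the same id but different types indistinguishable to the caller.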

leoyvens merged commit f890d07 into master Dec 2, 2019
leoyvens deleted the leo/batch-load-in-entity-cache branch December 2, 2019 16:24