Batch load entities when applying the entity cache #1388

leoyvens · 2019-11-30T14:28:34Z

Part of #1381. Local benchmark with erc20:

Most of the store_set work moved into as_modifications. store_set no longer shows up in the top categories, and as_modifications is barely there, so I think we can claim that this, combined with #1384, really did improve things. I'd say we got at least a 2x speedup at fetching entities for merging. My local sync got to block 4844255 vs. 4729000 previously, which signals that things got faster overall.

I still need to check that it really works with JSON storage, but this is ready for review. I also switched the runtime tests to use the test-store so we could get test coverage for this.

This is a performance optimization.
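For readers skimming the thread, the batching idea can be sketched roughly as follows: collect the keys touched by the entity cache, group their ids by entity type, and hand the grouped map to a single bulk lookup instead of issuing one lookup per key. This is only an illustration. `Key` and `group_ids_by_type` are hypothetical names, not code from this PR; the only detail taken from the diff is the `BTreeMap<&str, Vec<&str>>` shape of the `ids_for_type` argument.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real `EntityKey`; only the fields needed here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Key {
    pub entity_type: String,
    pub entity_id: String,
}

/// Group ids by entity type so that each type can be fetched in one batch
/// rather than one lookup per key. The return type mirrors the
/// `ids_for_type` parameter shown in the diff.
pub fn group_ids_by_type(keys: &[Key]) -> BTreeMap<&str, Vec<&str>> {
    let mut ids_for_type: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for key in keys {
        ids_for_type
            .entry(key.entity_type.as_str())
            .or_default()
            .push(key.entity_id.as_str());
    }
    ids_for_type
}
```

With keys for two `Token` entities and one `Account` entity, the result holds two map entries, so the store would be asked for two batches instead of three single lookups.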

lutter · 2019-11-30T19:22:26Z

graph/src/components/store.rs

@@ -603,6 +603,13 @@ pub trait Store: Send + Sync + 'static {
    /// Looks up an entity using the given store key.
    fn get(&self, key: EntityKey) -> Result<Option<Entity>, QueryExecutionError>;

+    /// Look up multiple entities. Returns a map of entities by type.
+    fn get_many(


Instead of a new method in the Store, why not just use an EntityQuery with an EntityFilter::In as the filter?

That'd be nice, but this can return multiple entity types in one query, which EntityQuery can only do in a way that makes sense if they implement the same interface.

The storage layer actually has no idea about interfaces; all EntityQuery wants is that all the entity types mentioned have all the attributes that are used in the filter and the order. And since you are only filtering by id, that's true. Even when we query for entities that implement an interface, we get the entire concrete entity where some attributes might only appear on one of the entity types (we just can't filter/sort by those).

Just to be clear: there's code in the relational schema that tracks which interfaces a table implements, but that information isn't used anywhere, and we should probably just delete it since it serves no real purpose.

I mean that if I use an EntityQuery as we do for interfaces, I think I'll get wrong results, because I can't express which entities are associated with which ids. If there is Fred the Dog and Fred the Cat, I can either have both or neither; I can't have only one. Maybe I could use the EntityQuery to give me a superset of what I want and then filter, but that seemed hacky to me.

You are right - EntityQuery isn't up to the job; I was hoping we could avoid adding another method to the Store trait, but you are right in that it is needed.
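The Fred the Dog / Fred the Cat problem can be made concrete with a toy model. The sketch below does not use the real `EntityQuery` or `Store` types. `Row`, `flat_in_filter`, and `per_type_lookup` are hypothetical names, and the "store" is just a slice of rows. It only shows that a flat "type in T and id in I" filter cannot tie an id to a type, while a per-type map of ids can.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal row: an entity type and an id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Row {
    pub entity_type: &'static str,
    pub id: &'static str,
}

/// Flat filter: `type IN types AND id IN ids`.
/// Every listed id is matched against every listed type.
pub fn flat_in_filter(rows: &[Row], types: &[&str], ids: &[&str]) -> Vec<Row> {
    rows.iter()
        .copied()
        .filter(|row| types.contains(&row.entity_type) && ids.contains(&row.id))
        .collect()
}

/// Per-type lookup: each entity type carries its own list of wanted ids.
pub fn per_type_lookup(rows: &[Row], ids_for_type: &BTreeMap<&str, Vec<&str>>) -> Vec<Row> {
    rows.iter()
        .copied()
        .filter(|row| {
            ids_for_type
                .get(row.entity_type)
                .map_or(false, |ids| ids.contains(&row.id))
        })
        .collect()
}
```

Given rows Dog/fred, Cat/fred, and Cat/tom, asking for Dog/fred and Cat/tom through the flat filter also returns Cat/fred, which is the superset mentioned above. The per-type lookup returns exactly the two requested rows.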

leoyvens · 2019-12-02T12:55:18Z

I tried this on Moloch with json and relational schemas and it synced correctly on both.

lutter · 2019-12-02T16:11:21Z

graph/src/components/store.rs

+        &self,
+        subgraph_id: &SubgraphDeploymentId,
+        ids_for_type: BTreeMap<&str, Vec<&str>>,
+    ) -> Result<BTreeMap<String, Vec<Entity>>, StoreError>;


One minor nit: if this returned a Result<Vec<Entity>, StoreError>, we could save ourselves building a BTreeMap that the caller is not really going to use, anyway. But I doubt it will change things all that much.

We need the map key as well when iterating over the value returned.
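One plausible reading of why the key matters, sketched under assumptions: if the caller has to rebuild a (type, id) key for each returned entity, the entity type has to come from somewhere, and the map key supplies it. `Entity` here is a hypothetical stand-in with only an `id` field, and `keyed_entities` is an illustrative helper, not code from this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real `Entity`; only the id is modeled.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub id: String,
}

/// Pair each returned entity with a (type, id) key.
/// The entity type is taken from the map key, which a flat
/// `Vec<Entity>` result would not provide in this toy model.
pub fn keyed_entities(
    by_type: BTreeMap<String, Vec<Entity>>,
) -> Vec<((String, String), Entity)> {
    let mut out = Vec::new();
    for (entity_type, entities) in by_type {
        for entity in entities {
            let key = (entity_type.clone(), entity.id.clone());
            out.push((key, entity));
        }
    }
    out
}
```

If the real `Entity` already carried its type, a flat vector would be enough, which may be the case the nit above had in mind.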

runtime, store: Batch load entities when applying the entity cache

454806f

This is a performance optimization.

leoyvens requested review from Jannis and lutter November 30, 2019 14:28

lutter reviewed Nov 30, 2019

View reviewed changes

lutter approved these changes Dec 2, 2019

View reviewed changes

leoyvens merged commit f890d07 into master Dec 2, 2019

leoyvens deleted the leo/batch-load-in-entity-cache branch December 2, 2019 16:24

leoyvens mentioned this pull request Jan 7, 2020

Introduce an LfuCache for entities #1416

Merged

leoyvens commented Nov 30, 2019

lutter Nov 30, 2019

leoyvens Dec 1, 2019

lutter Dec 2, 2019

lutter Dec 2, 2019

leoyvens Dec 2, 2019

lutter Dec 2, 2019

leoyvens commented Dec 2, 2019

lutter Dec 2, 2019

leoyvens Dec 2, 2019