-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch load entites when applying the entity cache #1388
Conversation
This is a performance optimization.
@@ -603,6 +603,13 @@ pub trait Store: Send + Sync + 'static { | |||
/// Looks up an entity using the given store key. | |||
fn get(&self, key: EntityKey) -> Result<Option<Entity>, QueryExecutionError>; | |||
|
|||
/// Look up multiple entities. Returns a map of entities by type. | |||
fn get_many( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a new method in the Store
, why not just use an EntityQuery
with an EntityFilter::In
as the filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be nice, but this can return from multiple entities types in one query, which EntityQuery
can only do in a way makes sense if they implement the same interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The storage layer actually has no idea about interfaces; all EntityQuery
wants is that all the entity types mentioned have all the attributes that are used in the filter and the order. And since you are only filtering by id
, that's true. Even when we query for entities that implement an interface, we get the entire concrete entity where some attributes might only appear on one of the entity types (justcan't filter/sort by those)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear: there's code in the relational schema that tracks which interfaces a table implements, but that information isn't used anywhere, and we should probably just delete it since it serves no real purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that if I use an EntityQuery
as we do for interfaces, I think I'll get wrong results because I can't express which entities are associated with which ids, if there is Fred
the Dog
and Fred
the Cat
, I can either have both or none, I can't have only one. Maybe I could use the EntityQuery
to give me a superset of what I want and then filter, but that seemed hacky to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right - EntityQuery
isn't up to the job; I was hoping we could avoid adding another method to the Store
trait, but you are right in that it is needed.
I tried this on Moloch with json and relational schemas and it synced correctly on both. |
&self, | ||
subgraph_id: &SubgraphDeploymentId, | ||
ids_for_type: BTreeMap<&str, Vec<&str>>, | ||
) -> Result<BTreeMap<String, Vec<Entity>>, StoreError>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor nit: if this returned a Result<Vec<Entity>, StoreError>
, we could save ourselves building a BTreeMap
that the caller is not really going to use, anyway. But I doubt it will change things all that much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the map key as well when iterating over the value returned.
Part of #1381. Local benchmark with erc20:
Most of the
store_set
work got moved intoas_modifications
. Nowstore_set
doesn't show up in the top categories anymore, andas_modifications
is barely there, so we can claim this combined with #1384 really did improve things, I'd say we got at least a 2x speedup at fetching entites for merging. My local sync got to block 4844255 vs 4729000 previously, which signals than things got overall faster.I still need to check that it really works with JSON storage, but this is ready for review. I also switched the runtime tests to use the test-store so we could get test coverage for this.