-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize match and decode, turn on static filters #4631
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I am not sure if SubgraphInstance.host_addresses
is actually used.
let ds_address = match self.address { | ||
Some(addr) => addr, | ||
|
||
let Some(ds_address) = self.address else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh .. I didn't know about this syntax
@@ -111,6 +116,11 @@ where | |||
sender | |||
} | |||
}; | |||
|
|||
if let Some(address) = data_source.address() { | |||
self.host_addresses.insert(address.into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if I am missing something, but I don't see self.host_addresses
used anywhere to check if an address is there or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow seems I completely lost track of where I was going with this 😅. It's not only the AIs that can hallucinate. I'll rework this to actually use the hashset.
trigger: &TriggerData<C>, | ||
) -> Box<dyn Iterator<Item = &T::Host> + Send + '_> { | ||
// If the trigger has no address, it has opted out of this optimization and might match any host. | ||
let Some(address) = trigger.address_match() else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be too nitpicky, but if you turn trigger.address_match()
into trigger.address_matches(other: Option<&[u8]>)
and use that inside the filter, you could save a bit of cloning. It'll require changing the signature of this method to
pub fn hosts_for_trigger<'a>(
&'a self,
trigger: &'a TriggerData<C>,
) -> Box<dyn Iterator<Item = &'a T::Host> + Send + '_> { .. }
@lutter I've reworked this in the latest commit. Now this properly leverages the hashset (a hashmap now) and it not only filters out false positives, but makes matching of any trigger with an address more efficient, by partitioning the hosts by their address and using that in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just one comment on how asserts will impact indexing other subgraphs
// Unwrap and assert: The same host we just popped must be the last one in `hosts_by_address`. | ||
let hosts = self.hosts_by_address.get_mut(address.as_slice()).unwrap(); | ||
let idx = hosts.pop().unwrap(); | ||
assert_eq!(idx, self.hosts.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this running inside some panic handler that will stop the subgraph if this assert gets tripped? Otherwise, it would be better to return an error so that only this subgraph stops, and others can continue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, I've checked and this panic will only stop the runner thread for the subgraph that trips it.
Integration cluster failures look unrelated, merging. |
We saw great effect from static filters on uniswap, stabilizing the firehose connection and the overall throughput of the subgraph. But they do cause a large number of false positive triggers to be received, resulting in a lot of time being spent by
fn match_and_decode
to discard those false positives.Right now the total number of calls to
match_and_decode
would be:n_triggers * n_data_sources
. This PR optimizes trigger matching by introducing a hash set of addresses to filter out triggers which are certain not to match any data sources, and this requires only a hash operation for each trigger.The second commit turns on static filters for any subgraph with more than 10k data sources, under the assumption that with this optimization static filters will be an improvement for most subgraphs.