Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize match and decode, turn on static filters #4631

Merged
merged 3 commits into from
May 25, 2023

Conversation

leoyvens
Copy link
Collaborator

We saw great effect from static filters on uniswap, stabilizing the firehose connection and the overall throughput of the subgraph. But they do cause a large number of false positive triggers to be received, resulting in a lot of time being spent by fn match_and_decode to discard those false positives.

Right now the total number of calls to match_and_decode would be: n_triggers * n_data_sources. This PR optimizes trigger matching by introducing a hash set of addresses to filter out triggers which are certain not to match any data sources, and this requires only a hash operation for each trigger.

The second commit turns on static filters for any subgraph with more than 10k data sources, under the assumption that with this optimization static filters will be an improvement for most subgraphs.

@leoyvens leoyvens requested a review from lutter May 19, 2023 13:33
Copy link
Collaborator

@lutter lutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I am not sure if SubgraphInstance.host_addresses is actually used.

let ds_address = match self.address {
Some(addr) => addr,

let Some(ds_address) = self.address else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh .. I didn't know about this syntax

@@ -111,6 +116,11 @@ where
sender
}
};

if let Some(address) = data_source.address() {
self.host_addresses.insert(address.into());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I am missing something, but I don't see self.host_addresses used anywhere to check if an address is there or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow seems I completely lost track of where I was going with this 😅. It's not only the AIs that can hallucinate. I'll rework this to actually use the hashset.

trigger: &TriggerData<C>,
) -> Box<dyn Iterator<Item = &T::Host> + Send + '_> {
// If the trigger has no address, it has opted out of this optimization and might match any host.
let Some(address) = trigger.address_match() else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be too nitpicky, but if you turn trigger.address_match() into trigger.address_matches(other: Option<&[u8]>) and use that inside the filter, you could save a bit of cloning. It'll require changing the signature of this method to

pub fn hosts_for_trigger<'a>(
        &'a self,
        trigger: &'a TriggerData<C>,
    ) -> Box<dyn Iterator<Item = &'a T::Host> + Send + '_> { .. }

@leoyvens
Copy link
Collaborator Author

@lutter I've reworked this in the latest commit. Now this properly leverages the hashset (a hashmap now) and it not only filters out false positives, but makes matching of any trigger with an address more efficient, by partitioning the hosts by their address and using that in fn hosts_for_trigger.

Copy link
Collaborator

@lutter lutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one comment on how asserts will impact indexing other subgraphs

// Unwrap and assert: The same host we just popped must be the last one in `hosts_by_address`.
let hosts = self.hosts_by_address.get_mut(address.as_slice()).unwrap();
let idx = hosts.pop().unwrap();
assert_eq!(idx, self.hosts.len());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this running inside some panic handler that will stop the subgraph if this assert gets tripped? Otherwise, it would be better to return an error so that only this subgraph stops, and others can continue

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, I've checked and this panic will only stop the runner thread for the subgraph that trips it.

@leoyvens
Copy link
Collaborator Author

Integration cluster failures look unrelated, merging.

@leoyvens leoyvens merged commit 9ec9447 into master May 25, 2023
6 checks passed
@leoyvens leoyvens deleted the leo/optimize-match-and-decode branch May 25, 2023 16:51
@leoyvens leoyvens mentioned this pull request May 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants