P-54 Binary search for holding time assertion #2401
Conversation
Thank you, the intention is quite clear!
Have you considered partition_point? It applies the binary search under the hood too.
We can treat the date array as "sorted"/partitioned because holding_time(index) == true implies holding_time(index + 1) == true (but not vice versa). If that's not the case in practice, then the data provider is malfunctioning and you won't get the right result anyway.
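For illustration, a minimal self-contained sketch of that idea (made-up dates and a plain bool stand-in for the data provider query; none of these names come from the PR):

    fn main() {
        // Candidate dates, oldest first. The predicate is monotonic: once it is
        // true for some date, it stays true for every later date.
        let dates = ["2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01"];

        // Made-up stand-in for the real data provider query.
        let holds_on = |date: &str| date >= "2023-03-01";

        // partition_point binary-searches for the first index at which the
        // predicate flips; negating it puts the "not yet holding" prefix first,
        // so the returned index points at the earliest holding date.
        let idx = dates.partition_point(|&date| !holds_on(date));

        match dates.get(idx) {
            Some(date) => println!("earliest holding date: {date}"),
            None => println!("not holding on any covered date"),
        }
    }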
if outcomes.iter().any(is_positive) {
    let new_accounts = accounts
        .into_iter()
        .zip(outcomes.iter())
        .filter_map(|(account, outcome)| (!is_negative(outcome)).then_some(account))
        .collect();
    return (Ok(true), new_accounts)
}
Can we merge it into the accounts.iter() traversal above? I feel it should be possible.
Possibly, but I'm not sure it's worth it. We only do the filtering if any of the queries returned true in that iteration; if we want to merge the two loops we'd have to speculatively build the filtered array in advance, and I think that would make the logic messier to follow. In terms of runtime, I don't think the added loop to construct the filtered list makes much of a difference; what dominates here is in any case the query provider requests.
Thanks!
Didn't know about it; will have a look! Thanks
Exactly; that's the entire justification for the search algorithm.
On further look, there's a problem with error handling -- the predicate arg to partition_point can only return a plain bool, so errors from the underlying query can't be propagated out of the search.
We should be able to use an additional flag/storage to tackle that 🤔
Care to elaborate on that? Or do you have a link with some info? Because from the API docs (and some cursory searching) I haven't found any way to work around that (without reimplementing the search algorithm) 🤔
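For what it's worth, the flag/storage idea could look roughly like this (a sketch with made-up names, a String error type, and an account-agnostic query; not the code in this PR):

    fn earliest_holding_index(
        dates: &[&str],
        mut query: impl FnMut(&str) -> Result<bool, String>,
    ) -> Result<usize, String> {
        // Storage for the first error seen inside the predicate.
        let mut first_err: Option<String> = None;

        let idx = dates.partition_point(|&date| {
            if first_err.is_some() {
                // Already failed: the final index is discarded anyway,
                // so return anything without issuing more queries.
                return true;
            }
            match query(date) {
                Ok(is_holding) => !is_holding, // "not yet holding" prefix comes first
                Err(e) => {
                    first_err = Some(e);
                    true // arbitrary fallback, invalidated by the check below
                }
            }
        });

        match first_err {
            Some(e) => Err(e),
            None => Ok(idx),
        }
    }

The fallback value doesn't affect correctness because the index is rejected whenever first_err is set; the cost is that partition_point may still probe a few more dates after the first failure.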
partition_point might produce more readable code, but let's leave it to another PR even if that's the case.
Hi @grumpygreenguy, it's good to have this attempt, thanks. I have a question: how much can the number of requests be reduced in the worst-case scenario after this optimization?
@zhouhuitian
The complexity goes from linear in the number of dates to logarithmic in the same; specifically, if we're checking for a single address, the number of requests in the worst case goes from 15 to 5.
let mut pred = |date: &&str| {
    let (outcome, new_accounts) =
-        holding_time_search_step(&mut client, q_min_balance, accounts, date);
+        holding_time_search_step(&mut client, q_min_balance, accounts.clone(), *date);
    accounts = new_accounts;
    outcome.map(|is_holding| !is_holding) // negated to match the partition_point API
};
Thanks @Kailai-Wang for fixing this issue! Still gotta wrap my head around some of the subtleties here, apparently :D
In this case the extra clone shouldn't make a huge difference overall, since the cost is in any case dominated by the HTTP requests; in general though it does feel a bit wasteful to have to clone the array each time, only to replace the original anyway after the call to holding_time_search_step. What would (in theory) have been the better approach here? (Not for this PR to be sure, just as an overall learning)
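Not an answer from this thread, but one possible shape (a sketch assuming accounts is a Vec, or anything else with a cheap Default): move the vector out with std::mem::take instead of cloning it, and put the filtered result back:

    let mut pred = |date: &&str| {
        // Move `accounts` out (leaving an empty Vec behind) rather than cloning it;
        // the search step hands back the filtered vector, which we store again.
        let current = std::mem::take(&mut accounts);
        let (outcome, new_accounts) =
            holding_time_search_step(&mut client, q_min_balance, current, *date);
        accounts = new_accounts;
        outcome.map(|is_holding| !is_holding) // negated to match the partition_point API
    };

mem::take leaves a Default value in place, so the closure stays FnMut without ever duplicating the data.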
Context
Reduce the number of data provider requests needed to implement the "holding time" assertion.
Resolves P-54
How (Optional)
The basic idea is to do a binary search over the range of relevant dates, testing every relevant address on each iteration, and discarding any addresses that prove to be irrelevant, that is, where we know for sure that the longest holding time is not held by that address.
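Roughly, one search step might look like this (a sketch with made-up names and a plain bool query, error handling omitted; the actual step in this PR is holding_time_search_step, whose signature differs): query every still-relevant address for the probed date and drop the addresses proven irrelevant, per the criteria listed below.

    fn search_step(
        accounts: Vec<String>,
        date: &str,
        mut query: impl FnMut(&str, &str) -> bool, // (account, date) -> is_holding
    ) -> (bool, Vec<String>) {
        // One request per still-relevant account for this date.
        let outcomes: Vec<bool> = accounts.iter().map(|a| query(a, date)).collect();
        let any_holding = outcomes.iter().any(|&o| o);
        let remaining = if any_holding {
            // Some address holds on this date: every address that doesn't is irrelevant.
            accounts
                .into_iter()
                .zip(outcomes)
                .filter_map(|(account, holds)| holds.then_some(account))
                .collect()
        } else {
            accounts // nothing can be ruled out on this date
        };
        (any_holding, remaining)
    }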
An address is proven irrelevant if:
- the query returned false for that address on a given date, and
- the query returned true for another address on the same date

Note on Error Handling
Since the goal is to find the earliest holding date (and not necessarily to identify the corresponding address), it is OK to ignore certain errors as long as the search can proceed. In a nutshell, the search can only get "stuck" (i.e. we don't know whether to choose the earlier or the later half of the date range) if:
- one or more queries failed, and none of the remaining queries returned true

In that case, the actual value of the failed query may have been true or false, so we don't know in which direction the search should continue.

If at least one query returned true, we know that the search continues in the earlier half; at worst we are failing to remove some irrelevant addresses that will be discarded in a subsequent step when the query returns false for them.

TODO