P-54 Binary search for holding time assertion #2401

grumpygreenguy · 2024-01-11T19:22:43Z

Context

Reduce the number of data provider requests needed to implement the "holding time" assertion.

Resolves P-54

How (Optional)

The basic idea is to do a binary search over the range of relevant dates, testing every relevant address on each iteration, and discarding any addresses that prove to be irrelevant, that is, where we know for sure that the longest holding time is not held by that address.

An address is proven irrelevant if both of the following hold:

- The data provider query returns false for that address on a given date.
- The data provider query returns true for another address on the same date.
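A minimal sketch of the idea described above, ignoring errors for now. The function name `earliest_holding_date` and the `holds` closure are hypothetical stand-ins for the real data provider query, not the code in this PR; dates are assumed to be sorted from earliest to latest.

```rust
/// Returns the index of the earliest date at which any account is holding,
/// or `None` if no account holds at any of the dates.
/// `holds(account, date)` is a hypothetical stand-in for a data provider query.
fn earliest_holding_date<F>(
    dates: &[&str],
    mut accounts: Vec<String>,
    mut holds: F,
) -> Option<usize>
where
    F: FnMut(&str, &str) -> bool,
{
    let (mut lo, mut hi) = (0, dates.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        // Test every remaining account on the candidate date.
        let outcomes: Vec<bool> =
            accounts.iter().map(|a| holds(a.as_str(), dates[mid])).collect();
        if outcomes.iter().any(|&held| held) {
            // Some account holds here: accounts that returned false cannot
            // have the longest holding time, so they are discarded.
            let mut keep = outcomes.iter();
            accounts.retain(|_| *keep.next().unwrap());
            hi = mid; // continue in the earlier half
        } else {
            lo = mid + 1; // nobody holds yet: continue in the later half
        }
    }
    (lo < dates.len()).then_some(lo)
}
```

Each iteration costs one query per remaining account, so discarding accounts early makes later iterations cheaper.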

Note on Error Handling

Since the goal is to find the earliest holding date (and not necessarily to identify the corresponding address), it is OK to ignore certain errors as long as the search can proceed. In a nutshell, the search can only get "stuck" (i.e. we don't know whether to choose the earlier or the later half of the date range) if both of the following hold:

- No data provider query returned true.
- At least one query returned an error.

In that case, the actual value of the failed query may have been true or false, so we don't know in which direction the search should continue.

If at least one query returned true, we know that the search continues in the earlier half; at worst we are failing to remove some irrelevant addresses that will be discarded in the subsequent step when the query returns false for them.
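The rule above can be sketched as a small decision function. The name `step_direction` and its signature are assumptions made for illustration; the PR's own step function also returns the filtered account list.

```rust
/// Decides where the search continues, given one query result per account.
/// `Ok(true)`  : at least one account holds, continue in the earlier half.
/// `Ok(false)` : every query succeeded and returned false, continue later.
/// `Err(e)`    : no positive answer and at least one failure, so the search
///               is stuck and the first error is reported.
fn step_direction<E: Clone>(outcomes: &[Result<bool, E>]) -> Result<bool, E> {
    if outcomes.iter().any(|o| matches!(o, Ok(true))) {
        // Errors from other accounts are ignored: one positive is enough.
        return Ok(true);
    }
    match outcomes.iter().find_map(|o| o.as_ref().err()) {
        Some(e) => Err(e.clone()),
        None => Ok(false),
    }
}
```

Accounts whose query failed are not proven irrelevant, so they would stay in the candidate list for the next step.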

TODO

Pass CI

linear · 2024-01-11T19:22:46Z

P-54 Reduce number of requests in HOLDER assertion building

Kailai-Wang

Thank you, the intention is quite clear!

Have you considered partition_point? It applies the binary search under the hood too.

We can treat the date array as "sorted"/partitioned because holding_time(index) == true implies holding_time(index + 1) == true (but not vice versa). If that's not the case in practice, then the data provider is malfunctioning and you won't get the right result anyway.
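For reference, a small sketch of how the standard library's `slice::partition_point` could express this. The helper name `first_holding_index` and the infallible `is_holding` predicate are assumptions for illustration; the real predicate depends on data provider queries.

```rust
/// Index of the first date for which `is_holding` returns true, assuming the
/// dates are ordered so that all `false` answers come before all `true` ones.
/// Returns `dates.len()` if no date satisfies the predicate.
fn first_holding_index(dates: &[&str], is_holding: impl Fn(&str) -> bool) -> usize {
    // partition_point expects the elements satisfying the predicate to come
    // first, so the holding check is negated.
    dates.partition_point(|date| !is_holding(*date))
}
```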

Kailai-Wang · 2024-01-14T21:51:03Z

tee-worker/litentry/core/assertion-build/src/holding_time.rs

+	if outcomes.iter().any(is_positive) {
+		let new_accounts = accounts
+			.into_iter()
+			.zip(outcomes.iter())
+			.filter_map(|(account, outcome)| (!is_negative(outcome)).then_some(account))
+			.collect();
+		return (Ok(true), new_accounts)
+	}


Can we merge it into the accounts.iter() traversal above? I feel it should be possible.

Possibly, but I'm not sure it's worth it. We only do the filtering if any of the queries returned true in that iteration; if we want to merge the two loops we'd have to speculatively build the filtered array in advance, and I think that would make the logic messier to follow. In terms of runtime, I don't think the added loop to construct the filtered list makes much of a difference; what dominates here is in any case the data provider requests.

grumpygreenguy · 2024-01-15T10:11:21Z

@Kailai-Wang

> Thank you, the intention is quite clear!

Thanks!

> Have you considered partition_point? It applies the binary search under the hood too.

Didn't know about it; will have a look! Thanks

> We can treat the date array as "sorted"/partitioned because holding_time(index) == true implies holding_time(index + 1) == true (but not vice versa). If that's not the case in practice, then the data provider is malfunctioning and you won't get the right result anyway.

Exactly; that's the entire justification for the search algorithm.

Kailai-Wang · 2024-01-15T11:05:11Z

partition_point is basically a binary search by predicate; applying it would make the code simpler (and thus more readable) and less error-prone (as it's a library function), IMO.

grumpygreenguy · 2024-01-15T11:53:02Z

> partition_point is basically a binary search by predicate; applying it would make the code simpler (and thus more readable) and less error-prone (as it's a library function), IMO.

On closer inspection, there's a problem with error handling: the predicate argument to partition_point must return bool, but the predicate depends on data provider queries which could fail, so the search logic needs to be able to handle that (at the very least, to bubble up the error without panicking).
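One possible way around this limitation is a fallible variant of the same search, sketched below. `try_partition_point` is a hypothetical helper, not part of the standard library, and this is not necessarily the approach taken in this PR.

```rust
/// Like `slice::partition_point`, but the predicate may fail.
/// The first error aborts the search and is returned to the caller.
fn try_partition_point<T, E>(
    items: &[T],
    mut pred: impl FnMut(&T) -> Result<bool, E>,
) -> Result<usize, E> {
    let (mut lo, mut hi) = (0, items.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if pred(&items[mid])? {
            lo = mid + 1; // predicate holds: the partition point is to the right
        } else {
            hi = mid; // predicate fails: the partition point is here or to the left
        }
    }
    Ok(lo)
}
```

The `?` operator propagates a failed query to the caller instead of panicking, at the cost of reimplementing the short binary search loop.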

Kailai-Wang · 2024-01-15T12:03:01Z

We should be able to use additional flag/storage to tackle that 🤔

grumpygreenguy · 2024-01-15T12:09:46Z

> We should be able to use additional flag/storage to tackle that 🤔

Care to elaborate on that? Or, do you have a link with some info? Because from the API docs (and some cursory search) I haven't found any way to work around that (without reimplementing the search algorithm) 🤔

Kailai-Wang

partition_point might produce more readable code, but let's leave it to another PR even if that's the case

outofxxx · 2024-01-26T07:24:17Z

Hi @grumpygreenguy, thanks for giving this a try. I have a question: by how much can the number of requests be reduced in the worst-case scenario after this optimization?

grumpygreenguy · 2024-01-26T09:33:36Z

@zhouhuitian

> Hi @grumpygreenguy, thanks for giving this a try. I have a question: by how much can the number of requests be reduced in the worst-case scenario after this optimization?

The complexity goes from linear in the number of dates to logarithmic; specifically, if we're checking a single address, the number of requests in the worst case goes from 15 to 5.
If there are multiple addresses to check, the worst-case complexity is still linear in the number of addresses as well, but depending on how quickly the search can spot and eliminate irrelevant addresses, the cost of each iteration can be drastically reduced. We would need a better idea of the distribution of holding times across multiple accounts of real users to estimate the average runtime, though.

grumpygreenguy · 2024-01-26T10:10:44Z

tee-worker/litentry/core/assertion-build/src/holding_time.rs

+	let mut pred = |date: &&str| {
 		let (outcome, new_accounts) =
-			holding_time_search_step(&mut client, q_min_balance, accounts, date);
+			holding_time_search_step(&mut client, q_min_balance, accounts.clone(), *date);
 		accounts = new_accounts;
 		outcome.map(|is_holding| !is_holding) // negated to match the partition_point API
 	};


Thanks @Kailai-Wang for fixing this issue! Still gotta wrap my head around some of the subtleties here, apparently :D

In this case the extra clone shouldn't make a huge difference overall, since the cost is in any case dominated by the HTTP requests; in general though it does feel a bit wasteful to have to clone the array each time, only to replace the original anyway after the call to holding_time_search_step. What would (in theory) have been the better approach here? (Not for this PR to be sure, just as an overall learning)
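One option that avoids the clone is `std::mem::take`, which moves the vector out of the captured variable and leaves an empty one behind until the result is written back. The sketch below is only an illustration under simplified assumptions: `Account`, `search_step` and `earliest_index` are hypothetical stand-ins, and the step is infallible here, unlike `holding_time_search_step`.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Account {
    name: &'static str,
    since: &'static str, // hypothetical: date from which the account holds
}

/// Stand-in for the search step: consumes the accounts and returns the
/// outcome together with the accounts that are still relevant.
fn search_step(mut accounts: Vec<Account>, date: &str) -> (bool, Vec<Account>) {
    let any_holding = accounts.iter().any(|a| date >= a.since);
    if any_holding {
        accounts.retain(|a| date >= a.since);
    }
    (any_holding, accounts)
}

fn earliest_index(dates: &[&str], mut accounts: Vec<Account>) -> (usize, Vec<Account>) {
    let index = dates.partition_point(|date| {
        // Move the vector out of the captured variable without cloning;
        // `accounts` is briefly an empty Vec until it is reassigned below.
        let (is_holding, remaining) = search_step(std::mem::take(&mut accounts), *date);
        accounts = remaining;
        !is_holding // negated to match the partition_point API
    });
    (index, accounts)
}
```

Another option would be for the step function to take `&mut Vec<Account>` and filter in place, which removes the need to move the vector at all.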

Binary search for holding time

4f16439

grumpygreenguy added the rust label (Pull requests that update Rust code) Jan 11, 2024

Kailai-Wang reviewed Jan 14, 2024

grumpygreenguy added 3 commits January 23, 2024 11:01

update tests

d0b04e9

add redundant check

bcb6c25

Merge remote-tracking branch 'origin/dev' into p-54-reduce-number-of-requests-in-holder-assertion-building

4ac47ba

grumpygreenguy marked this pull request as ready for review January 23, 2024 12:04

grumpygreenguy added 4 commits January 23, 2024 14:36

fix clipi

e97d333

fix compilation issues

0e3d1b6

oops

9d32d4f

Merge branch 'dev' into p-54-reduce-number-of-requests-in-holder-assertion-building

2e41262

Kailai-Wang approved these changes Jan 24, 2024

Kailai-Wang and others added 4 commits January 24, 2024 09:54

Merge branch 'dev' into p-54-reduce-number-of-requests-in-holder-assertion-building

3520d7b

refactoring attempt #3

f7f538a

try to fix compile error

477be43

Merge branch 'dev' into p-54-reduce-number-of-requests-in-holder-assertion-building

a97b32b

grumpygreenguy commented Jan 26, 2024

grumpygreenguy enabled auto-merge (squash) January 26, 2024 13:15

grumpygreenguy added 4 commits January 26, 2024 13:15

retrigger CI

cf3bb00

clipy

572cb1a

more clippy

f50de98

Merge branch 'dev' into p-54-reduce-number-of-requests-in-holder-assertion-building

0990d56

grumpygreenguy added 2 commits January 26, 2024 17:16

more more clippy

b99c5ce

and more

a9ab3f8

grumpygreenguy merged commit f120d87 into dev Jan 26, 2024
32 checks passed

grumpygreenguy deleted the p-54-reduce-number-of-requests-in-holder-assertion-building branch January 27, 2024 10:02

kziemianek pushed a commit that referenced this pull request Jan 28, 2024

P-54 Binary search for holding time assertion (#2401)

d0e6e81

Co-authored-by: Kailai Wang <kailai.wang@trustcomputing.de> Co-authored-by: Zhouhui Tian <zhouhui@liteng.io>
