initial change to consolidate get watcher timestamp #123

tubi70 · 2021-08-12T12:37:11Z


Motivation

Currently the watcher database provides an API for getting block signatures, and from those we can get the timestamp for a block. However, the watcher sometimes fails or gets stuck. Both fog ingest and fog ledger have code for handling these errors and retrying. Because clients need a timestamp on every piece of data in order to function, failing to get a timestamp blocks the service and immediately escalates to a P0 issue. The code for getting timestamps from the watcher now lives in two places, the fog ingest worker thread and the fog ledger worker thread. Specifically, there is a function "get_watcher_timestamp"

fog/fog/ingest/server/src/worker.rs

Line 151 in 5a776bd

fn get_watcher_timestamp(

that is duplicated in both places.

In this PR

It would be better to have a crate like "fog-timestamps" that provides this function, reconciles any differences between the two versions, and is tested. That way we know it works the same way everywhere, and it will be easier to debug if fog gets blocked on this in production (which has happened before, and will happen again).
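As a rough illustration of what a single shared polling function could look like, here is a minimal sketch. It is not the code in this PR: `TimestampSource`, `poll_block_timestamp`, and `ScriptedSource` are hypothetical names, the `TimestampFound` variant and the `String` error type are assumptions (only `WatcherBehind` appears in the diff), resetting the timer after each warning is also an assumption, and `eprintln!` stands in for the `log::warn!` call used in the real code.

```rust
use std::cell::Cell;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Stand-in for the watcher's result code. Only `WatcherBehind` is visible
/// in this PR; `TimestampFound` is an assumed name for the success case.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TimestampResultCode {
    TimestampFound,
    WatcherBehind,
}

/// Hypothetical abstraction over the watcher database, so both servers
/// (and tests) can share one polling loop.
pub trait TimestampSource {
    fn get_block_timestamp(
        &self,
        block_index: u64,
    ) -> Result<(u64, TimestampResultCode), String>;
}

/// Polls until a timestamp is found. Returns the timestamp and the number
/// of attempts it took. Never panics; errors are reported and retried.
pub fn poll_block_timestamp<S: TimestampSource>(
    source: &S,
    block_index: u64,
    watcher_timeout: Duration,
    poll_interval: Duration,
) -> (u64, u32) {
    let mut watcher_behind_timer = Instant::now();
    let mut attempts = 0u32;
    loop {
        attempts += 1;
        match source.get_block_timestamp(block_index) {
            Ok((ts, TimestampResultCode::TimestampFound)) => return (ts, attempts),
            Ok((_, TimestampResultCode::WatcherBehind)) => {
                if watcher_behind_timer.elapsed() > watcher_timeout {
                    eprintln!(
                        "watcher is still behind on block index = {} after waiting {} seconds, caller will be blocked",
                        block_index,
                        watcher_timeout.as_secs()
                    );
                    // Assumption: restart the timer so the warning repeats
                    // once per timeout window rather than on every poll.
                    watcher_behind_timer = Instant::now();
                }
            }
            Err(err) => eprintln!("watcher error on block index = {}: {}", block_index, err),
        }
        sleep(poll_interval);
    }
}

/// Test double for illustration: reports `WatcherBehind` for the first
/// `behind_for` calls, then returns `timestamp`.
pub struct ScriptedSource {
    behind_for: u32,
    calls: Cell<u32>,
    timestamp: u64,
}

impl ScriptedSource {
    pub fn new(behind_for: u32, timestamp: u64) -> Self {
        Self { behind_for, calls: Cell::new(0), timestamp }
    }
}

impl TimestampSource for ScriptedSource {
    fn get_block_timestamp(
        &self,
        _block_index: u64,
    ) -> Result<(u64, TimestampResultCode), String> {
        let call = self.calls.get() + 1;
        self.calls.set(call);
        if call <= self.behind_for {
            Ok((u64::MAX, TimestampResultCode::WatcherBehind))
        } else {
            Ok((self.timestamp, TimestampResultCode::TimestampFound))
        }
    }
}
```

Putting the loop behind a small trait like this is one way to test the retry behavior once, without a running watcher, and then reuse it from both worker threads.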

https://app.asana.com/0/1200353042931237/1200688378834228/f


eranrund · 2021-08-12T18:45:26Z

Any reason to not have this in the mobilecoin repository? This is not fog-specific.
Since this heavily relies on mc-watcher, maybe it should live inside that crate?

eranrund

Some nits, but my main request would be to place this as a method in WatcherDB.

eranrund · 2021-08-12T18:46:03Z

fog/fog_timestamps/src/watcher.rs

+            Ok((ts, res)) => match res {
+                TimestampResultCode::WatcherBehind => {
+                    if watcher_behind_timer.elapsed() > watcher_timeout {
+                        log::warn!(logger, "watcher is still behind on block index = {} after waiting {} seconds, ingest will be blocked", block_index, watcher_timeout.as_secs());


"ingest" no longer applies here. The same applies to other places below.

tubi70 · 2021-08-23

Please let me know if the change below is what you are requesting:

log::warn!(logger, "watcher is still behind on block index = {} after waiting {} seconds, caller will be blocked", block_index, watcher_timeout.as_secs());

eranrund · 2021-08-23

"Any reason to not have this in the mobilecoin repository? This is not fog-specific.
Since this heavily relies on mc-watcher, maybe it should live inside that crate?"

Ping on this.

eranrund · 2021-08-12T18:46:31Z

fog/fog_timestamps/src/watcher.rs

+/// proceed. But bringing the server down is costly from ops POV because
+/// we will lose all the user rng's.


user rngs do not apply here.

tubi70 · 2021-08-23

Please confirm that the comment below is a better replacement:

/// we will lose all the blocks

eranrund · 2021-08-23

We will not be losing any blocks. We have no way of knowing what effect broken database invariants have on the caller, since we don't know what the caller is doing.
I would change "But bringing the server down is costly from ops POV because we will lose all the user rng's." to something like "Generally when an invariant is violated we would panic, but this code is used in services that are expensive to restart (such as the ingest enclave)."
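To make the trade-off in that suggested comment concrete, here is a minimal sketch of a decision function that alerts and retries on a broken invariant instead of panicking. All names are hypothetical: `NextStep` and `next_step` do not exist in the codebase, and `InvariantViolated` is an assumed variant standing in for whatever result code signals a broken database invariant (only `WatcherBehind` appears in this PR).

```rust
/// Stand-in result codes. Only `WatcherBehind` is shown in this PR; the
/// other two variants are assumptions for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TimestampResultCode {
    TimestampFound,
    WatcherBehind,
    InvariantViolated,
}

/// What the shared polling code should do after one lookup.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NextStep {
    /// Hand the timestamp back to the caller.
    Return(u64),
    /// Expected condition: wait and poll again.
    RetryQuietly,
    /// Unexpected condition: log loudly, but keep the process alive.
    RetryAndAlert,
}

pub fn next_step(ts: u64, code: TimestampResultCode) -> NextStep {
    match code {
        TimestampResultCode::TimestampFound => NextStep::Return(ts),
        TimestampResultCode::WatcherBehind => NextStep::RetryQuietly,
        // Generally a violated invariant would justify a panic, but this
        // code runs in services that are expensive to restart, so the
        // sketch escalates through logging and retries instead.
        TimestampResultCode::InvariantViolated => NextStep::RetryAndAlert,
    }
}
```

Keeping the decision separate from the loop means the shared code makes no claim about what the caller stands to lose, which is the point raised in the review comment.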

cbeck88 · 2021-08-16T17:55:08Z

It's not super easy to see from the diff what the differences are here -- can you document what the change is in which server, and what you think the consequences of that might be?

initial change to consolidate get watcher timestamp

7553f1e

tubi70 requested a review from cbeck88 August 12, 2021 12:37

eranrund reviewed Aug 12, 2021


eranrund mentioned this pull request Aug 23, 2021

move watcher timestamp to mc mobilecoinfoundation/mobilecoin#877

Merged

cbeck88 closed this Aug 27, 2021
