This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Auto-updater improvements #8078

Merged: 37 commits merged into master from andre-updater-improvements on Apr 3, 2018

Conversation

@andresilva (Contributor) commented Mar 9, 2018:

  • Refactored auto-updater logic into state machine
  • Refactored to make it testable
  • Added --auto-update-delay to randomly delay updates by up to n blocks (sketched below). The delay is computed relative to the block at which the release was published, so old releases aren't delayed further.
  • Added --auto-update-check-frequency to define the periodicity of auto-update checks in number of blocks.

Fixes #7272.
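
For context, a minimal sketch of how such a delay scheme might work; `max_delay` (from --auto-update-delay), `node_seed`, and the offset derivation are assumptions here, not the merged code:

```rust
/// A sketch, not the merged implementation: pick a per-node offset in
/// [0, max_delay) and install the update `offset` blocks after the block in
/// which the release was published. Releases older than `max_delay` blocks
/// are installed immediately instead of being delayed further.
fn upgrade_block(release_block: u64, current_block: u64, max_delay: u64, node_seed: u64) -> u64 {
    let max_delay = max_delay.max(1);
    // Hypothetical source of "randomness"; a real node could derive this from
    // its node id so that different nodes spread out over the delay window.
    let offset = node_seed % max_delay;
    if current_block.saturating_sub(release_block) >= max_delay {
        // Old release: don't delay any further.
        current_block
    } else {
        release_block + offset
    }
}
```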

@andresilva andresilva added F6-refactor 📚 Code needs refactoring. M4-core ⛓ Core client code / Rust. labels Mar 9, 2018
@andresilva (Contributor, author) commented:

I still need to write some tests for this. I need help testing this manually, by deploying the Operations contract to the Dev chain and triggering an update there.

@NikVolf NikVolf changed the title [WIP] Auto-updater improvements Auto-updater improvements Mar 9, 2018
@NikVolf NikVolf added the A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. label Mar 9, 2018
@andresilva andresilva added A0-pleasereview 🤓 Pull request needs code review. and removed A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. labels Mar 12, 2018
@andresilva (Contributor, author) commented Mar 12, 2018:

I tested this locally by deploying the Operations and GitHubHint contracts and creating a new release; it seems to be working properly. I refactored the interactions with the Operations contract by creating an OperationsClient trait, which will allow me to mock it and write unit tests.
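
For context, a rough sketch of what such a trait might look like; the method set and signatures are assumptions rather than the PR's exact definition, and the stand-in types exist only to keep the example self-contained:

```rust
// Stand-in types for illustration; the real updater crate has its own
// ReleaseInfo and BlockNumber definitions.
type BlockNumber = u64;

#[derive(Clone, PartialEq)]
pub struct ReleaseInfo {
    pub version: u64,
    pub is_critical: bool,
}

/// Assumed shape only: abstracting the Operations contract behind a trait
/// lets unit tests substitute an in-memory mock for the on-chain client.
pub trait OperationsClient: Send + Sync + 'static {
    /// Latest release information for the configured track.
    fn latest(&self) -> Result<ReleaseInfo, String>;
    /// Block number at which the given release was added, searching [from; latest_block].
    fn release_block_number(&self, from: BlockNumber, release: &ReleaseInfo) -> Option<BlockNumber>;
}
```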

@@ -245,6 +294,82 @@ impl Updater {
})
}

fn release_block_number(&self, from: BlockNumber, release: &ReleaseInfo) -> Option<BlockNumber> {
Contributor:

Pretty big piece of code, can I get a comment about what it does and when to use it?

Contributor (author):

It has a comment up there in the trait definition:

/// Fetches the block number when the given release was added, checking the interval [from; latest_block].

I can make it more descriptive if you feel it's necessary.

Contributor:

Ah sorry, I wasn't looking back at the trait to compare.

fs::copy(path, &dest).map_err(|e| format!("Unable to copy update: {:?}", e))?;
restrict_permissions_owner(&dest, false, true).map_err(|e| format!("Unable to update permissions: {}", e))?;
info!(target: "updater", "Copied updated binary to {}", dest.display());
}
Contributor:

The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

– Linus Torvalds

No but seriously though, is there a way to break this up somehow?

Collaborator:

Please exit fast on if let statements, because it's difficult to read :)

Contributor (author):

Agreed, I'll clean it up.
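
For illustration, a small sketch of the "exit fast" style being asked for, using a hypothetical helper rather than the updater's real code:

```rust
use std::fs;
use std::path::Path;

// Hypothetical helper: return early when there is nothing to copy instead of
// nesting the happy path inside several `if let` levels.
fn copy_updated_binary(path: Option<&Path>, dest: &Path) -> Result<(), String> {
    let path = match path {
        Some(p) => p,
        None => return Ok(()), // nothing fetched yet, nothing to do
    };
    fs::copy(path, dest).map_err(|e| format!("Unable to copy update: {:?}", e))?;
    Ok(())
}
```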

// There was an error fetching the update, apply a backoff delay before retrying
Err(err) => {
let delay = 2usize.pow(retries) as u64;
let backoff = (retries, Instant::now() + Duration::from_secs(delay));
Contributor:

I would suggest putting a maximum here, otherwise it goes from 6 days to 12 days to a month, etc., at around 20 retries, meaning we have only ~2 weeks to fix any issue before it takes a month for any node to update. A maximum wait of a week seems pretty reasonable?

Collaborator:

It's also probably a cause of the #8003 crash.

@andresilva (Contributor, author) commented Mar 13, 2018:

The current updater logic will drop the backoff as soon as a new release is pushed, so we won't get stuck in any wait case. I will add a test for that 😉.

UpdaterStatus::Ready { ref release } if *release == latest.track => {
let auto = match self.update_policy.filter {
UpdateFilter::All => true,
UpdateFilter::Critical if release.is_critical /* TODO: or is on a bad fork */ => true,
Contributor:

Intentionally or unintentionally left TODO?

Contributor (author):

This was copied from the existing code. I'm not sure how to fix it.

@debris (Collaborator) left a review:

These changes are definitely an improvement, but they still need some polishing.

// Useful environmental stuff.
update_policy: UpdatePolicy,
weak_self: Mutex<Weak<Updater>>,
weak_self: Mutex<Weak<Updater<O, F>>>,
Collaborator:

Can we somehow get rid of this? I know that this will require moving some logic to a separate structure, but weak_self even sounds terrible :p

@andresilva andresilva added A6-mustntgrumble 💦 Pull request has areas for improvement. The author need not address them before merging. and removed A0-pleasereview 🤓 Pull request needs code review. labels Mar 13, 2018
"x86_64-apple-darwin".into()
} else if cfg!(windows) {
"x86_64-pc-windows-msvc".into()
} else if cfg!(target_os = "linux") {
@kirushik (Collaborator) commented Mar 19, 2018:

If this is going to be backported, we might encounter problems with different OpenSSL versions on different distros. See the three different versions of the Linux packages here, for example.

How do we handle that in our current updater? In the worst case we might have been breaking clients' installations on Debian and CentOS.

Contributor (author):

I don't think we handle it at all; AFAIK only one Linux release is published in the Operations contract for a given platform. Maybe we can start detecting the distro and treating those as different platforms: "x64-ubuntu-linux-gnu", "x64-centos-linux-gnu", etc. I think this will require changes in https://github.com/paritytech/push-release and in our CI pipeline.
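
For illustration, a hypothetical sketch of distro-aware platform strings; this is not what the PR does, and it would also need the push-release and CI changes mentioned above:

```rust
// Hypothetical only: today a single Linux target is published, so the Linux
// branch would in practice return one fixed string.
fn platform() -> String {
    if cfg!(target_os = "macos") {
        "x86_64-apple-darwin".into()
    } else if cfg!(windows) {
        "x86_64-pc-windows-msvc".into()
    } else if cfg!(target_os = "linux") {
        // A distro-aware build could read /etc/os-release here and return
        // e.g. "x64-ubuntu-linux-gnu" or "x64-centos-linux-gnu".
        "x86_64-unknown-linux-gnu".into()
    } else {
        "unknown".into()
    }
}
```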

@andresilva andresilva added A6-mustntgrumble 💦 Pull request has areas for improvement. The author need not address them before merging. A0-pleasereview 🤓 Pull request needs code review. and removed A0-pleasereview 🤓 Pull request needs code review. A6-mustntgrumble 💦 Pull request has areas for improvement. The author need not address them before merging. labels Mar 19, 2018
@5chdn 5chdn added this to the 1.11 milestone Mar 20, 2018
@jamesray1 (Contributor) commented Mar 27, 2018:

I haven't looked into how auto-update works, but I think users should be prompted with a list of changes for each update (particularly for forks, and especially for contentious changes) so that they can choose whether to look into the changes before updating. I expect most users will just update, much like clicking yes without reading terms and conditions, but having the choice to look further into it is important for increasing participation and having healthy governance. I'm sure that @vladzamfir would agree.

@andresilva (Contributor, author) commented Mar 27, 2018:

@jamesray1 I appreciate your concerns about informing users of the list of changes for an update, but that is outside the scope of this PR (which just refactors the current behavior of the updater). For now, users can disable the installation of updates (--auto-update none); they are still notified about new updates and can then decide for themselves whether they want to update. What you are suggesting seems more suitable for a GUI app, which could be built using the existing RPC methods (https://wiki.parity.io/JSONRPC-parity_set-module#parity_upgradeready, https://wiki.parity.io/JSONRPC-parity_set-module#parity_executeupgrade).

@jamesray1 (Contributor):

@andresilva Yes, a GUI app for this kind of update notification would be most appropriate.

@tomusdrw (Collaborator) left a review:

Couple of grumbles.

to_block: BlockId::Latest,
address: Some(vec![address]),
topics: vec![
Some(vec![*RELEASE_ADDED_EVENT_NAME_HASH]),
Collaborator:

Why do we need to create this filter manually? Doesn't ethabi support that?

Contributor (author):

Yes I think ethabi supports this, I'll fix it.

Err(err) => {
let delay = 2usize.pow(retries) as u64;
// cap maximum backoff to 1 month
let delay = cmp::min(delay, 30 * 24 * 60 * 60);
Collaborator:

I'd say a day or a week would be enough. To reach 2^21 (a month) we already need to spend 48 days waiting through the previous delays.

With a week (2^19) it's 12 days of waiting before reaching the cap, and with a day (2^16) it's 2 days of waiting.

(Given my math is not off 😩 )

And I think it shouldn't be an issue to handle one request a week/day from every single node in case something goes wrong.

Contributor (author):

Seems reasonable to me.

Contributor:

#hedidthemath
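
For context, a minimal sketch of the capped exponential backoff under discussion, assuming the one-day cap suggested above (the cap actually merged may differ):

```rust
use std::time::{Duration, Instant};

// 2^retries seconds, never more than one day.
fn next_backoff(retries: u32) -> (u32, Instant) {
    const MAX_BACKOFF_SECS: u64 = 24 * 60 * 60;
    let delay = 2u64.saturating_pow(retries).min(MAX_BACKOFF_SECS);
    (retries, Instant::now() + Duration::from_secs(delay))
}
```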


let mut state = self.state.lock();

if let Some(latest) = state.latest.clone() {
Collaborator:

if let Some(ref latest) = state.latest.clone() should be enough, you are cloning it again anyway.

Contributor (author):

We can't borrow, since we mutate state. But I've updated it to avoid cloning latest again when triggering the fetch, and I also removed a similar clone of latest in poll.
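
For illustration, a small standalone sketch of that pattern with made-up field names: clone latest once so state can still be mutated, then move that single clone instead of cloning a second time:

```rust
use std::sync::Mutex;

#[derive(Clone)]
struct ReleaseInfo {
    version: u64,
}

struct State {
    latest: Option<ReleaseInfo>,
    fetching: Option<ReleaseInfo>,
}

fn poll(state: &Mutex<State>) {
    let mut state = state.lock().expect("poisoned lock");
    // One clone up front; `state` stays mutable below.
    if let Some(latest) = state.latest.clone() {
        state.fetching = Some(latest); // moved, not cloned again
    }
}
```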


state.status = UpdaterStatus::Fetching { release: release.clone(), binary, retries: 1 };
// will lock self.state
drop(state);
Collaborator:

Isn't it possible to isolate the locks better? Or maybe pass the lock guard to fetch instead?

Contributor (author):

Actually fetch doesn't lock self.state, so these are unnecessary. I've updated updater_step and execute_upgrade to take the MutexGuard, and all explicit drops have been removed.
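
For illustration, a standalone sketch of that shape using std::sync (the real updater may use a different mutex type, and the fields here are made up): the already-held guard is handed over, so the state is locked exactly once:

```rust
use std::sync::{Mutex, MutexGuard};

struct State {
    step: u32,
}

struct Updater {
    state: Mutex<State>,
}

impl Updater {
    // The guard the caller already holds is passed in, so this method never
    // re-locks and no explicit `drop(state)` is needed before calling it.
    fn updater_step(&self, mut state: MutexGuard<'_, State>) {
        state.step += 1;
        // the lock is released when `state` goes out of scope here
    }

    fn poll(&self) {
        let state = self.state.lock().expect("poisoned lock");
        self.updater_step(state);
    }
}
```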

state.status = UpdaterStatus::Ready { release: release.clone() };
// will lock self.state
drop(state);
self.updater_step();
Collaborator:

Perhaps pass the lock guard to the method instead?

},
// we're ready to retry the fetch after we applied a backoff for the previous failure
UpdaterStatus::FetchBackoff { ref release, backoff, binary } if *release == latest.track && self.time_provider.now() >= backoff.1 => {
state.status = UpdaterStatus::Fetching { release: release.clone(), binary, retries: backoff.0 + 1 };
Collaborator:

Should we increase the backoff even if latest.track is now different?

Contributor (author):

If latest.track changes then we don't enter this branch and instead start downloading the new release (I think there's a test for this).


} else if self.update_policy.enable_downloading {
let update_block_number = {
let from = current_block_number.saturating_sub(self.update_policy.max_delay);
Collaborator:

Should the delay be capped by the upcoming fork block number?

Contributor (author):

Yes, good idea.
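
For illustration, a tiny sketch of that cap; the names update_block and fork_block are assumptions:

```rust
use std::cmp;

// Sketch with assumed names: clamp the randomized update block so it never
// lands after an announced fork block.
fn capped_update_block(update_block: u64, fork_block: Option<u64>) -> u64 {
    match fork_block {
        Some(fork) => cmp::min(update_block, fork),
        None => update_block,
    }
}
```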

}
// Only check for updates every n blocks
let current_block_number = self.client.upgrade().map_or(0, |c| c.block_number(BlockId::Latest).unwrap_or(0));
if current_block_number % cmp::max(self.update_policy.frequency, 1) != 0 {
@tomusdrw (Collaborator) commented Mar 28, 2018:

How often do we poll? What's the guarantee that it will ever proceed?

Contributor (author):

This is currently driven by ChainNotify and we only poll after we're done syncing.

Collaborator:

Ok! I was worried that if it were driven by a timer instead, it might get into some weird stalled situation where the timer aligns with a multiple of the block time and the frequency check would work incorrectly.

@andresilva (Contributor, author):

@tomusdrw I think I've addressed all your comments; please re-review.

@5chdn 5chdn merged commit dcaff6f into master Apr 3, 2018
@5chdn 5chdn deleted the andre-updater-improvements branch April 3, 2018 14:51
@5chdn 5chdn added B7-releasenotes 📜 Changes should be mentioned in the release notes of the next minor version release. A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Apr 3, 2018