This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Auto-updater improvements #8078

Merged: 37 commits merged into master from andre-updater-improvements on Apr 3, 2018

Conversation

@andresilva (Contributor) commented Mar 9, 2018:

  • Refactored auto-updater logic into state machine
  • Refactored to make it testable
  • Added --auto-update-delay to randomly delay updates by up to n blocks (sketched below). The delay is computed relative to the block at which the release was published, so old releases aren't delayed further.
  • Added --auto-update-check-frequency to define the periodicity of auto-update checks in number of blocks.

Fixes #7272.
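
For context, a minimal sketch of how such a delay scheme might work; `max_delay` (from --auto-update-delay), `node_seed`, and the offset derivation are assumptions here, not the merged code:

```rust
/// A sketch, not the merged implementation: pick a per-node offset in
/// [0, max_delay) and install the update `offset` blocks after the block in
/// which the release was published. Releases older than `max_delay` blocks
/// are installed immediately instead of being delayed further.
fn upgrade_block(release_block: u64, current_block: u64, max_delay: u64, node_seed: u64) -> u64 {
    let max_delay = max_delay.max(1);
    // Hypothetical source of "randomness"; a real node could derive this from
    // its node id so that different nodes spread out over the delay window.
    let offset = node_seed % max_delay;
    if current_block.saturating_sub(release_block) >= max_delay {
        // Old release: don't delay any further.
        current_block
    } else {
        release_block + offset
    }
}
```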

@andresilva andresilva added F6-refactor 📚 Code needs refactoring. M4-core ⛓ Core client code / Rust. labels Mar 9, 2018
@andresilva (Contributor, author) commented:

I still need to write some tests for this. I need help testing this manually, by deploying the Operations contract to the Dev chain and triggering an update there.

@NikVolf NikVolf changed the title [WIP] Auto-updater improvements Auto-updater improvements Mar 9, 2018
@NikVolf NikVolf added the A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. label Mar 9, 2018
@andresilva andresilva added A0-pleasereview 🤓 Pull request needs code review. and removed A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. labels Mar 12, 2018
@andresilva (Contributor, author) commented Mar 12, 2018:

I tested this locally by deploying the Operations and GitHubHint contracts and creating a new release; it seems to be working properly. I refactored the interactions with the Operations contract by creating an OperationsClient trait, which will allow me to mock it and write unit tests.
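
For context, a rough sketch of what such a trait might look like; the method set and signatures are assumptions rather than the PR's exact definition, and the stand-in types exist only to keep the example self-contained:

```rust
// Stand-in types for illustration; the real updater crate has its own
// ReleaseInfo and BlockNumber definitions.
type BlockNumber = u64;

#[derive(Clone, PartialEq)]
pub struct ReleaseInfo {
    pub version: u64,
    pub is_critical: bool,
}

/// Assumed shape only: abstracting the Operations contract behind a trait
/// lets unit tests substitute an in-memory mock for the on-chain client.
pub trait OperationsClient: Send + Sync + 'static {
    /// Latest release information for the configured track.
    fn latest(&self) -> Result<ReleaseInfo, String>;
    /// Block number at which the given release was added, searching [from; latest_block].
    fn release_block_number(&self, from: BlockNumber, release: &ReleaseInfo) -> Option<BlockNumber>;
}
```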

@@ -245,6 +294,82 @@ impl Updater {
})
}

fn release_block_number(&self, from: BlockNumber, release: &ReleaseInfo) -> Option<BlockNumber> {
Contributor:

Pretty big piece of code, can I get a comment about what it does and when to use it?

Contributor (author):

It has a comment up there in the trait definition:

/// Fetches the block number when the given release was added, checking the interval [from; latest_block].

I can make it more descriptive if you feel it's necessary.

Contributor:

Ah sorry, I wasn't looking back at the trait to compare.

fs::copy(path, &dest).map_err(|e| format!("Unable to copy update: {:?}", e))?;
restrict_permissions_owner(&dest, false, true).map_err(|e| format!("Unable to update permissions: {}", e))?;
info!(target: "updater", "Copied updated binary to {}", dest.display());
}
Contributor:

The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

– Linus Torvalds

No but seriously though, is there a way to break this up somehow?

Collaborator:

Please exit fast on if let statements, because it's difficult to read :)

Contributor (author):

Agreed, I'll clean it up.
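
For illustration, a small sketch of the "exit fast" style being asked for, using a hypothetical helper rather than the updater's real code:

```rust
use std::fs;
use std::path::Path;

// Hypothetical helper: return early when there is nothing to copy instead of
// nesting the happy path inside several `if let` levels.
fn copy_updated_binary(path: Option<&Path>, dest: &Path) -> Result<(), String> {
    let path = match path {
        Some(p) => p,
        None => return Ok(()), // nothing fetched yet, nothing to do
    };
    fs::copy(path, dest).map_err(|e| format!("Unable to copy update: {:?}", e))?;
    Ok(())
}
```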

// There was an error fetching the update, apply a backoff delay before retrying
Err(err) => {
let delay = 2usize.pow(retries) as u64;
let backoff = (retries, Instant::now() + Duration::from_secs(delay));
Contributor:

I would suggest putting a maximum here, otherwise it goes from 6 days to 12 days to a month, etc., at around 20 retries, meaning we have only ~2 weeks to fix any issue before it takes a month for any node to update. A maximum wait of a week seems pretty reasonable?

Collaborator:

It's also probably a cause of the #8003 crash.

@andresilva (Contributor, author) commented Mar 13, 2018:

The current updater logic will drop the backoff as soon as a new release is pushed, so we won't get stuck in any wait case. I will add a test for that 😉.

UpdaterStatus::Ready { ref release } if *release == latest.track => {
let auto = match self.update_policy.filter {
UpdateFilter::All => true,
UpdateFilter::Critical if release.is_critical /* TODO: or is on a bad fork */ => true,
Contributor:

Intentionally or unintentionally left TODO?

Contributor (author):

This was copied from the existing code. I'm not sure how to fix it.

@debris (Collaborator) left a review:

These changes are definitely an improvement, but they still need some polishing.

// Useful environmental stuff.
update_policy: UpdatePolicy,
weak_self: Mutex<Weak<Updater>>,
weak_self: Mutex<Weak<Updater<O, F>>>,
Collaborator:

Can we somehow get rid of this? I know that this will require moving some logic to a separate structure, but weak_self even sounds terrible :p

@andresilva andresilva added A6-mustntgrumble 💦 Pull request has areas for improvement. The author need not address them before merging. and removed A0-pleasereview 🤓 Pull request needs code review. labels Mar 13, 2018
"x86_64-apple-darwin".into()
} else if cfg!(windows) {
"x86_64-pc-windows-msvc".into()
} else if cfg!(target_os = "linux") {
@kirushik (Collaborator) commented Mar 19, 2018:

If this is going to be backported, we might encounter problems with different OpenSSL versions on different distros. See the three different versions of the Linux packages here, for example.

How do we handle that in our current updater? In the worst case we might have been breaking clients' installations on Debian and CentOS.

Contributor (author):

I don't think we handle it at all; AFAIK only one Linux release is published in the Operations contract for a given platform. Maybe we can start detecting the distro and treating those as different platforms: "x64-ubuntu-linux-gnu", "x64-centos-linux-gnu", etc. I think this will require changes in https://github.com/paritytech/push-release and in our CI pipeline.
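
For illustration, a hypothetical sketch of distro-aware platform strings; this is not what the PR does, and it would also need the push-release and CI changes mentioned above:

```rust
// Hypothetical only: today a single Linux target is published, so the Linux
// branch would in practice return one fixed string.
fn platform() -> String {
    if cfg!(target_os = "macos") {
        "x86_64-apple-darwin".into()
    } else if cfg!(windows) {
        "x86_64-pc-windows-msvc".into()
    } else if cfg!(target_os = "linux") {
        // A distro-aware build could read /etc/os-release here and return
        // e.g. "x64-ubuntu-linux-gnu" or "x64-centos-linux-gnu".
        "x86_64-unknown-linux-gnu".into()
    } else {
        "unknown".into()
    }
}
```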

@andresilva andresilva added A6-mustntgrumble 💦 Pull request has areas for improvement. The author need not address them before merging. A0-pleasereview 🤓 Pull request needs code review. and removed A0-pleasereview 🤓 Pull request needs code review. A6-mustntgrumble 💦 Pull request has areas for improvement. The author need not address them before merging. labels Mar 19, 2018
@5chdn 5chdn added this to the 1.11 milestone Mar 20, 2018
@jamesray1 (Contributor) commented Mar 27, 2018:

I haven't looked into how auto-update works, but I think users should be prompted with a list of changes for each update (particularly for forks, and especially for contentious changes) so that they can choose whether to look into the changes before updating. I expect most users will just update, much like clicking yes without reading terms and conditions, but having the choice to look further into it is important for increasing participation and having healthy governance. I'm sure that @vladzamfir would agree.

@andresilva (Contributor, author) commented Mar 27, 2018:

@jamesray1 I appreciate your concerns about informing users of the list of changes for an update, but that is outside the scope of this PR (which just refactors the current behavior of the updater). For now, users can disable the installation of updates (--auto-update none); they are still notified about new updates and can then decide for themselves whether they want to update. What you are suggesting seems more suitable for a GUI app, which could be built using the existing RPC methods (https://wiki.parity.io/JSONRPC-parity_set-module#parity_upgradeready, https://wiki.parity.io/JSONRPC-parity_set-module#parity_executeupgrade).

@jamesray1 (Contributor):

@andresilva Yes, a GUI app for this kind of update notification would be most appropriate.

@tomusdrw (Collaborator) left a review:

Couple of grumbles.

to_block: BlockId::Latest,
address: Some(vec![address]),
topics: vec![
Some(vec![*RELEASE_ADDED_EVENT_NAME_HASH]),
Collaborator:

Why do we need to create this filter manually? Doesn't ethabi support that?

Contributor (author):

Yes I think ethabi supports this, I'll fix it.

Err(err) => {
let delay = 2usize.pow(retries) as u64;
// cap maximum backoff to 1 month
let delay = cmp::min(delay, 30 * 24 * 60 * 60);
Collaborator:

I'd say a day or a week would be enough. To reach 2^21 (a month) we already need to spend 48 days waiting through the previous delays.

With a week (2^19) it's 12 days of waiting before reaching the cap, and with a day (2^16) it's 2 days of waiting.

(Given my math is not off 😩 )

And I think it shouldn't be an issue to handle one request a week/day from every single node in case something goes wrong.

Contributor (author):

Seems reasonable to me.

Contributor:

#hedidthemath
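
For context, a minimal sketch of the capped exponential backoff under discussion, assuming the one-day cap suggested above (the cap actually merged may differ):

```rust
use std::time::{Duration, Instant};

// 2^retries seconds, never more than one day.
fn next_backoff(retries: u32) -> (u32, Instant) {
    const MAX_BACKOFF_SECS: u64 = 24 * 60 * 60;
    let delay = 2u64.saturating_pow(retries).min(MAX_BACKOFF_SECS);
    (retries, Instant::now() + Duration::from_secs(delay))
}
```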


let mut state = self.state.lock();

if let Some(latest) = state.latest.clone() {
Collaborator:

if let Some(ref latest) = state.latest.clone() should be enough, you are cloning it again anyway.

Contributor (author):

We can't borrow, since we mutate state. But I've updated it to avoid cloning latest again when triggering the fetch, and I also removed a similar clone of latest in poll.
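
For illustration, a small standalone sketch of that pattern with made-up field names: clone latest once so state can still be mutated, then move that single clone instead of cloning a second time:

```rust
use std::sync::Mutex;

#[derive(Clone)]
struct ReleaseInfo {
    version: u64,
}

struct State {
    latest: Option<ReleaseInfo>,
    fetching: Option<ReleaseInfo>,
}

fn poll(state: &Mutex<State>) {
    let mut state = state.lock().expect("poisoned lock");
    // One clone up front; `state` stays mutable below.
    if let Some(latest) = state.latest.clone() {
        state.fetching = Some(latest); // moved, not cloned again
    }
}
```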


state.status = UpdaterStatus::Fetching { release: release.clone(), binary, retries: 1 };
// will lock self.state
drop(state);
Collaborator:

Isn't it possible to isolate the locks better? Or maybe pass the lock guard to fetch instead?

Contributor (author):

Actually fetch doesn't lock self.state, so these are unnecessary. I've updated updater_step and execute_upgrade to take the MutexGuard, and all explicit drops have been removed.
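
For illustration, a standalone sketch of that shape using std::sync (the real updater may use a different mutex type, and the fields here are made up): the already-held guard is handed over, so the state is locked exactly once:

```rust
use std::sync::{Mutex, MutexGuard};

struct State {
    step: u32,
}

struct Updater {
    state: Mutex<State>,
}

impl Updater {
    // The guard the caller already holds is passed in, so this method never
    // re-locks and no explicit `drop(state)` is needed before calling it.
    fn updater_step(&self, mut state: MutexGuard<'_, State>) {
        state.step += 1;
        // the lock is released when `state` goes out of scope here
    }

    fn poll(&self) {
        let state = self.state.lock().expect("poisoned lock");
        self.updater_step(state);
    }
}
```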

state.status = UpdaterStatus::Ready { release: release.clone() };
// will lock self.state
drop(state);
self.updater_step();
Collaborator:

Perhaps pass the lock guard to the method instead?

},
// we're ready to retry the fetch after we applied a backoff for the previous failure
UpdaterStatus::FetchBackoff { ref release, backoff, binary } if *release == latest.track && self.time_provider.now() >= backoff.1 => {
state.status = UpdaterStatus::Fetching { release: release.clone(), binary, retries: backoff.0 + 1 };
Collaborator:

Should we increase the backoff even if latest.track is now different?

Contributor (author):

If latest.track changes then we don't enter this branch and instead start downloading the new release (I think there's a test for this).


} else if self.update_policy.enable_downloading {
let update_block_number = {
let from = current_block_number.saturating_sub(self.update_policy.max_delay);
Collaborator:

Should the delay be capped by the upcoming fork block number?

Contributor (author):

Yes, good idea.
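
For illustration, a tiny sketch of that cap; the names update_block and fork_block are assumptions:

```rust
use std::cmp;

// Sketch with assumed names: clamp the randomized update block so it never
// lands after an announced fork block.
fn capped_update_block(update_block: u64, fork_block: Option<u64>) -> u64 {
    match fork_block {
        Some(fork) => cmp::min(update_block, fork),
        None => update_block,
    }
}
```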

}
// Only check for updates every n blocks
let current_block_number = self.client.upgrade().map_or(0, |c| c.block_number(BlockId::Latest).unwrap_or(0));
if current_block_number % cmp::max(self.update_policy.frequency, 1) != 0 {
@tomusdrw (Collaborator) commented Mar 28, 2018:

How often do we poll? What's the guarantee that it will ever proceed?

Contributor (author):

This is currently driven by ChainNotify and we only poll after we're done syncing.

Collaborator:

Ok! I was worried that if it were driven by a timer instead, it might get into some weird stalled situation where the timer aligns with a multiple of the block time and the frequency check would work incorrectly.

@andresilva (Contributor, author):

@tomusdrw I think I've addressed all your comments; please re-review.

@5chdn 5chdn merged commit dcaff6f into master Apr 3, 2018
@5chdn 5chdn deleted the andre-updater-improvements branch April 3, 2018 14:51
@5chdn 5chdn added B7-releasenotes 📜 Changes should be mentioned in the release notes of the next minor version release. A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Apr 3, 2018