
Enable Offchain API to handle local storage race conditions better #3143

Closed
cmichi opened this issue Jul 18, 2019 · 8 comments


@cmichi
Member

commented Jul 18, 2019

Follow-up from #3079.

This is how the race condition can appear:

Worker A finds no value in storage (local_storage_get(…) returns None). It cannot issue local_storage_compare_and_set(…) with old_value == None, since that function requires a &Vec<u8> for old_value. Thus it must use local_storage_set(…).

Concurrently, another worker does the same thing, so one unconditional write silently overwrites the other: a race condition, as sketched below.
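
A minimal sketch of this racy pattern, with stubbed stand-ins for the host functions (the StorageKind variant and the function bodies here are illustrative assumptions, not the real bindings):

```rust
// Illustrative stand-ins for the offchain host functions discussed in this
// issue; the real functions are host-provided, these stubs only model them.
pub enum StorageKind {
    Persistent,
}

fn local_storage_get(_kind: StorageKind, _key: &[u8]) -> Option<Vec<u8>> {
    None // stub
}

fn local_storage_set(_kind: StorageKind, _key: &[u8], _value: &[u8]) {
    // stub
}

fn init_value(key: &[u8], value: &[u8]) {
    if local_storage_get(StorageKind::Persistent, key).is_none() {
        // Race window: a concurrent worker can also observe `None` here.
        // Since compare_and_set cannot express `old_value == None`, both
        // workers fall back to an unconditional write, and one worker's
        // value silently overwrites the other's.
        local_storage_set(StorageKind::Persistent, key, value);
    }
}
```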

  • Change local_storage_compare_and_set(…) to take Option<&Vec<u8>> for old_value, so that "no value yet" can be expressed (a sketch of the resulting usage follows this list).
  • Then adapt srml/im-online to use this new local_storage_compare_and_set(…) API when setting local storage.
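
With the proposed Option-taking API, the initialisation becomes a single atomic step. A sketch, reusing the stubs above and assuming the signature quoted later in this thread:

```rust
// Stub with the proposed signature: `old_value: Option<&[u8]>` lets a
// caller say "only write if no value exists yet".
fn local_storage_compare_and_set(
    _kind: StorageKind,
    _key: &[u8],
    _old_value: Option<&[u8]>,
    _new_value: &[u8],
) -> bool {
    true // stub: the host would perform the check-and-write atomically
}

fn init_value_atomically(key: &[u8], value: &[u8]) -> bool {
    // Passing `None` means: succeed only if the key is still unset. A
    // concurrent writer makes this return `false` instead of silently
    // clobbering one of the two writes.
    local_storage_compare_and_set(StorageKind::Persistent, key, None, value)
}
```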

@cmichi cmichi added the M4-core label Jul 18, 2019

@cmichi cmichi self-assigned this Jul 18, 2019

@cmichi cmichi added the F2-bug 🐜 label Jul 18, 2019

@gterzian

Contributor

commented Jul 22, 2019

You could also consider running offchain workers sequentially, as in "one after the other".

You could add compare_and_set in a few places, but that means users have to work with an atomics-like API, which is notoriously hard to use correctly.

Or you run workers sequentially, meaning anyone can get/set as much as they want, in any order, without any potential for race conditions.

cc @tomusdrw

@tomusdrw

Contributor

commented Jul 22, 2019

@gterzian Offchain workers can be long-running processes that await some heavy computations or external requests.
We currently run them after every block import and allow the workers to decide whether they want to do something or not. By running them sequentially we could end up in a situation where we are multiple blocks behind, and that's definitely not feasible.

The need for synchronisation between workers should be rare though, so I'd expect CAS to be used sparingly.

@gterzian

Contributor

commented Jul 23, 2019

By running them sequentially we could end up in a situation where we are multiple blocks behind, and that's definitely not feasible.

Couldn't that also happen de facto if each block adds an offchain worker that lives for "a long time", and they end up contending for resources?

So let's say we're at a new block 100, and there are 99 offchain workers that have been spawned and are doing all sorts of things; that 100th spawned worker might well only be scheduled once we're at block 200.

Perhaps you could run the worker for block 100 in parallel with the authoring of block 101, and when 101 is authored, either somehow stop it and run the worker for 101, or actually hold back block authoring until the worker from 100 finishes.

That could give you some guarantees that workers always run "soon after" their block has been authored.

@tomusdrw

Contributor

commented Jul 23, 2019

Couldn't that also happen de facto if each block adds an offchain worker that lives for "a long time", and they end up contending for resources?

Sure, but it's up to the users to define the logic. They can have a long-running worker every 10k blocks (that runs for, say, 5k blocks) and in the meantime spawn a bunch of smaller tasks that complete faster.

It's a similar argument to: "What if a user creates a runtime that causes blocks to take longer to process than the block time?" Sure, it's possible, but we trust that they know what they are doing, and with good docs we can educate them about what can and cannot be done.

@gterzian

Contributor

commented Jul 23, 2019

Ok, I see what you mean.

I'll take a look at adding the Option then, unless someone is already working on this.

@tomusdrw

Contributor

commented Jul 23, 2019

I'll take a look at adding the Option then, unless someone is already working on this.

No, I don't think anyone is looking into it currently.

Also, I consider the local_storage ext APIs to be low-level; I really hope we will be able to employ some macro magic and make the read/atomic_mutate operations seamless for users.
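
For illustration, such a wrapper could be a hypothetical atomic_mutate helper that hides the get/compare_and_set retry loop behind a closure; the name and shape here are assumptions, not an existing API:

```rust
// Hypothetical convenience wrapper over the low-level ext APIs sketched
// earlier in this thread: read the current value, derive a new one, and
// retry until compare_and_set succeeds (no concurrent writer interfered).
fn atomic_mutate<F>(key: &[u8], mutate: F) -> Vec<u8>
where
    F: Fn(Option<&[u8]>) -> Vec<u8>,
{
    loop {
        let old = local_storage_get(StorageKind::Persistent, key);
        let new = mutate(old.as_deref());
        if local_storage_compare_and_set(
            StorageKind::Persistent,
            key,
            old.as_deref(),
            &new,
        ) {
            return new;
        }
        // Lost the race to another worker; re-read and try again.
    }
}
```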

@gterzian

Contributor

commented Jul 23, 2019

Looks like this was done already, right?

old_value: Option<&[u8]>,

@tomusdrw

Contributor

commented Jul 23, 2019

Only on the db layer; it should also be exposed like that to wasm:

fn local_storage_compare_and_set(kind: StorageKind, key: &[u8], old_value: &[u8], new_value: &[u8]) -> bool;
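
Putting the two quoted signatures together, the wasm-exposed declaration would presumably become something like this sketch (not yet the actual API):

```rust
fn local_storage_compare_and_set(
    kind: StorageKind,
    key: &[u8],
    old_value: Option<&[u8]>,
    new_value: &[u8],
) -> bool;
```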
