
PVF: Compromised artifact file integrity can lead to disputes #677

Open
mrcnski opened this issue Mar 28, 2023 · 15 comments
Labels
C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be.

Comments

@mrcnski
Contributor

mrcnski commented Mar 28, 2023

ISSUE

Overview

If we try to execute a corrupt PVF artifact, it will likely lead to a dispute. See the comment below for an example of this.

A simple mitigation is to just put the hash of the file contents in the filename. Artifacts are not that big, on the order of several dozen MB (I think?). If we use something like https://github.com/BLAKE3-team/BLAKE3, which claims 6866 MiB/s throughput, then the time spent hashing would barely register.
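As a rough sketch of the hash-in-filename idea (std's DefaultHasher stands in for BLAKE3 here so the example needs no external crates; a real implementation would use the `blake3` crate, and the name format is illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Derive the artifact's on-disk name from a hash of its contents, so any
// corruption makes the name/content pair inconsistent and detectable.
// DefaultHasher (SipHash) is a stand-in for BLAKE3 in this sketch.
fn artifact_file_name(contents: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(contents);
    format!("artifact-{:016x}", hasher.finish())
}

fn main() {
    let name = artifact_file_name(b"compiled pvf bytes");
    // The name commits to the contents: any corruption changes the hash.
    println!("{name}");
}
```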

Original comment

As far as I understand, the main concern about not clearing the artifact cache is:

  1. Unsafe functions are used to import the artifact
  2. Their safety guarantees rely on the fact that the node has produced them with prepare() and that they are not just some random files put there by someone.

Not that we can guarantee nobody will overwrite them during node runtime. I believe the cache pruning was implemented as a measure to mitigate those unwanted outcomes. But I don't have a strong opinion on whether it makes any sense. If someone wants to screw things up, he will.

Originally posted by @s0me0ne-unkn0wn in #685

@mrcnski
Contributor Author

mrcnski commented Aug 8, 2023

For 30-40 MB artifacts (and they can get much larger), hashing at 6866 MiB/s would take about 4-6 ms. This is negligible compared to preparation, which is on the order of seconds, but not compared to execution, which is on the order of 10 ms. Considering that such disk corruptions are quite rare, this hardly seems worth checking before every execution.
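The arithmetic above can be double-checked quickly (throughput figure taken from BLAKE3's published claim; this is purely illustrative):

```rust
fn main() {
    // Milliseconds to hash `size_mib` MiB at BLAKE3's claimed 6866 MiB/s.
    let hash_ms = |size_mib: f64| size_mib / 6866.0 * 1000.0;
    // A 30-40 MiB artifact hashes in roughly 4.4-5.8 ms.
    println!("{:.2} ms - {:.2} ms", hash_ms(30.0), hash_ms(40.0));
}
```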

So probably the most realistic solution here is to hash after preparation, but only check it on node restart and not before every execution.

@s0me0ne-unkn0wn suggested keeping prepared artifacts in RAM only, which would mitigate the issue of disk/FS corruptions. At first blush it doesn't seem feasible, especially with on-demand coming, but it could be worth considering...

@s0me0ne-unkn0wn
Contributor

If you're up for some experimenting, you can try to compress them with sp_maybe_compressed_blob::compress() and see if it gives any good results 🙂

@mrcnski
Contributor Author

mrcnski commented Aug 8, 2023

@s0me0ne-unkn0wn Hmm that would add decompression overhead to each execution, definitely worth considering though.

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce the-right-joyce added C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. T8-parachains_engineering and removed J7-mentor labels Aug 25, 2023
@s0me0ne-unkn0wn
Contributor

Not sure I follow. You mean, hash a prepared artifact right after the preparation, include its hash into its name, and check the hash before execution? But how do you know the name of the artifact you're going to open for execution, then? Sounds like a chicken and egg situation.

@mrcnski
Contributor Author

mrcnski commented Oct 10, 2023

Not sure I follow. You mean, hash a prepared artifact right after the preparation, include its hash into its name, and check the hash before execution? But how do you know the name of the artifact you're going to open for execution, then? Sounds like a chicken and egg situation.

I was thinking on node startup, we just list all the artifacts in the cache path, check consistency with the hash, and load them into the artifacts table? As mentioned above, hashing before each execution would be too much of a performance hit.
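A minimal sketch of that startup pass (all names hypothetical, not the actual PVF host API; DefaultHasher stands in for the real content hash, and mismatching files are simply pruned so they get re-prepared on demand):

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::path::Path;

// Name an artifact after the hash of its contents (stand-in hash function).
fn artifact_name(contents: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(contents);
    format!("artifact-{:016x}", hasher.finish())
}

// On startup, re-hash every cached file and delete any whose name no
// longer matches its contents; those artifacts get re-prepared on demand.
fn prune_corrupt_artifacts(cache_dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(cache_dir)? {
        let path = entry?.path();
        let contents = fs::read(&path)?;
        let expected = artifact_name(&contents);
        if path.file_name().and_then(|n| n.to_str()) != Some(expected.as_str()) {
            // Name/content mismatch: treat the artifact as corrupt.
            fs::remove_file(&path)?;
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("pvf-cache-sketch");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join(artifact_name(b"good")), b"good")?;
    fs::write(dir.join("artifact-0000000000000000"), b"tampered")?;
    prune_corrupt_artifacts(&dir)?;
    Ok(())
}
```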

@s0me0ne-unkn0wn
Contributor

Aha, thanks for the elaboration, that does make sense. But that would protect against accidental artifact corruption, not against malicious action. On the other hand, I don't see any easy way to protect against malicious artifacts while still keeping them on disk anyway. (Sign them with the validator key, maybe? But that would require some effort.)

@burdges

burdges commented Oct 10, 2023

How big are artifacts? A few megabytes? blake2 claims 3.08 cycles per byte, so 2/1000th of a second? You'd need to load the PVF only once of course, but that'd avoid a race condition too.

It's OTOH just one of many reasons validators need to avoid their machines being compromised, so whatever.

@eagr
Contributor

eagr commented Oct 11, 2023

Please help me understand. What would happen if someone deliberately swaps the cache after preparation? And what if they do that at a large scale? Is there any incentive to do that?

@mrcnski
Contributor Author

mrcnski commented Oct 11, 2023

How big are artifacts? A few megabytes? blake2 claims 3.08 cycles per byte, so 2/1000th of a second? You'd need to load the PVF only once of course, but that'd avoid a race condition too.

@burdges I did a quick analysis in #677 (comment). Using the published numbers of blake3, there would be a ~5ms performance hit which I'm not sure is acceptable in PVF execution.

What do you mean by a race condition? Loading all the artifacts once into memory is an idea we had, but probably not feasible with on-demand.

It's OTOH just one of many reasons validators need to avoid their machines being compromised, so whatever.

Yeah, though a malicious attack in this way seems really unlikely now that the PVF processes have filesystem sandboxing. They are only able to access the artifacts they need to. My main concern is avoiding unneeded disputes due to a hardware failure (corrupted disk) - but if that happens while the node is running, there's probably not much we can do. OTOH, checking the hashes on node startup is acceptable and may bring some small benefit for no real cost.

Please help me understand. What would happen if someone deliberately swaps the cache after preparation? And what if they do that in a large scale? Is there any incentive of doing that?

@eagr The validator would vote wrongly, so I don't think there's any incentive to do that deliberately, or a possible attack vector there. They would just get slashed.

@burdges

burdges commented Oct 11, 2023

My main concern is avoiding unneeded disputes due to a hardware failure (corrupted disk)

These cause invalid disputes, which disable the validator. We need not punish those too heavily, unless correlated across validators, so maybe the solution is just to take this into consideration in the slashing level?

@mrcnski
Contributor Author

mrcnski commented Oct 11, 2023

@burdges Related to that, another idea we had with @eskimor is to only hash the file if a candidate is found to be invalid, after the fact. Reason being that this is a rare case so this additional check shouldn't affect performance too much. If an artifact was corrupted it is very unlikely to be found valid, so we shouldn't need to check file integrity in that case.
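A hedged sketch of that check-after-invalid flow (names and types here are illustrative, not the actual PVF host API; DefaultHasher again stands in for the real content hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in content hash for this sketch.
fn hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
enum Judgement {
    Valid,
    Invalid,
    // The artifact no longer matches its recorded hash, so an "invalid"
    // result is suspect: re-prepare instead of raising a dispute.
    CorruptArtifact,
}

// Only pay the hashing cost on the rare invalid path: a corrupted
// artifact is very unlikely to produce a *valid* result anyway.
fn judge(result_valid: bool, artifact: &[u8], expected_hash: u64) -> Judgement {
    if result_valid {
        // Fast path: no hashing on the common, valid case.
        return Judgement::Valid;
    }
    if hash(artifact) != expected_hash {
        return Judgement::CorruptArtifact;
    }
    Judgement::Invalid
}

fn main() {
    let h = hash(b"pvf artifact");
    assert_eq!(judge(false, b"corrupted bytes", h), Judgement::CorruptArtifact);
}
```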

@s0me0ne-unkn0wn
Contributor

@mrcnski sounds like a really good idea!

@burdges

burdges commented Oct 11, 2023

It's actually more important that the validator be hard to hack, but yeah sure.

@eagr
Contributor

eagr commented Nov 28, 2023

I guess what's left is to check integrity on execution errors and act accordingly?

@mrcnski
Contributor Author

mrcnski commented Nov 28, 2023

Yes, I filed a follow-up here: #2399 (comment) Please coordinate with @Jpserrat as he commented there, I'm not sure if he started on this yet or not!

Projects
Status: In Progress