Makes fuzzer call explicitly I/O operations #143

Tpt · 2022-10-10T14:34:21Z

Makes fuzzing much more deterministic.

Allows a more careful model of which state is already persisted and which is not.

Also refactors some of the test-related Db API so that it can be called from the fuzzer.


cheme · 2022-10-11T12:57:09Z

fuzz/fuzz_targets/refcounted_model.rs

+		operations: impl IntoIterator<Item = &'a Operation>,
+		model: &mut Model,
+	) {
+		let mut counts = [None; 256];


Suggested change:

-	let mut counts = [None; 256];
+	let mut counts = [None; u8::MAX];

or a constant.

It's 256 and not 255 (=u8::MAX) because it's the number of possible values in a u8. I might do 1 << u8::BITS but I am not sure if it's more readable.
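To make the arithmetic concrete, here is a minimal sketch (not the PR's code) of why the array needs 256 slots. The constant name NUMBER_OF_POSSIBLE_KEYS is the one that shows up later in this review; the helpers fresh_counts and set_count and the i64 count type are assumptions made for illustration.

```rust
/// Number of distinct values a `u8` key can take: 0..=255, i.e. 256.
const NUMBER_OF_POSSIBLE_KEYS: usize = 1 << u8::BITS;

/// Hypothetical helper: one optional reference count per possible key.
fn fresh_counts() -> [Option<i64>; NUMBER_OF_POSSIBLE_KEYS] {
    [None; NUMBER_OF_POSSIBLE_KEYS]
}

/// Hypothetical helper: indexing by `usize::from(key)` is in bounds for
/// every `u8`, including `u8::MAX`, only because the array has 256 slots.
fn set_count(counts: &mut [Option<i64>; NUMBER_OF_POSSIBLE_KEYS], key: u8, value: i64) {
    counts[usize::from(key)] = Some(value);
}
```

Note that `[None; u8::MAX]` would not compile as written, since an array length must be a usize; `usize::from(u8::MAX) + 1` is another way to spell 256.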


cheme · 2022-10-11T13:59:20Z

fuzz/fuzz_targets/simple_model.rs

 	}

 	fn model_optional_content(model: &Model) -> Vec<(Vec<u8>, Vec<u8>)> {
-		Self::model_required_content(model)


Please correct me if I am wrong: here the difference between required and optional is that optional also includes already saved content.


Tpt

Thank you for the review!


Tpt · 2022-10-11T15:42:36Z

fuzz/fuzz_targets/refcounted_model.rs

+				for layer in model {
+					if let Some(c) = layer.counts[key] {
+						if !layer.is_maybe_saved && c < 0 {
+							min_count += c;


The Option here is there to differentiate between Some(0), i.e. "no reference anymore, might be garbage collected", and None, i.e. "has never been present in the database".

Indeed. I got mixed up here. I wrote that to handle the case where some layers were written to disk but not flushed and have been lost. But we have a proper "recovery" process for our in-memory model of the database state, so it is not required anymore. I have removed the condition.


Tpt · 2022-10-11T15:46:55Z

fuzz/fuzz_targets/simple_model.rs

 	}

 	fn model_optional_content(model: &Model) -> Vec<(Vec<u8>, Vec<u8>)> {
-		Self::model_required_content(model)


The difference between required and optional is here for the ref-counted case: keys whose number of references is 0 might still be in the database (hence "optional") but are not required to still be there ("required"). This is not useful when not using ref-counting.
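The required/optional split can be sketched as follows. This is a simplified illustration rather than the fuzzer's implementation: it tracks bare u8 keys instead of key-value pairs, and the names Counts, required_keys, optional_keys and is_consistent are assumptions.

```rust
/// Per-key reference count: `None` = never inserted, `Some(0)` = no
/// reference left (may be garbage collected), `Some(n)` = n references.
type Counts = [Option<u32>; 256];

/// Keys that must be readable from the database: at least one reference left.
fn required_keys(counts: &Counts) -> Vec<u8> {
    (0..=u8::MAX)
        .filter(|&k| counts[usize::from(k)].map_or(false, |c| c > 0))
        .collect()
}

/// Keys that are allowed to be present: anything inserted at some point,
/// including keys whose count dropped to 0 but may not be collected yet.
fn optional_keys(counts: &Counts) -> Vec<u8> {
    (0..=u8::MAX)
        .filter(|&k| counts[usize::from(k)].is_some())
        .collect()
}

/// Hypothetical check: every required key is present, and nothing outside
/// the optional set is present.
fn is_consistent(counts: &Counts, present: &[u8]) -> bool {
    let optional = optional_keys(counts);
    required_keys(counts).iter().all(|k| present.contains(k))
        && present.iter().all(|k| optional.contains(k))
}
```

Without ref-counting there is no Some(0) state, so both sets coincide, which matches the remark that the distinction is not useful in that case.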


Tpt · 2022-10-11T16:07:46Z

fuzz/fuzz_targets/refcounted_model.rs

+	let mut count = None;
+	for layer in layers {
+		if let Some(c) = layer.counts[key] {
+			*count.get_or_insert(0) += c;


Indeed. Done.


Tpt · 2022-10-13T14:14:44Z

Changes since last review:

- tweaks after review comments
- merge of latest master changes
- fix in the internal model recovery process (properly rejects when no recovery is possible and fixes behavior related to reference counting)


cheme · 2022-10-13T14:29:06Z

fuzz/fuzz_targets/refcounted_model.rs

+				Operation::Set(k) => *counts[usize::from(k)].get_or_insert(0) += 1,
+				Operation::Dereference(k) =>
+					if counts[usize::from(k)].unwrap_or(0) > 0 {
+						*counts[usize::from(k)].get_or_insert(0) -= 1;


I kind of like the previous version, where the count was only modified for dereference and reference if the previous layer had a net RC > 0 (i.e. an entry in the db).
(For operation set I have no doubt it is correct.)

One way to do it (which would also work with the if condition here) would be to replace

let mut counts = model.last().map_or([None; NUMBER_OF_POSSIBLE_KEYS], |l| l.counts);

with

let mut counts = model.last().map_or([None; NUMBER_OF_POSSIBLE_KEYS], |l| l.counts.map(|c| if c == Some(0) { None } else { c }));


cheme · 2022-10-13T14:36:28Z

fuzz/fuzz_targets/refcounted_model.rs

 		}
+
+		// if we are multiple candidates, we are unsure. We pick the lower count per candidate


I did not get this minimal candidate logic.
Correct me where I am wrong. What I understood:

- each layer contains the net rc of changes from a commit

- the current on-disk value should therefore be the last layer with is_written set to true

- so resetting to an earlier state should use the last candidate at the latest layer, and not a merge of all lesser rc


cheme · 2022-10-13T14:51:01Z

fuzz/fuzz_targets/simple_model.rs

 	}

 	fn model_optional_content(model: &Model) -> Vec<(Vec<u8>, Vec<u8>)> {
-		Self::model_required_content(model)


Should we just keep old code here (Self::model_required_content(model))?

Tpt

Thank you @cheme for the review!

I have fixed or replied to your comments.

I have also made simple_model validation of removed keys stricter (it was unnecessarily lenient).


Tpt · 2022-10-14T07:46:11Z

fuzz/fuzz_targets/refcounted_model.rs

+				Operation::Set(k) => *counts[usize::from(k)].get_or_insert(0) += 1,
+				Operation::Dereference(k) =>
+					if counts[usize::from(k)].unwrap_or(0) > 0 {
+						*counts[usize::from(k)].get_or_insert(0) -= 1;


> I kind of like the previous version, where the count was only modified for dereference and reference if the previous layer had a net RC > 0 (i.e. an entry in the db).

I believe it's already happening with if counts[usize::from(k)].unwrap_or(0) > 0 {. But maybe I'm missing something. I have tweaked the code to make each layer store the total number of references at this time, not the changes in reference count.

Changing Some(0) to None would mean assuming that the GC collects all unused key-values at this step, and I am not sure that is the case in the DB.
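As a rough sketch of the behavior described here, the function below builds a new layer's counts from the previous layer's totals, reusing the two match arms quoted in the diff. The reduced Operation enum, the Counts alias and the apply function are assumptions for illustration; the real fuzzer may handle more operations.

```rust
/// Reduced, hypothetical operation set (only the two arms shown in the diff).
#[derive(Clone, Copy)]
enum Operation {
    Set(u8),
    Dereference(u8),
}

/// Total reference count per key at the time a commit is done.
type Counts = [Option<u32>; 256];

/// Starts from the previous layer's totals and applies one commit.
fn apply(previous: &Counts, operations: &[Operation]) -> Counts {
    let mut counts = *previous;
    for op in operations {
        match *op {
            Operation::Set(k) => *counts[usize::from(k)].get_or_insert(0) += 1,
            // Only dereference keys that still have at least one reference.
            Operation::Dereference(k) =>
                if counts[usize::from(k)].unwrap_or(0) > 0 {
                    *counts[usize::from(k)].get_or_insert(0) -= 1;
                },
        }
    }
    counts
}
```

A key that is set and then fully dereferenced ends at Some(0) rather than None, so the model does not assume the key has been garbage collected; dereferencing a key that was never set leaves it at None.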

Tpt · 2022-10-14T07:50:48Z

fuzz/fuzz_targets/refcounted_model.rs

 		}
+
+		// if we are multiple candidates, we are unsure. We pick the lower count per candidate


> each layer contains the net rc of changes from a commit

Sorry, it's the same misleading thing. I have simplified the implementation by storing not the "net rc of changes" but the "total rc at the time the commit is done". I should have written a code review comment about it.

> the current on-disk value should therefore be the last layer with is_written set to true

I believe it is not always the case. What if the DB crashes between write and flush? We set is_written during write, not after flush.

> so resetting to an earlier state should use the last candidate at the latest layer, and not a merge of all lesser rc

There is also a "fun thing": the DB API returns the present key-values, not their RC count. So, if we take the following sequence:

    Commit 1: Set([0], [0])
    Commit 2: Increment([0])

If we are unsure that commit 2 has been flushed properly, there is no way to know from the presence/absence of key [0] whether the reference count of key [0] is 1 or 2.
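The ambiguity in the sequence above, and one conservative way to resolve it, can be sketched like this. Both function names are hypothetical, and the handling of a key known to only one candidate is an assumption; the thread only states that the lower count is picked per key.

```rust
/// Presence-only view: the model only knows whether a key has to be
/// readable, which is all the database API lets the fuzzer check.
fn must_be_present(count: Option<u32>) -> bool {
    count.map_or(false, |c| c > 0)
}

/// Merges two candidate counts for the same key into the less restrictive one.
fn merge_candidates(a: Option<u32>, b: Option<u32>) -> Option<u32> {
    match (a, b) {
        // Both candidates know the key: keep the lower count.
        (Some(x), Some(y)) => Some(x.min(y)),
        // Assumption: a key known to only one candidate may or may not be
        // on disk, so it is kept as "optional" with a count of 0.
        (Some(_), None) | (None, Some(_)) => Some(0),
        (None, None) => None,
    }
}
```

With a count of 1 after commit 1 and 2 after commit 2, both candidates require key [0] to be present, so reading the key cannot tell them apart; merging them yields 1, which never demands more than either candidate would.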

cheme · 2022-10-14T08:56:23Z

> I believe it is not always the case. What if the DB crashes between write and flush? We set is_written during write, not after flush.

This sentence explains a lot to me. I was expecting is_written to be set when we consider things flushed (since we run without threads, we should be able to set it at that point deterministically; this may make sense as a next step).

> Changing Some(0) to None would mean assuming that the GC collects all unused key-values at this step, and I am not sure that is the case in the DB.

Yes, that was my initial assumption :)

>     Commit 1: Set([0], [0])
>     Commit 2: Increment([0])
>
> If we are unsure that commit 2 has been flushed properly, there is no way to know from the presence/absence of key [0] whether the reference count of key [0] is 1 or 2.

I guess your example was more on a decrement, but I see what you mean.

Tpt · 2022-10-14T09:37:10Z

Thank you!

> > I believe it is not always the case. What if the DB crashes between write and flush? We set is_written during write, not after flush.
>
> This sentence explains a lot to me. I was expecting is_written to be set when we consider things flushed (since we run without threads, we should be able to set it at that point deterministically; this may make sense as a next step).

Yes! I started to write something about it but I encountered some issues. I preferred to keep it as a next step.

cheme · 2022-10-14T09:39:22Z

Now that I think of it, when I put together the test with different processing targets, I removed one variant (I think data in the WAL cache but not flushed to the WAL file) for being awkward to implement.

Tpt · 2022-10-14T09:41:46Z

@cheme And when there are also I/O errors it creates "fun" states like "flushed if it has been actually written" or "maybe flushed"...

cheme · 2022-10-14T09:45:18Z

In theory these states should be covered by a layer upward or, when restarting, by the WAL commit not being complete (or the CRC failing).
But you're right, for fuzzing we should not assume everything is implemented/sequenced properly.

Tpt force-pushed the fuzz branch 2 times, most recently from 479f5c0 to a760120 on October 11, 2022 07:34

Tpt requested review from cheme and arkpar October 11, 2022 08:15

Tpt added 4 commits October 11, 2022 11:40

Introduces the OpeningMode struct

ecb29d0

Allows enforcing that the database is opened in exactly one of three modes: create+write, write, or read-only.
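A minimal sketch of what such a three-way mode could look like follows. The commit calls OpeningMode a struct; the enum form, the variant names and the two helper methods below are assumptions used purely for illustration and are not taken from the crate.

```rust
/// Hypothetical sketch of an opening mode; variant names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OpeningMode {
    /// Create the database if it is missing, then open it for writing.
    Create,
    /// Open an existing database for writing; never create it.
    Write,
    /// Open an existing database without allowing writes.
    ReadOnly,
}

impl OpeningMode {
    /// Only the create+write mode may create missing files.
    fn can_create(self) -> bool {
        matches!(self, OpeningMode::Create)
    }

    /// Every mode except read-only may write.
    fn can_write(self) -> bool {
        !matches!(self, OpeningMode::ReadOnly)
    }
}
```

Encoding the three cases in a single value, rather than in independent booleans, rules out contradictory combinations such as "create but read-only".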

Splits the InnerOptions struct

9648ad3

It is only composed of two fields

Moves EnableCommitPipelineStages to be test-only

2caec03

Adds test-only options to control some behaviors

Makes fuzzer call explicitly I/O operations

7d94213

Makes fuzzing much more deterministic. Allows a more careful model of which state is already persisted and which is not.

Tpt force-pushed the fuzz branch from a760120 to 7d94213 on October 11, 2022 09:40

arkpar reviewed Oct 11, 2022

arkpar reviewed Oct 11, 2022

Applies review suggestions

832c889

Tpt force-pushed the fuzz branch from dc18649 to 832c889 on October 11, 2022 12:27

cheme reviewed Oct 11, 2022

Applies more review comments by @cheme

ded3940

Tpt commented Oct 11, 2022

Tpt force-pushed the fuzz branch from 9f82fdc to 68f41ea on October 13, 2022 10:10

Merge master into fuzz

5070ce4

Tpt force-pushed the fuzz branch from 68f41ea to 5070ce4 on October 13, 2022 10:40

Improves fuzzers model recovery

11edd44

1. Properly fails if not able to recover to a known state.
2. For ref-counted, in case of multiple eligible states, we build a state that is as unrestrictive as possible.

Tpt requested review from cheme and arkpar October 13, 2022 14:13

arkpar approved these changes Oct 13, 2022

cheme reviewed Oct 13, 2022

Tpt added 2 commits October 14, 2022 09:51

Fuzzers: improves code formatting and removes duplicates

129e1a6

Fuzzers: makes delete key validation more careful

6a063f8

Tpt commented Oct 14, 2022

cheme approved these changes Oct 14, 2022

Tpt merged commit ca04149 into paritytech:master Oct 14, 2022

Tpt deleted the fuzz branch October 14, 2022 09:36
