Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makes fuzzer call explicitly I/O operations #143

Merged
merged 10 commits into from
Oct 14, 2022
Merged

Makes fuzzer call explicitly I/O operations #143

merged 10 commits into from
Oct 14, 2022

Conversation

Tpt
Copy link
Contributor

@Tpt Tpt commented Oct 10, 2022

Makes fuzzing way more deterministic

Allows a more careful model of which state are already  persisted and which state is not

Does also some refactoring of Db test related API to be able to call them from the fuzzer

@Tpt Tpt force-pushed the fuzz branch 2 times, most recently from 479f5c0 to a760120 Compare October 11, 2022 07:34
@Tpt Tpt requested review from cheme and arkpar October 11, 2022 08:15
Allows to enforce that the database is either create+write, either write, either read-only.
It is only composed of two fields
Adds test-only options to control some behaviors
Makes fuzzing way more deterministic

Allows a more careful model of which state are already  persisted and which state is not
operations: impl IntoIterator<Item = &'a Operation>,
model: &mut Model,
) {
let mut counts = [None; 256];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut counts = [None; 256];
let mut counts = [None; u8::MAX];

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or a constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's 256 and not 255 (=u8::MAX) because it's the number of possible values in a u8. I might do 1 << u8::BITS but I am not sure if it's more readable.

fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/simple_model.rs Outdated Show resolved Hide resolved
}

fn model_optional_content(model: &Model) -> Vec<(Vec<u8>, Vec<u8>)> {
Self::model_required_content(model)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if wrong, here the difference between required and optional is that optional does include already saved content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the difference between required and optional is here for the ref-counted case: keys which number of references is 0 might still be in the database (hence "optional") but are not required to be still there ("required"). This is not useful when not using ref-counting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just keep old code here (Self::model_required_content(model))?

fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
fuzz/src/lib.rs Show resolved Hide resolved
src/db.rs Outdated Show resolved Hide resolved
Copy link
Contributor Author

@Tpt Tpt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review!

src/db.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
for layer in model {
if let Some(c) = layer.counts[key] {
if !layer.is_maybe_saved && c < 0 {
min_count += c;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Option here is the differentiate between Some(0) aka "no reference anymore, might be garbage collected" and None "has never been present in the dabase".

Indeed. I got mixed up here. I wrote that to handle the case where some layers were written to disk but not flushed and have been lost. But we have a proper "recovery" process for our in-memory model of the database state so it is not required anymore. I have removed the condition.

fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
}

fn model_optional_content(model: &Model) -> Vec<(Vec<u8>, Vec<u8>)> {
Self::model_required_content(model)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the difference between required and optional is here for the ref-counted case: keys which number of references is 0 might still be in the database (hence "optional") but are not required to be still there ("required"). This is not useful when not using ref-counting.

fuzz/fuzz_targets/simple_model.rs Outdated Show resolved Hide resolved
let mut count = None;
for layer in layers {
if let Some(c) = layer.counts[key] {
*count.get_or_insert(0) += c;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Done.

1. properly fails if not able to recover to a known tests
2. For ref-counted, in case of multiple eligible states, we build a state that is as less restrictive as possible
@Tpt Tpt requested review from cheme and arkpar October 13, 2022 14:13
@Tpt
Copy link
Contributor Author

Tpt commented Oct 13, 2022

Changes since last review:

  • tweaks after review comments
  • merge of latest master changes
  • fix inside of internal model recovery process (properly rejects when no recovery possible and fixes behavior related to reference counting).

fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
Operation::Set(k) => *counts[usize::from(k)].get_or_insert(0) += 1,
Operation::Dereference(k) =>
if counts[usize::from(k)].unwrap_or(0) > 0 {
*counts[usize::from(k)].get_or_insert(0) -= 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like the previous version where the modification of count for dereference and reference did only happen if in the previous layer there was a net RC > 0 (aka an entry in db).
(for operation set I got no doubt it is correct)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A way to do (and would work with the if condition here), would be to replace

	let mut counts = model.last().map_or([None; NUMBER_OF_POSSIBLE_KEYS], |l| l.counts);

with

	let mut counts = model.last().map_or([None; NUMBER_OF_POSSIBLE_KEYS], |l| if l.counts == Some(0) {
None
} else {
i.counts
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like the previous version where the modification of count for dereference and reference did only happen if in the previous layer there was a net RC > 0 (aka an entry in db).

I believe it's already happenning with if counts[usize::from(k)].unwrap_or(0) > 0 {. But maybe i'm misssing something. I have tweaked the code to make each layer store the total number of reference at this time, not the changes in reference count.

Changing Some(0) to None would mean to assume that the GC is collecting all unused key-value at this step and I am not sure it's the case in the DB.

}

// if we are multiple candidates, we are unsure. We pick the lower count per candidate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not get this minimal candidate logic.
Correct me where I am wrong, what I did understand:

  • each layer is containing net rc of changes from a commit
  • current on disk value should therefore be last layer with is_written to true
  • so resetting to earlier state should use the last candidates at a lates layer an not a merge of all lesser rc

Copy link
Contributor Author

@Tpt Tpt Oct 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each layer is containing net rc of changes from a commit

Sorry, it's the same missleading thing. I have simplified the implementation by storing not the "net rc of changes" but the "total rc at the time the commit is done". I should have written a code review comment about it.

current on disk value should therefore be last layer with is_written to true

I believe it is not always the case. What if the DB crashes between write and flush? We set is_written during write, not after flush.

so resetting to earlier state should use the last candidates at a lates layer an not a merge of all lesser rc

There is also a "fun thing": the DB API returns the present key-values not their RC count. So, if we take the following sequence:

  • Commit 1: Set([0], [0])
  • Commit 2: Increment([0])

If we are unsure that commit 2 has been flushed properly there is now way to know from the presence/abscence of key [0] if the reference count of key [0] is 1 or 2.

}

fn model_optional_content(model: &Model) -> Vec<(Vec<u8>, Vec<u8>)> {
Self::model_required_content(model)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just keep old code here (Self::model_required_content(model))?

Copy link
Contributor Author

@Tpt Tpt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cheme for the review!

I have fixed or reply to your comments.

I have also made simple_model validation of removed keys stricter (it was unnecessarily lenient).

fuzz/fuzz_targets/refcounted_model.rs Outdated Show resolved Hide resolved
Operation::Set(k) => *counts[usize::from(k)].get_or_insert(0) += 1,
Operation::Dereference(k) =>
if counts[usize::from(k)].unwrap_or(0) > 0 {
*counts[usize::from(k)].get_or_insert(0) -= 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like the previous version where the modification of count for dereference and reference did only happen if in the previous layer there was a net RC > 0 (aka an entry in db).

I believe it's already happenning with if counts[usize::from(k)].unwrap_or(0) > 0 {. But maybe i'm misssing something. I have tweaked the code to make each layer store the total number of reference at this time, not the changes in reference count.

Changing Some(0) to None would mean to assume that the GC is collecting all unused key-value at this step and I am not sure it's the case in the DB.

}

// if we are multiple candidates, we are unsure. We pick the lower count per candidate
Copy link
Contributor Author

@Tpt Tpt Oct 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each layer is containing net rc of changes from a commit

Sorry, it's the same missleading thing. I have simplified the implementation by storing not the "net rc of changes" but the "total rc at the time the commit is done". I should have written a code review comment about it.

current on disk value should therefore be last layer with is_written to true

I believe it is not always the case. What if the DB crashes between write and flush? We set is_written during write, not after flush.

so resetting to earlier state should use the last candidates at a lates layer an not a merge of all lesser rc

There is also a "fun thing": the DB API returns the present key-values not their RC count. So, if we take the following sequence:

  • Commit 1: Set([0], [0])
  • Commit 2: Increment([0])

If we are unsure that commit 2 has been flushed properly there is now way to know from the presence/abscence of key [0] if the reference count of key [0] is 1 or 2.

@cheme
Copy link
Collaborator

cheme commented Oct 14, 2022

I believe it is not always the case. What if the DB crashes between write and flush? We set is_written during write, not after flush.

This sentence explains a lot to me. I was expecting is_written to be set when we consider things flushed (since running without thread we should be able to set it at this point deterministically, may make sense as a next step).

Changing Some(0) to None would mean to assume that the GC is collecting all unused key-value at this step and I am not sure it's the case in the DB.

yes that was my initial assumption :)

    Commit 1: Set([0], [0])
    Commit 2: Increment([0])

If we are unsure that commit 2 has been flushed properly there is now way to know from the presence/abscence of key [0] if the reference count of key [0] is 1 or 2.

I guess your example was more on a decrement, but I see what you mean.

@Tpt Tpt merged commit ca04149 into paritytech:master Oct 14, 2022
@Tpt Tpt deleted the fuzz branch October 14, 2022 09:36
@Tpt
Copy link
Contributor Author

Tpt commented Oct 14, 2022

Thank you!

I believe it is not always the case. What if the DB crashes between write and flush? We set is_written during write, not after flush.
This sentence explains a lot to me. I was expecting is_written to be set when we consider things flushed (since running without thread we should be able to set it at this point deterministically, may make sense as a next step).

Yes! I started to write something about it but I encountered some issue. I prefered to keep it as a next step.

@cheme
Copy link
Collaborator

cheme commented Oct 14, 2022

Now that I think of it when I did put the test with different processing target, I did remove one variant (I think data in WAL cache but not flushed to WAL file) for being awkward to implement.

@Tpt
Copy link
Contributor Author

Tpt commented Oct 14, 2022

@cheme And when there are also I/O errors it creates "fun" states like "flushed if it has been actually written" or "maybe flushed"...

@cheme
Copy link
Collaborator

cheme commented Oct 14, 2022

In theory these state should be cover by a layer upward or when restarting by the WAL commit not being complete (or crc failing).
But you're right, for fuzzing we should not assume everything is implemented/sequenced properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants