Make `block_fs` append-only + other refactorings #3194
Codecov Report
@@            Coverage Diff             @@
##             main    #3194      +/-   ##
==========================================
+ Coverage   65.64%   65.67%   +0.03%
==========================================
  Files         619      619
  Lines       48890    49811     +921
  Branches     4404     4670     +266
==========================================
+ Hits        32092    32713     +621
- Misses      15297    15575     +278
- Partials     1501     1523      +22
libres/lib/res_util/block_fs.cpp
Outdated
@@ -107,12 +93,6 @@ struct file_node_struct {
status; /* This should be: NODE_IN_USE | NODE_FREE; in addition the disk can have NODE_WRITE_ACTIVE for incomplete writes. */
`NODE_FREE` is still mentioned here.
@@ -725,9 +536,6 @@ static void block_fs_build_index(block_fs_type *block_fs,
block_fs_insert_index_node(block_fs, filename,
file_node);
break;
case (NODE_FREE):
Only two possibilities - replace switch with if?
In general LGTM. But would it be possible to add info-level logging to collect information from real usage (i.e. telemetry) on whether anyone actually rewrites blocks? In addition, is it possible to add a test to back up this statement from the commit message:
Older ERT would read both, but due to the current one appearing last, that'll be the one that ERT will internalise in its dictionary.
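A test for the statement quoted above might look roughly like the following sketch. It is not the actual `block_fs` code: `NodeStatus`, `NodeRecord` and `build_index` are hypothetical, simplified stand-ins, and the on-disk format is reduced to an in-memory list of records in file order. It only illustrates the "last entry wins" behaviour when the same name appears twice as in-use, and that free nodes are skipped.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the node status and node header; the real
// block_fs structs and constants are not reproduced here.
enum class NodeStatus { InUse, Free };

struct NodeRecord {
    NodeStatus status;
    std::string name;     // key of the stored data (illustrative)
    std::int64_t offset;  // position of the payload in the data file
};

// Scan the records in file order and build a name -> offset index.
// Free nodes are ignored. A later in-use record with the same name
// overwrites the earlier entry, so the last one in the file wins.
std::unordered_map<std::string, std::int64_t>
build_index(const std::vector<NodeRecord>& records) {
    std::unordered_map<std::string, std::int64_t> index;
    for (const auto& rec : records) {
        if (rec.status != NodeStatus::InUse) {
            continue;  // gap left behind by an older ERT
        }
        index[rec.name] = rec.offset;
    }
    return index;
}
```

With only two outcomes per node (index it or skip it), a plain `if` is enough here, which is also what the earlier review comment about replacing the `switch` suggests.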
cd8c636
to
052a1ec
Compare
I have updated this PR:
052a1ec
to
651ad20
Compare
Block FS has support for data deletion, as well as data replacement. The deletion part, via the function `enkf_fs_unlink`, was removed a while ago since the functionality was unused. It is also possible for data to be replaced. I don't believe this functionality is actually used, but it's been difficult to prove due to the number of locations from which the `block_fs` writing function can be accessed (including from Python).

The places that are of interest, namely the writing of parameters to PARAMETER and of response data to RESPONSE after a successful ERT run, write their data once and only once. That is, by the time ERT knows about FOPR from a forward model, it has already internalised all of the data, and thus won't need to rewrite it. Essentially, as far as I can tell, ERT doesn't make use of this "partial" data writing functionality. However, due to the breadth of sources from which the writing function can be called, I can't say this for certain.

Regardless, by making `block_fs` append-only, we make it easy to discover new data as it's being written. It becomes possible to poll the file (by e.g. dark storage) and, if the file has been updated, one only has to look at the end of the file to find out what's new. This could potentially make it easier to update visualisations before all forward models complete.

Compatibility with older ERTs is kept: should I be incorrect in my assumption that no data is being rewritten, storages generated by older ERTs could have gaps in the data file marked with `NODE_FREE`, which are safe to ignore. If this append-only variant has data that is rewritten, it'll appear twice in the file as `NODE_IN_USE`. Older ERT would read both, but due to the current one appearing last, that'll be the one that ERT will internalise in its dictionary.
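To illustrate the polling idea, here is a minimal sketch of how a reader could pick up only what has been appended since its last look. The helper `read_appended` is hypothetical and not part of this PR or of dark storage; it treats the data file as an opaque byte stream and assumes the file only ever grows.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: return the bytes appended to `path` since the caller
// last saw it, and advance `last_size` accordingly. Returns an empty string
// if the file cannot be opened or has not grown.
std::string read_appended(const std::string& path, std::uintmax_t& last_size) {
    // Open positioned at the end so tellg() reports the current file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return {};
    }
    const std::streamoff end_off = in.tellg();
    if (end_off < 0) {
        return {};
    }
    const auto end = static_cast<std::uintmax_t>(end_off);
    if (end <= last_size) {
        return {};  // nothing new
    }
    // Read only the tail that was added after the previous poll.
    std::string fresh(static_cast<std::size_t>(end - last_size), '\0');
    in.seekg(static_cast<std::streamoff>(last_size));
    in.read(&fresh[0], static_cast<std::streamsize>(fresh.size()));
    fresh.resize(static_cast<std::size_t>(in.gcount()));
    last_size += fresh.size();
    return fresh;
}
```

A caller would keep `last_size` between polls and parse the returned bytes as new nodes. A real reader would also have to cope with a node that is only partially written at the time of the poll, which this sketch does not attempt.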