Make `block_fs` append-only + other refactorings #3194

pinkwah · 2022-04-03T17:23:21Z

Block FS has support for data deletion, as well as data replacement. The
deletion part via the function enkf_fs_unlink was removed a while ago
since the functionality was unused. It is also possible for data to be
replaced. I don't believe this functionality is actually used, but it's
been difficult to prove it due to the amount of locations from which the
block_fs writing function can be accessed (including from Python).

The places that are of interest, namely writing of parameters to
PARAMETER and response data to RESPONSE after a successful ERT run are
written once and only once. That is, by the time ERT knows about FOPR
from a forward-model, it has already internalised all of the data, and
thus won't need to rewrite the data. Essentially, as far as I can tell,
ERT doesn't make use of this "partial" data writing functionality.
However, due to the breadth of sources from which the writing
function can be called, I can't say this for certain.

However, by making block_fs append-only, we make it easy to discover
new data as it's being written. It makes it possible to poll the
file (by eg. dark storage) and if the file has been updated, one only
has to look at the end of the file to find out what's new. This could
potentially make it easier to update visualisations before all forward
models complete.

Compatibility with older ERTs are kept: Should I be incorrect in my
assumption that no data is being rewritten, storages generated by older
ERTs could have gaps in the data file marked with NODE_FREE, which is
safe to ignore. If this append-only variant has data that is rewritten,
it'll appear twice in the file as NODE_IN_USE. Older ERT would read
both, but due to the current one appearing last, that'll be the one that
ERT will internalise in its dictionary.

codecov-commenter · 2022-04-03T17:46:15Z

Codecov Report

Merging #3194 (052a1ec) into main (28e0ef6) will increase coverage by 0.03%.
The diff coverage is 61.11%.

❗ Current head 052a1ec differs from pull request most recent head 651ad20. Consider uploading reports for the commit 651ad20 to get more accurate results

@@            Coverage Diff             @@
##             main    #3194      +/-   ##
==========================================
+ Coverage   65.64%   65.67%   +0.03%     
==========================================
  Files         619      619              
  Lines       48890    49811     +921     
  Branches     4404     4670     +266     
==========================================
+ Hits        32092    32713     +621     
- Misses      15297    15575     +278     
- Partials     1501     1523      +22

Impacted Files	Coverage Δ
libres/lib/res_util/block_fs.cpp	`67.37% <61.11%> (+15.29%)`	⬆️
ert_gui/ertwidgets/validationsupport.py	`79.45% <0.00%> (-19.18%)`	⬇️
libres/lib/enkf/enkf_state.cpp	`61.32% <0.00%> (-5.52%)`	⬇️
libres/lib/enkf/time_map.cpp	`48.06% <0.00%> (-4.49%)`	⬇️
libres/lib/enkf/enkf_main_jobs.cpp	`18.08% <0.00%> (-3.61%)`	⬇️
ert_gui/ertwidgets/__init__.py	`75.00% <0.00%> (-2.28%)`	⬇️
libres/lib/enkf/forward_load_context.cpp	`75.26% <0.00%> (-1.76%)`	⬇️
libres/lib/enkf/enkf_main.cpp	`54.91% <0.00%> (-1.56%)`	⬇️
libres/lib/enkf/row_scaling.cpp	`58.87% <0.00%> (-1.33%)`	⬇️
libres/lib/enkf/obs_vector.cpp	`36.90% <0.00%> (-1.17%)`	⬇️
... and 20 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

libres/lib/res_util/block_fs.cpp

libres/tests/enkf/test_enkf_fs.cpp

BjarneHerland · 2022-04-06T10:19:41Z

libres/lib/res_util/block_fs.cpp

@@ -107,12 +93,6 @@ struct file_node_struct {
        status; /* This should be: NODE_IN_USE | NODE_FREE; in addition the disk can have NODE_WRITE_ACTIVE for incomplete writes. */


NODE_FREE mentioned

libres/lib/res_util/block_fs.cpp

BjarneHerland · 2022-04-06T10:26:15Z

libres/lib/res_util/block_fs.cpp

@@ -725,9 +536,6 @@ static void block_fs_build_index(block_fs_type *block_fs,
                        block_fs_insert_index_node(block_fs, filename,
                                                   file_node);
                        break;
-                    case (NODE_FREE):


Only two possibilities - replace switch with if?

BjarneHerland

In general LGTM. But would it be possible to add info-level logging to collect information from real usage (aka telemetry) whether anyone actually rewrite blocks? In addition, is it possible to add a test to back up this statement from commit-msg:

Older ERT would read both, but due to the current one appearing last, that'll be the one that ERT will internalise in its dictionary.

pinkwah · 2022-04-11T15:19:07Z

I have updated this PR:

Rebased and kept only the last commit which we have yet to merge
Added test which checks that overwriting works.
Added logging statement

Block FS has support for data deletion, as well as data replacement. The deletion part via the function `enkf_fs_unlink` was removed a while ago since the functionality was unused. It is also possible for data to be replaced. I don't believe this functionality is actually used, but it's been difficult to prove it due to the amount of locations from which the `block_fs` writing function can be accessed (including from Python). The places that are of interest, namely writing of parameters to PARAMETER and response data to RESPONSE after a successful ERT run are written once and only once. That is, by the time ERT knows about FOPR from a forward-model, it has already internalised all of the data, and thus won't need to rewrite the data. Essentially, as far as I can tell, ERT doesn't make use of this "partial" data writing functionality. However, due to the breadth of sources from which the writing function can be called, I can't say this for certain. However, by making `block_fs` append-only, we make it easy to discover new data as it's being written. It makes it possible to poll the file (by eg. dark storage) and if the file has been updated, one only has to look at the end of the file to find out what's new. This could potentially make it easier to update visualisations before all forward models complete. Compatibility with older ERTs are kept: Should I be incorrect in my assumption that no data is being rewritten, storages generated by older ERTs could have gaps in the data file marked with `NODE_FREE`, which is safe to ignore. If this append-only variant has data that is rewritten, it'll appear twice in the file as `NODE_IN_USE`. Older ERT would read both, but due to the current one appearing last, that'll be the one that ERT will internalise in its dictionary.

pinkwah self-assigned this Apr 3, 2022

pinkwah changed the title ~~Refactor bfs~~ Make block_fs append-only + other refactorings Apr 3, 2022

pinkwah force-pushed the refactor-bfs branch from 067912b to 0796bdd Compare April 3, 2022 17:29

pinkwah marked this pull request as ready for review April 3, 2022 17:43