Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make block_fs append-only + other refactorings #3194

Merged
merged 2 commits into from
Apr 27, 2022

Conversation

pinkwah
Copy link
Contributor

@pinkwah pinkwah commented Apr 3, 2022

Block FS has support for data deletion, as well as data replacement. The
deletion part via the function enkf_fs_unlink was removed a while ago
since the functionality was unused. It is also possible for data to be
replaced. I don't believe this functionality is actually used, but it's
been difficult to prove it due to the amount of locations from which the
block_fs writing function can be accessed (including from Python).

The places that are of interest, namely writing of parameters to
PARAMETER and response data to RESPONSE after a successful ERT run are
written once and only once. That is, by the time ERT knows about FOPR
from a forward-model, it has already internalised all of the data, and
thus won't need to rewrite the data. Essentially, as far as I can tell,
ERT doesn't make use of this "partial" data writing functionality.
However, due to the breadth of sources from which the writing
function can be called, I can't say this for certain.

However, by making block_fs append-only, we make it easy to discover
new data as it's being written. It makes it possible to poll the
file (by eg. dark storage) and if the file has been updated, one only
has to look at the end of the file to find out what's new. This could
potentially make it easier to update visualisations before all forward
models complete.

Compatibility with older ERTs are kept: Should I be incorrect in my
assumption that no data is being rewritten, storages generated by older
ERTs could have gaps in the data file marked with NODE_FREE, which is
safe to ignore. If this append-only variant has data that is rewritten,
it'll appear twice in the file as NODE_IN_USE. Older ERT would read
both, but due to the current one appearing last, that'll be the one that
ERT will internalise in its dictionary.

@pinkwah pinkwah self-assigned this Apr 3, 2022
@pinkwah pinkwah changed the title Refactor bfs Make block_fs append-only + other refactorings Apr 3, 2022
@pinkwah pinkwah marked this pull request as ready for review April 3, 2022 17:43
@codecov-commenter
Copy link

codecov-commenter commented Apr 3, 2022

Codecov Report

Merging #3194 (052a1ec) into main (28e0ef6) will increase coverage by 0.03%.
The diff coverage is 61.11%.

❗ Current head 052a1ec differs from pull request most recent head 651ad20. Consider uploading reports for the commit 651ad20 to get more accurate results

@@            Coverage Diff             @@
##             main    #3194      +/-   ##
==========================================
+ Coverage   65.64%   65.67%   +0.03%     
==========================================
  Files         619      619              
  Lines       48890    49811     +921     
  Branches     4404     4670     +266     
==========================================
+ Hits        32092    32713     +621     
- Misses      15297    15575     +278     
- Partials     1501     1523      +22     
Impacted Files Coverage Δ
libres/lib/res_util/block_fs.cpp 67.37% <61.11%> (+15.29%) ⬆️
ert_gui/ertwidgets/validationsupport.py 79.45% <0.00%> (-19.18%) ⬇️
libres/lib/enkf/enkf_state.cpp 61.32% <0.00%> (-5.52%) ⬇️
libres/lib/enkf/time_map.cpp 48.06% <0.00%> (-4.49%) ⬇️
libres/lib/enkf/enkf_main_jobs.cpp 18.08% <0.00%> (-3.61%) ⬇️
ert_gui/ertwidgets/__init__.py 75.00% <0.00%> (-2.28%) ⬇️
libres/lib/enkf/forward_load_context.cpp 75.26% <0.00%> (-1.76%) ⬇️
libres/lib/enkf/enkf_main.cpp 54.91% <0.00%> (-1.56%) ⬇️
libres/lib/enkf/row_scaling.cpp 58.87% <0.00%> (-1.33%) ⬇️
libres/lib/enkf/obs_vector.cpp 36.90% <0.00%> (-1.17%) ⬇️
... and 20 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@@ -107,12 +93,6 @@ struct file_node_struct {
status; /* This should be: NODE_IN_USE | NODE_FREE; in addition the disk can have NODE_WRITE_ACTIVE for incomplete writes. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NODE_FREE mentioned

@@ -725,9 +536,6 @@ static void block_fs_build_index(block_fs_type *block_fs,
block_fs_insert_index_node(block_fs, filename,
file_node);
break;
case (NODE_FREE):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only two possibilities - replace switch with if?

Copy link
Contributor

@BjarneHerland BjarneHerland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM. But would it be possible to add info-level logging to collect information from real usage (aka telemetry) whether anyone actually rewrite blocks? In addition, is it possible to add a test to back up this statement from commit-msg:

Older ERT would read both, but due to the current one appearing last, that'll be the one that ERT will internalise in its dictionary.

@pinkwah pinkwah force-pushed the refactor-bfs branch 2 times, most recently from cd8c636 to 052a1ec Compare April 11, 2022 15:18
@pinkwah
Copy link
Contributor Author

pinkwah commented Apr 11, 2022

I have updated this PR:

  • Rebased and kept only the last commit which we have yet to merge
  • Added test which checks that overwriting works.
  • Added logging statement

@pinkwah pinkwah force-pushed the refactor-bfs branch 2 times, most recently from 052a1ec to 651ad20 Compare April 21, 2022 14:08
Block FS has support for data deletion, as well as data replacement. The
deletion part via the function `enkf_fs_unlink` was removed a while ago
since the functionality was unused. It is also possible for data to be
replaced. I don't believe this functionality is actually used, but it's
been difficult to prove it due to the amount of locations from which the
`block_fs` writing function can be accessed (including from Python).

The places that are of interest, namely writing of parameters to
PARAMETER and response data to RESPONSE after a successful ERT run are
written once and only once. That is, by the time ERT knows about FOPR
from a forward-model, it has already internalised all of the data, and
thus won't need to rewrite the data. Essentially, as far as I can tell,
ERT doesn't make use of this "partial" data writing functionality.
However, due to the breadth of sources from which the writing
function can be called, I can't say this for certain.

However, by making `block_fs` append-only, we make it easy to discover
new data as it's being written. It makes it possible to poll the
file (by eg. dark storage) and if the file has been updated, one only
has to look at the end of the file to find out what's new. This could
potentially make it easier to update visualisations before all forward
models complete.

Compatibility with older ERTs are kept: Should I be incorrect in my
assumption that no data is being rewritten, storages generated by older
ERTs could have gaps in the data file marked with `NODE_FREE`, which is
safe to ignore. If this append-only variant has data that is rewritten,
it'll appear twice in the file as `NODE_IN_USE`. Older ERT would read
both, but due to the current one appearing last, that'll be the one that
ERT will internalise in its dictionary.
@pinkwah pinkwah merged commit eb65252 into equinor:main Apr 27, 2022
@pinkwah pinkwah deleted the refactor-bfs branch April 27, 2022 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants