Atomic version map update #1104
Merged
joe-iddon force-pushed the atomic_version_map_update branch from 79ca969 to 69cb6c6 (December 6, 2023 09:58)
poodlewars approved these changes (Dec 7, 2023)
G-D-Petrov pushed a commit that referenced this pull request (Dec 12, 2023)
#### Reference Issues/PRs

#### What does this implement or fix?

Writing the symbol ref key should always be the last step of every public method in `version_map.hpp`. Otherwise there is a race condition in which the symbol ref key temporarily points to, say, a `TOMBSTONE_ALL` version key partway through a call to `write_and_prune_previous`. This means that readers temporarily get invalid data (or no data!).

In particular, reading a library with `incompletes` on can fail with a `Stream descriptor not found in pipeline context` error because of this race condition. Two things are needed to hit that exact error:

- `incompletes` must be on, so that the read keeps going even if no versions exist (for a symbol with `incompletes` off we would get a more informative error).
- `columns` must be specified; otherwise we get a very closely related `Normalization metadata not found` exception, which has the same cause but is raised at a different point in the code.

> [!WARNING]
> ### API Changes
> - Rather than the obtuse `Stream descriptor not found`/`Normalization metadata not found` errors, we now give a more sensible error message: `E_NO_SYMBOL_DATA: read_dataframe_impl: read returned no data for symbol {}`. This is a more informative failure mode when reading an `incomplete=True` symbol for which no data is available.
> - Changed logging from INFO to DEBUG when compacting a symbol (a top-level API emitting an INFO log message does not make sense):
> ```diff
> - log::version().info("Compacting incomplete symbol {}", stream_id);
> + log::version().debug("Compacting incomplete symbol {}", stream_id);
> ```

#### Example race condition

Previously (one possible race condition):
- the new version is created and the APPEND_DATA keys are deleted
- a TOMBSTONE_ALL version is added to the version list
- the version ref key is updated to point at that new version
- the new version is added to the version list
- the version ref key is updated

Now:
- the new version is created and the APPEND_DATA keys are deleted
- a TOMBSTONE_ALL version is added to the version list
- the new version is added to the version list
- the version ref key is updated

### Implementation

Main changes are to `version_map.hpp`. Before, the symbol ref key was written whenever it made sense to do so. By following these three rules, we guarantee that all updates are now atomic:
- Public methods write the symbol ref key at most once.
- Private methods don't write the symbol ref key.
- All methods only call private methods.

There are two exceptions to these rules:
- Any method can call a public method if it clearly doesn't write the symbol ref key.
- Public methods can wrap other public methods if they don't write the symbol ref key again themselves.

Examples of exceptions:
- `delete_all_versions` is public, but can call the public `tombstone_from_key_or_all`, since all it does is convert the return type.
- `fix_ref_key` is public but calls `scan_and_rewrite`, which is also public. This is fine because `fix_ref_key` only reads the ref key first, and `scan_and_rewrite` then does the rest of the work.

#### Any other comments?
- We should really delete the APPEND_DATA keys as the final step in the process.

#### Checklist

<details>
<summary> Checklist for code changes... </summary>

- [x] Have you updated the relevant docstrings, documentation and copyright notice?
- [x] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [x] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [x] Are API changes highlighted in the PR description?
- [x] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

</details>
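The two orderings listed under "Example race condition" can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyVersionStore`, `Entry`, `kNoData` and the two `write_and_prune_*` functions are hypothetical names invented for illustration, not ArcticDB's real types, and the reader logic is deliberately simplified to "follow the ref key and look at the entry it points to". Each function records what a concurrent reader would observe after every step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model for illustration only (not ArcticDB's API).
enum class EntryType { Version, TombstoneAll };

struct Entry {
    EntryType type;
    int version_id;
};

// Sentinel meaning "a reader following the ref key finds no readable data".
const int kNoData = -1;

struct ToyVersionStore {
    std::vector<Entry> version_list;  // version keys, oldest first
    int ref_index = -1;               // entry the symbol ref key points at

    void add_entry(EntryType type, int version_id) {
        version_list.push_back(Entry{type, version_id});
    }

    // Point the ref key at the most recently added entry.
    void write_ref_key() { ref_index = static_cast<int>(version_list.size()) - 1; }

    // What a reader sees if it follows the ref key right now.
    int read_latest() const {
        if (ref_index < 0) return kNoData;
        const Entry& head = version_list[static_cast<std::size_t>(ref_index)];
        return head.type == EntryType::TombstoneAll ? kNoData : head.version_id;
    }
};

// Old ordering: the ref key is written twice, once right after the tombstone.
std::vector<int> write_and_prune_old_order(ToyVersionStore& store, int new_version) {
    std::vector<int> seen;
    store.add_entry(EntryType::TombstoneAll, new_version - 1);
    seen.push_back(store.read_latest());
    store.write_ref_key();  // premature: ref key now points at TOMBSTONE_ALL
    seen.push_back(store.read_latest());
    store.add_entry(EntryType::Version, new_version);
    seen.push_back(store.read_latest());
    store.write_ref_key();
    seen.push_back(store.read_latest());
    return seen;
}

// New ordering: the ref key is written exactly once, as the final step.
std::vector<int> write_and_prune_new_order(ToyVersionStore& store, int new_version) {
    std::vector<int> seen;
    store.add_entry(EntryType::TombstoneAll, new_version - 1);
    seen.push_back(store.read_latest());
    store.add_entry(EntryType::Version, new_version);
    seen.push_back(store.read_latest());
    store.write_ref_key();
    seen.push_back(store.read_latest());
    return seen;
}

// True if any observation in the sequence found no readable data.
bool has_unreadable_window(const std::vector<int>& seen) {
    for (int value : seen) {
        if (value == kNoData) return true;
    }
    return false;
}

// Build a store that already holds one readable version.
ToyVersionStore store_with_version(int version_id) {
    ToyVersionStore store;
    store.add_entry(EntryType::Version, version_id);
    store.write_ref_key();
    return store;
}
```

Starting from `store_with_version(0)`, the old ordering passes through states in which `read_latest()` returns `kNoData`, whereas the new ordering keeps returning version 0 until the single ref key write makes version 1 visible.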
poodlewars added a commit that referenced this pull request (May 15, 2024)
#### What does this implement or fix?

We should only write the version ref key once when we write with `prune_previous_versions=True`. Currently we write it twice: once after we write the tombstone all and once when we write the new version. This means that there is a period of time during which the symbol is unreadable. This was fixed a while ago by PR #1104 but regressed with PR #1355.
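One way to guard against this kind of regression is to count ref key writes in a test double. The sketch below is an assumption-laden illustration, not ArcticDB's test code: `CountingStore` and `ToyVersionMap` are hypothetical names, and only the method name `write_and_prune_previous` is borrowed from the PR description. It follows the rule that private helpers never write the ref key and the public method writes it once, last.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a storage backend that records what was written.
class CountingStore {
public:
    void write_version_key(const std::string& description) {
        version_keys_.push_back(description);
    }
    void write_ref_key() { ++ref_key_writes_; }

    int ref_key_writes() const { return ref_key_writes_; }
    std::size_t version_key_count() const { return version_keys_.size(); }

private:
    std::vector<std::string> version_keys_;
    int ref_key_writes_ = 0;
};

// Toy version map (illustrative only, not the real class in version_map.hpp).
class ToyVersionMap {
public:
    explicit ToyVersionMap(CountingStore& store) : store_(store) {}

    // Public method: composes private helpers, then writes the ref key once.
    void write_and_prune_previous(int new_version) {
        tombstone_all_impl(new_version - 1);
        write_version_impl(new_version);
        store_.write_ref_key();  // the only ref key write, as the final step
    }

private:
    // Private helpers write version keys but never touch the ref key.
    void tombstone_all_impl(int up_to_version) {
        store_.write_version_key("TOMBSTONE_ALL up to " + std::to_string(up_to_version));
    }
    void write_version_impl(int version_id) {
        store_.write_version_key("VERSION " + std::to_string(version_id));
    }

    CountingStore& store_;
};
```

A test built on this idea would call `write_and_prune_previous` once and assert that `ref_key_writes()` is exactly 1; a second write sneaking into a helper would make that assertion fail.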