Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomic version map update #1104

Merged
merged 2 commits into from
Dec 8, 2023
Merged

Atomic version map update #1104

merged 2 commits into from
Dec 8, 2023

Conversation

joe-iddon
Copy link
Collaborator

@joe-iddon joe-iddon commented Nov 27, 2023

Reference Issues/PRs

What does this implement or fix?

Writing the symbol ref key should always be the last step for all public methods in version_map.hpp to avoid race conditions where the symbol ref key points temporarily to, say, a TOMBSTONE_ALL version key in the process of a call to write_and_prune_previous. This means that readers temporarily get invalid data (or no data!).

In particular, if reading a library with incompletes on, then we can get a Stream descriptor not found in pipeline context error due to this race condition (need incompletes to keep reading even if no versions exist whereas for a symbol with incompletes off we would get a more informative error, and also need columns specified otherwise we get a very closely related Normalization metdata not found exception for the same reason but errors at a different point in the code).

Warning

API Changes

  • Rather than the obtuse Stream descriptor not found/Normalization metadata not found errors, we now give a more sensible error message: E_NO_SYMBOL_DATA: read_dataframe_impl: read returned no data for symbol {} which is a more informative failure mode in the case of reading an incomplete=True symbol, but there being no data available.
  • Changed logging from INFO to DEBUG when compacting a symbol (top-level API giving INFO log message does not make senese):
- log::version().info("Compacting incomplete symbol {}", stream_id);
+ log::version().debug("Compacting incomplete symbol {}", stream_id);

Example race condition

Previously (one possible race condition)

  • the new version is created and APPEND_DATA keys deleted
  • a TOMBSTONE_ALL version was added to the version list
  • the version ref key was updated to point at that new version
  • the new version was added to the version list
  • the version ref key updated

Now:

  • the new version is created and APPEND_DATA keys deleted
  • a TOMBSTONE_ALL version was added to the version list
  • the new version was added to the version list
  • the version ref key updated

Implementation

Main changes are to version_map.hpp: Before, whenever it made sense to write the sym ref key, it was done. Following these two rules, we guarantee that all updates are now atomic:

  • Public methods write the symbol ref key at most once.
  • Private methods don't write the symbol ref key.
  • All methods only call private methods.

There are two exceptions to these rules:

  • Any method can call a public method if it clearly doesn't write the symbol ref key.
  • Public methods can wrap other public methods if they don't write the symbol ref key again themselves.
    Examples of exceptions:
  • delete_all_versions is public, but can call the public tomstone_from_key_or_all since all it does is convert the return type
    fix_ref_key is public but calls scan_and_rewrite which is also public. This is fine as it just does a read of the ref key first, but then scan_and_rewrite does the rest of the work.

Any other comments?

  • We should really delete the APPEND_DATA keys as the final step in the process.

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@joe-iddon joe-iddon marked this pull request as ready for review December 6, 2023 09:05
@joe-iddon joe-iddon changed the title WIP: Atomic version map update Atomic version map update Dec 6, 2023
@joe-iddon joe-iddon added the bug Something isn't working label Dec 6, 2023
@joe-iddon joe-iddon self-assigned this Dec 7, 2023
cpp/arcticdb/util/error_code.hpp Show resolved Hide resolved
cpp/arcticdb/version/version_map.hpp Show resolved Hide resolved
cpp/arcticdb/version/version_map.hpp Show resolved Hide resolved
@joe-iddon joe-iddon merged commit cd6c654 into master Dec 8, 2023
131 checks passed
@joe-iddon joe-iddon deleted the atomic_version_map_update branch December 8, 2023 15:03
G-D-Petrov pushed a commit that referenced this pull request Dec 12, 2023
#### Reference Issues/PRs
<!--Example: Fixes #1234. See also #3456.-->

#### What does this implement or fix?

Writing the symbol ref key should always be the last step for all public
methods in `version_map.hpp` to avoid race conditions where the symbol
ref key points temporarily to, say, a `TOMBSTONE_ALL` version key in the
process of a call to `write_and_prune_previous`. This means that readers
temporarily get invalid data (or no data!).

In particular, if reading a library with `incompletes` on, then we can
get a `Stream descriptor not found in pipeline context` error due to
this race condition (need incompletes to keep reading even if no
versions exist whereas for a symbol with `incompletes` off we would get
a more informative error, and also need `columns` specified otherwise we
get a very closely related `Normalization metdata not found` exception
for the same reason but errors at a different point in the code).

> [!WARNING]  
> ### API Changes 
> - Rather than the obtuse `Stream descriptor not found`/`Normalization
metadata not found` errors, we now give a more sensible error message:
`E_NO_SYMBOL_DATA: read_dataframe_impl: read returned no data for symbol
{}` which is a more informative failure mode in the case of reading an
`incomplete=True` symbol, but there being no data available.
> - Changed logging from INFO to DEBUG when compacting a symbol
(top-level API giving INFO log message does not make senese):

```diff
- log::version().info("Compacting incomplete symbol {}", stream_id);
+ log::version().debug("Compacting incomplete symbol {}", stream_id);
```


#### Example race condition

Previously (one possible race condition)
- the new version is created and APPEND_DATA keys deleted
- a TOMBSTONE_ALL version was added to the version list
- the version ref key was updated to point at that new version
- the new version was added to the version list
- the version ref key updated

Now:

- the new version is created and APPEND_DATA keys deleted
- a TOMBSTONE_ALL version was added to the version list
- the new version was added to the version list
- the version ref key updated


### Implementation

Main changes are to `version_map.hpp`: Before, whenever it made sense to
write the sym ref key, it was done. Following these two rules, we
guarantee that all updates are now atomic:
- Public methods write the symbol ref key at most once.
- Private methods don't write the symbol ref key.
- All methods only call private methods.

There are two exceptions to these rules:


- Any method can call a public method if it clearly doesn't write the
symbol ref key.
- Public methods can wrap other public methods if they don't write the
symbol ref key again themselves.
Examples of exceptions:
- delete_all_versions is public, but can call the public
tomstone_from_key_or_all since all it does is convert the return type
fix_ref_key is public but calls scan_and_rewrite which is also public.
This is fine as it just does a read of the ref key first, but then
scan_and_rewrite does the rest of the work.

#### Any other comments?

- We should really delete the APPEND_DATA keys as the final step in the
process.

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [x] Have you updated the relevant docstrings, documentation and
copyright notice?
- [x] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [x] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [x] Are API changes highlighted in the PR description?
- [x] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>

<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->
poodlewars added a commit that referenced this pull request May 15, 2024
)

#### What does this implement or fix?

We should only write the version ref key once when we write with
`prune_previous_versions=True`. Currently we are writing it twice - once
after we write the tombstone all and once when we write the new version.
This means that there is a period of time where the symbol is
unreadable.

This was fixed a while ago with PR #1104 but regressed with PR #1355.
poodlewars added a commit that referenced this pull request May 15, 2024
)

#### What does this implement or fix?

We should only write the version ref key once when we write with
`prune_previous_versions=True`. Currently we are writing it twice - once
after we write the tombstone all and once when we write the new version.
This means that there is a period of time where the symbol is
unreadable.

This was fixed a while ago with PR #1104 but regressed with PR #1355.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants