Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some API clarifications #9080

Closed
wants to merge 4 commits into from
Closed

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Oct 26, 2021

Summary:

  • Clarify that RocksDB is not exception safe on many of our callback
    and extension interfaces
  • Clarify FSRandomAccessFile::MultiRead implementations must accept
    non-sorted inputs (see Sort per-file blob read requests by offset #8953)
  • Clarify ConcurrentTaskLimiter and SstFileManager are not (currently)
    extensible interfaces
  • Mark WriteBufferManager as final, so it is then clearly not a
    callback interface, even though it smells like one
  • Clarify TablePropertiesCollector Status returns are mostly ignored

Test Plan: comments only (except WriteBufferManager final)

Summary:
* Clarify FSRandomAccessFile::MultiRead implementations must accept
non-sorted inputs
* Clarify ConcurrentTaskLimiter and SstFileManager are not callback
interfaces
* Mark WriteBufferManager as `final`, so it is then clearly not a
callback interface, even though it smells like one
* Clarify TablePropertiesCollector Status returns are mostly ignored
* Clarify that RocksDB is not exception safe on many of our callback
interfaces

Test Plan: comments only (except WriteBufferManager final)
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment on lines 27 to 30
//
// RocksDB callbacks are NOT exception-safe. A callback completing with an
// exception can lead to undefined behavior in RocksDB, including data loss,
// unreported corruption, deadlocks, and more.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Most of these classes do not define "callbacks" per-se (at least in the classical sense) but methods that can be overridden.
  2. Can we define somewhere more generally (like in programming rules or extensions) that RocksDB is not exception safe, rather than mentioning it for every class. Seems repetitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This website I found agrees with your terminology. How about this:

// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.

(Technically people can do whatever they want in overrides; it's only a problem if they provide their derived implementations back to RocksDB such that RocksDB will call into their implementation.)

And @ajkr has noted in the past that some RocksDB applications prefer a chance of persisting relevant information to avoiding corruption. I think that's a small minority of applicatoins and they should be able to make their own judgment despite the MUST NOT warning.

rather than mentioning it for every class

Debugging corruption is a nightmare. More often we bear the investigative consequences of violated API contracts because the problem presents as a RocksDB problem. To me it's clearly worthwhile to make non-obvious and high-impact contracts clear everywhere they are relevant, to minimize the potential for violations. We do not live in an ideal world in which everyone using RocksDB reads and absorbs all available documentation before writing code. Even if we did, we now need to "correct the record," because this issue is arguably mis-documented. AFAIK, we only mention exceptions here and even subtly suggest that exceptions passing through RocksDB is OK. (I will update the wiki after this PR lands.) And I know from experience that people don't read release notes, so release note+wiki is not sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This website I found agrees with your terminology. How about this:

// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.

How about: "RocksDB is not exception-safe. Exceptions MUST not propagate into RocksDB or undefined behavior -- including data loss, unreported corruption, and deadlocks -- may occur"

rather than mentioning it for every class

Debugging corruption is a nightmare. More often we bear the investigative consequences of violated API contracts because the problem presents as a RocksDB problem. To me it's clearly worthwhile to make non-obvious and high-impact contracts clear everywhere they are relevant, to minimize the potential for violations. We do not live in an ideal world in which everyone using RocksDB reads and absorbs all available documentation before writing code. Even if we did, we now need to "correct the record," because this issue is arguably mis-documented. AFAIK, we only mention exceptions here and even subtly suggest that exceptions passing through RocksDB is OK. (I will update the wiki after this PR lands.) And I know from experience that people don't read release notes, so release note+wiki is not sufficient.

I agree with you about the difficulty in debugging corruptions. But I am not sure how this documentation will make it more likely that others read the documentation and understood it and any problems are not related to this issue.

I also worry that there may be other things (mis-matched compiler options, RTTI, alignment issues, etc) that might cause similar issues in the future and would also need to be documented everywhere. Having a single place that outlined the rules for extending RocksDB would make it easier to maintain the documentation and a consistent message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.

This works for me.

Copy link
Contributor Author

@pdillinger pdillinger Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's suppose two different people involved in application X built on RocksDB do the following: Person 1 handles the build integration and Person 2 writes source code using and extending RocksDB. What are the "other things" Person 2 needs to know about that specifically relate to RocksDB? I can't think of how any you have listed apply to Person 2 specifically in their interaction with RocksDB. Limitations due to compiler options, RTTI availability, etc. will apply project-wide. Your suggestions seem irrelevant at the source code API level. If you have a specific example of a piece of contract that you think is equatable with mine, please share.

Comment on lines 20 to 21
// This is NOT a callback interface but a public interface for result of
// NewConcurrentTaskLimiter. Any derived classes must be RocksDB internal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really true? The user cannot define their own implementation? Why not? If so, then the interface here should/could be rethought/different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto limiter = static_cast<ConcurrentTaskLimiterImpl*>(
cfd->ioptions()->compaction_thread_limiter.get());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is only this one use, it seems like an easy one to actually fix to allow multiple implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we can fix this my putting GetToken in the public API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we can fix this my putting GetToken in the public API.

What function might be implemented differently from ConcurrentTaskLimiterImpl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really care whether it's extensible or not. We could idiomatically enforce "internal extension only" with private ctor and friend the Impl class.

In either case, I could envision someone wanting public access to GetToken, just as people use our RateLimiter to rate limit other things along with RocksDB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-trivial issue because of the Token class. Postponing to another PR.

Comment on lines 24 to 25
// SstFileManager is not extensible.
// SstFileManager is NOT a callback interface but a public interface for
// result of NewSstFileManager. Any derived classes must be RocksDB internal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I am not sure why this is true. I know the RocksDB Cloud attempts to extend it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ git grep static_cast.SstFileManagerImpl | grep -v test.c | wc -l
13
$

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than internal, can we say must extend the internal Impl class?

I still hate this limit/requirement. Seems like a bad design...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the number of non-override functions in the Impl class, this seems much harder to untangle

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

I know that the cloud folks extended the Impl one, which appears to be the way one would have to go currently (which is not great but...). I am not sure about the "RocksDB internal" wording in the comment. This is one of those gray areas where the code/documentation/design do not line up. It looks like this class should be replaceable but in reality it is not. But to extend it requires access to something that is not public. And we know of scenarios where users felt the need to extend it.

Ideas?

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 27 to 30
//
// RocksDB callbacks are NOT exception-safe. A callback completing with an
// exception can lead to undefined behavior in RocksDB, including data loss,
// unreported corruption, deadlocks, and more.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.

This works for me.

Comment on lines 20 to 21
// This is NOT a callback interface but a public interface for result of
// NewConcurrentTaskLimiter. Any derived classes must be RocksDB internal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we can fix this my putting GetToken in the public API.

What function might be implemented differently from ConcurrentTaskLimiterImpl?

@@ -34,7 +34,7 @@ class StallInterface {
virtual void Signal() = 0;
};

class WriteBufferManager {
class WriteBufferManager final {
Copy link
Contributor

@hx235 hx235 Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! One of my previous PR had this assumption in mind that WBM should not be extended and made an API implement change in here. It's good to explicitly mark it final just to confirm my assumption!

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in 82afa01.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants