Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please consider replacing foreign language idioms with proper C++ equivalents #79

Closed
target-san opened this issue Feb 10, 2020 · 6 comments

Comments

@target-san
Copy link

The RAII idiom is a native way in C++ land to handle resources management while try-with-resources/finally/defer-like idiom is an escape hatch for rare occasions. External destruction makes code prone to resources leakage and, more importantly, memory corruption and misuse.

The other issue is lack of proper copy/move constructors/operators. Please check C++ guidelines on movable/copyable classes.

One such example is Scheduler class

  1. It looks like either non-copyable or even non-movable class. Although it has no copy constructors deleted, or move-constructors overridden. As a result, it implicitly uses generated ones, depending on its member fields semantics. This may lead to unexpected copyability and handles duplication, with consequent dangling-pointers or double-frees.
  2. Destructor only checks scheduler was unbound. The proper way would be to both perform unbind and, if movability is desired, create explicit move constructor which would leave old instance in "moved-from" state.
@ben-clayton
Copy link
Contributor

Thank you for your feedback.

The RAII idiom is a native way in C++ land to handle resources management while try-with-resources/finally/defer-like idiom is an escape hatch for rare occasions. External destruction makes code prone to resources leakage and, more importantly, memory corruption and misuse.

These are optional features that marl does not require you to use. defer() is a go-like helper which keeps the amount of boilerplate to a minimum. If it offends you, you're free to use alternatives. I (and the team I work with) feels that defer(), with understanding of its behavior and limitations is still a useful tool to have. Also note that Marl does not have a include-all header. If you want to use defer(), you have to explicitly #include "marl/defer.h".

It looks like either non-copyable or even non-movable class. Although it has no copy constructors deleted, or move-constructors overridden. As a result, it implicitly uses generated ones, depending on its member fields semantics. This may lead to unexpected copyability and handles duplication, with consequent dangling-pointers or double-frees.

This is a valid point, and one I will fix by deleting the copy / move constructors on this class. Good spot!
You said "One such example". Have you found others I should also look at?

Destructor only checks scheduler was unbound. The proper way would be to both perform unbind and, if movability is desired, create explicit move constructor which would leave old instance in "moved-from" state.

The "proper way" here seems rather opinionated.
Unbinding can be expensive as it has to flush all tasks from the single-threaded worker, and is not something I think should be automatically done by the destructor. Also note that the scheduler can be bound to multiple threads, so having automatic unbinding on one thread feels unsymmetrical.

Cheers,
Ben

ben-clayton added a commit to ben-clayton/marl that referenced this issue Feb 10, 2020
`marl::Scheduler` is not safe to copy or move.

Bug: google#79
ben-clayton added a commit to ben-clayton/marl that referenced this issue Feb 10, 2020
`marl::Scheduler` is not safe to copy or move.

Bug: google#79
ben-clayton added a commit that referenced this issue Feb 10, 2020
`marl::Scheduler` is not safe to copy or move.

Bug: #79
@target-san
Copy link
Author

These are optional features that marl does not require you to use. defer() is a go-like helper which keeps the amount of boilerplate to a minimum. If it offends you, you're free to use alternatives. I (and the team I work with) feels that defer(), with understanding of its behavior and limitations is still a useful tool to have. Also note that Marl does not have a include-all header. If you want to use defer(), you have to explicitly #include "marl/defer.h".

Sorry, but the question isn't about what offends me or not. It's about what practices are common and widespread in which lang. C++ doesn't have GC, so both resources and memory are handled with smart wrapper. And RAII with deterministic destructor is a common right way to do so. If you do it the other way, you put aside certain best-practices and make things much less obvious. As for the defer, it's in fact a lambda wrapped into object which calls it on destruction. In fact, it's more or less a copy of gsl::finally. That's what i wrote about when discussing finally-like idioms. You're saying that you make deterministic destruction optional this way.

The "proper way" here seems rather opinionated.

Again, it's a best-practice used throughout C++ world. You can go to https://old.reddit.com/r/cpp/ and ask yourself.

Unbinding can be expensive as it has to flush all tasks from the single-threaded worker, and is not something I think should be automatically done by the destructor. Also note that the scheduler can be bound to multiple threads, so having automatic unbinding on one thread feels unsymmetrical.

In this case, if I understand you correctly, Scheduler binds to HW thread and makes it its execution thread. What would happen if the thread which holds Scheduler on stack dies?

Please don't get me wrong. It's your project. I'm just saying that using idioms from one language in another without reconsidering how they map isn't the best idea. And C++ is very, very different from Go. It gives you much flexibility but requires you being very careful. And it doesn't have GC. So resource leakage is the least of your issues. I urge you to consult with Google's own C++ guidelines and a seasoned C++ engineer.

@ben-clayton
Copy link
Contributor

Sorry, but the question isn't about what offends me or not. It's about what practices are common and widespread in which lang... As for the defer, it's in fact a lambda wrapped into object which calls it on destruction.

Yes, that's precisely what marl::defer does. I'm still not following why you feel that having this optional helper functionality is "prone to resources leakage and, more importantly, memory corruption and misuse". Perhaps if you gave examples, we could discuss the merits here?

Again, it's a best-practice used throughout C++ world.

What exactly is this best-practice used throughout C++ world? Having schedulers automatically unbind themselves from the one thread that called the destructor?

What would happen if the thread which holds Scheduler on stack dies?

If it is destructed without being unbound on any bound thread, it would assert, as this is invalid usage. This is much like calling the destructor of std::thread without std::thread::join(). You have rightfully highlighted that this requirement is currently undocumented from the Scheduler class, which I will add. I also intend to write proper documentation outside of the code, and will clearly state this requirement.
Again, to reiterate my point - the marl::Scheduler can be bound to any number of threads. Having the destructor automatically unbind from the one thread that called the destructor isn't going to help clean up tasks scheduled on any of the other threads.

I'm just saying that using idioms from one language in another without reconsidering how they map isn't the best idea. And C++ is very, very different from Go. It gives you much flexibility but requires you being very careful. And it doesn't have GC

Thank you for your thoughts here. While you seem to have strong opinions, I can assure you that we did try to consider that C++ is not golang. That doesn't mean I didn't try to borrow some of the more convenient features from the language.
Please correct me if I'm wrong, but you seem to give the impression that with correct usage there is potential for resource leakage. Extensive testing has not highlighted this as a problem. If you know problem cases we have, please share examples.

I urge you to consult with Google's own C++ guidelines and a seasoned C++ engineer.

Please keep respectful. I've been writing C++ professionally for over 15 years in many lead roles for several big companies. While I'd never consider myself an expert in C++ (or yet meet anyone else I would call an expert), I do like to think I'm moderately competent.

@target-san
Copy link
Author

Yes, that's precisely what marl::defer does. I'm still not following why you feel that having this optional helper functionality is "prone to resources leakage and, more importantly, memory corruption and misuse". Perhaps if you gave examples, we could discuss the merits here?

The helper function is okay on its own. What's concerning here is its use as the main way of resource management.

What exactly is this best-practice used throughout C++ world? Having schedulers automatically unbind themselves from the one thread that called the destructor?

Of course not. The best-practice I'm referring is that object should properly clean resources it owns (or unbinds from resources it shares) in destructor.

If it is destructed without being unbound on any bound thread, it would assert, as this is invalid usage. This is much like calling the destructor of std::thread without std::thread::join().

Please note that std::thread::~thread() is required to std::terminate() while MARL_ASSERT is a debug assertion.

You have rightfully highlighted that this requirement is currently undocumented from the Scheduler class, which I will add. I also intend to write proper documentation outside of the code, and will clearly state this requirement.
Again, to reiterate my point - the marl::Scheduler can be bound to any number of threads. Having the destructor automatically unbind from the one thread that called the destructor isn't going to help clean up tasks scheduled on any of the other threads.

I thought at first that scheduler, when created, spawns multiple worker threads and manages them. Seems that I was wrong. If I'm correct now, it's user's responsibility to create worker threads and bind scheduler to them. In such a case I'd suggest to create thread binder class, similar to std::lock_guard (or even reuse it?). User then instances this kind of locker object in a worker thread to perform binding. And thread is automatically unbound when it completes.

Please correct me if I'm wrong, but you seem to give the impression that with correct usage there is potential for resource leakage. Extensive testing has not highlighted this as a problem. If you know problem cases we have, please share examples.

I didn't say so. A completely correct usage may yield no issues. My main point is that manual binding is error-prone, and is usually avoided similarly to manual resource or memory management.

Please keep respectful. I've been writing C++ professionally for over 15 years in many lead roles for several big companies. While I'd never consider myself an expert in C++ (or yet meet anyone else I would call an expert), I do like to think I'm moderately competent.

I sincerely apologize if I offended you somehow. What made me come to such conclusion is the use of Golang-specific resource management with manual defer statements which is very rarely used and is considered antipattern, at least in my practice. I must confess I often did similar things myself by trying to write C++ in other languages. So I thought it was an attempt to directly migrate some idea from Go-land to C++.

@ben-clayton
Copy link
Contributor

ben-clayton commented Feb 10, 2020

The helper function is okay on its own. What's concerning here is its use as the main way of resource management.

Okay, let's put defer() to one side, and focus on your objection to needing to call marl::Scheduler::unbind().

Let me firstly explain why we have the bind() / unbind() calls in the first place. Calling bind() sets a TLS variable so the scheduler is associated with the given thread. This serves two purposes:

  1. It means you don't have to pass scheduler pointers into marl::Scheduler::schedule() and the various synchronization primitives. This is convenience.
  2. More importantly it provides a way to get the currently executing Fiber for the current thread. This is used by the likes of marl::ConditionVariable::wait() to suspend the current fiber and place it into a vector so a notify_xxx() can reschedule the blocked fibers.

The Scheduler can work in two distinct modes: single-threaded-worker and multi-threaded-worker. This is controlled by marl::Scheduler::setWorkerThreadCount().

When in multi-threaded mode, N worker threads are automatically spun up by the scheduler, and these are shared across all bound threads. Calling marl::Scheduler::schedule() picks a worker for the task and enqueues it to be background-executed by the worker's dedicated thread. These tasks can block, spawning fibers owned by the worker thread to continue working on other tasks.

If marl::Scheduler::setWorkerThreadCount() is called with 0, then the scheduler is in the single-threaded-worker mode. Each thread bound with marl::Scheduler::bind() has a single-threaded-worker created, but no dedicated worker thread. Tasks scheduled in this mode are again placed into the worker's queue, but because there's no dedicated thread, tasks are only executed (using fibers) when there's a blocking call made by that thread.

In either mode, you have tasks that may not have completed, holding on to resources by the time the scheduler destructor is called. The unbind() call ensures that all in flight tasks for the single-threaded-worker are completed (which may take some time).

This single-threaded-worker case is also the reason the destructor cannot automatically do what you're asking. If you have tasks enqueued on other bound threads, they must execute on those same threads (a task yielded on one thread is guaranteed to resume on that thread - this is to avoid TLS bugs at yield points). There is simply no way in the single-threaded-worker case to get these threads to clean up their tasks because some other thread decided to pull the plug on the shared scheduler.

Please note that std::thread::~thread() is required to std::terminate() while MARL_ASSERT is a debug assertion.

Okay - so there's no warning given that you're going to leak if your program doesn't adhere to the library's interface and you use a release build.
Given how infrequently schedulers are destructed (typically once per application), I could make this check happen in release builds, but we have various policies against logging (and especially aborting) in production code. I sympathize, but there's really no good way to handle not tearing down cleanly. Maybe we could find a better solution here.

My main point is that manual binding is error-prone, and is usually avoided similarly to manual resource or memory management.

Okay, I buy this argument. We could create a new RAII helper to pair up a bind and unbind. I see why this would be helpful as a member to a class. For the simple main() cases presented in the examples, however, I really do not feel that defer() is any less readable. I could be persuaded though.


Edits: fixed some incorrect method names as I was writing this on my phone without the ability to refer back to source.

ben-clayton added a commit to ben-clayton/marl that referenced this issue Feb 10, 2020
ben-clayton added a commit that referenced this issue Feb 11, 2020
@ben-clayton
Copy link
Contributor

As far as I understand, the original issue had 3 main points:

  1. defer() is a non-c++ idiomatic macro, and the preferred approach is to use RAII helper classes to keep with typical C++ patterns.
  2. marl::Scheduler was copyable / movable when it shouldn't have been.
  3. marl::Scheduler::unbind() shouldn't exist, and should be part of the destructor.

Attempting to address each of these:

  1. I can't say I love declaring macros in headers, and yes, RAII is the common C++ pattern for this (although defer() is just a RAII helper in disguise). That said, I strongly believe it makes for simple, readable code, and works well with the lambdas you typically write using marl. As previously discussed feature is entirely optional, requiring an explicit #include and is already in use in projects that use marl. I do not wish to remove this.
  2. A great spot, and clearly this was wrong. Now fixed.
  3. I've tried describing above why things have been structured the way they are. This did highlight to me that I've been putting off writing non-code-comment documentation for too long, so I've put my first draft of the marl::Scheduler documentation up for review here.

Given that there's a lot going on in this issue, I'm going to close this. If you'd like to discuss any of these topics further (or others), please feel free to open new bugs and optionally link back to this.

Thanks for all the feedback.
Ben

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants