Improved error handling in the C API #90

Closed
aseyboldt opened this issue Mar 24, 2023 · 15 comments · Fixed by #91
Labels
c-api enhancement New feature or request

Comments

@aseyboldt
Contributor

aseyboldt commented Mar 24, 2023

There was some discussion about error handling in #88, but I thought a separate issue would be a better place to discuss this.

Currently, the error messages of C++ exceptions are printed to stderr. That works, but it has the disadvantage that many users might not know to look there, and depending on the setup the output might not even be easy to find (for example, in a Jupyter notebook started from JupyterHub or from a system service). Also, if several models fail at the same time, the user has to figure out manually which error belongs to which model.

The two (I think) most important kinds of failures would be

  1. Errors during model creation (i.e. bs_construct). Users will, I think, provide invalid data fairly often, and the error message should say which part of the data was invalid.
  2. Errors during bs_log_density_gradient (or, for other sampling / VI algorithms, the other log_density functions). These error messages often explain why there was a divergence during sampling, and sampling algorithms might want to collect this information to help with model debugging.

I can see a couple of different ways this could be achieved; my favorites would probably be these:

  1. Store the error information in the bs_model_rng struct as an extra field, and provide an accessor function for it. Unfortunately, this makes the bs_log_density_gradient function family no longer thread safe. We can work around that by creating a new bs_model_rng struct for each thread, but with the current API that would require loading the data several times. That in turn can be avoided by splitting bs_model_rng into two separate types: one holding the dataset and the potential error message from model creation, and a separate context holding the rng and error messages during sampling.
    This would then look like this:
bs_model *model = bs_model_construct(data);
if (model == 0) {
    // A failure here indicates that we couldn't even allocate the error msg,
    // which would usually not happen...
    return -1;
}

// Callers have to check again, explicitly, that there wasn't an error!
const char *msg = bs_model_error_msg(model);
if (msg != 0) {
    // Data was invalid...
    // Do something with message
    bs_model_free(model);
    return -1;
}

// In each thread do the following:
{
    bs_ctx *ctx = bs_ctx_construct(model, seed, seed_idx);
    if (ctx == 0) {
        stop();
    }

    // Call log_density and other functions in the sampler or...
    for (int i = 0; i < 10; i++) {
        int rc = bs_log_density(ctx, ...);
        if (rc != 0) {
            const char *msg = bs_ctx_error_msg(ctx);
            // do something with message
        }
    }

    bs_ctx_free(ctx);  // Frees the msg field if necessary
}

// And finally a single call to free the memory in the
// original thread
bs_model_free(model);  // Frees the msg field if necessary

The additional function declarations:

// Create a new model. A non-null return value still has to
// be checked for errors with `bs_model_error_msg`.
bs_model *bs_model_construct(const char*);

// Check for an error in model creation.
const char *bs_model_error_msg(const bs_model*);

// Create a new ctx object with rng and space for an error message.
// The model has to outlive the ctx.
bs_ctx *bs_ctx_construct(const bs_model*, int, int);

// Most other functions now take a bs_ctx argument.
// The ctx arguments are not const, because we might change the error.
int bs_log_density_gradient(bs_ctx*, ...);

const char *bs_ctx_error_msg(const bs_ctx*);

In this scenario bs_model would be Sync (functions that take a const pointer to bs_model can be called concurrently from different threads), but not Send (ownership of bs_model cannot be transferred to a different thread, i.e. it has to be deallocated from the same thread where it was created; is that actually a requirement?). bs_ctx doesn't need to be Sync or Send, as every thread can just create a separate context (which should be super cheap).
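A minimal sketch of the intended threading pattern under this first option, using C++ threads. The bs_model/bs_ctx functions are the ones proposed above (they do not exist yet), and the bs_log_density parameter list is a placeholder:

// Sketch only: the bs_model/bs_ctx API is the proposal above, not an
// existing interface, and the non-error parameters are placeholders.
#include <thread>
#include <vector>

extern "C" {
struct bs_model;
struct bs_ctx;
bs_model *bs_model_construct(const char *data);
const char *bs_model_error_msg(const bs_model *model);
void bs_model_free(bs_model *model);
bs_ctx *bs_ctx_construct(const bs_model *model, unsigned int seed, unsigned int chain);
const char *bs_ctx_error_msg(const bs_ctx *ctx);
void bs_ctx_free(bs_ctx *ctx);
int bs_log_density(bs_ctx *ctx, const double *theta_unc, double *lp);
}

static void worker(const bs_model *model, unsigned int seed, unsigned int chain) {
    // One ctx per thread: it owns the rng and the error-message slot,
    // while the model is shared read-only (Sync) across threads.
    bs_ctx *ctx = bs_ctx_construct(model, seed, chain);
    if (ctx == nullptr)
        return;
    double theta[2] = {0.1, -0.3};
    double lp;
    if (bs_log_density(ctx, theta, &lp) != 0) {
        const char *msg = bs_ctx_error_msg(ctx);  // owned by ctx, not freed here
        (void)msg;  // report msg for this chain ...
    }
    bs_ctx_free(ctx);
}

int main() {
    bs_model *model = bs_model_construct("{\"N\": 2}");
    if (model == nullptr)
        return 1;
    if (bs_model_error_msg(model) != nullptr) {  // invalid data
        bs_model_free(model);
        return 1;
    }
    std::vector<std::thread> threads;
    for (unsigned int chain = 0; chain < 4; ++chain)
        threads.emplace_back(worker, model, 42u, chain);
    for (auto &t : threads)
        t.join();
    bs_model_free(model);  // freed once, in the thread that created it
    return 0;
}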

  2. Add char **error_msg arguments to the fallible functions, and add a bs_free_error_msg(char *) function.
    This would then look something like this:
char *msg = NULL;
bs_model_rng *ctx = bs_construct(data, &msg);
if (ctx == 0) {
    do_something_with_msg(msg);
    bs_free_error_msg(msg);
    return -1;
}

int rc = bs_log_density(ctx, &msg, ...);
if (rc != 0) {
    do_something_with_msg(msg);
    bs_free_error_msg(msg);
}
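For reference, a sketch of the declarations this option would imply. Only the error handling is the point here; the remaining parameters are placeholders, not the actual BridgeStan signatures:

// Sketch of option-2 style declarations (parameter lists other than the
// error argument are placeholders).
extern "C" {
struct bs_model_rng;

// On failure, returns NULL (or a nonzero code below) and, if error_msg is
// not NULL, stores a newly allocated message in *error_msg, which the
// caller releases with bs_free_error_msg.
bs_model_rng *bs_construct(const char *data, char **error_msg);

int bs_log_density(bs_model_rng *mr, const double *theta_unc, double *lp,
                   char **error_msg);

void bs_free_error_msg(char *error_msg);
}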

I hope this is at least somewhat useful to you, happy to discuss more if there is interest in a change like this.

@WardBrian
Collaborator

I am having a hard time imagining how the first approach would work in the higher-level languages like Julia.

Right now, it's very easy to use with the built-in threading in e.g. Julia, like we do in our tests:

model = load_test_model("multi")
nt = Threads.nthreads()
R = 1000
ld = Vector{Bool}(undef, R)
g = Vector{Bool}(undef, R)
@sync for it = 1:nt
    Threads.@spawn for r = it:nt:R
        x = randn(BridgeStan.param_num(model))
        (lp, grad) = BridgeStan.log_density_gradient(model, x)
        ld[r] = isapprox(lp, gaussian(x))
        g[r] = isapprox(grad, grad_gaussian(x))
    end
end

But it seems like it would be much more difficult to express in the bs_ctx framework.
Let me phrase this another way: What does the object in e.g. Python hold on to, a bs_ctx? a bs_model? Both? Does the user now need to call two constructors before calling any of the existing functions?

The second approach is one we considered but never implemented. As far as my memory and searching have turned up, there wasn't a strong reason we didn't; it just never got prioritized. Freeing after the fact is a minor annoyance, but something we can nicely wrap up in our higher-level interfaces, I suppose. So I think my vote is for that style, but it's hard to be super sure without seeing the consequences in code.

@bob-carpenter
Collaborator

bob-carpenter commented Mar 24, 2023

Thanks, @aseyboldt. I think this is a really important issue to get right. And I can see that we've messed this up pretty badly in the initial implementation by assuming everything would just print.

There are two places where "prints" might happen.

  1. During normal operation when there's a print statement in the Stan code. This can happen during data read (construction), during log density eval (or gradient, Hessian, etc.), or during generated quantity/transformed parameter generation. In all of these cases, I think we should be passing in a stream to print to. It can default to std::cout.

  2. For exceptions. I think we should also pass in a stream to write to. We can have these default to printing to std::cerr.
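A minimal sketch of what such default stream arguments could look like on the C++ side; eval_log_density stands in for the real model call, and none of these names are the existing BridgeStan internals:

#include <iostream>
#include <ostream>
#include <stdexcept>

// Illustrative only: eval_log_density stands in for the actual model call.
double eval_log_density(const double *theta_unc, std::ostream *print_out);

// Stan print() output goes to msg_out (default std::cout); exception text
// goes to err_out (default std::cerr).
bool log_density(const double *theta_unc, double *lp,
                 std::ostream &msg_out = std::cout,
                 std::ostream &err_out = std::cerr) {
    try {
        *lp = eval_log_density(theta_unc, &msg_out);
        return true;
    } catch (const std::exception &e) {
        err_out << e.what() << std::endl;
        return false;
    }
}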

@aseyboldt: Why were you suggesting char** arguments rather than streams? Is it hard to take in a C stream and use it in C++? Streams just make the code a lot simpler if you are going to write to a file or a handle like stderr or stdout, and they don't need to be freed like character pointers.

From a Python interface, I would strongly prefer to have the exceptions percolate through to Python exceptions and then let the clients handle them. Presumably that would also make sense in Julia and R. For interfaces like C where that's not possible, I think we want to take in an output stream.

@WardBrian
Collaborator

I've started a branch https://github.com/roualdes/bridgestan/tree/feature/return-error-messages/ where I'm experimenting with "method 2" here.

It seems reasonable based on what I've done so far. I will keep looking into this next week.

Either option here would be a breaking change for the C-level API, but I personally don't mind having a version bump for this.

@aseyboldt
Contributor Author

I know pretty much nothing about Julia, but the simplest implementation for option 1 would probably look something like this:

model = load_test_model("multi")
nt = Threads.nthreads()

R = 1000
ld = Vector{Bool}(undef, R)
g = Vector{Bool}(undef, R)

@sync for it = 1:nt
    Threads.@spawn for r = it:nt:R
        ctx = BridgeStan.context(model, seed, it)
        x = randn(BridgeStan.param_num(ctx))
        (lp, grad) = BridgeStan.log_density_gradient(ctx, x)
        ld[r] = isapprox(lp, gaussian(x))
        g[r] = isapprox(grad, grad_gaussian(x))
    end
end

Ideally, ctx would be created only once per thread, even though I don't think it would be all that expensive to do it every time. But I don't know how to do that in Julia (there really should be a way, I guess). The same problem already exists for the functions that use the rng: they also can't share the same model object. And without the model/ctx split, creating a model is much more expensive, because the data needs to be copied in the current API, but not in the proposed option 1.

I guess splitting bs_model_rng into two separate parts might be useful independent of how error handling works...

The second option is a bit simpler, but the first one might be easier to extend if that were ever required. For instance, suppose at some later point there was a reason to expose more structured error data. That could be done by simply adding an accessor function for the additional error data, which would be backward compatible: old code would just never call that function.
Or if there were reasons to do something non-thread-safe in any of the log_density functions, there wouldn't have to be any additional changes. (For instance, maybe it would help performance to split a function like log_density_hessian_product into two functions, log_density_hessian_product_setup(ctx, param) and log_density_hessian_product_call(ctx, vector, output), so that the first one can precompute things when the second is called repeatedly for different vector inputs.)
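Purely for illustration, that hypothetical split could be declared roughly like this (made-up signatures, not a concrete proposal):

// Hypothetical signatures for the setup/call split mentioned above; the ctx
// would carry the precomputed state along with its error-message slot.
extern "C" {
struct bs_ctx;
int bs_log_density_hessian_product_setup(bs_ctx *ctx, const double *theta_unc);
int bs_log_density_hessian_product_call(bs_ctx *ctx, const double *vector,
                                        double *hvp_out);
}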

@aseyboldt
Contributor Author

@bob-carpenter I didn't know there were additional prints in the Stan code... That's certainly something to consider.

By stream, do you mean a file descriptor on the C side? I don't work with C++ much, but the stream construct there doesn't really exist in C, does it? If that's the case, I'm not sure it really is easier. If things go bad, we might even end up with a deadlock, if the process that calls log_density waits for someone to read from the descriptor.
But I think (?) it is possible to create a C++ stream that essentially writes to a string in memory, hand that stream to the C++ code, and later expose a pointer to the string in the C API.
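A minimal sketch of that idea: write the error text to an in-memory C++ stream, store it in the struct, and hand a plain const char* across the C boundary (names and struct layout are illustrative only):

#include <exception>
#include <sstream>
#include <string>

// Illustrative only: the error text ends up in memory rather than in a file
// descriptor, and is exposed through the C API as a const char*.
struct bs_ctx {
    std::string error_msg;  // last error; empty if none
};

extern "C" int bs_log_density(bs_ctx *ctx, const double *theta_unc, double *lp) {
    std::ostringstream err;  // C++ stream backed by a string in memory
    try {
        // ... call into the model here, passing &err wherever a stream is
        // expected ...
        return 0;
    } catch (const std::exception &e) {
        err << e.what();
        ctx->error_msg = err.str();  // stash for bs_ctx_error_msg
        return -1;
    }
}

extern "C" const char *bs_ctx_error_msg(const bs_ctx *ctx) {
    return ctx->error_msg.empty() ? nullptr : ctx->error_msg.c_str();
}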

@WardBrian
Collaborator

> Why were you suggesting char** arguments rather than streams? Is it hard to take in a C stream and use it in C++? Streams just make the code a lot simpler if you are going to write to a file or handle like stderr or stdout and they don't need to be freed like character pointers.
> From a Python interface, I would strongly prefer to have the exceptions percolate through to Python exceptions and then let the clients handle them. Presumably that would also make sense in Julia and R. For interfaces like C where that's not possible, I think we want to take in an output stream.

C-style streams (I assume you mean FILE* from stdio.h) seem like an equally ugly API boundary to have as a char**. The only resources I can find about how to use them from Python all involve creating temporary files or platform-specific named pipes, which is much worse than just creating a char**.

@WardBrian
Collaborator

WardBrian commented Mar 24, 2023

@aseyboldt yeah, that snippet is about what I imagined for how the Julia code would need to work for option 1.

I think it's worth noting that this makes BridgeStan strictly less usable by third-party code than a version in which log_density is thread safe "out of the box".
There are a lot of interfaces (not just in Julia) that look like sample(f: vector -> (float, vector), ...). With BridgeStan as currently implemented, you can pass a function which calls log_density and not care whether sample() is going to try to do anything with threads, etc. But if each thread requires creating an object, you lose the ability to use that interface completely (unless you create a ctx before every call in whatever you're passing as f above).

By contrast, constraining the parameters is 1) less likely to be done as part of a black-box API (it's easy to do as a post-processing step) and 2) not a performance bottleneck, so less likely to be done in parallel at all.

Moving thread safety (particularly of the log_density functions) from something which happens on the inside of the call to something which happens outside the call seems like too big a price to pay for me.


Aside: If we did care about thread safety of param_constrain, I think the right thing to do is to remove the RNG from the BridgeStan object completely and have param_constrain take in a seed which is used iff include_gq == True, rather than trying to force object creation on each thread. @justindomke actually requested this last week, as this is how JAX etc. handle their random-number-generation-dependent APIs.
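For concreteness, that aside would amount to a signature along these lines (hypothetical only; not the current bs_param_constrain signature):

// Hypothetical: the rng is removed from the handle and a seed is passed in,
// consulted only when include_gq is true. The error argument follows the
// char** style discussed in this thread.
extern "C" {
struct bs_model;
int bs_param_constrain(const bs_model *model, bool include_tp, bool include_gq,
                       const double *theta_unc, double *theta,
                       unsigned int seed, char **error_msg);
}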

@aseybolt

aseybolt commented Mar 24, 2023 via email

@aseyboldt
Contributor Author

@WardBrian If the rng part is gone from the model, I guess the difference between option 1 and option 2 is pretty cosmetic. In that case ctx would (currently) only contain a single field (the error message), and creating it would be really cheap, so it could just happen inside the Julia log_density function itself and not be exposed in any higher-level interface.
I guess it would simply be a couple of extra lines in the C API wrapper (the bs_context_new() and bs_context_free() calls) in exchange for some easier backward-compatible extensibility.

@WardBrian
Collaborator

The difference is indeed smaller in that case, but I don't think it's purely cosmetic. The ctx still requires a malloc to create and adds a pointer indirection to each call.

Plus, it simply doubles the number of objects in our API a user needs to care about. I think that’s a big jump.

I am happy to be outvoted on this particular point, but I am in general more in favor of keeping the API as simple as we can, even if that means we may need to do major version bumps in the future, rather than adding complexity we may not ever need.

I think a char** error parameter on the methods that can fail is the best option for this. If you don't care about specific messages in your application, we can allow nullptr as a valid argument, which just skips any messaging; otherwise, you're responsible for freeing the message.
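On the implementation side, that nullptr convention could be handled with a small helper along these lines (a sketch only; the helper name is made up):

#include <cstring>
#include <exception>

// Copy an exception's message into *error_msg if the caller asked for one
// (non-NULL); otherwise do nothing. The caller frees the copy with
// bs_free_error_msg.
static void set_error(char **error_msg, const std::exception &e) {
    if (error_msg == nullptr)
        return;  // caller opted out of messages
    const char *what = e.what();
    char *copy = new char[std::strlen(what) + 1];
    std::strcpy(copy, what);
    *error_msg = copy;
}

extern "C" void bs_free_error_msg(char *error_msg) {
    delete[] error_msg;  // safe to call on NULL as well
}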

How to handle things like print statements inside the Stan program is less clear to me, regardless of the philosophy we take here, but those also seem (to me) much less important to percolate up through the API than errors are.

@bob-carpenter
Collaborator

I would prefer not to have a char** return if it can at all be avoided, so that the client doesn't have to manage memory.

> I think the stream construct there doesn't really exist in C, does it?

It exists, but in a stripped down C interface with strange nomenclature for historical reasons. Here's an overview:

https://stackoverflow.com/questions/38652953/what-does-stream-mean-in-c

With streams, the API punts on freeing memory. And there is no worry on the client side about freeing memory.

Ideally, we'd just be able to pipe the C++ output stream directly to the C stream. I think we'd need something like this (from the very reliable co-author of the standard texts on all this, including templates):

http://www.josuttis.com/libbook/io/outbuf2.hpp.html

If I understand that code, it creates a C++ output stream that wraps a file descriptor (C is confusing in that all streams are file descriptors):

https://www.gnu.org/software/libc/manual/html_node/Streams.html

So we can take it as a file descriptor argument in the C interface, then convert it to a C++ stream on the back end.
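A condensed sketch of that pattern, in the spirit of the outbuf example linked above: a streambuf that forwards characters to a POSIX file descriptor, so the C API can accept a plain int fd while the C++ side still writes through a std::ostream. The bs_report_error name is made up, and the code is POSIX-only as written:

#include <ostream>
#include <streambuf>
#include <unistd.h>  // write()

// Minimal streambuf that forwards each character to a file descriptor.
class fd_outbuf : public std::streambuf {
  public:
    explicit fd_outbuf(int fd) : fd_(fd) {}

  protected:
    int_type overflow(int_type c) override {
        if (c != traits_type::eof()) {
            char ch = traits_type::to_char_type(c);
            if (::write(fd_, &ch, 1) != 1)
                return traits_type::eof();
        }
        return c;
    }

  private:
    int fd_;
};

// The C API takes a plain file descriptor; the C++ side wraps it in a stream.
extern "C" void bs_report_error(int err_fd, const char *what) {
    fd_outbuf buf(err_fd);
    std::ostream err(&buf);
    err << what << std::endl;
}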

@WardBrian
Collaborator

> With streams, the API punts on freeing memory. And there is no worry on the client side about freeing memory.

That would be nice, but I think creating a FILE* in our clients is worse than needing to call free, especially if you need it to work on all platforms. The Julia and Python advice I can find on the subject all recommends using mkfifo or using a temporary file and then reading it back in, which would require cleanup anyway.

Even if we could pass something like Python's sys.stderr (we can't; this isn't really the same kind of thing as C's stderr), that doesn't really help throw an error message that contains the right content. We would still have to throw an error saying something like "check stderr for details", the stderr in question would just be Python's now.

@bob-carpenter
Collaborator

That's unfortunate. With the char** style, the C client code needs to allocate the char* pointer (it can just be on the stack), then call the string writer, then put the string output where it needs to go (probably either nowhere or a file/stdout/stderr stream), and then free the string pointed to by that pointer.

Is there some other way to encapsulate those strings so they don't leak into the client interface?

I'm not an expert in either Python or C, but this looks like one possible tack:

@bob-carpenter
Collaborator

I went and had a discussion with @WardBrian about how the interfaces don't even use streams to back their I/O, so we can't interface at that level. He also let me know that the memory handling will be encapsulated, so that the Python interface mediates access to the character strings and clients won't need to worry about it.

@WardBrian
Collaborator

I made a couple other issues to keep different threads of discussion separate:

The char** approach is fitting (in my opinion) for the case of exception handling, but doesn't address the above. We definitely would not want to use that approach for print-stream redirection, due to the inability to keep things cheap in the "happy path".

@WardBrian WardBrian added enhancement New feature or request c-api labels Apr 3, 2023