Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibility of non-unwinding error handling #311

Closed
kornelski opened this issue Dec 6, 2018 · 9 comments
Closed

Possibility of non-unwinding error handling #311

kornelski opened this issue Dec 6, 2018 · 9 comments

Comments

@kornelski
Copy link
Contributor

kornelski commented Dec 6, 2018

Currently libjpeg's the error handler isn't allowed to return, which means that error handling either has to abort the program (not nice) or requires setjmp/longjmp or something very similar that does unwinding.

This is tricky for non-C languages to support, since they either have to wrap every libjpeg function in a stub function that sets up setjmp, or they have to support unwinding of C's stack frames in their native unwinding mechanism. Specifically, this is about to get tricky in Rust, because Rust never officially supported unwinding of C's stack and may start enforcing that.

I'm wondering if you'd accept changes to the error handling mechanism. I'm happy to do the work.

The rough idea is to change:

if (bad) {
    ERREXIT(cinfo, …);
}

to:

if (bad) {
    ERREXIT(cinfo, …);       
    return null/-1/etc;
}

That would of course involve checking of return values in the callers.

Where a sentinel value is not possible or not desirable (e.g. due to ABI), it might be something similar to:

if (bad) {
    ERREXIT(cinfo, …);       
    cinfo->…->errno = JERR_…;
    return;
}

With such change the error handler would be allowed to return, and the library user would be able to handle errors without support for stack unwinding.

Would you be willing to accept such changes? Would it be OK to start adding them gradually?

@dcommander
Copy link
Member

The problem is that ERREXIT() is part of the documented libjpeg API. ERREXIT() will call any custom error manager that the application has set up, and that custom error manager is (per the libjpeg API specification) well within its rights to exit the program. Thus, we cannot rely on the ability to set anything in cinfo after ERREXIT() is called, and we cannot assume that ERREXIT() will ever return. In order to accomplish what you propose, without breaking backward compatibility, I think we'd need two functions:

  1. A function, similar to jpeg_std_error(), that returns an error manager that sets an error string field in the opaque jpeg_comp_master or jpeg_decomp_master struct rather than exiting. The application would need to call this new function in order to enable the new error handling behavior.
  2. A function, similar to tjGetErrorStr2()/tjGetErrorCode(), that retrieves the error string field and the severity of the error (i.e. whether or not it's a warning.)

OK, so that technically solves the problem, but from the standpoint of usability, it sucks. It would require that the calling program make an additional call to the error-check function after every call to a libjpeg API function with a void return value. That is going to produce some hard-to-read code. Also, it could produce some really hard-to-diagnose errors if the application fails to call the error-check function and proceeds with calling another libjpeg API function. Best case, the libjpeg API functions would check whether the library is in an error state and return immediately if so, thus preventing memory corruption or other issues, but from experience, I know that this is going to create a lot of confusion (and thus a lot of support requests for me.)

If a patch is as disruptive as this proposed patch is likely to be, then it will require significant time to integrate, test, and debug it, even if it is carefully-implemented. And once the new feature is adopted into the libjpeg-turbo code base, I am responsible for maintaining it and debugging it for the foreseeable future. Thus, since this feature would be of no benefit to me personally (all of my products use the TurboJPEG API), I would need to secure funding for any labor required on my part to integrate it (maybe a MOSS grant?)

In a broader sense-- I have said, on multiple occasions, that the community needs to come together and figure out a new standard JPEG API to replace the libjpeg API. The new standard API would ideally eliminate the 25-year-old cruft in the libjpeg API, use opaque structures with a configuration mechanism similar to the one I designed for mozjpeg, etc. It might use the TurboJPEG API for inspiration, but not necessarily, since we'd want to retain the ability to use buffered I/O. The new API could wrap the old API, in much the same way that the TurboJPEG API does. When Tom Lane gave me his blessing on this project, we discussed an API overhaul as a possible future direction (he oversaw a similar API overhaul in LibPNG), but it's definitely not something I could tackle for free or without significant input from the community.

Long story short: I would be open to this feature if:

  1. We can come up with a nicer way to accomplish it, one that doesn't involve an ugly error-check function. But I'm not sure how to do that without overhauling the API.
  2. Any and all labor required, on my part, to integrate, test, and debug the feature is funded.

As always, any information regarding how other projects achieve similar goals would be helpful.

@kornelski
Copy link
Contributor Author

Thank you for thorough reply.

Thus, we cannot rely on the ability to set anything in cinfo after ERREXIT() is called, and we cannot assume that ERREXIT() will ever return.

Yes, that's fine. What I've had in mind is merely ability to set an alternative non-exiting error manager, which switches the rest of the calls to return-value-based error handling just for programs that want to handle errors that way. The solution you describe later is exactly right.

AFAIK it's safe to change void functions to return a register-sized integer on platforms where the ABI uses a scratch register to return the value, without breaking compatibility. I'm expecting that all platforms you support have such behavior.

In terms of extra code needed by for checking error codes after each function call — it is a bit noisy and does get ugly sometimes, but… that's typical for C APIs. Possibility of user ignoring errors and rolling forward with garbage is also not unusual for C :)

I'm very interested in alternative API for libjpeg. I've proposed this change since it's smaller, and it'd be less burden to leave you with a few more return lines than a completely new library interface. But if you're open to adding a brand new interface, that's great!

@kornelski
Copy link
Contributor Author

kornelski commented Dec 8, 2018

I may be able to get funding for the error handling change. Given Rust's release schedule I have 6 to 12 weeks to find a way to decode and encode JPEGs without stack unwinding. That's probably too short to design a completely new library API (but I've posted some ideas).

@dcommander
Copy link
Member

Given that my primary source of income is independent open source development, I'm open to basically anything that makes sense, as long as the work is paid for. Without funding, however, I'm open to very little, since libjpeg-turbo already does everything I personally need it to do. It was originally just supposed to be an accelerated drop-in replacement for libjpeg v6b, and it has far exceeded that goal.

I'm very wary of changing the return values in the API, because it sounds as if that would be relegating backward ABI compatibility to a platform-specific implementation detail. Also, some of the API functions have boolean return values that are already used for purposes other than signaling an error condition. Barring any changes to the existing API functions, the least painful solution to this would seemingly be an error-check function. I'm less wary of that but still wary. It would be really nice if whatever we do here could be a stepping stone toward a new API. Given how much exposure libjpeg-turbo gets (and it will probably get more if it is adopted as an ISO reference implementation), the reality is that whatever we do here is likely to become canonical. As such, I don't want to haphazardly extend the API in a way that will make it more difficult to redesign later.

The flip side of that argument is:

  • There are other common APIs, such as OpenGL, that require the caller to call a function in order to explicitly check for errors.

  • We would ideally only be extending the API by 2-3 functions.

  • The error-check function could be re-purposed in the proposed new API, such that it could be used to check for the cause of an error that was signaled via a function's return value. I would design it with that dual purpose in mind.

  • The TurboJPEG wrappers could take advantage of the API extensions, thus allowing us to eat our own dog food.

  • It's straightforward enough to use a macro to automatically check for errors, e.g.:

    #define JPEG_TRY(function_call)  \
      function_call;  \
      if (jpeg_get_error(&cinfo) != JPEG_SUCCESS) {  \
        fprintf(stderr, "%s: %s\n",  \
                jpeg_get_error(&cinfo) == JPEG_FATAL ? "ERROR" : "WARNING",  \
                jpeg_error_string(&cinfo));  \
        retval = -1;  \
        goto bailout;  \
    }
    
    GLOBAL(int) read_JPEG_file(char *filename)
    {
      struct jpeg_decompress_struct cinfo;
      FILE *infile = NULL;
      JSAMPARRAY buffer;
      int row_stride, retval = 0;
    
      cinfo.is_decompressor = FALSE;
    
      if ((infile = fopen(filename, "rb")) == NULL) {
        fprintf(stderr, "can't open %s\n", filename);
        retval = -1;
        goto bailout;
      }
    
      jpeg_trap_errors(&cinfo);  /* TODO: come up with a better name for this function */
      JPEG_TRY(jpeg_create_decompress(&cinfo))
      JPEG_TRY(jpeg_stdio_src(&cinfo, infile))
      JPEG_TRY(jpeg_read_header(&cinfo, TRUE))
      JPEG_TRY(jpeg_start_decompress(&cinfo))
      row_stride = cinfo.output_width * cinfo.output_components;
      buffer = (*cinfo.mem->alloc_sarray)
               ((j_common_ptr)&cinfo, JPOOL_IMAGE, row_stride, 1);
    
      while (cinfo.output_scanline < cinfo.output_height) {
        JPEG_TRY(jpeg_read_scanlines(&cinfo, buffer, 1))
        put_scanline_someplace(buffer[0], row_stride);
      }
    
      JPEG_TRY(jpeg_finish_decompress(&cinfo))
    
      bailout:
      if (cinfo.is_decompressor) jpeg_destroy_decompress(&cinfo);
      if (infile) fclose(infile);
      return retval;
    }
    

If you can secure funding quickly, I could probably look at this issue this month, but my availability in January will be spotty. Also, to set expectations, this feature would be checked into the dev/2.1 evolving branch, and it is unknown when 2.1 will be released (it is gated by some external development efforts.)

As a sanity check, does it even make sense to support the literal libjpeg API in Rust if some fundamental features of that API will soon fail in a Rust environment? At some point, we have to ask ourselves whether a wrapper is the more rational approach. As you can see from our own Java wrapper, we basically jettisoned any notion of it working like either of the C APIs and instead designed it to fit into common Java programming paradigms. I understand why you would want some access to the libjpeg API, but I also think it might make more sense to design a more Rust-friendly wrapper around it.

@dcommander
Copy link
Member

@kornelski I am prepared to work on this now if funding can be secured for my labor.

@kornelski
Copy link
Contributor Author

Thank you for your detailed response.

So it seems the problem is harder than I initially though.

I'm not very keen on requiring error-checking function calls, especially with macros.

Returning error codes would feel like a simpler, more idiomatic C API. I still think that investigating feasibility of void -> int change would be worthwhile and it should be a non-ABI-breaking change (especially that back when such ABIs were established, the early C was very lax about function prototypes).

But if the change to returning error codes is not an option, then I'm starting to doubt whether such big changes to error handling are worth the effort.

Also as we've discussed in another thread, given that libjpeg's c_info struct isn't going away, I'm starting to think it may be better to use the Turbo API or add a similar one on top instead, and that wouldn't need changes to error handling within the libjpeg-turbo proper.

@dcommander
Copy link
Member

I'm not interested in pursuing such a wholesale change as altering the return codes. My experience strongly suggests that such a move would have unexpected technical pitfalls. What I'm reading online suggests that ABI compatibility between void and int return values is an implementation detail of the ABI, and while it may be true for popular ABIs, it isn't guaranteed to be true. Also, such a change could create hard-to-diagnose forward compatibility issues. If in fact void and int return values are ABI-compatible, then that opens up the possibility that a new program could check the return values, and this program would behave unexpectedly when run against an older version of the libjpeg-turbo shared library (because the contents of the register used for the return value would be undefined in the void versions of the functions.)

So if an error-check function isn't acceptable, then I concur that your best course of action is to use the TurboJPEG API or some as-yet-to-be-defined API that works similarly. Java and C# access libjpeg-turbo through the TurboJPEG API, so it would make sense for Ruby to do likewise. And it would ultimately take less time/cost less money to extend that API to cover some of the missing features from the libjpeg API than to make the libjpeg API do something it was never designed to do.

@dcommander
Copy link
Member

Re-opening in order to solicit community feedback as to whether a libjpeg API state-setting error manager and error-check function (as described in my comments above) would be worthwhile.

@dcommander
Copy link
Member

I looked into this further, and unfortunately, even my idea of adding a state-setting error manager and error-check function wouldn't be straightforward. The problem is that the libjpeg API assumes that the error_exit() method in the error manager will never return, so internal invocations of ERREXIT*() are never followed by return statements. For global API functions, it is straightforward enough to add a return statement after any invocation of ERREXIT*(). However, for local/static functions or user-defined methods, that ceases to be straightforward or even possible in all cases. There are numerous places within the libjpeg API code in which those functions are nested several levels deep, and there is no easy way to communicate an error condition from the lowest level up to the global level. In some cases, the nesting is so fine-grained that checking for an error code might affect performance. In other cases, the function signature for user-defined methods is part of the API, so I can't change it in order to add a return code.

At the end of the day, this just seems like a fool's errand. I think it is best to work around the issue by wrapping the libjpeg API, as TurboJPEG does. Longer term, fixing this is just going to require a wholesale API redesign, per #313, which will require a lot of money.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants