Skip to content

Allow user-specified maximum error message size #1843

Merged
merged 1 commit into from Jul 13, 2014

4 participants

@mikegee
mikegee commented Jul 12, 2014

Errors can have messages that are very large. When they are raised, redis space is rapidly depleted.

An example of this problem is an INSERT or UPDATE in ActiveRecord with a very large blob of data. When ActiveRecord fails, it puts the entire SQL statement into the error message.

@mikegee
mikegee commented Jul 12, 2014

@mperham This is a sketch of the change I'm considering. I'd like your opinion on the problem and this solution before I write up the test and perhaps factor out the error message modification to a private method.

@seuros
Collaborator
seuros commented Jul 12, 2014

The problem is that we might truncate debug data.

@mikegee
mikegee commented Jul 12, 2014

I chose 4k completely arbitrarily. We were getting error messages in the range of megabytes. Let me know what value you think would prevent losing important debug data. 10k? 100k?

@seuros
Collaborator
seuros commented Jul 12, 2014

What about

@backtrace_log_size = options.fetch(:backtrace_log_size, -1)
e.message[0..@backtrace_log_size]
@mikegee
mikegee commented Jul 12, 2014

I like the idea of allowing the user to configure the amount of error message saved. That way the default behavior is unchanged.

I'm confused by your use of backtrace_log_size. The backtrace goes into msg['error_backtrace'].

Perhaps the option should be called max_error_message_size.

@seuros
Collaborator
seuros commented Jul 12, 2014

Yes, I'm bad at naming things, just like my parents. They put a password instead of a first name :).

Maybe error_message_limit

@seuros
Collaborator
seuros commented Jul 12, 2014

Btw you can implement it like msg['backtrace']

@mikegee
mikegee commented Jul 12, 2014

I updated the implementation.

Let me know if you think this warrants the conditional logic msg['backtrace'] uses. That's not obvious to me.

@seuros
Collaborator
seuros commented Jul 12, 2014

I think that better

msg['error_message_limit'] ||= -1
msg['error_message'] = e.message[0..msg['error_message_limit']]

so we don't touch the initializer and each worker can have it own limit.
wdyt ?

@mikegee
mikegee commented Jul 12, 2014

Each worker having its own limit is what I intended to implement. Sorry, I'm not as familiar with Sidekiq internals. Will a worker's sidekiq_options end up msg?

@seuros
Collaborator
seuros commented Jul 12, 2014

Yes!

@mikegee
mikegee commented Jul 12, 2014

Super!

Thanks for your help with this @seuros. I'll rework the implementation and get some tests around it today or tomorrow.

@mikegee mikegee changed the title from only keep the first 4k of the error message to Allow user-specified maximum error message size Jul 12, 2014
@mikegee
mikegee commented Jul 12, 2014

I updated the implementation again. I made a test by copying a similar test and changing it slightly.

I considered a higher-level test that sets error_message_limit in a worker's sidekiq_options. Let me know if you think that would be worth it.

@mikegee
mikegee commented Jul 13, 2014

I wrote a little something to document error_message_limit.

@mikegee mikegee allow user to specify max size of error messages
Errors can have messages that are very large. When they are raised,
redis space is rapidly depleted.

An example of this problem is an INSERT or UPDATE in ActiveRecord
with a very large blob of data. When ActiveRecord fails, it puts
the entire SQL statement into the error message.
f944999
@mikegee
mikegee commented Jul 13, 2014

Added Master\n------\n\n to top of Changes.md and squashed.

@seuros seuros merged commit be439c5 into mperham:master Jul 13, 2014

1 check was pending

Details continuous-integration/travis-ci The Travis CI build is in progress
@seuros
Collaborator
seuros commented Jul 13, 2014

Thanks

@mperham
Owner
@seuros
Collaborator
seuros commented Jul 13, 2014

It not only AR, any object will throw big error messages.

@jonhyman
Collaborator

We just rely on our logs for this. Sidekiq's dashboard is helpful put it's useful to dig into the logs for a stack trace, surrounding info messages, etc. I'm with @mperham here, I don't think it should have a specialized config option for what is essentially log info.

@mikegee
mikegee commented Jul 14, 2014

We just rely on our logs for this.

Huge error messages can fill redis. I don't understand how log info is related to the problem.

@jonhyman
Collaborator

Actually read the entirety of this thread now instead of just what I saw in my inbox.

Did you have thousands of jobs fail? If you're running a redis instance with little memory for sidekiq you're setting yourself up for badness imo. We run with 16gb extra because you never know if you'll have a huge catastrophic backlog of jobs and you really want to minimize the risk of running out of ram. If tens (or even hundreds) of mb of extra space is a concern I think it's an operational issue not a sidekiq one.

@jonhyman
Collaborator

@mikegee ignore earlier comment. Didn't see full thread in my email.

@mikegee
mikegee commented Jul 14, 2014

Did you have thousands of jobs fail?

No, just a handful with multi-megabyte error messages. It didn't completely consume our redis capacity, just pushed it up enough to set off pagers for the infrastructure team.

@jonhyman
Collaborator
@mikegee
mikegee commented Jul 14, 2014

These large error didn't actually cause issues. They changed our redis usage such that people were notified. If our workers when down, our redis usage would not be similarly affected until millions of jobs backed up.

My claim is that truncating huge error message is a useful mitigation strategy for this rare occurrence. Do we disagree on that?

@mperham
Owner
mperham commented Jul 14, 2014

@mikegee The point is to avoid lots of little flags for every possible edge case. Sidekiq's mantra is "simple, efficient". Simplicity is achieved not by adding features but by explicitly not adding them where possible. Sometimes that means we don't handle rare edge cases perfectly.

Error messages shouldn't be megabytes in size. The root error is elsewhere and I don't want to complicate Sidekiq mitigating it.

@jonhyman
Collaborator
@mikegee
mikegee commented Jul 14, 2014

Here's another way of looking at it: the error message is the one place where arbitrarily large, client-provided data ends up in redis. I have two examples: the SQL statement ActiveRecord errors, and the response body from HTTP client errors.

@mperham mperham added a commit that referenced this pull request Jul 14, 2014
@mperham Truncate error message to 10k, #1843 840c8f6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.