[Bug] [Logstash] Invalid error message when there is a corruption in the path.data and path.queue folders #8098

Closed
jkelastic opened this issue Aug 29, 2017 · 5 comments

@jkelastic

jkelastic commented Aug 29, 2017

Products/Versions (include any installed plugins): Logstash 5.5

Description:

The cluster ran out of disk space while Logstash was still writing to path.data and path.queue. After stopping the cluster and freeing up some space, Logstash will not start and gives the error:

[2017-08-27T22:49:16,909][ERROR][logstash.pipeline ] Logstash failed to create queue {"exception"=>"Page file size is too small to hold elements", "backtrace"=>["org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary.java:133:in `open'", "E:/Logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:41:in `with_queue'", "E:/Logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:30:in `create_file_based'", "E:/Logstash/logstash-core/lib/logstash/queue_factory.rb:29:in `create'", "E:/Logstash/logstash-core/lib/logstash/pipeline.rb:159:in `initialize'", "E:/Logstash/logstash-core/lib/logstash/agent.rb:286:in `create_pipeline'", "E:/Logstash/logstash-core/lib/logstash/agent.rb:95:in `register_pipeline'", "E:/Logstash/logstash-core/lib/logstash/runner.rb:314:in `execute'", "E:/Logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'", "E:/Logstash/logstash-core/lib/logstash/runner.rb:209:in `run'", "E:/Logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'", "E:\\Logstash\\lib\\bootstrap\\environment.rb:71:in `(root)'"]}

The error is misleading: "Page file size is too small to hold elements" reads like it refers to the Windows page file setting, but that is not the problem. Either path.data or path.queue is corrupted. I wasn't able to verify exactly which, as both folders were deleted at the same time. However, deleting the folders made the issue go away. The customer asked us to file this bug to get a better log message, e.g. "Corrupted path.data or path.queue".

@jkelastic jkelastic added the bug label Aug 29, 2017
@jkelastic jkelastic changed the title from "[Logstash] Invalid error message when there is a corruption in the path.data and path.queue folders" to "[Bug] [Logstash] Invalid error message when there is a corruption in the path.data and path.queue folders" Aug 29, 2017
@colinsurprenant
Contributor

Thanks @jkelastic for the report.

One problem in 5.5 is that the proper stack trace for the exception is lost - this is now fixed in #8245 for 5.6+.

For this specific problem: the "Page file size is too small to hold elements" exception means that, when Logstash opens the queue at startup, it finds an existing page file whose size is too small to hold elements.

This is consistent with your report that the disk ran out of space: that is certainly the cause of the corrupted page file.
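
For anyone who wants to check a queue directory by hand before deleting it, here is a minimal sketch of that kind of size check. It is not the Logstash implementation: the `page.<number>` file naming and the `MIN_PAGE_BYTES` threshold are assumptions made for illustration, and `find_undersized_pages` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: smallest size (in bytes) a page file needs in order to hold
# one element. This value is a placeholder for illustration only.
MIN_PAGE_BYTES = 18


def find_undersized_pages(queue_dir, min_bytes=MIN_PAGE_BYTES):
    """Return (file name, size) for each page file smaller than min_bytes."""
    undersized = []
    # Assumption: page files are named "page.<number>" inside the queue dir.
    for path in sorted(Path(queue_dir).glob("page.*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size < min_bytes:
            undersized.append((path.name, size))
    return undersized
```

A page file that was created but never fully written because the disk filled up would show up in the returned list with a size of zero or a few bytes.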

I am not sure there is much we can do to prevent corruption when the disk runs out of space, other than asking users to set queue limits within the available disk space and to set up their environment so they can better control the available disk space.
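
As a rough sketch of what "queue limits within the available disk space" could look like as a pre-flight check: the helper names below are hypothetical, `max_queue_bytes` stands for whatever queue size limit is configured (`queue.max_bytes` in the Logstash settings), and `reserve_bytes` is an assumed safety margin. The check is deliberately simple and ignores data already sitting in the queue.

```python
import shutil


def queue_limit_fits(max_queue_bytes, free_bytes, reserve_bytes=0):
    """True if the queue can grow to its limit and still leave reserve_bytes free."""
    return max_queue_bytes + reserve_bytes <= free_bytes


def check_queue_volume(queue_path, max_queue_bytes, reserve_bytes=0):
    """Compare the configured limit with the free space on the queue's volume."""
    free = shutil.disk_usage(queue_path).free
    return queue_limit_fits(max_queue_bytes, free, reserve_bytes)
```

Running `check_queue_volume` against the queue directory before starting Logstash would flag a limit that the volume cannot actually accommodate.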

One thing we could do is provide a corrupted-queue recovery tool that recovers as much as it can from an existing queue.

I will go ahead and close this issue. Feel free to reopen it if you have other concerns.

@akatakatoo

The bug was primarily about the confusing message: even after space was freed, Logstash would not start and showed the same "Page file size is too small to hold elements" error, which is (or was) misleading. Something along the lines of "corrupted queue" or "unable to open queue due to unknown reason" would be clearer. If the stack trace in 5.6+ carries additional information indicating the exact reason for the error, then we're good.

A corrupted queue is not surprising after disk space has been depleted, and I agree it is up to the admin to monitor disk usage properly.

@jkelastic
Author

@colinsurprenant I'm going to reopen this, as the problem is that the error message wasn't clear. I was hoping to see a warning along the lines of "Page file size is too small to hold elements, e.g. out of space / corrupted data path". The user should not have to guess what caused the page file to be too small to hold elements.
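
To make the request concrete, here is a small sketch of how the low-level message could be wrapped with the queue location and the likely causes. The function name and the wording are hypothetical, offered only as an illustration of the idea and not as the message Logstash uses.

```python
def describe_queue_open_failure(low_level_message, queue_path):
    """Build an error message that names the queue path and likely causes."""
    return (
        f"Unable to open queue at {queue_path}: {low_level_message}. "
        "Possible causes: the disk ran out of space while the queue was "
        "being written, or the queue data is corrupted."
    )
```

The original exception text is kept in the message so it can still be searched for, while the extra sentence points the user at disk space and queue corruption.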

@colinsurprenant
Contributor

@akatakatoo Yes, I totally agree that the message is confusing. I created #8480 to follow up. Thanks for the heads-up!

@colinsurprenant
Contributor

So I will re-close this issue, and we can follow up in #8480 on rephrasing the error message.

3 participants