
Gzip compression results to uncompressable package #61

@admlko

We read a lot of data from Elasticsearch every day and compress it into a gzip file for long-term archiving. I wrote a pipeline config using the file output and noticed that it has a handy feature for producing a compressed file without needing logrotate.

Unfortunately, I am experiencing an issue with it. The log file we produce daily is around 400 to 700 MB uncompressed, and every time I try to decompress the gzipped file produced by this plugin, gunzip fails with "unexpected end of file".

I can see that the file contains all the data by running
gzip -cd logfile.log.gz | head
and
gzip -cd logfile.log.gz | tail
but the gzip footer is clearly missing. While poking around I found this in the Ruby documentation (http://ruby-doc.org/stdlib-2.4.0/libdoc/zlib/rdoc/Zlib/GzipWriter.html):

NOTE: Due to the limitation of Ruby's finalizer, you must explicitly close GzipWriter objects by Zlib::GzipFile#close etc. Otherwise, GzipWriter will be not able to write the gzip footer and will generate a broken gzip file.

I checked the source of this output and did not see the GzipWriter object's close method being called anywhere, so I guess this could be the reason it is failing?
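To illustrate the suspected failure mode outside of Logstash, here is a minimal standalone Ruby sketch. It is not the plugin's code: the helper `complete_gzip?` and the sample `payload` are hypothetical names made up for this example, and `flush` is used only to simulate data having reached the output while the writer is still open. It compares the bytes written before and after the GzipWriter is finished.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: true when the bytes decompress to the end of the
# stream without zlib raising an error.
def complete_gzip?(bytes)
  Zlib::GzipReader.new(StringIO.new(bytes)).read
  true
rescue Zlib::Error
  false
end

# Sample data standing in for the real log lines (assumption for the demo).
payload = "log line\n" * 10_000

io = StringIO.new(String.new)   # binary in-memory buffer instead of a file
gz = Zlib::GzipWriter.new(io)
gz.write(payload)

# Push the compressed data to the underlying IO, but do NOT close the writer.
gz.flush
unclosed = io.string.dup        # what would be on disk if close is never called

# finish terminates the deflate stream and writes the gzip footer
# (close would do the same and also close the underlying IO).
gz.finish
closed = io.string.dup

puts "without close: complete=#{complete_gzip?(unclosed)} (#{unclosed.bytesize} bytes)"
puts "after finish:  complete=#{complete_gzip?(closed)} (#{closed.bytesize} bytes)"
```

If the output is missing the close call, I would expect it to behave like the first case: the data is there, but the stream never gets terminated, which would match what gunzip reports.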

I tried to reproduce the problem with a very small dataset in my own lab environment, but no luck. It fails every day in the production environment with the real dataset, though.

With a bit of searching, I was able to find another report of this issue: https://stackoverflow.com/questions/45533870/logstash-gzipped-file-output-results-in-unexpected-end-of-file
