Describe the bug

Hello!
I have logs from two apps.

app1 - in_tail input on a single file, webhdfs output with a file buffer and hadoop_snappy compression
app2 - in_tail input on files matched by a mask, webhdfs output with a file buffer and hadoop_snappy compression

app1 watches 1 file; its buffer collects chunks of roughly 20 MB every 2 minutes and flushes each chunk into HDFS by time.
app2 watches 50 files; its buffer collects roughly 50 MB every 5 minutes and flushes the data into HDFS by time.

app1 works fine, but with app2 I get "invalid compression" on roughly 5% of the files (chunks) from different hosts when processing them; reading such a file fails with:
Exception in thread "main" java.lang.InternalError: Could not decompress data. Input is invalid.
at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:239)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:93)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
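The trace above is what the HDFS shell prints when reading one of the affected files, e.g. with a command along these lines (the path is only an example):

hdfs dfs -text /logs/app2/20240501/host01.log.snappy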
How can I tune the config? Why does hadoop_snappy generate invalid blocks?
If I set the compression to "text" instead of "hadoop_snappy", everything works fine and there are no invalid records when parsing the data in HDFS.
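For reference, a rough sketch of the app2 pipeline as described above (the host, paths, tags and exact buffer values are illustrative, not the actual configuration):

<source>
  @type tail
  path /var/log/app2/*.log          # ~50 files matched by mask
  pos_file /fluentd/log/app2.pos
  tag app2.log
  <parse>
    @type none
  </parse>
</source>

<match app2.**>
  @type webhdfs
  host namenode.example.com
  port 50070
  path "/logs/app2/%Y%m%d/#{Socket.gethostname}.log"
  compress hadoop_snappy
  <buffer time>
    @type file
    path /fluentd/buffer/app2
    timekey 5m                      # flush by time, ~every 5 minutes
    timekey_wait 30s
    chunk_limit_size 50m
    flush_mode lazy
  </buffer>
</match>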
To Reproduce
To reproduce the issue, follow the steps from the "Describe the bug" section using the following Dockerfile:
FROM fluent/fluentd:v1.16.5-debian-amd64-1.0
USER root
# Timezone
ENV TZ="Europe/Moscow"
RUN ln -snf "/usr/share/zoneinfo/$TZ" "/etc/localtime" \
&& echo "$TZ" > "/etc/timezone"
# for snappy gem native libs building
RUN apt update \
&& apt -y install build-essential autoconf automake libtool libsnappy-dev \
&& apt clean
# plugins
RUN fluent-gem install \
fluent-plugin-webhdfs \
fluent-plugin-prometheus \
snappy
USER fluent
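Build and run the image with something like this (the image name and mount paths are placeholders; the config file is mounted over the image default /fluentd/etc/fluent.conf):

docker build -t fluentd-webhdfs-snappy .
docker run -d \
  -v /path/to/fluent.conf:/fluentd/etc/fluent.conf \
  -v /var/log/app2:/var/log/app2:ro \
  fluentd-webhdfs-snappy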
Expected behavior
Valid compression on the resulting files.
Your Environment
- Official fluentd docker image fluent/fluentd:v1.16.5-debian-amd64-1.0
- Fluentd version: 1.16.5
- gem 'fluent-plugin-prometheus' version '2.1.0'
- gem 'fluent-plugin-webhdfs' version '1.6.0'
If I place both pipelines on one worker, I get "invalid snappy compression" errors on both pipelines (only on the first pipeline if the second pipeline uses the "text" compression codec).
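Normally the two pipelines are split across workers, roughly like this (a sketch; the exact worker layout is an assumption):

<system>
  workers 2
</system>

<worker 0>
  # app1 pipeline: in_tail on the single file + webhdfs with hadoop_snappy
</worker>

<worker 1>
  # app2 pipeline: in_tail on 50 files by mask + webhdfs with hadoop_snappy
</worker>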
Additional context
The only difference between the two pipelines is that in_tail watches 50 files by mask.