Description of the problem including expected versus actual behavior:
Expected Behavior:
When writing to HDFS using the WebHDFS output plugin with GZIP compression, it should handle interrupted writes gracefully, ensuring data integrity and file consistency.
Actual Behavior:
Currently, the compress_gzip method in the WebHDFS output plugin does not account for interrupted writes or retries. If a write operation is interrupted, the GZIP block being written is left incomplete. Because GZIP does not allow more data to be appended to an incomplete block, the file is effectively corrupted from that point on.
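The failure mode can be reproduced outside of Logstash with a small Ruby sketch (names here are illustrative, not taken from the plugin source, though the plugin's compress_gzip similarly uses Zlib::GzipWriter over an in-memory buffer): compress a payload, keep only the first half of the resulting bytes to simulate an interrupted append, and try to read it back.

```ruby
require "zlib"
require "stringio"

# Sketch of what the plugin's compress_gzip does: produce one complete
# gzip stream for a batch of events using an in-memory buffer.
def compress_gzip(data)
  buffer = StringIO.new
  gz = Zlib::GzipWriter.new(buffer)
  gz.write(data)
  gz.close
  buffer.string
end

full = compress_gzip("line one\nline two\n")

# A complete stream round-trips fine.
Zlib::GzipReader.new(StringIO.new(full)).read  # => "line one\nline two\n"

# Simulate an interrupted write: only part of the gzip stream reached HDFS.
truncated = full[0, full.bytesize / 2]

# Reading the truncated stream fails, because the gzip block is incomplete
# and the trailing CRC/length footer is missing. Appending more compressed
# data to the file later cannot repair this.
begin
  Zlib::GzipReader.new(StringIO.new(truncated)).read
rescue Zlib::Error, EOFError => e
  puts "corrupted: #{e.class}"
end
```

This also shows why a retry makes things worse rather than better: the retried batch is appended after the partial block, so every subsequent gzip member in the file sits behind unreadable bytes.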
Steps to Reproduce:
- Configure Logstash with the WebHDFS output plugin and enable GZIP compression.
- Start ingesting data to HDFS through Logstash.
- Simulate a WebHDFS failure or maintenance activity during a write operation (you can kill the process, for example).
- Observe the resulting HDFS file. It will contain an incomplete GZIP block, making it corrupted and unreadable.
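For step 1, a minimal pipeline fragment along these lines reproduces the setup (host, path, and user are placeholders, not values from this report; `compression => "gzip"` is the plugin option that selects the compress_gzip path):

```
output {
  webhdfs {
    host => "namenode.example.org"
    port => 50070
    path => "/logs/%{+YYYY-MM-dd}/logstash-%{+HH}.log.gz"
    user => "hdfs"
    compression => "gzip"
  }
}
```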
Logstash Information:
- Logstash version: 7.17.1
- Logstash installation source: tar
- How is Logstash being run: systemd
OS Version: RHEL 7