hata/embulk-decoder-commons-compress

Commons Compress decoder plugin for Embulk

This decoder plugin for Embulk supports various archive formats using the Apache Commons Compress library.

Overview

  • Plugin type: decoder
  • Load all or nothing: yes
  • Resume supported: no

Configuration

  • format: An archive format such as tar or zip. (string, optional, default: "")
    • The value must be one of the formats supported by Apache Commons Compress.
    • When format is not set, the format is detected automatically. Auto detection works only for a single format. For a solid compression format such as tar.gz, set format explicitly.
    • Some formats listed by Apache Commons Compress may not work in your environment. The formats in the Formats section below are confirmed to work. Other formats listed on the Apache Commons Compress site may also work in your environment.
  • decompress_concatenated: The gzip, bzip2, and xz formats support multiple concatenated streams. The default value of this parameter is true. Set it to false to disable reading concatenated streams. See CompressorStreamFactory.setDecompressConcatenated() in version 1.9 for more details.
  • match_name: A regular expression. Only the files in an archive whose names match match_name are processed.
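
To illustrate what decompress_concatenated controls, here is a minimal Python sketch using only the standard library. It does not call the plugin or Commons Compress; the function names are hypothetical, and the sample data is made up for illustration. It shows the difference between reading every stream of a concatenated gzip input and stopping after the first one.

```python
import gzip
import zlib

# Two independent gzip streams written back to back form one concatenated input.
blob = gzip.compress(b"id,name\n1,a\n") + gzip.compress(b"2,b\n")


def read_all_streams(data: bytes) -> bytes:
    """Roughly the decompress_concatenated: true case: every stream is read."""
    return gzip.decompress(data)


def read_first_stream(data: bytes) -> bytes:
    """Roughly the decompress_concatenated: false case: stop after stream one."""
    decoder = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    first = decoder.decompress(data)
    # decoder.unused_data now holds the bytes of the remaining streams.
    return first


all_rows = read_all_streams(blob)
first_rows = read_first_stream(blob)
```

With the default setting, a consumer would see the rows of both streams; with the setting disabled, only the rows of the first stream.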

Formats

  • archive format: ar, cpio, jar, tar, zip
    • All files in an archive are processed by Embulk.
  • compress format: bzip2, deflate, gzip
    • The uncompressed file is processed by Embulk.
  • solid compression format: The format config parameter must be set explicitly.
    • tgz, tar.gz
    • tbz, tbz2, tb2, tar.bz2
    • taz, tz, tar.Z

Example

  • Use auto detection. This works for a single format such as tar or zip. To use a solid compression format such as tar.gz, set format in your configuration file.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
  • Set a file format such as tar explicitly.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
      format: tar
  • Set a solid compression format.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
      format: tgz
  • Set decompress_concatenated to false to read only the first stream of a concatenated gzip/bzip2 file.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
      decompress_concatenated: false
  • Set match_name to extract only the files whose suffix is '.csv' from an archive.
in:
  type: any input plugin type
  decoders:
    - type: commons-compress
      match_name: ".*\\.csv"
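
The match_name pattern above can be tried out with a short Python sketch. This is an illustration only: it assumes the pattern must match the whole entry name (as the leading `.*` in the example suggests), and select_entries and the entry names are hypothetical. Note that in a double-quoted YAML string, `\\.` yields the regular expression `\.`, a literal dot.

```python
import re


def select_entries(names, match_name):
    """Return the entry names that the pattern matches in full (assumed semantics)."""
    pattern = re.compile(match_name)
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical entry names inside an archive.
names = ["data/a.csv", "data/b.tsv", "README", "c.csv.bak"]

# The YAML value ".*\\.csv" corresponds to this regular expression.
selected = select_entries(names, r".*\.csv")
```

Under that assumption, only names ending in .csv are selected, so c.csv.bak is skipped.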

Build

$ ./gradlew gem

To build with the integration tests enabled (works on OS X or Linux):

$ ./gradlew -DenableIntegrationTest=true clean all

Versions

Version 0.6.0 or later of this plugin can be used with Embulk 0.10.
