CSV with header parser plugin for Embulk
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.circleci
config/checkstyle
gradle/wrapper
lib/embulk
samples
src
.gitignore
LICENSE.txt
README.md
build.gradle
gradlew
gradlew.bat

README.md

Gem Version CircleCI

Guessable csv parser plugin for Embulk

embulk-parser-csv_guessable (runtime)guesses and parses csv which has schema in header.

Csv file sometimes has a schema in the header. embulk-parser-csv_guessable parses such a csv by using their header as column name. This plugin is useful in case of target csv schema changes frequently.

It behaves as original csv parser when embulk-parser-csv_guessable configs(schema_file and schema_line) is not defined.

Overview

  • Plugin type: parser
  • Guess supported: no

Prerequisites

  • java: 1.8+
  • embulk: 0.9+

Configuration

  • schema_file: filename which has schema.(string, default: null)
  • schema_line: schema line in header. (integer default: 1)
  • columns: Columns attributes for parse. embulk-parser-csv_guessable use this config only when schema_file is set. If "schema_file" isn't set, this is same as the original csv parser's columns. (hash, default: null)
    • value_name: Name of the column in the header. rename to name
    • name: Name of the column
    • type: Type of the column
    • format: Format of the timestamp if type is timestamp
    • date: Set date part if the format doesn't include date part
  • any other csv configs: see www.embulk.org

Example

test.csv (There is a schema at the first line.)

id, title, description
1, awesome-title, awesome-description
2, shoddy-title, shoddy-description

config.yml

in:
  type: any file input plugin type
  parser:
    type: csv_guessable
    schema_file: test.csv
    schema_line: 1

(For explain) In case original csv parser config.yml

in:
  type: any file input plugin type
  parser:
    type: csv
    skip_header_lines: 1
    columns:
    - {name: id, type: string}
    - {name: title, type: string}
    - {name: description, type: string}

Example2

rename column name and set type Example

in:
  type: any file input plugin type
  parser:
    type: csv_guessable
    schema_file test.csv
    schema_line: 1
    columns:
    - {value_name: 'id', name: 'number', type: long}
    - {value_name: 'title', name: 'description', type: string}
    - {value_name: 'status', name: 'ok?', type: string}
$ embulk gem install embulk-parser-csv_guessable

Sample

$ cd samples/sample2
$ embulk run -L ../../ config_rename.yml -l debug

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously

Test

$ ./gradlew test