This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

koooge/embulk-parser-csv_guessable


Guessable csv parser plugin for Embulk

embulk-parser-csv_guessable guesses the schema at runtime and parses CSV files that have a schema in the header.

A CSV file sometimes has its schema in the header. embulk-parser-csv_guessable parses such a file by using the header fields as column names. This plugin is useful when the schema of the target CSV changes frequently.

It behaves like the original csv parser when the embulk-parser-csv_guessable options (schema_file and schema_line) are not defined.

Overview

  • Plugin type: parser
  • Guess supported: no

Prerequisites

  • java: 1.8+
  • embulk: 0.9+

Configuration

  • schema_file: Name of the file that contains the schema. (string, default: null)
  • schema_line: Line number of the header line that holds the schema. (integer, default: 1)
  • columns: Column attributes used for parsing. embulk-parser-csv_guessable uses the attributes below only when schema_file is set. If schema_file isn't set, this is the same as the original csv parser's columns. (hash, default: null)
    • value_name: Name of the column as it appears in the header. The column is renamed to name.
    • name: Name of the column
    • type: Type of the column
    • format: Format of the timestamp if type is timestamp
    • date: Set date part if the format doesn't include date part
  • any other csv configs: see www.embulk.org

Example

test.csv (the schema is on the first line)

id, title, description
1, awesome-title, awesome-description
2, shoddy-title, shoddy-description

config.yml

in:
  type: any file input plugin type
  parser:
    type: csv_guessable
    schema_file: test.csv
    schema_line: 1

(For comparison) The equivalent config.yml for the original csv parser:

in:
  type: any file input plugin type
  parser:
    type: csv
    skip_header_lines: 1
    columns:
    - {name: id, type: string}
    - {name: title, type: string}
    - {name: description, type: string}
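
The equivalence between the two configs above can be sketched in a few lines of Python. This is only an illustration, not the plugin's actual (Java) implementation: the function name `guess_columns` is hypothetical, and it assumes that every guessed column defaults to type string (as in the comparison config) and that whitespace around header fields is trimmed.

```python
import csv
import io


def guess_columns(csv_text, schema_line=1):
    """Build csv-parser style column definitions from a header line.

    Hypothetical helper for illustration only. Assumes every guessed
    column is a string and that surrounding whitespace is trimmed.
    """
    lines = csv_text.splitlines()
    header = lines[schema_line - 1]  # schema_line is 1-based
    # Parse the single header line with the standard csv module.
    names = next(csv.reader(io.StringIO(header), skipinitialspace=True))
    return [{"name": name.strip(), "type": "string"} for name in names]


sample = (
    "id, title, description\n"
    "1, awesome-title, awesome-description\n"
    "2, shoddy-title, shoddy-description\n"
)

for column in guess_columns(sample, schema_line=1):
    print(column)
```

For the sample test.csv, this yields the same three string columns (id, title, description) that the original csv parser config lists by hand.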

Example2

Example of renaming columns and setting their types

in:
  type: any file input plugin type
  parser:
    type: csv_guessable
    schema_file: test.csv
    schema_line: 1
    columns:
    - {value_name: 'id', name: 'number', type: long}
    - {value_name: 'title', name: 'description', type: string}
    - {value_name: 'status', name: 'ok?', type: string}
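
How the columns entries map header names to output columns can be sketched as follows. This is a minimal illustration rather than the plugin's real logic: `apply_overrides` is a hypothetical name, the header list is made up for the example, and the sketch assumes that header columns without a matching value_name keep their name and default to string.

```python
def apply_overrides(header_names, columns):
    """Rename and retype header columns using value_name entries.

    Hypothetical helper for illustration only. Assumes unmatched header
    columns keep their header name and the string type.
    """
    by_value_name = {entry["value_name"]: entry for entry in columns}
    result = []
    for header in header_names:
        override = by_value_name.get(header, {})
        result.append({
            "name": override.get("name", header),
            "type": override.get("type", "string"),
        })
    return result


# Illustrative header; the columns entries mirror the config above.
header_names = ["id", "title", "status", "note"]
columns = [
    {"value_name": "id", "name": "number", "type": "long"},
    {"value_name": "title", "name": "description", "type": "string"},
    {"value_name": "status", "name": "ok?", "type": "string"},
]

for column in apply_overrides(header_names, columns):
    print(column)
```

In this sketch, id becomes a long column named number, title and status are renamed, and the unlisted note column passes through unchanged as a string.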

Install

$ embulk gem install embulk-parser-csv_guessable

Sample

$ cd samples/sample2
$ embulk run -L ../../ config_rename.yml -l debug

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously

Test

$ ./gradlew test