hiroyuki-sato/embulk-parser-jsonpath


Jsonpath parser plugin for Embulk

A JSON parser plugin for Embulk that uses JSONPath to select the records and column values to load.

Overview

  • Plugin type: parser
  • Guess supported: yes (JSON data up to 32KB in size is supported. #476)

Configuration

  • type: Specify this parser as jsonpath
  • columns: Specify column name and type. See below (array, required)
  • root: JSONPath of the data to load. The path must resolve to an array (string, default: '$') (detail)
  • stop_on_invalid_record: Stop the bulk load transaction if a file includes an invalid record, such as an invalid timestamp (boolean, default: false)
  • default_timezone: Default timezone of the timestamp (string, default: UTC)
  • default_timestamp_format: Default timestamp format of the timestamp (string, default: %Y-%m-%d %H:%M:%S.%N %z)
  • default_typecast: Specify whether to cast values automatically to the specified types or not (boolean, default: true)
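
As a rough sketch of how the parser-level options above fit together, the following configuration sets each of them explicitly. The input path and the two columns are borrowed from the examples in this README purely for illustration, and the option values are arbitrary choices rather than recommendations.

```yaml
in:
  type: file
  path_prefix: example/input          # illustrative path, borrowed from the Guess example
  parser:
    type: jsonpath
    root: "$.results"                 # must resolve to an array
    stop_on_invalid_record: true      # default: false
    default_timezone: "Asia/Tokyo"    # default: UTC
    default_timestamp_format: "%Y-%m-%d %H:%M:%S"
    default_typecast: true            # same as the default
    columns:
      - { name: "name",          type: string }
      - { name: "registered_at", type: timestamp }
out:
  type: stdout
```

Here registered_at sets neither timezone nor format, so it is expected to fall back to the two default_* values given at the parser level.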

columns

  • name: Name of the column (string, required)
  • type: Type of the column (string, required)
  • timezone: Timezone of the timestamp if type is timestamp (string, default: default_timezone)
  • format: Format of the timestamp if type is timestamp (string, default: default_timestamp_format)
  • typecast: Whether to cast values to the column type or not (boolean, default: default_typecast)
  • path: JSONPath for the specific column (string, default: null)
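
The per-column options can be sketched as follows. This is only an illustrative fragment: the first_name column and the record shape it implies are hypothetical, and the comments describe the behavior suggested by the option descriptions above rather than verified plugin internals.

```yaml
columns:
  # Overrides the parser-level timezone and timestamp format for this column only.
  - { name: "registered_at", type: timestamp, timezone: "Asia/Tokyo", format: "%Y-%m-%d %H:%M:%S" }
  # Turns off automatic casting for this column, regardless of default_typecast.
  - { name: "zip_code", type: string, typecast: false }
  # Reads the value from a JSONPath (hypothetical field layout) rather than from a key named after the column.
  - { name: "first_name", type: string, path: "names[0]" }
```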

Example

Basic Usage

{
  "count": 100,
  "page": 1,
  "results": [
    {
      "name": "Hugh Rutherford",
      "city": "Mitchellfurt",
      "street_name": "Ondricka Island",
      "zip_code": "75232",
      "registered_at": "2015-09-09 05:28:45",
      "vegetarian": true,
      "age": 44,
      "ratio": 79.092
    },
    {
      "name": "Miss Carmella Bashirian",
      "city": "Madilynchester",
      "street_name": "Rhea Walks",
      "zip_code": "44398",
      "registered_at": "2014-07-01 04:25:27",
      "vegetarian": true,
      "age": 73,
      "ratio": 50.608
    }]
}

in:
  type: any file input plugin type
  parser:
    type: jsonpath
    root: "$.results"
    default_timezone: "Asia/Tokyo"
    columns:
      - { name: "name",          type: string }
      - { name: "city",          type: string }
      - { name: "street_name",   type: string }
      - { name: "zip_code",      type: string }
      - { name: "registered_at", type: timestamp, format: "%Y-%m-%d %H:%M:%S" }
      - { name: "vegetarian",    type: boolean }
      - { name: "age",           type: long }
      - { name: "ratio",         type: double }

Preview results

*************************** 1 ***************************
         name (   string) : Hugh Rutherford
         city (   string) : Mitchellfurt
  street_name (   string) : Ondricka Island
     zip_code (   string) : 75232
registered_at (timestamp) : 2015-09-08 20:28:45 UTC
   vegetarian (  boolean) : true
          age (     long) : 44
        ratio (   double) : 79.092
*************************** 2 ***************************
         name (   string) : Miss Carmella Bashirian
         city (   string) : Madilynchester
  street_name (   string) : Rhea Walks
     zip_code (   string) : 44398
registered_at (timestamp) : 2014-06-30 19:25:27 UTC
   vegetarian (  boolean) : true
          age (     long) : 73
        ratio (   double) : 50.608

Handle more complicated JSON

To handle more complicated JSON, you can also specify a JSONPath in the path option of each entry in the columns section, as follows:

{
    "result" : "success",
    "students" : [
      { "names" : ["John", "Lennon"], "age" : 10 },
      { "names" : ["Paul", "Maccartney"], "age" : 10 }
    ]
}

root: $.students
columns:
  - {name: firstName, type: string, path: "names[0]"}
  - {name: lastName, type: string, path: "names[1]"}

In this case, names[0] becomes the firstName column of the schema and names[1] becomes the lastName column.
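
For context, the fragment above might sit inside a complete configuration like the sketch below. The path_prefix value is a hypothetical file location, and the age column is added only to show a column without a path next to columns that have one.

```yaml
in:
  type: file
  path_prefix: example/students   # hypothetical input location
  parser:
    type: jsonpath
    root: "$.students"            # each element of this array is one record
    columns:
      - { name: firstName, type: string, path: "names[0]" }
      - { name: lastName,  type: string, path: "names[1]" }
      - { name: age,       type: long }   # no path given
out:
  type: stdout
```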

Guess

This plugin supports a minimal guess feature, so you don't have to write the parser: section of the configuration file by hand. After writing the in: section, you can let Embulk guess the parser: section with these commands:

$ embulk gem install embulk-parser-jsonpath
$ embulk guess -g jsonpath config.yml -o guessed.yml

Example

To guess a JSON file like the following, whose top level is an array, you don't need a parser section at all.

[
  {
    "name": "Hugh Rutherford",
    "city": "Mitchellfurt",
    "street_name": "Ondricka Island",
    "zip_code": "75232",
    "registered_at": "2015-09-09 05:28:45",
    "vegetarian": true,
    "age": 44,
    "ratio": 79.092
  }
]

in:
  type: file
  path_prefix: example/hoge
out:
  type: stdout

However, if the JSON data doesn't start with an array, you have to specify the root parameter explicitly.

{
  "count": 100,
  "page": 1,
  "results": [
    {
      "name": "Hugh Rutherford",
      "city": "Mitchellfurt",
      "street_name": "Ondricka Island",
      "zip_code": "75232",
      "registered_at": "2015-09-09 05:28:45",
      "vegetarian": true,
      "age": 44,
      "ratio": 79.092
    }
  ]
}

in:
  type: file
  path_prefix: example/input
  parser:
    type: jsonpath
    root: "$.results"
out:
  type: stdout

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously

Acknowledgment

I would like to express my special thanks to the developers of the embulk-parser-jsonl and embulk-filter-typecast projects.

Most of the code was copied from these projects.
