Skip to content

An example of parent child indexing with JDBC importer

Jörg Prante edited this page Aug 3, 2016 · 1 revision

Let's assume you want to create a single index in Elasticsearch with two types, one for parent documents, one for children.

In your DB, you have the following simple tables:

create table parent_table (
   id integer,
   message varchar(32)
)

create table children_table (
   id integer,
   parent_id integer,
   message varchar(32) 
)

insert into parent_table values (1, "I am the parent")
insert into children_table value (1, 1, "I am child one")
insert into children_table value (2, 1, "I am child two")

commit

The two tables should be represented in Elasticsearch by a parent/child relationship in the index myindex.

Here are the commands. We are extra careful and delete myindex so we are sure we have an empty index.

curl -XDELETE 'localhost:9200/myindex'

We create the children mapping. It requires a pointer in _parent to the parent type.

curl -XPUT 'localhost:9200/myindex' -d '
{
    "mappings": {
        "mychildren" : {
            "_parent" : {
                "type" : "myparents"
            }
        }
    }
}'

Now we can run the JDBC importer for the parents

echo '
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select \"myindex\" as _index, \"myparents\" as _type, id as _id, message from parent_table",
        "elasticsearch" : {
             "cluster" : "elasticsearch",
             "host" : "localhost",
             "port" : 9300
        },
        "index" : "myindex",
        "type" : "myparents"
    }
}
' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter

Note the extra "index" and "type" parameters. There is a flaw in the JDBC importer that requires the definitions.

Now we can run the JDBC importer for the children

echo '
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select \"myindex\" as _index, \"mychildren\" as _type, id as _id, parent_id as _parent, message from children_table",
        "elasticsearch" : {
             "cluster" : "elasticsearch",
             "host" : "localhost",
             "port" : 9300
        },
        "index": "myindex",
        "type" : "mychildren"
    }
}
' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter

and we refresh the index finally with

curl -XGET 'localhost:9200/myindex/_refresh'

which is not required because JDBC importer refreshes already, but we are extra careful.

Let's search for the parent of the two children indexed:

curl -XPOST 'localhost:9200/myindex/_search?pretty' -d '
{
  "query": {
    "has_child": {
      "type": "mychildren",
      "query": {
        "match": {
          "_all": "child"
        }
      }
    }
  }
}'

And the expected result is

{"_shards":{"total":10,"successful":5,"failed":0}}{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "myparents",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "message" : "I am the parent"
      }
    } ]
  }
}