Easy way to bulk index a JSON Document? #401

Closed
samechikson opened this issue May 10, 2016 · 11 comments

Comments

@samechikson

Hi, is there an easy way to bulk index a normal JSON document, rather than formatting the data to look like this:

[
    // action description
    { index:  { _index: 'myindex', _type: 'mytype', _id: 1 } },
    // the document to index
    { title: 'foo' },
    // action description
    { update: { _index: 'myindex', _type: 'mytype', _id: 2 } },
    // the document to update
    { doc: { title: 'foo' } },
    ...
]

as shown in the elasticsearch-js API docs?

So basically I'm trying to omit the action description object, so my body would look like

[ 
   { data_source_1 },
   { data_source_2 },
   ...
]

I don't mind the _id being auto generated (unless that has some negative consequences that I'm not aware of). I thought the bulk method also takes in the index and type in the parameters so it seems redundant to specify those for each document if you're just indexing one type in one index. I guess you do still need the action verb, but it would be nice if you could specify that in the parameters as well.

I feel like a lot of JSON data out there doesn't have the action description object, so it would have to be added afterwards, which seems like a painful task. What are people doing when they have to index lots of documents at once?

@spalger
Contributor

spalger commented May 10, 2016

The bulk API (which you linked above) is the only way to index multiple documents in a single request. The index API, however, indexes a single document at a time and doesn't require the action description objects.
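
For readers weighing the one-document-at-a-time route, here is a minimal sketch of what it might look like. It assumes `client` is a legacy elasticsearch-js client that returns promises when no callback is passed; `indexEach` is a hypothetical helper name, not part of the library.

```javascript
// Sketch only: `client` is assumed to be a legacy elasticsearch-js client,
// e.g. new (require('elasticsearch').Client)({ host: 'localhost:9200' }).
// indexEach is a hypothetical helper, not a library function.
async function indexEach(client, index, type, docs) {
  const results = [];
  for (const doc of docs) {
    // No id is passed, so Elasticsearch generates one for each document.
    results.push(await client.index({ index, type, body: doc }));
  }
  return results;
}
```

This issues one HTTP request per document, so it is likely to be noticeably slower than a single bulk request for large inputs.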

@spalger spalger closed this as completed May 10, 2016
@spalger
Contributor

spalger commented May 10, 2016

Missed the last paragraph of your issue, but there is no bulk API that is only for indexing. There are some extensions for the client though; maybe someone has written one that will index all of the documents that flow through a stream? Or maybe you should give it a shot 😸

@spalger
Contributor

spalger commented May 10, 2016

Finally, you can omit the _index, _type, and _id and simply add a { index: {} } before each document if you prefer. The index and type can be set outside of the body and the id will be autogenerated.
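
A minimal sketch of that approach, assuming a legacy elasticsearch-js `client` instance; `withIndexActions` and `bulkIndex` are hypothetical helper names chosen for illustration.

```javascript
// Interleave an empty index action before every document.
// withIndexActions is a hypothetical helper, not a library function.
function withIndexActions(docs) {
  const body = [];
  for (const doc of docs) {
    body.push({ index: {} }); // empty action: index and type come from the request params
    body.push(doc);
  }
  return body;
}

// Assumes `client` is a legacy elasticsearch-js client instance.
// The index and type are set once, outside the body; ids are auto-generated.
function bulkIndex(client, index, type, docs) {
  return client.bulk({ index, type, body: withIndexActions(docs) });
}
```

A call might look like `bulkIndex(client, 'myindex', 'mytype', [{ title: 'foo' }, { title: 'bar' }])`. It is worth checking the `errors` flag in the bulk response, since individual items can fail even when the request as a whole succeeds.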

@samechikson
Author

Thanks for your quick answer @spalger! Perhaps this can do the job I want. If not, then I might try something that's more specific to my needs :)

@samechikson
Author

Just for reference, created a small utility for myself to bulk index documents without the action description item here. It just inserts { index: {} } before each entry.

@spalger
Contributor

spalger commented May 12, 2016

Awesome @samechikson, plan to publish it to npm? I'd be happy to link to it from the docs if you're interested.

@samechikson
Author

Hey @spalger, yeah absolutely. Thanks! Here's the link

@adilld

adilld commented Mar 9, 2017

Hi @spalger and @samechikson
Can you please tell me how to use this plugin? I've already installed it (npm es-json-load), but I don't quite understand how to use it to generate a JSON file that corresponds to the bulk API format!
Thank you guys!

@samechikson
Author

Hi @adilld, es-json-load loads a normal JSON array directly into Elasticsearch. It does not generate a file that you can then use with the bulk API, which is what you seem to want to do.

In terms of usage, if you install it globally from npm (npm install es-json-load -g), you should be able to use it like any other command-line binary. Something like this:

es-json-load --data --file=/absolute/path/to/file --index=<index-name> --type=<type-name>

Make sense?

@adilld

adilld commented Mar 9, 2017

@samechikson Thank you for your quick response.
However, I want to index the data into an online cluster on AWS.
What I'm planning to do is transform a CSV file into JSON in order to insert the data via the bulk API. What would you advise me to do to achieve this?
I can't do it manually (because there are a lot of lines in the CSV file), so I can transform the file from CSV to (pretty-printed) JSON, but it's not in the correct form for the bulk API, if you know what I mean!
Thanks in advance for your help!

@samechikson
Author

@adilld Sounds like you're trying to pipe data into ES, so consider using Logstash, which is a dedicated tool written by Elastic for this. You can use CSV files directly.

If you do use es-json-load, transforming the CSV file into a JSON file should work. Just make sure that the JSON file is a JSON array of objects, which would look something like [{},{},{},...]
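
If the goal really is a file in bulk API format, the conversion could be sketched roughly as below. This is an illustration only: `csvToObjects` and `toBulkNdjson` are hypothetical helper names, and the CSV parsing is deliberately naive (header row, comma separators, no quoted fields), so a proper CSV parser is advisable for real data.

```javascript
// Naive CSV parsing: assumes a header row, comma separators, and no quoted
// fields or embedded commas/newlines. All values are kept as strings.
function csvToObjects(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const headers = lines[0].split(',').map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',');
    const row = {};
    headers.forEach((header, i) => {
      row[header] = (cells[i] || '').trim();
    });
    return row;
  });
}

// Bulk format: one action line, then one document line, each ending in a newline.
function toBulkNdjson(docs) {
  return docs
    .map((doc) => JSON.stringify({ index: {} }) + '\n' + JSON.stringify(doc) + '\n')
    .join('');
}
```

The resulting text could then be written to a file and sent to the cluster's `_bulk` endpoint with the index and type in the URL, since the action lines here leave them out.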
