Index JSON files with Elasticsearch #3

waldoj · 2014-05-16T03:26:37Z

Once we have Crump atomizing all JSON data as per-record files, create a shell script to index these files with Elasticsearch.

waldoj · 2014-05-16T19:32:48Z

Alternatively, it may well be better to index these from the main JSON files. Now that we're storing one record per line, we can just iterate through the lines and post each one to Elasticsearch. It's tempting to do this with a shell script, but the bulk API may be better suited to the job, and the bulk data is best generated via Python or PHP rather than a shell script. (A shell script is capable of it, but let's be realistic.)

We'll want to put together a wrapper that chunks the requests, since I doubt the bulk API ought to have, say, 300MB of JSON submitted in a single HTTP request.
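
A rough sketch of what such a chunking wrapper might look like in Python, using only the standard library. Several things here are assumptions rather than anything settled in this issue: the function names `bulk_chunks` and `post_bulk`, the 5MB default chunk size, the `http://localhost:9200` URL, and the idea that each line of the file holds one complete JSON object, possibly followed by a trailing comma, with any bare `[` or `]` lines skipped.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def bulk_chunks(lines: Iterable[str], index: str,
                max_bytes: int = 5_000_000) -> Iterator[str]:
    """Yield bulk API request bodies of at most roughly max_bytes each.

    Assumes one JSON record per line; a trailing comma and bare
    array brackets are tolerated (an assumption about the file layout).
    """
    # Action line that the bulk API expects before every document.
    action = json.dumps({"index": {"_index": index}})
    chunk, size = [], 0
    for line in lines:
        record = line.strip().rstrip(",")
        if not record or record in ("[", "]"):
            continue
        # Parse and re-serialize so each document is valid, single-line JSON.
        document = json.dumps(json.loads(record))
        entry = action + "\n" + document + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Start a new chunk if this entry would push us past the limit.
        if chunk and size + entry_size > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(entry)
        size += entry_size
    if chunk:
        yield "".join(chunk)


def post_bulk(body: str, url: str = "http://localhost:9200/_bulk") -> bytes:
    """POST one chunk to the bulk endpoint (URL is a placeholder)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Usage would be something like opening a file and calling `post_bulk(chunk)` for each `chunk` in `bulk_chunks(handle, "records")`, where `records` is a made-up index name. A single record larger than `max_bytes` still goes out as its own oversized chunk, which seemed preferable to dropping it.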

I think it's entirely possible that this can be done through a simple modification of the JSON files to be indexed. Simply replace every ,\n with \n{Elasticsearch JSON data}\n. The file would still need to be chunked, but that's a pretty light lift.

Huh, on reflection, that could be done at the command line.
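
For illustration, the substitution described above could look roughly like this in Python; a command-line tool such as sed could do the equivalent. This is only a sketch under assumptions: the function name `to_bulk_body` and the index name in the usage note are hypothetical, the "Elasticsearch JSON data" is taken to be the bulk API's action line, and the file is assumed to be a JSON array with one record per line.

```python
import json


def to_bulk_body(text: str, index: str) -> str:
    """Turn one-record-per-line JSON text into a bulk API body
    by replacing each record separator with an action line."""
    # Action line inserted before every record.
    action = json.dumps({"index": {"_index": index}})
    body = text.strip()
    # Drop the enclosing array brackets, if the file has them (assumption).
    if body.startswith("["):
        body = body[1:]
    if body.endswith("]"):
        body = body[:-1]
    body = body.strip()
    if not body:
        return ""
    # Each ",\n" between records becomes "\n<action>\n"; the first
    # record still needs its own action line prepended.
    return action + "\n" + body.replace(",\n", "\n" + action + "\n") + "\n"
```

Calling `to_bulk_body(open(path).read(), "records")` would give a body ready to be chunked. Because this works on raw text, it relies on every record really sitting on a single line.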

waldoj added the enhancement label May 16, 2014

This was referenced May 16, 2014

Create a search interface #4

Closed

Create a browse interface #5

Closed

waldoj closed this as completed Aug 4, 2014
