Alternatively, it may well be better to index these from the main JSON files. Now that we're representing records as one record per line, we can just iterate through each line and post it to Elasticsearch. It's tempting to do that with a shell script, but the bulk API may be better suited to the job. The bulk data itself is best generated via Python or PHP, rather than a shell script. (A shell script is capable of it, but let's be realistic.)
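For reference, whatever ends up generating it, the payload that the `_bulk` endpoint wants is newline-delimited: an action line of metadata, then the record itself, one pair per record. Something like this, maybe (the index/type names and the sample fields here are just guesses):

```sh
# A two-record payload in bulk format: each record is an action line followed
# by the document itself. Index/type names and fields are placeholders.
cat > bulk-sample.json <<'EOF'
{"index":{"_index":"crump","_type":"record"}}
{"name":"Example Corp","id":"F0000001"}
{"index":{"_index":"crump","_type":"record"}}
{"name":"Another LLC","id":"F0000002"}
EOF

# Post it; --data-binary preserves the newlines that the bulk API requires.
curl -s -XPOST 'http://localhost:9200/_bulk' \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @bulk-sample.json
```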
We'll want to put together a wrapper that chunks the requests, since I'm dubious that the bulk API ought to have, say, 300 MB of JSON submitted in a single HTTP request.
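One rough way to sketch that wrapper, assuming a bulk-formatted file already exists (filenames and chunk size below are guesses):

```sh
#!/bin/bash
# Hypothetical chunking wrapper: split the bulk file on line boundaries, using
# an even line count so each action line stays with its document, then post
# each piece to the bulk API in turn.
split -l 20000 bulk-data.json chunk-

for chunk in chunk-*; do
    echo "Posting ${chunk}…"
    curl -s -XPOST 'http://localhost:9200/_bulk' \
         -H 'Content-Type: application/x-ndjson' \
         --data-binary @"$chunk" > /dev/null
done
```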
I think it's entirely possible that this can be done through a simple modification of the JSON files to be indexed. Simply replace every `,\n` with `\n{Elasticsearch bulk action metadata}\n`. The file would still need to be chunked, but that's a pretty light lift.
Huh, on reflection, that could be done at the command line.
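Not quite the literal `,\n` substitution, but the same effect; a sketch assuming each main JSON file is an array with one record per line, with the index/type in the action line and the filenames as placeholders:

```sh
ACTION='{"index":{"_index":"crump","_type":"record"}}'

# Skip the array's opening and closing brackets, strip each record's trailing
# comma, and print the bulk action line ahead of every record.
awk -v action="$ACTION" '
    /^\[$/ || /^\]$/ { next }
    { sub(/,$/, ""); print action; print }
' business.json > bulk-data.json
```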
Once we have Crump atomizing all JSON data as per-record files, create a shell script to index these files with Elasticsearch.
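A minimal sketch of that script, assuming the per-record files land in a single directory and that each filename (minus the extension) can serve as the document ID; the directory, index, and type names are placeholders:

```sh
#!/bin/bash
# Index every per-record JSON file as its own Elasticsearch document,
# using the filename (minus the extension) as the document ID.
for record in records/*.json; do
    id=$(basename "$record" .json)
    curl -s -XPUT "http://localhost:9200/crump/record/${id}" \
         -H 'Content-Type: application/json' \
         --data-binary @"$record" > /dev/null
done
```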