[bug 718826, 715932] Make ES indexing less sucky
* The ES connection already has code for forcing bulk, so we don't need to repeat that. This changes the code to push the setting to ES.
* This also tweaks the estimation code so that it shows minutes and seconds and shows the total delta at the end. Now I can stop running "time ./manage.py esreindex".
* Fix esreindex so that you can specify doctypes. This creates/deletes only the indexes for the doctypes you specified, so indexes you don't want deleted are left alone.
* Adds basic handling for bad data. This does a log.exception, but we really should log more than that and/or make it more obvious to developers that there's bad data out there. In the meantime, this allows us to continue indexing.
* Reduced memory usage of indexing by iterating over ids. Now it runs on my laptop.
* Changes _get_index() to get_es_index(). We use it so often it might as well be part of the "public API".
* Fixed create/delete indexes so that switching doctypes to their own index is now just a change in settings; no code changes needed.
* Fix the DEBUG = True case by resetting queries.
* This also adds a bunch of helpful comments, moves reindex_model to SearchMixin.index_all, and has some other cosmetic code cleanup.

End result of this is that indexing doesn't die if it hits bad data, indexing takes much less memory to run, you can specify specific doctypes to index at the command line, and the code is better.
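
For illustration only, here is a minimal sketch of the pattern the bullets describe: walk a list of ids in chunks, log and skip bad records instead of dying, and report elapsed time as minutes and seconds. It is not the code from this commit. The names `chunked`, `format_time`, `fetch`, and `index_doc` are hypothetical, and the chunk size is an arbitrary assumption.

```python
import logging

log = logging.getLogger(__name__)


def chunked(ids, n):
    """Yield successive lists of at most n ids (ids must be a list)."""
    for i in range(0, len(ids), n):
        yield ids[i:i + n]


def format_time(seconds):
    """Render a duration as minutes and seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return "%dm %ds" % (minutes, secs)


def index_all(ids, fetch, index_doc, chunk_size=1000):
    """Index objects chunk by chunk and return the ones that failed.

    fetch(chunk) is a hypothetical callable that loads the objects for
    one chunk of ids; index_doc(obj) is a hypothetical callable that
    sends one document to the index. Only one chunk of objects is held
    in memory at a time.
    """
    failed = []
    for chunk in chunked(ids, chunk_size):
        for obj in fetch(chunk):
            try:
                index_doc(obj)
            except Exception:
                # Bad data: record it and keep going rather than abort.
                log.exception("Could not index %r", obj)
                failed.append(obj)
    return failed
```

In a Django setting, `fetch` might be something like a filter on `id__in=chunk`, and with DEBUG = True one would also call `django.db.reset_queries()` after each chunk so the per-connection query log does not grow without bound.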
Showing with 168 additions and 72 deletions.