1.9M04 - LuceneBatchInserterIndexProvider deletes all other indices not used in batch #527

Closed
jguhlin opened this Issue Feb 15, 2013 · 6 comments

jguhlin commented Feb 15, 2013

If you do not call .nodeIndex on every existing index, then calling .shutdown on the index provider deletes all the indices that were not touched. For example, if I have three indices, "one", "two", and "three", and do another batch insert on "one" and "three", the .shutdown will remove "two" even if it had valid data in it.

Let me know what else I can provide to help.

jguhlin commented Mar 16, 2013

This is still present in 1.9M05. However, I've noticed that the indexes are in a "not created" state but the files are there, and creating the index from the web interface preserves all data in the index, which is useful.

I can post code if it would help, but it's all in Clojure. The problem occurs because I am entering batch mode several times for each iteration. I'm aware this isn't how it was designed, but the fact that it works is incredibly useful: it lets me write a program that can load additional data after a period of live production use, without having to re-create the entire database from scratch.

Member
jexp commented Mar 16, 2013

Thanks for reporting the issue and a workaround.

@jguhlin Your project sounds intriguing, especially in Clojure. Any chance you'll publish something about it?

jguhlin commented Mar 16, 2013

@jexp Yes, hopefully within a few months I'll be submitting it for publication in a scientific journal. I'm hoping to write about it before then as well but haven't really done a blog or anything. It will be open source when it is released to the wild as well.

I'm using Clojure for concurrency and data handling. The project is an as-yet-unnamed genomics database that pulls data in from many sources and gives them relationships (one of the many strengths of Neo4j).

Member
jexp commented Mar 17, 2013

Awesome, love it! Looking forward to your results. If you need any help, ping us on the Google group.

jennifersmith commented

Hi there,

We too ran into this issue. Is there any chance you could post the workaround or a link to it? Our indexes are dynamic so we can't easily bring them all up for each run of the batch inserter!

thanks

Jen

jguhlin commented Mar 23, 2013

@jennifersmith I can post the workaround, but it's in Clojure rather than Java. The method I'm using is to recreate the indexes from the LuceneBatchInserterIndexProvider by calling .nodeIndex on each one with the same configuration, before doing any of the real work.

My indexes are also dynamic, but I am able to derive the name of each from a global config file. You may have to do something similar, or write the names out to a file and read that file back after recreating the BatchInserter for each step.
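
A minimal Java sketch of the "write them all out to a file" idea, for anyone not using Clojure. IndexRegistry, its file format, and the opener callback are hypothetical names made up for illustration and are not part of Neo4j. The sketch assumes each index can be described by a single "type" setting; a real configuration may need more keys.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Neo4j: remembers index names and types
// between batch runs in a plain text file, one "name=type" pair per line.
final class IndexRegistry {
    private IndexRegistry() {}

    // Write every known index name and its type to the registry file.
    static void save(Path file, Map<String, String> indexTypes) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : indexTypes.entrySet()) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Read the registry back; a missing file means no indexes exist yet.
    static Map<String, String> load(Path file) throws IOException {
        Map<String, String> indexTypes = new LinkedHashMap<>();
        if (!Files.exists(file)) {
            return indexTypes;
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int eq = line.lastIndexOf('=');
            if (eq > 0) {
                indexTypes.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return indexTypes;
    }

    // Touch every recorded index before doing any real work. The opener is an
    // assumption: it stands in for the provider's nodeIndex(name, config) call.
    static void reopenAll(Map<String, String> indexTypes,
                          BiConsumer<String, Map<String, String>> opener) {
        for (Map.Entry<String, String> e : indexTypes.entrySet()) {
            Map<String, String> config = new LinkedHashMap<>();
            config.put("type", e.getValue());
            opener.accept(e.getKey(), config);
        }
    }
}
```

The intended use would be to call load and reopenAll right after creating the index provider, passing something like indexProvider::nodeIndex as the opener, and to call save before shutting down whenever a new index has been added.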

Let me know if that explains it a bit better.

Hopefully in the future the Batch Inserter Index Provider will be able to tell us which indexes already exist.

Here is the relevant code: https://gist.github.com/jguhlin/aee8376ed487d361b96c

Best,
--Joseph

tinwelint added a commit that closed this issue Apr 4, 2013

Fixes issue where batch insertion on existing db would delete indexes

When doing batch insertion on an existing database containing indexes,
those indexes, or rather the configuration for them (as well as the
existence of them from the dbs POV) would be deleted if not used in this
second batch insertion. I.e. any index configuration would be overwritten
with whatever this second batch insertion did.

Problem being that the existing index configuration wasn't read (the call
to indexStore.start()) so when shutting down in the end and writing the
index configuration, any existing would be lost.

Fixes #527
e638ce4
tinwelint closed this in e638ce4 Apr 4, 2013
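
The root cause described in the commit message (existing configuration never read on startup, then overwritten on shutdown) can be illustrated with a toy model. This is only a sketch: ToyIndexStore is a hypothetical class, not Neo4j's actual index store, and the "disk" map stands in for the persisted index configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bug, not Neo4j code: "disk" is the persisted index
// configuration and "memory" is what one batch session knows about.
final class ToyIndexStore {
    private final Map<String, String> disk;
    private final Map<String, String> memory = new HashMap<>();

    ToyIndexStore(Map<String, String> disk) {
        this.disk = disk;
    }

    // The step the commit message says was missing: read existing configuration.
    void start() {
        memory.putAll(disk);
    }

    // Corresponds to touching an index during the batch session.
    void nodeIndex(String name, String type) {
        memory.put(name, type);
    }

    // Persist whatever this session knows, replacing what was stored before.
    void shutdown() {
        disk.clear();
        disk.putAll(memory);
    }
}
```

In this model, a session that skips start() and only touches "one" and "three" leaves no entry for "two" after shutdown(), which matches the behaviour reported at the top of the thread. Calling start() first keeps "two" in place.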