Allow batch importing to already existing database. #27

Closed
maxdemarzi opened this Issue Mar 21, 2013 · 6 comments

5 participants

@maxdemarzi

We're always getting requests for this. Maybe a way to specify the node id and rel id that the import should start from.

@stephanf
@jexp jexp closed this Jul 5, 2013
@robinloxley1

May I know how this issue has been solved?

@jexp
Owner
jexp commented Jul 26, 2013

With a config option; see the readme.

@aroyc
aroyc commented Jan 28, 2014

Hey Michael, I don't see any option in the documentation to keep nodes unique.
For example, if I set

batch_import.keep_db=true
and run sample/import.sh twice, nodes and rels with the same properties get created twice:

neo4j-sh (?)$ MATCH (a)-[r]->(b) RETURN a,b LIMIT 25;

+-------------------------------------------------------------------------------------+
| a                                                 | b                               |
+-------------------------------------------------------------------------------------+
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[1]{age:"14",name:"Selina"} |
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[2]{age:"6",name:"Rana"}    |
| Node[0]{works_on:"neo4j",age:"37",name:"Michael"} | Node[3]{age:"4",name:"Selma"}   |
| Node[1]{age:"14",name:"Selina"}                   | Node[2]{age:"6",name:"Rana"}    |
| Node[2]{age:"6",name:"Rana"}                      | Node[3]{age:"4",name:"Selma"}   |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[5]{age:"14",name:"Selina"} |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[6]{age:"6",name:"Rana"}    |
| Node[4]{works_on:"neo4j",age:"37",name:"Michael"} | Node[7]{age:"4",name:"Selma"}   |
| Node[5]{age:"14",name:"Selina"}                   | Node[6]{age:"6",name:"Rana"}    |
| Node[6]{age:"6",name:"Rana"}                      | Node[7]{age:"4",name:"Selma"}   |
+-------------------------------------------------------------------------------------+

I'd like to know which option to set in batch.properties so that nodes with the same properties don't get created twice.
In a nutshell, my question is: how can I use batch insert to make sure the same nodes/rels won't be created twice?

Thanks in advance !

@jexp
Owner
jexp commented Jan 28, 2014

Batch insertion is not about creating unique nodes, sorry. So far that has not been a focus because it would also reduce performance.

The only thing out of the box that I can think of is to control the node ids externally (with id:id as the first column) and then reuse the same externally driven ids on later imports.
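
A minimal sketch of what "externally driven ids" could look like, in Python. It is only an illustration, not part of the importer: `assign_ids`, `write_nodes`, the choice of `name` as the identifying key, and the tab-separated `name`/`age` columns are all assumptions made for the example. The only detail taken from this thread is that `id:id` is the first column. How the importer treats an id it has already seen is not covered here.

```python
import csv
import io


def assign_ids(keys, id_map):
    """Give each key a stable numeric id, reusing ids already in id_map.

    id_map is a plain dict (hypothetical); persist it between runs,
    e.g. as JSON, so that a later import sees the same ids.
    """
    next_id = max(id_map.values(), default=-1) + 1
    for key in keys:
        if key not in id_map:
            id_map[key] = next_id
            next_id += 1
    return id_map


def write_nodes(rows, id_map, out):
    """Write rows as tab-separated text with id:id as the first column.

    The name/age columns and the tab delimiter are assumptions.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id:id", "name", "age"])
    for row in rows:
        writer.writerow([id_map[row["name"]], row["name"], row["age"]])


id_map = {}

# First import: Michael and Selina get fresh ids.
first_run = [{"name": "Michael", "age": "37"}, {"name": "Selina", "age": "14"}]
assign_ids([r["name"] for r in first_run], id_map)

# Second import: Selina keeps her id, only Rana gets a new one.
second_run = [{"name": "Selina", "age": "14"}, {"name": "Rana", "age": "6"}]
assign_ids([r["name"] for r in second_run], id_map)

buf = io.StringIO()
write_nodes(second_run, id_map, buf)
```

The point of the design is that the id comes from your own data (here the name) rather than from the order of insertion, so a repeated row maps to the same id on every run.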

If you start doing index lookups during batch insertion, your performance will drop a lot.
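
One way to sidestep lookups during insertion, sketched in Python under stated assumptions, is to filter the input before it reaches the importer. This is not a feature of the batch importer and not something the readme describes. `filter_new_rows`, the `seen` set and the choice of key fields are hypothetical names for the example.

```python
def filter_new_rows(rows, seen, key_fields):
    """Return only rows whose key is not in seen, and record their keys.

    seen is a set of tuples (hypothetical); persist it between runs
    if the imports happen in separate processes.
    """
    fresh = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            fresh.append(row)
    return fresh


batch = [
    {"name": "Michael", "age": "37"},
    {"name": "Selina", "age": "14"},
    {"name": "Rana", "age": "6"},
    {"name": "Selma", "age": "4"},
]

seen = set()
first = filter_new_rows(batch, seen, ("name", "age"))   # all four rows pass
second = filter_new_rows(batch, seen, ("name", "age"))  # nothing left to import
```

The set lookup happens in memory before the import starts, so the insertion itself stays free of index lookups. The cost is that the `seen` set has to fit in memory and be kept in sync with what is actually in the database.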

@aroyc
aroyc commented Jan 28, 2014

Okay!!
Thanks a lot for your prompt reply!! :)

Actually, I've built a graph DB with a large collection of words. Now I'm trying to integrate DBpedia and ran into this situation.
