Description
Recently we have been doing some research on contact maps, and we have a graph consisting of almost 2 billion contacts, so obviously we need a graph database. Dgraph is really impressive and really cool. Following the documentation, we arranged our data into RDF using Spark. Now we have about 4.2 billion RDF triples.
The schema looks like this:
mutation {
  schema {
    thisphone: string @index(hash) .
    contact: uid .
    contact_of: uid .
  }
}
The RDF looks like this:
<contact_p104008111111> <contact_of> <contact_p113761083758> (name="sam", ots=1452908610, lts=1501758356, status=1) .
<contact_p104008111111> <contact_of> <contact_p113810888226> (name="frank", ots=1453119360, lts=1500729904, status=1) .
<contact_p104008111111> <contact_of> <contact_p113811659687> (name="tony", ots=1444992764, lts=1498013559, status=1) .
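For context, this is roughly how we produce those lines on the Spark side. This is a minimal sketch, not our exact job; the input path and column names (src, dst, name, ots, lts, status) are illustrative:

import org.apache.spark.sql.SparkSession

object ContactsToRdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("contacts-to-rdf").getOrCreate()
    import spark.implicits._

    // Illustrative input: one row per directed contact edge.
    val contacts = spark.read.parquet("/data/contacts.parquet")

    // Format each row as an N-Quad with facets, matching the lines above.
    val rdf = contacts.map { row =>
      val src    = row.getAs[String]("src")
      val dst    = row.getAs[String]("dst")
      val name   = row.getAs[String]("name")
      val ots    = row.getAs[Long]("ots")
      val lts    = row.getAs[Long]("lts")
      val status = row.getAs[Int]("status")
      s"""<contact_$src> <contact_of> <contact_$dst> (name="$name", ots=$ots, lts=$lts, status=$status) ."""
    }

    // Write many gzipped shards so the loader (or several loaders) can read them in parallel.
    rdf.rdd.repartition(512)
      .saveAsTextFile("/data/contacts_rdf", classOf[org.apache.hadoop.io.compress.GzipCodec])
  }
}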
We are running Dgraph on a machine with 64 GB of memory, a 3.5 TB SSD (RAID 5, though), and 40 cores.
Now the problem is: the import speed converges to about 20,000 RDFs/s after several minutes. That is not very slow, but against 4.2 billion RDFs it still needs quite a lot of time (about 3 days). So, can we generate the SST files and vlogs on Spark and then simply copy them into the p directory? We would also be glad to hear other ways to accelerate the import procedure. Thanks.
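To make the "other ways" part concrete: the only client-side lever we have found so far is batching and concurrency, along the lines of the sketch below. The endpoint, batch size, and thread count here are assumptions for illustration, not a tested setup:

import java.net.{HttpURLConnection, URL}
import java.util.concurrent.Executors
import scala.io.Source

object BatchedLoader {
  // Assumed endpoint: this Dgraph version accepts mutations POSTed as
  // plain text on the HTTP port; adjust host/port to your deployment.
  val endpoint = "http://localhost:8080/query"

  def postBatch(nquads: Seq[String]): Unit = {
    val body = s"mutation { set {\n${nquads.mkString("\n")}\n} }"
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.getOutputStream.write(body.getBytes("UTF-8"))
    conn.getOutputStream.close()
    conn.getResponseCode   // block until the server has processed the batch
    conn.getInputStream.close()
  }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(16)
    // Stream the RDF file and submit batches of 1,000 N-Quads to the pool.
    Source.fromFile(args(0)).getLines().grouped(1000).foreach { batch =>
      pool.submit(new Runnable { def run(): Unit = postBatch(batch) })
    }
    pool.shutdown()
  }
}

The grouped(1000) keeps each mutation body small while the 16-thread pool keeps the server busy; we would be happy to learn about a better lever than this.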