Process does not start with huge database #61
The standard backend is
Can you please provide more detailed instructions on how to run it? Step-by-step instructions would be great.
What OS are you targeting?
On my dev machine: OS X 10.9 with 16 GB of RAM, but I have very limited HDD space on it. I'd like to use OS X. I can connect to a server via NFS and use its HDD space.
Setting up leveldb was quite straightforward[1]. Replace the
Then load the triples into the database. Note that this process will take hours or days if you want to import the complete 330 GB from Freebase (extrapolating from my single data point (106M triples, 13h), the Freebase import would take over 200 hours[2]).
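The extrapolation above can be sketched as a one-liner. The observed rate (106M triples in 13h) is from the comment; the total triple count of the full dump is my assumed ballpark, not a figure from this thread:

```shell
# Rough import-time estimate from the data point above (106M triples in 13h).
# The 1.9 billion total is an assumed ballpark for the full Freebase dump.
awk 'BEGIN {
  rate  = 106000000 / 13      # observed triples per hour
  total = 1900000000          # assumed triples in the full dump
  printf "%.0f hours\n", total / rate
}'
```

Even a conservative total lands comfortably past the "over 200 hours" estimate.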
[1] You do not need to install leveldb beforehand, since the library used is pure Go, too. [2] I am looking at methods to speed up the import. I think there are two ways: a) make the input smaller, e.g. by applying namespace or vocabulary shortcuts (I wrote a prototype for that here); b) use a distributed backend.
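A minimal sketch of the namespace-shortcut idea in (a): rewrite the long Freebase URIs to a short prefix before import. The `fb:` prefix and the sample triple below are illustrative assumptions, not the prototype mentioned above:

```shell
# Shrink N-Triples input by replacing the long Freebase namespace with "fb:".
# Illustrative sketch only -- not the prototype referenced in the thread.
echo '<http://rdf.freebase.com/ns/m.0abc> <http://rdf.freebase.com/ns/type.object.name> "Example"@en .' \
  | sed 's|<http://rdf\.freebase\.com/ns/\([^>]*\)>|fb:\1|g'
```

On the real dump this runs as a streaming filter, so the 330 GB file never has to be rewritten in place.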
How do I query the database after the data load ( What do you think about using Maybe it's possible to cut only some domains? For example I need only What do you think about asking the developers to add the possibility to load only certain domains ( Can your prototype export from
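One way to "cut only some domains" before import is to filter the dump on the predicate namespace. The sample triples and the choice of the `music` domain here are assumptions for illustration:

```shell
# Keep only triples whose predicate is in the Freebase "music" domain.
# The sample data below is made up to illustrate the filter.
printf '%s\n' \
  '<http://rdf.freebase.com/ns/m.01> <http://rdf.freebase.com/ns/music.artist.label> <http://rdf.freebase.com/ns/m.02> .' \
  '<http://rdf.freebase.com/ns/m.03> <http://rdf.freebase.com/ns/book.author.works_written> <http://rdf.freebase.com/ns/m.04> .' \
  > sample.nt
grep ' <http://rdf\.freebase\.com/ns/music\.' sample.nt > music-only.nt
```

Against the real dump you would stream instead (`gzip -cd freebase-rdf-2014-07-06-00-00.gz | grep ... > music-only.nt`) to avoid storing the full 330 GB uncompressed.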
a) I think the cayley docs are quite nice,
If you add to your tool
Can you provide some example commands? Currently I need to do this: https://www.freebase.com/user/sergec/views/artists_by_record_label?mql= but I get the whole list
@miku is spot-on. I'll also point out that by tweaking some of the database flags (see the configuration docs) you can load even more triples into leveldb in even less time :)
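For reference, a leveldb-backed config might look roughly like the sketch below. The `db_options` keys here are assumptions from memory, so verify the exact names and defaults against the configuration docs before relying on them:

```shell
# Sketch of a cayley.cfg using the leveldb backend with a larger write buffer.
# The db_options keys are assumed -- check the configuration docs.
cat > cayley.cfg <<'EOF'
{
  "database": "leveldb",
  "db_path": "/tmp/freebase.leveldb",
  "db_options": {
    "write_buffer_size_mb": 64
  }
}
EOF
```

A bigger write buffer batches more triples per disk flush, which is where most bulk-import time tends to go.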
Closing, for lack of scope, and a good answer. |
I've downloaded the Freebase data dump (27 GB, freebase-rdf-2014-07-06-00-00.gz) and uncompressed it into a 330 GB freebase.nt:

```
gzip -cd freebase-rdf-2014-07-06-00-00.gz > freebase.nt
```

When I start the process, it takes a lot of time and then the process gets killed. Log:

```
root@ns501558:/home# time ./cayley_0.3.0-pre_linux_amd64/cayley http --dbpath=freebase.nt
Killed

real	16m34.086s
user	14m44.880s
sys	0m13.356s
```

Is there any solution for this?
Maybe it's better to use compressed databases, like I suggested before in #57?
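A likely cause of the kill is pointing `--dbpath` at the `.nt` file with the default in-memory backend, which tries to hold all 330 GB in RAM. The on-disk workflow would look roughly like the sketch below; the subcommand and flag names are assumptions from memory of the 0.3-era CLI, so check `cayley --help` before running anything:

```
# Not runnable as-is (requires the cayley binary); shown only for the shape
# of the workflow. Flag names are assumed -- verify with `cayley --help`.
./cayley init --config=cayley.cfg                        # create the leveldb files
./cayley load --config=cayley.cfg --triples=freebase.nt  # bulk import on disk
./cayley http --config=cayley.cfg                        # then serve queries
```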