Add flag to remove some parts of string #1
Comments
I think the tool (version 1.0.13) meets your requirements already. If I run
the resulting JSON looks like this:
The prefix abbreviations are applied with
Sidenote: I am reworking this tool soon, and it will get, among other things, better documentation.
But I don't want |
I'll see if it is sensible to include such a flag in the new version. Until then, please consider something ad-hoc like this:
It will slow down the export process a lot. Maybe it's possible to temporarily change the rule to something like:
Please also consider adding support for reading the Freebase data dump from a gz file without unpacking it. cayleygraph/cayley#57 (comment)
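To illustrate the request for reading the dump without unpacking it, here is a minimal Python sketch. It is not part of the tool; the function name `iter_gz_lines` is hypothetical. It relies only on the standard library `gzip` module, which decompresses as it reads, so the full unpacked file never has to exist on disk.

```python
import gzip


def iter_gz_lines(path):
    """Yield text lines from a gzip-compressed file, one at a time.

    The file is decompressed as a stream, so memory use stays small
    regardless of how large the unpacked data would be.
    """
    # errors="replace" is an assumption: it keeps the stream going if a
    # line contains bytes that are not valid UTF-8.
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

A caller would loop over `iter_gz_lines("freebase-rdf-2014-07-06-00-00.gz")` and filter or convert each line as it arrives.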
Have you measured it? In my experience these older tools are usually super fast (plus they will run in a separate process). |
The process in the previous screenshot took 5 days to complete. Are there any solutions for speeding it up?
Thanks for this data point. I am benchmarking various solutions myself and will report the results here later. I think there is a chance to reduce the running time by one if not two orders of magnitude.
I don't think it's possible to get much more performance out of the CPU. Have you tried GPU technologies such as OpenCL or CUDA? Have a look at https://github.com/bkase/CUDA-grep and https://bitbucket.org/genbattle/go-opencl
An ugly approach like this long winded |
The unpacked freebase-rdf-2014-07-06-00-00.gz has 2623380169 lines.
Have you tried http://sphinxsearch.com/ or http://gearman.org/ ? |
With careful
You could always use something like Hadoop or Gearman to distribute work, and in the case of Freebase this is likely the way to go.
Is it possible to use |
Just another data point. By replacing So |
I'm going to close this, since the original issue has been resolved:
Furthermore, the performance issues have been addressed. It might be possible to shrink the Freebase dump in a few hours.
I converted the grep result from the Freebase data dump to a JSON file with this tool, then imported the JSON into MongoDB. In MongoDB it looks like the screenshot.

Can you add a flag that removes http://rdf.freebase.com/ns/ during conversion to JSON? I need it to make the DB more compact and faster. The data in MongoDB would then look like:

Q2: I also need an example of the -l flag. I tried -l="en" and -l="@en" to get only English text, but the process stopped with an error.

Thanks.
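For illustration, the requested prefix removal can be sketched as a post-processing step in Python. This is not the tool's implementation or an existing flag. It assumes each exported line is a standalone JSON document; the names `strip_prefix` and `compact_line` are hypothetical.

```python
import json

# Namespace prefix quoted in the request above.
FREEBASE_NS = "http://rdf.freebase.com/ns/"


def strip_prefix(value, prefix=FREEBASE_NS):
    """Recursively remove `prefix` from every string value in a JSON-like object."""
    if isinstance(value, str):
        return value.replace(prefix, "")
    if isinstance(value, list):
        return [strip_prefix(item, prefix) for item in value]
    if isinstance(value, dict):
        # Keys are left untouched; only values are rewritten.
        return {key: strip_prefix(item, prefix) for key, item in value.items()}
    return value


def compact_line(line):
    """Parse one JSON line, strip the namespace prefix, and re-serialize it."""
    return json.dumps(strip_prefix(json.loads(line)))
```

Applying `compact_line` to each exported line before the MongoDB import would shorten every identifier by the length of the namespace prefix, at the cost of one extra pass over the data.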