moret/brgs

This is my master's project implementation. The purpose of this project is to parse a large RDF dataset, build a graph of its data, and enable keyword searches over it.

Note that this is still a work in progress.

First, rvm is your friend: if you were not prompted to use it yet, install rvm, then leave and re-enter the project folder. Second, once you are in the project folder, run:

$ bundle install

To work on development and run tests, you will need redis-server up and running. Once it is up, you can run the tests with:

$ rspec spec

And see things running integrated on your development machine with:

$ foreman start

If you want to run more workers, you can set that with foreman's concurrency parameter - but keep in mind you should have only one resque-web instance running:

$ foreman start -c web=1,worker=3
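The web and worker roles above come from the project's Procfile, which is not shown here; it would resemble the following sketch (the exact commands are guesses based on the resque setup described in this README, not taken from the repo):

```
web: resque-web -F
worker: bundle exec rake resque:work QUEUE=*
```

With a Procfile like this, the `-c web=1,worker=3` flag tells foreman how many instances of each named process to launch.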

Now you can use the API to parse N-Triples files. Try the example by running each of the following commands, waiting for each to finish by checking the log and the web interface:

$ rake brgs:admission
$ rake brgs:spider

You can point to where your web server was deployed and to a different .nt file by using environment variables. For instance:

$ NTFILE=s3.amazonaws.com/example-rdfs/stw.nt REMOTE=ec2.my.server.com rake brgs:admission
$ REMOTE=ec2.my.server.com rake brgs:spider
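Inside the rake tasks, picking up these variables might look like the following sketch (the default values and the URL shape are assumptions for illustration; check the actual tasks for the real fallbacks):

```ruby
# Hypothetical sketch: read NTFILE and REMOTE with assumed defaults.
ntfile = ENV.fetch('NTFILE', 'data/sample.nt')
remote = ENV.fetch('REMOTE', 'localhost')

# Illustrative only - the real admission endpoint may differ.
admission_url = "http://#{remote}/admission?ntfile=#{ntfile}"
puts admission_url
```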

The deploy process is based on capistrano and foreman. It needs a servers.rb file in the project root, as in this example:

set :redis_server, 'ip-10-123-32-210.ec2.internal'
server 'ec2-12-23-13-123.compute-1.amazonaws.com', :redis, :foreman, :concurrency => 'web=1,worker=2'
server 'ec2-12-23-13-124.compute-1.amazonaws.com', :foreman, :concurrency => 'worker=4'

You can also generate this file using a rake task that lists your servers tagged with brgs_roles accordingly:

$ AWS_ACCESS_KEY_ID=your-access-key AWS_SECRET_ACCESS_KEY=your-secret-access-key rake brgs:servers_rb > servers.rb
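The core of that task is turning a server list into capistrano `server` lines. The real brgs:servers_rb task reads the brgs_roles tags from EC2 (see lib/network_builder.rb); this sketch hardcodes the list purely to illustrate the output format:

```ruby
# Hypothetical sketch: generate servers.rb lines from a server list.
# The real task discovers servers via EC2 tags; these entries are examples.
servers = [
  { host: 'ec2-12-23-13-123.compute-1.amazonaws.com',
    roles: [:redis, :foreman], concurrency: 'web=1,worker=2' },
  { host: 'ec2-12-23-13-124.compute-1.amazonaws.com',
    roles: [:foreman], concurrency: 'worker=4' },
]

lines = servers.map do |s|
  roles = s[:roles].map { |r| ":#{r}" }.join(', ')
  "server '#{s[:host]}', #{roles}, :concurrency => '#{s[:concurrency]}'"
end
puts lines
```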

You can set your access keys in env.rb. Check lib/network_builder.rb for further details.

As of now, the deploy process expects all servers to be Ubuntu Server machines, accessible via SSH as the user 'ubuntu' using an SSH key. The first time you run it, and every time the requirements change, run:

$ cap deploy:setup

The first time you deploy, run the following to avoid errors when foreman starts taking care of the processes:

$ cap deploy:cold

Finally, for every later deploy, just run:

$ cap deploy

And the app code will be updated and all foreman roles restarted. You can also start and stop things individually; check cap -vT for more information.

These are the types of jobs that can be created:

- Input: an RDF file. Action: enqueue an RDF parsing job for every 1M lines.
- Input: an RDF file piece of 1M lines. Action: index nodes and relations, feed the predicate-object collection.
- Input: a graph name. Action: enqueue a graph crawler job for each source.
- Input: a graph name and a source node index. Action: run BFS, index paths, create a path processor job for each path.
- Input: a graph name and a path index. Action: index the template, feed the sparse matrix.
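As a rough illustration of the fan-out in the first job, this sketch splits an input file into fixed-size pieces, each of which could then be enqueued as its own parsing job (the helper name and enqueue step are assumptions; only the "every 1M lines" split comes from the list above):

```ruby
require 'stringio'

CHUNK_LINES = 1_000_000  # the "every 1M lines" split described above

# Hypothetical helper: yield successive chunks of at most chunk_lines lines,
# so each chunk can be enqueued as a separate RDF parsing job.
def each_chunk(io, chunk_lines = CHUNK_LINES)
  chunk = []
  io.each_line do |line|
    chunk << line
    if chunk.size == chunk_lines
      yield chunk
      chunk = []
    end
  end
  yield chunk unless chunk.empty?
end

# Tiny demo with a 5-line "file" and a chunk size of 2.
demo = StringIO.new("a\nb\nc\nd\ne\n")
chunks = []
each_chunk(demo, 2) { |c| chunks << c.size }
p chunks  # => [2, 2, 1]
```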

These are the index structures the jobs use:

- ElasticSearch: from node index to node value, and from node value to node index.
- ElasticSearch: from relation index to relation value, and from relation value to relation index.
- Redis: from node index to a list of tuples of predicate relation index and object node index.
- Redis: from path index to a list of node-relation-node-...-relation-node indexes.
- Redis: from template index to a list of relation-relation-...-relation indexes.
- Redis: from a pair of node and path indexes to a tuple of node_pos, path_len and template index.
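The Redis-backed mappings above might be laid out along these lines (the key names, encodings, and index values are assumptions for illustration; the actual scheme is in the code):

```ruby
# Hypothetical key scheme mirroring the Redis collections described above.
store = {
  'node:42:edges'   => [[7, 99], [8, 100]],  # [predicate index, object node index]
  'path:7:nodes'    => [42, 7, 99, 8, 100],  # node-relation-...-node indexes
  'template:3:rels' => [7, 8],               # relation-...-relation indexes
  'cell:42:7'       => { node_pos: 0, path_len: 2, template: 3 },
}

# A path's template is its relation sequence with the node indexes dropped:
path = store['path:7:nodes']
template = path.each_slice(2).map { |_node, rel| rel }.compact
p template  # => [7, 8], i.e. store['template:3:rels']
```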

Go on and develop.

Danilo Moret, PUC-Rio, Globo.com

About

Large RDF Graph storage, indexing, keyword searching and sub-graph extraction
