Cloud Is Us

Cloud Is Us distributes the work of processing large graph datasets across a number of so-called contributors, each running in a Web browser. Every contributor processes a small fraction of the graph data; the partial results are then combined and delivered to the client. Allocating parts of the graph and combining the results is the job of the allociner (= allocate + combine).

Architecture

The following steps are performed in a typical Cloud Is Us processing phase:

  1. The client initiates processing by ingesting a graph dataset into the allociner: it provides an HTTP URI that points to the location of a dataset - called the source - in N-Triples format.
  2. The allociner stream-reads the data from the client's source and allocates data chunks round-robin on a per-subject basis to contributors.
  3. Once all contributors have loaded their data locally, the client can issue a query, which is distributed to all contributors.
  4. Each contributor locally executes the query and sends back the result to the allociner where it is combined and made available to the client.
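The per-subject round-robin allocation in step 2 can be sketched as follows. This is a minimal, illustrative sketch, not the actual Cloud Is Us code: it assumes an in-memory iterable of N-Triples lines and buffers chunks in lists, whereas the real allociner stream-reads from the source and pushes chunks to contributor endpoints. The key invariant it shows is that all triples sharing a subject land on the same contributor.

```python
from itertools import cycle

def allocate(ntriples_lines, num_contributors):
    """Assign N-Triples lines round-robin on a per-subject basis:
    the first time a subject is seen it is bound to the next
    contributor in the cycle, and all of its triples follow it."""
    assignment = {}                       # subject -> contributor index
    next_contributor = cycle(range(num_contributors))
    chunks = [[] for _ in range(num_contributors)]
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue                      # skip blank lines and comments
        subject = line.split(None, 1)[0]  # first whitespace-delimited token
        if subject not in assignment:
            assignment[subject] = next(next_contributor)
        chunks[assignment[subject]].append(line)
    return chunks

triples = [
    '<http://example.org/a> <http://example.org/p> "1" .',
    '<http://example.org/b> <http://example.org/p> "2" .',
    '<http://example.org/a> <http://example.org/q> "3" .',
]
chunks = allocate(triples, 2)
# both <http://example.org/a> triples end up in the same chunk
```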

(Figure: Cloud Is Us architecture diagram)

Performance and Scalability Considerations

The more contributors are available to Cloud Is Us, the faster a query can be executed. The bottleneck is likely to be the allociner, which is responsible both for initially distributing the data to the contributors and for combining the results it eventually receives back from them.
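For a simple SELECT-style query, the combine step at the allociner can amount to a union of the partial result sets returned by the contributors. A minimal sketch under that assumption (the row-as-dict shape is hypothetical, not the actual Cloud Is Us wire format):

```python
def combine(partial_results):
    """Union the partial result sets returned by contributors,
    dropping duplicate rows while preserving first-seen order."""
    seen = set()
    combined = []
    for rows in partial_results:
        for row in rows:
            key = tuple(sorted(row.items()))  # rows are {variable: value} dicts
            if key not in seen:
                seen.add(key)
                combined.append(row)
    return combined

merged = combine([
    [{'s': 'a'}, {'s': 'b'}],   # result from contributor 1
    [{'s': 'b'}, {'s': 'c'}],   # result from contributor 2
])
# merged == [{'s': 'a'}, {'s': 'b'}, {'s': 'c'}]
```

Note that this only works for queries whose answers decompose over the data partition; queries with joins spanning subjects on different contributors would need a more elaborate combine phase.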

Let's look at how, for a dataset with 1 billion (= 1,000,000,000 = 1B) triples, processing capability grows with the number of contributors. Datasets in the 1B-triple range are easy to encounter these days - take for example an application that uses statistical data from Eurostat together with data from DBpedia, LinkedGeoData and data.gov.uk.

#contributors    #triples per contributor
           10    100M
          100    10M
        1,000    1M
       10,000    100k
      100,000    10k
    1,000,000    1k

Essentially, the table above tells us that with some 10k contributors - that is, 10,000 people each running an instance in their Web browser - we are able to process a 1B-triple dataset fairly easily, as it would mean a load of some 100k triples per contributor.
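The per-contributor load in the table follows directly from dividing the dataset size by the number of contributors; a quick check:

```python
TOTAL_TRIPLES = 1_000_000_000  # 1B triples

for contributors in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    per_contributor = TOTAL_TRIPLES // contributors
    print(f"{contributors:>9,} contributors -> {per_contributor:>11,} triples each")
```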

Components

Todo

  • implement round-robin stream load in allociner
  • implement local SPARQL query in contributor
  • implement combine in allociner
  • implement client
  • implement dashboard

License

The software provided here is in the Public Domain.
