Cloud Is Us
Cloud Is Us distributes the effort necessary to process large graph datasets across a number of so-called contributors, each running in a Web browser. Every contributor processes a tiny fraction of the graph data; the partial results are then combined and delivered to the client. The allocation of parts of the graph and the combination of the results is performed by the allociner (= allocate + combine).
The following steps are performed in a typical Cloud Is Us processing phase:
- The client initiates the processing by ingesting a graph dataset into the allociner, providing an HTTP URI that points to the location of a dataset - called the source - in N-Triples format.
- The allociner stream-reads the data from the client's source and allocates data chunks round-robin on a per-subject basis to the contributors.
- Once all contributors have loaded their data locally, the client can issue a query, which is distributed to all contributors.
- Each contributor locally executes the query and sends back its result to the allociner, where the results are combined and made available to the client.
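The allocation step above can be sketched in plain JavaScript. The helper below is a hypothetical illustration (the function and variable names are ours, not from the codebase) of a round-robin, per-subject split of an N-Triples document; the real allociner would stream-read the source over HTTP rather than hold it in memory:

```javascript
// Sketch of the allociner's round-robin, per-subject allocation step.
// All triples sharing a subject land on the same contributor.
function allocate(ntriples, numContributors) {
  const chunks = Array.from({ length: numContributors }, () => []);
  const subjectToChunk = new Map(); // subject -> chunk index
  let next = 0;
  for (const line of ntriples.split('\n')) {
    const triple = line.trim();
    if (!triple || triple.startsWith('#')) continue; // skip blanks/comments
    const subject = triple.split(/\s+/)[0]; // first term of an N-Triples line
    if (!subjectToChunk.has(subject)) {
      subjectToChunk.set(subject, next);
      next = (next + 1) % numContributors; // advance round-robin
    }
    chunks[subjectToChunk.get(subject)].push(triple);
  }
  return chunks;
}

const data = [
  '<http://example.org/a> <http://example.org/p> "1" .',
  '<http://example.org/b> <http://example.org/p> "2" .',
  '<http://example.org/a> <http://example.org/q> "3" .',
].join('\n');
const chunks = allocate(data, 2);
// both of subject a's triples end up in the same chunk, b's in the other
```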
Performance and Scalability Considerations
The more contributors are available to Cloud Is Us, the faster a query can be executed. The bottleneck is likely to be the allociner, which is responsible both for initially distributing the data to the contributors and, eventually, for combining the results from them.
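The combine step can be pictured as a duplicate-eliminating union of the per-contributor query results. The sketch below assumes results arrive as arrays of binding objects; this shape is our assumption for illustration, not the actual allociner code:

```javascript
// Hypothetical combine step in the allociner: union the per-contributor
// SPARQL result bindings, dropping duplicate rows that can occur when
// contributors return overlapping answers.
function combine(perContributorResults) {
  const seen = new Set();
  const combined = [];
  for (const results of perContributorResults) {
    for (const binding of results) {
      const key = JSON.stringify(binding); // assumes stable property order
      if (!seen.has(key)) {
        seen.add(key);
        combined.push(binding);
      }
    }
  }
  return combined;
}

const merged = combine([
  [{ s: 'http://example.org/a' }],
  [{ s: 'http://example.org/a' }, { s: 'http://example.org/b' }],
]);
console.log(merged.length); // 2 - the duplicate binding for a is dropped
```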
Let's now have a look at how, given a dataset with 1 billion (= 1,000,000,000 = 1B) triples, the processing capabilities increase with an increasing number of contributors. One easily runs into the dimension of 1B triples these days - take, for example, an application that uses statistical data from Eurostat together with data from DBpedia, LinkedGeoData and data.gov.uk.
| #contributors | #triples per contributor |
|---------------|--------------------------|
| 1k            | 1M                       |
| 10k           | 100k                     |
| 100k          | 10k                      |
Essentially, the table above tells us that with some 10k
contributors, that is, people having an instance of Cloud Is Us running in their Web browser, we are able to process a 1B-triple dataset fairly straightforwardly, as it would mean a load of some 100k triples per contributor.
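The per-contributor load is simply the dataset size divided by the number of contributors, assuming a perfectly even round-robin split:

```javascript
// Back-of-the-envelope load per contributor for a 1B-triple dataset.
const TOTAL_TRIPLES = 1e9;

function triplesPerContributor(contributors) {
  return TOTAL_TRIPLES / contributors;
}

console.log(triplesPerContributor(10000)); // 100000 triples each
```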
- cloudisus.contributor and cloudisus.client - rdfstore.js
- cloudisus.allociner - Node.js/rdfstore.js + Dydra
- implement round-robin stream load in allociner
- implement local SPARQL query in contributor
- implement combine in allociner
- implement client
- implement dashboard
The software provided here is in the Public Domain.