Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 

Hyperspark

Hyperspark is a decentralized data processing tool for Dat. Inspired by Spark

Basically, it's just a fancy wrapper around Dat Archive

This is a work-in-progress. Any idea/suggestion is welcome

Goal

  • Reuse intermediate data.
  • Minimize bandwidth usage.
  • Share computation power.

How to use

Data owner

It's simple! Just share your data with dat: dat .

Data Scientist

Define your ideas with transforms and actions without worrying about fetching and storing data.

Computation Provider

Run transformations defined by researchers. Cache and share intermediate data so everyone can re-use the knowledge without having their own computation cluster.


Synopsis

define RDD on dat with dat-transform

word-counting:

const hs = require('hyperspark')
var rdd = hs(<DAT-ARCHIVE-KEY>)

// define transforms
var result = rdd
  .splitBy(/[\n\s]/)
  .filter(x => x !== '')
  .map(word => kv(word, 1))

// actual run(action)
result.reduceByKey((x, y) => x + y)
  .toArray(res => {
    console.log(res) // [{bar: 2, baz: 1, foo: 1}]
  })

Related Modules

About

Decentralized data processing platform

Resources

Releases

No releases published

Packages

No packages published