As of 2016, the conventional approach to building large-scale, compute-intensive services is to:
- Persist data in a relational or NoSQL database
- Farm out compute work using a message broker like RabbitMQ
- Expose an HTTP API from server code
- Build client user interface(s) on top of this API
- Deploy it all in a private network at some cloud provider
This works, and there is plenty of established best practice for building and maintaining such a service. However, it is a heavily centralized solution, with the drawbacks that come with a single controlling entity.
A primarily decentralized approach may have the following benefits:
- Increased robustness against service disruption caused by failures at central providers (Heroku/AWS)
- Prevents single actors (like a local government and/or ISP) from taking the service away from users. The service may even be maintained independently of its initial creator.
- Geographical distribution of the service comes without extra effort, and may give better performance for (some) users
- Potential cost savings, by reducing the friction of small-scale participation. Some users and volunteers may be willing to cover part of the costs, possibly in exchange for improved service quality, or because they are able to externalize them, like free/sunk electricity or compute equipment costs.
- Derivative, spin-off and value-add services have stronger incentives, because there is less central control
Proof-of-concept. Jobs can be distributed through the Ethereum blockchain, picked up by workers, and executed in a browser sandbox. No reward payments or security are implemented, and it has only been tested on non-production networks and data.
- The Ethereum contract is quick & dirty
- Input, code and results are distributed using IPFS
- The dapp webui and the Node.js CLI tool can post jobs to the agency
- A worker can listen for jobs and execute them (in PhantomJS)
- Tested with the TestRPC virtual network, and with go-ethereum on the Morden testnet
See the TODO section for milestones and next steps.
Start an Ethereum testing client with JSON-RPC enabled.
We use the Truffle framework, accessed through a wrapper.
Refer to their documentation for more details on usage.
Deploying a JobAgency
Running a worker
Run an Ethereum node. Make sure the account is unlocked and that JSON-RPC is enabled.
geth --rpc --unlock 0xd87e13619....
Run an IPFS node, with its HTTP gateway on port 8090:
ipfs config Addresses.Gateway /ip4/127.0.0.1/tcp/8090
ipfs daemon
Run the actual worker, optionally specifying the address of the JobAgency to use.
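A worker has to turn the IPFS hashes of a job's code and input into something it can download. A minimal sketch, assuming the worker reads through the local HTTP gateway configured on port 8090 above, using the standard `/ipfs/<hash>` gateway path; `JobRef`, `gatewayUrl` and `jobUrls` are hypothetical names, not part of this project.

```typescript
// Hypothetical job reference as seen by a worker: only IPFS hashes are stored on-chain.
interface JobRef {
  codeHash: string;
  inputHash: string;
}

// URL of a content hash on a local IPFS HTTP gateway.
// IPFS gateways serve immutable content under the /ipfs/<hash> path.
function gatewayUrl(hash: string, port: number = 8090, host: string = "127.0.0.1"): string {
  // Minimal sanity check: base58 (CIDv0) and base32 (CIDv1) hashes are alphanumeric,
  // so rejecting anything else keeps a hash from injecting extra path segments.
  if (!/^[A-Za-z0-9]+$/.test(hash)) {
    throw new Error(`not a plausible IPFS hash: ${hash}`);
  }
  return `http://${host}:${port}/ipfs/${hash}`;
}

// The two downloads a worker needs before it can start a computation.
function jobUrls(job: JobRef, port: number = 8090): { code: string; input: string } {
  return {
    code: gatewayUrl(job.codeHash, port),
    input: gatewayUrl(job.inputHash, port),
  };
}

const urls = jobUrls({ codeHash: "QmCodeHash", inputHash: "QmInputHash" });
console.log(urls.code);  // http://127.0.0.1:8090/ipfs/QmCodeHash
console.log(urls.input); // http://127.0.0.1:8090/ipfs/QmInputHash
```

Downloading the code and input is then a plain HTTP GET on each URL, which keeps the worker independent of any particular IPFS client library.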
Running a job poster
Run an Ethereum node. Make sure the account is unlocked and that JSON-RPC is enabled.
For the webui, CORS also needs to be allowed.
geth --rpc --unlock 0xd87e13619.... --rpccorsdomain="*"
Serve the webui, then open a browser at http://localhost:8080
Alternatively, use the CLI tool:
./bin/postjob CODEHASH INPUTHASH [JobAgency address]
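The `postjob` arguments map onto a small job request. A minimal sketch of how such a CLI could validate them, assuming the arguments arrive as a string array (as `process.argv.slice(2)` gives in Node.js) and that an omitted JobAgency address means "use the default agency"; `parsePostJobArgs` and `PostJobRequest` are hypothetical names, and the actual `bin/postjob` may behave differently. The hashes themselves are passed through unchecked.

```typescript
// Hypothetical request built from: postjob CODEHASH INPUTHASH [JobAgency address]
interface PostJobRequest {
  codeHash: string;
  inputHash: string;
  agency: string | null; // null: fall back to the default JobAgency (assumption)
}

const USAGE = "usage: postjob CODEHASH INPUTHASH [JobAgency address]";

// An Ethereum address is 20 bytes, written as 0x followed by 40 hex digits.
const ADDRESS_RE = /^0x[0-9a-fA-F]{40}$/;

function parsePostJobArgs(args: string[]): PostJobRequest {
  const [codeHash = "", inputHash = "", agency] = args;
  if (codeHash === "" || inputHash === "" || args.length > 3) {
    throw new Error(USAGE);
  }
  if (agency !== undefined && !ADDRESS_RE.test(agency)) {
    throw new Error(`invalid JobAgency address: ${agency}`);
  }
  return { codeHash, inputHash, agency: agency === undefined ? null : agency };
}

// In Node.js this would be parsePostJobArgs(process.argv.slice(2)).
const req = parsePostJobArgs(["QmCodeHash", "QmInputHash"]);
console.log(req.agency); // null
```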
- Agency: Posts new Jobs, by logging them
- Agent: Subscribed to the Agency, waiting for new Jobs
- On a new Job, the Agent downloads the input and code from IPFS and starts the computation.
- When the Agent completes a computation, it uploads the results to IPFS, then updates the Job.
- The Job verifies the result and, assuming it was correct, credits the Agent.
- Once in a while (e.g. weekly, or when above N credits), the Agency pays out the credits of Agents as Ether.
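To make the bookkeeping in this flow concrete, here is a minimal in-memory sketch. It is not the Ethereum contract: `Agency`, `Job`, `submitResult` and `payout` are hypothetical names, subscriber callbacks stand in for contract event logs, the `verify` callback stands in for whatever check the Job performs on a result, and credits are plain numbers rather than Ether. Payout is triggered manually and covers agents whose credits have reached a threshold.

```typescript
// In-memory model of the Agency/Agent/Job bookkeeping. Hypothetical names throughout;
// this is not the Ethereum contract.
type Verifier = (inputHash: string, resultHash: string) => boolean;

interface Job {
  id: number;
  codeHash: string;
  inputHash: string;
  resultHash: string | null; // set once a verified result is accepted
  agent: string | null;      // the agent credited for this job
}

class Agency {
  private jobs: Job[] = [];
  private credits: { [agent: string]: number } = {};
  private listeners: Array<(job: Job) => void> = [];
  private verify: Verifier;
  private payoutThreshold: number;
  readonly paidOut: { [agent: string]: number } = {}; // stands in for Ether sent

  constructor(verify: Verifier, payoutThreshold: number) {
    this.verify = verify;
    this.payoutThreshold = payoutThreshold;
  }

  // Agents subscribe to new jobs (the contract would log an event instead).
  subscribe(listener: (job: Job) => void): void {
    this.listeners.push(listener);
  }

  // Post a new job and notify every subscribed agent.
  postJob(codeHash: string, inputHash: string): Job {
    const job: Job = { id: this.jobs.length, codeHash, inputHash, resultHash: null, agent: null };
    this.jobs.push(job);
    this.listeners.forEach((listener) => listener(job));
    return job;
  }

  // An agent reports the IPFS hash of its result. Only the first verified result
  // for a job is accepted, and it earns the agent one credit.
  submitResult(jobId: number, agent: string, resultHash: string): boolean {
    const job = this.jobs[jobId];
    if (job === undefined || job.resultHash !== null) return false;
    if (!this.verify(job.inputHash, resultHash)) return false;
    job.resultHash = resultHash;
    job.agent = agent;
    this.credits[agent] = (this.credits[agent] || 0) + 1;
    return true;
  }

  // Run "once in a while": pay out agents whose credits reached the threshold.
  payout(): string[] {
    const paid: string[] = [];
    Object.keys(this.credits).forEach((agent) => {
      const credits = this.credits[agent] || 0;
      if (credits >= this.payoutThreshold) {
        this.paidOut[agent] = (this.paidOut[agent] || 0) + credits;
        this.credits[agent] = 0;
        paid.push(agent);
      }
    });
    return paid;
  }
}

// Usage: one agent that "computes" instantly by returning a fake result hash.
const agency = new Agency((_input, result) => result.indexOf("Qm") === 0, 2);
agency.subscribe((job) => {
  // A real agent would fetch code and input from IPFS and run them first.
  agency.submitResult(job.id, "agent-1", "QmResult" + job.id);
});
agency.postJob("QmCode", "QmInput0");
console.log(agency.payout().length); // 0 (only one credit so far)
agency.postJob("QmCode", "QmInput1");
console.log(agency.payout().length, agency.paidOut["agent-1"]); // 1 2
```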
- Unhardcode script/polyfill
- Unhardcode accounts used
- Add end2end tests
- Add some example application(s) that put code+data on IPFS and use this to compute things
- Document and publish blogpost(s)
- Support options data as part of job?
- Defined & implemented basic security strategy
- Tested a lot on the testnet
- Integration point for, and existence of, functional tests of results
- A way to tune Ethereum/centralized work-balance (manual or automated)
- Docker images available, ready-to-run on x86 cloud/home server
- Ready-to-run SD card image for Raspberry Pi
- Worker ready-to-include in browser/mobile applications
In the case of producing webpages, an attacker returning bad results could for instance:
- Put ads on pages
- Put in obscene content
- Remove some or all content, for censoring or denial-of-service
- Try to steal credentials from host-scoped storage (localStorage)
Attacks can also target the contracts themselves. This could allow an attacker to bypass security mechanisms to perform any of the above, as well as to steal the Ether currently held in the system.
This makes the system a fairly interesting target for attackers with financial, political and fame motivations.
- Functional verification (tests) of results.
- Trusted workers replicating the work and comparing results, for a small, randomized portion of the jobs ("ticket control").
- Reputation system. Note: may still be open to Sybil attacks.
- Withholding payout for performed work until a large amount of it has been verified.
- Requiring a deposit, which can be forfeited as punishment in case of bad/contested results.
- User review/approval of output, exposed to visitors.
- Not allowing executable code (at least not Turing-complete code) in results. For websites, maybe AMP HTML? Media embedding (iframes, images) is also a vector, though this could potentially be mitigated by comparing against the media sources in the input data.
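The "ticket control" idea can be combined with a deposit as random spot checks. A minimal sketch, assuming computations are deterministic so that an honest result has the same hash when a trusted worker recomputes it; `spotCheck`, `Submission`, `SpotCheckConfig` and the flat penalty rule are hypothetical, and the `recompute` callback stands in for the trusted worker. If each job is checked with probability p, an agent that cheats on k jobs escapes detection with probability (1 - p)^k, about 8% for p = 0.05 and k = 50, which is why a deposit and withheld payouts can make sustained cheating unattractive.

```typescript
// Hypothetical "ticket control": a trusted worker re-runs a random fraction of jobs,
// and a mismatching result costs the submitting agent part of its deposit.
interface Submission {
  jobId: number;
  agent: string;
  resultHash: string;
}

interface SpotCheckConfig {
  checkProbability: number; // fraction of jobs that get replicated, e.g. 0.05
  penalty: number;          // deposit forfeited per detected mismatch
}

// recompute: stand-in for the trusted worker, returning its own result hash for a job.
// random: injectable so that sampling can be made deterministic in tests.
// Returns the agents caught with a mismatching result; deposits are updated in place.
function spotCheck(
  subs: Submission[],
  deposits: { [agent: string]: number },
  recompute: (jobId: number) => string,
  cfg: SpotCheckConfig,
  random: () => number = Math.random
): string[] {
  const caught: string[] = [];
  subs.forEach((s) => {
    if (random() >= cfg.checkProbability) return; // not selected for control
    if (recompute(s.jobId) === s.resultHash) return; // result confirmed
    const deposit = deposits[s.agent] || 0;
    deposits[s.agent] = Math.max(0, deposit - cfg.penalty);
    caught.push(s.agent);
  });
  return caught;
}

// Usage: random() always returns 0, so every job is checked and the output is deterministic.
const deposits: { [agent: string]: number } = { honest: 10, cheater: 10 };
const subs: Submission[] = [
  { jobId: 1, agent: "honest", resultHash: "QmGood1" },
  { jobId: 2, agent: "cheater", resultHash: "QmBad" },
];
const caught = spotCheck(subs, deposits, (id) => "QmGood" + id, { checkProbability: 0.05, penalty: 5 }, () => 0);
console.log(caught.join(","), deposits["cheater"]); // cheater 5
```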