An R package to do parallel processing on Amazon (more) easily. Born in 2016 at the Brisbane rOpenSci Unconference. This is a work in progress and currently under development.


Automatically sets up and starts a cluster of AWS workers, does the parallel processing, and saves the output to an S3 bucket.

# Install

WARNING: Check yourself before you wreck yourself! You are the ruler of your own Amazon costs. (No responsibility taken for your AWS bill...)

snowball takes the location of your data, a user-defined function, and some basic instructions; it then sets up and runs virtual machines in parallel on Amazon and saves the results to an S3 bucket.
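The package is not on CRAN, so installation presumably goes through GitHub. A sketch of a typical install (the repository path below is a placeholder, not the confirmed location):

```r
# Install snowball from GitHub.
# NOTE: "user/snowball" is a placeholder -- substitute the actual repository path.
# install.packages("devtools")
devtools::install_github("user/snowball")
```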


You will need:

  • An AWS account, with:
    • an IAM user with permissions to manage EC2 and S3,
    • API keys for that IAM user,
    • an S3 bucket
      • with a policy allowing the IAM user full access,
      • containing the data and the user function as .rds files.

Overview / workflow:

  1. Put the job list and data in the S3 bucket (the job list is like a job roster: a data table with the names of the workers and their functions).
  2. Spin up all workers; they start monitoring S3.
  3. Call snowball(function, bucketName, ...)
  • snowball calls snowpack,
  • which writes the snowpack function that will be run on each worker.

How to

1. Set up snowball

Save a .snowball file into your current working directory with the following configuration.
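The original example configuration was not captured here. A plausible sketch of what such a file might contain (every field name and value below is an assumption; check the package documentation for the real format):

```
# .snowball -- hypothetical example configuration (field names are assumptions)
aws_access_key_id: YOUR_ACCESS_KEY
aws_secret_access_key: YOUR_SECRET_KEY
region: ap-southeast-2
instance_type: t2.micro
bucket: yourBucketName
```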
Next, run snowball_setup to set global variables.

snowball_setup(config_file, echo)
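For example, using the signature above (the argument values are assumptions for illustration; echo presumably controls whether the parsed settings are printed):

```r
# Read the .snowball config file and set snowball's global variables.
# Argument values here are illustrative assumptions.
snowball_setup(config_file = ".snowball", echo = TRUE)
```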

2. Pack the snowball.

Start an AWS instance attached to the bucket, while setting up the data/feature split:

snowpack(fn, listItem, bucketNameString, rdsInputObjectString, rdsOutputString)
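A hypothetical call matching the signature above (all values are placeholders, not from the package documentation):

```r
# Hypothetical example -- substitute your own function, job list, and names.
snowpack(
  fn                   = myAnalysisFunction,  # user-defined function to run on each worker
  listItem             = jobList[[1]],        # one entry of the job roster
  bucketNameString     = "yourBucketName",
  rdsInputObjectString = "input_data.rds",    # data object stored in the bucket
  rdsOutputString      = "results.rds"        # where the worker writes its output
)
```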

3. Throw the snowball.

Give the data location and the user function:
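The original code example was not captured; a sketch based on step 3 of the workflow above (argument values are assumptions):

```r
# Hypothetical call: run the user-defined function across the cluster,
# reading data from the named bucket.
results <- snowball(myAnalysisFunction, "yourBucketName")
```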


4. Avalanche the outputs.

Combine all results into one file:
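The code example for this step was not captured either. Presumably a call along these lines, where the function name avalanche and its argument are assumptions based on the heading:

```r
# Hypothetical: collect the per-worker .rds results from the bucket
# and combine them into a single object.
all_results <- avalanche("yourBucketName")
```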


More help?

Snow what?

Check out the documentation for the snow and snowfall packages.

What is an S3 bucket?

We assume you have a (very) basic understanding of what an S3 bucket is (it's like Dropbox, for data); see Amazon's S3 documentation for details. Creating a bucket is very easy: you just click Create bucket.

Setting up the 'bucket policy allowing an IAM user full access' is harder:

  • In the top left of the AWS console, click Services, then IAM, then click on the user you want to give access to (you, most likely).
  • Copy the User ARN to your clipboard.
  • Go to the newly created bucket and click Properties.
    • Click Add policy, which opens a window called "AWS Policy Generator".
      • Select policy type: S3 Bucket Policy.
      • AWS Service should be Amazon S3.
      • Actions: tick All Actions.
      • Paste your ARN into Principal (I know... logical.)
      • Paste this (with YOUR bucket name) into the ARN box: arn:aws:s3:::bucketName
    • Click Add Statement and copy the generated policy to your clipboard. Go back to the bucket page, click "Edit bucket policy", and paste it there.
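The generated policy should look roughly like the following (the account ID, user name, and bucket name are placeholders). Note that object-level actions also need the `/*` resource, which the generator only adds if you include it in the ARN box:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:user/yourIamUser" },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::bucketName",
        "arn:aws:s3:::bucketName/*"
      ]
    }
  ]
}
```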

