
Computing in the Cloud

mndoci edited this page Jul 15, 2012 · 12 revisions

Here is a list of scientific/parallel/distributed computing frameworks and platforms for Amazon EC2. Because you have root access on EC2 instances, you can deploy and run most distributed/parallel computing frameworks yourself. The list below covers frameworks that are either designed exclusively for EC2 (usually via some abstraction layer) or have built-in EC2 support.

Non-commercial

  • StarCluster: StarCluster is a utility for creating and managing general-purpose computing clusters hosted on Amazon's Elastic Compute Cloud (EC2). It minimizes the administrative overhead of obtaining, configuring, and managing a traditional computing cluster of the kind used in research labs or for general distributed computing applications. There is a GitHub repo as well. I use StarCluster.
  • CloudFlu: The CloudFlu library aims to lower the barrier to entry for high-performance parallel computing and make real-world case analysis available to engineers, specifically for the CFD application OpenFOAM.
  • cloud-crowd: CloudCrowd is intended to make distributed processing easy for Ruby programmers.
  • Galaxy CloudMan: Create a vanilla SGE environment or a complete bioinformatics analysis environment using Galaxy. BioCloudCentral makes it easy to use Galaxy, CloudMan and Cloud BioLinux together.
  • ec2cluster: A Rails REST web service and dashboard UI for launching MPI clusters on Amazon EC2 and running user-submitted jobs.
  • ec2mpi: A command line interface for managing MPI clusters on Amazon EC2.
  • Ironfan (formerly cluster-chef): Ironfan, the foundation of The Infochimps Platform, is an expressive toolset for constructing scalable, resilient architectures.
  • Disco: Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Disco supports parallel computations over large data sets on unreliable clusters of computers.
  • CloudGene: CloudGene is an open-source platform that improves the usability of MapReduce programs by providing a graphical user interface for executing programs, importing and exporting data, and reproducing workflows on in-house and rented clusters (i.e., in the cloud).
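Disco (and the Hadoop clusters run by Elastic MapReduce) follow the Map-Reduce model: a map function emits key/value pairs, the framework groups ("shuffles") them by key, and a reduce function combines each group. The sketch below is a minimal, single-process illustration of that model using word counting. It is not Disco's or Hadoop's API; the names `map_words`, `reduce_counts`, and `map_reduce` are hypothetical, and it deliberately omits the distribution, scheduling, and fault tolerance that those frameworks provide.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical helper names for illustration; this is NOT Disco's or Hadoop's API.
KeyValue = Tuple[str, int]


def map_words(line: str) -> Iterator[KeyValue]:
    """Map phase: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> KeyValue:
    """Reduce phase: sum all partial counts emitted for one key."""
    return word, sum(counts)


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[KeyValue]],
    reducer: Callable[[str, List[int]], KeyValue],
) -> Dict[str, int]:
    # Shuffle: group every mapped value by its key. On a real cluster the map
    # tasks run on many nodes and this grouping happens over the network.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: each key group is processed independently of the others.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    lines = ["the cloud scales", "The cluster runs in the cloud"]
    print(map_reduce(lines, map_words, reduce_counts))
    # {'cloud': 2, 'cluster': 1, 'in': 1, 'runs': 1, 'scales': 1, 'the': 3}
```

Because each reduce call depends only on the values for its own key, a framework is free to run map tasks and reduce tasks on different machines and to re-run a failed task without touching the rest of the job; that independence is what makes the model a good fit for unreliable clusters.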

Commercial

  • Amazon Elastic MapReduce: Amazon's service to manage and run Hadoop clusters. Also supports Pig, Hive and Cascading. Now supports EC2 cluster instances.
  • CycleCloud: CycleCloud takes the delays, configuration, administration, and sunk hardware costs out of grid computing, allowing you to focus on running your jobs. Supports Condor, Torque and Oracle Grid Engine. CycleCloud uses CycleServer and Grill.
  • StackIQ Rocks+: Rocks+ instances provide a push-button, cloud-based cluster solution. Users can automatically spin up multiple interconnected servers able to run compute-intensive applications, rather than connecting instances by hand or with third-party software add-ons.
  • Bright Cluster Manager: Bright Cluster Manager supports EC2 from within its native interface.
  • UniCloud: With UniCloud, companies can build an elastic compute infrastructure or cloud environment that unifies provisioning, configuration and virtualization management with application configuration in a single RESTful, web-services-based framework. Based on SGE.
  • Oracle Grid Engine: Oracle (formerly Sun) Grid Engine is one of the most widely deployed distributed resource managers. SGE 6.2u5 supports multiple EC2 AMIs and also has native Hadoop support. It is available in both free and commercial versions.
  • RightGrid: The Grid Edition lets you control and manage any background or batch-processing worker tasks in a scalable, fault-tolerant, and audited environment. It is well suited to processing large numbers of datasets.
  • CRdata: Deploy R and Bioconductor scripts. Free for small jobs. Run at scale on EC2. (CRdata has since shut down.)
  • Cloudnumbers: Scalable computing in the cloud. Targeted at R, Python and C applications.