mariusmni/aws: Personalize AWS RunCommand: How to assign a numeric index to instances in a fleet so that commands can behave differently depending on the instance index (similar to how MPI code can behave differently based on processor rank)

Introduction

Amazon EC2 RunCommand is a feature that lets you manage a fleet of Amazon EC2 instances by automating common administrative tasks, such as executing shell scripts and commands on Linux. This is great if you want to run the same command on all the instances. However, you may want to run commands that behave differently depending on the instance. For example, if you want to process several datasets in parallel, you probably want every instance to process a different dataset. This idea is not new to parallel processing: the Message Passing Interface (MPI) allows each processor to behave differently depending on its rank, or index.

This page explains how to emulate the MPI behaviour on AWS RunCommand. In other words, we want each instance to have a numerical index. This way, we can run the same script on a fleet of EC2 instances, where the script behaves differently depending on the instance index. The instance index can be generated as follows.

How to assign a unique index to every instance

Assume we have a fleet of EC2 Linux instances. Every instance in the fleet already has an instance ID. The instance ID can be retrieved from the instance metadata as follows:

curl -s http://169.254.169.254/latest/meta-data/instance-id

The output will be something like: i-67a6a8a3
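On instances that are configured to require version 2 of the instance metadata service (IMDSv2), the plain request above is rejected, and a session token has to be requested first. The following is a minimal sketch of that variant, assuming curl is available; get_instance_id is a hypothetical helper name and the 60-second token lifetime is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: print the instance ID, trying IMDSv2 (session token) first and
# falling back to the plain request if no token is returned.
# get_instance_id is a hypothetical helper name, not part of the original README.
get_instance_id() {
  local base="http://169.254.169.254/latest"
  local token
  # IMDSv2: request a short-lived session token with an HTTP PUT
  token=$(curl -s -X PUT "$base/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  if [ -n "$token" ]; then
    # pass the token along with the metadata request
    curl -s -H "X-aws-ec2-metadata-token: $token" "$base/meta-data/instance-id"
  else
    # no token available: use the plain request
    curl -s "$base/meta-data/instance-id"
  fi
}
```

On an instance, instanceID="$(get_instance_id)" could then stand in for the plain curl call.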

For our purpose, we want every instance to have an index between 0 and N - 1, where N is the size of the fleet. A simple way to do this is to create a script, say number.sh, that does echo <index>, where index differs from instance to instance. The script can be created manually, but for a large fleet we can use RunCommand itself to generate the number.sh script, as follows.

For this example, our fleet has only two instances:

i-67a6a8a3
i-cea5ab0a

We can execute the following through RunCommand to generate the number.sh script:

#!/bin/bash

# mapping between instance ID and index
declare -A num
num=(
  [i-67a6a8a3]=0
  [i-cea5ab0a]=1
)

# get instance ID of the current instance
instanceID="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"

# get index for the current instance; stop if the instance is not in the mapping
index=${num[$instanceID]}
if [ -z "$index" ]; then
  echo "no index defined for instance $instanceID" >&2
  exit 1
fi

# generate number.sh script and make it executable
echo "echo $index" > /number.sh
chmod +x /number.sh

The above script has a mapping between instance IDs and numerical indices. When it runs, it converts the instance ID to its numerical index. It then generates the number.sh script which echoes the index. The path of the number.sh script in this example is the root folder (/) for simplicity. To test that the script was generated, we can run it via RunCommand:

/number.sh

The output should be 0 or 1, depending on the instance.
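For a larger fleet, typing the mapping by hand becomes tedious. One possible alternative, sketched here as an assumption rather than as part of the original method, is to sort the list of instance IDs and use each ID's position in the sorted list as its index. index_of is a hypothetical helper name.

```shell
#!/bin/bash
# index_of ID ID1 ID2 ... : print the zero-based position of ID in the
# sorted list ID1 ID2 ...; return status 1 if ID is not in the list.
# index_of is a hypothetical helper, not part of the original README.
index_of() {
  local wanted="$1"
  shift
  local i=0 id
  # sort the IDs so that every instance computes the same ordering
  for id in $(printf '%s\n' "$@" | sort); do
    if [ "$id" = "$wanted" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}
```

With the two-instance fleet above, index_of "$instanceID" i-67a6a8a3 i-cea5ab0a would give the same indices as the hand-written mapping, because the sorted order matches the order used there.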

How to personalize command based on index

In this section, we will clone a particular program from GitHub, generate a random input dataset, then run the program on the dataset. The dataset will be generated by using the instance index as the random number generator seed. Thus, every instance runs on a different dataset.

For this example, we will clone the git project qpms9, which solves the Planted Motif Search problem. The problem itself is not important for our discussion. The project comes with an input dataset generator. We will use the generator to create different datasets depending on the instance index. Then we will run the qpms9 program on each dataset and save the outputs to an S3 bucket.

The following commands assume that you have git, make and g++ installed. You can install them non-interactively using something like

sudo apt-get install -y git make g++

on Ubuntu, or

sudo yum install -y git make gcc-c++

on Amazon Linux/Red Hat.

To get and build the qpms9 program, we have to run the following via RunCommand. Use /tmp as the working directory.

git clone https://github.com/mariusmni/qpms9.git
cd qpms9
make -C qpms9 nompi
make -C qpms9-data

Now we generate a dataset and solve it. Execute the following via RunCommand, with the same working dir, /tmp:

cd qpms9
mkdir -p results
n=$(/number.sh)
qpms9-data/Release/qpms9-data -l 13 -d 4 -r $n -o results/t13,4-$n.in
qpms9/NoMpi/qpms9 results/t13,4-$n.in -l 13 -d 4 i -o results/t13,4-$n.out

Notice the line

n=$(/number.sh)

This sets the variable n to the index of the instance. The index is then passed to the dataset generator as the random number generator seed (-r $n) and is also used to name the input and output files (t13,4-$n.in and t13,4-$n.out).
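The same index can also be used the way an MPI rank often is: to split a fixed list of work items across the fleet. The following is a minimal sketch of a round-robin split, where item k goes to the instance whose index equals k modulo the fleet size. my_share and the item names are hypothetical and not part of qpms9.

```shell
#!/bin/bash
# my_share INDEX FLEET_SIZE ITEM... : print, one per line, the items that
# belong to the instance with the given index. Items are dealt out
# round-robin: item k goes to instance (k mod FLEET_SIZE).
# my_share is a hypothetical helper, not part of the original README.
my_share() {
  local index="$1" size="$2"
  shift 2
  local k=0 item
  for item in "$@"; do
    if [ $((k % size)) -eq "$index" ]; then
      echo "$item"
    fi
    k=$((k + 1))
  done
}
```

On a fleet of two instances, my_share "$(/number.sh)" 2 a.in b.in c.in d.in e.in would print a.in, c.in and e.in on the instance with index 0, and b.in and d.in on the instance with index 1.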

If we have the AWS CLI installed, we can save our results to Amazon S3 via the following RunCommand, again with /tmp as the working directory:

aws s3 sync qpms9/results s3://mariusmni-bucket/qpms

The bucket location will contain the input and output files from our entire fleet:

t13,4-0.in
t13,4-0.out
t13,4-1.in
t13,4-1.out

Conclusion

This tutorial showed how to run commands on a fleet of EC2 instances such that a single command behaves differently depending on the instance. We achieved this by assigning a unique numerical index to every instance. This index can be used to personalize the command, similar to how we can personalize MPI code based on processor rank. The above was tested on a fleet containing one Amazon Linux and one Ubuntu instance. However, the principle can be applied to any fleet, including a mixed Linux/Windows fleet.
