# Service Documentation
This notebook covers the code used to implement and run the service. The code covered in this documentation can be found [here](https://github.com/neurodata-nomads/nomads_cloud/blob/master/service/server.py).
## Flask Routes
We use Flask to serve the web interface. The service defines two routes:

```python
from flask import Flask, redirect, render_template, request, url_for

app = Flask(__name__)


@app.route("/", methods=["GET"])
def index():
    """Index route: render index.html, which loads the job submission form."""
    return render_template("index.html")


@app.route("/submit", methods=["POST"])
def submit():
    """Submit route: on form submission, submit an AWS Batch job and redirect home."""
    token = request.form["token"]
    col = request.form["col"]
    exp = request.form["exp"]
    # Strip spaces so "0, 10" and "0,10" are treated the same
    z_range = request.form["z_range"].replace(" ", "")
    y_range = request.form["y_range"].replace(" ", "")
    x_range = request.form["x_range"].replace(" ", "")

    pipeline = request.form["pipeline"]
    email = request.form["email"]
    # submit_job() is defined in the next section
    submit_job(email, pipeline, token, col, exp, z_range, y_range, x_range)

    return redirect(url_for("index"))
```

All this code should be fairly self-explanatory. The index route just renders index.html. The submit route gathers the form parameters and then submits a job based on them. Note that the following parameters must be received for the job to run correctly:
1. token - BOSS API token
2. col - BOSS collection you are pulling from
3. exp - BOSS experiment you are using
4. z_range, y_range, x_range - ranges of the data you are pulling from BOSS along each axis
5. pipeline - pipeline to run
6. email - email address the results link is sent to
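Before calling submit_job, the route could check that every field listed above is present and non-empty. A minimal sketch, assuming the form behaves like a dict (Flask's request.form also supports .get); REQUIRED_FIELDS and missing_fields are hypothetical names, not part of server.py.

```python
# Hypothetical helper: the names mirror the form fields read in submit()
REQUIRED_FIELDS = ("token", "col", "exp", "z_range", "y_range", "x_range",
                   "pipeline", "email")


def missing_fields(form):
    """Return the required fields that are absent or blank in a form mapping."""
    return [name for name in REQUIRED_FIELDS
            if not str(form.get(name, "")).strip()]


form = {"token": "abc", "col": "collman", "exp": "exp1",
        "z_range": "0,10", "y_range": "0,100", "x_range": "",
        "pipeline": "nomads-classifier"}
print(missing_fields(form))  # ['x_range', 'email']
```

If the list is non-empty, the route could re-render the form with an error message instead of letting request.form raise.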

**Important warnings about current job submission**:
1. The BOSS cannot handle pulling large cubes (>2 GB), so limit the dimensions of your data.
2. The token you provide is also used to push results to the BOSS. Make sure it has permission to write to the collman_nomads collection.
3. Each range must be formatted as "int, int" (spaces are stripped before parsing).
4. The pipeline you run must be registered on AWS.
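To stay under the size limit in warning 1, you can estimate a cutout's size before submitting. A rough sketch, assuming each range is (start, stop) with stop exclusive and a fixed number of bytes per voxel (1 for uint8, 2 for uint16); the actual data type depends on your BOSS channel, and estimated_cutout_bytes is a hypothetical helper.

```python
def estimated_cutout_bytes(z_range, y_range, x_range, bytes_per_voxel=1):
    """Estimate the size in bytes of a cutout from (start, stop) ranges.

    Assumes stop is exclusive and every voxel takes bytes_per_voxel bytes.
    """
    voxels = 1
    for start, stop in (z_range, y_range, x_range):
        voxels *= max(stop - start, 0)
    return voxels * bytes_per_voxel


LIMIT_BYTES = 2 * 1024 ** 3  # the ~2 GB limit from warning 1, taken as 2 GiB

size = estimated_cutout_bytes((0, 50), (0, 4096), (0, 4096), bytes_per_voxel=2)
print(size, size > LIMIT_BYTES)  # 1677721600 False
```

Doubling the z range in this example (to 100 slices) would push the estimate past the limit.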

## Submit Job
The "submit_job()" function is the bulk of the service program (it is called in the "/submit" route). Since this function is very long (and should eventually be modularized), we break it down into a few parts and explain each one below.

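```python
def submit_job(email, pipeline, token, col, exp, z_range, y_range, x_range):
    # Each range must be exactly two comma-separated integers, e.g. "0,10"
    try:
        z_range_proc = list(map(int, z_range.split(",")))
        y_range_proc = list(map(int, y_range.split(",")))
        x_range_proc = list(map(int, x_range.split(",")))
    except ValueError as err:
        # raise (not return) so Flask reports the failure as an error
        raise ValueError("Job not submitted, dimensions not correctly formatted") from err
    if not all(len(r) == 2 for r in (z_range_proc, y_range_proc, x_range_proc)):
        raise ValueError("Job not submitted, each range must contain exactly two integers")
```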

The start of the function processes the range dimensions and checks that each one consists of two integers. If one does not, an exception is raised and the web page shows an HTTP 500 (Internal Server Error).
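The same check can be written as a small standalone function, which is easier to test than the inline version. A sketch under the "int, int" format described above; parse_range is a hypothetical helper, and the start < stop check is an extra assumption not made by the original code.

```python
def parse_range(text):
    """Parse a range such as "0, 10" into a (start, stop) tuple of ints.

    Raises ValueError unless the text is exactly two comma-separated integers
    with start < stop (the ordering check is an added assumption).
    """
    parts = text.replace(" ", "").split(",")
    if len(parts) != 2:
        raise ValueError("expected 'int, int', got {!r}".format(text))
    start, stop = (int(p) for p in parts)  # int() raises ValueError on bad input
    if start >= stop:
        raise ValueError("start must be less than stop, got {!r}".format(text))
    return start, stop


print(parse_range("0, 10"))  # (0, 10)
```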

```python
job_name = "_".join([pipeline, col, exp,
                     "z", str(z_range_proc[0]), str(z_range_proc[1]),
                     "y", str(y_range_proc[0]), str(y_range_proc[1]),
                     "x", str(x_range_proc[0]), str(x_range_proc[1])])
```

The next line is very important! job_name is the key used to label many of the results uploaded throughout the pipeline. In the service, it is used as the AWS Batch job name and to build, ahead of time, the URL of the S3 folder where the results will show up.
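Because job_name doubles as the Batch job name, it has to satisfy AWS Batch naming rules, which (per the SubmitJob documentation at the time of writing) allow up to 128 letters, numbers, hyphens, and underscores, starting with a letter or number. A sketch that builds the same name as the code above and validates it; build_job_name and BATCH_JOB_NAME are hypothetical names.

```python
import re

# Assumed AWS Batch job-name rule: 1-128 chars of [A-Za-z0-9_-],
# first character alphanumeric
BATCH_JOB_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,127}")


def build_job_name(pipeline, col, exp, z_range, y_range, x_range):
    """Join pipeline, collection, experiment and ranges with underscores."""
    parts = [pipeline, col, exp]
    for axis, (start, stop) in zip("zyx", (z_range, y_range, x_range)):
        parts += [axis, str(start), str(stop)]
    name = "_".join(parts)
    if not BATCH_JOB_NAME.fullmatch(name):
        raise ValueError("not a valid AWS Batch job name: {!r}".format(name))
    return name


print(build_job_name("nomads-classifier", "collman", "exp1", (0, 5), (0, 100), (0, 200)))
# nomads-classifier_collman_exp1_z_0_5_y_0_100_x_0_200
```

Failing early here gives a clearer error than a rejected submit_job call after the bucket and email steps have already run.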

```python
import boto3

# Results bucket for each pipeline
RESULT_BUCKETS = {
    "nomads-unsupervised": "nomads-unsupervised-results",
    "nomads-classifier": "nomads-classifier-results",
}

if pipeline in RESULT_BUCKETS:
    bucket_name = RESULT_BUCKETS[pipeline]
    client = boto3.client("s3")
    # Safe to repeat in us-east-1: S3 returns success for a bucket you already own
    client.create_bucket(Bucket=bucket_name)
    # Make the results bucket publicly readable to simplify sharing
    boto3.resource("s3").Bucket(bucket_name).Acl().put(ACL="public-read")

    url = ("https://s3.console.aws.amazon.com/s3/buckets/{}/{}/"
           "?region=us-east-1&tab=overview").format(bucket_name, job_name)
    send_email(url, email, pipeline)
```

The next part generates the link to the S3 folder where all the results will be stored. This URL is emailed to the user so they can check it once everything is ready. Note that before the pipeline runs, we first create the bucket where the results are stored. In us-east-1 (the region this service uses), S3 returns success when you re-create a bucket you already own, so this call is a safe check (the same call appears when running the actual pipeline). In that case S3 also resets the bucket ACL, which the following ACL call sets again. The buckets we create are public (read-only) to make sharing results easier.

The URL is formatted with job_name (the key we use to name the S3 folder that stores our results). We build this URL by hand because Boto3 has no function that generates S3 console URLs for a bucket folder, and building one is straightforward.
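The URL building can be isolated in one function so every pipeline uses the same format. A sketch that reproduces the URL format used above; RESULT_BUCKETS mirrors the pipeline-to-bucket names in the code above, and results_console_url is a hypothetical name.

```python
from urllib.parse import quote

# Hypothetical mapping from pipeline name to its results bucket
RESULT_BUCKETS = {
    "nomads-unsupervised": "nomads-unsupervised-results",
    "nomads-classifier": "nomads-classifier-results",
}


def results_console_url(pipeline, job_name, region="us-east-1"):
    """Build the S3 console URL of the folder that will hold a job's results.

    Raises KeyError for a pipeline without a results bucket.
    """
    bucket = RESULT_BUCKETS[pipeline]
    return ("https://s3.console.aws.amazon.com/s3/buckets/{}/{}/"
            "?region={}&tab=overview").format(bucket, quote(job_name), region)
```

Raising KeyError for an unknown pipeline surfaces a misconfiguration instead of silently sending no email. Keep in mind that S3 console links open only for users signed in to the AWS Console.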

Once the URL is generated, an email is immediately sent to the user. The URL shows nothing until the results are finished; after that, it is populated with the pipeline outputs (NDVis links, predictions).

This is the send_email function:

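```python
import pickle
import smtplib
from email.mime.text import MIMEText

SENDER = 'NOMADSPipeline@gmail.com'
# The password lives in a local pickle file that is not committed to GitHub
with open("password.pkl", "rb") as f:
    PASSWORD = pickle.load(f)


def send_email(url, recipient, pipeline):
    html = """\
    <html>
      <head></head>
      <body>
        <p>Hi!<br>
           Here is the <a href="{}">link</a> to {} Results.
        </p>
      </body>
    </html>
    """.format(url, pipeline)

    msg = MIMEText(html, "html")
    # Without these headers the email arrives with no subject or visible sender
    msg["Subject"] = "{} Results".format(pipeline)
    msg["From"] = SENDER
    msg["To"] = recipient

    # Port 587 uses STARTTLS; the context manager calls quit() on exit
    with smtplib.SMTP('smtp.gmail.com', 587) as server:
        server.starttls()
        server.login(SENDER, PASSWORD)
        server.sendmail(SENDER, recipient, msg.as_string())
```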

For sending emails, we use smtplib from the Python standard library. To send emails from your own account, update the SENDER and PASSWORD global variables. To keep the password out of version control, we store it in a local pickle file (password.pkl) and never push that file to GitHub.
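password.pkl has to be created once on the machine that runs the service. A minimal sketch of that setup step, plus an optional environment-variable override; save_password, load_password, and the NOMADS_SMTP_PASSWORD variable are hypothetical names. Pickling does not encrypt the password: the protection comes only from keeping the file out of the repository.

```python
import os
import pickle


def save_password(password, path="password.pkl"):
    """One-time setup: write the SMTP password to a local pickle file.

    Keep this file out of Git (for example, list it in .gitignore).
    """
    with open(path, "wb") as f:
        pickle.dump(password, f)


def load_password(path="password.pkl", env_var="NOMADS_SMTP_PASSWORD"):
    """Prefer the environment variable if set; otherwise read the pickle file."""
    value = os.environ.get(env_var)
    if value:
        return value
    with open(path, "rb") as f:
        return pickle.load(f)
```

With this in place, PASSWORD = load_password() would replace the direct pickle.load call.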

The rest of the function mostly interfaces with AWS. There are many parameters to handle, and you can read more about them in the boto3 AWS Batch documentation. We cover only a few parts of this interfacing:

```python
client = boto3.client("batch")
response = client.describe_compute_environments(
    computeEnvironments=[
        'nomads-ce',
    ],
)
if len(response["computeEnvironments"]) == 0:
    response = client.create_compute_environment(
        type='MANAGED',
        computeEnvironmentName='nomads-ce',
        computeResources={
            'type': 'EC2',
            'desiredvCpus': 0,
            'instanceRole': 'ecsInstanceRole',
            'instanceTypes': [
                "optimal"
            ],
            'maxvCpus': 20,
            'minvCpus': 0,
            'securityGroupIds': [
                'sg-41927a3e',
            ],
            'subnets': [
                'subnet-11dc531d',
                'subnet-17c65f72',
                'subnet-75006549',
                'subnet-4e2ace06',
                'subnet-7ca59151',
                'subnet-74f3d02f'
            ],
            'tags': {
                'Name': 'Batch Instance - C4OnDemand',
            },
        },
        serviceRole='AWSBatchServiceRole',
        state='ENABLED',
    )
```

This is an example of the interfacing required. Here, we create the compute environment for AWS Batch. Similar steps are used to set up the other AWS Batch resources (the job queue and job definitions). Note that "ecsInstanceRole" (used by the EC2 instances that run jobs) and "AWSBatchServiceRole" (used by AWS Batch itself) are IAM roles that handle AWS permissions while the job runs. This is how the registered pipeline can push results to S3 without running into permission issues.

AWS can create both of these roles for you automatically. If your AWS account raises an error about them, log in to the AWS Console and make sure both roles exist.

Another pattern we repeat for every Batch resource is to call a "describe" function first. This makes sure we don't create the same resource over and over again: here, the describe call checks whether a "nomads-ce" compute environment already exists before trying to create it. All the other parameters are explained in the [AWS Batch boto3 documentation](http://boto3.readthedocs.io/en/latest/reference/services/batch.html).
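The same describe-before-create pattern applies to the "nomads-queue" job queue that submit_job targets, whose creation code is not shown in this notebook. A sketch using the boto3 Batch calls describe_job_queues and create_job_queue; the priority value and the single-environment order are assumptions, not the service's actual settings. In practice you may also need to wait until the compute environment reports a VALID status before attaching a queue to it.

```python
def ensure_job_queue(client, name="nomads-queue", compute_env="nomads-ce"):
    """Create the AWS Batch job queue unless one with this name already exists.

    client is a boto3 Batch client, e.g. boto3.client("batch").
    Returns True if a queue was created, False if it already existed.
    """
    response = client.describe_job_queues(jobQueues=[name])
    if response["jobQueues"]:
        return False
    client.create_job_queue(
        jobQueueName=name,
        state="ENABLED",
        priority=1,  # assumed value
        computeEnvironmentOrder=[
            {"order": 1, "computeEnvironment": compute_env},
        ],
    )
    return True
```

Taking the client as a parameter keeps the function testable with a stub object instead of a live AWS account.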

```python
response = client.describe_job_definitions(
    jobDefinitionName=pipeline,
    status='ACTIVE',
)
if len(response["jobDefinitions"]) == 0:
    if pipeline == "nomads-unsupervised":
        register_nomads_unsupervised(client)
    elif pipeline == "nomads-classifier":
        register_nomads_classifier(client)
```

This is the last important part of the function we will cover: the code that specifies which job will actually be run. When you add a new pipeline, make sure to update these lines to include it. All this code does is register the proper job definition for your pipeline. Note that the pipeline you are registering must already be in AWS (its Docker image must be pushed to Amazon ECR), as shown below:

```python
def register_nomads_unsupervised(client):
    response = client.register_job_definition(
        type='container',
        containerProperties={
            # Placeholder command; submit_job() overrides it via containerOverrides
            'command': [
                "echo",
                "Starting container"
            ],
            'image': '389826612951.dkr.ecr.us-east-1.amazonaws.com/nomads-unsupervised',
            'memory': 4000,  # MiB
            'vcpus': 1,
        },
        jobDefinitionName="nomads-unsupervised",
    )
    return response
```

Here is an example of a job definition being registered. Note that you must provide an image: the Dockerized pipeline that has been pushed to AWS (Amazon ECR).
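Since each register_* function differs only in its name and image, adding a pipeline could be reduced to adding one registry entry. A sketch that generalizes register_nomads_unsupervised; job_definition_kwargs, PIPELINE_IMAGES, and register_pipeline are hypothetical names. Only the nomads-unsupervised image is shown in this notebook, so nomads-classifier would need its own entry with its real image URI.

```python
def job_definition_kwargs(name, image, memory=4000, vcpus=1):
    """Build keyword arguments for client.register_job_definition().

    memory is in MiB, matching the register_nomads_unsupervised example.
    """
    return {
        "type": "container",
        "jobDefinitionName": name,
        "containerProperties": {
            # Placeholder command; submit_job() overrides it at submission time
            "command": ["echo", "Starting container"],
            "image": image,
            "memory": memory,
            "vcpus": vcpus,
        },
    }


# Hypothetical registry: adding a pipeline means adding one entry here
ECR_PREFIX = "389826612951.dkr.ecr.us-east-1.amazonaws.com/"
PIPELINE_IMAGES = {
    "nomads-unsupervised": ECR_PREFIX + "nomads-unsupervised",
}


def register_pipeline(client, pipeline):
    """Register the job definition for a known pipeline (client: boto3 Batch client)."""
    if pipeline not in PIPELINE_IMAGES:
        raise ValueError("unknown pipeline: {!r}".format(pipeline))
    return client.register_job_definition(
        **job_definition_kwargs(pipeline, PIPELINE_IMAGES[pipeline]))
```

The if-chain after describe_job_definitions would then become a single register_pipeline(client, pipeline) call.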

Finally, the actual job submission:

```python
response = client.submit_job(
    jobName=job_name,
    jobQueue='nomads-queue',
    jobDefinition=pipeline,
    containerOverrides={
        'vcpus': 1,
        'memory': 2000,
        'command': [
            "python3",
            "driver.py",
            "--host",
            "api.boss.neurodata.io",
            "--token",
            token,
            "--col",
            col,
            "--exp",
            exp,
            "--z-range",
            z_range,
            "--x-range",
            x_range,
            "--y-range",
            y_range
        ],
    },
)
```

The one really important thing to notice here is that our command is "python3 driver.py", followed by all the arguments the driver needs to run.
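On the container side, driver.py has to accept exactly these flags. driver.py is not covered in this notebook, so the following is only a hypothetical argparse sketch of a compatible command-line interface, not the real driver.

```python
import argparse


def build_parser():
    """Hypothetical parser matching the flags passed to driver.py in submit_job()."""
    parser = argparse.ArgumentParser(description="NOMADS pipeline driver (sketch)")
    parser.add_argument("--host", required=True, help="BOSS API host")
    parser.add_argument("--token", required=True, help="BOSS API token")
    parser.add_argument("--col", required=True, help="BOSS collection")
    parser.add_argument("--exp", required=True, help="BOSS experiment")
    # argparse stores "--z-range" under the attribute name z_range
    for axis in "zyx":
        parser.add_argument("--{}-range".format(axis), required=True,
                            help="{} range as 'start,stop'".format(axis))
    return parser


args = build_parser().parse_args([
    "--host", "api.boss.neurodata.io", "--token", "abc", "--col", "collman",
    "--exp", "exp1", "--z-range", "0,5", "--x-range", "0,200", "--y-range", "0,100",
])
print(args.z_range, args.y_range, args.x_range)  # 0,5 0,100 0,200
```

Because every flag is required, a container started with a missing argument exits immediately with a usage error, which shows up in the job's logs.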