# Deep learning training pipeline
# Deploying AWS Step Functions + AWS Batch + AWS Lambda

## Installing dependencies
Here we install the dependencies needed to run the Serverless Framework: the AWS CLI and the `serverless` npm package, pinned to version 1.77.0.

In [None]:
!pip install awscli --upgrade --user
!npm install -g serverless@1.77.0

## Setting AWS environmental variables
Here we set the AWS environment variables needed to deploy to our AWS account: an access key ID, a secret access key, and an account ID. Replace `<AWS_ACCESS_KEY_ID>`, `<AWS_SECRET_ACCESS_KEY>`, and `<AWS_ACCOUNT_ID>` with your own values. Please use a test account with temporary credentials, or deactivate the credentials after use.

In [None]:
%env AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
%env AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
%env AWS_ACCOUNT_ID=<AWS_ACCOUNT_ID>
%env AWS_DEFAULT_REGION=us-east-1
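
Before deploying, it can help to confirm that none of the variables above is still empty or set to its `<PLACEHOLDER>` value. The following is a minimal sketch; the helper name `missing_aws_vars` is hypothetical and not part of the original notebook.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCOUNT_ID",
    "AWS_DEFAULT_REGION",
]


def missing_aws_vars(env=None):
    """Return the required variables that are unset, empty, or still a <PLACEHOLDER>."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # A value such as "<AWS_ACCOUNT_ID>" means the placeholder was never replaced.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_aws_vars() or "All AWS variables are set")
```

An empty list means every variable has a real value; otherwise the list names the variables that still need attention.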

## Creating role for AWS Batch

In [None]:
!aws iam create-role --role-name AWSBatchServiceRole --assume-role-policy-document file://assume-batch-policy.json
!aws iam attach-role-policy --role-name AWSBatchServiceRole --policy-arn arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole
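
The `create-role` command reads its trust policy from `assume-batch-policy.json`, whose contents are not shown in this notebook. As a sketch under that assumption, a trust policy that lets the AWS Batch service assume the role typically looks like the one below. The helper `write_trust_policy` is hypothetical and is only useful if the file is missing from your checkout.

```python
import json

# Assumed contents: a standard trust policy allowing the AWS Batch service
# (batch.amazonaws.com) to assume the role. The file shipped with the
# project may differ.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "batch.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def write_trust_policy(path):
    """Write the trust policy as JSON to `path` and return the path."""
    with open(path, "w") as f:
        json.dump(trust_policy, f, indent=2)
    return path


if __name__ == "__main__":
    # Print instead of writing, so an existing file is never overwritten by accident.
    print(json.dumps(trust_policy, indent=2))
```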

# Deploying deep learning pipeline to AWS

## Deploying CPU pipeline to AWS
This cell deploys a stack with AWS Batch (CPU), AWS Step Functions, and AWS Lambda. When the deployment finishes, it prints an endpoint that we can call to trigger the Step Functions state machine. AWS Batch uses the publicly available CPU image [ryfeus/serverless-for-deep-learning:cpu](https://hub.docker.com/repository/docker/ryfeus/serverless-for-deep-learning/general).

In [None]:
%env IMAGE_NAME=ryfeus/serverless-for-deep-learning:cpu
%env S3_BUCKET=serverless-for-deep-learning
%env INSTANCE_TYPE=EC2
!cd deep-learning-training-cpu;npm install
!cd deep-learning-training-cpu;serverless deploy

## Deploying GPU pipeline to AWS
This cell deploys a stack with AWS Batch (GPU), AWS Step Functions, and AWS Lambda. When the deployment finishes, it prints an endpoint that we can call to trigger the Step Functions state machine. AWS Batch uses the publicly available GPU image [ryfeus/serverless-for-deep-learning:latest](https://hub.docker.com/repository/docker/ryfeus/serverless-for-deep-learning/general).

In [None]:
%env IMAGE_NAME=ryfeus/serverless-for-deep-learning:latest
%env S3_BUCKET=serverless-for-deep-learning
%env INSTANCE_TYPE=EC2
!cd deep-learning-training-gpu;npm install
!cd deep-learning-training-gpu;serverless deploy

## Calling endpoint from previous cell
Here we call the endpoint printed by the deployment cell, which triggers the Step Functions state machine with AWS Lambda and AWS Batch. Set `ENDPOINT_URL` to that endpoint before running the cell.

In [None]:
%env ENDPOINT_URL=
!curl $ENDPOINT_URL
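
The same request can be sent from Python, which makes it easy to catch an empty `ENDPOINT_URL` before anything goes over the network. This is a sketch using only the standard library; `validate_endpoint` and `trigger_pipeline` are hypothetical helper names, and the sketch assumes the endpoint accepts a plain GET request, as the `curl` call above does.

```python
import urllib.request
from urllib.parse import urlparse


def validate_endpoint(url):
    """Return the stripped URL, or raise ValueError if it is empty or not http(s)."""
    url = (url or "").strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"ENDPOINT_URL is not a valid http(s) URL: {url!r}")
    return url


def trigger_pipeline(url, timeout=30):
    """Send a GET request to the endpoint and return (status code, response body)."""
    with urllib.request.urlopen(validate_endpoint(url), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

In the notebook this could be called as `trigger_pipeline(os.environ["ENDPOINT_URL"])` after importing `os`.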

## Listing current executions and their state
Here we list all executions of the deployed Step Functions state machine together with their status. The execution created by the request to the endpoint should appear in the list.

In [None]:
%env STATE_MACHINE_NAME=DeepLearningTrainingCPU-StepFunction
!aws stepfunctions list-executions --state-machine-arn arn:aws:states:$AWS_DEFAULT_REGION:$AWS_ACCOUNT_ID:stateMachine:$STATE_MACHINE_NAME --query 'executions[*].[name,status]' --output text 
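
The CLI calls in this notebook assemble ARNs from the region, the account ID, and the state machine name. A small sketch of the two ARN shapes involved, with hypothetical helper names, may make that pattern easier to reuse:

```python
def state_machine_arn(region, account_id, name):
    """Build a Step Functions state machine ARN, as used by list-executions."""
    return f"arn:aws:states:{region}:{account_id}:stateMachine:{name}"


def execution_arn(region, account_id, state_machine_name, execution_id):
    """Build the ARN of one execution of a state machine."""
    return (
        f"arn:aws:states:{region}:{account_id}:execution:"
        f"{state_machine_name}:{execution_id}"
    )
```

For example, with the placeholder account ID `123456789012`, `state_machine_arn("us-east-1", "123456789012", "DeepLearningTrainingCPU-StepFunction")` returns `arn:aws:states:us-east-1:123456789012:stateMachine:DeepLearningTrainingCPU-StepFunction`.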

## Check specific execution state
Based on the results from the previous cell, we can choose an execution and print the state machine definition (the graph) that it is running. Set `EXECUTION_ID` to the name of that execution, which is the first column of the previous cell's output.

In [None]:
%env STATE_MACHINE_NAME=DeepLearningTrainingCPU-StepFunction
%env EXECUTION_ID=
!aws stepfunctions describe-state-machine-for-execution --execution-arn arn:aws:states:$AWS_DEFAULT_REGION:$AWS_ACCOUNT_ID:execution:$STATE_MACHINE_NAME:$EXECUTION_ID --output text --query 'definition'
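
The command above prints the definition as a JSON document in Amazon States Language, which can be hard to scan by eye. The sketch below lists the top-level states and their types; `list_states` is a hypothetical helper, it does not descend into nested `Parallel` or `Map` branches, and the sample definition is invented for illustration rather than taken from the deployed stack.

```python
import json


def list_states(definition_json):
    """Return (state name, state type) pairs for the top-level states of a definition."""
    definition = json.loads(definition_json)
    return [
        (name, state.get("Type"))
        for name, state in definition.get("States", {}).items()
    ]


if __name__ == "__main__":
    # Hypothetical definition, used only to show the output shape.
    sample = json.dumps(
        {
            "StartAt": "Prepare",
            "States": {
                "Prepare": {"Type": "Task", "Next": "Train"},
                "Train": {"Type": "Task", "End": True},
            },
        }
    )
    for name, state_type in list_states(sample):
        print(f"{name}: {state_type}")
```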

## Removing CPU or GPU application
Finally, we can run the following commands to remove the infrastructure we have just created. Run the cell that matches the stack you deployed.

In [None]:
!cd deep-learning-training-cpu;serverless remove

In [None]:
!cd deep-learning-training-gpu;serverless remove