Automating the pipeline on AWS:
1. Generate a Python script from the Jupyter notebook
2. Create a Docker image containing the environment required to run the code
3. Create an ECR repository and push the Docker image to it
4. Push the Python script and supporting files (historic IPO data) to an S3 bucket
5. (Performed in the AWS web interface) Set up the workflow on AWS CodePipeline

# Convert the Python notebook to a Python script

The automated workflow can only run a plain Python script, so the notebook must first be converted to one.

In [1]:
# !pip install nbconvert
# !pip install sagemaker

In [2]:
# !jupyter nbconvert --to script ipo_risk.ipynb
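
To illustrate what the conversion involves: a notebook file is JSON whose `cells` list holds the source of every cell. The sketch below uses only the standard library to join the code cells into one script and to comment out shell escapes and IPython magics. The names `notebook_to_script` and `convert_file` are hypothetical, and this is a simplified stand-in for `nbconvert`, not its implementation.

```python
import json


def notebook_to_script(notebook: dict) -> str:
    """Join the code cells of a parsed .ipynb document into one script.

    Lines starting with '!' or '%' (shell escapes and IPython magics) are
    commented out, because a plain Python interpreter cannot run them.
    """
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        text = source if isinstance(source, str) else "".join(source)
        lines = []
        for line in text.splitlines():
            if line.lstrip().startswith(("!", "%")):
                line = "# " + line
            lines.append(line)
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


def convert_file(ipynb_path: str, script_path: str) -> None:
    """Read a notebook from disk and write the extracted script."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(notebook))
```

Calling `convert_file('ipo_risk.ipynb', 'ipo_risk.py')` would then produce the script for this project. For real use, `nbconvert` remains the safer choice because it handles magics and notebook metadata properly.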

# Create a Docker image containing the environment required to run the code

Clean up unused images first.

In [1]:
!docker image prune -a -f

Deleted Images:
untagged: 797452981712.dkr.ecr.us-east-2.amazonaws.com/sagemaker-processing-container:latest
untagged: 797452981712.dkr.ecr.us-east-2.amazonaws.com/sagemaker-processing-container@sha256:fbd8077747b5f2fccc64615e4ae09ff74ee3a71ddc64786ed88dbd575a305d59
untagged: sagemaker-processing-container:latest
deleted: sha256:9b612396e3144abaa16d34787da38b9199f3cc9db6bef1897b154de6d9676d71
untagged: 797452981712.dkr.ecr.us-east-2.amazonaws.com/ipo-risk-model:latest
untagged: 797452981712.dkr.ecr.us-east-2.amazonaws.com/ipo-risk-model@sha256:01b5d3c0e25f98c424eed1e1e17049e58e1260331ff1e315ad0797b5db0f44c3

Total reclaimed space: 0B


In [2]:
%%writefile Dockerfile

FROM python:3.7

# re, datetime, math, statistics, itertools and difflib ship with Python and are not installed via pip
RUN pip3 install pandas numpy scipy scikit-learn fuzzywuzzy py_stringmatching pytrends gtab
RUN pip3 install matplotlib seaborn plotly requests bs4

ENV PYTHONUNBUFFERED=TRUE

ENTRYPOINT ["python3"]

Overwriting Dockerfile


# Create an ECR repository to store the Docker image

In [3]:
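import boto3

# Look up the AWS account ID and region of the current session
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.Session().region_name

# Name of the ECR repository and the fully qualified image URI
ecr_repository = 'sagemaker-processing-container'
tag = ':latest'
processing_repository_uri = f'{account_id}.dkr.ecr.{region}.amazonaws.com/{ecr_repository}{tag}'

# Build the image (the build context here is the local docker/ directory)
!docker build -t $ecr_repository docker
# Authenticate the Docker client against the account's ECR registry
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com
# Create the repository, then tag the local image with the ECR URI and push it
!aws ecr create-repository --repository-name $ecr_repository
!docker tag {ecr_repository + tag} $processing_repository_uri
!docker push $processing_repository_uri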

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                                         
[?25h[1A[0G[?25l[+] Building 0.2s (2/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  0.1s
[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (2/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                           

[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 2.4s (2/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  2.3s
[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 2.6s (3/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                   

[?25h[1A[1A[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 4.1s (4/5)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  2.5s
[0m[34m => CACHED [1/2] FROM docker.io/library/python:3.7-slim-buster@sha256:8e6  0.0s
[0m[34m => => resolve docker.io/library/python:3.7-slim-buster@sha256:8e6150aea0  0.0s
[0m => [2/2] RUN pip3 install pandas                                          1.5s
[2m => => # Collecting pandas                                                     
[0m[?25h[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 4.2s (4

[0m[?25h[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 5.3s (4/5)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  2.5s
[0m[34m => CACHED [1/2] FROM docker.io/library/python:3.7-slim-buster@sha256:8e6  0.0s
[0m[34m => => resolve docker.io/library/python:3.7-slim-buster@sha256:8e6150aea0  0.0s
[0m => [2/2] RUN pip3 install pandas                                          2.7s
[2m => => # Collecting pandas                                                     
[0m[2m => => #   Downloading pandas-1.3.5-cp37-cp37m-manylinux

[0m[?25h[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 6.3s (4/5)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  2.5s
[0m[34m => CACHED [1/2] FROM docker.io/library/python:3.7-slim-buster@sha256:8e6  0.0s
[0m[34m => => resolve docker.io/library/python:3.7-slim-buster@sha256:8e6150aea0  0.0s
[0m => [2/2] RUN pip3 install pandas                                          3.7s
[2m => => # ta 0:00:00                                                            
[0m[2m => => # Collecting numpy>=1.19.2           

[0m[?25h[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 7.1s (4/5)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  2.5s
[0m[34m => CACHED [1/2] FROM docker.io/library/python:3.7-slim-buster@sha256:8e6  0.0s
[0m[34m => => resolve docker.io/library/python:3.7-slim-buster@sha256:8e6150aea0  0.0s
[0m => [2/2] RUN pip3 install pandas                                          4.5s
[2m => => #      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 21.5 MB/s e
[0m[2m => => # ta 0:00:00                         

[0m[?25h[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 8.0s (4/5)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  2.5s
[0m[34m => CACHED [1/2] FROM docker.io/library/python:3.7-slim-buster@sha256:8e6  0.0s
[0m[34m => => resolve docker.io/library/python:3.7-slim-buster@sha256:8e6150aea0  0.0s
[0m => [2/2] RUN pip3 install pandas                                          5.4s
[2m => => #      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 21.5 MB/s e
[0m[2m => => # ta 0:00:00                         

[0m[?25h[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 8.9s (4/5)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  2.5s
[0m[34m => CACHED [1/2] FROM docker.io/library/python:3.7-slim-buster@sha256:8e6  0.0s
[0m[34m => => resolve docker.io/library/python:3.7-slim-buster@sha256:8e6150aea0  0.0s
[0m => [2/2] RUN pip3 install pandas                                          6.3s
[2m => => #      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 21.5 MB/s e
[0m[2m => => # ta 0:00:00                         

[4A[0G[?25h[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 9.8s (5/6)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 196B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load metadata for docker.io/library/python:3.7-slim-buster  2.5s
[0m[34m => CACHED [1/2] FROM docker.io/library/python:3.7-slim-buster@sha256:8e6  0.0s
[0m[34m => => resolve docker.io/library/python:3.7-slim-buster@sha256:8e6150aea0  0.0s
[0m[34m => [2/2] RUN pip3 install pandas                                          7.1s
[0m => exporting to image                                                     0.2s
 => => exporting layers                                    
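
The image URI and the repository creation step can also be expressed as small Python helpers. This is a minimal sketch under stated assumptions: `ecr_image_uri` and `ensure_repository` are hypothetical names, the account ID in the usage comment is a placeholder, and `ensure_repository` expects a boto3 ECR client to be passed in. It treats an already existing repository as success, which matters when the cell is re-run, since `aws ecr create-repository` reports an error in that case.

```python
def ecr_image_uri(account_id: str, region: str, repository: str,
                  tag: str = "latest") -> str:
    """Return the fully qualified ECR image URI for a repository and tag."""
    tag = tag.lstrip(":")  # accept both 'latest' and ':latest'
    if not tag:
        raise ValueError("tag must not be empty")
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


def ensure_repository(ecr_client, repository: str) -> bool:
    """Create the repository if it is missing.

    Returns True if the repository was created, False if it already existed.
    `ecr_client` is assumed to be a boto3 ECR client, e.g. boto3.client('ecr').
    """
    try:
        ecr_client.create_repository(repositoryName=repository)
        return True
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        return False


# Usage (placeholder account ID; requires boto3 and AWS credentials):
#   import boto3
#   ensure_repository(boto3.client("ecr"), "sagemaker-processing-container")
#   uri = ecr_image_uri("123456789012", "us-east-2",
#                       "sagemaker-processing-container")
```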

# Move the script and required files to S3

Upload the historic IPO data file and the Python script to the S3 bucket.

In [4]:
!aws s3 cp ipo_historic_data.csv s3://ipo-risk-model/files/ipo_historic_data.csv

upload: ./ipo_historic_data.csv to s3://ipo-risk-model/files/ipo_historic_data.csv


In [5]:
!aws s3 cp test.py s3://ipo-risk-model/files/test.py

upload: ./test.py to s3://ipo-risk-model/files/test.py
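
The two `aws s3 cp` calls above could also be done from Python. The following is a sketch, assuming a boto3 S3 client is passed in; the helper name `upload_files` is hypothetical, while the bucket name and prefix in the usage comment are taken from the commands above.

```python
import posixpath
from pathlib import Path


def upload_files(s3_client, bucket: str, prefix: str, paths):
    """Upload local files to s3://bucket/prefix/<file name>; return the URIs."""
    uploaded = []
    for path in map(Path, paths):
        key = posixpath.join(prefix, path.name)  # S3 keys always use '/'
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(f"s3://{bucket}/{key}")
    return uploaded


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   upload_files(boto3.client("s3"), "ipo-risk-model", "files",
#                ["ipo_historic_data.csv", "test.py"])
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without touching a real bucket.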
