# SageMaker JupyterLab with FargateCluster

This is how I started JupyterLab on SageMaker and then launched a FargateCluster with `dask-cloudprovider`, following Jacob Tomlinson's excellent blog post.

*(screenshot: 2020-01-23_15-24-00)*

First I created a SageMaker notebook instance using the AWS Console. The default is ml.t2.tiny with a 5 GB EBS volume, but that didn't provide enough memory or disk to build a custom conda environment with xarray, hvplot, etc., so I chose ml.t3.large with 40 GB of storage.

Under SageMaker → Notebook → Git repositories, I added this `sagemaker-fargate-test` repo so that my sample notebooks would be available when the SageMaker JupyterLab starts.

I then fired up JupyterLab on the SageMaker instance, opened a terminal, and typed:

```
conda activate base
conda update conda -y
conda env create -f ~/SageMaker/sagemaker-fargate-test/pangeo_env.yml
```

to update conda and create my custom pangeo environment.
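The actual environment spec is `pangeo_env.yml` in this repo; a minimal file of that shape (the package list here is assumed for illustration, not the repo's actual contents) might look like:

```yaml
# Hypothetical sketch of a pangeo-style environment file
name: pangeo
channels:
  - conda-forge
dependencies:
  - python=3.7
  - xarray
  - hvplot
  - dask
  - dask-cloudprovider
  - s3fs
  - ipykernel
```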

I then ran `aws configure` and entered my AWS access keys. This creates the `~/.aws` directory with credentials, which I copied to the persisted `~/SageMaker` directory. This was a hacky way of giving the SageMaker notebook instance the credentials it needs to create the FargateCluster.
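The copy step above amounts to something like this (a sketch, assuming the default SageMaker home layout):

```shell
# Persist the AWS credentials that `aws configure` wrote into $HOME
# (ephemeral) over to ~/SageMaker (EBS-backed, survives restarts).
mkdir -p ~/SageMaker
if [ -d ~/.aws ]; then
    cp -r ~/.aws ~/SageMaker/.aws
fi
```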

I then created a SageMaker "Lifecycle configuration" script, which runs when the SageMaker notebook instance starts. This script just copies the `.condarc` and the `.aws` credentials directory from persisted space back to the `$HOME` directory. This is the `lifecycle_start_notebook.sh` script in this repo.
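The shape of such a lifecycle script is roughly the following (a hedged sketch with assumed paths and names; see `lifecycle_start_notebook.sh` in this repo for the actual version):

```shell
#!/bin/bash
# Sketch of a lifecycle "start notebook" hook: restore persisted
# config from the EBS-backed directory into the ephemeral $HOME.

restore_persisted_config() {
    # $1 = persisted dir (e.g. /home/ec2-user/SageMaker), $2 = home dir
    for item in .condarc .aws; do
        if [ -e "$1/$item" ]; then
            cp -r "$1/$item" "$2/"
        fi
    done
}

# On a real notebook instance the lifecycle hook would run:
restore_persisted_config /home/ec2-user/SageMaker /home/ec2-user
```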

The last remaining step was to create a Dask worker container for the FargateCluster to run. To build this container, I just added some packages to the `daskdev/dask` container's Dockerfile.
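A worker image of that shape can be built by extending the `daskdev/dask` base image (the package list here is an assumption for illustration; the worker environment should match the notebook environment):

```dockerfile
# Hypothetical sketch: extend daskdev/dask with the extra packages
# the notebook's computations need on the workers.
FROM daskdev/dask:latest
RUN conda install -y -c conda-forge xarray s3fs zarr
```

The `daskdev/dask` image also supports an `EXTRA_CONDA_PACKAGES` environment variable for installing packages at container start, which can avoid a custom build at the cost of slower worker startup.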

The sample Hurricane Ike notebook then ran successfully. Here's a snapshot of the Dask dashboard:
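The cluster launch inside the notebook presumably looks something like this (a hedged sketch: the image name and worker count are assumptions, and the imports are deferred inside the function so the file parses without `dask-cloudprovider` installed):

```python
def launch_fargate_client(image: str, n_workers: int = 10):
    """Start a FargateCluster running `image` and return a connected Client.

    Requires AWS credentials (e.g. restored into ~/.aws by the lifecycle
    script) and the dask-cloudprovider package.
    """
    from dask_cloudprovider import FargateCluster  # 2020-era import path
    from dask.distributed import Client

    cluster = FargateCluster(image=image, n_workers=n_workers)
    return Client(cluster)

# In the notebook, something like:
#   client = launch_fargate_client("<account>/dask-worker:latest")
#   client.dashboard_link  # URL of the Dask dashboard shown below
```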

*(screenshot: 2020-01-23_14-48-49)*