# SageMaker JupyterLab with FargateCluster

This is how I started JupyterLab on SageMaker and then launched a FargateCluster with `dask-cloudprovider`, following Jacob Tomlinson's excellent blog post.

*(screenshot: 2020-01-23_15-24-00)*

First I created a SageMaker notebook instance using the AWS Console. The default is ml.t2.tiny with a 5 GB EBS volume, but that didn't provide enough memory or disk to build a custom conda environment with xarray, hvplot, etc., so I chose ml.t3.large with 40 GB of storage.

Under SageMaker → Notebook → Git repositories, I added this `sagemaker-fargate-test` repo so that my sample notebooks would be available when the SageMaker JupyterLab starts.

I then fired up JupyterLab on the SageMaker instance, opened a terminal, and typed:

```
conda activate base
conda update conda -y
conda env create -f ~/SageMaker/sagemaker-fargate-test/pangeo_env.yml
```

to update conda and create my custom pangeo environment.
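The actual environment spec is `pangeo_env.yml` in this repo; a minimal file of that shape (the package list here is assumed for illustration, not the repo's actual contents) might look like:

```yaml
# Hypothetical sketch of a pangeo-style environment file
name: pangeo
channels:
  - conda-forge
dependencies:
  - python=3.7
  - xarray
  - hvplot
  - dask
  - dask-cloudprovider
  - s3fs
  - ipykernel
```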

I then ran `aws configure` and entered my AWS access keys. This creates the `~/.aws` directory with credentials, which I copied to the persisted `~/SageMaker` directory. This was a hacky way of giving the SageMaker notebook instance the credentials it needs to create the FargateCluster.
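The copy step above amounts to something like this (a sketch, assuming the default SageMaker home layout):

```shell
# Persist the AWS credentials that `aws configure` wrote into $HOME
# (ephemeral) over to ~/SageMaker (EBS-backed, survives restarts).
mkdir -p ~/SageMaker
if [ -d ~/.aws ]; then
    cp -r ~/.aws ~/SageMaker/.aws
fi
```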

I then created a SageMaker "Lifecycle configuration" script, which runs when the SageMaker notebook instance starts. This script just copies the `.condarc` and the `.aws` credentials directory from persisted space back to the `$HOME` directory. This is the `lifecycle_start_notebook.sh` script in this repo.
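The shape of such a lifecycle script is roughly the following (a hedged sketch with assumed paths and names; see `lifecycle_start_notebook.sh` in this repo for the actual version):

```shell
#!/bin/bash
# Sketch of a lifecycle "start notebook" hook: restore persisted
# config from the EBS-backed directory into the ephemeral $HOME.

restore_persisted_config() {
    # $1 = persisted dir (e.g. /home/ec2-user/SageMaker), $2 = home dir
    for item in .condarc .aws; do
        if [ -e "$1/$item" ]; then
            cp -r "$1/$item" "$2/"
        fi
    done
}

# On a real notebook instance the lifecycle hook would run:
restore_persisted_config /home/ec2-user/SageMaker /home/ec2-user
```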

The last remaining step was to create a Dask worker container for the FargateCluster to run. To build this container, I just added some packages to the `daskdev/dask` container's Dockerfile.
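A worker image of that shape can be built by extending the `daskdev/dask` base image (the package list here is an assumption for illustration; the worker environment should match the notebook environment):

```dockerfile
# Hypothetical sketch: extend daskdev/dask with the extra packages
# the notebook's computations need on the workers.
FROM daskdev/dask:latest
RUN conda install -y -c conda-forge xarray s3fs zarr
```

The `daskdev/dask` image also supports an `EXTRA_CONDA_PACKAGES` environment variable for installing packages at container start, which can avoid a custom build at the cost of slower worker startup.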

The sample Hurricane Ike notebook then ran successfully. Here's a snapshot of the Dask dashboard:
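The cluster launch inside the notebook presumably looks something like this (a hedged sketch: the image name and worker count are assumptions, and the imports are deferred inside the function so the file parses without `dask-cloudprovider` installed):

```python
def launch_fargate_client(image: str, n_workers: int = 10):
    """Start a FargateCluster running `image` and return a connected Client.

    Requires AWS credentials (e.g. restored into ~/.aws by the lifecycle
    script) and the dask-cloudprovider package.
    """
    from dask_cloudprovider import FargateCluster  # 2020-era import path
    from dask.distributed import Client

    cluster = FargateCluster(image=image, n_workers=n_workers)
    return Client(cluster)

# In the notebook, something like:
#   client = launch_fargate_client("<account>/dask-worker:latest")
#   client.dashboard_link  # URL of the Dask dashboard shown below
```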

*(screenshot: 2020-01-23_14-48-49)*