# How to use Amazon SageMaker Studio Lab for the Digital House TP2

## Using Git to sync the GitHub repository with SageMaker Studio Lab

The first step is to **clone** the [mstokle/digitalhouse-group2](https://github.com/stoklemariano/digitalhouse-group2) repository from GitHub using the Git integration built into SageMaker Studio Lab. Git is a version control system that tracks the changes made to the files in a software project, including notebooks, configuration files, images, and so on. It does this with a special database called a *repository*. GitHub is a service that hosts Git-compatible repositories in the cloud, so that collaborators can sync their local Git repositories with a central repository on GitHub. Cloning downloads the latest copy of the GitHub repository into a new local repository.

In SageMaker Studio Lab, select the root folder (/) in the file browser, open the **Git** menu in the main menu bar, and select **Clone Git Repository**. A dialog opens where you enter the following information:

1. Git Repository URL: **https://github.com/stoklemariano/digitalhouse-group2.git**
2. Project directory to clone into: **.**

![Clone repo](images/clone-git-repository.jpg)


When the clone finishes, there should be a new folder named **digitalhouse-group2** containing two subfolders: **tp1** and **tp2**.  
If you left the "Search for environment.yml and build Conda environment" checkbox selected, Studio Lab processes that YAML file, which lists the packages needed to run the notebooks, including NumPy, pandas, and scikit-learn. If you did not select the checkbox, you can still build the environment: right-click the *environment.yml* file in the root folder of the repository and select *Build Conda Environment*.
When you open the first notebook, you can choose the **dhds2021-tp2-gp2** kernel.
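
To confirm that the environment was built and that the notebook is really using it, a quick check from a code cell can help. The following is a minimal sketch, assuming the import names `numpy`, `pandas`, and `sklearn` for the packages mentioned above; the helper `missing_packages` is a hypothetical name used only for illustration.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current kernel's environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages listed in environment.yml.
required = ["numpy", "pandas", "sklearn"]
print("Missing packages:", missing_packages(required))
```

An empty list suggests the kernel sees all three packages. If any name is reported, check that the **dhds2021-tp2-gp2** kernel is selected or rebuild the environment.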

## Uploading the dataset to SageMaker Studio Lab

The dataset is too large to include easily in the GitHub repository, so it has to be uploaded to SageMaker Studio Lab separately. Create a folder named Data in the root folder of the environment, double-click it to open it, and then click the *Upload files* button.

![Upload files](images/upload-files-option.jpg)
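
After the upload, it can be useful to check from a notebook that the files arrived complete. This is a minimal sketch, assuming the folder is reachable as `~/Data` (that is, that the root folder of the environment is the home directory); `list_data_files` is a hypothetical helper, not part of Studio Lab.

```python
from pathlib import Path


def list_data_files(data_dir):
    """Return (file name, size in bytes) pairs for the files directly inside data_dir."""
    folder = Path(data_dir).expanduser()
    if not folder.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in folder.iterdir() if p.is_file())


# "~/Data" is an assumption: adjust it if the folder was created elsewhere.
for name, size in list_data_files("~/Data"):
    print(f"{name}: {size / 1e6:.1f} MB")
```

Comparing the reported sizes with the sizes on your own machine is a simple way to spot an interrupted upload.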

## Pushing local changes to the GitHub repository

Once you start working on the notebooks and other files, a few setup tasks are needed before you can *push* your changes:

1. Grant permissions on the GitHub repository
2. Generate a Personal Access Token on GitHub
3. Configure Git to use the Personal Access Token

### Grant permissions on the GitHub repository

In this step, the owner of the GitHub repository adds the users as collaborators so that they have access to the repo.

### Generate a Personal Access Token on GitHub

Each user must go to the GitHub website, click the user icon in the top-right corner, click **Settings**, find **Developer Settings** in the left sidebar, and select **Personal Access Tokens**.

There, click the **Generate new token** button. Fill in *Note* with a name that will remind you what the token was generated for, set its expiration, and choose its permissions (select all of the *repo* permissions). Click the **Generate token** button and store the generated token in a safe place: it cannot be viewed again, so if you lose it you will have to generate a new one.


### Configure Git to use the Personal Access Token

Open the **Terminal**, either from the launcher (opened with the **+** button) or from the **File** menu by choosing **New** and then **Terminal**.

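    # Work inside the cloned repository
    cd ~/digitalhouse-group2

    # Identify yourself to Git and review the resulting settings
    git config --global user.name "<enter your GitHub username here>"
    git config --global user.email "<enter your email here>"
    git config -l

    # Enable the in-memory credential cache before pushing, so the token entered below is remembered
    git config --global credential.helper cache

    # Push; when prompted for the password, enter the token
    git push
    Password: <enter the Personal Access Token>

Once this is done, Git keeps the token in memory for a limited time (15 minutes by default), so you do not need to enter it on every push during that window.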
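
To double-check the settings from a notebook instead of the Terminal, the listing printed by `git config -l` can be read and parsed. This is a minimal sketch; `parse_git_config` and `read_git_config` are hypothetical helper names, and the code assumes `git` is available on the PATH.

```python
import subprocess


def parse_git_config(listing):
    """Turn the text printed by `git config -l` into a dict of key -> value."""
    settings = {}
    for line in listing.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def read_git_config():
    """Run `git config -l` and return the parsed settings."""
    result = subprocess.run(
        ["git", "config", "-l"], capture_output=True, text=True, check=True
    )
    return parse_git_config(result.stdout)


try:
    config = read_git_config()
except (FileNotFoundError, subprocess.CalledProcessError) as error:
    print("Could not read the Git configuration:", error)
else:
    for key in ("user.name", "user.email", "credential.helper"):
        print(f"{key} = {config.get(key, '<not set>')}")
```

The token itself is never stored in the Git configuration, so nothing sensitive is printed here.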


## Installing Python packages

The simplest way of installing Python packages is to use either of the following magic commands in a code cell of a notebook:

`%conda install <package>`

`%pip install <package>`

These magic commands will always install packages into the environment used by that notebook and any packages you install are saved in your persistent project directory. Note: we don't recommend using `!pip` or `!conda` as those can behave in unexpected ways when you have multiple environments.

Here is an example that shows how to install NumPy into the environment used by this notebook:

    %conda install numpy

Output:

    Collecting package metadata (current_repodata.json): done
    Solving environment: done

      current version: 4.10.3
      latest version: 4.11.0

    Please update conda by running

        $ conda update -n base conda

    # All requested packages already installed.

    Note: you may need to restart the kernel to use updated packages.


Now you can use NumPy:

    import numpy as np
    np.random.rand(10)

Output:

    array([0.57963059, 0.70327324, 0.42511913, 0.05731499, 0.53926905,
           0.67299244, 0.61963274, 0.44308432, 0.76433359, 0.17642703])
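
To see which interpreter a notebook is running and whether a package landed in its environment, the standard library is enough. A minimal sketch follows; `installed_version` is a hypothetical helper name, and `numpy` is used only because it is the package installed above.

```python
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# sys.executable points at the Python interpreter of the environment behind this kernel.
print("Interpreter:", sys.executable)
print("numpy version:", installed_version("numpy"))
```

If the version comes back as `None` right after an install, restarting the kernel, as the install output suggests, is the first thing to try.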

## SageMaker Studio Lab example notebooks

SageMaker Studio Lab works with familiar open-source data science and machine learning libraries, such as [NumPy](https://numpy.org/), [pandas](https://pandas.pydata.org/), [scikit-learn](https://scikit-learn.org/stable/), [PyTorch](https://pytorch.org/), and [TensorFlow](https://www.tensorflow.org/). 

To help you take the next steps, we have a GitHub repository with a set of example notebooks that cover a wide range of data science and machine learning topics, from importing and cleaning data to data visualization and training machine learning models.

<button class="jp-mod-styled" data-commandlinker-command="git:clone" data-commandlinker-args="{&quot;URL&quot;: &quot;https://github.com/aws/studio-lab-examples.git&quot;}">Clone SageMaker Studio Lab Example Notebooks</button>

## AWS Machine Learning University

[Machine Learning University (MLU)](https://aws.amazon.com/machine-learning/mlu/) provides anybody, anywhere, at any time access to the same machine learning courses used to train Amazon’s own developers on machine learning. Learn how to use ML with the learn-at-your-own-pace MLU Accelerator learning series.

<button class="jp-mod-styled" data-commandlinker-command="git:clone" data-commandlinker-args="{&quot;URL&quot;: &quot;https://github.com/aws-samples/aws-machine-learning-university-accelerated-tab.git&quot;}">Clone MLU Notebooks</button>

## Dive into Deep Learning (D2L)

[Dive into Deep Learning (D2L)](https://www.d2l.ai/) is an open-source, interactive book that teaches the ideas, the mathematical theory, and the code that powers deep learning. With over 150 Jupyter notebooks, D2L provides a comprehensive overview of deep learning principles and a state-of-the-art introduction to deep learning in computer vision and natural language processing. With tens of millions of online page views, D2L has been adopted for teaching by over 300 universities from 55 countries, including Stanford, MIT, Harvard, and Cambridge.
    
<button class="jp-mod-styled" data-commandlinker-command="git:clone" data-commandlinker-args="{&quot;URL&quot;: &quot;https://github.com/d2l-ai/d2l-pytorch-sagemaker-studio-lab.git&quot;}">Clone D2L Notebooks</button>

## Hugging Face

[Hugging Face](http://huggingface.co/) is the home of the [Transformers](https://huggingface.co/transformers/) library and state-of-the-art natural language processing, speech, and computer vision models.

<button class="jp-mod-styled" data-commandlinker-command="git:clone" data-commandlinker-args="{&quot;URL&quot;: &quot;https://github.com/huggingface/notebooks.git&quot;}">Clone Hugging Face Notebooks</button>

## Switching to a GPU runtime

Depending on the kinds of algorithms you are using, you may want to switch to a GPU or a CPU runtime for faster computation. First, save your work and then navigate back to your project overview page to select the instance type you want. You can navigate back to your project page by selecting the **Open Project Overview Page** in the **Amazon SageMaker Studio Lab** menu. Switching the runtime will stop all your kernels, but all of your notebooks, files, and datasets will be saved in your persistent project directory.

Note that a GPU runtime session is limited to 4 hours and a CPU runtime session is limited to 12 hours of continuous use.
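
After switching runtimes, you may want to confirm from a notebook which kind of runtime you ended up on. The following is a best-effort sketch: it asks PyTorch if that package happens to be installed, and otherwise falls back to looking for the `nvidia-smi` tool, which is only a heuristic. `gpu_available` is a hypothetical helper name.

```python
import shutil


def gpu_available():
    """Best-effort check for a usable GPU in the current runtime."""
    try:
        import torch  # Optional: only present if PyTorch was installed in this environment.
    except ImportError:
        # Heuristic fallback: NVIDIA driver tools are usually on the PATH on GPU hosts.
        return shutil.which("nvidia-smi") is not None
    return torch.cuda.is_available()


print("GPU runtime detected:", gpu_available())
```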

## Managing packages and Conda environments

### Your default environment

SageMaker Studio Lab uses Conda environments to encapsulate the software (Python, R, etc.) packages needed to run notebooks. Your project contains a default Conda environment, named `default`, with the [IPython kernel](https://ipython.readthedocs.io/en/stable/) and that is about it. There are a couple of ways to install additional packages into this environment.

As described above, you can use the following magic commands in any notebook:

`%conda install <package>`

`%pip install <package>`


Alternatively, you can open the Terminal and activate the environment using:

`$ conda activate default`

Once the environment is activated, you can install packages using the [Conda](https://docs.conda.io/en/latest/) or [pip](https://pip.pypa.io/en/stable/) command lines:

`$ conda install <package>`

`$ pip install <package>`

The conda installation for SageMaker Studio Lab uses a default channel of [conda-forge](https://conda-forge.org/), so you don't need to add the `-c conda-forge` argument when calling `conda install`.

### Creating and using new Conda environments

There are a couple of ways of creating new Conda environments.

**First**, you can open the Terminal and directly create a new environment using the Conda command line:

`$ conda create --name my_environment python=3.9`

This example creates a new environment named `my_environment` with Python 3.9.

**Alternatively**, if you have a Conda environment file, you can right-click the file in the JupyterLab file browser and select the "Build Conda Environment" item:

![Create Environment](images/create_environment.png)

To activate any Conda environment in the Terminal, run:

`$ conda activate my_environment`

Once you do this, any packages installed using Conda or pip will be installed in that environment.

To use your new Conda environments with notebooks, make sure the `ipykernel` package is installed into that environment:

`$ conda install ipykernel`

Once `ipykernel` is installed, a card for that environment and kernel should appear in the launcher.

<div class="alert alert-info"> <b>Note:</b> It may take about one minute for the new environment to appear as a kernel option.</div>
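
When several environments exist, it is easy to lose track of which one a notebook is using. This minimal sketch reports the environment behind the running kernel; `current_environment` is a hypothetical helper name, and treating the last component of `sys.prefix` as the environment name is an assumption that usually holds for Conda environments.

```python
import os
import sys


def current_environment():
    """Describe the Python environment that backs the running kernel."""
    return {
        "prefix": sys.prefix,  # Directory of the environment in use.
        "name": os.path.basename(sys.prefix),  # Usually the Conda environment name.
        "python": sys.version.split()[0],  # Python version string.
    }


print(current_environment())
```

Running this in a notebook started from the new kernel should show `my_environment` (or whatever name you chose) rather than `default`.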

## Installing JupyterLab and Jupyter Server extensions

SageMaker Studio Lab enables you to install open-source JupyterLab and Jupyter Server extensions. These extensions are typically Python packages that can be installed using `conda` or `pip`. To install these extensions, open the Terminal and activate the `studiolab` environment:

`$ conda activate studiolab`

Then you can install the relevant JupyterLab or Jupyter Server extension:

`$ conda install <jupyter_extension>`

You will need to refresh your page to pick up any JupyterLab extensions you have installed, or power cycle your project runtime to pick up any Jupyter Server extensions.

## Adding *Open in Studio Lab* links to your GitHub repositories

If you have public GitHub repositories with Jupyter Notebooks, you can make it easy for other users to open these notebooks in SageMaker Studio Lab by adding an *Open in Studio Lab* link to a README.md or notebook. This allows anyone to quickly preview the notebook and import it into their SageMaker Studio Lab project.

To add an *Open in Studio Lab* badge to your README.md file, use the following Markdown:

```
[![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/org/repo/blob/master/path/to/notebook.ipynb)
```

and replace `org`, `repo`, the path and the notebook filename with those for your repo. Or in HTML:

```
<a href="https://studiolab.sagemaker.aws/import/github/org/repo/blob/master/path/to/notebook.ipynb">
  <img src="https://studiolab.sagemaker.aws/studiolab.svg" alt="Open In SageMaker Studio Lab"/>
</a>
```

This creates a badge like:

[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/d2l-ai/d2l-pytorch-sagemaker-studio-lab/blob/161e45f1055654c547ffe3c81bd5f06310e96cff/GettingStarted-D2L.ipynb)
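
If a repository contains many notebooks, the badge Markdown can be generated instead of edited by hand. This is a minimal sketch that only fills in the URL pattern shown above; `studio_lab_badge` is a hypothetical helper, and the organization, repository, and notebook names in the example call are placeholders.

```python
def studio_lab_badge(org, repo, path, branch="master"):
    """Build the Markdown for an Open in Studio Lab badge pointing at one notebook."""
    badge_url = "https://studiolab.sagemaker.aws/studiolab.svg"
    import_url = (
        f"https://studiolab.sagemaker.aws/import/github/{org}/{repo}/blob/{branch}/{path}"
    )
    return f"[![Open In Studio Lab]({badge_url})]({import_url})"


# Placeholder repository and notebook names, for illustration only.
print(studio_lab_badge("my-org", "my-repo", "notebooks/intro.ipynb"))
```

The `branch` argument can also be a commit hash, as in the badge example above, which pins the link to one specific version of the notebook.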