### **State University of Campinas - UNICAMP** <br>
**Course**: MC886A <br>
**Professor**: Marcelo da Silva Reis <br>
**TA (PED)**: Marcos Vinicius Souza Freire

---

### **Hands-On: Introduction to Machine Learning and Tensors in PyTorch**
##### Notebook: 00 Setup
---

### **Table of Contents**

1. [**Basic Prerequisites**](#basic-prerequisites) <br>
2. [**Option 1: Using Docker (Linux-based Systems with GPU Support)**](#option-1-using-docker-linux-based-systems-with-gpu-support) <br>
3. [**Option 2: Creating a Python Virtual Environment (Local Setup)**](#option-2-creating-a-python-virtual-environment-local-setup) <br>
4. [**Option 3: Using Google Colab (Cloud-Based)**](#option-3-using-google-colab-cloud-based)

---

### **Setting Up a Machine Learning Environment**

This tutorial guides you through setting up an environment to run Machine Learning (ML) code using PyTorch. We’ll explore three methods: **Docker (with GPU support)**, **Python Virtual Environment (venv)**, and **Google Colab**. Choose the one that best fits your needs for this hands-on session.

---

#### **Basic Prerequisites**

Before diving in, ensure you have the following basics ready:

- **Python 3.8+**: Most ML tools work well with recent Python versions. Check your version with `python3 --version`.
- **Jupyter Notebook**: Ideal for interactive demos. Install it via `pip install jupyter` or use it within your chosen environment.
- **PyTorch**: We’ll install it as part of each setup (`pip install torch`), but you can pre-install it if testing locally.
- **Optional**: A code editor like VS Code for managing files and code.
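
The checklist above can also be verified from Python itself. The snippet below is a minimal sketch: the function name `check_prerequisites` is hypothetical, and it assumes Jupyter Notebook is importable as the `notebook` package, which may differ depending on how Jupyter was installed.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version listed in the prerequisites


def check_prerequisites(packages=("torch", "notebook")):
    """Return a dict describing which prerequisites are already satisfied."""
    report = {"python_ok": sys.version_info >= MIN_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_prerequisites().items():
        print(f"{item}: {'found' if ok else 'missing'}")
```

A `missing` entry is not a problem at this stage, since each option below installs what it needs.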

---

#### **Option 1: Using Docker (Linux-based Systems with GPU Support)**

Docker provides a reproducible, containerized environment, perfect for ML with GPU acceleration.

##### **Prerequisites**
- **Docker**: Install it from [Docker’s official site](https://docs.docker.com/get-docker/). Verify with `docker --version`.
- **NVIDIA Drivers**: Required for GPU support. Version 530 or later is needed for CUDA 12.1.1 (check with `nvidia-smi`).
- **NVIDIA Container Toolkit**: Enables GPU access in Docker. Install it [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
- **VS Code (Optional)**: For an integrated Dev Container experience.
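
As a quick sanity check, the sketch below looks for the command-line tools that these prerequisites normally install. It assumes each tool is on your `PATH` under its usual name (`docker`, `nvidia-smi`, `nvidia-ctk`); the helper `find_tools` is a hypothetical name, and a found executable does not guarantee that the installed version is recent enough.

```python
import shutil

# Executables assumed to be on PATH once each prerequisite is installed.
REQUIRED_TOOLS = {
    "Docker": "docker",
    "NVIDIA driver": "nvidia-smi",
    "NVIDIA Container Toolkit": "nvidia-ctk",
}


def find_tools(tools=REQUIRED_TOOLS):
    """Map each prerequisite to the path of its executable, or None."""
    return {label: shutil.which(exe) for label, exe in tools.items()}


if __name__ == "__main__":
    for label, path in find_tools().items():
        print(f"{label}: {path or 'not found'}")
```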

##### **Steps**

1. **Create a Project Folder**
   - Open a terminal and create a directory: `mkdir MC886_1S2025 && cd MC886_1S2025`.

2. **Set Up Dev Container Files**
   - Inside `MC886_1S2025`, create a `.devcontainer` folder: `mkdir .devcontainer && cd .devcontainer`.
   - Add three files: `devcontainer.json`, `Dockerfile`, and `post_start.sh`. Copy the contents below, then make the script executable with `chmod +x post_start.sh`.

> *Note: the `Dockerfile` file has no extension.*

---

#### **`devcontainer.json`**
```json
{
    "build": {
        "dockerfile": "Dockerfile"
    },
    "postStartCommand": ".devcontainer/post_start.sh", // Runs on every container start; remove this line if you skip post_start.sh
    "containerUser": "vscode",
    "runArgs": [
        "--gpus", "all",           // Enable all GPUs
        "--shm-size", "8g",        // Increase shared memory for multiprocessing
        "--ulimit", "memlock=-1",  // Remove memory lock limits
        "--ulimit", "stack=67108864" // Set stack size
    ],
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python",              // Python support
                "ms-python.black-formatter",     // Code formatting
                "ms-toolsai.jupyter",            // Jupyter integration
                "ms-toolsai.vscode-jupyter-powertoys", // Enhanced Jupyter tools
                "donjayamanne.git-extension-pack" // Git tools
            ]
        }
    }
}
```
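
The entries in `runArgs` are extra arguments for the `docker run` command that starts the container. The sketch below rebuilds an approximate command line from them to make that mapping visible. It is an illustration only: `mc886-dev` is a hypothetical image tag, and the Dev Containers tooling adds further arguments of its own (mounts, labels, and so on).

```python
import shlex

# runArgs copied from devcontainer.json
RUN_ARGS = [
    "--gpus", "all",
    "--shm-size", "8g",
    "--ulimit", "memlock=-1",
    "--ulimit", "stack=67108864",
]


def docker_run_command(image, run_args=RUN_ARGS):
    """Build an approximate `docker run` command line for the given image."""
    return shlex.join(["docker", "run", "--rm", "-it", *run_args, image])


# "mc886-dev" is a hypothetical image tag used only for illustration
print(docker_run_command("mc886-dev"))

# The stack limit is given in bytes: 67108864 B is 64 MiB
print(67108864 // 2**20, "MiB")
```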

---

#### **`Dockerfile`**
```dockerfile
# Base image: NVIDIA PyTorch 23.06 (Ubuntu 22.04, Python 3.10, CUDA 12.1.1, PyTorch 2.1.0a0+4136153)
FROM nvcr.io/nvidia/pytorch:23.06-py3

ARG USERNAME=vscode
ARG USER_UID=1000
ARG USER_GID=$USER_UID

# Install common development tools
RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update && \
    # Remove vulnerable ImageMagick (CVE-2019-10131)
    apt-get purge -y imagemagick imagemagick-6-common && \
    apt-get install -y \
        build-essential bzip2 ca-certificates cmake curl ffmpeg git htop \
        libsm6 libssl-dev libxext6 nvtop pandoc python3-opencv python3-pip \
        python3-sphinx tmux unrar unzip vim wget && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Upgrade pip and setuptools (quoted so the shell does not treat ">" as a redirect)
RUN python3 -m pip install --upgrade "pip>=23.0" "setuptools>=75.8.0"

# Create a non-root user (vscode) with sudo privileges
RUN groupadd --gid $USER_GID $USERNAME && \
    useradd --uid $USER_UID --gid $USER_GID -m $USERNAME && \
    apt-get update && apt-get install -y sudo && \
    echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME && \
    chmod 0440 /etc/sudoers.d/$USERNAME

# Set the login shell while still root, then switch to the non-root user
RUN chsh --shell /bin/bash $USERNAME
USER $USERNAME
```
> **Details**: This image ships PyTorch 2.1.0a0 (May 2023), CUDA 12.1.1, and Python 3.10. The Dockerfile removes ImageMagick because of CVE-2019-10131.

---

#### **`post_start.sh`**

```bash
#!/bin/bash

# Add /home/vscode/.local/bin to PATH in ~/.bashrc (appended only once)
LINE='export PATH="/home/vscode/.local/bin:$PATH"'
grep -qxF "$LINE" ~/.bashrc 2>/dev/null || echo "$LINE" >> ~/.bashrc
```
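
On Linux, `pip` places the command-line entry points of user-level installs in `~/.local/bin`, which is why the script adds that directory to `PATH`. A minimal sketch for checking the result from Python follows; `local_bin_on_path` is a hypothetical helper name.

```python
import os
from pathlib import Path


def local_bin_on_path(path_value=None):
    """Return True when ~/.local/bin is one of the PATH entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    local_bin = str(Path.home() / ".local" / "bin")
    return local_bin in path_value.split(os.pathsep)


if __name__ == "__main__":
    print("~/.local/bin on PATH:", local_bin_on_path())
```

Open a new terminal before running it, since changes to `~/.bashrc` only affect shells started afterwards.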

3. **Launch the Container in VS Code**
   - Open the `MC886_1S2025` folder in VS Code.
   - Press `Ctrl+Shift+P`, type `Dev Containers: Reopen in Container`, and select it.
   - Wait a few minutes for the container to build and start.

4. **Verify Setup**
   - Open a terminal in VS Code (`` Ctrl+` ``) and run:
     ```bash
     python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
     ```
   - Expected output: `2.1.0a0+4136153` and `True` (if GPU is detected).
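
`torch.cuda.is_available()` only reports that a GPU is visible. To confirm that tensors can actually be allocated and computed on it, a slightly longer check can help. The sketch below is one way to do that; `gpu_smoke_test` is a hypothetical helper name, and it falls back to the CPU (or reports that PyTorch is missing) instead of failing.

```python
import importlib.util


def gpu_smoke_test(size=256):
    """Multiply two random matrices on the best available device."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    c = a @ b  # matrix product computed on the selected device
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "device": str(device),
        "gpu_name": torch.cuda.get_device_name(0) if device.type == "cuda" else None,
        "result_shape": tuple(c.shape),
    }


if __name__ == "__main__":
    for key, value in gpu_smoke_test().items():
        print(f"{key}: {value}")
```

Inside the container, `device` should read `cuda` and `gpu_name` should show your card's model; a value of `cpu` suggests the GPU was not passed through to Docker.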

---

#### **Option 2: Creating a Python Virtual Environment (Local Setup)**

A lightweight option for running ML code without containers.

##### **Prerequisites**
- **Python 3.8+**: Installed locally.
- **pip**: Ensure it’s updated (`pip install --upgrade pip`).

##### **Steps**

1. **Create the Virtual Environment**
   - Open a terminal and run the command below. `mc886_venv` is just the environment's name, so you can pick any name you like:
     ```bash
     python3 -m venv mc886_venv
     ```

2. **Activate the Environment**
   - On Linux/macOS:
     ```bash
     source mc886_venv/bin/activate
     ```
   - On Windows:
     ```bat
     mc886_venv\Scripts\activate
     ```
   - Your prompt should change (e.g., `(mc886_venv)`).

3. **Install PyTorch and Dependencies**
   - With the venv active, install PyTorch:
     ```bash
     pip install torch torchvision jupyter
     ```
   - For GPU support, use the selector on [PyTorch’s site](https://pytorch.org/get-started/locally/) to get the command that matches your CUDA version (e.g., `pip install torch --index-url https://download.pytorch.org/whl/cu121` for CUDA 12.1).

4. **Start Jupyter Notebook**
   - Run:
     ```bash
     jupyter notebook
     ```
   - Open or create a notebook to test your code.
   - Alternatively, run notebooks directly in VS Code: install the Python and Jupyter extensions plus `ipykernel` (`pip install ipykernel`), then use the Run button.

5. **Verify Setup**
   - In a notebook cell, run:
     ```python
     import torch
     print(torch.__version__)
     print(torch.cuda.is_available())
     ```
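
A common pitfall with this option is launching Jupyter or VS Code with the system interpreter instead of the one inside the venv. The sketch below checks which interpreter is running; it relies on the fact that, inside an environment created by `venv`, `sys.prefix` differs from `sys.base_prefix`. The function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-created environment."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside venv:", in_virtualenv())
```

If this prints `False` in a notebook cell, the notebook kernel is probably not the one from `mc886_venv`; activate the environment (step 2) before starting Jupyter, or select the venv's interpreter in VS Code.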

---

#### **Option 3: Using Google Colab (Cloud-Based)**

Google Colab offers a free, cloud-based environment with GPU access. No local setup required.

##### **Prerequisites**
- A Google account.

##### **Steps**

1. **Access Colab**
   - Go to [https://colab.research.google.com/](https://colab.research.google.com/).
   - Sign in with your Google account.

2. **Create a New Notebook**
   - Click `File > New Notebook`.

3. **Enable GPU (Optional)**
   - Go to `Runtime > Change runtime type`, select a GPU hardware accelerator (e.g., `T4 GPU`), and save.

4. **Connect to Google Drive (Optional)**
   - Add a code cell and run:
     ```python
     from google.colab import drive
     drive.mount('/content/drive')
     ```
   - Follow the prompts to authenticate and mount your Drive.

5. **Install PyTorch (Pre-installed, but Verify)**
   - PyTorch is included by default. Check the version:
     ```python
     import torch
     print(torch.__version__)
     print(torch.cuda.is_available())
     ```
   - If you need a specific version, install it (e.g., `!pip install torch==2.1.0`).
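
If the same notebook has to run both on Colab and locally, it helps to detect the environment before touching any paths. The following is a minimal sketch under a few assumptions: the helper names are hypothetical, the folder name `MC886_1S2025` is reused from Option 1 purely as an example, and Drive is assumed to be mounted at `/content/drive` as in step 4.

```python
from pathlib import Path


def running_in_colab():
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (normally only available inside Colab)
    except ImportError:
        return False
    return True


def project_dir(folder="MC886_1S2025"):
    """Pick a working directory: Drive when mounted in Colab, else the current folder."""
    drive_root = Path("/content/drive/MyDrive")
    if running_in_colab() and drive_root.is_dir():
        return drive_root / folder
    return Path.cwd() / folder


print("Colab:", running_in_colab())
print("Project directory:", project_dir())
```

The function only builds the path; create the directory yourself (for example with `project_dir().mkdir(parents=True, exist_ok=True)`) before saving files to it.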