# ISDN3150-Week2-Section1: Google Colab Python Basics Tutorial

***>>> What you will learn in this workshop:***
- Understand the structure of Colab notebooks (Markdown and Code cells)
- Run code and execute cells effectively
- Import and install Python libraries
- Upload, save, and download files
- Utilize GPU acceleration for computation

## Notebook Structure

This tutorial is organized into sections, each containing:
- **📝 Lecture Notes**: Teaching explanations for each concept
- **📚 Example Code**: Demonstrations showing how to use features
- **✏️ Practice Blocks**: Hands-on exercises for you to complete
- **✅ Answer Keys**: Reference solutions (try the exercises first!)


## Part 1. Understanding Cell Types

### 📝 Lecture Notes

**What are cells?**
- Cells are the building blocks of a Jupyter notebook
- Each cell can contain either formatted text (Markdown) or executable code (Python)

**Two Main Cell Types:**

1. **Markdown Cells** (like this one)
   - Used for documentation, explanations, and formatted text
   - Support rich formatting: headers, lists, links, code blocks, etc.
   - Double-click to edit, run to render formatted output

2. **Code Cells**
   - Contain Python code that can be executed
   - Code within a cell runs from top to bottom
   - Cells can be executed individually, in any order, or all at once (Runtime > Run all)
   - The number in brackets on the left shows the order in which cells were executed
   - Variables persist in memory until the runtime is reset

**How to Run Cells:**
- Click the play button (▶️) on the left
- Press `Shift + Enter` to run and move to next cell
- Press `Ctrl + Enter` to run without moving to next cell

**Important**: All executed results are stored in memory until you restart the runtime. Make sure to save important outputs!
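One way to keep a result beyond the current session is to write it to a file. Below is a minimal sketch using Python's built-in `json` module; the `results` dictionary and the file name `results.json` are hypothetical. A file on the runtime's disk survives a session restart but not a disconnected or reset runtime, so download it or save it to Google Drive if you need it later.

```python
import json

# Hypothetical results computed in earlier cells
results = {"accuracy": 0.93, "epochs": 5}

# Write the results to a JSON file on the runtime's disk
with open("results.json", "w") as f:
    json.dump(results, f)

# Later (for example, after restarting the session) read them back
with open("results.json") as f:
    restored = json.load(f)

print(restored)
```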


In [1]:
# üìö Example: Running Your First Code Cell
# This is a code cell containing Python code
# Click the play button or press Shift+Enter to execute

print("Hello, World!")
print("Welcome to Google Colab!")

Hello, World!
Welcome to Google Colab!


In [2]:
# Code cells can be run multiple times. Variables keep their values until the runtime is restarted,
# so make sure to save crucial results if you want to keep them!
count = 1

In [3]:
# Try running this cell multiple times to see this effect.
print(f'The current count is {count}.')
count = count + 1

The current count is 1.


## Part 2. Importing and Installing Libraries

### 📝 Lecture Notes

**Pre-installed Libraries in Colab:**
Google Colab comes with many popular Python libraries pre-installed, including:
- **NumPy**: Numerical computing and array operations
- **Pandas**: Data manipulation and analysis
- **Matplotlib**: Data visualization
- **PyTorch/TensorFlow**: Deep learning frameworks
- And many more...

**Installing New Libraries:**
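- Use `%pip install` or `!pip install` to install packages
- The `!` prefix runs a shell (terminal) command from a notebook cell; the `%` prefix runs an IPython "magic" command. `%pip` installs into the environment of the running kernel, which is why it is recommended
- Installed libraries only last for the current runtime: they must be reinstalled after the runtime is disconnected or reset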
- **Best Practice**: Place installation commands at the top of your notebook
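Before running `%pip install`, you can check whether a package is already present. The following is a minimal sketch using the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical. Note that the name you pass to `pip` (the distribution name) can differ from the name you import, e.g. `scikit-learn` is imported as `sklearn`.

```python
from importlib.metadata import version, PackageNotFoundError


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "not-a-real-package-xyz" is a made-up name used to show the "missing" case
for name in ["numpy", "pandas", "seaborn", "not-a-real-package-xyz"]:
    v = installed_version(name)
    if v is None:
        print(f"{name}: not installed (install with %pip install {name})")
    else:
        print(f"{name}: {v}")
```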

**Importing Libraries:**
- Use `import library_name` to import a library
- Use `import library_name as alias` to create a shorter name
- Common aliases: `numpy as np`, `pandas as pd`, `matplotlib.pyplot as plt`

In [4]:
# 📚 Example: Importing Pre-installed Libraries
# Import commonly used libraries with their standard aliases

import numpy as np        # Numerical computing library, commonly aliased as np
import pandas as pd       # Data manipulation library, commonly aliased as pd
import matplotlib.pyplot as plt  # Plotting library, commonly aliased as plt

# Verify libraries are imported successfully
print("NumPy version:", np.__version__)
print("Pandas version:", pd.__version__)
print("Libraries imported successfully!")

NumPy version: 2.0.2
Pandas version: 2.2.2
Libraries imported successfully!


In [5]:
# 📚 Example: Installing a New Library
# Use %pip install (recommended) or !pip install to install packages
# Note: If the library is already installed, you'll see "Requirement already satisfied"

# Example: Install seaborn (a data visualization library)
%pip install seaborn

# After installation, you can immediately import and use it
import seaborn as sns
print("seaborn installed and imported successfully!")

seaborn installed and imported successfully!


## Part 3. File System: Uploading, Saving, and Downloading Files

### 📝 Lecture Notes

**File Upload Methods in Colab:**
1. **File Browser**: Use the left sidebar file browser (📁 icon) to upload files
2. **Code Upload**: Use `files.upload()` from `google.colab` to upload programmatically
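A minimal sketch of the code-based upload. `files.upload()` from Colab's `google.colab` package opens a file picker, waits for the upload, and returns a dictionary mapping each uploaded file name to its contents as bytes. The `try`/`except ImportError` guard and the helper name `upload_files` are additions for illustration, so the cell does not crash when run outside Colab.

```python
def upload_files():
    """Upload files through Colab's file picker; return {filename: bytes}."""
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        print("Not running in Colab: copy the files into the working directory instead.")
        return {}
    uploaded = files.upload()  # blocks until the upload finishes
    for name, content in uploaded.items():
        print(f"Uploaded {name} ({len(content)} bytes)")
    return uploaded


uploaded = upload_files()
```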

**File Paths in Colab:**
- **Working Directory**: We'll use a relative path `./data` for our working directory
- **Best Practice**: Use relative paths instead of absolute paths for better portability
- **Auto-create**: We'll automatically create the directory if it doesn't exist

**Saving Files:**
- Files created in the working directory persist during the current session
- **Important**: Download important files or save to Google Drive, as files are lost when the runtime disconnects
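To keep files after the runtime disconnects, you can mount Google Drive with `drive.mount()` from `google.colab` and save under `/content/drive/MyDrive`. A minimal sketch, assuming a hypothetical Drive folder named `ISDN3150` and falling back to the local `./data` directory when the code is not running in Colab; the helper name `get_save_dir` is also hypothetical.

```python
import os


def get_save_dir(drive_subdir="ISDN3150"):
    """Return a directory for saving files, preferring Google Drive inside Colab."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        save_dir = "./data"  # fallback when not running in Colab
    else:
        drive.mount("/content/drive")  # prompts you to authorize access
        save_dir = os.path.join("/content/drive/MyDrive", drive_subdir)
    os.makedirs(save_dir, exist_ok=True)
    return save_dir


SAVE_DIR = get_save_dir()
print(f"Saving files to: {SAVE_DIR}")
```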

**Downloading Files:**
- Use `files.download()` to download files to your local computer
- Or use the file browser to download files
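A minimal sketch of `files.download()`, which sends a file from the runtime to your browser's download folder. The existence check, the `ImportError` guard, and the helper name `download_file` are additions for illustration so the code also runs outside Colab.

```python
import os


def download_file(path):
    """Download `path` to the local computer when running in Colab."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path}")
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        print(f"Not running in Colab: the file is already at {os.path.abspath(path)}")
        return False
    files.download(path)  # triggers a browser download
    return True


# Example: download the CSV saved later in this tutorial
# download_file("./data/nums.csv")
```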

**Note**: The Colab runtime is temporary: files on its disk are deleted when the runtime is disconnected or reset, unless you save them to Google Drive!

In [6]:
# 📚 Example: Setting Up Working Directory
# Set up a working directory using a relative path and create it if it doesn't exist

import os

# Define working directory (relative to Colab's starting directory, /content)
WORK_DIR = './data'

# Create directory if it doesn't exist (exist_ok=True avoids an error if it does)
os.makedirs(WORK_DIR, exist_ok=True)

# Note: we do NOT call os.chdir(WORK_DIR) here. Later cells build paths with
# os.path.join(WORK_DIR, ...), so changing into ./data first would make those
# paths resolve to ./data/data instead.

print(f"Working directory set to: {os.path.abspath(WORK_DIR)}")
print(f"Current working directory: {os.getcwd()}")

# List all files and subdirectories in the working directory
print(f"\nFiles in {WORK_DIR} directory:")
items = os.listdir(WORK_DIR)
if items:
    for item in items:
        item_path = os.path.join(WORK_DIR, item)
        if os.path.isfile(item_path):
            print(f"  File: {item}")
        elif os.path.isdir(item_path):
            print(f"  Directory: {item}/")
else:
    print("  (directory is empty)")

Working directory set to: /content/data
Current working directory: /content

Files in ./data directory:
  File: matrix_data.csv
  File: benchmark_results.csv
  File: nums.csv
  Directory: data/


In [7]:
# 📚 Example: Creating and Saving Files
# You can create various types of files in Colab and save them

# Make sure working directory exists
WORK_DIR = './data'
os.makedirs(WORK_DIR, exist_ok=True)

# Create a DataFrame and save it as a CSV file
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
file_path = os.path.join(WORK_DIR, 'nums.csv')
df.to_csv(file_path, index=False)  # index=False means don't save row indices
print(f"DataFrame saved as {file_path}")

# Display first few rows of the file
print("\nFirst 5 rows of CSV file:")
print(df.head())

# List files in working directory
print(f"\nFiles in {WORK_DIR} directory:")
!ls -lh {WORK_DIR}

DataFrame saved as ./data/nums.csv

First 5 rows of CSV file:
    A   B   C   D
0  40  82   0  68
1  98  50  84  23
2  73  15  98  31
3  12  30  14  62
4  63  80  56  10

Files in ./data directory:
total 41M
-rw-r--r-- 1 root root  101 Feb  9 12:11 benchmark_results.csv
drwxr-xr-x 2 root root 4.0K Feb  9 12:11 data
-rw-r--r-- 1 root root  41M Feb  9 12:11 matrix_data.csv
-rw-r--r-- 1 root root  182 Feb  9 12:13 nums.csv


In [8]:
# 📚 Example: Reading Saved Files
# Verify the file was saved correctly and read it back

# Read the CSV file we just saved
WORK_DIR = './data'
file_path = os.path.join(WORK_DIR, 'nums.csv')
df_loaded = pd.read_csv(file_path)
print("File read successfully!")
print(f"Data shape: {df_loaded.shape}")
print("\nFirst 5 rows of data:")
print(df_loaded.head())

File read successfully!
Data shape: (15, 4)

First 5 rows of data:
    A   B   C   D
0  40  82   0  68
1  98  50  84  23
2  73  15  98  31
3  12  30  14  62
4  63  80  56  10
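Printing `head()` only shows a few rows. To check that the whole file survived the round trip, pandas provides `pd.testing.assert_frame_equal`, which raises an `AssertionError` describing any mismatch in values, dtypes, or labels. A minimal sketch using a small hypothetical DataFrame and a temporary directory, so it does not touch `./data`:

```python
import os
import tempfile

import pandas as pd

# Small hypothetical DataFrame of integers
df_small = pd.DataFrame({"A": [40, 98, 73], "B": [82, 50, 15]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "check.csv")
    df_small.to_csv(path, index=False)
    df_small_loaded = pd.read_csv(path)

# Raises AssertionError (with details) if anything differs
pd.testing.assert_frame_equal(df_small, df_small_loaded)
print("Round trip OK:", df_small_loaded.shape)
```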


## Part 4. GPU Settings and Usage

### 📝 Lecture Notes

**What is a GPU?**
- **GPU (Graphics Processing Unit)**: Specialized hardware for parallel computing
- **Advantages**: Often much faster than a CPU for large, parallelizable workloads such as matrix operations and deep learning
- **Use Cases**: Machine learning, deep learning, large-scale numerical computations

**GPU Usage in Colab:**
- **Free Access**: Colab provides free GPU access (with limitations)
- **Quota Limits**: Free usage is time-limited, but the quota is usually sufficient for learning and experimentation
- **How to Enable**: Runtime > Change runtime type > Hardware accelerator > Select GPU

**Important Notes:**
- The GPU only runs code that explicitly targets it, such as PyTorch or TensorFlow operations on tensors placed on the GPU
- Regular Python code, NumPy, and Pandas still run on the CPU
- Remember to switch the runtime type back to CPU when you no longer need the GPU, to save quota

In [9]:
# 📚 Example: Matrix Multiplication on CPU
# This example demonstrates large-scale matrix multiplication on CPU

import time
import torch

print("Creating large matrices...")
# Create two random matrices
a = torch.rand(10000, 5000)  # 10000 x 5000 matrix
b = torch.rand(5000, 10000)  # 5000 x 10000 matrix

print("Starting matrix multiplication (CPU)...")
start_time = time.time()
c = a @ b  # Matrix multiplication
end_time = time.time()

cpu_time = end_time - start_time
print(f'CPU computation time: {cpu_time:.2f} seconds')
print(f'Result matrix shape: {c.shape}')


Creating large matrices...
Starting matrix multiplication (CPU)...
CPU computation time: 9.78 seconds
Result matrix shape: torch.Size([10000, 10000])


In [10]:
# 📚 Example: Matrix Multiplication on GPU
# Note: To run this example, you need to enable GPU first
# Steps: Runtime > Change runtime type > Hardware accelerator > GPU

import time
import torch

# Check if GPU is available
if torch.cuda.is_available():
    print(f"✅ GPU available! Device: {torch.cuda.get_device_name(0)}")
    device = torch.device('cuda')
else:
    print("⚠️ GPU not available, using CPU")
    print("Hint: Go to Runtime > Change runtime type > Hardware accelerator > Select GPU")
    device = torch.device('cpu')

# Create matrices and move them to the appropriate device
print("\nCreating large matrices...")
a = torch.rand(10000, 5000, device=device)
b = torch.rand(5000, 10000, device=device)

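print(f"Starting matrix multiplication ({device})...")
# If using GPU, wait for the queued torch.rand() kernels so they are not timed
if device.type == 'cuda':
    torch.cuda.synchronize()
start_time = time.time()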
c = a @ b
# If using GPU, synchronize to wait for computation to complete
if device.type == 'cuda':
    torch.cuda.synchronize()
end_time = time.time()

gpu_time = end_time - start_time
print(f'{device} computation time: {gpu_time:.2f} seconds')
print(f'Result matrix shape: {c.shape}')

# If you ran the CPU version earlier, you can compare speeds
# print(f'\nSpeedup: {cpu_time/gpu_time:.2f}x')

✅ GPU available! Device: Tesla T4

Creating large matrices...
Starting matrix multiplication (cuda)...
cuda computation time: 0.33 seconds
Result matrix shape: torch.Size([10000, 10000])
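A single timed run can be noisy, and the first GPU operation may include one-time setup costs such as library initialization. A common remedy is to run the operation once as an untimed warm-up and report the median of several timed runs. Below is a minimal, framework-agnostic sketch using `time.perf_counter()`; the helper name `benchmark` and its parameters are hypothetical. For GPU work, pass `sync=torch.cuda.synchronize` so each measurement waits for the queued kernels to finish.

```python
import statistics
import time


def benchmark(fn, repeats=5, warmup=1, sync=None):
    """Return the median wall-clock time of fn() in seconds.

    sync: optional zero-argument callable (e.g. torch.cuda.synchronize) that
    blocks until pending asynchronous work is finished.
    """
    for _ in range(warmup):  # untimed runs absorb one-time setup costs
        fn()
    if sync is not None:
        sync()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # without this, GPU timings may only measure queuing the work
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Usage with the tensors from the cells above (inside Colab, GPU enabled):
# a_cpu, b_cpu = a.cpu(), b.cpu()
# cpu_t = benchmark(lambda: a_cpu @ b_cpu, repeats=1, warmup=0)  # slow on CPU
# gpu_t = benchmark(lambda: a @ b, sync=torch.cuda.synchronize)
# print(f"Speedup: {cpu_t / gpu_t:.1f}x")
print(f"sum(range(10**6)) takes ~{benchmark(lambda: sum(range(10**6))):.4f} s")
```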


### 📝 Lecture Notes: GPU Usage Best Practices

**Important Notes:**
1. **Data must be on GPU**: When using PyTorch, you need to move tensors to the GPU (using `.to('cuda')` or `.cuda()`) or create them there (e.g., with `device='cuda'`); the operands of an operation generally must be on the same device
2. **Only accelerates specific operations**: GPU mainly accelerates matrix operations and neural network computations; regular Python code still runs on CPU
3. **Memory management**: GPU memory is limited, so be careful not to create overly large tensors
4. **Quota management**: Free users have usage time limits; remember to disable GPU when not needed

**Best Practices:**
- Only enable GPU when needed
- Switch back to CPU after GPU tasks to save quota
- Use `torch.cuda.empty_cache()` to release cached GPU memory that PyTorch holds but no longer uses (delete tensors you no longer need first; it cannot free tensors that are still referenced)
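The notes above can be sketched in PyTorch as follows: pick a device, move a tensor there with `.to(device)`, see what happens when devices are mixed, and release memory with `del` followed by `torch.cuda.empty_cache()`. This is a minimal sketch that assumes the matrix size (2000 × 2000) for illustration; on a CPU-only runtime the GPU-specific parts are skipped.

```python
import torch

# Use the GPU if this runtime has one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(2000, 2000)  # tensors are created on the CPU by default
x = x.to(device)            # copy to the chosen device (x.cuda() also works on GPU)
y = x @ x.T                 # both operands are on the same device, so this runs there
print(f"Result computed on: {y.device}")

# Mixing devices fails: a CPU tensor cannot be multiplied with a GPU tensor
if device.type == "cuda":
    try:
        torch.rand(2000, 2000) @ x
    except RuntimeError as err:
        print("Device mismatch error:", str(err).splitlines()[0])

# Release GPU memory: delete the references first, then return cached blocks
del x, y
if device.type == "cuda":
    torch.cuda.empty_cache()
    print(f"Still allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MB")
    print(f"Still reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MB")
```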

## Part 5. Comprehensive Practice Exercise

### ✏️ Comprehensive Practice

**Instructions**: Combine what you've learned about **file I/O** and **GPU acceleration** to complete the following pipeline:

1. Use `torch.rand()` to create a random matrix of shape **(2000, 2000)**
2. Convert the matrix to a **Pandas DataFrame** and **save it as a CSV** file (e.g., `matrix_data.csv`)
3. **Read the CSV** file back into a DataFrame, then convert it back to a **PyTorch tensor**
4. Perform **matrix multiplication** (`matrix @ matrix.T`) on **CPU** and record the computation time
5. Move the tensor to **GPU**, perform the **same matrix multiplication**, and record the computation time
6. Print the **speedup ratio** (CPU time / GPU time) and **save the timing results** as a CSV file (e.g., `benchmark_results.csv`)

**Hints:**
- Use `pd.DataFrame(tensor.numpy())` to convert a tensor to a DataFrame
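- Use `torch.tensor(df.values, dtype=torch.float32)` to convert a DataFrame back to a tensor (the CSV values are read back as float64, and float64 matrix multiplication is much slower on GPUs such as the T4, which would distort the comparison)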
- Remember to use `torch.cuda.synchronize()` before stopping the GPU timer
- Use `time.time()` to measure computation time
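The CSV round trip in steps 2 and 3 changes the data type: `torch.rand()` produces float32 values, but `pd.read_csv()` reads numbers back as float64. A minimal sketch with NumPy and pandas (a small 3 × 3 matrix stands in for the 2000 × 2000 one, and an in-memory buffer stands in for the file) showing the change and the cast back to float32 before calling `torch.tensor(...)`:

```python
import io

import numpy as np
import pandas as pd

# float32 data, like the output of torch.rand(...).numpy()
matrix = np.random.rand(3, 3).astype(np.float32)

# Write to an in-memory CSV buffer instead of a file, then read it back
buffer = io.StringIO()
pd.DataFrame(matrix).to_csv(buffer, index=False)
buffer.seek(0)
df_matrix = pd.read_csv(buffer)

print(df_matrix.dtypes.unique())  # float64 after reading the CSV
restored = df_matrix.values.astype(np.float32)  # cast back before torch.tensor(...)
print(restored.dtype, np.allclose(restored, matrix))
```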

In [None]:
# Try your code here!