# Colab Runbook: Tesla T4 GPU Benchmark & Recipe Pipeline Demo

This Colab runbook shows how to run GPU workloads on a free Tesla T4 instance (when one is available). It also includes quick steps for running a simplified Recipe -> Shopping list pipeline demo (fetch, parse, normalize, mock search, optimize), so you can try the full prototype on Google Colab.

To request a GPU runtime, use the Colab UI: Runtime -> Change runtime type -> Hardware accelerator -> GPU. Colab assigns whichever GPU is available, so if you specifically need a T4, verify the assigned model with `!nvidia-smi` once the runtime has started.
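
As a minimal sketch of that verification step, the helper below asks `nvidia-smi` for the GPU model name and checks whether it mentions "T4". The names `assigned_gpu_name` and `is_t4` are illustrative and not part of Colab. The sketch assumes `nvidia-smi` is on the PATH whenever a GPU runtime is attached, and it returns `None` otherwise.

```python
import shutil
import subprocess


def assigned_gpu_name():
    """Return the first GPU's model name, or None if no NVIDIA GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # CPU-only runtime: the driver tools are not installed
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (subprocess.CalledProcessError, OSError):
        return None
    names = [line.strip() for line in out.splitlines() if line.strip()]
    return names[0] if names else None


def is_t4(name):
    """True when the reported model name mentions T4 (e.g. 'Tesla T4')."""
    return name is not None and "T4" in name


gpu = assigned_gpu_name()
print("Assigned GPU:", gpu, "| T4:", is_t4(gpu))
```

If the check reports a different model, the benchmarks still run; only the absolute numbers will differ from T4 results.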

Notebook outline:
1. Setup and environment info
2. Install/Upgrade packages (torch/tensorflow)
3. Import libs and set deterministic seeds
4. Mount Google Drive and configure WORKDIR
5. Verify GPU, CUDA and drivers
6. PyTorch GPU matmul benchmark
7. TensorFlow GPU matmul benchmark
8. Profile GPU workload and measure throughput (GFLOPS)
9. Save logs and artifacts to Drive
10. Automated sanity tests
11. Cleanup and recommended next steps

Notes:
- This notebook uses large matrix matmul operations (4096x4096) to stress the GPU. If you hit OOM, reduce the matrix size (e.g., 2048).
- Tesla T4 availability depends on the free Colab pool; the GPU assigned could be different.
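
To get a rough feel for how matrix size relates to memory, the sketch below estimates the space needed for the two inputs and the output of one matmul. It assumes float32 elements (4 bytes) and ignores framework workspace and allocator overhead, so treat the numbers as a lower bound. The function name `matmul_footprint_mib` is hypothetical.

```python
def matmul_footprint_mib(n, bytes_per_element=4, n_matrices=3):
    """Approximate memory for A, B and C = A @ B, each n x n, in MiB."""
    return n_matrices * n * n * bytes_per_element / (1024 ** 2)


for size in (4096, 2048, 1024):
    print(f"n={size}: ~{matmul_footprint_mib(size):.0f} MiB for A, B and C (float32)")
# n=4096: ~192 MiB, n=2048: ~48 MiB, n=1024: ~12 MiB
```

Halving `n` cuts the footprint by a factor of four, which is why dropping from 4096 to 2048 is usually enough to recover from an OOM.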


In [None]:
# Cell: print environment info and Python version
import os
import platform
import sys

print('Python:', sys.version)
print('Platform:', platform.platform())
print('PWD:', os.getcwd())


## 2) Install / Upgrade Python packages (PyTorch / TensorFlow)

The cell below upgrades pip and TensorFlow. The PyTorch install line (a CUDA 11.8 wheel from the `cu118` index) is commented out because Colab typically ships a CUDA-compatible PyTorch build; uncomment it only if you need that specific wheel. The whole cell is optional: to save time, skip it and import the preinstalled versions instead.

The cell will print imported versions after installation.

In [None]:
# Cell: install packages (optional heavy installs)
# You can comment/uncomment installs to save time
!python -m pip install --upgrade pip setuptools wheel
# Example CUDA wheel install (Colab often already has a compatible torch)
# Uncomment the next line to explicitly install a CUDA-enabled torch wheel (may take time)
# !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install --upgrade tensorflow

import torch, tensorflow as tf
print('torch', getattr(torch, '__version__', 'not installed'))
print('tf', getattr(tf, '__version__', 'not installed'))


## 3) Import libraries and set deterministic seeds

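This cell imports common libraries and sets random seeds for reproducibility. For PyTorch, we seed both CPU and CUDA (if available). For TensorFlow, we call `tf.random.set_seed`. Seeding makes random number generation repeatable, but it does not by itself guarantee bit-identical GPU results, because some CUDA kernels are nondeterministic.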
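
If you need run-to-run reproducibility beyond seeding, the following sketch shows one way to also request deterministic kernels in PyTorch. The helper name `enable_determinism` is hypothetical. It assumes a PyTorch version that supports the `warn_only` argument (1.11 or newer), and it must run before the first CUDA matmul for the cuBLAS setting to take effect.

```python
import os
import random


def enable_determinism(seed=42):
    """Seed Python (and PyTorch if installed) and request deterministic kernels.

    Returns a dict describing what was configured.
    """
    # cuBLAS reads this variable when it initialises, so set it before any CUDA matmul
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    configured = {"python_random": True, "torch": False}
    try:
        import torch
    except ImportError:
        return configured  # torch was skipped in the install cell
    torch.manual_seed(seed)
    torch.backends.cudnn.benchmark = False     # disable autotuned, run-dependent kernel choice
    torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
    # warn_only=True warns instead of raising for ops that lack a deterministic path
    torch.use_deterministic_algorithms(True, warn_only=True)
    configured["torch"] = True
    return configured


print(enable_determinism(42))
```

Deterministic kernels can be slower, so for pure throughput benchmarking you may prefer to leave these settings at their defaults.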


In [None]:
# Cell: imports and seeds
import os, time, random
import numpy as np

# Try imports, may not be installed if you skipped the install cell
try:
    import torch
except Exception:
    torch = None
try:
    import tensorflow as tf
except Exception:
    tf = None

random.seed(42)
np.random.seed(42)
if torch is not None:
    torch.manual_seed(42)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(42)
if tf is not None:
    try:
        tf.random.set_seed(42)
    except Exception:
        pass

print('torch:', getattr(torch, '__version__', None), 'cuda available:', torch is not None and torch.cuda.is_available())
print('tf:', getattr(tf, '__version__', None))


## 4) Mount Google Drive and prepare workspace

This cell mounts Google Drive so you can persist logs and model checkpoints. You can skip mounting if you don't need persistence.


In [None]:
# Cell: mount drive (optional)
try:
    from google.colab import drive
    drive.mount('/content/drive')
    WORKDIR = '/content/drive/MyDrive/colab_t4_runs'
    os.makedirs(WORKDIR, exist_ok=True)
except Exception as e:
    print('Google drive mount skipped or not available:', e)
    WORKDIR = '/content'

print('WORKDIR =', WORKDIR)


## 5) Verify GPU, CUDA and driver versions

Run `!nvidia-smi` to confirm the assigned GPU and driver. Use Python checks for torch and TensorFlow GPU visibility.


In [None]:
# Cell: GPU checks
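# Driver-level info
# A failing `!` shell command does not raise a Python exception,
# so check that the binary exists instead of wrapping it in try/except.
import shutil
print('nvidia-smi output:')
if shutil.which('nvidia-smi'):
    !nvidia-smi
else:
    print('nvidia-smi not available')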

# torch and tf checks
try:
    import torch
    print('torch.cuda.is_available():', torch.cuda.is_available())
    if torch.cuda.is_available():
        try:
            print('Device name:', torch.cuda.get_device_name(0))
        except Exception:
            pass
except Exception:
    print('torch not installed')

try:
    import tensorflow as tf
    print('tf GPUs:', tf.config.list_physical_devices('GPU'))
except Exception:
    print('tensorflow not installed')

# Check nvcc if available (a missing shell command does not raise in Python)
import shutil
if shutil.which('nvcc'):
    !nvcc --version
else:
    print('nvcc not available')


## 6) PyTorch GPU workload and timing

This cell runs a matmul on the GPU using 4096x4096 matrices, falling back to the CPU when CUDA is unavailable. If you run out of memory, reduce the matrix size to 2048 or 1024.
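
As an alternative to wall-clock timing, the sketch below times the matmul with CUDA events, which measure elapsed time on the GPU itself. It is a minimal illustration, not the method used by the benchmark cell: the function name `time_matmul_ms` is hypothetical, and on a CPU-only runtime it falls back to `time.perf_counter`.

```python
import time

try:
    import torch
except ImportError:
    torch = None  # the sketch degrades to a no-op if PyTorch is missing


def time_matmul_ms(n=4096, iters=3):
    """Return per-iteration matmul times in milliseconds, or None without torch."""
    if torch is None:
        return None
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    a = torch.randn((n, n), device=device)
    b = torch.randn((n, n), device=device)
    torch.matmul(a, b)  # one warm-up run so lazy initialisation is not timed
    times_ms = []
    for _ in range(iters):
        if use_cuda:
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            torch.matmul(a, b)
            end.record()
            torch.cuda.synchronize()  # wait until both events have been reached
            times_ms.append(start.elapsed_time(end))
        else:
            t0 = time.perf_counter()
            torch.matmul(a, b)
            times_ms.append((time.perf_counter() - t0) * 1000.0)
    return times_ms


print(time_matmul_ms(n=1024, iters=3))
```

Event timing excludes Python-side overhead, so it tends to report slightly lower times than the wall-clock approach.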


In [None]:
# Cell: PyTorch matmul benchmark
import time
try:
    import torch
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    n = 4096
    warmups = 2
    iters = 3
    print('device:', device)
    for _ in range(warmups):
        A = torch.randn((n, n), device=device)
        B = torch.randn((n, n), device=device)
        C = torch.matmul(A, B)
        if device.type == 'cuda':
            torch.cuda.synchronize()
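    torch_times = []  # per-iteration timings, reused by the profiling cell
    for i in range(iters):
        A = torch.randn((n, n), device=device)
        B = torch.randn((n, n), device=device)
        if device.type == 'cuda':
            torch.cuda.synchronize()  # finish tensor creation before starting the clock
        t0 = time.perf_counter()
        C = torch.matmul(A, B)
        if device.type == 'cuda':
            torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait for the matmul
        t = time.perf_counter() - t0
        torch_times.append(t)
        print(f'iter {i+1} elapsed {t:.4f}s')
    avg = sum(torch_times) / len(torch_times)
    n_ops = 2 * (n**3)  # n^3 multiply-add pairs, 2 FLOPs each
    gflops = (n_ops / avg) / 1e9
    print(f'avg {avg:.4f}s  GFLOPS {gflops:.1f}')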
except Exception as e:
    print('PyTorch benchmark skipped or failed:', e)


## 7) TensorFlow GPU workload and timing

Run a similar matmul using TensorFlow with warm-up runs and timing. If TensorFlow raises OOM, lower the matrix size.
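
The "lower the matrix size on OOM" advice can be automated with a small retry loop. The sketch below is framework-agnostic: `run_with_backoff` and `fake_benchmark` are hypothetical names, and the demo uses Python's built-in `MemoryError` as a stand-in. With TensorFlow you would likely pass `oom_errors=(tf.errors.ResourceExhaustedError,)`, and with recent PyTorch `oom_errors=(torch.cuda.OutOfMemoryError,)`.

```python
def run_with_backoff(benchmark, n=4096, min_n=512, oom_errors=(MemoryError,)):
    """Call benchmark(n), halving n after each out-of-memory error.

    Returns (n_used, result). Re-raises the last error if halving would go below min_n.
    """
    while True:
        try:
            return n, benchmark(n)
        except oom_errors:
            if n // 2 < min_n:
                raise
            n //= 2
            print(f"OOM, retrying with n={n}")


# Demo with a stand-in benchmark that "runs out of memory" above n=1024
def fake_benchmark(n):
    if n > 1024:
        raise MemoryError(f"pretend OOM at n={n}")
    return n * n


print(run_with_backoff(fake_benchmark))  # (1024, 1048576)
```

Returning the size that actually ran matters because the GFLOPS calculation must use the same `n` as the successful matmul.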

In [None]:
# Cell: TensorFlow matmul benchmark
import time
try:
    import tensorflow as tf
    device_name = '/GPU:0' if tf.config.list_physical_devices('GPU') else '/CPU:0'
    print('device for tf:', device_name)
    n = 4096
    warmups = 2
    iters = 3
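    # warmup (fetch one scalar so the GPU work actually finishes)
    for _ in range(warmups):
        with tf.device(device_name):
            A = tf.random.normal((n, n))
            B = tf.random.normal((n, n))
            C = tf.matmul(A, B)
            _ = tf.reduce_sum(C).numpy()
    # timed runs
    tf_times = []  # per-iteration timings, reused by the profiling cell
    for i in range(iters):
        with tf.device(device_name):
            A = tf.random.normal((n, n))
            B = tf.random.normal((n, n))
            # make sure A and B are materialized before starting the clock
            _ = (tf.reduce_sum(A) + tf.reduce_sum(B)).numpy()
            t0 = time.perf_counter()
            C = tf.matmul(A, B)
            # fetching one scalar forces completion without copying the full
            # n x n result to the host (this adds a small reduction cost)
            _ = tf.reduce_sum(C).numpy()
            t = time.perf_counter() - t0
        tf_times.append(t)
        print(f'iter {i+1} elapsed {t:.4f}s')
    avg = sum(tf_times) / len(tf_times)
    n_ops = 2 * (n**3)  # n^3 multiply-add pairs, 2 FLOPs each
    gflops = (n_ops / avg) / 1e9
    print(f'avg {avg:.4f}s  GFLOPS {gflops:.1f}')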
except Exception as e:
    print('TensorFlow benchmark skipped or failed:', e)


## 8) Profile and measure throughput (GFLOPS)

This cell calculates GFLOPS from the measured times and optionally logs per-iteration times. For deeper profiling, you can use `torch.utils.bottleneck` or TensorBoard profiler (not included by default here). The cell below reuses recorded times from the PyTorch/TensorFlow benchmark cells.
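
As a worked example of the GFLOPS arithmetic: an n x n by n x n matmul performs about 2n^3 floating-point operations, so for n = 4096 that is 137,438,953,472 FLOPs, and an average time of 0.05 s corresponds to roughly 2749 GFLOPS. The sketch below wraps this in a small summary helper. The name `summarize` and the timings are made up for illustration, and the peak value is an assumption taken from NVIDIA's published T4 FP32 figure (8.1 TFLOPS).

```python
import statistics

T4_PEAK_FP32_GFLOPS = 8100.0  # assumption: NVIDIA's datasheet figure (8.1 TFLOPS FP32)


def summarize(times, n, peak_gflops=T4_PEAK_FP32_GFLOPS):
    """Summarise per-iteration matmul timings (seconds) for n x n matrices."""
    flops = 2 * n ** 3  # n^3 multiply-add pairs, 2 FLOPs each
    best, median = min(times), statistics.median(times)
    return {
        "best_s": best,
        "median_s": median,
        "median_gflops": flops / median / 1e9,
        "best_gflops": flops / best / 1e9,
        "best_fraction_of_peak": flops / best / 1e9 / peak_gflops,
    }


# Made-up timings purely for illustration, not measured results
example = summarize([0.040, 0.050, 0.060], n=4096)
for key, value in example.items():
    print(f"{key}: {value:.4g}")
```

Reporting the median alongside the best time makes a single slow iteration (for example, one that hit a clock ramp-up) easier to spot than a plain average does.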


In [None]:
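# Cell: simple profiling wrapper
# Uses `torch_times` / `tf_times` if the benchmark cells defined them;
# otherwise prints a reminder to run those cells first.

def compute_gflops(n, avg_time):
    n_ops = 2 * (n**3)  # FLOPs for an n x n by n x n matmul
    return (n_ops / avg_time) / 1e9

n_bench = globals().get('n', 4096)  # matrix size used by the benchmark cells
for label, var in (('PyTorch', 'torch_times'), ('TensorFlow', 'tf_times')):
    recorded = globals().get(var)
    if recorded:
        avg_time = sum(recorded) / len(recorded)
        print(f'{label}: avg {avg_time:.4f}s, {compute_gflops(n_bench, avg_time):.1f} GFLOPS')
    else:
        print(f'{label}: no timings recorded; run the benchmark cell first.')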


## 9) Save logs, artifacts and checkpoints to Drive

This cell demonstrates saving a short log file and optionally a dummy checkpoint to the `WORKDIR` directory (Drive or /content).
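
Beyond a plain-text log, it can help to persist the timings in a structured form so runs can be compared later. The following is a minimal sketch, assuming JSON is an acceptable format; `save_results` and the file name `benchmark_results.json` are hypothetical, and the demo writes to a temporary directory rather than `WORKDIR`.

```python
import json
import os
import tempfile
import time


def save_results(workdir, results, filename="benchmark_results.json"):
    """Write a results dict plus a timestamp to workdir and return the file path."""
    os.makedirs(workdir, exist_ok=True)
    record = {"saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"), **results}
    path = os.path.join(workdir, filename)
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return path


# Illustrative values only; in the notebook, pass WORKDIR and the measured timings
demo_dir = tempfile.mkdtemp()
demo_path = save_results(demo_dir, {"n": 4096, "torch_times": [0.05, 0.05]})
print("wrote", demo_path)
```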


In [None]:
# Cell: save logs and dummy checkpoint
log_text = 'Sample GPU run log\n'
log_text += f'run_time: {time.asctime()}\n'
fn = os.path.join(WORKDIR, 'gpu_log.txt')
with open(fn, 'w') as f:
    f.write(log_text)
print('wrote', fn)

# Save a small dummy checkpoint if torch exists
try:
    if torch is not None:
        dummy = {'state': [1,2,3]}
        torch.save(dummy, os.path.join(WORKDIR, 'dummy_checkpoint.pt'))
        print('saved dummy checkpoint')
except Exception as e:
    print('checkpoint save failed', e)


## 10) Automated sanity tests / unit checks

This cell runs a few assertions to ensure GPU availability and correctness of matmul result shapes. Adjust checks for your environment.
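
A shape check alone does not confirm that the numbers are right. As one possible extension, the sketch below compares a matmul on the active device against a float64 CPU reference. The function name `matmul_matches_cpu` and the tolerances are assumptions chosen for illustration; GPUs that use reduced-precision math modes may need looser values.

```python
try:
    import torch
except ImportError:
    torch = None  # the check is skipped if PyTorch is missing


def matmul_matches_cpu(n=16, rtol=1e-3, atol=1e-3):
    """Compare a matmul on the active device with a float64 CPU reference.

    Returns True/False, or None when PyTorch is not installed.
    """
    if torch is None:
        return None
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn((n, n))
    b = torch.randn((n, n))
    reference = torch.matmul(a.double(), b.double())
    result = torch.matmul(a.to(device), b.to(device)).cpu().double()
    return torch.allclose(result, reference, rtol=rtol, atol=atol)


print("matmul matches CPU reference:", matmul_matches_cpu())
```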


In [None]:
# Cell: automated sanity tests
ok = True
try:
    import torch
    if not torch.cuda.is_available():
        print('WARNING: torch CUDA not available')
        ok = False
    else:
        print('torch CUDA available, device:', torch.cuda.get_device_name(0))
except Exception:
    print('torch not available on this runtime')
    ok = False

# simple math test
try:
    a = torch.randn((16,16), device=('cuda' if torch.cuda.is_available() else 'cpu'))
    b = torch.randn((16,16), device=('cuda' if torch.cuda.is_available() else 'cpu'))
    c = torch.matmul(a,b)
    assert c.shape == (16,16)
    print('matmul shape OK')
except Exception as e:
    print('small matmul failed', e)
    ok = False

print('Sanity checks passed' if ok else 'Sanity checks failed')


## 11) Cleanup: unmount Drive and release GPU resources

This cell shows steps to release resources. Use `Runtime -> Restart runtime` in the Colab UI if necessary to fully free the GPU.


In [None]:
# Cell: Cleanup
try:
    if 'drive' in globals():
        try:
            drive.flush_and_unmount()
            print('Drive unmounted')
        except Exception as e:
            print('drive unmount failed or not available:', e)
except Exception as e:
    print('cleanup error', e)

# free GPU memory
try:
    import torch
    # drop references to the benchmark tensors so their memory can be reclaimed
    for name in ('A', 'B', 'C'):
        globals().pop(name, None)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        print('freed torch GPU cache')
except Exception as e:
    print('GPU cleanup skipped:', e)

print('Recommended: Runtime -> Restart runtime to free all resources')
