Pytorch1
cd "C:\Users\IAGhe\OneDrive\Documents\Learning\Python"
python -m venv pytorch-cpu
pytorch-cpu\Scripts\activate
python -m pip install --upgrade pip
python -m pip install ipykernel
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m ipykernel install --user --name pytorch-cpu --display-name "PyTorch (CPU)"
python -m pip install --upgrade notebook
cd "C:\Users\IAGhe\OneDrive\Documents\Learning\Python"
pytorch-cpu\Scripts\activate
python -m notebook
Structure & pacing Duration: ~4–6 weeks part-time (12 modules).
Per module outcome: a short notebook with a tiny dataset, ≤5–10 epochs, and a takeaway plot/metric.
Compute tips (your machine): small batches (16–64), image size ≤128², num_workers=0 on Windows/Jupyter, save checkpoints rarely.
Module 0 — Setup & hygiene Goals: Reproducibility, project layout, Jupyter kernels. Topics: venv, requirements.txt; torch.__version__; seeds (torch.manual_seed), torch.backends.cudnn.deterministic (GPU note), deterministic dataloaders; experiment folders. Deliverable: “00_env_and_repro.ipynb” that prints versions, sets seeds, and times a 1-epoch dummy loop.
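A minimal sketch of what that notebook could contain, assuming CPU only and purely synthetic data (the function name and hyper-params are illustrative, not prescribed):

```python
# Print versions, seed everything, and time a 1-epoch dummy loop.
import random, time
import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # GPU note: only relevant when CUDA is available; harmless on CPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

print("torch", torch.__version__)
seed_everything(42)

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
start = time.perf_counter()
for xb, yb in zip(x.split(32), y.split(32)):        # one dummy epoch, batch size 32
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(xb), yb)
    loss.backward()
    opt.step()
print(f"1 dummy epoch: {time.perf_counter() - start:.3f}s, final loss {loss.item():.3f}")
```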
Module 2 — The nn module & training loop Goals: Build/trace a minimal loop you’ll reuse. Topics: nn.Module, parameters, initialisation, state_dict; optimiser step order; evaluation loop; early stopping. Exercise: MLP on Fashion-MNIST (28×28), 5 epochs.
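One way to write that reusable loop, sketched here with an illustrative MLP and a single run_epoch helper (dataloaders are assumed to exist):

```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self, in_dim=28 * 28, hidden=256, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, n_classes))

    def forward(self, x):
        return self.net(x)

def run_epoch(model, loader, loss_fn, optimiser=None):
    """Shared train/eval pass: pass an optimiser to train, omit it to evaluate."""
    training = optimiser is not None
    model.train(training)
    total, correct, loss_sum = 0, 0, 0.0
    with torch.set_grad_enabled(training):
        for xb, yb in loader:
            logits = model(xb)
            loss = loss_fn(logits, yb)
            if training:
                optimiser.zero_grad()   # step order: zero_grad -> backward -> step
                loss.backward()
                optimiser.step()
            loss_sum += loss.item() * yb.size(0)
            correct += (logits.argmax(1) == yb).sum().item()
            total += yb.size(0)
    return loss_sum / total, correct / total
```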
Module 3 — Data pipeline Goals: Ingest anything reliably. Topics: TensorDataset, custom Dataset, DataLoader, collate_fn, padding, class weights, Subset. Exercise: Create a synthetic shapes dataset (circles vs squares) with PIL/NumPy; train a tiny CNN.
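A sketch of the custom Dataset idea for the circles-vs-squares exercise, generated with NumPy only (class name, sizes, and drawing logic are illustrative):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ShapesDataset(Dataset):
    def __init__(self, n=1000, size=64, seed=0):
        rng = np.random.default_rng(seed)
        self.images = np.zeros((n, 1, size, size), dtype=np.float32)
        self.labels = rng.integers(0, 2, n)            # 0 = square, 1 = circle
        yy, xx = np.mgrid[:size, :size]
        for i, lab in enumerate(self.labels):
            cx, cy, r = rng.integers(16, size - 16, 3)
            r = int(r) % 12 + 4                        # radius / half-side in [4, 15]
            if lab == 1:
                mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
            else:
                mask = (np.abs(xx - cx) <= r) & (np.abs(yy - cy) <= r)
            self.images[i, 0][mask] = 1.0

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return torch.from_numpy(self.images[idx]), int(self.labels[idx])

# Windows/Jupyter: keep num_workers=0 (see "Recurring patterns" below).
loader = DataLoader(ShapesDataset(), batch_size=32, shuffle=True, num_workers=0)
```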
Module 4 — Computer vision basics (CNNs) Goals: Build small but solid CNNs. Topics: Convolutions, pooling, padding, strides; BatchNorm/LayerNorm; data augmentation with torchvision.transforms. Exercise: CIFAR-10 subset (e.g., 10k samples, 64×64 resize), ≤10 epochs. Add label smoothing & cosine LR schedule.
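The label-smoothing and cosine-schedule pieces plug into the standard loop like this (architecture and hyper-params are illustrative, not a recommended recipe):

```python
import torch
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)
optimiser = torch.optim.AdamW(cnn.parameters(), lr=3e-4, weight_decay=1e-2)
# One cosine cycle over the ≤10 epochs; call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimiser, T_max=10)
```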
Module 5 — Transformers at tiny scale (vision or text) Goals: See the architecture without big compute. Topics: Multi-head attention, positional encodings, masking. Exercise (pick one; a minimal encoder sketch follows the two options below):
ViT-tiny on shapes dataset, patch size 8–16;
or mini Transformer encoder for AG-News subset using bag-of-subwords tokens.
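For the text option, a minimal sketch using the built-in nn.TransformerEncoder (dimensions, vocab size, and the learned positional encoding are assumptions, not a prescribed architecture):

```python
import torch
from torch import nn

class TinyEncoderClassifier(nn.Module):
    def __init__(self, vocab=8000, d_model=128, n_heads=4, n_layers=2, n_classes=4, max_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model, padding_idx=0)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))   # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):                       # tokens: (batch, seq) of int ids, 0 = pad
        pad_mask = tokens == 0                       # True where padded
        x = self.embed(tokens) + self.pos[:, :tokens.size(1)]
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        # Mean-pool over non-padded positions, then classify.
        x = x.masked_fill(pad_mask.unsqueeze(-1), 0).sum(1) / (~pad_mask).sum(1, keepdim=True)
        return self.head(x)

logits = TinyEncoderClassifier()(torch.randint(1, 8000, (8, 50)))   # -> (8, 4)
```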
Module 6 — NLP pipeline Goals: End-to-end text classification. Topics: tokenisation (WordPiece/BPE via 🤗 tokenizers or simple spaCy), padding/masks, packed sequences, embeddings. Exercise: GRU vs small Transformer on an IMDb subset (e.g., 5k reviews). Compare training curves.
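The padding/masking and packed-sequence pieces for the GRU baseline look roughly like this (token ids and sizes are illustrative):

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

sequences = [torch.tensor([5, 9, 2]), torch.tensor([7, 1, 4, 8, 3]), torch.tensor([6])]
lengths = torch.tensor([len(s) for s in sequences])
padded = pad_sequence(sequences, batch_first=True, padding_value=0)    # (3, 5), 0 = pad id

embed = nn.Embedding(num_embeddings=10_000, embedding_dim=64, padding_idx=0)
gru = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

# Packing lets the GRU skip padded positions entirely.
packed = pack_padded_sequence(embed(padded), lengths, batch_first=True, enforce_sorted=False)
_, h_n = gru(packed)                       # h_n: (1, batch, hidden), last real step per sequence
logits = nn.Linear(128, 2)(h_n.squeeze(0)) # (batch, 2)
```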
Module 7 — Time series & forecasting Goals: Sequence models beyond text. Topics: Sliding-window Dataset, normalisation, teacher forcing vs free-running, horizon metrics (MAE, sMAPE). Exercise: Univariate series (e.g., airline passengers or synthetic AR process) with LSTM vs Temporal Convolution.
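A sliding-window Dataset plus an sMAPE helper, sketched under the assumption that the series is a 1-D tensor already normalised on the training split only:

```python
import torch
from torch.utils.data import Dataset

class WindowDataset(Dataset):
    def __init__(self, series: torch.Tensor, window: int = 24, horizon: int = 1):
        self.series, self.window, self.horizon = series, window, horizon

    def __len__(self):
        return len(self.series) - self.window - self.horizon + 1

    def __getitem__(self, i):
        x = self.series[i : i + self.window].unsqueeze(-1)                    # (window, 1)
        y = self.series[i + self.window : i + self.window + self.horizon]     # (horizon,)
        return x, y

def smape(pred, target, eps=1e-8):
    # Symmetric MAPE over the forecast horizon.
    return (2 * (pred - target).abs() / (pred.abs() + target.abs() + eps)).mean()
```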
Module 8 — Generative models Goals: Learn representation & sampling ideas. Topics: VAE objective (ELBO, KL annealing), simple GAN training tricks (spectral norm, label noise). Exercise: VAE on Fashion-MNIST; optional DCGAN at 64×64 on a 5–10k-image toy set.
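The ELBO and the reparameterisation trick in code, assuming an encoder/decoder pair exists and images are scaled to [0, 1] (beta is the KL-annealing weight):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    # Bernoulli reconstruction term (decoder output passed through sigmoid).
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL(q(z|x) || N(0, I)); ramp beta from 0 -> 1 to anneal the KL term.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

def reparameterise(mu, logvar):
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)   # z = mu + sigma * eps keeps sampling differentiable
```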
Module 9 — Graphs Goals: Understand message passing. Topics: Graph data structures, neighbourhood aggregation, readout. Exercise: Node classification on Cora using PyTorch Geometric; tiny 2-layer GCN.
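A sketch of the tiny 2-layer GCN, assuming PyTorch Geometric is installed (the hidden size and dropout follow common Cora baselines, not a requirement):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim=1433, hidden=16, n_classes=7):   # Cora: 1433 features, 7 classes
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, n_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))      # one round of message passing + aggregation
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

# Usage sketch (Planetoid downloads Cora on first run):
# from torch_geometric.datasets import Planetoid
# data = Planetoid(root="data", name="Cora")[0]
# logits = GCN()(data.x, data.edge_index)
```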
Module 10 — Optimisation & regularisation Goals: Make training stable and principled. Topics: SGD vs AdamW, weight decay, learning-rate schedules (OneCycle/Cosine), gradient norm monitoring, initialisation (Kaiming/Xavier), dropout, early stopping, Stochastic Weight Averaging (SWA). Exercise: Revisit your CIFAR-subset CNN and show a 3–4 line table of ablations.
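How AdamW, OneCycle, and gradient-norm monitoring fit together in one loop (step counts and hyper-params are illustrative; the dummy data stands in for your CIFAR-subset loader):

```python
import torch
from torch import nn

model = nn.Linear(32, 10)
optimiser = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
steps_per_epoch, epochs = 100, 10
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimiser, max_lr=1e-3, total_steps=steps_per_epoch * epochs)

for step in range(steps_per_epoch):                     # one illustrative epoch
    x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimiser.zero_grad()
    loss.backward()
    # clip_grad_norm_ both clips and returns the total norm: log it to spot blow-ups.
    grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimiser.step()
    scheduler.step()                                    # OneCycle steps per batch, not per epoch
```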
Module 11 — Evaluation, error analysis, and experiment management Goals: Trust results, not accuracy headlines. Topics: Proper splits, stratification, confidence intervals via bootstrapping, calibration; torchmetrics; confusion matrices; logging (TensorBoard/MLflow). Exercise: For one earlier task, write a clean report cell that prints metrics, a confusion matrix, and a calibration plot.
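A sketch of the metrics-plus-bootstrap part of that report cell, assuming torchmetrics is installed (the random preds/targets are placeholders for a real model's outputs):

```python
import torch
from torchmetrics.classification import MulticlassAccuracy, MulticlassConfusionMatrix

preds = torch.randint(0, 10, (500,))      # placeholder predictions
targets = torch.randint(0, 10, (500,))    # placeholder labels

acc = MulticlassAccuracy(num_classes=10)(preds, targets)
cm = MulticlassConfusionMatrix(num_classes=10)(preds, targets)

# Bootstrap a 95% confidence interval for accuracy by resampling indices with replacement.
boot = []
for _ in range(1000):
    idx = torch.randint(0, len(preds), (len(preds),))
    boot.append((preds[idx] == targets[idx]).float().mean())
boot = torch.stack(boot)
lo, hi = torch.quantile(boot, torch.tensor([0.025, 0.975]))
print(f"accuracy {acc:.3f} (95% CI {lo:.3f}–{hi:.3f})")
print(cm)
```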
Module 12 — Inference, export & light deployment Goals: Make models usable and fast on CPU. Topics: model.eval() pitfalls, torch.no_grad(), batch vs single inference; dynamic quantisation for Linear/LSTM on CPU; ONNX export; simple FastAPI/Gradio demo. Exercise: Quantise a text classifier and compare latency on CPU; export to ONNX and run a single sample.
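A sketch of dynamic quantisation and ONNX export on a Linear-heavy toy model (the architecture is illustrative; onnx/onnxruntime are optional extras, and quantised modules are not always exportable, so the float model is exported here):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 2)).eval()

# Dynamic quantisation converts Linear/LSTM weights to int8 for faster CPU inference.
quantised = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

example = torch.randn(1, 512)
with torch.no_grad():                      # time both paths here to compare latency
    print(model(example), quantised(example))

torch.onnx.export(model, example, "classifier.onnx",
                  input_names=["features"], output_names=["logits"])
```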
Recurring patterns to master Canonical training loop with: timer, tqdm, loss/metric logging, and checkpointing via state_dict.
Config pattern: small dataclasses/dicts for hyper-params; seed all randomness every run.
Small-data discipline: use subsets, assert shapes, write unit-tests for Dataset/collate_fn.
Windows/Jupyter gotcha: keep num_workers=0 unless you guard entry-point code with if __name__ == "__main__": (see the sketch after this list).
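The config-dataclass, seeding, and __main__-guard patterns together, sketched with illustrative field names:

```python
import random
from dataclasses import dataclass, asdict
import numpy as np
import torch

@dataclass
class Config:
    seed: int = 42
    batch_size: int = 32
    lr: float = 3e-4
    epochs: int = 5
    num_workers: int = 0        # keep 0 on Windows/Jupyter

def seed_all(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

if __name__ == "__main__":      # required guard if num_workers > 0 on Windows
    cfg = Config()
    seed_all(cfg.seed)
    print(asdict(cfg))          # log the full config with every run
```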
Minimal toolset (CPU-friendly) torch, torchvision, torchtext or a tokenizer lib; torchmetrics; matplotlib; tqdm; (optional) pytorch-lightning for higher-level loops; transformers for small pretrained encoders (CPU inference is fine on toy sets); torch_geometric for graphs (use small datasets).
Suggested notebook sequence (file names) 00_env_and_repro.ipynb
01_tensors_autograd.ipynb
02_training_loop_template.ipynb
03_dataset_dataloader.ipynb
04_cnn_fashionmnist.ipynb
05_cifar_subset_aug_sched.ipynb
06_nlp_gru_vs_transformer.ipynb
07_timeseries_lstm_tcn.ipynb
08_vae_fashionmnist.ipynb
09_gnn_cora_gcn.ipynb
10_eval_error_analysis.ipynb
11_inference_quantisation_onnx.ipynb
If you tell me which track you want to start with (vision, NLP, or time-series), I’ll spin up the first two notebooks with tiny datasets and a ready-to-run training loop tailored for CPU.