A comprehensive beginner-friendly repository for learning notebook programming with Jupyter, VS Code, and WSL2 (Ubuntu 24.04). This project emphasizes clean, repeatable workflows and best practices from day one.
- Learn Fundamentals: Master notebook workflows for data exploration, visualization, and basic machine learning
- Build Clean Structure: Develop organized, modular project architecture avoiding messy outputs
- Practice Best Habits: Version control, environment management, and code refactoring from day one
- Mentor-Guided Learning: Use AI assistance for clear explanations and best practice guidance
This project is set up with a complete development environment:
```
nb/
├── README.md       # This file - comprehensive project documentation
├── CODEX_INIT.md   # AI mentor instructions and learning context
├── Makefile        # Automation for common tasks (env, jupyter, cleaning)
├── env.yaml        # Micromamba environment specification
├── .gitignore      # Comprehensive ignore patterns for clean Git history
├── notebooks/      # Jupyter notebooks for tutorials
├── src/            # Reusable Python modules and functions
├── data/           # Datasets (ignored by Git, use .gitkeep)
└── outputs/        # Generated plots, reports, results
```
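Note the `.gitkeep` hint on `data/`: Git doesn't track empty directories, so a placeholder file keeps them in the repository. For illustration only, here is a rough Python equivalent of what `make init` presumably does (the actual recipe lives in the Makefile and most likely uses `mkdir -p`):

```python
# Illustrative Python equivalent of `make init` -- an assumption, not the real recipe
from pathlib import Path

for name in ("notebooks", "src", "data", "outputs"):
    folder = Path(name)
    folder.mkdir(exist_ok=True)     # create the project folder if it doesn't exist
    (folder / ".gitkeep").touch()   # placeholder so Git keeps otherwise-empty dirs
```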
Just run these commands:
- Initialize project folders: `make init`
- Create the environment: `make env-create`
- Install Jupyter kernel: `make kernel`
- Verify everything works: `make check` (a complementary in-environment check is sketched below)
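`make check` verifies the tooling from the shell. As promised above, you can also sanity-check from inside the environment itself; a minimal sketch, assuming `nbenv` is activated:

```python
# Run inside an activated nbenv to confirm the environment is healthy
import sys
print(sys.version)             # Python interpreter provided by the environment

import jupyterlab
print(jupyterlab.__version__)  # JupyterLab installed via env.yaml
```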
Start your learning session:
```bash
make lab        # Launch JupyterLab
# or
make notebook   # Launch classic Jupyter Notebook
```

Before committing work:

```bash
make clear-outputs   # Clear notebook outputs
git add .
git commit -m "Add: tutorial on data visualization"
```

Your Makefile provides these automated tasks:
| Command | Purpose |
|---|---|
| `make` / `make help` | Show all available commands |
| `make init` | Create project folders and basic .gitignore |
| `make env-create` | Create micromamba environment from env.yaml |
| `make env-update` | Update environment when you add packages |
| `make env-remove` | Remove the environment completely |
| `make env-export` | Export environment spec for reproducibility |
| `make kernel` | Install/refresh Jupyter kernel |
| `make lab` | Launch JupyterLab |
| `make notebook` | Launch classic Jupyter |
| `make clear-outputs` | Clear all notebook outputs |
| `make clean` | Remove caches and temporary files |
| `make freeze-pip` | Export pip requirements |
| `make check` | Verify tools and environment |
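Of these, `make clear-outputs` is worth understanding early. The Makefile presumably shells out to `jupyter nbconvert --clear-output --inplace`; the same effect for a single notebook, sketched in Python (an illustration, not the actual recipe):

```python
# Clear one notebook's outputs programmatically (what clear-outputs likely automates)
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

path = "notebooks/01_getting_started.ipynb"   # hypothetical notebook path
nb = nbformat.read(path, as_version=4)
ClearOutputPreprocessor().preprocess(nb, {})  # strip outputs and execution counts
nbformat.write(nb, path)
```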
Your env.yaml is designed for easy expansion:
```yaml
# Current setup in env.yaml
name: nbenv
channels:
  - conda-forge
dependencies:
  - python
  - ipykernel
  - jupyter
  - jupyterlab
  # Add new packages here:
  # - pandas
  # - matplotlib
  # - seaborn
  # - scikit-learn
  # - pip:
  #   - some-pip-package
```

Workflow for adding packages:
- Edit `env.yaml` to add new dependencies
- Run `make env-update` to install them
- If needed, run `make kernel` to refresh the Jupyter kernel (a quick import check is sketched below)
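As mentioned in the last step, you can confirm the new packages actually reached the kernel with a quick import check. Here `pandas` and `matplotlib` stand in as example additions, not packages in the current env.yaml:

```python
# Verify packages added to env.yaml are importable from the nbenv kernel
import importlib

for pkg in ("pandas", "matplotlib"):   # substitute whatever you added
    module = importlib.import_module(pkg)
    print(pkg, getattr(module, "__version__", "unknown"))
```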
Your .gitignore is configured to keep your repository clean:
What's tracked ✅:
- Source code (`.py` files in `src/`)
- Notebooks (but outputs are cleared before commit)
- Documentation and configuration files
- Environment specifications (`env.yaml`)

What's ignored ❌:
- Data files (`data/` directory)
- Generated outputs (`outputs/` directory)
- Python caches (`__pycache__/`, `*.pyc`)
- Jupyter checkpoints (`.ipynb_checkpoints/`)
- Environment files (`.env`, secrets)
- Editor-specific files (`.vscode/`, `.idea/`)
Best practices:
```bash
# Before committing notebooks
make clear-outputs

# Commit workflow
git status
git add src/ notebooks/ README.md   # Be selective
git commit -m "Add: data loading utilities"

# Export environment state for reproducibility
make env-export   # Creates mamba-linux-64.lock
```

Already set up:
- Project structure established
- Environment configuration (`env.yaml`)
- Automation tools (`Makefile`)
- Git configuration (`.gitignore`)
- Documentation framework
- Create first notebook: `notebooks/01_getting_started.ipynb`
  - Practice markdown cells and code cells
  - Learn about kernel management
  - Understand cell execution order
- Environment exploration: `notebooks/02_environment_setup.ipynb`
  - Test package imports
  - Verify micromamba environment
  - Practice using Makefile commands
- Data basics: `notebooks/03_data_fundamentals.ipynb` (a starter sketch follows this list)
  - Load sample datasets
  - Basic pandas operations
  - Simple visualizations with matplotlib
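Here is the promised starter sketch for the data-basics notebook. It assumes `pandas` and `matplotlib` have already been added to `env.yaml`, and that a hypothetical `data/sample.csv` with `x` and `y` columns exists:

```python
# notebooks/03_data_fundamentals.ipynb -- first steps with pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/sample.csv")        # hypothetical dataset
print(df.describe())                          # quick numeric summary

df.plot.scatter(x="x", y="y")                 # assumes columns named x and y
plt.savefig("../outputs/sample_scatter.png")  # generated artifacts go to outputs/
```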
- Add data science packages (pandas, matplotlib, seaborn, numpy)
- Create `notebooks/04_data_exploration.ipynb`
- Build first `src/` module for reusable functions
- Practice notebook → module refactoring
- Learn about data versioning (DVC introduction)
- Machine learning basics (scikit-learn)
- Interactive visualizations (plotly, altair)
- Notebook testing and quality assurance
- Documentation generation from notebooks
Daily development:
```bash
make lab             # Start JupyterLab
make clear-outputs   # Clean notebooks before Git
make clean           # Remove caches
```

Environment management:

```bash
make env-update   # After editing env.yaml
make kernel       # Refresh Jupyter kernel
make check        # Verify everything works
```

Project maintenance:

```bash
make env-export           # Backup environment state
git status                # Check what's changed
git add notebooks/ src/   # Stage specific changes
```

env.yaml - Your Environment Blueprint
```yaml
name: nbenv               # Environment name (auto-detected by Makefile)
channels: [conda-forge]   # Package source (fast, up-to-date packages)
dependencies:             # What's installed
  - python                # Latest Python
  - ipykernel             # Jupyter kernel support
  - jupyter               # Classic notebook interface
  - jupyterlab            # Modern notebook interface
```

Makefile - Your Automation Hub
- Smart environment detection: Reads the env name from `env.yaml` (sketched after this list)
- Micromamba integration: Uses the fast micromamba package manager
- Safe shell operations: Configured with error handling
- Customizable paths: Override default directories (`notebooks/`, `src/`, `data/`)
- Comprehensive help: Run `make help` anytime
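The environment detection mentioned above presumably just reads the `name:` field from `env.yaml`. For illustration, the same logic in Python (the Makefile itself more likely uses `grep` or `sed`):

```python
# Illustrative re-implementation of the Makefile's env-name detection
import re

with open("env.yaml") as f:
    for line in f:
        match = re.match(r"^name:\s*(\S+)", line)
        if match:
            print(match.group(1))   # -> nbenv
            break
```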
.gitignore - Your Repository Guardian
- Python-aware: Ignores `__pycache__/`, `*.pyc`, virtual envs
- Jupyter-friendly: Excludes `.ipynb_checkpoints/`
- Data-safe: Keeps large datasets out of Git
- Editor-agnostic: Works with VS Code, PyCharm, vim, etc.
- Security-conscious: Prevents committing secrets and env files
VS Code + Jupyter Setup:
- Open project in VS Code
- Install Python and Jupyter extensions
- Select kernel: Ctrl+Shift+P → "Python: Select Interpreter" → choose `nbenv`
- Create `.ipynb` files in the `notebooks/` folder
- Use `make clear-outputs` before Git commits
Command Line Workflow:
```bash
# Morning routine
make check           # Verify environment health
make lab             # Start JupyterLab

# Development cycle
# ... work in notebooks ...
make clear-outputs   # Clean outputs
git add notebooks/01_*.ipynb
git commit -m "Add: basic data loading tutorial"

# Environment updates
# ... edit env.yaml to add packages ...
make env-update      # Install new packages
make kernel          # Refresh Jupyter kernel
```

Common pitfalls to avoid:
- Hidden State in Notebooks: Always restart the kernel and run all cells to verify reproducibility (a scriptable version of this check follows the list)
- Large Datasets in Git: Add data files to `.gitignore`; use data versioning tools like DVC for large datasets
- Environment Mismatch: Always document exact package versions in `env.yaml`
- Messy Notebooks: Regularly clean up and refactor reusable code into `src/`
- No Backups: Commit frequently, especially before major experiments
Recommended resources:
- Real Python: Jupyter Notebook Introduction - Start here
- Jupyter Notebook Beginner Guide - Official docs
- Pandas User Guide - Data manipulation
- Matplotlib Tutorials - Plotting basics
Notebook best practices:
- Structure: Clear markdown headers, single-concept cells, import everything upfront
- Naming: Use numbered prefixes (`01_`, `02_`) for tutorial sequence
- Documentation: Explain your thinking in markdown cells
- Reproducibility: Clear outputs before committing, restart kernel frequently
- Modularity: Move reusable code to `src/` modules when it appears in 2+ notebooks
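Putting the "import everything upfront" guideline into practice, a tutorial notebook's first code cell could look like this (assuming these packages have been added to `env.yaml`):

```python
# Cell 1: all imports in one place, so the notebook's dependencies are obvious
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```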
Start in notebooks → Refactor to modules → Import back to notebooks
```python
# notebooks/01_data_exploration.ipynb
import sys
sys.path.append('../src')   # make src/ modules importable from notebooks/

from data_utils import load_dataset, clean_data
from viz_utils import create_scatter_plot

# Now your notebook focuses on analysis, not utility code
df = load_dataset('data/sample.csv')
clean_df = clean_data(df)
create_scatter_plot(clean_df, 'x', 'y')
```

A sketch of what such a `src/` module might contain follows the list below.

- Data Versioning: DVC for large datasets
- Notebook Testing: nbval, pytest integration
- Documentation: Sphinx, jupyter-book
- Deployment: Voilà for interactive dashboards
- Collaboration: JupyterHub, Git workflows with notebooks
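The `data_utils` and `viz_utils` modules imported in the snippet above don't exist yet; they are exactly the kind of thing you would build in `src/`. A minimal sketch of what `src/data_utils.py` might contain (names and behavior are assumptions, not an existing API):

```python
# src/data_utils.py -- hypothetical reusable helpers refactored out of notebooks
import pandas as pd

def load_dataset(path: str) -> pd.DataFrame:
    """Load a CSV file into a DataFrame (extend with dtype/parsing options later)."""
    return pd.read_csv(path)

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows and rows that are entirely empty."""
    return df.drop_duplicates().dropna(how="all")
```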
| Problem | Why It Happens | Solution |
|---|---|---|
| "Kernel not found" | Jupyter can't see your environment | Run make kernel to install kernel |
| Import errors | Package not in environment | Add to env.yaml, run make env-update |
| Hidden state | Cells run out of order | Restart kernel, run all cells from top |
| Git conflicts | Notebook outputs cause merge issues | Use make clear-outputs before commits |
| Large repo size | Data files tracked by Git | Check .gitignore covers data/ directory |
| Environment drift | Packages installed but not documented | Use make freeze-pip or update env.yaml |
| Permission errors | Makefile shell issues | Check bash path with which bash |
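For the kernel and import problems in the first two rows, it helps to confirm which interpreter the running kernel actually uses. A one-cell check (the expected path is a guess based on a default micromamba setup; your prefix may differ):

```python
# Run inside a notebook cell: which Python is this kernel using?
import sys
print(sys.executable)   # should point inside nbenv,
                        # e.g. ~/micromamba/envs/nbenv/bin/python
```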
- Run `make init` to create project directories
- Create first notebook: Start with `notebooks/01_hello_jupyter.ipynb`
- Test environment: Import basic packages, create simple plots
- Practice Git workflow: Make first commit with cleared outputs
- Complete 5 tutorial notebooks covering data basics
- Create first reusable module in `src/`
- Add pandas, matplotlib, seaborn to environment
- Practice notebook → script → module workflow
- Build a complete data analysis project
- Learn about data versioning and larger datasets
- Explore machine learning basics with scikit-learn
- Set up automated testing for your code
This project works with AI assistance (see CODEX_INIT.md). When asking for help:
- Request clear explanations: "Explain step-by-step how to..."
- Ask for best practices: "What's the best way to organize..."
- Get structure guidance: "Should this code go in the notebook or src/?"
- Learn from warnings: "What could go wrong if I..."
- Seek pro tips: "What advanced techniques should I know about..."
Ready to begin? Follow this checklist:
- Environment: Run `make env-create` and `make kernel`
- Folders: Run `make init` to create project structure
- Test: Run `make check` to verify everything works
- Launch: Run `make lab` to start JupyterLab
- First notebook: Create `notebooks/01_getting_started.ipynb`
- Git setup: Run `make clear-outputs`, then make your first commit
Happy Learning!
Remember: This setup emphasizes learning by doing with clean, reproducible workflows. Focus on understanding concepts while building good habits from day one. The automation tools are here to help you focus on learning, not fighting with environment setup.