A comprehensive beginner-friendly repository for learning notebook programming with Jupyter, VS Code, and WSL2 (Ubuntu 24.04). This project emphasizes clean, repeatable workflows and best practices from day one.
- Learn Fundamentals: Master notebook workflows for data exploration, visualization, and basic machine learning
- Build Clean Structure: Develop organized, modular project architecture avoiding messy outputs
- Practice Best Habits: Version control, environment management, and code refactoring from day one
- Mentor-Guided Learning: Use AI assistance for clear explanations and best practice guidance
This project is set up with a complete development environment:
```
nb/
├── README.md       # This file - comprehensive project documentation
├── CODEX_INIT.md   # AI mentor instructions and learning context
├── Makefile        # Automation for common tasks (env, jupyter, cleaning)
├── env.yaml        # Micromamba environment specification
├── .gitignore      # Comprehensive ignore patterns for clean Git history
├── notebooks/      # Jupyter notebooks for tutorials
├── src/            # Reusable Python modules and functions
├── data/           # Datasets (ignored by Git, use .gitkeep)
└── outputs/        # Generated plots, reports, results
```
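Note the `.gitkeep` hint on `data/`: Git doesn't track empty directories, so a placeholder file keeps them in the repository. For illustration only, here is a rough Python equivalent of what `make init` presumably does (the actual recipe lives in the Makefile and most likely uses `mkdir -p`):

```python
# Illustrative Python equivalent of `make init` -- an assumption, not the real recipe
from pathlib import Path

for name in ("notebooks", "src", "data", "outputs"):
    folder = Path(name)
    folder.mkdir(exist_ok=True)     # create the project folder if it doesn't exist
    (folder / ".gitkeep").touch()   # placeholder so Git keeps otherwise-empty dirs
```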
Just run these commands:
- Initialize project folders: `make init`
- Create the environment: `make env-create`
- Install Jupyter kernel: `make kernel`
- Verify everything works: `make check` (a complementary in-environment check is sketched below)
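`make check` verifies the tooling from the shell. As promised above, you can also sanity-check from inside the environment itself; a minimal sketch, assuming `nbenv` is activated:

```python
# Run inside an activated nbenv to confirm the environment is healthy
import sys
print(sys.version)             # Python interpreter provided by the environment

import jupyterlab
print(jupyterlab.__version__)  # JupyterLab installed via env.yaml
```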
Start your learning session:
```bash
make lab        # Launch JupyterLab
# or
make notebook   # Launch classic Jupyter Notebook
```

Before committing work:

```bash
make clear-outputs   # Clear notebook outputs
git add .
git commit -m "Add: tutorial on data visualization"
```

Your Makefile provides these automated tasks:
| Command | Purpose |
|---|---|
| `make` / `make help` | Show all available commands |
| `make init` | Create project folders and basic .gitignore |
| `make env-create` | Create micromamba environment from env.yaml |
| `make env-update` | Update environment when you add packages |
| `make env-remove` | Remove the environment completely |
| `make env-export` | Export environment spec for reproducibility |
| `make kernel` | Install/refresh Jupyter kernel |
| `make lab` | Launch JupyterLab |
| `make notebook` | Launch classic Jupyter |
| `make clear-outputs` | Clear all notebook outputs |
| `make clean` | Remove caches and temporary files |
| `make freeze-pip` | Export pip requirements |
| `make check` | Verify tools and environment |
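Of these, `make clear-outputs` is worth understanding early. The Makefile presumably shells out to `jupyter nbconvert --clear-output --inplace`; the same effect for a single notebook, sketched in Python (an illustration, not the actual recipe):

```python
# Clear one notebook's outputs programmatically (what clear-outputs likely automates)
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

path = "notebooks/01_getting_started.ipynb"   # hypothetical notebook path
nb = nbformat.read(path, as_version=4)
ClearOutputPreprocessor().preprocess(nb, {})  # strip outputs and execution counts
nbformat.write(nb, path)
```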
Your env.yaml is designed for easy expansion:
```yaml
# Current setup in env.yaml
name: nbenv
channels:
  - conda-forge
dependencies:
  - python
  - ipykernel
  - jupyter
  - jupyterlab
  # Add new packages here:
  # - pandas
  # - matplotlib
  # - seaborn
  # - scikit-learn
  # - pip:
  #   - some-pip-package
```

Workflow for adding packages:
- Edit `env.yaml` to add new dependencies
- Run `make env-update` to install them
- If needed, run `make kernel` to refresh the Jupyter kernel (a quick import check is sketched below)
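As mentioned in the last step, you can confirm the new packages actually reached the kernel with a quick import check. Here `pandas` and `matplotlib` stand in as example additions, not packages in the current env.yaml:

```python
# Verify packages added to env.yaml are importable from the nbenv kernel
import importlib

for pkg in ("pandas", "matplotlib"):   # substitute whatever you added
    module = importlib.import_module(pkg)
    print(pkg, getattr(module, "__version__", "unknown"))
```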
Your .gitignore is configured to keep your repository clean:
What's tracked ✅:
- Source code (`.py` files in `src/`)
- Notebooks (but outputs are cleared before commit)
- Documentation and configuration files
- Environment specifications (`env.yaml`)

What's ignored ❌:
- Data files (`data/` directory)
- Generated outputs (`outputs/` directory)
- Python caches (`__pycache__/`, `*.pyc`)
- Jupyter checkpoints (`.ipynb_checkpoints/`)
- Environment files (`.env`, secrets)
- Editor-specific files (`.vscode/`, `.idea/`)
Best practices:
```bash
# Before committing notebooks
make clear-outputs

# Commit workflow
git status
git add src/ notebooks/ README.md   # Be selective
git commit -m "Add: data loading utilities"

# Export environment state for reproducibility
make env-export   # Creates mamba-linux-64.lock
```

Already set up:
- Project structure established
- Environment configuration (`env.yaml`)
- Automation tools (`Makefile`)
- Git configuration (`.gitignore`)
- Documentation framework
- Create first notebook: `notebooks/01_getting_started.ipynb`
  - Practice markdown cells and code cells
  - Learn about kernel management
  - Understand cell execution order
- Environment exploration: `notebooks/02_environment_setup.ipynb`
  - Test package imports
  - Verify micromamba environment
  - Practice using Makefile commands
- Data basics: `notebooks/03_data_fundamentals.ipynb` (a starter sketch follows this list)
  - Load sample datasets
  - Basic pandas operations
  - Simple visualizations with matplotlib
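Here is the promised starter sketch for the data-basics notebook. It assumes `pandas` and `matplotlib` have already been added to `env.yaml`, and that a hypothetical `data/sample.csv` with `x` and `y` columns exists:

```python
# notebooks/03_data_fundamentals.ipynb -- first steps with pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/sample.csv")        # hypothetical dataset
print(df.describe())                          # quick numeric summary

df.plot.scatter(x="x", y="y")                 # assumes columns named x and y
plt.savefig("../outputs/sample_scatter.png")  # generated artifacts go to outputs/
```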
- Add data science packages (pandas, matplotlib, seaborn, numpy)
- Create `notebooks/04_data_exploration.ipynb`
- Build first `src/` module for reusable functions
- Practice notebook → module refactoring
- Learn about data versioning (DVC introduction)
- Machine learning basics (scikit-learn)
- Interactive visualizations (plotly, altair)
- Notebook testing and quality assurance
- Documentation generation from notebooks
Daily development:
```bash
make lab             # Start JupyterLab
make clear-outputs   # Clean notebooks before Git
make clean           # Remove caches
```

Environment management:

```bash
make env-update   # After editing env.yaml
make kernel       # Refresh Jupyter kernel
make check        # Verify everything works
```

Project maintenance:

```bash
make env-export           # Backup environment state
git status                # Check what's changed
git add notebooks/ src/   # Stage specific changes
```

env.yaml - Your Environment Blueprint
```yaml
name: nbenv               # Environment name (auto-detected by Makefile)
channels: [conda-forge]   # Package source (fast, up-to-date packages)
dependencies:             # What's installed
  - python                # Latest Python
  - ipykernel             # Jupyter kernel support
  - jupyter               # Classic notebook interface
  - jupyterlab            # Modern notebook interface
```

Makefile - Your Automation Hub
- Smart environment detection: Reads the env name from `env.yaml` (sketched after this list)
- Micromamba integration: Uses the fast micromamba package manager
- Safe shell operations: Configured with error handling
- Customizable paths: Override default directories (`notebooks/`, `src/`, `data/`)
- Comprehensive help: Run `make help` anytime
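The environment detection mentioned above presumably just reads the `name:` field from `env.yaml`. For illustration, the same logic in Python (the Makefile itself more likely uses `grep` or `sed`):

```python
# Illustrative re-implementation of the Makefile's env-name detection
import re

with open("env.yaml") as f:
    for line in f:
        match = re.match(r"^name:\s*(\S+)", line)
        if match:
            print(match.group(1))   # -> nbenv
            break
```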
.gitignore - Your Repository Guardian
- Python-aware: Ignores `__pycache__/`, `*.pyc`, virtual envs
- Jupyter-friendly: Excludes `.ipynb_checkpoints/`
- Data-safe: Keeps large datasets out of Git
- Editor-agnostic: Works with VS Code, PyCharm, vim, etc.
- Security-conscious: Prevents committing secrets and env files
VS Code + Jupyter Setup:
- Open project in VS Code
- Install Python and Jupyter extensions
- Select kernel: Ctrl+Shift+P → "Python: Select Interpreter" → choose `nbenv`
- Create `.ipynb` files in the `notebooks/` folder
- Use `make clear-outputs` before Git commits
Command Line Workflow:
```bash
# Morning routine
make check           # Verify environment health
make lab             # Start JupyterLab

# Development cycle
# ... work in notebooks ...
make clear-outputs   # Clean outputs
git add notebooks/01_*.ipynb
git commit -m "Add: basic data loading tutorial"

# Environment updates
# ... edit env.yaml to add packages ...
make env-update      # Install new packages
make kernel          # Refresh Jupyter kernel
```

Common pitfalls to avoid:
- Hidden State in Notebooks: Always restart the kernel and run all cells to verify reproducibility (a scriptable version of this check follows the list)
- Large Datasets in Git: Add data files to `.gitignore`; use data versioning tools like DVC for large datasets
- Environment Mismatch: Always document exact package versions in `env.yaml`
- Messy Notebooks: Regularly clean up and refactor reusable code into `src/`
- No Backups: Commit frequently, especially before major experiments
Recommended resources:
- Real Python: Jupyter Notebook Introduction - Start here
- Jupyter Notebook Beginner Guide - Official docs
- Pandas User Guide - Data manipulation
- Matplotlib Tutorials - Plotting basics
Notebook best practices:
- Structure: Clear markdown headers, single-concept cells, import everything upfront
- Naming: Use numbered prefixes (`01_`, `02_`) for tutorial sequence
- Documentation: Explain your thinking in markdown cells
- Reproducibility: Clear outputs before committing, restart kernel frequently
- Modularity: Move reusable code to `src/` modules when it appears in 2+ notebooks
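Putting the "import everything upfront" guideline into practice, a tutorial notebook's first code cell could look like this (assuming these packages have been added to `env.yaml`):

```python
# Cell 1: all imports in one place, so the notebook's dependencies are obvious
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```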
Start in notebooks → Refactor to modules → Import back to notebooks
```python
# notebooks/01_data_exploration.ipynb
import sys
sys.path.append('../src')   # make src/ modules importable from notebooks/

from data_utils import load_dataset, clean_data
from viz_utils import create_scatter_plot

# Now your notebook focuses on analysis, not utility code
df = load_dataset('data/sample.csv')
clean_df = clean_data(df)
create_scatter_plot(clean_df, 'x', 'y')
```

A sketch of what such a `src/` module might contain follows the list below.

- Data Versioning: DVC for large datasets
- Notebook Testing: nbval, pytest integration
- Documentation: Sphinx, jupyter-book
- Deployment: Voilà for interactive dashboards
- Collaboration: JupyterHub, Git workflows with notebooks
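The `data_utils` and `viz_utils` modules imported in the snippet above don't exist yet; they are exactly the kind of thing you would build in `src/`. A minimal sketch of what `src/data_utils.py` might contain (names and behavior are assumptions, not an existing API):

```python
# src/data_utils.py -- hypothetical reusable helpers refactored out of notebooks
import pandas as pd

def load_dataset(path: str) -> pd.DataFrame:
    """Load a CSV file into a DataFrame (extend with dtype/parsing options later)."""
    return pd.read_csv(path)

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows and rows that are entirely empty."""
    return df.drop_duplicates().dropna(how="all")
```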
| Problem | Why It Happens | Solution |
|---|---|---|
| "Kernel not found" | Jupyter can't see your environment | Run make kernel to install kernel |
| Import errors | Package not in environment | Add to env.yaml, run make env-update |
| Hidden state | Cells run out of order | Restart kernel, run all cells from top |
| Git conflicts | Notebook outputs cause merge issues | Use make clear-outputs before commits |
| Large repo size | Data files tracked by Git | Check .gitignore covers data/ directory |
| Environment drift | Packages installed but not documented | Use make freeze-pip or update env.yaml |
| Permission errors | Makefile shell issues | Check bash path with which bash |
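For the kernel and import problems in the first two rows, it helps to confirm which interpreter the running kernel actually uses. A one-cell check (the expected path is a guess based on a default micromamba setup; your prefix may differ):

```python
# Run inside a notebook cell: which Python is this kernel using?
import sys
print(sys.executable)   # should point inside nbenv,
                        # e.g. ~/micromamba/envs/nbenv/bin/python
```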
- Run `make init` to create project directories
- Create first notebook: Start with `notebooks/01_hello_jupyter.ipynb`
- Test environment: Import basic packages, create simple plots
- Practice Git workflow: Make first commit with cleared outputs
- Complete 5 tutorial notebooks covering data basics
- Create first reusable module in `src/`
- Add pandas, matplotlib, seaborn to environment
- Practice notebook → script → module workflow
- Build a complete data analysis project
- Learn about data versioning and larger datasets
- Explore machine learning basics with scikit-learn
- Set up automated testing for your code
This project works with AI assistance (see CODEX_INIT.md). When asking for help:
- Request clear explanations: "Explain step-by-step how to..."
- Ask for best practices: "What's the best way to organize..."
- Get structure guidance: "Should this code go in the notebook or src/?"
- Learn from warnings: "What could go wrong if I..."
- Seek pro tips: "What advanced techniques should I know about..."
Ready to begin? Follow this checklist:
- Environment: Run `make env-create` and `make kernel`
- Folders: Run `make init` to create project structure
- Test: Run `make check` to verify everything works
- Launch: Run `make lab` to start JupyterLab
- First notebook: Create `notebooks/01_getting_started.ipynb`
- Git setup: Run `make clear-outputs`, then make your first commit
Happy Learning!
Remember: This setup emphasizes learning by doing with clean, reproducible workflows. Focus on understanding concepts while building good habits from day one. The automation tools are here to help you focus on learning, not fighting with environment setup.