# Week 1 — Part 01: Environment Setup Lab

**Estimated Time:** 45-60 minutes

---

## Pre-study (Self-learn)

The Foundations Course assumes you have completed Self-learn. If you need a refresher on environments or Jupyter:

- [Foundations Course Pre-study index](../PRESTUDY.md)
- [Self-learn — Conda environment management](../self_learn/Chapters/2/03_conda_environments.md)
- [Self-learn — Conda environments and packages](../self_learn/Chapters/1/04_conda_environment_management.md)
- [Self-learn — Jupyter](../self_learn/Chapters/1/05_jupyter_interactive_computing.md)

---

## What success looks like (end of Part 01)

- You can create a new virtual environment (`.venv`).
- You can install `pandas` inside that environment.
- You can produce a `requirements.txt` and use it to reproduce installs later.

**Checkpoint evidence** (minimum):

- `python --version` prints what you expect.
- `pip --version` shows it is from your environment (not system-wide).
- `pip freeze > requirements.txt` produces a file.

---

## Learning Objectives

By completing this lab, you will:

- ✅ Create isolated Python environments using venv
- ✅ Install and manage dependencies with pip
- ✅ Save and restore project dependencies
- ✅ Test environment reproducibility

---

## Exercise 1: Virtual environments (quick refresher)

If you need the full explanation of why environments exist and how activation works, use the Self-learn links above.

In this notebook, you will:

1. Check your Python version
2. Create a venv
3. Install dependencies
4. Freeze dependencies into `requirements.txt` so the setup is reproducible

### What this cell does
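Runs `python --version` using the shell (`!` prefix). This tells you which Python version the `python` command on your `PATH` resolves to.

**Why it matters:** The version alone does not show where the interpreter lives. If the interpreter path (for example from `!which python` on Linux/macOS) is `/usr/bin/python` instead of your venv path, your environment is not activated, and all subsequent `pip install` calls would go to the wrong place.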

In [2]:
# Check Python version
!python --version

Python 3.13.11
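
`python --version` reports only the version. As a minimal sketch (the helper name `in_virtualenv` is ours, not part of the lab), the running interpreter can also be inspected from Python itself. It relies on `sys.prefix` differing from `sys.base_prefix` inside a `venv`; conda environments are not detected this way.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("Interpreter path:", sys.executable)
print("Python version:", sys.version.split()[0])
print("Inside a venv:", in_virtualenv())
```

In a notebook this reports on the kernel's interpreter, which is the one that `import` statements actually use.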


### Task 1.1: Create a Virtual Environment

Let's create a virtual environment for our project. This will be an isolated Python environment where we can install our project dependencies.

### What this cell does
Creates a virtual environment named `.venv` in the current directory using Python's built-in `venv` module.

**Key concept:** A virtual environment is an isolated directory with its own Python executable (or a link to the base interpreter) and its own `site-packages` folder. Installing packages here does **not** affect your system Python or other projects.

**What to check after running:** A new `.venv/` directory should appear. On Linux/macOS, `ls .venv/bin/` should show `python`, `pip`, and `activate`.

In [None]:
# Create a virtual environment named .venv
!python -m venv .venv

print("Virtual environment created successfully!")
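
The post-creation check can also be scripted. The sketch below assumes the default `venv` layout (`bin/` on Linux/macOS, `Scripts/` on Windows); the function names are illustrative only.

```python
import os
from pathlib import Path


def expected_venv_files(env_dir=".venv"):
    """List the files a freshly created venv is expected to contain."""
    env = Path(env_dir)
    if os.name == "nt":
        bin_dir = env / "Scripts"
        names = ["python.exe", "pip.exe", "activate.bat"]
    else:
        bin_dir = env / "bin"
        names = ["python", "pip", "activate"]
    return [bin_dir / name for name in names]


def missing_venv_files(env_dir=".venv"):
    """Return the expected files that are not present on disk."""
    return [path for path in expected_venv_files(env_dir) if not path.exists()]


print("Missing from .venv:", missing_venv_files(".venv") or "nothing")
```

An empty result suggests the environment was created with pip included; a non-empty one lists what to look for.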

### Task 1.2: Activate the Virtual Environment

Activation details vary by OS and workflow.

If you need a refresher on activation, kernels, and how Jupyter relates to environments, use the Self-learn links at the top.

Below we provide the common activation commands so you can proceed with the lab.

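To activate your environment, run one of the following commands in a terminal. Inside a notebook, a `%%bash` cell runs in its own subprocess, so activation there lasts only for that cell and does not change the kernel's environment.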

**Linux/macOS:**


In [None]:
%%bash
source .venv/bin/activate




**Windows (Command Prompt):**


.venv\Scripts\activate.bat




**Windows (PowerShell):**


.venv\Scripts\Activate.ps1
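
Activation is mostly a matter of environment variables. The sketch below approximates what the `activate` scripts do (set `VIRTUAL_ENV`, put the environment's executables first on `PATH`, and clear `PYTHONHOME`); `activated_env` is a hypothetical helper, not a standard-library function.

```python
import os
from pathlib import Path


def activated_env(env_dir, base_env=None):
    """Return a copy of the environment variables as activation would leave them."""
    env = dict(os.environ if base_env is None else base_env)
    venv = Path(env_dir).resolve()
    bin_dir = venv / ("Scripts" if os.name == "nt" else "bin")

    env["VIRTUAL_ENV"] = str(venv)
    # Put the venv's executables first so `python` and `pip` resolve there.
    env["PATH"] = str(bin_dir) + os.pathsep + env.get("PATH", "")
    # The activate scripts also unset PYTHONHOME if it is set.
    env.pop("PYTHONHOME", None)
    return env


env = activated_env(".venv")
print("VIRTUAL_ENV =", env["VIRTUAL_ENV"])
print("PATH starts with", env["PATH"].split(os.pathsep)[0])
```

On Linux/macOS, such a dictionary can be passed as the `env` argument of `subprocess.run` to run a command as if the environment were active.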


### What this cell does
Upgrades `pip` to the latest version, then verifies which `pip` is active.

**Why upgrade pip first?** Older pip versions sometimes fail to resolve dependencies correctly or miss newer wheel formats. This is a standard first step after creating any new environment.

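**What to check:** The `pip --version` output should show a path inside your `.venv/` directory, not `/usr/bin/pip`. The sample output below was captured in a conda environment (`miniconda/envs/py313`), so its path differs.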

### Task 1.3: Upgrade pip

It's good practice to upgrade pip to the latest version after creating a new virtual environment.

In [3]:
# Upgrade pip to the latest version
!python -m pip install --upgrade pip

# Verify pip version (via the same interpreter that was just upgraded)
!python -m pip --version

Collecting pip
  Using cached pip-26.0.1-py3-none-any.whl.metadata (4.7 kB)
Using cached pip-26.0.1-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 26.0
    Uninstalling pip-26.0:
      Successfully uninstalled pip-26.0
Successfully installed pip-26.0.1
pip 26.0.1 from /home/wsl2ubt2204/miniconda/envs/py313/lib/python3.13/site-packages/pip (python 3.13)
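
The path check can be automated. This sketch parses a `pip --version` line of the form shown above and tests whether pip lives under a given environment directory. The helper names are ours, the regular expression assumes that output format, and POSIX-style paths are assumed as in the sample.

```python
import re
from pathlib import PurePosixPath

_PIP_LINE = re.compile(r"^pip (\S+) from (.+) \(python (\S+)\)$")


def parse_pip_version(line):
    """Split a `pip --version` line into (pip_version, location, python_version)."""
    match = _PIP_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised pip --version output: {line!r}")
    return match.groups()


def pip_is_inside(line, env_dir):
    """Return True if pip's install location is under env_dir."""
    _, location, _ = parse_pip_version(line)
    return PurePosixPath(location).is_relative_to(PurePosixPath(env_dir))


sample = (
    "pip 26.0.1 from /home/wsl2ubt2204/miniconda/envs/py313/"
    "lib/python3.13/site-packages/pip (python 3.13)"
)
print(parse_pip_version(sample))
# The second directory is a hypothetical venv location used for contrast.
print(pip_is_inside(sample, "/home/wsl2ubt2204/miniconda/envs/py313"))
print(pip_is_inside(sample, "/home/wsl2ubt2204/project/.venv"))
```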


### What this cell does
Installs `pandas` and verifies the import works. The `try/except` block gives a clear error if the install failed.

**Why pandas?** It is the de facto standard third-party library for reading and manipulating tabular data (CSV, Excel, etc.) in Python. You'll use it in Part 02 to build the data profiling script.

**What to check:** The printed version should match the one pip reported installing. If you see the "Failed to import pandas" message, the import failed; check that the notebook kernel runs in the same environment that `pip` installed into.

---

## Exercise 2: Installing Dependencies

Now that we have our virtual environment set up, let's install some dependencies for our project.

### Task 2.1: Install Required Dependencies

For this week's exercises, we need pandas for data manipulation. Let's install it in our virtual environment.

In [4]:
# Install pandas in our virtual environment
!pip install pandas

# Verify installation
try:
    import pandas as pd
    print("Pandas installed successfully!")
    print(f"Pandas version: {pd.__version__}")
except ImportError:
    print("Failed to import pandas. Please check your installation.")

Collecting pandas
  Downloading pandas-3.0.0-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (79 kB)
Collecting numpy>=1.26.0 (from pandas)
  Downloading numpy-2.4.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (6.6 kB)
Downloading pandas-3.0.0-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (10.9 MB)
Downloading numpy-2.4.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
Installing collected packages: numpy, pandas
Successfully installed numpy-2.4.2 pandas-3.0.0
Pandas installed successfully!
Pandas version: 3.0.0


### What this cell does
In Task 3.1, the cell runs `pip freeze > requirements.txt` to capture all currently installed packages with their **exact** version numbers, then reads the file back to show you what was saved.

**Why `pip freeze`?** It records every installed package as a flat list: not just what you explicitly installed, but also all transitive dependencies. This is what makes the environment reproducible: anyone on a compatible OS and Python version can run `pip install -r requirements.txt` and get the same versions.

**Key detail:** Deterministic ordering matters here. `pip freeze` lists packages in case-insensitive alphabetical order, so the same environment should produce the same file every time.

### Task 2.2: Install Multiple Dependencies

Let's install a few more packages that we'll need for future weeks.

In [None]:
# Install additional dependencies
!pip install scikit-learn matplotlib

# Verify installations
try:
    import sklearn
    print("Scikit-learn installed successfully!")
    print(f"Scikit-learn version: {sklearn.__version__}")
except ImportError:
    print("Failed to import scikit-learn. Please check your installation.")

try:
    import matplotlib
    print("Matplotlib installed successfully!")
    print(f"Matplotlib version: {matplotlib.__version__}")
except ImportError:
    print("Failed to import matplotlib. Please check your installation.")
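
The repeated `try/except` blocks can be folded into one loop. The sketch below uses `importlib.metadata` from the standard library, which reads the installed version without importing the package. Note that it expects the distribution name (`scikit-learn`), not the import name (`sklearn`). `installed_version` is an illustrative helper.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ["pandas", "scikit-learn", "matplotlib"]:
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```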

---

## Exercise 3: Managing Dependencies

Goal: produce a `requirements.txt` so the environment can be recreated.

If you want deeper background on environment reproducibility, see Self-learn (links at the top).

Next you will:

- freeze installed packages into `requirements.txt`
- inspect the file so you know what it contains

### What this cell does
In Task 4.1, the cell creates a second virtual environment (`test_env`) to simulate the "fresh machine" test: proving your `requirements.txt` is complete enough to recreate the project from scratch.

**Why this matters:** The real test of reproducibility is not "it works on my machine" but "it works on a clean machine with only the README and `requirements.txt`." This exercise simulates that scenario.

**In practice:** You would delete your current `.venv`, create a new one, and run `pip install -r requirements.txt`. If everything imports correctly, your environment setup is reproducible.

### Task 3.1: Save Dependencies to requirements.txt

Let's generate a requirements.txt file with the exact package versions we've installed.

In [None]:
# Generate requirements.txt with exact package versions
!pip freeze > requirements.txt

# Display the contents of requirements.txt
with open('requirements.txt', 'r') as f:
    requirements = f.read()
    print("requirements.txt contents:")
    print(requirements)
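
To see roughly what `pip freeze` collects, the installed distributions can be listed from Python. This is an approximation built on `importlib.metadata`, not pip's actual implementation: it ignores editable and URL-based installs, and `freeze_lines` is a name chosen for this sketch.

```python
from importlib import metadata


def freeze_lines():
    """Return 'name==version' lines for installed distributions, sorted by name."""
    pins = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            # Key on the lowercase name so duplicates collapse and
            # the ordering is case-insensitive.
            pins[name.lower()] = f"{name}=={dist.version}"
    return [pins[key] for key in sorted(pins)]


for line in freeze_lines():
    print(line)
```

Comparing this listing with `requirements.txt` is a quick way to confirm that the file reflects the kernel's environment.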

### Task 3.2: Understanding requirements.txt

Let's examine the contents of our requirements.txt file to understand what information it contains.

In [None]:
# Parse and display information about our dependencies
with open('requirements.txt', 'r') as f:
    requirements = f.readlines()

print("Dependency Information:")
for req in requirements:
    req = req.strip()
    if req and not req.startswith('#'):
        package_info = req.split('==')
        if len(package_info) == 2:
            package_name, package_version = package_info
            print(f"  {package_name}: {package_version}")
        else:
            print(f"  {req}")
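
The loop above only splits on `==`, so a range such as `pandas>=1.3.0` (as used in Challenge 5.2) falls through to the `else` branch. A slightly more general sketch follows. It uses a simple regular expression rather than a full requirements parser, so extras, environment markers, and URLs are out of scope, and `parse_requirement` is a hypothetical helper.

```python
import re

_COMMENT = re.compile(r"(^|\s+)#.*$")
_REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def parse_requirement(line):
    """Return (name, specifier) for a requirement line, or None if it has none."""
    # Drop full-line and trailing comments before matching.
    text = _COMMENT.sub("", line).strip()
    match = _REQUIREMENT.match(text)
    if match is None:
        return None
    name, specifier = match.groups()
    return name, specifier.strip()


for raw in ["pandas==3.0.0", "scikit-learn>=1.0.0  # modelling", "# dev tools", ""]:
    print(repr(raw), "->", parse_requirement(raw))
```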

---

## Exercise 4: Testing Environment Reproducibility

Goal: prove the setup is repeatable.

In practice, you would do this in a fresh folder or by deleting/recreating your environment.

Next you will:

- create a second venv (`test_env`)
- (conceptually) install from `requirements.txt`
- verify imports/versions

### Task 4.1: Create a New Environment

Let's create a new virtual environment and try to install our dependencies from requirements.txt.

In [None]:
# Create a new virtual environment for testing
!python -m venv test_env

print("Test environment created successfully!")

### Task 4.2: Install Dependencies from requirements.txt

Now let's try to install our dependencies in the new environment using the requirements.txt file.

Note: In a real scenario, you would activate the new environment first.

To install dependencies in the test environment, you would run:



%%bash
source test_env/bin/activate  # On Linux/macOS
pip install -r requirements.txt


In [None]:
# In a real scenario, you would activate the test environment first
# For this notebook, we'll simulate the installation

print("To install dependencies in the test environment, you would run:")
print("  source test_env/bin/activate  # On Linux/macOS")
print("  pip install -r requirements.txt")

# For demonstration purposes, let's just show what packages would be installed
with open('requirements.txt', 'r') as f:
    requirements = f.read()
    print("\nPackages that would be installed:")
    print(requirements)
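
The simulated step can be made real without activating anything: calling the environment's own interpreter with `-m pip` installs into that environment. The sketch below assumes the default `venv` layout and only runs the install when both `test_env` and `requirements.txt` exist; `env_python` and `install_command` are illustrative names.

```python
import os
import subprocess
from pathlib import Path


def env_python(env_dir):
    """Path to the interpreter inside a venv (default layout assumed)."""
    env = Path(env_dir)
    if os.name == "nt":
        return env / "Scripts" / "python.exe"
    return env / "bin" / "python"


def install_command(env_dir, requirements="requirements.txt"):
    """Build the pip command that installs into env_dir, not the current env."""
    return [str(env_python(env_dir)), "-m", "pip", "install", "-r", requirements]


cmd = install_command("test_env")
print("Command:", " ".join(cmd))

if env_python("test_env").exists() and Path("requirements.txt").exists():
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pip fails
else:
    print("test_env or requirements.txt not found; nothing installed.")
```

Because the command names the interpreter explicitly, it behaves the same from a notebook cell, a script, or a terminal.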

### Task 4.3: Verify Installation

Let's verify that the packages were installed correctly in our test environment.

Note: In a real scenario, you would run these commands in the activated test environment:



In [None]:
%%bash
python -c "import pandas; print('Pandas version:', pandas.__version__)"
python -c "import sklearn; print('Scikit-learn version:', sklearn.__version__)"
python -c "import matplotlib; print('Matplotlib version:', matplotlib.__version__)"


In [None]:
# In a real scenario, you would run these commands in the activated test environment
print("To verify installation in the test environment, you would run:")
print("  python -c \"import pandas; print('Pandas version:', pandas.__version__)\"")
print("  python -c \"import sklearn; print('Scikit-learn version:', sklearn.__version__)\"")
print("  python -c \"import matplotlib; print('Matplotlib version:', matplotlib.__version__)\"")

# For demonstration purposes, let's verify in our current environment
print("\nVerification in current environment:")
try:
    import pandas as pd
    print(f"  Pandas version: {pd.__version__}")
except ImportError:
    print("  Pandas not available")

try:
    import sklearn
    print(f"  Scikit-learn version: {sklearn.__version__}")
except ImportError:
    print("  Scikit-learn not available")

try:
    import matplotlib
    print(f"  Matplotlib version: {matplotlib.__version__}")
except ImportError:
    print("  Matplotlib not available")
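
Verification can go one step further by comparing the pinned versions in `requirements.txt` with what an environment actually reports. The sketch works on lists of `name==version` lines (for example, `pip freeze` output captured from the test environment). `pins` and `compare_pins` are hypothetical helpers, and the sample data is illustrative rather than real output.

```python
def pins(lines):
    """Map lowercase package name -> version for lines of the form name==version."""
    result = {}
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            result[name.strip().lower()] = version.strip()
    return result


def compare_pins(expected_lines, actual_lines):
    """Return {name: (expected, actual)} for every package that differs."""
    expected, actual = pins(expected_lines), pins(actual_lines)
    return {
        name: (expected.get(name), actual.get(name))
        for name in sorted(expected.keys() | actual.keys())
        if expected.get(name) != actual.get(name)
    }


# Illustrative data: the "actual" environment is missing pandas.
expected = ["numpy==2.4.2", "pandas==3.0.0"]
actual = ["numpy==2.4.2"]
print(compare_pins(expected, actual) or "environments match")
```

A version of `None` on either side means the package is absent from that list.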

---

## Exercise 5: Practice Challenges

Now it's your turn to apply what you've learned. Try to complete the following challenges:

### Challenge 5.1: Create a Project Setup Script

Create a shell script that automates the entire environment setup process:

1. Create a virtual environment
2. Activate it
3. Upgrade pip
4. Install dependencies from requirements.txt (if it exists)
5. If requirements.txt doesn't exist, install a default set of packages

Bonus: Make the script work on both Linux/macOS and Windows.

In [None]:
# TODO: Create a project setup script
#
# Goal:
# - Create a script that a teammate can run on a fresh clone/folder.
# - It should create a venv, upgrade pip, and install dependencies.
#
# Hint: start with a minimal script, then add OS detection if you want.
setup_script = ""  # TODO

print("TODO: write a setup script string, then save it to setup_project.sh")

### Challenge 5.2: Dependency Management Best Practices

Research and implement best practices for dependency management:

1. Create separate requirements files for different environments (development, testing, production)
2. Use version ranges instead of exact versions for some packages
3. Add comments to your requirements.txt to explain why certain packages are needed
4. Regularly update dependencies and check for security vulnerabilities

In [None]:
# TODO: Create separate requirements files for different environments
#
# Goal:
# - requirements-dev.txt (dev tools like pytest, jupyter)
# - requirements-prod.txt (only runtime deps)
#
# Keep this section as an exercise. Solutions are in the appendix.
print("TODO: create requirements-dev.txt and requirements-prod.txt")

---

## Summary and Key Takeaways

This lab’s core deliverable is a reproducible environment setup:

- you can create/activate an environment
- you can install dependencies
- you can write `requirements.txt`
- you can recreate the environment later

If you want more best-practice depth (conda/venv strategies, kernels, troubleshooting), refer to Self-learn via the links at the top.

## Appendix: Solutions (peek only after trying)

Reference implementations for the two challenge tasks.

In [None]:
# Solution for Challenge 5.1
setup_script = '''\
#!/bin/bash

set -euo pipefail

echo "Setting up project environment..."

python -m venv .venv

# shellcheck disable=SC1091
source .venv/bin/activate

python -m pip install --upgrade pip

if [ -f requirements.txt ]; then
  echo "Installing dependencies from requirements.txt..."
  pip install -r requirements.txt
else
  echo "Installing default packages..."
  pip install pandas scikit-learn matplotlib
fi

echo "Environment setup complete"
'''

with open('setup_project.sh', 'w', encoding='utf-8') as f:
    f.write(setup_script)

print("Wrote setup_project.sh")


# Solution for Challenge 5.2

# Development requirements
# These packages are needed for development but not for production

dev_requirements = '''\
# Development requirements
pandas>=1.3.0
scikit-learn>=1.0.0
matplotlib>=3.4.0
jupyter>=1.0.0
pytest>=6.2.0
'''

# Production requirements
# These packages are needed for production deployment

prod_requirements = '''\
# Production requirements
pandas>=1.3.0
scikit-learn>=1.0.0
'''

with open('requirements-dev.txt', 'w', encoding='utf-8') as f:
    f.write(dev_requirements)

with open('requirements-prod.txt', 'w', encoding='utf-8') as f:
    f.write(prod_requirements)

print("Wrote requirements-dev.txt and requirements-prod.txt")
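
One common variation avoids listing runtime packages twice: a requirements file may include another one with a `-r` line, which pip supports. The sketch below writes the same package set as the solution above in that layered form. The function name and the use of a temporary directory are choices made for this example.

```python
import tempfile
from pathlib import Path

PROD = ["pandas>=1.3.0", "scikit-learn>=1.0.0"]
DEV_ONLY = ["matplotlib>=3.4.0", "jupyter>=1.0.0", "pytest>=6.2.0"]


def write_layered_requirements(target_dir):
    """Write prod and dev files; the dev file pulls in prod via `-r`."""
    target = Path(target_dir)
    prod_path = target / "requirements-prod.txt"
    dev_path = target / "requirements-dev.txt"
    prod_path.write_text("\n".join(PROD) + "\n", encoding="utf-8")
    dev_lines = ["-r requirements-prod.txt", *DEV_ONLY]
    dev_path.write_text("\n".join(dev_lines) + "\n", encoding="utf-8")
    return prod_path, dev_path


# Written to a temporary directory so the files above are not overwritten.
with tempfile.TemporaryDirectory() as tmp:
    prod_path, dev_path = write_layered_requirements(tmp)
    dev_text = dev_path.read_text(encoding="utf-8")
    print(dev_text)
```

With this layout, `pip install -r requirements-dev.txt` installs the runtime packages as well, and a version bump only needs to be made in one file.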
