# Virtual Environments and Package Management with pip

Managing packages and virtual environments is a core Python skill, especially in data science and machine learning, where different projects often need different, mutually incompatible versions of the same libraries.

## Why Virtual Environments?
- **Isolation**: Keep project dependencies separate
- **Version flexibility**: Different projects can use different library versions
- **Reproducibility**: Share exact environment configurations
- **Clean system**: Avoid cluttering global Python installation
- **Collaboration**: Team members can replicate exact environments

## Topics Covered:
- Understanding pip (Python Package Installer)
- Creating and managing virtual environments
- Installing and managing packages
- Requirements files
- Best practices for project setup
- Conda vs pip vs Pipenv vs Poetry
- Troubleshooting common issues

## Understanding pip

pip is the standard package installer for Python. By default it downloads packages from the Python Package Index (PyPI) and installs them into the active Python environment.

In [None]:
# Check pip version and location
import subprocess
import sys

def run_command(args):
    """Run a command given as a list of arguments and return its output."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except subprocess.CalledProcessError as e:
        return f"Error: {e.stderr.strip()}"

# Check pip version. Running pip as "python -m pip" queries the pip that
# belongs to this interpreter, not whichever "pip" comes first on PATH.
pip_version = run_command([sys.executable, "-m", "pip", "--version"])
print(f"pip version: {pip_version}")

# Check Python version
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

## Basic pip Commands

Here are the most important pip commands you'll use regularly:

In [None]:
# Basic pip commands (run these in terminal/command prompt)
print("Essential pip commands:")
print()

commands = {
    "Install a package": "pip install package_name",
    "Install specific version": "pip install package_name==1.2.3",
    "Install minimum version": "pip install 'package_name>=1.2.0'",
    "Upgrade a package": "pip install --upgrade package_name",
    "Uninstall a package": "pip uninstall package_name",
    "List installed packages": "pip list",
    "Show package info": "pip show package_name",
    # "pip search" no longer works against PyPI; search on the website instead
    "Search for packages": "browse https://pypi.org (pip search is disabled on PyPI)",
    "Install from requirements": "pip install -r requirements.txt",
    "Generate requirements": "pip freeze > requirements.txt",
    "Check for outdated packages": "pip list --outdated"
}

for description, command in commands.items():
    print(f"{description:.<30} {command}")
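
Some of these commands have read-only counterparts inside Python itself. As a minimal sketch, the standard library's `importlib.metadata` module (Python 3.8+) can report an installed package's version, roughly what `pip show` prints. The helper name `installed_version` and the placeholder package name are illustrative assumptions, not part of pip.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Which of these are installed depends on the active environment,
# so no particular output is assumed here.
for name in ["pip", "numpy", "surely-not-installed-package-xyz"]:
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

Because the lookup runs in the current interpreter, it always describes the environment the notebook kernel is using.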

## Virtual Environments with venv

Python 3.3+ includes the `venv` module for creating virtual environments.

In [None]:
# Virtual environment commands (run in terminal)
print("Virtual Environment Commands:")
print()

venv_commands = {
    "Create virtual environment": "python -m venv myenv",
    "Activate (Windows)": "myenv\\Scripts\\activate",
    "Activate (macOS/Linux)": "source myenv/bin/activate",
    "Deactivate": "deactivate",
    "Remove environment": "rm -rf myenv  # or rmdir /s myenv (Windows)"
}

for description, command in venv_commands.items():
    print(f"{description:.<25} {command}")

print("\n" + "="*60)
print("Step-by-step Virtual Environment Setup:")
print("="*60)

steps = [
    "1. Create project directory: mkdir my_project && cd my_project",
    "2. Create virtual environment: python -m venv venv",
    "3. Activate environment: source venv/bin/activate (or venv\\Scripts\\activate on Windows)",
    "4. Upgrade pip: pip install --upgrade pip",
    "5. Install packages: pip install numpy pandas matplotlib",
    "6. Save requirements: pip freeze > requirements.txt",
    "7. Deactivate when done: deactivate"
]

for step in steps:
    print(step)
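
The same environment can also be created from Python with the standard library's `venv` module, which is what `python -m venv` uses under the hood. The sketch below builds a throwaway environment in a temporary directory; the helper names `create_environment` and `environment_python` are illustrative assumptions, and `with_pip=False` is chosen only to keep the demo fast and offline.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=False):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # Pass with_pip=True to bootstrap pip into the new environment,
    # as the command-line "python -m venv" does by default.
    venv.create(env_path, with_pip=with_pip)
    return env_path


def environment_python(env_path):
    """Return the expected interpreter path inside the environment."""
    if sys.platform == "win32":
        return Path(env_path) / "Scripts" / "python.exe"
    return Path(env_path) / "bin" / "python"


# Build a throwaway environment; it is deleted when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    env = create_environment(Path(tmp) / "demo_env")
    print("Config file exists:", (env / "pyvenv.cfg").exists())
    print("Interpreter path:", environment_python(env))
```

Note that activation is a shell feature: a script cannot "activate" an environment for you, but it can run the environment's interpreter directly by path.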

## Requirements Files

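Requirements files list the packages a project needs, optionally with version constraints. Pinning exact versions (`==`) is what makes an environment reproducible; ranges (`>=`, `<`) trade reproducibility for flexibility.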

In [None]:
# Example requirements.txt content
sample_requirements = """
# Data Science Core Libraries
numpy==1.24.3
pandas==2.0.3
matplotlib==3.7.1
seaborn==0.12.2
scipy==1.11.1

# Machine Learning
scikit-learn==1.3.0
tensorflow>=2.13.0,<3.0.0
torch>=2.0.0

# NLP Libraries
nltk==3.8.1
spacy>=3.6.0
transformers>=4.30.0

# Web and API
requests==2.31.0
flask==2.3.2
fastapi>=0.100.0

# Development Tools
jupyter==1.0.0
pytest>=7.4.0
black>=23.0.0
flake8>=6.0.0

# Optional packages for specific features
# plotly>=5.15.0  # Interactive plotting
# streamlit>=1.25.0  # Web apps
""".strip()

print("Sample requirements.txt:")
print("=" * 40)
print(sample_requirements)

# Create a sample requirements file
with open('sample_requirements.txt', 'w') as f:
    f.write(sample_requirements)

print("\nFile saved as 'sample_requirements.txt'")
print("To install: pip install -r sample_requirements.txt")
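
To make the file format concrete, here is a minimal sketch of how requirement lines break down into a package name and a version specifier. It is a hand-rolled simplification for illustration (the function name `parse_requirements` is an assumption); it ignores extras, environment markers, and URLs, which pip's real parser handles.

```python
import re

# A project name followed by an optional version specifier string.
# This is a deliberate simplification of the full requirement grammar.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def parse_requirements(text):
    """Return a list of (name, specifier) tuples from requirements text."""
    parsed = []
    for raw_line in text.splitlines():
        # Drop comments (full-line, or preceded by whitespace) and blanks.
        line = re.sub(r"(^|\s+)#.*$", "", raw_line).strip()
        if not line or line.startswith("-"):
            continue  # skip empty lines and options such as "-r other.txt"
        match = _REQUIREMENT.match(line)
        if match:
            name, specifier = match.groups()
            parsed.append((name, specifier.replace(" ", "")))
    return parsed


sample = """
# Data Science Core Libraries
numpy==1.24.3
tensorflow>=2.13.0,<3.0.0
requests  # latest version
# plotly>=5.15.0  # Interactive plotting
"""

for name, specifier in parse_requirements(sample):
    print(f"{name:<12} {specifier or '(any version)'}")
```

For real tooling, prefer an established parser over a regular expression like this one.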

## Different Types of Requirements Files

In [None]:
# Different requirements files for different purposes

# Base requirements (production)
base_requirements = """
# requirements.txt - Production dependencies
numpy==1.24.3
pandas==2.0.3
requests==2.31.0
flask==2.3.2
""".strip()

# Development requirements
dev_requirements = """
# requirements-dev.txt - Development dependencies
-r requirements.txt  # Include base requirements

# Testing
pytest>=7.4.0
pytest-cov>=4.1.0
coverage>=7.2.0

# Code quality
black>=23.0.0
flake8>=6.0.0
mypy>=1.4.0

# Development tools
jupyter>=1.0.0
ipython>=8.14.0
pre-commit>=3.3.0
""".strip()

# Test requirements
test_requirements = """
# requirements-test.txt - Testing only
pytest>=7.4.0
pytest-mock>=3.11.0
pytest-xdist>=3.3.0
factory-boy>=3.3.0
faker>=19.0.0
""".strip()

files_content = {
    'requirements.txt': base_requirements,
    'requirements-dev.txt': dev_requirements,
    'requirements-test.txt': test_requirements
}

for filename, content in files_content.items():
    print(f"\n{filename}:")
    print("=" * len(filename))
    print(content)
    
    # Save the file
    with open(f'sample_{filename}', 'w') as f:
        f.write(content)

print("\n" + "=" * 50)
print("Usage:")
print("Production: pip install -r requirements.txt")
print("Development: pip install -r requirements-dev.txt")
print("Testing only: pip install -r requirements-test.txt")
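
The `-r requirements.txt` line in the development file tells pip to read another requirements file first. The sketch below imitates that behaviour to show what the combined list looks like; `collect_requirements` is a hypothetical helper, and resolving the included path relative to the including file is an assumption of this sketch.

```python
import tempfile
from pathlib import Path


def collect_requirements(path, _seen=None):
    """Return requirement lines from path, following '-r' includes."""
    path = Path(path).resolve()
    seen = _seen if _seen is not None else set()
    if path in seen:
        return []  # guard against circular includes
    seen.add(path)

    lines = []
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.split(" #")[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("-r "):
            # This sketch resolves includes relative to the including file.
            included = path.parent / line[3:].strip()
            lines.extend(collect_requirements(included, seen))
        else:
            lines.append(line)
    return lines


# Demonstrate with two small files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "requirements.txt"
    dev = Path(tmp) / "requirements-dev.txt"
    base.write_text("numpy==1.24.3\nrequests==2.31.0\n", encoding="utf-8")
    dev.write_text("-r requirements.txt  # Include base\npytest>=7.4.0\n",
                   encoding="utf-8")
    combined = collect_requirements(dev)
    print(combined)
```

Layering files this way keeps production pins in one place while development tools stay out of deployed environments.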

## Package Version Specifiers

In [None]:
# Understanding version specifiers
version_examples = {
    "Exact version": "package==1.2.3",
    "Minimum version": "package>=1.2.0",
    "Maximum version": "package<=1.9.9",
    "Range": "package>=1.2.0,<2.0.0",
    "Compatible release": "package~=1.2.3  # >= 1.2.3, < 1.3.0",
    "Latest version": "package  # No version specified",
    "Pre-release": "package>=1.0.0a1  # Include alpha/beta",
    "Development version": "package>=1.0.dev0",
    "From git": "git+https://github.com/user/repo.git",
    "From git branch": "git+https://github.com/user/repo.git@branch",
    "From local path": "-e /path/to/local/package",
    "With extras": "package[extra1,extra2]==1.2.3"
}

print("Package Version Specifiers:")
print("=" * 40)

for description, example in version_examples.items():
    print(f"{description:.<20} {example}")

print("\n" + "=" * 40)
print("Version Specifier Rules:")
print("=" * 40)

rules = [
    "== : Exactly equal to version",
    ">= : Greater than or equal to version",
    "> : Greater than version",
    "<= : Less than or equal to version",
    "< : Less than version",
    "~= : Compatible release (~=1.2.3 means >=1.2.3,<1.3.0; ~=1.2 means >=1.2,<2.0)",
    "!= : Not equal to version",
    ", : Combine specifiers; all must match (e.g. >=1.2.0,<2.0.0)"
]

for rule in rules:
    print(f"  {rule}")
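
The compatible-release operator `~=` is the one most often misread: it fixes every version segment except the last one. As a small sketch, the hypothetical helper below rewrites `~=X.Y.Z` into the equivalent pair of clauses. It assumes plain numeric versions only and does not handle pre-release or post-release suffixes.

```python
def expand_compatible_release(version):
    """Expand '~=version' into its equivalent '>=' and '==' clauses."""
    parts = version.split(".")
    if len(parts) < 2 or not all(part.isdigit() for part in parts):
        raise ValueError("expected a numeric version with 2+ segments, e.g. 1.2")
    # Keep every segment except the last, and allow anything in its place.
    prefix = ".".join(parts[:-1])
    return f">={version}, =={prefix}.*"


for v in ["1.2.3", "1.2", "2.0.1.4"]:
    print(f"~={v:<8} is equivalent to  {expand_compatible_release(v)}")
```

So `~=1.2.3` accepts 1.2.4 or 1.2.99 but not 1.3.0, while `~=1.2` accepts any 1.x release from 1.2 upward.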

## Project Structure Best Practices

In [None]:
# Recommended project structure
project_structure = """
my_project/
├── venv/                     # Virtual environment (don't commit to git)
├── src/                      # Source code
│   ├── __init__.py
│   ├── main.py
│   └── utils/
│       ├── __init__.py
│       └── helpers.py
├── tests/                    # Test files
│   ├── __init__.py
│   ├── test_main.py
│   └── test_utils.py
├── data/                     # Data files
│   ├── raw/
│   ├── processed/
│   └── external/
├── notebooks/                # Jupyter notebooks
│   ├── exploratory/
│   └── analysis/
├── docs/                     # Documentation
├── requirements.txt          # Production dependencies
├── requirements-dev.txt      # Development dependencies
├── .gitignore               # Git ignore file
├── .env                     # Environment variables (don't commit)
├── README.md                # Project documentation
├── setup.py                 # Package setup (optional)
└── Makefile                 # Common commands (optional)
""".strip()

print("Recommended Project Structure:")
print("=" * 40)
print(project_structure)

# Sample .gitignore content
gitignore_content = """
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
*.egg
*.egg-info/
dist/
build/

# Virtual Environments
venv/
env/
ENV/
.venv/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Data (broad patterns: narrow these if the project tracks CSV/JSON config or fixture files)
*.csv
*.json
*.pkl
*.h5

# Logs
*.log
logs/

# Environment variables
.env
.env.local

# Jupyter
.ipynb_checkpoints/

# Model files (*.pkl is already covered under Data)
*.model
models/
""".strip()

print("\n\nSample .gitignore:")
print("=" * 20)
print(gitignore_content)

# Save gitignore
with open('sample_.gitignore', 'w') as f:
    f.write(gitignore_content)

print("\nFile saved as 'sample_.gitignore'")
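
A layout like the one above is easy to generate with a short script. The following sketch creates a subset of the structure inside a temporary directory using `pathlib`; the `scaffold_project` helper and the exact lists of directories and files are illustrative choices, so trim or extend them to suit the project.

```python
import tempfile
from pathlib import Path

# Directories and empty files taken from the layout above; trim to taste.
DIRECTORIES = ["src/utils", "tests", "data/raw", "data/processed",
               "data/external", "notebooks/exploratory",
               "notebooks/analysis", "docs"]
FILES = ["src/__init__.py", "src/main.py", "src/utils/__init__.py",
         "src/utils/helpers.py", "tests/__init__.py", "requirements.txt",
         "requirements-dev.txt", ".gitignore", "README.md"]


def scaffold_project(root):
    """Create the directory skeleton under root without erasing file contents."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file_name in FILES:
        (root / file_name).touch(exist_ok=True)
    return root


with tempfile.TemporaryDirectory() as tmp:
    project = scaffold_project(Path(tmp) / "my_project")
    created = sorted(p.relative_to(project).as_posix()
                     for p in project.rglob("*"))
    print("\n".join(created))
```

The virtual environment itself is left out on purpose: it is created per machine and should not be part of a template.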

## Alternative Package Managers

In [None]:
# Comparison of package managers
package_managers = {
    "pip + venv": {
        "Pros": [
            "Built into Python",
            "Standard and widely used",
            "Large package repository (PyPI)",
            "Simple and straightforward"
        ],
        "Cons": [
            "Does not install or manage Python interpreter versions",
            "Separate tools for env and packages",
            "No lock files by default"
        ],
        "Best for": "General Python development, simple projects"
    },
    "Conda": {
        "Pros": [
            "Handles non-Python dependencies",
            "Great for data science",
            "Built-in environment management",
            "Dependency resolution"
        ],
        "Cons": [
            "Larger installation",
            "Slower package resolution",
            "Smaller package repository"
        ],
        "Best for": "Data science, scientific computing, complex dependencies"
    },
    "Pipenv": {
        "Pros": [
            "Combines pip and venv",
            "Lock files for reproducibility",
            "Dependency resolution",
            "Security vulnerability scanning"
        ],
        "Cons": [
            "Additional dependency",
            "Slower than pip",
            "Sometimes complex dependency resolution"
        ],
        "Best for": "Production applications, team collaboration"
    },
    "Poetry": {
        "Pros": [
            "Modern dependency management",
            "Built-in build system",
            "Excellent dependency resolution",
            "Easy publishing to PyPI"
        ],
        "Cons": [
            "Learning curve",
            "Additional tool to learn",
            "Not built into Python"
        ],
        "Best for": "Modern Python projects, package development"
    }
}

print("Package Manager Comparison:")
print("=" * 50)

for manager, details in package_managers.items():
    print(f"\n{manager.upper()}:")
    print("-" * len(manager))
    
    print("✅ Pros:")
    for pro in details["Pros"]:
        print(f"   • {pro}")
    
    print("❌ Cons:")
    for con in details["Cons"]:
        print(f"   • {con}")
    
    print(f"🎯 Best for: {details['Best for']}")

## Common Package Management Commands

In [None]:
# Common commands for different package managers
commands_comparison = {
    "Create environment": {
        "pip + venv": "python -m venv myenv",
        "conda": "conda create -n myenv python=3.9",
        "pipenv": "pipenv --python 3.9",
        "poetry": "poetry init  # writes pyproject.toml; the env is created on 'poetry install'"
    },
    "Activate environment": {
        "pip + venv": "source myenv/bin/activate",
        "conda": "conda activate myenv",
        "pipenv": "pipenv shell",
        "poetry": "poetry shell  # Poetry 1.x; Poetry 2.x uses 'poetry env activate'"
    },
    "Install package": {
        "pip + venv": "pip install numpy",
        "conda": "conda install numpy",
        "pipenv": "pipenv install numpy",
        "poetry": "poetry add numpy"
    },
    "Install from requirements": {
        "pip + venv": "pip install -r requirements.txt",
        "conda": "conda env create -f environment.yml",
        "pipenv": "pipenv install",
        "poetry": "poetry install"
    },
    "Export dependencies": {
        "pip + venv": "pip freeze > requirements.txt",
        "conda": "conda env export > environment.yml",
        "pipenv": "pipenv requirements > requirements.txt",
        "poetry": "poetry export -f requirements.txt -o requirements.txt  # may need the export plugin"
    }
}

print("Command Comparison Across Package Managers:")
print("=" * 60)

for task, commands in commands_comparison.items():
    print(f"\n{task.upper()}:")
    print("-" * len(task))
    
    for manager, command in commands.items():
        print(f"{manager:.<15} {command}")
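
Before comparing workflows, it helps to know which of these tools are actually installed. A minimal sketch using the standard library's `shutil.which`, which looks up an executable on `PATH` the way the shell would (the `find_tools` helper is an illustrative name):

```python
import shutil

TOOLS = ["pip", "conda", "pipenv", "poetry"]


def find_tools(names):
    """Map each tool name to its executable path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, location in find_tools(TOOLS).items():
    print(f"{tool:.<10} {location or 'not found on PATH'}")
```

A tool reported as missing may still be installed elsewhere; this only reflects what the current `PATH` exposes.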

## Troubleshooting Common Issues

In [None]:
# Common issues and solutions
troubleshooting_guide = {
    "Permission Denied Error": {
        "Problem": "Cannot install packages due to permission issues",
        "Solutions": [
            "Use virtual environment instead of system Python",
            "Use --user flag: pip install --user package_name",
            "On Linux/Mac: avoid 'sudo pip install'; it can break system-managed packages",
            "Fix ownership: sudo chown -R $(whoami) ~/.local"
        ]
    },
    "Package Not Found": {
        "Problem": "pip cannot find the package",
        "Solutions": [
            "Check spelling of package name",
            "Search PyPI: https://pypi.org/",
            "Update pip: pip install --upgrade pip",
            "Check if package exists for your Python version"
        ]
    },
    "Dependency Conflicts": {
        "Problem": "Package conflicts with existing installations",
        "Solutions": [
            "Use virtual environments to isolate dependencies",
            "Run 'pip check' to list packages with incompatible requirements",
            "Compute a consistent set of pins with pip-tools (pip-compile) or pipenv",
            "Create fresh virtual environment"
        ]
    },
    "SSL Certificate Error": {
        "Problem": "SSL certificate verification failed",
        "Solutions": [
            "Upgrade pip and certificates",
            "Temporary workaround: pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org package_name",
            "Update system certificates",
            "Check corporate firewall/proxy settings"
        ]
    },
    "Out of Memory Error": {
        "Problem": "Installation fails due to insufficient memory",
        "Solutions": [
            "Use --no-cache-dir: pip install --no-cache-dir package_name",
            "Install pre-compiled wheels instead of source",
            "Close other applications to free memory",
            "Use swap file on Linux systems"
        ]
    }
}

print("Troubleshooting Guide:")
print("=" * 40)

for issue, details in troubleshooting_guide.items():
    print(f"\n🚨 {issue}:")
    print("-" * len(issue))
    print(f"Problem: {details['Problem']}")
    print("Solutions:")
    for solution in details['Solutions']:
        print(f"  • {solution}")

# Diagnostic commands
print("\n\n🔍 Diagnostic Commands:")
print("=" * 30)
diagnostic_commands = [
    "python -m pip --version          # Check pip version and which Python it belongs to",
    "pip list                         # List installed packages",
    "pip show package_name            # Show package details",
    "pip check                        # Check for dependency issues",
    "pip list --outdated              # Show outdated packages",
    "python -m site --user-site       # Show user site directory",
    "which python                     # Show Python executable path (Linux/Mac; 'where python' on Windows)",
    "echo $PYTHONPATH                 # Show Python path (Linux/Mac)"
]

for cmd in diagnostic_commands:
    print(cmd)
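
Many installation problems come down to running a different interpreter than expected. The sketch below collects the same kind of information from inside Python. Comparing `sys.prefix` with `sys.base_prefix` is the usual check for environments created by `venv`; it is an assumption of this sketch that this is the kind of environment in use, since conda environments are not detected this way. The function names are illustrative.

```python
import site
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def environment_report():
    """Collect basic facts about the running interpreter."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "in_virtual_environment": in_virtual_environment(),
        "user_site": site.getusersitepackages(),
    }


for key, value in environment_report().items():
    print(f"{key:.<25} {value}")
```

If `executable` does not point inside the project's environment, packages installed with a bare `pip install` may be landing somewhere else.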

## Best Practices Checklist

In [None]:
# Best practices checklist
best_practices = {
    "Environment Management": [
        "✅ Always use virtual environments for projects",
        "✅ One virtual environment per project",
        "✅ Name environments descriptively",
        "✅ Keep virtual environments out of version control",
        "✅ Document Python version requirements"
    ],
    "Dependency Management": [
        "✅ Pin exact versions in production (==)",
        "✅ Use version ranges for development (>=)",
        "✅ Separate dev and prod dependencies",
        "✅ Regular dependency updates",
        "✅ Test after dependency changes"
    ],
    "Requirements Files": [
        "✅ Keep requirements.txt up to date",
        "✅ Use comments to explain unusual dependencies",
        "✅ Group related dependencies",
        "✅ Include indirect dependencies if needed",
        "✅ Use multiple requirements files for different environments"
    ],
    "Security": [
        "✅ Regularly update dependencies",
        "✅ Use tools like safety to check for vulnerabilities",
        "✅ Avoid installing from untrusted sources",
        "✅ Review package source code for critical dependencies",
        "✅ Use lock files when available"
    ],
    "Collaboration": [
        "✅ Include setup instructions in README",
        "✅ Document system-level dependencies",
        "✅ Use consistent Python versions across team",
        "✅ Automate environment setup where possible",
        "✅ Regular team dependency reviews"
    ]
}

print("Python Package Management Best Practices:")
print("=" * 50)

for category, practices in best_practices.items():
    print(f"\n📋 {category}:")
    print("-" * len(category))
    for practice in practices:
        print(f"  {practice}")

# Common mistakes to avoid
print("\n\n❌ Common Mistakes to Avoid:")
print("=" * 35)
mistakes = [
    "Installing packages globally instead of using virtual environments",
    "Not pinning versions in production requirements",
    "Committing virtual environments to version control",
    "Not keeping requirements files up to date",
    "Using sudo with pip (on Unix systems)",
    "Not testing after dependency updates",
    "Mixing conda and pip without understanding interactions",
    "Not documenting system-level dependencies"
]

for i, mistake in enumerate(mistakes, 1):
    print(f"{i:2d}. {mistake}")
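
The "pin exact versions in production" rule is easy to check automatically. As a rough sketch, the hypothetical helper below flags requirement lines that lack an exact `==` pin. It relies on simple string checks, treats wildcard pins such as `==2.3.*` as unpinned, and does not understand URLs or environment markers.

```python
def find_unpinned(requirement_lines):
    """Return requirement lines that are not pinned to one exact version."""
    unpinned = []
    for raw_line in requirement_lines:
        line = raw_line.split("#")[0].strip()  # ignore comments
        if not line or line.startswith("-"):
            continue  # skip blanks and options such as "-r base.txt"
        # A wildcard like "==2.3.*" still allows several versions.
        if "==" not in line or "*" in line:
            unpinned.append(line)
    return unpinned


production = [
    "numpy==1.24.3",
    "pandas>=2.0",
    "requests",
    "flask==2.3.*  # wildcard",
    "# comment only",
]

for line in find_unpinned(production):
    print(f"Not pinned: {line}")
```

A check like this could run in continuous integration so that loose specifiers never reach the production requirements file unnoticed.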

## Clean Up Sample Files

In [None]:
# Clean up the sample files we created
import os

sample_files = [
    'sample_requirements.txt',
    'sample_requirements-dev.txt',
    'sample_requirements-test.txt',
    'sample_.gitignore'
]

print("Cleaning up sample files...")
for filename in sample_files:
    try:
        if os.path.exists(filename):
            os.remove(filename)
            print(f"✅ Removed {filename}")
    except Exception as e:
        print(f"❌ Could not remove {filename}: {e}")

print("\nCleanup complete!")
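
Manual cleanup can be avoided altogether by writing demo files into a temporary directory. A minimal sketch with the standard library's `tempfile.TemporaryDirectory`, which removes the directory and its contents when the `with` block ends (the `write_samples` helper is an illustrative name):

```python
import tempfile
from pathlib import Path


def write_samples(files, directory):
    """Write each name -> content pair into directory; return the paths."""
    paths = []
    for name, content in files.items():
        path = Path(directory) / name
        path.write_text(content, encoding="utf-8")
        paths.append(path)
    return paths


samples = {"sample_requirements.txt": "numpy==1.24.3\n"}

with tempfile.TemporaryDirectory() as tmp:
    written = write_samples(samples, tmp)
    print("Exists inside the block:", all(p.exists() for p in written))

# The directory and everything in it is deleted when the block exits.
print("Exists after the block:", any(p.exists() for p in written))
```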

## Key Takeaways

### Essential Concepts:
1. **Virtual environments isolate project dependencies** and prevent conflicts
2. **pip is the standard package installer** for Python packages
3. **Requirements files ensure reproducible environments** across machines
4. **Version specifiers control which package versions** are installed
5. **Different package managers** serve different needs (pip, conda, pipenv, poetry)

### Workflow Summary:
1. **Create project directory**
2. **Create virtual environment**: `python -m venv venv`
3. **Activate environment**: `source venv/bin/activate` (or `venv\Scripts\activate` on Windows)
4. **Upgrade pip**: `pip install --upgrade pip`
5. **Install packages**: `pip install package_name`
6. **Save requirements**: `pip freeze > requirements.txt`
7. **Deactivate when done**: `deactivate`

### For Data Science Projects:
- Consider **conda** for complex scientific dependencies
- Use **separate requirements files** for different environments
- **Pin exact versions** in production
- **Regular dependency audits** for security

## Practice Exercises

1. **Create a data science project** with proper virtual environment setup
2. **Set up multiple requirements files** (dev, test, production)
3. **Practice dependency conflict resolution**
4. **Try different package managers** (conda, pipenv) and compare workflows
5. **Create a project template** with proper structure and documentation
6. **Set up automated testing** in virtual environments
7. **Practice collaborative development** with shared requirements

## Next Steps

Master package management to:
- **Avoid dependency hell** in complex projects
- **Collaborate effectively** with team members
- **Deploy applications** reliably
- **Maintain multiple projects** without conflicts
- **Contribute to open source** projects confidently

Package management is a foundational skill that becomes more critical as projects grow in complexity and team size!