# Notebook 1: Local Setup Guide

**Welcome!** This notebook will guide you through setting up the Excel Diff Server on your local machine.

**What you'll do:**
1. ✅ Check your Python version
2. ✅ Set up a virtual environment (venv)
3. ✅ Install Python dependencies
4. ✅ Configure the `.env` file
5. ✅ Verify optional dependencies (LibreOffice, Redis)
6. ✅ Start the API server for the first time

**Run each cell one by one** and follow the instructions!

## Step 1: Check Python Version

The Excel Diff Server requires **Python 3.7+** (recommended: 3.11+).

Let's check what you have:

In [None]:
import sys

version = sys.version_info
version_str = f"{version.major}.{version.minor}.{version.micro}"

print(f"Python version: {version_str}")
print(f"Executable: {sys.executable}\n")

if version >= (3, 7):
    print("✅ Your Python version is compatible!")
    if version < (3, 11):
        print("   (Python 3.11+ is recommended for best performance)")
else:
    print("⚠️  Warning: Python 3.7+ is required")
    print("   Please install a newer Python version")

## Step 2: Understanding Virtual Environments

**What is a virtual environment (venv)?**
- An isolated Python environment for this project
- Keeps dependencies separate from your system Python
- Prevents conflicts with other projects

**Creating a venv:**

Open a terminal in the project root directory and run:

```bash
# Create virtual environment
python3 -m venv venv

# Activate it (Linux/Mac)
source venv/bin/activate

# Activate it (Windows)
venv\Scripts\activate
```

**How to tell if venv is active:**
- Your terminal prompt should show `(venv)` at the beginning
- Example: `(venv) user@machine:~/excel-differ$`
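
**Prefer Python over the shell?** The same environment can be created with the standard-library `venv` module, which is handy on systems where the `python3` command is not on the PATH. This is a minimal sketch: the helper name `create_venv` is hypothetical, and the directory name `venv` simply mirrors the commands above.

```python
import venv
from pathlib import Path


def create_venv(target_dir, with_pip=True):
    """Create a virtual environment in target_dir and return its path.

    Roughly equivalent to running `python3 -m venv <target_dir>`.
    """
    target = Path(target_dir)
    venv.create(target, with_pip=with_pip)
    return target


# Example (creates ./venv in the current directory):
# create_venv("venv")
```

You still need to activate the environment in your terminal afterwards, exactly as shown above.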

**Let's check if we're running in a venv:**

In [None]:
import sys

in_venv = hasattr(sys, 'real_prefix') or (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix)

print(f"Python executable: {sys.executable}\n")

if in_venv:
    print("✅ You're running in a virtual environment!")
    print(f"   venv location: {sys.prefix}")
else:
    print("⚠️  Not in a virtual environment")
    print("\n📝 To create and activate a venv:")
    print("   1. Open a terminal in the project root")
    print("   2. Run: python3 -m venv venv")
    print("   3. Run: source venv/bin/activate  (Linux/Mac)")
    print("   4. Or:  venv\\Scripts\\activate    (Windows)")
    print("   5. Restart Jupyter: jupyter notebook")

## Step 3: Install Dependencies

**What are we installing?**

The project uses several Python libraries:
- **FastAPI** - Web framework for the REST API
- **openpyxl** - Read/write Excel files
- **GitPython** - Git operations
- **Celery** - Background job processing (optional)
- **redis** - Redis client (optional)
- And more...

All dependencies are listed in `requirements.txt`.

**To install:**

```bash
# Make sure venv is activated!
pip install -r requirements.txt
```
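
**Installing from inside the notebook:** if you would rather not switch to a terminal, pip can be invoked through `sys.executable`, so the packages land in the interpreter this kernel is actually using. A small sketch, assuming `requirements.txt` is in the current working directory; the function names are hypothetical.

```python
import subprocess
import sys


def pip_install_command(requirements="requirements.txt"):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(requirements="requirements.txt"):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.run(pip_install_command(requirements)).returncode


# Uncomment to install into the current kernel's environment:
# install_requirements()
```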

**Let's check what's currently installed:**

In [None]:
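import importlib.util

# Key packages we need, mapped to the module name(s) they install.
# The pip name and the import name differ for some of them.
required_packages = {
    'fastapi': ['fastapi'],
    'uvicorn': ['uvicorn'],
    'openpyxl': ['openpyxl'],
    'gitpython': ['git'],
    'celery': ['celery'],
    'redis': ['redis'],
    'pydantic': ['pydantic'],
    # Newer releases install `python_multipart`, older ones `multipart`
    'python-multipart': ['python_multipart', 'multipart'],
}

print("Checking installed packages...\n")
print("="*70)

missing = []
installed = []

for package, modules in required_packages.items():
    if any(importlib.util.find_spec(name) is not None for name in modules):
        installed.append(package)
        print(f"✅ {package:20s} - Installed")
    else:
        missing.append(package)
        print(f"❌ {package:20s} - Missing")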

print("="*70)

if missing:
    print(f"\n⚠️  {len(missing)} package(s) missing")
    print("\n📝 To install all dependencies:")
    print("   1. Make sure venv is activated")
    print("   2. Run: pip install -r requirements.txt")
else:
    print("\n✅ All required packages are installed!")

## Step 4: Configure the .env File

**What is the .env file?**
- Central configuration file for the server
- Contains settings like ports, paths, Redis URLs, etc.
- NOT committed to git (for security)
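
Since `.env` must stay out of git, it is worth confirming that the repository's `.gitignore` covers it. The sketch below is a deliberately simplified matcher, not a full implementation of git's ignore rules: it only handles plain names and globs for top-level files, and the function name `looks_gitignored` is hypothetical. In a real checkout, `git check-ignore .env` is the authoritative test.

```python
from fnmatch import fnmatch


def looks_gitignored(filename, gitignore_text):
    """Roughly check whether a top-level file matches a .gitignore pattern.

    Simplified: handles plain names and globs such as `.env` or `.env*`,
    applies `!negation` lines in order, and skips directory-only patterns.
    """
    ignored = False
    for raw in gitignore_text.splitlines():
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if pattern.endswith("/"):
            continue  # directory-only pattern, cannot match a file
        if fnmatch(filename, pattern.lstrip("/")):
            ignored = not negate  # the last matching line wins
    return ignored


print(looks_gitignored(".env", "venv/\n.env\n"))  # True
```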

**Creating your .env file:**

```bash
# Copy the example file
cp .env.example .env

# Edit with your favorite editor
nano .env
# or
code .env
```
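
On Windows, where `cp` and `nano` may not be available, the copy step can be done from Python instead. A minimal sketch, assuming `.env.example` sits in the project root; the helper name `create_env_file` is hypothetical.

```python
import shutil
from pathlib import Path


def create_env_file(project_root="."):
    """Copy .env.example to .env unless .env already exists.

    Returns True if a new file was created, False if .env was already there.
    """
    root = Path(project_root)
    example, target = root / ".env.example", root / ".env"
    if target.exists():
        return False  # never overwrite an existing configuration
    shutil.copyfile(example, target)
    return True
```

Refusing to overwrite an existing `.env` is a deliberate choice here, so re-running the cell cannot wipe settings you have already edited.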

**Key settings to understand:**

| Setting | What it does | Default |
|---------|--------------|--------|
| `PORT` | API server port | 8000 |
| `QUEUE_BACKEND` | Job processing: `celery` or `multiprocessing` | celery |
| `SNAPSHOT_REPO_URL` | Git repo for snapshots (optional) | (empty) |
| `CELERY_BROKER_URL` | Redis URL (only if using Celery) | redis://localhost:6379/0 |
| `CONVERTER_PATH` | LibreOffice path (for XLSB files) | /usr/bin/libreoffice |

**Simple setup (no Redis):**
```bash
QUEUE_BACKEND=multiprocessing
# Leave empty for now
SNAPSHOT_REPO_URL=
```
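
To see how values in `.env` combine with the defaults from the table, here is an illustrative sketch. It is not the project's real loader (that lives behind `get_settings` in `src.core.config`); the `DEFAULTS` dictionary just restates the table, and `parse_env` and `effective_settings` are hypothetical helpers that only understand simple `KEY=VALUE` lines.

```python
DEFAULTS = {
    "PORT": "8000",
    "QUEUE_BACKEND": "celery",
    "SNAPSHOT_REPO_URL": "",
    "CELERY_BROKER_URL": "redis://localhost:6379/0",
    "CONVERTER_PATH": "/usr/bin/libreoffice",
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" and surrounding quotes, if any.
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def effective_settings(env_text):
    """Overlay values from a .env file on top of the documented defaults."""
    return {**DEFAULTS, **parse_env(env_text)}


simple = "QUEUE_BACKEND=multiprocessing\nSNAPSHOT_REPO_URL=\n"
print(effective_settings(simple)["QUEUE_BACKEND"])  # multiprocessing
```

Anything you leave out of `.env` keeps its default, which is why the simple setup only needs to change one or two lines.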

**Let's check if .env exists:**

In [None]:
from pathlib import Path

# Determine project root
project_root = Path.cwd().parent if Path.cwd().name == 'snippets' else Path.cwd()
env_file = project_root / '.env'
env_example = project_root / '.env.example'

print(f"Project root: {project_root}\n")

if env_file.exists():
    print("✅ .env file exists!\n")
    
    # Show the contents
    print("Current configuration (non-secret values):")
    print("="*70)
    
    with open(env_file, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            # Skip comments and empty lines; mask values whose key looks secret
            if line and not line.startswith('#'):
                key = line.split('=', 1)[0].strip()
                if any(word in key.upper() for word in ('PASSWORD', 'TOKEN', 'SECRET')):
                    print(f"{key}=***hidden***")
                else:
                    print(line)
    
    print("="*70)
    
else:
    print("⚠️  .env file not found!\n")
    print("📝 To create it:")
    print(f"   1. Copy: cp {env_example} {env_file}")
    print("   2. Edit the file to set your configuration")
    print("\n💡 For a simple setup without Redis:")
    print("   - Set QUEUE_BACKEND=multiprocessing")
    print("   - Leave SNAPSHOT_REPO_URL empty")
    print("   - Keep other defaults")

## Step 5: Check Optional Dependencies

Some features require external tools:

### LibreOffice (Optional)
- **What for?** Converting XLSB files to XLSM
- **Do I need it?** Only if you work with `.xlsb` files

**Install:**
- **Linux**: `sudo apt-get install libreoffice`
- **Mac**: `brew install --cask libreoffice` or download from libreoffice.org
- **Windows**: Download from libreoffice.org

### Redis (Optional)
- **What for?** Production job queue with Celery
- **Do I need it?** No - use `QUEUE_BACKEND=multiprocessing` for simple setups

**Install:**
- **Linux**: `sudo apt-get install redis-server`
- **Mac**: `brew install redis`
- **Docker**: Included in docker-compose.yml
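
If Redis runs in Docker, `redis-cli` may not be installed on your machine even though the server is reachable. In that case a plain TCP check works too. This is a sketch under assumptions: the host and port default to the ones in the default `CELERY_BROKER_URL`, the function name `redis_responds` is hypothetical, and a server that requires a password will answer with an error and be reported as not responding.

```python
import socket


def redis_responds(host="localhost", port=6379, timeout=2.0):
    """Return True if something at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # Redis accepts this inline command form
            return sock.recv(16).startswith(b"+PONG")
    except OSError:
        # Connection refused, timed out, or host not reachable
        return False


# Example:
# print(redis_responds())
```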

**Let's check what's available:**

In [None]:
import subprocess
import shutil
from pathlib import Path

print("Checking optional dependencies...\n")
print("="*70)

# Check LibreOffice
libreoffice_paths = [
    '/usr/bin/libreoffice',
    '/Applications/LibreOffice.app/Contents/MacOS/soffice',
    shutil.which('libreoffice'),
    shutil.which('soffice'),
]

libreoffice_found = None
for path in libreoffice_paths:
    if path and Path(path).exists():
        libreoffice_found = path
        break

if libreoffice_found:
    print(f"✅ LibreOffice found: {libreoffice_found}")
    print("   Can convert XLSB files")
else:
    print("⚠️  LibreOffice not found")
    print("   XLSB conversion will not work")
    print("   Install: sudo apt-get install libreoffice (Linux)")
    print("   Or:      brew install --cask libreoffice (Mac)")

print()

# Check Redis
redis_available = False
try:
    result = subprocess.run(['redis-cli', 'ping'], capture_output=True, text=True, timeout=2)
    if result.returncode == 0 and 'PONG' in result.stdout:
        redis_available = True
except (OSError, subprocess.SubprocessError):
    # redis-cli is not installed, or it did not answer in time
    pass

if redis_available:
    print("✅ Redis is running")
    print("   Can use QUEUE_BACKEND=celery")
else:
    print("⚠️  Redis not running (optional)")
    print("   Use QUEUE_BACKEND=multiprocessing instead")
    print("   Or install: sudo apt-get install redis-server (Linux)")
    print("   Or:         brew install redis (Mac)")

print("="*70)
print("\n💡 Don't worry if optional dependencies are missing!")
print("   The server works fine without them for basic usage.")

## Step 6: Load Configuration

Let's test loading the configuration to make sure everything is set up correctly:

In [None]:
import sys
from pathlib import Path

# Add project to path
project_root = Path.cwd().parent if Path.cwd().name == 'snippets' else Path.cwd()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

try:
    from src.core.config import get_settings
    
    settings = get_settings()
    
    print("✅ Configuration loaded successfully!\n")
    print("Current settings:")
    print("="*70)
    print(f"Host:               {settings.HOST}")
    print(f"Port:               {settings.PORT}")
    print(f"Queue backend:      {settings.QUEUE_BACKEND}")
    print(f"Max upload size:    {settings.MAX_UPLOAD_BYTES / 1024 / 1024:.0f} MB")
    print(f"Temp storage:       {settings.TEMP_STORAGE_PATH}")
    print(f"Converter path:     {settings.CONVERTER_PATH}")
    
    if settings.SNAPSHOT_REPO_URL:
        print(f"Snapshot repo:      {settings.SNAPSHOT_REPO_URL}")
    else:
        print("Snapshot repo:      (not configured - extract endpoint disabled)")
    
    if settings.QUEUE_BACKEND == 'celery':
        print(f"Celery broker:      {settings.CELERY_BROKER_URL}")
    
    print("="*70)
    
except Exception as e:
    print(f"❌ Error loading configuration: {e}\n")
    print("Make sure:")
    print("  1. .env file exists in project root")
    print("  2. All required variables are set")
    print("  3. Python packages are installed")

## Step 7: Understanding How to Start the Server

**The API server** is what processes requests and exposes the REST API.

**To start the server manually:**

```bash
# Make sure venv is activated!
source venv/bin/activate

# Start the server
python -m src.api.main

# Or with uvicorn directly
uvicorn src.api.main:app --host 0.0.0.0 --port 8000
```

**You'll see output like:**
```
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000
```

**The server exposes:**
- API endpoints: `http://localhost:8000/api/v1/*`
- Interactive docs: `http://localhost:8000/docs`
- Health check: `http://localhost:8000/health`
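
The server takes a moment to start, so a script that launches it usually needs to wait before sending requests. The sketch below polls the health endpoint using only the standard library. The URL matches the health check listed above; the function name `wait_for_health` and the timeout values are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8000/health", timeout=30.0, interval=0.5):
    """Poll url until it returns HTTP 200 or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet (or returned an error status); retry
        time.sleep(interval)
    return False


# Example:
# if wait_for_health():
#     print("Server is ready")
```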

**If using Celery** (QUEUE_BACKEND=celery), you also need to start a worker:

```bash
# In a separate terminal
source venv/bin/activate
celery -A src.workers.celery_app worker --loglevel=info
```

**If using multiprocessing** (QUEUE_BACKEND=multiprocessing):
- No separate worker needed!
- Jobs run in the background, managed by the server itself

**Let's test if we can import the main app:**

In [None]:
try:
    from src.api.main import app
    
    print("✅ API application loaded successfully!\n")
    print("Available routes:")
    print("="*70)
    
    routes = []
    for route in app.routes:
        methods = getattr(route, 'methods', None) or []  # mounts etc. have none
        if hasattr(route, 'path'):
            for method in methods:
                if method != 'HEAD':  # Skip HEAD methods
                    routes.append(f"{method:6s} {route.path}")
    
    for route in sorted(routes):
        print(route)
    
    print("="*70)
    print("\n📝 To start the server:")
    print("   python -m src.api.main")
    print("\n📖 Then visit:")
    print("   http://localhost:8000/docs (interactive docs)")
    
except Exception as e:
    print(f"❌ Error loading API: {e}\n")
    print("This usually means:")
    print("  - Dependencies not installed")
    print("  - .env file misconfigured")
    print("  - Import error in the code")

## Step 8: Test a Quick Health Check

**Note:** This only works if the server is already running!

If you haven't started the server yet, skip this cell and come back after starting it.

In [None]:
import requests

API_URL = "http://localhost:8000"

try:
    response = requests.get(f"{API_URL}/health", timeout=2)
    
    if response.status_code == 200:
        print("✅ Server is running!\n")
        data = response.json()
        print("Health check response:")
        for key, value in data.items():
            print(f"  {key}: {value}")
        print("\n🎉 Setup complete! Ready to use the API!")
    else:
        print(f"⚠️  Server responded with status {response.status_code}")
        
except requests.exceptions.ConnectionError:
    print("⚠️  Server is not running\n")
    print("📝 To start the server:")
    print("   1. Open a terminal")
    print("   2. Activate venv: source venv/bin/activate")
    print("   3. Start server: python -m src.api.main")
    print("   4. Come back and run this cell again")
    
except Exception as e:
    print(f"❌ Error: {e}")

## ✅ Setup Complete!

**What you've learned:**
1. ✅ How to check Python version
2. ✅ How to create and activate a virtual environment
3. ✅ How to install dependencies with pip
4. ✅ How to configure the .env file
5. ✅ What optional dependencies do (LibreOffice, Redis)
6. ✅ How to start the API server
7. ✅ How to verify the server is running

**Quick Setup Summary:**

```bash
# 1. Create venv
python3 -m venv venv
source venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure
cp .env.example .env
nano .env  # Set QUEUE_BACKEND=multiprocessing for simplicity

# 4. Start server
python -m src.api.main

# 5. Visit docs (macOS; on Linux use xdg-open, on Windows use start)
open http://localhost:8000/docs
```

**Next Steps:**
1. Open `02_test_features.ipynb` to learn how Excel flattening works
2. Open `03_test_api.ipynb` to test the REST API

**Need help?**
- Read [QUICKSTART.md](../QUICKSTART.md) for a quick guide
- Read [SETUP_WITHOUT_DOCKER.md](../docs/SETUP_WITHOUT_DOCKER.md) for detailed setup
- Check [KEY_FILES_EXPLAINED.md](../docs/KEY_FILES_EXPLAINED.md) to understand the code

**Happy coding!** 🚀