# 🚀 AI Fashion Assistant v2.0 - Project Initialization

**Phase 0: Setup**

This notebook initializes the project structure and validates the environment.

---

## 📋 Tasks

1. ✅ Mount Google Drive
2. ✅ Create folder structure
3. ✅ Copy data from v1
4. ✅ Validate environment
5. ✅ Install dependencies
6. ✅ Initialize Git
7. ✅ Test imports
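
The "validate environment" task can be sketched as a small report function. This is only an illustration: the name `environment_report` and the minimum Python version of 3.8 are assumptions, not something this notebook defines.

```python
import importlib.util
import shutil
import sys


def environment_report(min_python=(3, 8)):
    """Collect a few basic facts about the runtime (hypothetical helper)."""
    try:
        # find_spec raises if the parent "google" package is missing entirely
        in_colab = importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        in_colab = False
    return {
        "python_ok": sys.version_info >= min_python,  # assumed minimum version
        "in_colab": in_colab,
        "git_available": shutil.which("git") is not None,
    }


report = environment_report()
for check, passed in report.items():
    print(f"{'✅' if passed else '❌'} {check}")
```

Outside Colab the `in_colab` entry is simply `False`, so the same cell can run locally without raising.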

---

In [None]:
# ============================================================
# 1) MOUNT GOOGLE DRIVE
# ============================================================

from google.colab import drive
import os

print("🔗 Mounting Google Drive...")
drive.mount("/content/drive", force_remount=False)
print("✅ Drive mounted!")

# Check GPU
!nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
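
The `!nvidia-smi` shell call above prints an error when the runtime has no GPU. A minimal sketch of the same query from Python, assuming you would rather get an empty list than an error on CPU-only runtimes (the helper name `gpu_summary` is hypothetical):

```python
import shutil
import subprocess


def gpu_summary(binary="nvidia-smi"):
    """Return one 'name, memory' string per GPU, or [] if none can be queried."""
    if shutil.which(binary) is None:
        return []  # tool not installed, most likely a CPU-only runtime
    result = subprocess.run(
        [binary, "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No GPU detected")
```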

In [None]:
# ============================================================
# 2) SET PATHS
# ============================================================

from pathlib import Path

# New project path
PROJECT_ROOT = Path("/content/drive/MyDrive/ai_fashion_assistant_v2")

# Old project path (for data copying)
OLD_PROJECT = Path("/content/drive/MyDrive/ai_fashion_assistant_v1")

print(f"📁 New Project: {PROJECT_ROOT}")
print(f"📁 Old Project: {OLD_PROJECT}")

# Verify new folder exists
if not PROJECT_ROOT.exists():
    print("❌ Project folder does not exist!")
    print(f"   Please create: {PROJECT_ROOT}")
    raise FileNotFoundError(f"Create {PROJECT_ROOT} in Google Drive")
else:
    print("✅ Project folder exists!")

In [None]:
# ============================================================
# 3) CREATE FOLDER STRUCTURE
# ============================================================

print("📂 Creating folder structure...\n")

folders = [
    # Data
    "data/raw",
    "data/processed",
    "data/ground_truth",
    "data/schemas",
    "data/user_profiles",

    # Embeddings
    "embeddings/text",
    "embeddings/image",
    "embeddings/hybrid",
    "embeddings/user",
    "embeddings/configs",

    # Models
    "models/fusion",
    "models/reranker",
    "models/personalization",
    "models/checkpoints",

    # Indexes
    "indexes",

    # Source code
    "src",
    "api",
    "api/endpoints",
    "ui",
    "ui/components",
    "ui/assets",

    # Configs
    "configs",

    # Scripts
    "scripts",

    # Tests
    "tests",

    # Docs
    "docs",
    "docs/results",

    # Experiments
    "experiments",

    # Logs
    "logs"
]

for folder in folders:
    folder_path = PROJECT_ROOT / folder
    folder_path.mkdir(parents=True, exist_ok=True)
    print(f"  ✅ {folder}")

print("\n✅ Folder structure created!")
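
After creating the folders, it can be useful to confirm that none are missing, for example when Drive sync is slow. The sketch below assumes a hypothetical helper `missing_folders` and demonstrates it on a temporary directory rather than on `PROJECT_ROOT`:

```python
import tempfile
from pathlib import Path


def missing_folders(root, expected):
    """Return the relative paths from `expected` that are not directories under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


# Demo on a throwaway directory: create two of three folders, then check
with tempfile.TemporaryDirectory() as tmp:
    demo = ["data/raw", "models/fusion", "logs"]
    for rel in demo[:2]:
        (Path(tmp) / rel).mkdir(parents=True, exist_ok=True)
    demo_missing = missing_folders(tmp, demo)

print(demo_missing)  # only "logs" was never created
```

In the notebook itself the call would be `missing_folders(PROJECT_ROOT, folders)`, which should return an empty list once the cell above has run.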

In [None]:
# ============================================================
# 4) COPY DATA FROM V1 (IMPROVED - WITH SYMLINK)
# ============================================================

import shutil
import os

print("📋 Checking for data in old project...\n")

if OLD_PROJECT.exists():
    # Check for data/raw
    old_raw = OLD_PROJECT / "data/raw"
    new_raw = PROJECT_ROOT / "data/raw"

    if old_raw.exists():
        print("  Found data/raw in old project")

        # Copy styles.csv (small file, no problem)
        old_styles = old_raw / "styles.csv"
        if old_styles.exists():
            print("  Copying styles.csv...")
            shutil.copy2(old_styles, new_raw / "styles.csv")
            print("    ✅ styles.csv copied")

        # 🔥 IMAGES: USE SYMLINK INSTEAD OF COPY
        # This is much faster and doesn't duplicate data
        old_images = old_raw / "images"
        new_images = new_raw / "images"

        if old_images.exists():
            if new_images.exists():
                print("    ⚠️ Images already exist, skipping")
            else:
                # Option A: Create symlink (RECOMMENDED - instant!)
                try:
                    print("  Creating symlink to images folder (instant!)...")
                    os.symlink(str(old_images), str(new_images))
                    print("    ✅ Images symlink created")
                except OSError as e:
                    print(f"    ⚠️ Symlink failed: {e}")
                    print("    💡 Alternative: We'll reference old path directly")
                    # Option B: Just use old path in code
                    IMAGE_DIR = old_images

        # Verify: sample from whichever image folder is actually reachable
        check_dir = new_images if new_images.exists() else old_images
        if check_dir.exists():
            # Pull at most 5 paths instead of listing the whole folder
            test_images = [p for _, p in zip(range(5), check_dir.glob("*.jpg"))]
            print(f"    ✅ Images accessible: {len(test_images)} samples verified")

    print("\n✅ Data setup completed!")

else:
    print("⚠️ Old project not found")
    print("   You'll need to manually add data to data/raw/")
    print("   Required files:")
    print("     - data/raw/styles.csv")
    print("     - data/raw/images/")

# ============================================================
# IMPORTANT: Update image path in config
# ============================================================

# If symlink worked, use new path
if (PROJECT_ROOT / "data/raw/images").exists():
    IMAGE_DIR = PROJECT_ROOT / "data/raw/images"
    print(f"\n✅ Image directory: {IMAGE_DIR}")
else:
    # Otherwise, use old path directly
    IMAGE_DIR = OLD_PROJECT / "data/raw/images"
    print(f"\n💡 Using old image directory: {IMAGE_DIR}")

print(f"   Total images: {len(list(IMAGE_DIR.glob('*.jpg')))}")
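
The symlink-then-fallback logic in this cell can be condensed into one function that returns the directory to read from. This is a sketch under assumptions: `link_or_fallback` is a hypothetical name, and the demo runs in a temporary directory, where symlinks may or may not be supported.

```python
import os
import tempfile
from pathlib import Path


def link_or_fallback(source, link):
    """Return `link` if it exists or can be created as a symlink, else `source`."""
    source, link = Path(source), Path(link)
    if link.exists():
        return link
    try:
        os.symlink(source, link, target_is_directory=True)
        return link
    except OSError:
        # Filesystem does not support symlinks: read from the original folder
        return source


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "v1_images"
    src.mkdir()
    (src / "1.jpg").touch()
    resolved = link_or_fallback(src, Path(tmp) / "v2_images")
    sample_names = sorted(p.name for p in resolved.glob("*.jpg"))

print(sample_names)
```

Whichever branch is taken, the caller gets a path that lists the same images, so downstream code only needs the returned value.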

In [None]:
# ============================================================
# 5) VALIDATE DATA
# ============================================================

import pandas as pd

print("🔍 Validating data...\n")

styles_path = PROJECT_ROOT / "data/raw/styles.csv"
# Use the directory resolved in step 4 so the fallback to the v1 path is respected
images_dir = IMAGE_DIR

# Check styles.csv (read once and reuse the frame)
if styles_path.exists():
    df = pd.read_csv(styles_path)
    print(f"✅ styles.csv found: {len(df)} rows")
    print(f"   Columns: {list(df.columns)}")
else:
    print("❌ styles.csv not found!")
    print(f"   Expected at: {styles_path}")

# Check images
if images_dir.exists():
    image_files = list(images_dir.glob("*.jpg"))
    print(f"✅ Images folder found: {len(image_files)} images")
else:
    print("❌ Images folder not found!")
    print(f"   Expected at: {images_dir}")

print("\n✅ Data validation completed!")

In [None]:
# ============================================================
# 6) INSTALL DEPENDENCIES
# ============================================================

print("📦 Installing core dependencies...\n")

# Install quietly
!pip install -q --upgrade pip
!pip install -q numpy pandas scikit-learn scipy
!pip install -q torch torchvision transformers sentence-transformers accelerate
!pip install -q faiss-cpu lightgbm xgboost
!pip install -q Pillow opencv-python
!pip install -q langdetect nltk
!pip install -q tqdm rich pyyaml python-dotenv jsonschema
!pip install -q fastapi uvicorn pydantic
!pip install -q streamlit

print("\n✅ Dependencies installed!")
print("\n🔍 Checking versions:")

import torch
import transformers
import sentence_transformers
import faiss

print(f"  PyTorch: {torch.__version__}")
print(f"  Transformers: {transformers.__version__}")
print(f"  Sentence-Transformers: {sentence_transformers.__version__}")
print(f"  FAISS: {faiss.__version__}")
print(f"  CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"  GPU: {torch.cuda.get_device_name(0)}")
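
The version check above imports each library, which is slow for large packages and fails on the first missing one. A possible alternative, sketched here with the standard-library `importlib.metadata`, reads installed versions without importing anything. The helper name `installed_versions` and the demo package names are illustrative only.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The second name is deliberately fake to show the "not installed" case
report = installed_versions(["pip", "a-package-that-is-not-installed-xyz"])
for name, version in report.items():
    print(f"  {name}: {version or 'NOT INSTALLED'}")
```

In this notebook the list would hold the pip distribution names, such as `torch`, `transformers`, `sentence-transformers` and `faiss-cpu`.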

In [None]:
# ============================================================
# 7) CREATE CORE FILES
# ============================================================

print("📝 Creating core files...\n")

# Create __init__.py files
init_files = [
    "src/__init__.py",
    "api/__init__.py",
    "api/endpoints/__init__.py",
    "ui/__init__.py",
    "ui/components/__init__.py",
    "tests/__init__.py"
]

for init_file in init_files:
    init_path = PROJECT_ROOT / init_file
    init_path.touch(exist_ok=True)
    print(f"  ✅ {init_file}")

print("\n✅ Core files created!")

In [None]:
# ============================================================
# 8) INITIALIZE GIT (Optional)
# ============================================================

print("🔧 Initializing Git...\n")

# Change to project directory
%cd {PROJECT_ROOT}

# Initialize git if not already
!git init

# Set git config
!git config user.name "Your Name"
!git config user.email "your.email@example.com"

print("\n✅ Git initialized!")
print("   Don't forget to update .gitignore and add your files!")
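
The reminder about `.gitignore` can be turned into a small idempotent helper. The patterns below are assumptions based on the folders created earlier (large data, embeddings, checkpoints, indexes, logs); adjust them to whatever should actually stay out of version control. `ensure_gitignore` is a hypothetical name.

```python
import tempfile
from pathlib import Path

# Assumed patterns: large or generated artifacts that usually should not be committed
DEFAULT_PATTERNS = (
    "data/raw/",
    "embeddings/",
    "models/checkpoints/",
    "indexes/",
    "logs/",
    "__pycache__/",
    ".env",
)


def ensure_gitignore(project_root, patterns=DEFAULT_PATTERNS):
    """Append any missing patterns to .gitignore and return the ones that were added."""
    path = Path(project_root) / ".gitignore"
    existing = path.read_text().splitlines() if path.exists() else []
    added = [p for p in patterns if p not in existing]
    if added:
        path.write_text("\n".join(existing + added) + "\n")
    return added


# Demo on a temporary directory: the second call finds nothing left to add
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_gitignore(tmp)
    second = ensure_gitignore(tmp)

print(len(first), len(second))
```

Calling `ensure_gitignore(PROJECT_ROOT)` after `git init` would leave existing entries untouched, so re-running the notebook stays safe.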

In [None]:
# ============================================================
# 9) TEST IMPORTS
# ============================================================

print("🧪 Testing imports...\n")

try:
    import numpy as np
    import pandas as pd
    import torch
    from transformers import CLIPModel, CLIPProcessor
    from sentence_transformers import SentenceTransformer
    import faiss
    import lightgbm as lgb
    from PIL import Image
    import yaml
    from pathlib import Path

    print("✅ All imports successful!")

except ImportError as e:
    print(f"❌ Import error: {e}")
    print("   Please run the installation cell again")
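
The `try` block above stops at the first failing import, so a second missing package only shows up on the next run. A minimal sketch that reports every failure at once, assuming a hypothetical helper `failed_imports` and using standard-library modules plus one fake name for the demo:

```python
import importlib


def failed_imports(module_names):
    """Try each module and return {name: error message} for the ones that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = str(exc)
    return failures


# The last name is deliberately fake to show a failure being collected
failures = failed_imports(["json", "pathlib", "module_that_does_not_exist_xyz"])
for name, error in failures.items():
    print(f"❌ {name}: {error}")
```

For this project the list would contain importable module names such as `numpy`, `torch`, `faiss`, `lightgbm`, `PIL` and `yaml`.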

In [None]:
# ============================================================
# 10) FINAL CHECKLIST
# ============================================================

print("✅ PROJECT INITIALIZATION COMPLETED!")
print("=" * 60)
print("\n📋 Checklist:")
print("  ✅ Google Drive mounted")
print("  ✅ Folder structure created")
print(f"  {'✅' if (PROJECT_ROOT / 'data/raw/styles.csv').exists() else '❌'} styles.csv present")
print(f"  {'✅' if (PROJECT_ROOT / 'data/raw/images').exists() else '❌'} Images folder present")
print("  ✅ Dependencies installed")
print("  ✅ Git initialized")
print("  ✅ Imports tested")

print("\n🚀 Next Steps:")
print("  1. Review the project structure")
print("  2. Start with Phase 1: Data Preparation")
print("  3. Open: notebooks/phase1_foundation/01_data_preparation.ipynb")

print("\n" + "=" * 60)
print("Project Root:", PROJECT_ROOT)
print("=" * 60)
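
Several checklist lines in the cell above are printed as ✅ unconditionally. One way to make them reflect the actual state is to compute each item from the filesystem. This is only a sketch: `checklist` is a hypothetical helper, and the demo runs against a temporary directory rather than `PROJECT_ROOT`.

```python
import tempfile
from pathlib import Path


def checklist(project_root):
    """Derive checklist items from what exists on disk."""
    root = Path(project_root)
    return {
        "Folder structure created": (root / "data/raw").is_dir(),
        "styles.csv present": (root / "data/raw/styles.csv").is_file(),
        "Images folder present": (root / "data/raw/images").is_dir(),
        "Git initialized": (root / ".git").exists(),
    }


# Demo: a project with data/raw and styles.csv, but no images and no Git repo
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data/raw").mkdir(parents=True)
    (Path(tmp) / "data/raw/styles.csv").touch()
    status = checklist(tmp)

for item, done in status.items():
    print(f"  {'✅' if done else '❌'} {item}")
```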

---

## 📊 Project Overview

**Current Status:** ✅ Initialized

**Next Notebook:** `notebooks/phase1_foundation/01_data_preparation.ipynb`

---

### 📁 Folder Structure

```
ai_fashion_assistant_v2/
├── data/              # Data files
├── embeddings/        # Generated embeddings
├── models/            # Trained models
├── indexes/           # FAISS indexes
├── src/               # Source code
├── api/               # FastAPI backend
├── ui/                # Streamlit frontend
├── configs/           # Configuration files
├── scripts/           # Scripts
├── tests/             # Unit tests
├── docs/              # Documentation
├── experiments/       # Experiments
└── logs/              # Logs
```

---

### 🎯 Roadmap

- [x] **Phase 0:** Project initialization
- [ ] **Phase 1:** Foundation (Data + SSOT)
- [ ] **Phase 2:** Embeddings
- [ ] **Phase 3:** Retrieval
- [ ] **Phase 4:** Query Understanding
- [ ] **Phase 5:** Ranking
- [ ] **Phase 6:** Personalization
- [ ] **Phase 7:** Evaluation
- [ ] **Phase 8:** Chatbot
- [ ] **Phase 9:** Deployment
- [ ] **Phase 10:** Final Demo

---