# üéØ Notebook 1: Three-Tier Vision System Setup (CORRECTED FOR 6GB GPU)

## ‚ö†Ô∏è IMPORTANT: VRAM Management for RTX 3060 6GB

Your GPU has **6GB VRAM total**, but:
- **Moondream2** needs ~3.5 GB
- **LLaVA-1.5** needs ~4-5 GB
- **Both together** need ~7-8 GB ‚ùå (won't fit!)

**Solution:** We'll use a **dynamic loading** strategy - load models on-demand and clear them when switching.

## What This Notebook Does
1. ‚úÖ Install required libraries
2. ‚úÖ Test GPU availability
3. ‚úÖ Setup dynamic model loading (one at a time)
4. ‚úÖ Load API key from .env file
5. ‚úÖ Test all three modes
6. ‚úÖ Export as reusable module

---
## üì¶ Step 1: Install Dependencies

In [8]:
# Install required packages
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
#!pip install -q transformers>=4.37.0
#!pip install -q accelerate>=0.26.0
#!pip install -q bitsandbytes>=0.42.0
#!pip install -q einops
#!pip install -q sentencepiece
#!pip install -q protobuf
#!pip install -q Pillow
#!pip install -q openai
!pip install -q python-dotenv  # For loading .env file

print("‚úÖ All packages installed!")

‚úÖ All packages installed!


ERROR: Invalid requirement: '#': Expected package name at the start of dependency specifier
    #
    ^


In [10]:
!pip install -q python-dotenv 

---
## üñ•Ô∏è Step 2: Verify GPU Setup

In [11]:
import torch

print("=" * 60)
print("GPU CONFIGURATION CHECK")
print("=" * 60)

print(f"\nüîß CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"üéÆ GPU Name: {torch.cuda.get_device_name(0)}")
    total_vram = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"üíæ Total VRAM: {total_vram:.2f} GB")
    print(f"üî¢ CUDA Version: {torch.version.cuda}")
    print(f"üêç PyTorch Version: {torch.__version__}")
    
    free_vram = (torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_allocated(0)) / 1024**3
    print(f"\nüìä Free VRAM: {free_vram:.2f} GB")
    
    if total_vram < 8:
        print("\n‚ö†Ô∏è Note: Your GPU has less than 8GB VRAM.")
        print("   We'll use DYNAMIC LOADING - models load on-demand.")
    
    print("\n‚úÖ GPU ready for dynamic model loading!")
else:
    print("\n‚ö†Ô∏è WARNING: No GPU detected. Models will run on CPU (very slow)")

print("=" * 60)

GPU CONFIGURATION CHECK

üîß CUDA Available: True
üéÆ GPU Name: NVIDIA GeForce RTX 3060 Laptop GPU
üíæ Total VRAM: 6.00 GB
üî¢ CUDA Version: 12.1
üêç PyTorch Version: 2.5.1

üìä Free VRAM: 6.00 GB

‚ö†Ô∏è Note: Your GPU has less than 8GB VRAM.
   We'll use DYNAMIC LOADING - models load on-demand.

‚úÖ GPU ready for dynamic model loading!


---
## üîë Step 3: Load Environment Variables (.env file)

In [12]:
import os
from dotenv import load_dotenv

# Load .env file from current directory
load_dotenv()

# Check if OpenAI API key is loaded
openai_key = os.getenv('OPENAI_API_KEY')

if openai_key:
    print("‚úÖ OpenAI API Key loaded from .env file")
    print(f"   Key starts with: {openai_key[:10]}...")
else:
    print("‚ö†Ô∏è No OPENAI_API_KEY found in .env file")
    print("\nMake sure your .env file contains:")
    print("   OPENAI_API_KEY=sk-...")
    print("\nOr set it manually:")
    print("   os.environ['OPENAI_API_KEY'] = 'sk-...'")

‚úÖ OpenAI API Key loaded from .env file
   Key starts with: sk-proj-iv...


---
## üéØ Step 4: Create Dynamic Vision Manager (VRAM-Optimized)

This version loads models **on-demand** and clears them when switching modes.

In [13]:
import time
import io
import base64
import json
import re
from PIL import Image
from transformers import (
    AutoModelForCausalLM, 
    AutoTokenizer,
    LlavaNextProcessor,
    LlavaNextForConditionalGeneration,
    BitsAndBytesConfig
)
from openai import OpenAI, APIError

class VisionManager:
    """
    Dynamic Vision Manager - Optimized for 6GB VRAM
    
    Models are loaded on-demand and cleared when switching modes.
    This allows working with limited VRAM.
    """
    
    def __init__(self, openai_api_key=None):
        """
        Initialize VisionManager
        
        Args:
            openai_api_key: Optional OpenAI API key for premium mode
        """
        self.moondream_model = None
        self.moondream_tokenizer = None
        self.llava_model = None
        self.llava_processor = None
        self.openai_client = None
        
        # Get API key from parameter or environment
        self.openai_api_key = openai_api_key or os.getenv('OPENAI_API_KEY')
        
        # Track currently loaded model
        self.current_model = None
        
        print("üîß VisionManager initialized (Dynamic Loading Mode)")
        print(f"   Premium Mode: {'‚úÖ Ready' if self.openai_api_key else '‚ùå No API key'}")
        print("\nüí° Models will load on-demand to save VRAM")
    
    def clear_all_models(self):
        """Clear all loaded models from VRAM"""
        if self.moondream_model is not None:
            del self.moondream_model
            del self.moondream_tokenizer
            self.moondream_model = None
            self.moondream_tokenizer = None
            print("üßπ Cleared Moondream from VRAM")
        
        if self.llava_model is not None:
            del self.llava_model
            del self.llava_processor
            self.llava_model = None
            self.llava_processor = None
            print("üßπ Cleared LLaVA from VRAM")
        
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            free_vram = (torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_allocated(0)) / 1024**3
            print(f"üíæ Free VRAM: {free_vram:.2f} GB")
        
        self.current_model = None
    
    def load_moondream(self):
        """Load Moondream2 model"""
        if self.moondream_model is not None:
            print("‚úÖ Moondream already loaded")
            return True
        
        try:
            print("\nüì• Loading Moondream2...")
            
            # Clear other models first
            if self.current_model == "llava":
                print("   Clearing LLaVA to make room...")
                if self.llava_model is not None:
                    del self.llava_model
                    del self.llava_processor
                    self.llava_model = None
                    self.llava_processor = None
                    torch.cuda.empty_cache()
            
            self.moondream_model = AutoModelForCausalLM.from_pretrained(
                "vikhyatk/moondream2",
                revision="2024-08-26",
                trust_remote_code=True,
                torch_dtype=torch.float16,
                device_map="auto" if torch.cuda.is_available() else "cpu"
            )
            
            self.moondream_tokenizer = AutoTokenizer.from_pretrained(
                "vikhyatk/moondream2",
                revision="2024-08-26",
                trust_remote_code=True
            )
            
            self.current_model = "moondream"
            
            if torch.cuda.is_available():
                vram_used = torch.cuda.memory_allocated(0) / 1024**3
                print(f"‚úÖ Moondream loaded - VRAM used: {vram_used:.2f} GB")
            else:
                print("‚úÖ Moondream loaded on CPU")
            
            return True
            
        except Exception as e:
            print(f"‚ùå Error loading Moondream: {e}")
            return False
    
    def load_llava(self):
        """Load LLaVA-1.5 model with 4-bit quantization"""
        if self.llava_model is not None:
            print("‚úÖ LLaVA already loaded")
            return True
        
        try:
            print("\nüì• Loading LLaVA-1.5 (4-bit quantization)...")
            
            # Clear other models first
            if self.current_model == "moondream":
                print("   Clearing Moondream to make room...")
                if self.moondream_model is not None:
                    del self.moondream_model
                    del self.moondream_tokenizer
                    self.moondream_model = None
                    self.moondream_tokenizer = None
                    torch.cuda.empty_cache()
            
            # Check available VRAM
            if torch.cuda.is_available():
                free_vram = (torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_allocated(0)) / 1024**3
                print(f"   Free VRAM: {free_vram:.2f} GB")
                
                if free_vram < 4.0:
                    print("\n‚ö†Ô∏è Not enough free VRAM for LLaVA (needs ~4GB)")
                    print("   Trying with CPU offloading...")
            
            # 4-bit quantization config
            quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.float16,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4",
                llm_int8_enable_fp32_cpu_offload=True  # Enable CPU offloading
            )
            
            self.llava_processor = LlavaNextProcessor.from_pretrained(
                "llava-hf/llava-v1.6-mistral-7b-hf"
            )
            
            self.llava_model = LlavaNextForConditionalGeneration.from_pretrained(
                "llava-hf/llava-v1.6-mistral-7b-hf",
                quantization_config=quantization_config,
                device_map="auto",
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True
            )
            
            self.current_model = "llava"
            
            if torch.cuda.is_available():
                vram_used = torch.cuda.memory_allocated(0) / 1024**3
                print(f"‚úÖ LLaVA loaded - VRAM used: {vram_used:.2f} GB")
            else:
                print("‚úÖ LLaVA loaded on CPU")
            
            return True
            
        except Exception as e:
            print(f"‚ùå Error loading LLaVA: {e}")
            import traceback
            traceback.print_exc()
            return False
    
    def analyze_image(self, image, mode="quick", yolo_detections=None, ocr_results=None):
        """
        Analyze image with selected mode
        
        Args:
            image: PIL Image or path to image
            mode: 'quick', 'standard', or 'premium'
            yolo_detections: List of YOLO detections
            ocr_results: OCR results
        
        Returns:
            dict: Analysis results
        """
        # Load image if path
        if isinstance(image, str):
            image = Image.open(image)
        
        # Build context from YOLO and OCR
        context = self._build_context(yolo_detections, ocr_results)
        
        # Route to appropriate mode
        if mode == "quick":
            return self._analyze_quick(image, context)
        elif mode == "standard":
            return self._analyze_standard(image, context)
        elif mode == "premium":
            return self._analyze_premium(image, context)
        else:
            raise ValueError(f"Invalid mode: {mode}")
    
    def _build_context(self, yolo_detections, ocr_results):
        """Build context from YOLO and OCR results"""
        context = {}
        
        if yolo_detections:
            labels = [d.get('label', '') for d in yolo_detections]
            context['detected_labels'] = ', '.join(labels)
        
        if ocr_results:
            context['ocr_text'] = ocr_results.get('raw_text', '')[:500]
            context['brand'] = ocr_results.get('brand')
            context['product_name'] = ocr_results.get('product_name')
        
        return context
    
    def _analyze_quick(self, image, context):
        """Quick mode with Moondream2"""
        # Load model if not loaded
        if not self.load_moondream():
            return {"error": "Failed to load Moondream model"}
        
        try:
            prompt = "Analyze this food product. "
            if context.get('detected_labels'):
                prompt += f"Labels detected: {context['detected_labels']}. "
            prompt += "Provide: category, main ingredients (if visible), and key features."
            
            start_time = time.time()
            enc_image = self.moondream_model.encode_image(image)
            answer = self.moondream_model.answer_question(enc_image, prompt, self.moondream_tokenizer)
            elapsed = time.time() - start_time
            
            return {
                "mode": "quick",
                "response": answer,
                "time_seconds": round(elapsed, 2),
                "context_used": context
            }
            
        except Exception as e:
            return {"error": f"Quick mode error: {str(e)}"}
    
    def _analyze_standard(self, image, context):
        """Standard mode with LLaVA-1.5"""
        # Load model if not loaded
        if not self.load_llava():
            return {"error": "Failed to load LLaVA model"}
        
        try:
            prompt = "Analyze this food product in detail. "
            if context.get('detected_labels'):
                prompt += f"Labels detected: {context['detected_labels']}. "
            if context.get('ocr_text'):
                prompt += f"Text visible: {context['ocr_text'][:200]}... "
            prompt += "Provide: category, ingredients, nutritional highlights, and dietary suitability."
            
            conversation = [{
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": prompt}
                ]
            }]
            
            prompt_text = self.llava_processor.apply_chat_template(conversation, add_generation_prompt=True)
            
            start_time = time.time()
            inputs = self.llava_processor(images=image, text=prompt_text, return_tensors="pt")
            
            # Move to GPU only if available and has space
            if torch.cuda.is_available():
                try:
                    inputs = {k: v.to("cuda") for k, v in inputs.items()}
                except:
                    print("‚ö†Ô∏è Moving inputs to GPU failed, using CPU")
            
            with torch.no_grad():
                output = self.llava_model.generate(**inputs, max_new_tokens=500, do_sample=False)
            
            response = self.llava_processor.decode(output[0], skip_special_tokens=True)
            
            if "[/INST]" in response:
                response = response.split("[/INST]")[-1].strip()
            
            elapsed = time.time() - start_time
            
            return {
                "mode": "standard",
                "response": response,
                "time_seconds": round(elapsed, 2),
                "context_used": context
            }
            
        except Exception as e:
            import traceback
            return {"error": f"Standard mode error: {str(e)}", "traceback": traceback.format_exc()}
    
    def _analyze_premium(self, image, context):
        """Premium mode with OpenAI GPT-4o"""
        if not self.openai_api_key:
            return {"error": "Premium mode not available - no API key. Check your .env file."}
        
        try:
            # Initialize client if needed
            if not self.openai_client:
                self.openai_client = OpenAI(api_key=self.openai_api_key)
            
            # Encode image
            buffered = io.BytesIO()
            image.save(buffered, format="JPEG")
            image_data = base64.b64encode(buffered.getvalue()).decode('utf-8')
            
            prompt = "Analyze this food product and provide structured information.\n\n"
            if context.get('detected_labels'):
                prompt += f"Labels detected: {context['detected_labels']}\n"
            if context.get('ocr_text'):
                prompt += f"Text visible: {context['ocr_text'][:300]}\n"
            
            prompt += """\nRespond ONLY with a JSON object (no markdown) with this structure:
{
  "category": "specific food category",
  "product_type": "brief description",
  "description": "2-3 sentence description",
  "key_ingredients": ["list of main ingredients"],
  "usage_suggestions": "how to use this product",
  "suitable_for": ["dietary types"]
}"""
            
            start_time = time.time()
            response = self.openai_client.chat.completions.create(
                model="gpt-4o",
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{image_data}",
                                "detail": "low"
                            }
                        }
                    ]
                }],
                max_tokens=800
            )
            
            elapsed = time.time() - start_time
            
            response_text = response.choices[0].message.content
            response_text = re.sub(r'```json\s*|```\s*', '', response_text).strip()
            
            json_match = re.search(r'\{.*\}', response_text, re.DOTALL)
            
            if json_match:
                result = json.loads(json_match.group())
                return {
                    "mode": "premium",
                    "response": result,
                    "time_seconds": round(elapsed, 2),
                    "context_used": context
                }
            else:
                return {"error": "No valid JSON in response", "raw_response": response_text}
            
        except APIError as e:
            return {"error": f"OpenAI API error: {str(e)}"}
        except Exception as e:
            return {"error": f"Premium mode error: {str(e)}"}
    
    def get_status(self):
        """Get current status of all modes"""
        return {
            "quick": {
                "available": True,
                "loaded": self.moondream_model is not None
            },
            "standard": {
                "available": True,
                "loaded": self.llava_model is not None
            },
            "premium": {
                "available": self.openai_api_key is not None,
                "loaded": self.openai_client is not None
            },
            "current_model": self.current_model
        }

# Create global instance
vision_manager = VisionManager()

print("\n‚úÖ VisionManager created successfully!")
print("\nüìä Status:")
status = vision_manager.get_status()
for mode, info in status.items():
    if mode == "current_model":
        print(f"   Currently loaded: {info or 'None'}")
    else:
        avail = "‚úÖ" if info['available'] else "‚ùå"
        loaded = "(loaded)" if info['loaded'] else "(will load on-demand)"
        print(f"   {avail} {mode.capitalize()}: {loaded}")

üîß VisionManager initialized (Dynamic Loading Mode)
   Premium Mode: ‚úÖ Ready

üí° Models will load on-demand to save VRAM

‚úÖ VisionManager created successfully!

üìä Status:
   ‚úÖ Quick: (will load on-demand)
   ‚úÖ Standard: (will load on-demand)
   ‚úÖ Premium: (will load on-demand)
   Currently loaded: None


---
## üß™ Step 5: Test Quick Mode (Moondream2)

In [14]:
# Test Quick Mode
TEST_IMAGE_PATH = r"C:\Users\lokes\Desktop\ironhack\final_project\dataset\dataset\images\nutriScoreA (812).jpg"  # UPDATE THIS

# Uncomment to test
image = Image.open(TEST_IMAGE_PATH)
result = vision_manager.analyze_image(image, mode="quick")
# 
if 'error' in result:
     print(f"‚ùå Error: {result['error']}")
else:
     print(f"‚úÖ Success! Time: {result['time_seconds']}s")
     print(f"\nResponse: {result['response']}")

print("‚ö†Ô∏è Update TEST_IMAGE_PATH to test Quick Mode")


üì• Loading Moondream2...


PhiForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From üëâv4.50üëà onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.


‚úÖ Moondream loaded - VRAM used: 3.49 GB
‚úÖ Success! Time: 7.22s

Response: The food product is a plastic container filled with a variety of fruits, including watermelon, cantaloupe, and honeydew. The container is labeled as a melon mix, which suggests that it contains multiple types of melons. The container is placed on a table, and the fruits appear to be fresh and ready to be eaten.
‚ö†Ô∏è Update TEST_IMAGE_PATH to test Quick Mode


---
## üß™ Step 6: Test Standard Mode (LLaVA)

In [15]:
# Test Standard Mode
# This will automatically clear Moondream and load LLaVA

# Uncomment to test
image = Image.open(TEST_IMAGE_PATH)
result = vision_manager.analyze_image(image, mode="standard")
# 
if 'error' in result:
     print(f"‚ùå Error: {result['error']}")
else:
     print(f"‚úÖ Success! Time: {result['time_seconds']}s")
     print(f"\nResponse: {result['response']}")

print("‚ö†Ô∏è Update TEST_IMAGE_PATH to test Standard Mode")


üì• Loading LLaVA-1.5 (4-bit quantization)...
   Clearing Moondream to make room...
   Free VRAM: 5.99 GB


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Loading checkpoint shards: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 4/4 [00:38<00:00,  9.58s/it]
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


‚úÖ LLaVA loaded - VRAM used: 4.01 GB
‚úÖ Success! Time: 120.59s

Response: Category: Fruit Salad

Ingredients:
- Watermelon
- Cantaloupe
- Honeydew
- Pineapple
- Strawberries

Nutritional Highlights:
- Rich in vitamins and minerals, particularly vitamin C, vitamin A, and potassium
- Good source of dietary fiber
- Low in calories and fat
- Contains natural sugars for a quick energy boost

Dietary Suitability:
- Vegetarian and vegan-friendly
- Suitable for those following a gluten-free diet
- May not be suitable for individuals with allergies to certain fruits, such as pineapple or strawberries
- It is important to note that the packaging indicates that the product contains nuts, which may be a concern for those with nut allergies.

The product is a pre-cut, pre-packaged fruit salad, which is convenient for those looking for a healthy, ready-to-eat snack. The packaging also suggests that the product is suitable for pregnant women, although it is always advisable to consult with a health

---
## üß™ Step 7: Test Premium Mode (OpenAI)

In [16]:
# Test Premium Mode
# This uses your API key from .env file

# Uncomment to test
image = Image.open(TEST_IMAGE_PATH)
result = vision_manager.analyze_image(image, mode="premium")
# 
if 'error' in result:
    print(f"‚ùå Error: {result['error']}")
else:
     print(f"‚úÖ Success! Time: {result['time_seconds']}s")
     print(f"\nResponse:")
     print(json.dumps(result['response'], indent=2))

print("‚ö†Ô∏è Update TEST_IMAGE_PATH and ensure .env has OPENAI_API_KEY")

‚úÖ Success! Time: 5.35s

Response:
{
  "category": "fruit mix",
  "product_type": "pre-packaged melon mix",
  "description": "This is a ready-to-eat mix of various melon types, such as watermelon, cantaloupe, and possibly others, all cut into bite-sized pieces. It is ideal for a quick snack or as a refreshing addition to meals.",
  "key_ingredients": [
    "watermelon",
    "cantaloupe",
    "honeydew"
  ],
  "usage_suggestions": "Enjoy directly from the container, or add to a fruit salad or dessert.",
  "suitable_for": [
    "vegan",
    "vegetarian",
    "gluten-free"
  ]
}
‚ö†Ô∏è Update TEST_IMAGE_PATH and ensure .env has OPENAI_API_KEY


---
## üíæ Step 8: Export as Python Module

In [20]:
# Manual export of vision_manager.py
vision_manager_code = '''"""
NutriGreen Vision Manager - Optimized for 6GB VRAM
Dynamic loading: Models load on-demand and clear when switching
"""

import torch
import time
import os
import io
import base64
import json
import re
from PIL import Image
from dotenv import load_dotenv
from transformers import (
    AutoModelForCausalLM, 
    AutoTokenizer,
    LlavaNextProcessor,
    LlavaNextForConditionalGeneration,
    BitsAndBytesConfig
)
from openai import OpenAI, APIError

# Load environment variables
load_dotenv()


class VisionManager:
    """
    Dynamic Vision Manager - Optimized for 6GB VRAM
    
    Models are loaded on-demand and cleared when switching modes.
    This allows working with limited VRAM.
    """
    
    def __init__(self, openai_api_key=None):
        """
        Initialize VisionManager
        
        Args:
            openai_api_key: Optional OpenAI API key for premium mode
        """
        self.moondream_model = None
        self.moondream_tokenizer = None
        self.llava_model = None
        self.llava_processor = None
        self.openai_client = None
        
        # Get API key from parameter or environment
        self.openai_api_key = openai_api_key or os.getenv('OPENAI_API_KEY')
        
        # Track currently loaded model
        self.current_model = None
        
        print("üîß VisionManager initialized (Dynamic Loading Mode)")
        print(f"   Premium Mode: {'‚úÖ Ready' if self.openai_api_key else '‚ùå No API key'}")
        print("\\nüí° Models will load on-demand to save VRAM")
    
    def clear_all_models(self):
        """Clear all loaded models from VRAM"""
        if self.moondream_model is not None:
            del self.moondream_model
            del self.moondream_tokenizer
            self.moondream_model = None
            self.moondream_tokenizer = None
            print("üßπ Cleared Moondream from VRAM")
        
        if self.llava_model is not None:
            del self.llava_model
            del self.llava_processor
            self.llava_model = None
            self.llava_processor = None
            print("üßπ Cleared LLaVA from VRAM")
        
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            free_vram = (torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_allocated(0)) / 1024**3
            print(f"üíæ Free VRAM: {free_vram:.2f} GB")
        
        self.current_model = None
    
    def load_moondream(self):
        """Load Moondream2 model"""
        if self.moondream_model is not None:
            print("‚úÖ Moondream already loaded")
            return True
        
        try:
            print("\\nüì• Loading Moondream2...")
            
            # Clear other models first
            if self.current_model == "llava":
                print("   Clearing LLaVA to make room...")
                if self.llava_model is not None:
                    del self.llava_model
                    del self.llava_processor
                    self.llava_model = None
                    self.llava_processor = None
                    torch.cuda.empty_cache()
            
            self.moondream_model = AutoModelForCausalLM.from_pretrained(
                "vikhyatk/moondream2",
                revision="2024-08-26",
                trust_remote_code=True,
                torch_dtype=torch.float16,
                device_map="auto" if torch.cuda.is_available() else "cpu"
            )
            
            self.moondream_tokenizer = AutoTokenizer.from_pretrained(
                "vikhyatk/moondream2",
                revision="2024-08-26",
                trust_remote_code=True
            )
            
            self.current_model = "moondream"
            
            if torch.cuda.is_available():
                vram_used = torch.cuda.memory_allocated(0) / 1024**3
                print(f"‚úÖ Moondream loaded - VRAM used: {vram_used:.2f} GB")
            else:
                print("‚úÖ Moondream loaded on CPU")
            
            return True
            
        except Exception as e:
            print(f"‚ùå Error loading Moondream: {e}")
            return False
    
    def load_llava(self):
        """Load LLaVA-1.5 model with 4-bit quantization"""
        if self.llava_model is not None:
            print("‚úÖ LLaVA already loaded")
            return True
        
        try:
            print("\\nüì• Loading LLaVA-1.5 (4-bit quantization)...")
            
            # Clear other models first
            if self.current_model == "moondream":
                print("   Clearing Moondream to make room...")
                if self.moondream_model is not None:
                    del self.moondream_model
                    del self.moondream_tokenizer
                    self.moondream_model = None
                    self.moondream_tokenizer = None
                    torch.cuda.empty_cache()
            
            # Check available VRAM
            if torch.cuda.is_available():
                free_vram = (torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_allocated(0)) / 1024**3
                print(f"   Free VRAM: {free_vram:.2f} GB")
                
                if free_vram < 4.0:
                    print("\\n‚ö†Ô∏è Not enough free VRAM for LLaVA (needs ~4GB)")
                    print("   Trying with CPU offloading...")
            
            # 4-bit quantization config
            quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.float16,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4",
                llm_int8_enable_fp32_cpu_offload=True
            )
            
            self.llava_processor = LlavaNextProcessor.from_pretrained(
                "llava-hf/llava-v1.6-mistral-7b-hf"
            )
            
            self.llava_model = LlavaNextForConditionalGeneration.from_pretrained(
                "llava-hf/llava-v1.6-mistral-7b-hf",
                quantization_config=quantization_config,
                device_map="auto",
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True
            )
            
            self.current_model = "llava"
            
            if torch.cuda.is_available():
                vram_used = torch.cuda.memory_allocated(0) / 1024**3
                print(f"‚úÖ LLaVA loaded - VRAM used: {vram_used:.2f} GB")
            else:
                print("‚úÖ LLaVA loaded on CPU")
            
            return True
            
        except Exception as e:
            print(f"‚ùå Error loading LLaVA: {e}")
            import traceback
            traceback.print_exc()
            return False
    
    def analyze_image(self, image, mode="quick", yolo_detections=None, ocr_results=None):
        """
        Analyze image with selected mode
        
        Args:
            image: PIL Image or path to image
            mode: 'quick', 'standard', or 'premium'
            yolo_detections: List of YOLO detections
            ocr_results: OCR results
        
        Returns:
            dict: Analysis results
        """
        # Load image if path
        if isinstance(image, str):
            image = Image.open(image)
        
        # Build context from YOLO and OCR
        context = self._build_context(yolo_detections, ocr_results)
        
        # Route to appropriate mode
        if mode == "quick":
            return self._analyze_quick(image, context)
        elif mode == "standard":
            return self._analyze_standard(image, context)
        elif mode == "premium":
            return self._analyze_premium(image, context)
        else:
            raise ValueError(f"Invalid mode: {mode}")
    
    def _build_context(self, yolo_detections, ocr_results):
        """Build context from YOLO and OCR results"""
        context = {}
        
        if yolo_detections:
            labels = [d.get('label', '') for d in yolo_detections]
            context['detected_labels'] = ', '.join(labels)
        
        if ocr_results:
            context['ocr_text'] = ocr_results.get('raw_text', '')[:500]
            context['brand'] = ocr_results.get('brand')
            context['product_name'] = ocr_results.get('product_name')
        
        return context
    
    def _analyze_quick(self, image, context):
        """Quick mode with Moondream2"""
        # Load model if not loaded
        if not self.load_moondream():
            return {"error": "Failed to load Moondream model"}
        
        try:
            prompt = "Analyze this food product. "
            if context.get('detected_labels'):
                prompt += f"Labels detected: {context['detected_labels']}. "
            prompt += "Provide: category, main ingredients (if visible), and key features."
            
            start_time = time.time()
            enc_image = self.moondream_model.encode_image(image)
            answer = self.moondream_model.answer_question(enc_image, prompt, self.moondream_tokenizer)
            elapsed = time.time() - start_time
            
            return {
                "mode": "quick",
                "response": answer,
                "time_seconds": round(elapsed, 2),
                "context_used": context
            }
            
        except Exception as e:
            return {"error": f"Quick mode error: {str(e)}"}
    
    def _analyze_standard(self, image, context):
        """Standard mode with LLaVA-1.5"""
        # Load model if not loaded
        if not self.load_llava():
            return {"error": "Failed to load LLaVA model"}
        
        try:
            prompt = "Analyze this food product in detail. "
            if context.get('detected_labels'):
                prompt += f"Labels detected: {context['detected_labels']}. "
            if context.get('ocr_text'):
                prompt += f"Text visible: {context['ocr_text'][:200]}... "
            prompt += "Provide: category, ingredients, nutritional highlights, and dietary suitability."
            
            conversation = [{
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": prompt}
                ]
            }]
            
            prompt_text = self.llava_processor.apply_chat_template(conversation, add_generation_prompt=True)
            
            start_time = time.time()
            inputs = self.llava_processor(images=image, text=prompt_text, return_tensors="pt")
            
            # Move to GPU only if available and has space
            if torch.cuda.is_available():
                try:
                    inputs = {k: v.to("cuda") for k, v in inputs.items()}
                except:
                    print("‚ö†Ô∏è Moving inputs to GPU failed, using CPU")
            
            with torch.no_grad():
                output = self.llava_model.generate(**inputs, max_new_tokens=500, do_sample=False)
            
            response = self.llava_processor.decode(output[0], skip_special_tokens=True)
            
            if "[/INST]" in response:
                response = response.split("[/INST]")[-1].strip()
            
            elapsed = time.time() - start_time
            
            return {
                "mode": "standard",
                "response": response,
                "time_seconds": round(elapsed, 2),
                "context_used": context
            }
            
        except Exception as e:
            import traceback
            return {"error": f"Standard mode error: {str(e)}", "traceback": traceback.format_exc()}
    
    def _analyze_premium(self, image, context):
        """Premium mode with OpenAI GPT-4o"""
        if not self.openai_api_key:
            return {"error": "Premium mode not available - no API key. Check your .env file."}
        
        try:
            # Initialize client if needed
            if not self.openai_client:
                self.openai_client = OpenAI(api_key=self.openai_api_key)
            
            # Encode image
            buffered = io.BytesIO()
            image.save(buffered, format="JPEG")
            image_data = base64.b64encode(buffered.getvalue()).decode('utf-8')
            
            prompt = "Analyze this food product and provide structured information.\\n\\n"
            if context.get('detected_labels'):
                prompt += f"Labels detected: {context['detected_labels']}\\n"
            if context.get('ocr_text'):
                prompt += f"Text visible: {context['ocr_text'][:300]}\\n"
            
            prompt += """\\nRespond ONLY with a JSON object (no markdown) with this structure:
{
  "category": "specific food category",
  "product_type": "brief description",
  "description": "2-3 sentence description",
  "key_ingredients": ["list of main ingredients"],
  "usage_suggestions": "how to use this product",
  "suitable_for": ["dietary types"]
}"""
            
            start_time = time.time()
            response = self.openai_client.chat.completions.create(
                model="gpt-4o",
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{image_data}",
                                "detail": "low"
                            }
                        }
                    ]
                }],
                max_tokens=800
            )
            
            elapsed = time.time() - start_time
            
            response_text = response.choices[0].message.content
            response_text = re.sub(r'```json\\s*|```\\s*', '', response_text).strip()
            
            json_match = re.search(r'\\{.*\\}', response_text, re.DOTALL)
            
            if json_match:
                result = json.loads(json_match.group())
                return {
                    "mode": "premium",
                    "response": result,
                    "time_seconds": round(elapsed, 2),
                    "context_used": context
                }
            else:
                return {"error": "No valid JSON in response", "raw_response": response_text}
            
        except APIError as e:
            return {"error": f"OpenAI API error: {str(e)}"}
        except Exception as e:
            return {"error": f"Premium mode error: {str(e)}"}
    
    def get_status(self):
        """Get current status of all modes"""
        return {
            "quick": {
                "available": True,
                "loaded": self.moondream_model is not None
            },
            "standard": {
                "available": True,
                "loaded": self.llava_model is not None
            },
            "premium": {
                "available": self.openai_api_key is not None,
                "loaded": self.openai_client is not None
            },
            "current_model": self.current_model
        }
'''

# Save to file
with open('vision_manager.py', 'w', encoding='utf-8') as f:
    f.write(vision_manager_code)

print("‚úÖ Exported to vision_manager.py")
print("\nYou can now use it in your Streamlit app:")
print("""
from vision_manager import VisionManager

# Create instance (loads API key from .env automatically)
vm = VisionManager()

# Analyze image
result = vm.analyze_image(image, mode="quick")  # or "standard" or "premium"
""")

‚úÖ Exported to vision_manager.py

You can now use it in your Streamlit app:

from vision_manager import VisionManager

# Create instance (loads API key from .env automatically)
vm = VisionManager()

# Analyze image
result = vm.analyze_image(image, mode="quick")  # or "standard" or "premium"



---
## ‚úÖ Summary

### What We Built:
1. ‚úÖ Dynamic Vision Manager optimized for 6GB VRAM
2. ‚úÖ Models load on-demand and clear automatically
3. ‚úÖ API key loads from .env file
4. ‚úÖ Three modes: Quick (Moondream), Standard (LLaVA), Premium (OpenAI)

### How It Works:
- **Quick Mode**: Loads Moondream (~3.5GB VRAM)
- **Standard Mode**: Clears Moondream, loads LLaVA (~4GB VRAM)
- **Premium Mode**: No VRAM needed (uses OpenAI API)

### Next Steps:
1. Create `.env` file in your project with: `OPENAI_API_KEY=sk-...`
2. Test all three modes with your product images
3. Ready for Notebook 2: Database + 10k images processing

---

**Ready for Notebook 2?** üöÄ