# Integrating Vision Models for Image Analysis

This notebook demonstrates how to use `FalAIVisionModel` for image processing and analysis via an API-driven approach. We'll cover:
- Setting up and authenticating with the API
- Processing images through API calls
- Handling errors to ensure robust integration


In [17]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from swarmauri.llms.concrete.FalAIVisionModel import FalAIVisionModel

In [18]:
# Load environment variables
load_dotenv()
API_KEY = os.getenv("FAL_KEY")

if API_KEY:
    print("API key loaded successfully.")
else:
    print("API key not found. Please ensure it is set in the environment.")

API key loaded successfully.
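The check above only prints a message, so later cells would still run (and fail) without a key. A minimal sketch of a stricter variant that fails fast is shown here; `require_env` is a hypothetical helper introduced for illustration and is not part of Swarmauri or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it in your shell."
        )
    return value
```

Calling `API_KEY = require_env("FAL_KEY")` after `load_dotenv()` would stop the notebook at the first cell where the problem can be diagnosed.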


## Connecting to FalAIVisionModel

In this section, we initialize `FalAIVisionModel` with the API key, then print its `resource` and `type` properties to confirm that the object was created as expected.


In [19]:
# Initialize the model with the API key
if API_KEY:
    falai_vision_model = FalAIVisionModel(api_key=API_KEY)
    print("FalAIVisionModel initialized successfully.")
    print("Resource:", falai_vision_model.resource)
    print("Model Type:", falai_vision_model.type)
else:
    print("Initialization failed. API key is required.")


FalAIVisionModel initialized successfully.
Resource: LLM
Model Type: FalAIVisionModel


## Single Image Processing

Let's process a single image by providing a URL of a famous artwork and a prompt. This demonstrates basic image understanding capabilities.


In [20]:
# Define the image URL and prompt
image_url = "https://llava-vl.github.io/static/images/monalisa.jpg"
prompt = "What is shown in this image?"

In [21]:
# Process the image using the FalAIVisionModel
try:
    result = falai_vision_model.process_image(image_url=image_url, prompt=prompt)
    print("Model Response:", result)
except Exception as e:
    print("An error occurred:", e)

Model Response: The image you've provided is of the famous painting "Mona Lisa" by Leonardo da Vinci. It's one of the most recognized and celebrated works of art in the world, known for its subject's enigmatic expression and the atmospheric illusionism that gives the painting its characteristic hazy atmosphere


## Working with Multiple Models

The `allowed_models` attribute lists the model names that `FalAIVisionModel` accepts, so we can choose a specific model to suit our requirements.


In [22]:
# Display available model names
allowed_models = falai_vision_model.allowed_models
print("Allowed Models:", allowed_models)

Allowed Models: ['fal-ai/llava-next', 'fal-ai/llavav15-13b', 'fal-ai/any-llm/vision']
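Choosing from this list can be sketched as a small validation step. The helper `choose_model` is hypothetical (it is not a Swarmauri function), and the model names are copied from the output above so that the block runs on its own.

```python
from typing import Optional, Sequence


def choose_model(allowed: Sequence[str], preferred: Optional[str] = None) -> str:
    """Return `preferred` if it is in `allowed`; with no preference, return the first entry."""
    if not allowed:
        raise ValueError("No models are available.")
    if preferred is None:
        return allowed[0]
    if preferred not in allowed:
        raise ValueError(f"{preferred!r} is not one of {list(allowed)}")
    return preferred


# Model names copied from the notebook output, so this sketch needs no API access.
model_names = ["fal-ai/llava-next", "fal-ai/llavav15-13b", "fal-ai/any-llm/vision"]
model_name = choose_model(model_names, "fal-ai/llavav15-13b")
print(model_name)
```

The chosen name could then be passed when constructing the model, for example `FalAIVisionModel(api_key=API_KEY, name=model_name)`. The `name` keyword is an assumption here and should be checked against the Swarmauri version you have installed.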


## Error Handling in API Calls

To make our integration robust, we handle common API-related errors, such as:
- Invalid URLs
- Timeouts
- API errors

The example below passes an unreachable image URL and catches the resulting error with a broad `except Exception` handler, so the notebook reports the failure instead of stopping.


In [23]:
# Sample code with error handling for image processing
invalid_image_url = "https://invalid-url.com/image.jpg"
prompt = "Describe the content of this image."

try:
    result = falai_vision_model.process_image(image_url=invalid_image_url, prompt=prompt)
    print("Model Response:", result)
except Exception as e:
    print("An error occurred during image processing:", e)


An error occurred during image processing: Could not load image from url: https://invalid-url.com/image.jpg
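The other two error classes listed in this section, invalid URLs and timeouts, can be handled more specifically. The following is a sketch under stated assumptions rather than Swarmauri's own mechanism: `is_http_url` and `call_with_retries` are hypothetical helpers, and the exception types in `retry_on` are placeholders, since the notebook only shows a generic exception for an unreachable URL.

```python
import time
from typing import Callable, Tuple, Type
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """Return True if `url` has an http(s) scheme and a host. Does not check reachability."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def call_with_retries(
    func: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 1.0,
    # Assumption: replace these with the exceptions your HTTP client actually raises.
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> str:
    """Call `func`, retrying with exponential backoff on the exception types in `retry_on`."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In the notebook, the call could be wrapped as `call_with_retries(lambda: falai_vision_model.process_image(image_url=image_url, prompt=prompt))`, guarded by `is_http_url(image_url)`. Malformed URLs are rejected up front because retrying them cannot succeed, while transient failures get a few more attempts.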


## Conclusion

In this notebook, we explored:
- Integrating the `FalAIVisionModel`
- Processing a single image and listing the available models
- Error handling in API calls for robust performance

By integrating vision models, we can create applications for visual understanding across various fields, including content moderation, automated captioning, and real-time object detection.


## Notebook Metadata

In [25]:
import platform
import sys
from datetime import datetime

# Display author information
author_name = "Huzaifa Irshad" 
github_username = "irshadhuzaifa"  

print(f"Author: {author_name}")
print(f"GitHub Username: {github_username}")

# Last modified datetime (file's metadata)
notebook_file = "Notebook_02_Integrating_Vision_Models_for_Image_Analysis.ipynb"
try:
    last_modified_time = os.path.getmtime(notebook_file)
    last_modified_datetime = datetime.fromtimestamp(last_modified_time)
    print(f"Last Modified: {last_modified_datetime}")
except Exception as e:
    print(f"Could not retrieve last modified datetime: {e}")

# Display platform, Python version, and Swarmauri version
print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python Version: {sys.version}")

import swarmauri

try:
    version = swarmauri.__version__
except AttributeError:
    # Fallback used when the package does not expose __version__
    version = "0.5.1"

print(f"Swarmauri Version: {version}")

Author: Huzaifa Irshad
GitHub Username: irshadhuzaifa
Last Modified: 2024-11-04 20:40:29.411682
Platform: Windows 11
Python Version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
Swarmauri Version: 0.5.1
