# Visual Question Answering (VQA) with FalAIVisionModel

In this notebook, we explore Visual Question Answering (VQA), in which a model answers natural-language questions about an image. VQA is valuable in applications such as accessibility support, automated image captioning, and content understanding.

This notebook will demonstrate:
- Setting up VQA with `FalAIVisionModel`
- Example VQA questions on a sample image
- Exploring edge cases in VQA


## Setting up the Model for VQA

We load the fal.ai API key from the environment and initialize `FalAIVisionModel`. Each call then pairs an image URL with a question passed as the prompt, and the model answers based on the image content.


```python
# Import libraries
import os
from dotenv import load_dotenv
from swarmauri.llms.concrete.FalAIVisionModel import FalAIVisionModel
```

```python
# Load environment variables and read the fal.ai API key
load_dotenv()
API_KEY = os.getenv("FAL_KEY")

if API_KEY:
    falai_vision_model = FalAIVisionModel(api_key=API_KEY)
    print("FalAIVisionModel initialized for VQA.")
else:
    print("API key not found. Please ensure FAL_KEY is set in the environment.")
```

```text
FalAIVisionModel initialized for VQA.
```
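If `FAL_KEY` is missing, the setup cell only prints a message, and every question in the later cells then reports an error about the undefined `falai_vision_model`. A minimal sketch of a fail-fast alternative is below; `require_api_key` is a hypothetical helper, not part of swarmauri or python-dotenv.

```python
import os


def require_api_key(var_name: str = "FAL_KEY") -> str:
    """Return the API key stored in `var_name`, stripped of whitespace, or raise if it is missing."""
    key = os.getenv(var_name, "").strip()
    if not key:
        # Stop here instead of letting later cells fail on an undefined model object
        raise RuntimeError(f"Environment variable {var_name!r} is not set; add it to your .env file.")
    return key
```

In the setup cell, this would replace the `if`/`else` block with `falai_vision_model = FalAIVisionModel(api_key=require_api_key())`, called after `load_dotenv()` so that values from `.env` are already in the environment.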


## Examples of VQA with Sample Images

In this section, we ask the model specific questions about an image and print its answers. The example uses a single sample image, the Mona Lisa.


```python
# Define the image URL and the questions to ask about it
image_url = "https://llava-vl.github.io/static/images/monalisa.jpg"
questions = [
    "What is the subject of this painting?",
    "Describe the main colors in the scene."
]
```

```python
# Ask each question about the image and print the answers
print(f"\nImage URL: {image_url}")
for question in questions:
    try:
        result = falai_vision_model.process_image(image_url=image_url, prompt=question)
        print(f"Q: {question}\nA: {result}\n")
    except Exception as e:
        print(f"An error occurred: {e}")
```

```text
Image URL: https://llava-vl.github.io/static/images/monalisa.jpg
Q: What is the subject of this painting?
A: The painting you've shown is the famous "Mona Lisa" by Leonardo da Vinci. It is one of the most recognized and celebrated works of art in the world. The subject of the painting is a woman with a subtle, enigmatic smile, set against a distant landscape. The painting is known

Q: Describe the main colors in the scene.
A: The image you've provided is the famous painting "Mona Lisa" by Leonardo da Vinci. The main colors in the scene are:

- The skin tones of the subject, which are a soft blend of warm hues, including light beiges, pinks, and subtle yellows.
```
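The cell above prints each answer as it arrives. Collecting answers into records instead makes it easier to compare runs or to reuse the same loop for other images. A minimal sketch is below; `ask_questions` and `EchoModel` are hypothetical names, and the only assumption about `FalAIVisionModel` is the `process_image(image_url=..., prompt=...)` call already used in this notebook. `EchoModel` is an offline stand-in so the helper can be exercised without an API key.

```python
from __future__ import annotations

from typing import Protocol


class SupportsProcessImage(Protocol):
    """Assumed interface: the process_image(image_url=..., prompt=...) call used in this notebook."""

    def process_image(self, image_url: str, prompt: str) -> str: ...


def ask_questions(model: SupportsProcessImage, image_url: str, questions: list[str]) -> list[dict]:
    """Ask each question about one image and return one record per question."""
    results = []
    for question in questions:
        try:
            answer = model.process_image(image_url=image_url, prompt=question)
            results.append({"question": question, "answer": answer, "error": None})
        except Exception as exc:
            # Record the failure and continue so one bad request does not hide the other answers
            results.append({"question": question, "answer": None, "error": str(exc)})
    return results


class EchoModel:
    """Offline stand-in for FalAIVisionModel, used only to exercise ask_questions."""

    def process_image(self, image_url: str, prompt: str) -> str:
        if not prompt:
            raise ValueError("empty prompt")
        return f"answer to: {prompt}"


records = ask_questions(EchoModel(), "https://example.com/image.jpg", ["What is shown?", ""])
for record in records:
    print(record)
```

With the real model, the call would be `ask_questions(falai_vision_model, image_url, questions)`.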



## Exploring Edge Cases in VQA

To probe the model further, we ask questions that call for interpretation rather than plain description: the subject's emotions, what the background suggests about the setting, and the painting's era. These show how the model handles queries that need context or cultural knowledge beyond what is literally visible in the image.
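Judging such answers by eye does not scale well. One lightweight, admittedly crude, way to make the check repeatable is to list keywords that a reasonable answer should contain and test for them. This is a minimal sketch: `keyword_hits` and `hit_rate` are hypothetical helpers, and the sample answers are illustrative strings rather than recorded model output.

```python
from __future__ import annotations


def keyword_hits(answer: str, expected_keywords: list[str]) -> dict[str, bool]:
    """Report, case-insensitively, which expected keywords occur in a model answer."""
    lowered = answer.lower()
    return {keyword: keyword.lower() in lowered for keyword in expected_keywords}


def hit_rate(hits: dict[str, bool]) -> float:
    """Fraction of expected keywords found; 0.0 when no keywords were given."""
    return sum(hits.values()) / len(hits) if hits else 0.0


# Illustrative answers to "Can you guess the era this painting is from?"
complete_answer = "It was painted during the Italian Renaissance."
cut_off_answer = "It was painted during the Re"

print(keyword_hits(complete_answer, ["Renaissance"]))  # {'Renaissance': True}
print(keyword_hits(cut_off_answer, ["Renaissance"]))   # {'Renaissance': False}
```

Keyword matching only shows that a term appears, not that the answer is correct, and a truncated answer can miss a keyword it was about to produce.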


```python
# Reuse the same image with more interpretive questions for edge-case testing
edge_image_url = "https://llava-vl.github.io/static/images/monalisa.jpg"
edge_questions = [
    "What emotions does the subject appear to show?",
    "What could the background tell us about the setting?",
    "Can you guess the era this painting is from?"
]
```

```python
# Process VQA with edge-case questions
print(f"\nEdge Case Testing with Image URL: {edge_image_url}")
for question in edge_questions:
    try:
        result = falai_vision_model.process_image(image_url=edge_image_url, prompt=question)
        print(f"Q: {question}\nA: {result}\n")
    except Exception as e:
        print(f"An error occurred: {e}")
```

```text
Edge Case Testing with Image URL: https://llava-vl.github.io/static/images/monalisa.jpg
Q: What emotions does the subject appear to show?
A: The subject in the image appears to be showing a serene and contemplative emotion. The slight smile and the direct gaze give the impression of a calm and composed individual. The overall expression is often referred to as the "Mona Lisa smile," which is characterized by a subtle, enigmatic quality that has been

Q: What could the background tell us about the setting?
A: The background of the image features a landscape with a river, trees, and a distant view of a city or town. This suggests that the setting is likely an urban or semi-urban area with a natural environment nearby. The presence of a river indicates that the location might be near a water source, which could have been

Q: Can you guess the era this painting is from?
A: The painting you've shown is the Mona Lisa, which is a famous work by Leonardo da Vinci. It was painted during the Re
```
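Several recorded answers stop mid-sentence or even mid-word (for example, "It was painted during the Re"), which looks like the response hit an output-length limit rather than the model choosing to stop. A rough heuristic for flagging such answers is sketched below; `looks_truncated` is a hypothetical helper. The first two sample strings are the final sentences of recorded answers above; the third is illustrative.

```python
def looks_truncated(answer: str) -> bool:
    """Heuristic: flag an answer that does not end with sentence-final punctuation."""
    text = answer.rstrip()
    # Ignore closing quotes or brackets after the final mark, e.g. 'the "Mona Lisa smile."'
    text = text.rstrip("\"')]\u201d\u2019")
    if not text:
        return True
    return not text.endswith((".", "!", "?"))


answers = [
    "It was painted during the Re",
    "The painting is known",
    'The expression is often called the "Mona Lisa smile."',
]
print([looks_truncated(a) for a in answers])  # [True, True, False]
```

The check misses cut-offs that land right after a full stop: the color answer in the first example ends with a complete sentence even though its list of colors appears cut short.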

## Conclusion

In this notebook, we demonstrated Visual Question Answering (VQA) with `FalAIVisionModel`:
- Asking specific questions to extract information from an image
- Posing both descriptive and more interpretive questions about the same image

Several of the recorded answers stop mid-sentence, so the responses shown here are best read as excerpts rather than complete answers.

VQA has practical uses in accessibility, interactive applications, and content moderation, and advances in VQA models can extend these uses further.


## Notebook Metadata

```python
import os
import platform
import sys
from datetime import datetime

import swarmauri

# Display author information
author_name = "Huzaifa Irshad"
github_username = "irshadhuzaifa"

print(f"Author: {author_name}")
print(f"GitHub Username: {github_username}")

# Last modified datetime, read from the notebook file's metadata
notebook_file = "Notebook_03_Visual_Question_Answering.ipynb"
try:
    last_modified_time = os.path.getmtime(notebook_file)
    last_modified_datetime = datetime.fromtimestamp(last_modified_time)
    print(f"Last Modified: {last_modified_datetime}")
except OSError as e:
    print(f"Could not retrieve last modified datetime: {e}")

# Display platform, Python version, and Swarmauri version
print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python Version: {sys.version}")

try:
    version = swarmauri.__version__
except AttributeError:
    # Fall back to the known version when the package does not expose __version__
    version = "0.5.1"

print(f"Swarmauri Version: {version}")
```

```text
Author: Huzaifa Irshad
GitHub Username: irshadhuzaifa
Last Modified: 2024-11-04 20:38:35.037660
Platform: Windows 11
Python Version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
Swarmauri Version: 0.5.1
```
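The metadata cell falls back to a hard-coded version string because `swarmauri` does not expose `__version__` in this environment. A minimal sketch of an alternative reads the installed distribution's version with the standard library's `importlib.metadata` (Python 3.8+). It assumes the package is installed under the distribution name `swarmauri`; `package_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str, fallback: str = "unknown") -> str:
    """Return the installed version of `dist_name`, or `fallback` if it is not installed."""
    try:
        # Reads the version recorded in the installed distribution's metadata
        return version(dist_name)
    except PackageNotFoundError:
        return fallback


print(f"Swarmauri Version: {package_version('swarmauri')}")
```

Reading the version from package metadata keeps the printed value in sync with whatever is actually installed, instead of relying on a string that must be updated by hand.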
