# **Understanding Swarmauri Vision Models**  
---

In this notebook, we’ll explore the various **Vision Models** provided by Swarmauri and their capabilities.  

We’ll also cover:  
- How to view the supported vision models.  
- The methods available in these classes and how to use them effectively.  

This notebook is a concise guide to understanding and working with Swarmauri’s Vision Models to streamline your AI-powered visual analysis tasks.  

---

## **What Are Vision Models?**  

Vision models are AI systems designed to analyze and process visual data such as images or videos. They leverage deep learning techniques to extract meaningful insights, enabling applications across diverse industries.  

### **Use Cases of Vision Models**  

1. **Object Detection and Recognition**  
   - Identifying and classifying objects in images or videos.  
   - Detecting faces, vehicles, or other specific items for real-time applications.  

2. **Image Segmentation**  
   - Separating images into distinct regions for medical imaging or autonomous driving.  

3. **Content Moderation**  
   - Detecting inappropriate or restricted content in media platforms.  

4. **Retail and E-Commerce**  
   - Visual search for products based on user-uploaded images.  

5. **Accessibility**  
   - Converting images into text for visually impaired users.  

By leveraging these capabilities, Swarmauri Vision Models empower users to solve complex problems efficiently.  

With this understanding, let’s dive deeper into the specific Vision Models provided by Swarmauri and the functionalities they offer. 

---  

## List of Vision Classes in Swarmauri
---

Swarmauri provides the following Vision classes, each named after the provider of its models:

1. **FalAIVisionModel**  
2. **GroqVisionModel**  
3. **HyperbolicVisionModel**

### Provider Naming Convention  
Swarmauri follows a *provider naming convention*. This means that the file and class names reflect the **provider** of the Vision models, not the individual vision model being used.
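
As a rough illustration of the convention, the sketch below builds the import line for each class in the list above. It assumes that every Vision class lives in a module with the same name as the class, which is the pattern of the `FalAIVisionModel` import used in this notebook; the helper `import_statement` is hypothetical and not part of Swarmauri.

```python
# Assumption: each vision class sits in a module named after the class,
# mirroring the FalAIVisionModel import used in this notebook.
VISION_CLASSES = ["FalAIVisionModel", "GroqVisionModel", "HyperbolicVisionModel"]


def import_statement(class_name: str) -> str:
    """Build the import line implied by the provider naming convention (hypothetical helper)."""
    return f"from swarmauri.llms.concrete.{class_name} import {class_name}"


# Print one import line per provider class
for class_name in VISION_CLASSES:
    print(import_statement(class_name))
```

If a class turns out to live elsewhere in your installed Swarmauri version, the package's source tree is the place to confirm the actual path.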

## How to See the Allowed Models in a Vision Class
---

1. ### Import the class 

- Here we will use `FalAIVisionModel`, but you can use any other class from the list of Vision classes in Swarmauri.

In [1]:
from swarmauri.llms.concrete.FalAIVisionModel import FalAIVisionModel

2. ### Instantiate the Model Class

In [2]:
# Note: a real API key is not needed just to view allowed_models, so the placeholder can stay as it is
model = FalAIVisionModel(api_key="put your api key here")

3. ### List all available models
- To list the allowed models, we use the `allowed_models` attribute, just as we did when working with LLMs and Image Generation Models.

In [3]:
available_models = model.allowed_models
print(available_models)

['fal-ai/llava-next']


As you can see, `FalAIVisionModel` currently has one allowed model, and it has been printed.  

This approach is similar to how we worked with LLMs and Image Generation Models earlier. Swarmauri ensures consistency by allowing you to use the same methods and attributes across different classes in a unified manner. This design simplifies the process, making it more intuitive and efficient to build with Swarmauri.
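
One way to put `allowed_models` to work is to check a model name against it before using that name. The sketch below does this with a stand-in class so that it runs without Swarmauri installed or an API key. `StubVisionModel` and `is_allowed` are hypothetical names, and the single list entry is copied from the output shown above.

```python
from typing import List


class StubVisionModel:
    """Hypothetical stand-in, not a Swarmauri class."""

    # Value copied from the FalAIVisionModel output shown above
    allowed_models: List[str] = ["fal-ai/llava-next"]


def is_allowed(model: StubVisionModel, model_name: str) -> bool:
    """Return True when model_name appears in the model's allowed_models list."""
    return model_name in model.allowed_models


stub = StubVisionModel()
print(is_allowed(stub, "fal-ai/llava-next"))
print(is_allowed(stub, "some-other-model"))
```

The same membership check should work on a real instance such as `model` from the cells above, since it only reads the `allowed_models` attribute.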

## Methods Available in Each Vision Model Class  
---

Each **Vision Model** class in **Swarmauri** offers a set of methods designed to streamline interactions with the models. These methods provide flexibility for **synchronous**, **asynchronous**, and **streaming** workflows, catering to various use cases in visual analysis.  

1. **`predict`**:  
   - **Description**: This method performs visual analysis synchronously (blocking).  
   - **Use Case**: When you need immediate results from a single vision model request.  

2. **`apredict`**:  
   - **Description**: The asynchronous counterpart of `predict`. It processes visual data without blocking the program.  
   - **Use Case**: When running multiple tasks concurrently using `asyncio`.  

3. **`stream`**:  
   - **Description**: Streams results incrementally as the vision model processes the data.  
   - **Use Case**: When working with large or time-sensitive visual data that requires real-time feedback.  

4. **`astream`**:  
   - **Description**: The asynchronous version of `stream`. It streams results while allowing other tasks to run concurrently.  
   - **Use Case**: When you process large datasets in an asynchronous environment and need real-time updates.  

Together, these methods cover one-shot and streaming workflows in both blocking and non-blocking form, so you can choose the one that matches how your application consumes results.  
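
To make the difference between the four calling styles concrete, here is a minimal sketch built on a stand-in class. `StubVisionModel` is hypothetical and is not Swarmauri code: it only reuses the four method names, and the plain-string argument and return values are assumptions made for illustration. Check the real classes for their actual signatures before adapting it.

```python
import asyncio
from typing import AsyncIterator, Iterator, List, Tuple


class StubVisionModel:
    """Hypothetical stand-in that mimics only the four method names."""

    def predict(self, prompt: str) -> str:
        # Blocking call: returns the full result at once
        return f"description for: {prompt}"

    async def apredict(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return self.predict(prompt)

    def stream(self, prompt: str) -> Iterator[str]:
        # Yield the result piece by piece instead of all at once
        for word in self.predict(prompt).split():
            yield word

    async def astream(self, prompt: str) -> AsyncIterator[str]:
        for word in self.predict(prompt).split():
            await asyncio.sleep(0)
            yield word


async def run_async(model: StubVisionModel) -> Tuple[str, List[str]]:
    # Two apredict calls are scheduled concurrently with asyncio.gather
    first, second = await asyncio.gather(
        model.apredict("cat.png"), model.apredict("dog.png")
    )
    # astream is consumed with "async for"
    chunks = [chunk async for chunk in model.astream("cat.png")]
    return first + " | " + second, chunks


model = StubVisionModel()
sync_result = model.predict("cat.png")
streamed = list(model.stream("cat.png"))
async_result, async_chunks = asyncio.run(run_async(model))
print(sync_result, streamed, async_result, async_chunks, sep="\n")
```

Inside a Jupyter notebook an event loop is already running, so `await run_async(model)` in a cell takes the place of `asyncio.run(run_async(model))`.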

## **NOTEBOOK METADATA**

In [1]:
from swarmauri.utils import print_notebook_metadata

metadata = print_notebook_metadata.print_notebook_metadata("Victory Nnaji", "3rd-Son")
print(metadata) 

Author: Victory Nnaji
GitHub Username: 3rd-Son
Notebook File: Notebook_01_Understanding_Swarmauri_Vision_Models.ipynb
Last Modified: 2024-12-24 13:13:17.363559
Platform: Darwin 24.1.0
Python Version: 3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ]
Swarmauri Version: 0.5.2
None
