## **Embedding Models in Swarmauri**

Swarmauri integrates with several embedding models that transform text into vector representations for machine learning and information retrieval tasks. The embedding classes fall into two categories:

1. **Core Embeddings**: Offered in the `swarmauri/embeddings/concrete` module.  
2. **Community Embeddings**: Found in the `swarmauri_community/embeddings/concrete` module.

### **Core Embeddings**  
---

The following embedding models are included in the core `swarmauri` package:  

- **CohereEmbedding**  
   Generates high-quality embeddings using Cohere's API for semantic search and natural language tasks.  

- **OpenAIEmbedding**  
   Leverages OpenAI's state-of-the-art embeddings for text understanding and similarity tasks.  

- **MistralEmbedding**  
   Provides embeddings with a focus on speed and efficiency.  

- **GeminiEmbedding**  
   Specialized for handling multilingual content and diverse datasets.  

- **VoyageEmbedding**  
   A general-purpose embedding model designed for robust text representation.

### **Community Embeddings**  
---

The `swarmauri_community` module extends the functionality of embeddings, offering more specialized tools:  

- **Doc2VecEmbedding**  
   A neural network-based embedding model that generates document-level vector representations, making it suitable for document classification and retrieval.  

- **MlmEmbedding**  
   Based on Masked Language Models (MLMs), this embedding is highly effective for capturing contextual information within text.  

By combining these embeddings with Swarmauri's tools, you can create powerful pipelines for a wide range of applications, including search engines, clustering, and classification.
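
To make the search use case concrete, here is a minimal sketch of ranking documents by cosine similarity to a query. It uses small hand-written vectors as stand-ins for real embeddings. The helper names `cosine_similarity` and `top_k` are illustrative and are not part of Swarmauri.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, document_vectors, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors (hypothetical values, not real model output).
documents = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]

for index, score in top_k(query, documents):
    print(f"document {index}: similarity {score:.3f}")
```

In a real pipeline, the toy vectors would be replaced by the vectors an embedding model returns for your documents and query.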


## Example usage with the OpenAIEmbedding class
### Import the OpenAIEmbedding class

In [2]:
from swarmauri.embeddings.concrete.OpenAIEmbedding import OpenAIEmbedding
import os
from dotenv import load_dotenv

In [3]:
# Load environment variables
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
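
`os.getenv` returns `None` when the variable is missing, and that only surfaces later as an API error. One possible guard is to fail early with a clear message. The helper `require_env` below is a hypothetical name used for illustration, not a Swarmauri or python-dotenv function.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to your .env file or export it in your shell."
        )
    return value
```

With this in place, `OPENAI_API_KEY = require_env("OPENAI_API_KEY")` could replace the plain `os.getenv` call after `load_dotenv()` has run.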

### Initialize the OpenAIEmbedding class

In [4]:
openai_embedding = OpenAIEmbedding(api_key=OPENAI_API_KEY)

In [6]:
# List of text data to generate embeddings for
texts = ["Hello, world!", "Data science is awesome.", "AI is transforming the world."]

# Use the infer_vector method to generate one embedding per input text
embeddings = openai_embedding.infer_vector(texts)

# Display the first five components of each embedding
for text, vector in zip(texts, embeddings):
    print(f"Text: {text}")
    print(f"Embedding: {vector.value[:5]}... (truncated)")
    print("-" * 50)

Text: Hello, world!
Embedding: [-0.019143932, -0.025292054, -0.0017211713, 0.018834507, -0.033821393]... (truncated)
--------------------------------------------------
Text: Data science is awesome.
Embedding: [-0.016720703, -0.022658665, -0.003726754, 0.021702131, 0.0564479]... (truncated)
--------------------------------------------------
Text: AI is transforming the world.
Embedding: [-0.014297752, -0.0074625984, 0.041171115, 0.036899146, 0.01747503]... (truncated)
--------------------------------------------------


### Conclusion
This example demonstrated how to generate embeddings using the OpenAI API. The resulting embeddings can be used for various NLP tasks such as text classification, clustering, and similarity search.
Following the same pattern, you can use the other embedding models provided by Swarmauri to generate embeddings for your text data, PDFs, DOCX files, and so on.
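
The "same pattern" can be sketched as a small helper that works with any embedding object. It assumes the object exposes `infer_vector(texts)` and returns items with a `.value` attribute, as `OpenAIEmbedding` does in the example above. Whether every other Swarmauri embedding class follows exactly this interface is an assumption to verify for each one. `embed_texts`, `FakeVector`, and `FakeEmbedding` are hypothetical names used only so the sketch runs without an API key.

```python
def embed_texts(model, texts):
    """Return one plain list of floats per input text.

    Assumes `model.infer_vector(texts)` returns objects with a `.value`
    attribute, as seen with OpenAIEmbedding in this notebook.
    """
    vectors = model.infer_vector(texts)
    return [list(vector.value) for vector in vectors]


class FakeVector:
    """Stand-in for a vector object (illustration only, not a Swarmauri class)."""

    def __init__(self, value):
        self.value = value


class FakeEmbedding:
    """Stand-in embedder: maps each text to [character count, space count]."""

    def infer_vector(self, texts):
        return [FakeVector([float(len(t)), float(t.count(" "))]) for t in texts]


vectors = embed_texts(FakeEmbedding(), ["Hello, world!", "AI is transforming the world."])
print(vectors)
```

To use a real model, pass an initialized embedding instance such as `openai_embedding` in place of `FakeEmbedding()`.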

# **NOTEBOOK METADATA**

In [8]:
from swarmauri.utils import print_notebook_metadata

# print_notebook_metadata prints the details itself and returns None, so its result is not printed again
print_notebook_metadata.print_notebook_metadata("Victory Nnaji", "3rd-Son")

Author: Victory Nnaji
GitHub Username: 3rd-Son
Notebook File: Notebook_01_Embedding_Models_in_Swarmauri.ipynb
Last Modified: 2025-01-07 08:26:08.393557
Platform: Darwin 24.1.0
Python Version: 3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ]
Swarmauri Version: 0.5.2
