# Convert dataframe column into embeddings

A RAG pipeline retrieves documents by comparing vectors rather than raw text, so I need to convert the text into embeddings. I'll use the `SentenceTransformer` class from the `sentence-transformers` library (with a model hosted on Hugging Face) to tokenize and embed the values from the `Text` column that I made in the last step.

In [1]:
import pandas as pd
from sentence_transformers import SentenceTransformer

df = pd.read_csv('fresh/all_data.csv')

# Load the model
model = SentenceTransformer('distiluse-base-multilingual-cased-v2')

# Generate embeddings for the Text column
embeddings = model.encode(df['Text'].tolist(), convert_to_tensor=True)
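
One thing to watch for before encoding: `pd.read_csv` represents empty cells as `NaN` (a float), which is not valid input for a text encoder. The sketch below is one way to filter the column first, assuming the `Text` column may contain missing or blank values. The helper name `clean_text_column` is hypothetical and not part of pandas or `sentence-transformers`.

```python
import pandas as pd

def clean_text_column(df, column="Text"):
    """Return the non-empty strings of `column`, keeping the original row index.

    Hypothetical helper for illustration only.
    """
    # Drop missing values, force everything to str, trim surrounding whitespace
    texts = df[column].dropna().astype(str).str.strip()
    # Discard rows that are empty after trimming
    return texts[texts != ""]

# Toy frame standing in for the real CSV
example = pd.DataFrame({"Text": ["hello", None, "   ", " world "]})
cleaned = clean_text_column(example)
```

The result is a `Series`, so `cleaned.tolist()` can be passed to `model.encode`, and `cleaned.index` records which dataframe rows the embeddings came from. That mapping matters later because the vector index only knows row positions, not the original dataframe labels.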

# Save the embeddings for later use

For the vector store, I'm going to use [FAISS (Facebook AI Similarity Search)](https://ai.meta.com/tools/faiss/), "a library that allows developers to quickly search for embeddings of multimedia documents that are similar to each other."

In [2]:
import faiss
import numpy as np

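# Convert the torch tensor to a contiguous float32 NumPy array, which is what FAISS expects.
# .cpu() moves the tensor off the GPU first if the embeddings were computed there.
embedding_matrix = np.ascontiguousarray(embeddings.detach().cpu().numpy(), dtype=np.float32)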

# Build FAISS index
index = faiss.IndexFlatL2(embedding_matrix.shape[1])
index.add(embedding_matrix)

# Save the index
faiss.write_index(index, "vector_index.faiss")
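
To make it clearer what this index does at query time: `IndexFlatL2` performs an exact, brute-force search and ranks stored vectors by squared L2 distance to the query. The NumPy sketch below reproduces that computation on toy data. It is an illustration of the idea, not FAISS's implementation, and the function name `flat_l2_search` is made up for this example.

```python
import numpy as np

def flat_l2_search(database, queries, k):
    """Exact k-nearest-neighbour search by squared L2 distance.

    Illustrative only: mirrors what a flat L2 index computes.
    """
    database = np.asarray(database, dtype="float32")
    queries = np.asarray(queries, dtype="float32")
    # Pairwise squared distances, shape (n_queries, n_database)
    diffs = queries[:, None, :] - database[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Positions of the k smallest distances per query, nearest first
    ids = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, ids, axis=1), ids

# Toy 2-D vectors standing in for real embeddings
database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
queries = np.array([[0.9, 0.1]])
distances, ids = flat_l2_search(database, queries, k=2)
```

With the real index, the saved file can be reloaded with `faiss.read_index("vector_index.faiss")` and queried with `index.search(query_matrix, k)`, which returns distances and row positions in the same order as above. The query vectors should come from the same embedding model and be float32.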