# Customer Feedback Vector Indexing (RAG – FAISS)

## Objective

Prepare customer feedback data for the **GenAI (RAG) layer** by:

- Loading feedback from PostgreSQL
- Converting text into embeddings
- Storing vectors in FAISS for semantic search


In [11]:
%pip install langchain langchain-community langchain-groq sentence-transformers faiss-cpu python-dotenv pandas sqlalchemy psycopg2-binary

Note: you may need to restart the kernel to use updated packages.





## Import Required Libraries

These libraries are used for:

- Database connection
- Environment variable management
- LangChain document and vector handling


In [19]:
from dotenv import load_dotenv
import os
import pandas as pd
from sqlalchemy import create_engine

from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

In [20]:
# ----------- Load environment variables ---------------- #
load_dotenv()

DB_HOST = os.getenv("DB_HOST")
DB_PORT = os.getenv("DB_PORT")
DB_NAME = os.getenv("DB_NAME")
DB_USER = os.getenv("DB_USER")
DB_PASSWORD = os.getenv("DB_PASSWORD")
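
`os.getenv` returns `None` for any variable that is not set, so a missing `.env` entry only surfaces later as a confusing connection error. A minimal sketch of an early check, assuming all five variables are required (the helper name `find_missing_vars` is hypothetical, not part of the original notebook):

```python
import os

# Names read by the notebook; all are assumed to be required here.
REQUIRED_VARS = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD")


def find_missing_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = find_missing_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` points to the exact variable that needs fixing before any database call is made.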

In [21]:
# --------- Database connection & Load Data -------------- #

engine = create_engine(
    f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
)

df = pd.read_sql("SELECT * FROM customer_feedback_view", engine)
print("Feedback rows:", len(df))

Feedback rows: 5000
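
The connection string above interpolates the credentials directly, which can break if the password contains characters such as `@` or `/`. One possible way to guard against that is to percent-encode the credentials first. The sketch below uses only the standard library; the helper `build_postgres_url` and the sample credentials are hypothetical.

```python
from urllib.parse import quote


def build_postgres_url(user, password, host, port, database):
    """Build a PostgreSQL URL with percent-encoded user name and password."""
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials, for illustration only.
url = build_postgres_url("analyst", "p@ss/word", "localhost", "5432", "feedback")
print(url)  # postgresql://analyst:p%40ss%2Fword@localhost:5432/feedback
```

The resulting string could then be passed to `create_engine` in place of the f-string.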


In [22]:
print(df.shape)
print(df.columns)

(5000, 5)
Index(['feedback_id', 'feedback_text', 'feedback_category', 'sentiment',
       'area'],
      dtype='object')


## Convert Feedback into LangChain Documents

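Each feedback row is converted into a `Document` object whose metadata carries the feedback ID, category, sentiment, and area.
These fields make it possible to filter results by category, sentiment, or area during retrieval in RAG.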


In [23]:
# ---------- Convert feedback into Documents ------------- #
documents = [
    Document(
        page_content=row["feedback_text"],
        metadata={
            "feedback_id": row["feedback_id"],
            "category": row["feedback_category"],
            "sentiment": row["sentiment"],
            "area": row["area"]
        }
    )
    for _, row in df.iterrows()
]
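
To make "filtered retrieval" concrete, here is a toy, pure-Python sketch of the idea: keep only the records whose metadata match a filter, then rank them by cosine similarity to the query vector. It does not use the LangChain or FAISS APIs. The two-dimensional vectors and the metadata values (`"negative"`, `"north"`, and so on) are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec, records, metadata_filter, k=2):
    """Return metadata of the top-k records that match every filter key."""
    candidates = [
        (cosine(query_vec, vec), meta)
        for vec, meta in records
        if all(meta.get(key) == value for key, value in metadata_filter.items())
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [meta for _, meta in candidates[:k]]


# Hypothetical (vector, metadata) pairs standing in for embedded feedback.
records = [
    ([1.0, 0.0], {"feedback_id": 1, "sentiment": "negative", "area": "north"}),
    ([0.9, 0.1], {"feedback_id": 2, "sentiment": "positive", "area": "north"}),
    ([0.0, 1.0], {"feedback_id": 3, "sentiment": "negative", "area": "south"}),
]

hits = filtered_search([1.0, 0.0], records, {"sentiment": "negative"})
print([hit["feedback_id"] for hit in hits])  # [1, 3]
```

Record 2 is the second-closest vector to the query, yet it is excluded because its sentiment does not match the filter. This is the behaviour the metadata attached to each `Document` is meant to enable.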


## Create Embeddings and FAISS Vector Store

Sentence embeddings are generated with the `sentence-transformers/all-MiniLM-L6-v2` model and stored in a FAISS index for fast semantic retrieval.


In [24]:
# ------------ Create embeddings & vector store --------------- #
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

vectorstore = FAISS.from_documents(documents, embeddings)
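
As a rough sanity check on index size: `all-MiniLM-L6-v2` produces 384-dimensional vectors, so the raw vector data for the 5000 feedback rows can be estimated with simple arithmetic. The sketch below assumes an uncompressed flat index that stores each value as a 4-byte float. It ignores the docstore and any index overhead, so treat the result as a lower bound rather than a measurement.

```python
def flat_index_bytes(num_vectors, dim, bytes_per_value=4):
    """Estimate raw storage for a flat index of float32 vectors."""
    return num_vectors * dim * bytes_per_value


# 5000 rows from the notebook output, 384 dimensions for all-MiniLM-L6-v2.
size = flat_index_bytes(5000, 384)
print(f"{size} bytes (~{size / 1024**2:.1f} MiB)")  # 7680000 bytes (~7.3 MiB)
```

At this scale the index fits comfortably in memory, which is consistent with using `faiss-cpu` and building the whole store in a single call.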

In [None]:
# Save FAISS index locally for reuse
vectorstore.save_local("../data/faiss_index")

print("FAISS index saved successfully")


FAISS index saved successfully
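
The success message is printed unconditionally, so it can be worth confirming what was actually written. With the default index name, `save_local` is expected to produce `index.faiss` (the vectors) and `index.pkl` (the pickled docstore and ID mapping) inside the target folder. The file names are stated here as an assumption about the installed LangChain version, and the helper `missing_index_files` is hypothetical.

```python
from pathlib import Path

# Assumed default file names written by save_local.
EXPECTED_FILES = ("index.faiss", "index.pkl")


def missing_index_files(folder, expected=EXPECTED_FILES):
    """Return the expected index files that are not present in the folder."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


# An empty list means both files were found.
print(missing_index_files("../data/faiss_index"))
```

Because `index.pkl` is a pickle file, the saved index should only be reloaded from a location you trust.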
