Develop a Python script that:
1- Reads and preprocesses customer service conversations from a CSV file.
2- Generates embeddings for customer messages using sentence-transformers.
3- Stores embeddings in a vector database (FAISS) for fast similarity search.
4- Implements a function to retrieve the top 3 most relevant responses based on a user query.
ID	customer_message										agent_response
1	"My order hasn’t arrived yet. Can you check?"			"I’m sorry for the delay! Can you provide your order number?"
2	"I need to reset my password but the link is broken."	"I understand. Let me generate a new reset link for you."
3	"Do you have a refund policy?"							"Yes! We offer a 30-day money-back guarantee. Would you like me to process a refund?"
4	"The product I received is defective."					"I'm sorry to hear that. We can arrange a replacement or ref
4	"The product I received is defective."					"I'm sorry to hear that. We can arrange a replacement or refund. What would you prefer?"
5	"How can I contact support?"							"You can reach us via email at support@example.com or call our helpline."

## Reading the data source

In [35]:
import pandas as pd

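# Load the conversations (comma is already the default separator)
df = pd.read_csv('customer.csv', sep=',')
df.shape  # not displayed: a notebook cell only echoes its last expression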
df.head(3)

Unnamed: 0,ID,customer_message,agent_response
0,1,"""My order hasn’t arrived yet. Can you check?""","""I’m sorry for the delay! Can you provide you..."
1,2,"""I need to reset my password but the link is ...","""I understand. Let me generate a new reset li..."
2,3,"""Do you have a refund policy?""","""Yes! We offer a 30-day money-back guarantee...."


In [36]:
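# Note: despite the variable name, this list holds the agent responses,
# which are the texts that get embedded and returned by the search below.
customer_msgs = list(df['agent_response'])
# Remove the literal double quotes stored inside the CSV values
customer_msgs = [s.replace('"', '') for s in customer_msgs]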
customer_msgs

[' I’m sorry for the delay! Can you provide your order number?',
 ' I understand. Let me generate a new reset link for you.',
 ' Yes! We offer a 30-day money-back guarantee. Would you like me to process a refund?',
 " I'm sorry to hear that. We can arrange a replacement or ref",
 " I'm sorry to hear that. We can arrange a replacement or refund. What would you prefer?",
 ' You can reach us via email at support@example.com or call our helpline.']
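
The output above still carries a leading space on every string, and the row with ID 4 appears twice, once cut off mid-word. A stricter cleaning step could look roughly like the sketch below. It assumes that a truncated row is always a proper prefix of its complete version. `clean_text` and `drop_truncated` are hypothetical helper names, not part of the notebook.

```python
def clean_text(text):
    """Remove literal double quotes and surrounding whitespace."""
    return text.replace('"', '').strip()


def drop_truncated(texts):
    """Drop any entry that is a proper prefix of another entry.

    Assumption: a truncated duplicate is cut off at the end, so the
    complete version starts with the truncated one.
    """
    kept = []
    for i, text in enumerate(texts):
        is_prefix = any(
            j != i and other != text and other.startswith(text)
            for j, other in enumerate(texts)
        )
        if not is_prefix:
            kept.append(text)
    return kept


# Three raw values shaped like the ones in the CSV (quotes and leading space).
raw = [
    ' "I\'m sorry to hear that. We can arrange a replacement or ref',
    ' "I\'m sorry to hear that. We can arrange a replacement or refund. What would you prefer?"',
    ' "I understand. Let me generate a new reset link for you."',
]
cleaned = drop_truncated([clean_text(s) for s in raw])
```

Applying a step like this before encoding would reduce the number of indexed rows, so the shapes and indices in the recorded outputs further down would no longer match.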

## Generating embeddings for customer messages

In [37]:
from sentence_transformers import SentenceTransformer

# Load the pre-trained model
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

In [38]:
embeddings = model.encode(customer_msgs)
print(f"Embeddings shape: {embeddings.shape}")

Embeddings shape: (6, 384)


In [39]:
embeddings

array([[-0.4136008 , -0.00662601,  0.3042746 , ..., -0.61076784,
        -0.21803936, -0.01262602],
       [-0.70867133, -0.3809409 ,  0.15109436, ...,  0.5913447 ,
        -0.0804904 , -0.34057817],
       [-0.40036717, -0.10858591,  0.12255137, ..., -0.25375068,
         0.13183513, -0.01804513],
       [-0.66004777,  0.02018758,  0.31692493, ..., -0.02357097,
         0.8902247 , -0.01102751],
       [-0.46169487, -0.12921603,  0.47223616, ..., -0.15604918,
         0.4272553 ,  0.06497989],
       [-0.56212497, -0.4405592 , -0.09126253, ..., -0.12221459,
         0.31684357,  0.03616011]], shape=(6, 384), dtype=float32)

## Storing vectors in a FAISS database

In [40]:
import faiss

# Dimensions of our embeddings
d = embeddings.shape[1]

# Create a flat index: exact, brute-force search over the dense vectors
index = faiss.IndexFlatL2(d)  # ranks by squared L2 (Euclidean) distance

In [41]:
# Adding the embeddings to the index
index.add(embeddings)

print(f"Total sentences indexed: {index.ntotal}")

Total sentences indexed: 6


## Querying the FAISS index

In [42]:
query_sentence = "I wanna reset my passwod"
query_embedding = model.encode([query_sentence])
query_embedding.shape

(1, 384)

In [43]:
k = 3
distances, indices = index.search(query_embedding, k)
print(f'Distances: {distances} | Indices: {indices}')

Distances: [[54.96907 62.21291 64.50435]] | Indices: [[1 3 4]]
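
`IndexFlatL2` reports squared Euclidean distances, so the values above are squared; taking the square root gives the true distance without changing the ranking. The sketch below shows this in plain Python on toy 2-D vectors that stand in for the 384-dimensional embeddings. `squared_l2` is an illustrative helper, not a FAISS function.

```python
import math


def squared_l2(a, b):
    """Sum of squared coordinate differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy 2-D vectors standing in for 384-dimensional embeddings.
query = [0.0, 0.0]
candidates = [[3.0, 4.0], [1.0, 1.0], [6.0, 8.0]]

squared = [squared_l2(query, c) for c in candidates]
euclidean = [math.sqrt(d) for d in squared]

# Ranking by squared distance and by true distance gives the same order,
# because the square root is monotonic.
order_squared = sorted(range(len(candidates)), key=lambda i: squared[i])
order_euclidean = sorted(range(len(candidates)), key=lambda i: euclidean[i])
```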


In [44]:
print("Most similar sentences:")
for i, idx in enumerate(indices[0]):
    print(f"{i + 1}: {customer_msgs[idx]} (Distance: {distances[0][i]})")

Most similar sentences:
1:  I understand. Let me generate a new reset link for you. (Distance: 54.96907043457031)
2:  I'm sorry to hear that. We can arrange a replacement or ref (Distance: 62.21290969848633)
3:  I'm sorry to hear that. We can arrange a replacement or refund. What would you prefer? (Distance: 64.50434875488281)
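
## Wrapping retrieval in a function

The task asks for a function that returns the top 3 responses, while the cells above run the steps inline. One possible shape for that function is sketched below. `retrieve_top_responses` is a hypothetical name. The encoder and the index are passed in as arguments, and the function relies only on the FAISS-style `search(vectors, k)` call already used above.

```python
def retrieve_top_responses(query, encode, index, responses, k=3):
    """Return up to k (response, distance) pairs, closest first.

    encode:    callable mapping a list of strings to a 2-D array of vectors
               (for example model.encode from the cells above).
    index:     object with a FAISS-style search(vectors, k) method that
               returns (distances, indices), each with one row per query.
    responses: list of texts in the same order they were added to the index.
    """
    query_vectors = encode([query])
    distances, indices = index.search(query_vectors, k)

    results = []
    for distance, position in zip(distances[0], indices[0]):
        if position < 0:
            # FAISS pads with -1 when the index holds fewer than k vectors.
            continue
        results.append((responses[int(position)], float(distance)))
    return results
```

With the objects defined earlier, a call such as `retrieve_top_responses("I wanna reset my passwod", model.encode, index, customer_msgs)` should return the same three texts and distances printed above, as a list of tuples.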
