# PPE-Vision 360 — FAISS Q&A Retrieval Pipeline 🔍

In this notebook, we build a **Q&A Search Assistant** for PPE compliance using:
- **Sentence-Transformers (MiniLM)** for text embeddings.
- **FAISS (Facebook AI Similarity Search)** to find the closest matching Q&A pair.

This lets our chatbot retrieve OSHA-related answers when a user asks a compliance question.

### Goal of This Notebook:
✅ Load cleaned PPE Q&A dataset  
✅ Convert questions to embeddings (vectors)  
✅ Build a FAISS index to enable fast search  
✅ Test user queries and retrieve best matching answers


### Install Required Libraries

In [1]:
!pip install faiss-cpu
!pip install sentence-transformers
!pip install pandas


### Load Your Cleaned Q&A CSV

In [2]:
import pandas as pd

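# The path points to Google Drive in Colab; mount Drive first if it is not mounted yet:
# from google.colab import drive
# drive.mount('/content/drive')

# Load the cleaned CSV (replace the path with your own file if it differs)
df = pd.read_csv('/content/drive/MyDrive/PPE-Vision/datasets/nlp/osha_qa_cleaned.csv')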
df.head()


| Unnamed: 0 | category | questions | answers | clean_question | clean_answer |
|---|---|---|---|---|---|
| 0 | 1 | When is a helmet required on a construction site? | Helmets are mandatory where there’s a risk of ... | helmet require construction site | helmet mandatory risk head injury fall object |
| 1 | 2 | Is wearing gloves compulsory when handling che... | Yes, gloves must be worn when handling hazardo... | wear glove compulsory handle chemical | yes glove wear handle hazardous material |
| 2 | 3 | Are safety vests required for roadside workers? | High-visibility vests are essential for roadsi... | safety vest require roadside worker | high visibility vest essential roadside constr... |
| 3 | 4 | Is eye protection needed when welding? | Yes, safety goggles or face shields must be us... | eye protection need weld | yes safety goggle face shield weld |
| 4 | 5 | Are steel-toe boots mandatory in warehouses? | Steel-toe boots are required in environments w... | steel toe boot mandatory warehouse | steel toe boot require environment risk foot i... |
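The `Unnamed: 0` column above looks like the row index that was written into the CSV when it was saved (most likely by `DataFrame.to_csv` with its default `index=True`; that is an assumption about how the file was produced). Passing `index_col=0` to `pd.read_csv` reads it back as the index instead of an extra column. The sketch below uses an in-memory CSV built from two rows of this notebook's own output.

```python
import io
import pandas as pd

# Two rows taken from this notebook's output, with the leading unnamed index
# column that DataFrame.to_csv writes by default
csv_text = (
    ",category,questions,answers,clean_question,clean_answer\n"
    "0,1,When is a helmet required on a construction site?,"
    "Helmets are mandatory where there’s a risk of head injury from falling objects.,"
    "helmet require construction site,helmet mandatory risk head injury fall object\n"
    "1,2,Is wearing gloves compulsory when handling chemicals?,"
    "\"Yes, gloves must be worn when handling hazardous materials.\","
    "wear glove compulsory handle chemical,yes glove wear handle hazardous material\n"
)

# Default read: the saved index shows up as an extra 'Unnamed: 0' column
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 turns that leading column back into the DataFrame index
df_fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_default.columns.tolist())
print(df_fixed.columns.tolist())
```

Either way the FAISS ids used later are row positions (`df.iloc`), so this only tidies the DataFrame; it does not change search results.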


### Generate Embeddings using MiniLM

In [4]:
import numpy as np
from sentence_transformers import SentenceTransformer

# Load MiniLM model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate one 384-dimensional embedding per preprocessed question (clean_question column)
embeddings = model.encode(df['clean_question'].tolist(), convert_to_numpy=True)

# Save embeddings to disk so they can be reloaded later with np.load
np.save("qa_embeddings.npy", embeddings)

print(f"Embeddings shape: {embeddings.shape}")


Embeddings shape: (60, 384)


### Build FAISS Index

In [5]:
import faiss
import numpy as np

# Dimension of embedding vectors
d = embeddings.shape[1]

# Exact (brute-force) index that ranks vectors by squared L2 distance
index = faiss.IndexFlatL2(d)

# FAISS expects a float32 array of shape (n, d)
index.add(np.array(embeddings).astype('float32'))

print(f"Total vectors indexed: {index.ntotal}")


Total vectors indexed: 60
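`IndexFlatL2` performs exhaustive search: each query is compared against every stored vector, and the `k` smallest **squared** L2 distances are returned along with their row ids. The NumPy sketch below reproduces that computation to show what the index does; it is an illustration, not FAISS's actual implementation, and `flat_l2_search` is a hypothetical helper name.

```python
import numpy as np

def flat_l2_search(xb, xq, k):
    """Exact k-nearest-neighbour search by squared L2 distance.

    xb: (n, d) database vectors, xq: (m, d) query vectors.
    Returns (D, I), both shaped (m, k), mirroring the layout of index.search().
    """
    xb = np.asarray(xb, dtype=np.float32)
    xq = np.asarray(xq, dtype=np.float32)
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2 for every (query, vector) pair
    dists = (
        (xq ** 2).sum(axis=1, keepdims=True)
        - 2.0 * xq @ xb.T
        + (xb ** 2).sum(axis=1)
    )
    # Rounding can push self-distances slightly below zero; clip them
    dists = np.maximum(dists, 0.0)
    # Sort each row and keep the k closest vectors
    I = np.argsort(dists, axis=1, kind="stable")[:, :k]
    D = np.take_along_axis(dists, I, axis=1)
    return D, I

# Random stand-ins for the 60 x 384 MiniLM embeddings
rng = np.random.default_rng(0)
xb = rng.standard_normal((60, 384)).astype(np.float32)

# Query with the first three stored vectors: each one's nearest neighbour is itself
D, I = flat_l2_search(xb, xb[:3], k=2)
print(I[:, 0])  # → [0 1 2]
```

Brute force is a reasonable choice at this scale: with 60 vectors, checking every one is cheap and the results are exact.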


### Search Function to Find Closest Q&A

In [6]:
def search_faiss(query, top_k=1):
    # Note: the query is embedded as raw text, whereas the index was built from the
    # preprocessed clean_question column (lemmatized, stopwords removed)
    query_embedding = model.encode([query], convert_to_numpy=True).astype('float32')

    # Search in FAISS: D holds squared L2 distances, I holds row positions in df
    D, I = index.search(query_embedding, top_k)

    # Print the top_k results, closest first (lower distance = better match)
    print(f"Query: {query}")
    for i in range(top_k):
        print(f"Best Match: {df.iloc[I[0][i]]['questions']}")
        print(f"Answer: {df.iloc[I[0][i]]['answers']}")
        print(f"Distance: {D[0][i]}")
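`search_faiss` only prints its results. For a chatbot it is usually more convenient to get matches back as data and to drop weak ones so the bot can say it has no confident answer. A minimal sketch, assuming the same `(D, I)` arrays that `index.search` returns for a single query; `collect_matches` and `max_distance` are hypothetical names, and any threshold value would need tuning on real user queries.

```python
import numpy as np

def collect_matches(D, I, questions, answers, max_distance=None):
    """Turn index.search() output for one query into a list of result dicts.

    Matches whose squared L2 distance exceeds max_distance (if given) are dropped.
    """
    results = []
    for dist, idx in zip(D[0], I[0]):
        if idx == -1:  # FAISS pads ids with -1 when fewer than k results exist
            continue
        if max_distance is not None and dist > max_distance:
            continue
        results.append({
            "question": questions[idx],
            "answer": answers[idx],
            "distance": float(dist),
        })
    return results

questions = [
    "When is a helmet required on a construction site?",
    "Is wearing gloves compulsory when handling chemicals?",
]
answers = [
    "Helmets are mandatory where there’s a risk of head injury from falling objects.",
    "Yes, gloves must be worn when handling hazardous materials.",
]

# Toy arrays shaped like index.search() output (illustrative values, not from a real run)
D = np.array([[0.31, 1.06, 0.0]], dtype=np.float32)
I = np.array([[1, 0, -1]])

matches = collect_matches(D, I, questions, answers, max_distance=0.8)
print(matches[0]["answer"])  # → Yes, gloves must be worn when handling hazardous materials.
```

In the notebook this would follow `D, I = index.search(query_embedding, top_k)`, passing `df['questions'].tolist()` and `df['answers'].tolist()`.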


### Test The Search!

In [7]:
search_faiss("When should I wear a helmet?")
search_faiss("Do I need gloves for chemicals?")
search_faiss("Is a vest compulsory for roadside workers?")


Query: When should I wear a helmet?
Best Match: When is a helmet required on a construction site?
Answer: Helmets are mandatory where there’s a risk of head injury from falling objects.
Distance: 1.0581722259521484
Query: Do I need gloves for chemicals?
Best Match: Is wearing gloves compulsory when handling chemicals?
Answer: Yes, gloves must be worn when handling hazardous materials.
Distance: 0.5714074373245239
Query: Is a vest compulsory for roadside workers?
Best Match: Are safety vests required for roadside workers?
Answer: High-visibility vests are essential for roadside and construction workers.
Distance: 0.3129766583442688
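The distances above are squared L2 distances, which are hard to read on their own. As far as we know, `all-MiniLM-L6-v2` ships with a normalization step, so its embeddings should already have unit length (worth confirming with `np.linalg.norm(embeddings, axis=1)`). Assuming that holds, squared distance $d$ and cosine similarity are related by $d = 2 - 2\cos\theta$, so $\cos\theta = 1 - d/2$. The sketch below applies that conversion to the three distances printed above and checks the identity on random unit vectors; `squared_l2_to_cosine` is a hypothetical helper.

```python
import numpy as np

def squared_l2_to_cosine(d):
    # Valid only for unit-length vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
    return 1.0 - np.asarray(d, dtype=np.float64) / 2.0

# Distances copied from the search output above
distances = [1.0581722259521484, 0.5714074373245239, 0.3129766583442688]
print(np.round(squared_l2_to_cosine(distances), 2))  # → [0.47 0.71 0.84]

# Sanity check of the identity on two random unit vectors
rng = np.random.default_rng(42)
a, b = rng.standard_normal((2, 384))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)
assert np.isclose(squared_l2_to_cosine(np.sum((a - b) ** 2)), a @ b)
```

Under that assumption, the vest query is the closest match (cosine about 0.84) and the helmet query the weakest (about 0.47). If similarity scores are preferred directly, FAISS also provides `IndexFlatIP`, which combined with `faiss.normalize_L2` ranks by cosine similarity.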


### Save FAISS Index

In [8]:
# Persist the index to disk; reload it later with faiss.read_index('faiss_index.bin')
faiss.write_index(index, 'faiss_index.bin')
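The saved index holds only vectors: the ids returned by `index.search` are positions `0 .. ntotal-1` in insertion order, which here are the rows of `df`. To answer queries in a later session (for example inside the chatbot), the Q&A text therefore has to be stored alongside the index in the same row order. A minimal pandas sketch; the helper names and the `qa_lookup.csv` file name are hypothetical.

```python
import os
import tempfile
import pandas as pd

def save_lookup_table(df, path):
    # Keep only what is needed to answer a query; reset_index guarantees that
    # row i of the file corresponds to the vector with FAISS id i
    df[["questions", "answers"]].reset_index(drop=True).to_csv(path, index=False)

def load_lookup_table(path, expected_rows):
    table = pd.read_csv(path)
    # A mismatch means the table and the index were built from different data
    if len(table) != expected_rows:
        raise ValueError(
            f"lookup table has {len(table)} rows but the index holds {expected_rows} vectors"
        )
    return table

# Demo on a two-row frame (questions/answers taken from the search output above)
demo = pd.DataFrame({
    "questions": ["When is a helmet required on a construction site?",
                  "Is wearing gloves compulsory when handling chemicals?"],
    "answers": ["Helmets are mandatory where there’s a risk of head injury from falling objects.",
                "Yes, gloves must be worn when handling hazardous materials."],
})
path = os.path.join(tempfile.mkdtemp(), "qa_lookup.csv")
save_lookup_table(demo, path)

# In the notebook, expected_rows would be index.ntotal
table = load_lookup_table(path, expected_rows=2)
print(table.iloc[1]["answers"])
```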

### 🧠 High-Level Summary of What We Built (As a Client/Stakeholder Brief):
> “We developed a PPE Compliance Q&A Search Assistant: a user types a safety-related question, and the system retrieves the most relevant answer from an OSHA-style Q&A database. It uses MiniLM embeddings to capture the meaning of user queries and a FAISS vector index to find the closest stored question by vector similarity. This forms the base of our compliance chatbot, which will later be extended with LLM-generated responses and more advanced recommendations.”