# Email Search AI

<img src="email_search_ai.png" style="display:block; margin-left:auto; margin-right:auto;"/>

## Problem Statement

In enterprise environments, email threads often contain critical discussions, decisions, and context across multiple stakeholders. However, locating specific information within large, unstructured, and nested email threads is time-consuming and inefficient using conventional keyword-based search tools. Professionals face challenges in extracting relevant insights without wading through entire conversations manually. There is a pressing need for an intelligent system that enables **semantic search** and **automated summarization** of email threads.

## Project Overview

**HelpMate_Email Search_AI** is an end-to-end **RAG-based AI assistant** designed for semantic search and summarization of email threads. It uses **Sentence Transformers** for creating dense **vector embeddings** of email content, stores them in a **ChromaDB vector store**, and enables **semantic index search**. Upon receiving a user query, it retrieves the most relevant chunks using **vector similarity**, improves result quality through **cross-encoder-based reranking**, and finally, generates context-aware responses using **OpenAI's LLM (e.g., GPT-3.5-turbo)**.

The system employs a **Retrieval-Augmented Generation (RAG)** architecture with **caching** for efficiency and performance.

## Project Objectives

- **Semantic Understanding of Emails**: Use **Sentence Transformers (all-MiniLM-L6-v2)** to convert email chunks into vector representations capturing semantic meaning.

- **Vector Database Indexing**: Store email embeddings in **ChromaDB**, enabling fast approximate nearest neighbor (ANN) vector search.

- **Semantic Search & Retrieval**: Support user queries via **embedding-based similarity search** across indexed email chunks.

- **Result Reranking**: Improve retrieval accuracy with **cross-encoder reranking (ms-marco-MiniLM-L-6-v2)**, scoring relevance between query and result pairs.

- **Contextual Answer Generation**: Use a **Retrieval-Augmented Generation (RAG)** pipeline to feed retrieved results into OpenAI GPT models for answer synthesis.

- **Query Caching**: Implement a file-based **caching** layer to store query results and avoid repeated computations.
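
To make the "embedding-based similarity search" objective concrete, here is a minimal sketch of top-k retrieval by cosine similarity in plain Python. The three-dimensional vectors and the names `toy_chunks` and `toy_query` are invented for illustration; real `all-MiniLM-L6-v2` embeddings have many more dimensions, and the distance metric ChromaDB actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query_vec, chunk_vecs, k=2):
    # Score every chunk against the query and keep the k best.
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy embeddings, not produced by a real model.
toy_chunks = {
    "lunch_agenda": [0.9, 0.1, 0.0],
    "golf_outing": [0.1, 0.9, 0.1],
    "invoice": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

for chunk_id, score in top_k_similar(toy_query, toy_chunks):
    print(chunk_id, round(score, 3))
```

A vector store replaces the exhaustive loop with an approximate index, but the ranking idea is the same.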

## Functional Features


| **Component**                   | **Description**                                                                 | **Technology Used**                             |
|--------------------------------|---------------------------------------------------------------------------------|-------------------------------------------------|
| **Email Preprocessing**        | Cleans raw email bodies by removing quoted replies and normalizing text         | `Regex`, custom cleaning                        |
| **Chunking**                   | Splits cleaned emails into overlapping token-limited chunks                     | Custom logic                                    |
| **Embeddings**                 | Transforms email chunks into dense semantic vectors                             | `SentenceTransformer` (`all-MiniLM-L6-v2`)      |
| **Vector Indexing**            | Stores and indexes embeddings for fast similarity search                        | `ChromaDB`                                      |
| **Query Embedding**            | Embeds natural language queries for semantic comparison                         | `SentenceTransformer`                           |
| **Initial Vector Search**      | Retrieves top-N similar email chunks using ANN search                           | `ChromaDB`                                      |
| **Reranking**                  | Reorders retrieved results by true semantic relevance                           | `CrossEncoder` (`ms-marco-MiniLM-L-6-v2`)       |
| **Retrieval-Augmented Generation (RAG)** | Combines retrieved chunks with the query to form a prompt for GPT      | `OpenAI GPT-3.5-turbo`                          |
| **Answer Generation**          | Synthesizes a coherent answer based on context                                  | `OpenAI Chat Completion API`                    |
| **Caching**                    | Stores query results using hashed query keys for faster repeat access           | JSON file-based custom `Cache` class            |


In [11]:
# Import the libraries
import os
import re
import pandas as pd
from typing import List, Dict
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions



In [12]:
#Install required modules
!pip3 install -r requirements.txt

In [13]:
#Let's check the version of OpenAI
import openai
print(openai.__version__)

1.95.1


In [14]:
# Read key from text file
with open("openai_key.txt", "r") as f:
    api_key = f.read().strip()

In [15]:
# Pass key to OpenAI client
from openai import OpenAI
client = OpenAI(api_key=api_key)

In [16]:
# Read the input dataset
df_email_thread = pd.read_csv("email_dataset/email_thread_details.csv")
df_email_thread.head()

|   | thread_id | subject | timestamp | from | to | body |
|---|-----------|---------|-----------|------|----|------|
| 0 | 1 | FW: Master Termination Log | 2002-01-29 11:23:42 | Gossett, Jeffrey C. JGOSSET | `['Giron', 'Darron C. Dgiron', 'Love', 'Phillip...` | `\n\n -----Original Message-----\nFrom: =09Ther...` |
| 1 | 1 | FW: Master Termination Log | 2002-01-31 12:50:00 | Theriot, Kim S. KTHERIO | `['Murphy', 'Melissa Mmurphy', 'Gossett', 'Jeff...` | `\n\n -----Original Message-----\nFrom: =09Panu...` |
| 2 | 1 | FW: Master Termination Log | 2002-02-05 15:03:35 | Theriot, Kim S. KTHERIO | `['Murphy', 'Melissa Mmurphy', 'Anderson', 'Dia...` | `Note to Stephanie Panus....\n\nStephanie...ple...` |
| 3 | 1 | FW: Master Termination Log | 2002-02-05 15:06:25 | Theriot, Kim S. KTHERIO | `['Hall', 'D. Todd Thall', 'Sweeney', 'Kevin Ks...` | `\n\n -----Original Message-----\nFrom: =09Panu...` |
| 4 | 1 | FW: Master Termination Log | 2002-05-28 07:20:35 | Kelly, Katherine L. KKELLY | `['Germany', 'Chris Cgerman']` | `\n\n -----Original Message-----\nFrom: =09McMi...` |


In [17]:
df_email_thread.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21684 entries, 0 to 21683
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   thread_id  21684 non-null  int64 
 1   subject    21684 non-null  object
 2   timestamp  21684 non-null  object
 3   from       21684 non-null  object
 4   to         21684 non-null  object
 5   body       21684 non-null  object
dtypes: int64(1), object(5)
memory usage: 1016.6+ KB


**Description**:
The email_thread_details file provides a detailed perspective on individual email threads, encompassing crucial information such as subject, timestamp, sender, recipients, and the content of the email.

**Columns**:
- **thread_id**: A unique identifier for each email thread.
- **subject**: Subject of the email thread.
- **timestamp**: Timestamp indicating when the message was sent.
- **from**: Sender of the email.
- **to**: List of recipients of the email.
- **body**: Content of the email message.
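
The sample rows suggest that `to` holds a Python-style list written out as a string and that `timestamp` is a plain string (both columns have dtype `object`). Assuming that format holds across the file, a minimal sketch of turning one row's values into native types looks like this; the two literals are copied from the last sample row shown above.

```python
import ast
from datetime import datetime

# Values copied from the sample output; the format is assumed to be consistent.
raw_to = "['Germany', 'Chris Cgerman']"
raw_timestamp = "2002-05-28 07:20:35"

# literal_eval safely parses Python literals without executing code.
recipients = ast.literal_eval(raw_to)
sent_at = datetime.strptime(raw_timestamp, "%Y-%m-%d %H:%M:%S")

print(recipients)
print(sent_at.isoformat())
```

Parsed timestamps make it possible to sort or filter messages by date, which the pipeline below does not currently do.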

## Overall Structure of the Code

## Embedding Layer

<img src="embedding_layer_flow.jpg" style="display:block; margin-left:auto; margin-right:auto;"/>

In [22]:
# src/embedding_layer.py

# Import the libraries
import os
import re
import pandas as pd
from typing import List, Dict
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions

# === CLEANING FUNCTIONS ===

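def clean_email_body(body: str) -> str:
    # Drop "On ... wrote:" attribution lines first, while line breaks still
    # delimit them. Running a greedy pattern after flattening the text would
    # delete everything between the first "On " and the last "wrote:".
    body = re.sub(r'^On .*? wrote:\s*$', '', body, flags=re.MULTILINE)
    body = re.sub(r'\s+', ' ', body)  # collapse newlines and repeated whitespace
    return body.strip()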


# === CHUNKING STRATEGY ===

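def chunk_text(text: str, max_tokens: int = 512, overlap: int = 50) -> List[str]:
    # max_tokens and overlap count whitespace-separated words, not model tokens.
    # all-MiniLM-L6-v2 truncates long inputs (256 word pieces by default, per
    # its model card), so the tail of a 512-word chunk may not be embedded.
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap  # consecutive chunks share `overlap` words
    chunks = []
    for i in range(0, len(words), step):
        window = words[i:i + max_tokens]
        if len(window) > 10:  # skip fragments of 10 words or fewer
            chunks.append(' '.join(window))
    return chunks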


# === EMBEDDING + CHROMA ===

class EmbeddingProcessor:
    def __init__(self, client, chroma_path: str = "chroma_db", model_name: str = "all-MiniLM-L6-v2"):
        # chroma_path is accepted but not used here; the storage location is
        # determined by the Chroma client that is passed in.
        self.model = SentenceTransformer(model_name)
        self.client = client
        self.chroma_collection = self.client.get_or_create_collection(name="email_chunks")

    def process_emails(self, df: pd.DataFrame):
        all_chunks = []

        for idx, row in df.iterrows():
            cleaned_body = clean_email_body(row['body'])
            chunks = chunk_text(cleaned_body)

            for i, chunk in enumerate(chunks):
                chunk_id = f"{row['thread_id']}_{idx}_{i}"
                all_chunks.append({
                    "id": chunk_id,
                    "text": chunk,
                    "metadata": {
                        "thread_id": row["thread_id"],
                        "subject": row["subject"],
                        "from": row["from"],
                        "timestamp": row["timestamp"]
                    }
                })

        print(f"Embedding {len(all_chunks)} chunks...")
        embeddings = self.model.encode([c['text'] for c in all_chunks], show_progress_bar=True).tolist()

        self.chroma_collection.add(
            documents=[c['text'] for c in all_chunks],
            embeddings=embeddings,
            metadatas=[c['metadata'] for c in all_chunks],
            ids=[c['id'] for c in all_chunks]
        )
        

    def get_collection(self):
        return self.chroma_collection
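
To illustrate the overlapping windows that `chunk_text` produces, the sketch below computes only the window boundaries for a hypothetical 1,000-word email, using the same defaults (`max_tokens=512`, `overlap=50`). The helper name `chunk_windows` is introduced here for illustration and is not part of the pipeline.

```python
def chunk_windows(n_words, max_tokens=512, overlap=50):
    # Each window starts (max_tokens - overlap) words after the previous one,
    # so consecutive windows share `overlap` words.
    step = max_tokens - overlap
    return [(start, min(start + max_tokens, n_words)) for start in range(0, n_words, step)]

windows = chunk_windows(1000)
print(windows)
```

With these defaults the step is 462 words, so a 1,000-word body yields three windows, and the last one is a shorter remainder.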


## Cache Layer

<img src="cache_layer_flow1.jpg" style="display:block; margin-left:auto; margin-right:auto;"/>

In [25]:
# src/cache.py

import json
import os


class Cache:
    def __init__(self, cache_file: str):
        self.cache_file = cache_file
        if os.path.exists(cache_file):
            with open(cache_file, "r") as f:
                self.cache = json.load(f)
        else:
            self.cache = {}

    def contains(self, key: str) -> bool:
        return key in self.cache

    def get(self, key: str):
        return self.cache.get(key, None)

    def set(self, key: str, value):
        self.cache[key] = value
        self._save()

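    def _save(self):
        # Create the parent directory (e.g. "cache/") on first write;
        # open() does not create missing directories.
        os.makedirs(os.path.dirname(self.cache_file) or ".", exist_ok=True)
        with open(self.cache_file, "w") as f:
            json.dump(self.cache, f, indent=2)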
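
The sketch below shows the idea behind a hashed-key JSON cache in isolation. It is an illustration under stated assumptions rather than the notebook's `Cache` class: the helper `make_cache_key` and the choice to include `top_k` and the thread filter in the key are hypothetical, and the file is written to a temporary directory.

```python
import hashlib
import json
import os
import tempfile

def make_cache_key(query, top_k, filter_thread_id=None):
    # Hash every argument that changes the result, not only the query text.
    payload = json.dumps(
        {"query": query, "top_k": top_k, "thread_id": filter_thread_id},
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

key_a = make_cache_key("Summarize the key decisions.", 3, filter_thread_id=1)
key_b = make_cache_key("Summarize the key decisions.", 3, filter_thread_id=2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache", "search_cache.json")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write one entry, then read the file back as a fresh process would.
    with open(path, "w") as f:
        json.dump({key_a: [{"chunk": "example", "score": 0.5}]}, f)
    with open(path) as f:
        loaded = json.load(f)

print(key_a != key_b, key_a in loaded, key_b in loaded)
```

Because the two keys differ, a cached result for one thread is never served for another.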


## Utils

In [27]:
import numpy as np

def convert_np_types(obj):
    """Recursively convert NumPy scalars to built-in types so results are JSON-serializable."""
    if isinstance(obj, dict):
        return {k: convert_np_types(v) for k, v in obj.items()}
    elif isinstance(obj, (list, tuple)):
        return [convert_np_types(i) for i in obj]
    elif isinstance(obj, np.floating):  # covers float32, float64, ...
        return float(obj)
    elif isinstance(obj, np.integer):  # covers int32, int64, ...
        return int(obj)
    else:
        return obj


## Search Layer

<img src="search_layer_flow.jpg" style="display:block; margin-left:auto; margin-right:auto;"/>

In [30]:
# src/search_layer.py

import hashlib
import json
import os
from typing import List, Dict

import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder
import chromadb
from chromadb.config import Settings

#from .cache import Cache

# === CONFIGURATION ===
CACHE_PATH = "cache/search_cache.json"
CHROMA_PATH = "chroma_db"

class SearchEngine:
    def __init__(
        self,
        client,
        embedding_model_name: str = "all-MiniLM-L6-v2",
        cross_encoder_model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    ):
        self.embedder = SentenceTransformer(embedding_model_name)
        self.reranker = CrossEncoder(cross_encoder_model_name)
        self.cache = Cache(CACHE_PATH)

        self.client = client
        self.collection = self.client.get_or_create_collection(name="email_chunks")

    def embed_query(self, query: str) -> List[float]:
        return self.embedder.encode(query).tolist()

    def search(self, query: str, top_k: int = 5, filter_thread_id: int = None) -> List[Dict]:
        # Key the cache on every argument that affects the result. Hashing the
        # query alone would return one thread's chunks for another thread
        # whenever the same query text is reused with a different filter.
        cache_key = f"{query}|{top_k}|{filter_thread_id}"
        query_hash = hashlib.md5(cache_key.encode()).hexdigest()

        if self.cache.contains(query_hash):
            return self.cache.get(query_hash)

        query_embedding = self.embed_query(query)

        search_args = {
            "query_embeddings": [query_embedding],
            "n_results": top_k * 2,  # get more to allow for reranking
        }

        if filter_thread_id is not None:
            search_args["where"] = {"thread_id": int(filter_thread_id)}

        results = self.collection.query(**search_args)

        documents = results["documents"][0]
        metadatas = results["metadatas"][0]
        if not documents:
            return []  # nothing to rerank

        # === Re-ranking ===
        pairs = [(query, doc) for doc in documents]
        scores = self.reranker.predict(pairs)

        reranked = sorted(zip(documents, metadatas, scores), key=lambda x: x[2], reverse=True)

        top_chunks = [
            {"chunk": doc, "metadata": meta, "score": score}
            for doc, meta, score in reranked[:top_k]
        ]

        # Convert NumPy scores to plain floats so that cached and freshly
        # computed results have the same types.
        scored_chunks = convert_np_types(top_chunks)

        self.cache.set(query_hash, scored_chunks)
        return scored_chunks
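
The search method retrieves `top_k * 2` candidates and then reranks them. The sketch below isolates that over-fetch-then-rerank step; `overlap_score` is a deliberately crude word-overlap scorer standing in for the cross-encoder, and the candidate strings are invented, so this only illustrates the control flow.

```python
def overlap_score(query, doc):
    # Stand-in for a cross-encoder: fraction of query words found in the document.
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0

def rerank(query, candidates, top_k, score_fn=overlap_score):
    # Score each (query, candidate) pair, then keep the top_k highest scores.
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Pretend these four candidates came back from the vector search (top_k * 2).
candidates = [
    "tee time at the golf course",
    "credit group lunch agenda",
    "invoice for gas deliveries",
    "lunch is moved to friday",
]
best = rerank("credit group lunch", candidates, top_k=2)
for doc, score in best:
    print(round(score, 2), doc)
```

Fetching extra candidates gives the reranker room to promote a chunk that the vector search ranked lower.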


## Generation Layer

<img src="generation_layer_flow.jpg" style="display:block; margin-left:auto; margin-right:auto;"/>

In [33]:
# src/generation_layer.py

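from typing import List, Dict

# generate_answer() below uses the `client` object created earlier in this
# notebook with OpenAI(api_key=api_key). When this code runs as a standalone
# module, an OpenAI client must be created here instead (for example from the
# OPENAI_API_KEY environment variable); setting openai.api_key alone does not
# configure that client.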

# === Prompt Template ===

def build_prompt(query: str, chunks: List[Dict], few_shot: bool = False) -> str:
    prompt = "You are an assistant that summarizes and extracts insights from corporate email threads.\n"
    prompt += "Given a user query and the relevant email thread excerpts, answer the question concisely and accurately, based on what is explicitly stated in the emails.\n\n"

    if few_shot:
        prompt += (
            "Example:\n"
            "Query: What was the decision on the marketing budget for Q2?\n"
            "Context:\n"
            "- The marketing team proposed a 20% increase for digital campaigns.\n"
            "- Finance approved a 10% increase after negotiation.\n"
            "Answer: A 10% increase in the Q2 marketing budget was approved after negotiation.\n\n"
        )

    prompt += f"Query: {query}\nContext:\n"
    for chunk in chunks:
        prompt += f"- {chunk['chunk']}\n"
    prompt += "\nAnswer:"
    return prompt


# === Generator Function ===

def generate_answer(query: str, chunks: List[Dict], model: str = "gpt-3.5-turbo") -> str:
    prompt = build_prompt(query, chunks, few_shot=True)

    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.3,
            max_tokens=300
        )
        answer = response.choices[0].message.content.strip()
        return answer
    except Exception as e:
        print(f"Error calling OpenAI API: {e}")
        return "Sorry, I couldn't generate a response due to an error."
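
Because each retrieved chunk can be up to 512 words long, the assembled prompt can grow quickly. One possible safeguard, sketched below, is to keep only as many ranked chunks as fit a size budget before building the prompt. The helper `fit_chunks_to_budget` and the 2,000-character budget are assumptions for illustration; a character count is only a rough proxy for model tokens.

```python
def fit_chunks_to_budget(chunks, max_chars=2000):
    # Keep chunks in ranked order until adding the next one would exceed the budget.
    selected, used = [], 0
    for chunk in chunks:
        size = len(chunk["chunk"])
        if used + size > max_chars:
            break
        selected.append(chunk)
        used += size
    return selected

# Hypothetical ranked chunks of 1200, 700 and 500 characters.
ranked = [{"chunk": "a" * 1200}, {"chunk": "b" * 700}, {"chunk": "c" * 500}]
kept = fit_chunks_to_budget(ranked)
print(len(kept), sum(len(c["chunk"]) for c in kept))
```

The result can be passed to `build_prompt` in place of the full chunk list.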


In [34]:
# main.py

# === CONFIG ===
DATA_PATH = "email_dataset/email_thread_details.csv"
TOP_K = 3

# === LOAD DATA ===
print("Loading dataset...")
df = pd.read_csv(DATA_PATH).dropna(subset=["body"])

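# Note: in chromadb 0.4 and later, a client created this way keeps data in
# memory unless persistence is enabled in Settings; chromadb.PersistentClient(path="chroma_db")
# is the documented way to persist the index to disk.
chroma_client = chromadb.Client(Settings(persist_directory="chroma_db"))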

# === EMBEDDING PHASE ===
print("Embedding data...")
embedder = EmbeddingProcessor(client=chroma_client)
embedder.process_emails(df)

# === SEARCH PHASE ===
search_engine = SearchEngine(client=chroma_client)

Loading dataset...

Embedding data...

Embedding 24987 chunks...

Batches: 100%|██████████| 781/781 [30:01<00:00,  2.31s/it]


### Query-wise Execution and Results

### Self-designed Queries

The three required queries are:

1. What is the agenda of the Credit Group Lunch on May 5th?

2. Which golf courses were mentioned as potential venues?

3. Who generated and sent the manual invoice to Southwest Gas, and when?

### Query1 : What is the agenda of the Credit Group Lunch on May 5th?

In [38]:
query1 = "What is the agenda of the Credit Group Lunch on May 5th?"
TOP_K = 3

print(f"\n=== QUERY: {query1} ===")

# Search Layer outputs
top_chunks1 = search_engine.search(query1, top_k=TOP_K)

# Print top chunks
print("\nTop Retrieved Chunks:")
for i, chunk in enumerate(top_chunks1):
    print(f"\n--- Chunk {i+1} ---")
    print(f"{chunk['chunk']}")
    print(f"Metadata: {chunk['metadata']}")


=== QUERY: What is the agenda of the Credit Group Lunch on May 5th? ===

Top Retrieved Chunks:

--- Chunk 1 ---
Gosh, I guessed right!!!! Kaye Ellis 04/18/2000 01:51 PM To: Sara Shackleton/HOU/ECT@ECT cc: Subject: Re: Credit Group Lunch Jeff Sorenson would like the meeting on May 12 to be from 11:30a to 1p.
Metadata: {'from': 'Sara Shackleton', 'subject': 'Credit Group Lunch', 'thread_id': 2, 'timestamp': '2000-04-18 08:29:00'}

--- Chunk 2 ---
Suzanne: Here is the complete list of credit folks. Please send an e-mail to each of them concerning the 5th. Please include the description that I have bolded. In our group, you don't need to include Marie or Shari. Thanks. Carol ---------------------- Forwarded by Carol St Clair/HOU/ECT on 04/18/2000 11:52 AM --------------------------- From: John Suttle 04/18/2000 11:47 AM To: Carol St Clair/HOU/ECT@ECT cc: Subject: Re: Credit Group Lunch Carol, Three more have recently joined our group: Ed Sacks Brad Schneider Wendy LeBrocq JS Carol St Clai

### Query1 : What is the agenda of the Credit Group Lunch on May 5th?

In [39]:
# Generation layer outputs
open_ai_output1 = generate_answer(query1, top_chunks1)
print("\nGenerated Layer Output:")
print(open_ai_output1)


Generated Layer Output:
The agenda of the Credit Group Lunch on May 5th is to go through in detail how the ISDA and CSA Masters and Schedules work.


### Query2 : Which golf courses were mentioned as potential venues?

In [41]:
query2 = "Which golf courses were mentioned as potential venues?"
TOP_K = 3
print(f"\n=== QUERY: {query2} ===")
# Search Layer outputs
top_chunks2 = search_engine.search(query2, top_k=TOP_K)
# Print top chunks
print("\nTop Retrieved Chunks:")
for i, chunk in enumerate(top_chunks2):
    print(f"\n--- Chunk {i+1} ---")
    print(f"{chunk['chunk']}")
    print(f"Metadata: {chunk['metadata']}")


=== QUERY: Which golf courses were mentioned as potential venues? ===

Top Retrieved Chunks:

--- Chunk 1 ---
Doug, Sounds fun but I can't commit now. Please do not wait on me, go ahead and fill up the foursome. thanks,mike From: Doug Leach 09/29/2000 10:01 AM To: Mike McConnell/HOU/ECT@ECT cc: Randal T Maffett/HOU/ECT@ECT, Tom Briggs/NA/Enron@Enron Subject: golf Mike, Can you join us on Wednesday morning, November 15 to play golf at Canyon Springs golf course in San Antonio prior to the Enron Management Conference? Canyon Springs will allow us to book a 10:00am tee time thirty days in advance. Once I have confirmed the tee time I will forward directions to the course. Should be a fun group. Might even need a practice round at Champions prior to the trip. Doug
Metadata: {'from': 'Mike McConnell', 'subject': 'golf', 'thread_id': 1884, 'timestamp': '2000-10-02 10:47:00'}

--- Chunk 2 ---
Call me if you wanted anything else. ---------------------- Forwarded by Brad Guilmino/HOU/EES on 09

### Query2 : Which golf courses were mentioned as potential venues?

In [42]:
# Generation layer outputs
open_ai_output2 = generate_answer(query2, top_chunks2)
print("\nGenerated Layer Output:")
print(open_ai_output2)


Generated Layer Output:
Query: Which golf courses were mentioned as potential venues?
Context:
- Doug Leach invited Mike to play golf at Canyon Springs golf course in San Antonio.
- Ryan O'Rourke mentioned Pinehurst, Clear Creek, Hermann Park, and Magnolia Creek as potential golf courses.
- Chris Gann confirmed playing at Raveneaux Country Club.
Answer: Potential golf venues mentioned include Canyon Springs, Pinehurst, Clear Creek, Hermann Park, Magnolia Creek, and Raveneaux Country Club.


### Query3 : Who generated and sent the manual invoice to Southwest Gas, and when?

In [44]:
query3 = "Who generated and sent the manual invoice to Southwest Gas, and when?"
TOP_K = 3
print(f"\n=== QUERY: {query3} ===")
# Search Layer outputs
top_chunks3 = search_engine.search(query3, top_k=TOP_K)
# Print top chunks
print("\nTop Retrieved Chunks:")
for i, chunk in enumerate(top_chunks3):
    print(f"\n--- Chunk {i+1} ---")
    print(f"{chunk['chunk']}")
    print(f"Metadata: {chunk['metadata']}")


=== QUERY: Who generated and sent the manual invoice to Southwest Gas, and when? ===

Top Retrieved Chunks:

--- Chunk 1 ---
Would you see if we sent out invoices our if someone at Enron requested that no invoices be sent out. Thanks -----Original Message----- From: Dhont, Margaret Sent: Friday, March 22, 2002 2:04 PM To: Germany, Chris Subject: RE: Letter re Unpaid Invoice for Post petition Deliveries Chris We were not paid by either cornerstone Propane or Midamerican for these deliveries. Margaret -----Original Message----- From: Germany, Chris Sent: Thursday, March 14, 2002 6:32 PM To: Dhont, Margaret; Wynne, Rita; Chance, Lee Ann Cc: Olinger, Kimberly S.; Concannon, Ruth Subject: RE: Letter re Unpaid Invoice for Post petition Deliveries I had a letter that needed to be sent out today so I left early. This is what I need and you can tell me what the process is. ENA purchased gas from TDC (sitara #1143983) in the month of December 2001 on NGPL. It appears that we scheduled the gas o

### Query3 : Who generated and sent the manual invoice to Southwest Gas, and when?

In [45]:
# Generation layer outputs
open_ai_output3 = generate_answer(query3, top_chunks3)
print("\nGenerated Layer Output:")
print(open_ai_output3)


Generated Layer Output:
The manual invoice to Southwest Gas was generated and sent by Chris Germany. The specific date is not mentioned in the email thread.


## Batch Evaluator

In [47]:
import pandas as pd
from tqdm import tqdm
from evaluate import load
#from src.search_layer import SearchEngine
#from src.generation_layer import generate_answer

# Load ROUGE metric
rouge = load("rouge")

# Load datasets
threads_df = pd.read_csv("email_dataset/email_thread_details.csv")
summaries_df = pd.read_csv("email_dataset/email_thread_summaries.csv")


# Number of samples to test (limit for speed, e.g., 10)
N = 10

# Collect results
results = []

for i in tqdm(range(N)):
    row = summaries_df.iloc[i]
    thread_id = row['thread_id']
    reference_summary = row['summary']

    # Get full email body for the thread
    thread_emails = threads_df[threads_df['thread_id'] == thread_id]
    email_texts = thread_emails['body'].tolist()

    if not email_texts:
        continue

    query = "Summarize the key decisions made in this email thread."
    # Perform search on chunks
    top_chunks = search_engine.search(query, top_k=3, filter_thread_id=thread_id)

    # If no chunks found, skip
    if not top_chunks:
        continue

    # Generate answer from chunks
    try:
        generated_summary = generate_answer(query, top_chunks)
    except Exception as e:
        print(f"Error generating summary: {e}")
        continue

    # Compute ROUGE scores
    rouge_scores = rouge.compute(predictions=[generated_summary], references=[reference_summary])

    results.append({
        "thread_id": thread_id,
        "query": query,
        "reference_summary": reference_summary,
        "generated_summary": generated_summary,
        "rouge1": rouge_scores["rouge1"],
        "rouge2": rouge_scores["rouge2"],
        "rougeL": rouge_scores["rougeL"]
    })

# Save to CSV
results_df = pd.DataFrame(results)
results_df.to_csv("batch_evaluation_results.csv", index=False)

print("Batch evaluation complete. Results saved to `batch_evaluation_results.csv`.")


100%|██████████| 10/10 [00:13<00:00,  1.36s/it]

Batch evaluation complete. Results saved to `batch_evaluation_results.csv`.
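
To show what the ROUGE-1 number measures, here is a simplified hand computation of the ROUGE-1 F1 score from unigram overlap. It is a sketch only: the `evaluate` library's `rouge` metric applies its own tokenization and aggregation, so its values will generally differ from this whitespace-based version. The two example sentences are made up.

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    # Count unigrams on both sides; the multiset intersection gives the overlap.
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("finance approved a 10% increase", "a 10% increase was approved")
print(round(score, 2))
```

Here four of the five words on each side match, so precision, recall and F1 are all 0.8.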





## Future Enhancements

**Embedding Layer Enhancements:**

- Parallelize or batch chunking and embedding for large datasets.

- Support multilingual email embedding using a multilingual transformer model (e.g., distiluse-base-multilingual-cased).

- Add logging and error handling during embedding and chunking.

- Deduplicate similar chunks before storing in the vector DB to reduce redundancy.

- Store additional metadata (e.g., department, priority) to enable advanced filtering during search.
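
As a starting point for the deduplication idea above, the sketch below drops chunks that are identical after lowercasing and whitespace normalization. The function name `dedupe_chunks` is hypothetical, and this catches only exact repeats (such as a forwarded message quoted verbatim); near-duplicates would need a similarity threshold on the embeddings instead.

```python
import hashlib

def dedupe_chunks(chunks):
    # Keep the first occurrence of each chunk, comparing normalized text by hash.
    seen, unique = set(), []
    for text in chunks:
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique

unique = dedupe_chunks([
    "Please see attached.",
    "please   see ATTACHED.",
    "Call me tomorrow.",
])
print(unique)
```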

**Cache Layer Enhancements:**

- Replace JSON with Redis or SQLite for faster lookup and persistence in multi-user environments.

- Add cache eviction policy (e.g., LRU) to avoid unlimited growth.

- Track cache hit/miss stats for performance analytics.

- Encrypt cache contents if storing sensitive queries or responses.
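
The eviction policy mentioned above could look like the following minimal in-memory sketch. `LRUCache` is a hypothetical class, not a drop-in replacement for the JSON-backed `Cache`, and the capacity of two entries is chosen only to make the eviction visible.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, max_entries=2):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(max_entries=2)
cache.set("q1", "a1")
cache.set("q2", "a2")
cache.get("q1")        # q1 becomes the most recently used entry
cache.set("q3", "a3")  # capacity exceeded, so q2 is evicted
print(cache.get("q1"), cache.get("q2"), cache.get("q3"))
```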

**Search Layer Enhancements:**

- Improve reranking with stronger rerankers such as bge-reranker or Cohere's rerank models.

- Add semantic filters beyond thread_id (e.g., date, sender, topic).

- Support multi-query or follow-up query handling (e.g., thread-based QA).

- Paginate results and allow sorting based on relevance, timestamp, etc.

- Expose the search as an API with configurable parameters.
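
For filters beyond `thread_id`, one option is to build the `where` argument from whichever criteria are supplied. The helper below is a hypothetical sketch based on Chroma's documented `$eq` and `$and` filter operators; the exact filter syntax should be checked against the installed chromadb version. The keys `thread_id` and `from` match the metadata stored by `EmbeddingProcessor`.

```python
def build_where(thread_id=None, sender=None):
    # Collect one condition per supplied criterion.
    conditions = []
    if thread_id is not None:
        conditions.append({"thread_id": {"$eq": int(thread_id)}})
    if sender is not None:
        conditions.append({"from": {"$eq": sender}})
    if not conditions:
        return None  # caller should omit the `where` argument entirely
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}  # combine several conditions explicitly

print(build_where(thread_id=2, sender="Sara Shackleton"))
```

Timestamps are stored as strings in this pipeline, so date-range filters would first require storing a numeric form of the timestamp in the metadata.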

**Generation Layer Enhancements:**

- Use function calling / structured output instead of plain text (for automation).

- Support custom prompt templates per use case (summarization, classification, etc.).

- Switch to a self-hosted model (e.g., LLaMA 3, Mistral) for cost and privacy control.

- Limit token count dynamically to avoid truncation of large prompts.

- Stream responses (the Chat Completions API supports `stream=True`) for better UX on longer answers.

**Overall Architecture Enhancements:**

- Centralized logging and monitoring (e.g., using logging, Sentry, or Prometheus).

- Unit and integration tests for all layers to ensure robustness.

- Add retry mechanisms for external API calls (OpenAI, Chroma).

- Implement role-based access control (RBAC) if deployed in an enterprise environment.

- Deploy as a containerized microservice (Docker + FastAPI) with endpoints for embedding, search, and generation.

- Add a front-end interface for uploading emails, searching threads, and viewing generated insights.
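
The retry mechanism listed above could be as simple as the exponential-backoff wrapper sketched here. `call_with_retry` and the simulated `flaky` function are hypothetical; in practice the wrapped callable would be the OpenAI or Chroma request, and only transient error types should be retried rather than every exception.

```python
import time

def call_with_retry(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    # Retry fn() with delays of base_delay, 2 * base_delay, 4 * base_delay, ...
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))

# Simulated call that fails twice before succeeding.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

delays = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the wrapper easy to test without waiting.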