## 1) Easiest: LangChain (0.3.x) with EnsembleRetriever (RRF)

In [None]:
# pip install langchain langchain-community langchain-openai chromadb rank_bm25
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
from langchain_core.documents import Document

docs = [
    Document(page_content="The Houston Rockets play at Toyota Center."),
    Document(page_content="NBA fantasy values high-usage players."),
    Document(page_content="Chroma is a local vector database."),
]

# BM25 (in-memory)
bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5

# Vector (Chroma)
emb = OpenAIEmbeddings()
chroma = Chroma.from_documents(docs, embedding=emb)
vec = chroma.as_retriever(search_kwargs={"k": 5})

# Hybrid via Reciprocal Rank Fusion (built-in)
hybrid = EnsembleRetriever(retrievers=[bm25, vec], weights=[0.5, 0.5])  # tweak weights

# Use like any retriever
results = hybrid.invoke("Where do the Rockets play?")  # invoke() replaces the deprecated get_relevant_documents()
for d in results:
    print(d.page_content, d.metadata if d.metadata else "")


Why this is nice: minimal code. EnsembleRetriever fuses the two ranked lists with weighted Reciprocal Rank Fusion (RRF), which works on ranks rather than raw scores, so no score normalization is needed. You can swap Chroma for FAISS or Pinecone with few changes.
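To make the fusion step concrete, here is a minimal pure-Python sketch of weighted RRF. It illustrates the general idea only and is not LangChain's source code; the function name `weighted_rrf`, the constant `c=60`, and the doc keys are assumptions made for the example.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of doc keys (best first) with weighted RRF."""
    scores = defaultdict(float)
    for ranking, w in zip(ranked_lists, weights):
        for rank, key in enumerate(ranking, start=1):
            # Each list contributes weight / (c + rank) to the doc's fused score
            scores[key] += w / (c + rank)
    # Highest fused score first; ties broken by key so the output is deterministic
    return sorted(scores, key=lambda k: (-scores[k], k))

# Hypothetical doc keys, ordered best-first by each retriever
bm25_ranking = ["rockets", "fantasy", "chroma"]
vec_ranking = ["rockets", "chroma", "fantasy"]

equal = weighted_rrf([bm25_ranking, vec_ranking], weights=[0.5, 0.5])
bm25_biased = weighted_rrf([bm25_ranking, vec_ranking], weights=[0.7, 0.3])
print(equal)
print(bm25_biased)
```

With equal weights the two lower documents tie and fall back to key order; with more weight on BM25, the BM25 ordering of those two wins.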

## 2) DIY fusion (FAISS + BM25) with RRF

In [None]:
# pip install sentence-transformers faiss-cpu rank_bm25
from sentence_transformers import SentenceTransformer
from rank_bm25 import BM25Okapi
import numpy as np, faiss

texts = [
    "The Houston Rockets play at Toyota Center.",
    "NBA fantasy values high-usage players.",
    "Chroma is a local vector database.",
]

# ---- BM25 candidates ----
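def tokenize(s):
    # Lowercase, split on whitespace, strip edge punctuation so "play?" matches "play"
    return [w.strip(".,;:!?\"'()") for w in s.lower().split()]

tok = [tokenize(t) for t in texts]
bm25 = BM25Okapi(tok)
q = "Where do the Rockets play?"
bm25_scores = bm25.get_scores(tokenize(q))  # length N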

# ---- Vector candidates ----
model = SentenceTransformer("all-MiniLM-L6-v2")
X = model.encode(texts, normalize_embeddings=True)
d = X.shape[1]
index = faiss.IndexFlatIP(d)
index.add(X)
qv = model.encode([q], normalize_embeddings=True)
vec_scores, vec_idxs = index.search(qv, k=len(texts))
vec_scores = vec_scores[0]  # length N but ordered by vec rank
vec_order = vec_idxs[0]     # indices in rank order

# ---- RRF fusion ----
def rrf(ranks, k=60):
    return 1.0 / (k + ranks)  # ranks start at 1

# Build rank maps for each method
bm25_rank = np.argsort(np.argsort(-bm25_scores)) + 1  # 1..N
vec_rank = np.empty_like(bm25_rank)
vec_rank[vec_order] = np.arange(1, len(texts)+1)

alpha = 0.5  # weight BM25 vs vector
fusion = alpha*rrf(bm25_rank) + (1-alpha)*rrf(vec_rank)

# Final ranking
final_order = np.argsort(-fusion)
for i in final_order:
    print(f"{texts[i]} | fused={fusion[i]:.4f} (bm25={bm25_scores[i]:.3f}, vec-rank={vec_rank[i]})")


Why RRF? It uses only ranks, so it is stable across score scales and needs no score normalization. Raise alpha to favor exact terms (BM25); lower it to favor semantics (vector).
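The scale-stability claim can be checked with a small experiment. The sketch below uses made-up scores (the arrays are assumptions, not output from the cell above) and compares a raw score sum with RRF when one channel is rescaled.

```python
import numpy as np

def to_ranks(scores):
    """1-based ranks; the highest score gets rank 1."""
    return np.argsort(np.argsort(-scores)) + 1

def rrf_fuse(a, b, k=60):
    """Unweighted RRF over two score arrays aligned by document index."""
    return 1.0 / (k + to_ranks(a)) + 1.0 / (k + to_ranks(b))

bm25 = np.array([5.0, 9.0, 2.0])     # hypothetical BM25 scores (unbounded)
vec = np.array([0.90, 0.20, 0.80])   # hypothetical cosine similarities

# Raw sum: the ordering follows whichever channel has the larger scale
raw_order = np.argsort(-(bm25 + vec))
raw_order_rescaled = np.argsort(-(bm25 + vec * 100))

# RRF: rescaling a channel leaves its ranks, and so the fusion, unchanged
rrf_order = np.argsort(-rrf_fuse(bm25, vec))
rrf_order_rescaled = np.argsort(-rrf_fuse(bm25 * 1000, vec))

print(raw_order, raw_order_rescaled)
print(rrf_order, rrf_order_rescaled)
```

The raw-sum ordering changes when the vector scores are multiplied by 100, while the RRF ordering is identical before and after rescaling.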

## 3) Elasticsearch/OpenSearch (BM25 + kNN) with app-side fusion

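Whether Elasticsearch/OpenSearch can natively fuse BM25 and kNN into one ranking depends on the version (and, for some features, the license tier), so it is not always available. A reliable, portable approach is to run two queries and fuse the results in the application (again with RRF, or with a normalized score sum).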

In [None]:
# pip install elasticsearch sentence-transformers
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
import numpy as np

es = Elasticsearch("http://localhost:9200")
index = "docs"

# 1) BM25 search
q = "Where do the Rockets play?"
bm25_res = es.search(index=index, query={"match": {"text": q}}, size=50)
bm25_hits = bm25_res["hits"]["hits"]  # each has _id and _score

# 2) kNN search
model = SentenceTransformer("all-MiniLM-L6-v2")
qv = model.encode([q])[0].tolist()
knn_res = es.search(index=index, knn={"field":"emb","query_vector":qv,"k":50,"num_candidates":200})
knn_hits = knn_res["hits"]["hits"]

# 3) Fuse with RRF
def ranks(hits):
    # map doc_id -> rank (1..)
    return {h["_id"]: r+1 for r, h in enumerate(hits)}

r_bm25 = ranks(sorted(bm25_hits, key=lambda h: -h["_score"]))
r_knn  = ranks(knn_hits)  # already rank-ordered for kNN

def rrf_score(doc_id, k=60):
    return (1/(k + r_bm25.get(doc_id, 10**6))) + (1/(k + r_knn.get(doc_id, 10**6)))

doc_ids = {h["_id"] for h in bm25_hits} | {h["_id"] for h in knn_hits}
fused = sorted(doc_ids, key=lambda d: -rrf_score(d))

print(fused[:10])  # top IDs; fetch full docs if needed
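
The cell above fuses with RRF. The normalized-sum alternative mentioned earlier could look roughly like the sketch below. It assumes min-max normalization per channel and treats a document missing from one channel as scoring 0.0 there; the helper names and the sample scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; constant input maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}

def normalized_sum(bm25_scores, knn_scores, alpha=0.5):
    """Weighted sum of min-max normalized scores; alpha weights the BM25 channel."""
    a, b = min_max(bm25_scores), min_max(knn_scores)
    ids = set(a) | set(b)
    # Assumption: a doc absent from one channel contributes 0.0 for that channel
    fused = {d: alpha * a.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in ids}
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical per-channel scores keyed by document ID
bm25_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
knn_scores = {"b": 0.91, "c": 0.88, "d": 0.40}
ranking = normalized_sum(bm25_scores, knn_scores)
print(ranking)
```

With real Elasticsearch results, the input dicts could be built as `{h["_id"]: h["_score"] for h in bm25_hits}` and likewise for `knn_hits`. Unlike RRF, this variant is sensitive to outlier scores, since one very high score compresses the rest of that channel toward zero.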


## Tuning tips (apply to all approaches)

- k (top-k per channel): start 20–50 each, fuse, then return top 5–10.

- RRF k constant: 60 is common; larger values flatten the score gap between adjacent ranks, smaller values favor top-ranked items more.

- Bias toward BM25 (exact names/codes): increase the BM25 weight, e.g. weights=[0.7, 0.3] in EnsembleRetriever or a higher alpha in the DIY fusion.

- Add a re-ranker: after fusing down to a top-50, run a cross-encoder (e.g., ms-marco-MiniLM-L-6-v2) to pick the final top-k. This often gives a large precision boost.

- Filters: apply metadata/date/type filters before fusion to cut noise.

- Chunking: ~200–500 tokens + small overlap tends to help vector recall without diluting meaning.
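
The re-ranker tip can be sketched as a small post-fusion stage with a pluggable scorer. To keep the example runnable offline, it uses a toy word-overlap scorer as a stand-in; that scorer is not a cross-encoder, and the names `rerank` and `overlap_scorer` are hypothetical.

```python
def rerank(query, candidates, score_fn, top_k=5):
    """Re-order fused candidate texts with a (query, text) relevance scorer."""
    pairs = [(query, text) for text in candidates]
    scores = score_fn(pairs)  # one score per pair, higher is better
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [candidates[i] for i in order[:top_k]]

def overlap_scorer(pairs):
    """Toy stand-in (NOT a cross-encoder): fraction of query words found in the text."""
    out = []
    for query, text in pairs:
        q_words = {w.strip(".,?!").lower() for w in query.split()}
        t_words = {w.strip(".,?!").lower() for w in text.split()}
        out.append(len(q_words & t_words) / max(len(q_words), 1))
    return out

# Pretend this is the fused candidate list, best-first after RRF
fused_texts = [
    "Chroma is a local vector database.",
    "NBA fantasy values high-usage players.",
    "The Houston Rockets play at Toyota Center.",
]
top = rerank("Where do the Rockets play?", fused_texts, overlap_scorer, top_k=2)
print(top)
```

In practice, `score_fn` would likely be a real model, for instance the `predict` method of a sentence-transformers `CrossEncoder` loaded with `cross-encoder/ms-marco-MiniLM-L-6-v2`, which accepts a list of (query, text) pairs. Because a cross-encoder scores each pair separately, keep the candidate list short (the top-50 suggested above) to bound latency.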