# RAG System Testing 

Now that we have our database set up (see `database-setup` notebook) and populated (see `pipeline-testing` notebook), we're ready to start developing our retrieval and generation components!

## Retrieval 

We'll compare and contrast 3 retrieval methods:
- Semantic search using `pgvector`'s built-in similarity search functionality
- Lexical search
- Hybrid search (with and without tags) 

### Semantic Search with `pgvector`

Semantic search allows us to ask questions of our data using natural language. Where lexical search uses a naive, direct string comparison to find and surface results, semantic search compares the meaning of phrases using vector operations on their embeddings. This makes it more robust to differences in wording between the query and the stored text.
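
To make "comparing meaning using vector operations" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors and their names are made up for illustration; real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and this is not necessarily the exact metric our database client uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, chosen only for illustration)
toy_query = [0.9, 0.1, 0.0]
toy_education_chunk = [0.8, 0.2, 0.1]
toy_hobby_chunk = [0.0, 0.2, 0.9]

# A chunk pointing in a similar direction to the query scores higher
print(cosine_similarity(toy_query, toy_education_chunk))
print(cosine_similarity(toy_query, toy_hobby_chunk))
```

A score near 1 means the two vectors point in nearly the same direction, which is how two differently worded phrases with similar meaning can still match.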

In [1]:
import sys 

sys.path.append("/Users/srmarshall/Desktop/code/personal/resume-rag/")

In [2]:
from sentence_transformers import SentenceTransformer

# set query 
query = "tell me about your education"

# instantiate the model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# generate query embedding
query_embedding = model.encode(query)


In [3]:
from utils.database import PgClient
import os 

# instantiate client
pg_client = PgClient(
    pg_host = os.getenv("PG_HOST"), 
    pg_user = os.getenv("PG_USER"), 
    pg_password = os.getenv("PG_PASSWORD"), 
    pg_db = "resume_rag"
)

In [4]:
# use embeddings to search database
results = pg_client.semantic_search(query_embedding, "content_embeddings")
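
For readers curious what a call like `semantic_search` might do under the hood, the sketch below builds a `pgvector` nearest-neighbor query. It is an assumption, not the actual `PgClient` implementation: the helper names, the `embedding` column name, and the default limit of 5 are all hypothetical. The `<=>` operator is `pgvector`'s cosine distance operator.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_semantic_search_query(table, limit=5):
    """Build a parameterized nearest-neighbor query (hypothetical helper).

    Assumes the table has a vector column named `embedding`.
    """
    # Table names cannot be bound as query parameters, so validate the
    # name before interpolating it into the SQL string.
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    return (
        "SELECT *, embedding <=> %s::vector AS distance "
        f"FROM {table} "
        "ORDER BY distance "
        f"LIMIT {int(limit)}"
    )


sql = build_semantic_search_query("content_embeddings")
param = to_pgvector_literal([0.1, 0.2, 0.3])
print(sql)
print(param)
```

With a driver such as `psycopg`, the query string and the vector literal would be passed together to `cursor.execute(sql, (param,))`, so the embedding is bound as a parameter rather than pasted into the SQL.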

When we print the results, we see that nearly all of them describe my educational background:
1. This text chunk explicitly details where I graduated and what disciplines I studied
2. Although the beginning of the chunk isn't totally relevant, the second part also contains information on my degree (likely as a result of chunk overlap)
3. This describes why I selected the areas of study I did, and is very relevant to describing my education
4. Similar to 3, this provides insight into why I selected the majors I did
5. This speaks more to my work experience than education; we may be able to filter out results like this using hybrid search or tags

In [10]:
print(f"User Query: {query}\n")

# print results (index 3 of each row appears to hold the chunk text)
print("Semantic Search Results: ")
for index, item in enumerate(results):
    print(f"  {index + 1}.) {item[3]}")

User Query: tell me about your education

Semantic Search Results: 
  1.) education i graduated from the university of wisconsin madison in may of 2022 with bachelors of science in economics with a mathematical emphasis and psychology coursework from both degrees are highly relevant to my current area of work i draw on knowledge of human cognition while working alongside
  2.) now that embedding powered technology is on the rise similarly it enables me to quickly ingest new information and apply it to prototypes and projects education i graduated from the university of wisconsin madison in may of 2022 with degrees in psychology and economics with a mathematical emphasis
  3.) and cutting edge research helps me contextualize and quickly apply new techniques and conecepts as they are published the mathematical coursework i completed as part of my economics degree is something i use almost daily statics and linear algebra are everywhere especially now that embedding
  4.) emphasis i opted

### Lexical Search with `tsvector`

