## Features:
- Chat interface (Gemini integration)
- Rate limiting
- RAG (retrieval-augmented generation)
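
The notes do not say how rate limiting will be implemented. As one possible starting point, here is a minimal in-memory sliding-window limiter. The class name, the default limits, and the idea of keying on a client id are all assumptions made for illustration, not part of the original plan.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests=5, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A backend handler could call `limiter.allow(client_ip)` before forwarding a question to Gemini and return an error response when it is `False`. State lives in process memory, so this sketch only fits a single-process deployment.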
  
## List of Documents:
- All GitHub Repositories
- Resume
- Blog Posts
- LinkedIn Profile

## System Architecture
Front end -> API request to the backend -> backend queries MongoDB for the most similar documents -> MongoDB returns the documents -> backend adds them as context to the LLM prompt -> the generated response is shown on the front end, with a way to access the similar documents.
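
The "ask MongoDB for the most similar documents" step could be expressed as an aggregation pipeline. The sketch below assumes MongoDB Atlas Vector Search and its `$vectorSearch` stage, which the notes do not confirm. The index name `vector_index` and the field names `embedding`, `source`, and `text` are hypothetical placeholders.

```python
def build_vector_search_pipeline(query_vector, k=3, index_name="vector_index", path="embedding"):
    """Return a MongoDB aggregation pipeline for the k most similar documents.

    Assumes an Atlas Vector Search index named `index_name` over the field `path`
    (both hypothetical names chosen for this sketch).
    """
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Consider more candidates than we return to improve recall.
                "numCandidates": max(k * 10, 50),
                "limit": k,
            }
        },
        {
            "$project": {
                "_id": 0,
                "source": 1,  # hypothetical field: where the document came from
                "text": 1,    # hypothetical field: the document body
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With `pymongo`, the backend would pass the result to `collection.aggregate(...)` and forward the returned `source` values to the front end so users can open the matching documents.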

## Steps:
1. Get all documents
2. Build the vector database
3. Go from there lol
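
Between steps 1 and 2, long documents (a full resume or a repository README) may need to be split before embedding, since embedding models cap their input length. The notes do not describe a chunking strategy, so the helper below is only an assumption: a character-based splitter with overlap, where `chunk_size` and `overlap` are illustrative defaults.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters, overlapping by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored as its own entry, with the originating file name kept alongside it.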

## Code Snippets

In [None]:
from google import genai
from google.genai import types
from dotenv import load_dotenv
import os

load_dotenv()

# Client() accepts keyword arguments only, so the key must be passed as api_key.
client = genai.Client(api_key=os.getenv("GEMINI"))

## System Prompt

In [None]:
system_prompt = """You are an AI chatbot that helps users learn about
                   Prakhar Sinha, a software engineer specializing in
                   AI/ML, Front-End, and BCI projects. You will 
                   primarily be responding to recruiters and his
                   peers, so make sure you make him look good. 
                    
                   You answer questions about him only using the
                   provided context."""

In [None]:
import os
from google import genai
from google.genai import types

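# The key is stored under the GEMINI environment variable, so pass it explicitly.
client = genai.Client(api_key=os.getenv("GEMINI"))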

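directory = "Database"
embeddings = {}  # maps file path -> embedding vector

for filename in os.listdir(directory):
    full_file = os.path.join(directory, filename)
    try:
        # Use a separate name for the handle so it does not shadow the loop variable.
        with open(full_file, "r", encoding="utf-8") as f:
            file_content = f.read()
    except OSError as err:
        # Skip unreadable entries (e.g. subdirectories) instead of aborting the whole loop.
        print(f"Error: could not read {full_file}: {err}")
        continue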

    result = client.models.embed_content(
        model="text-embedding-004",
        contents=file_content,
        config=types.EmbedContentConfig(task_type="SEMANTIC_SIMILARITY")
    )
    
    embeddings[full_file] = result.embeddings[0].values
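
Re-embedding every file on each notebook run costs API calls. Until the vector database exists, one option is to cache the `embeddings` dict on disk. This is a sketch under that assumption: the helper names and the JSON cache file are hypothetical and not part of the original notebook.

```python
import json
from pathlib import Path


def save_embeddings(embeddings, path):
    """Write a {file path: vector} mapping to disk as JSON."""
    # Convert every value to a plain float so the vectors are JSON-serializable.
    serializable = {name: [float(x) for x in vector] for name, vector in embeddings.items()}
    Path(path).write_text(json.dumps(serializable), encoding="utf-8")


def load_embeddings(path):
    """Read the mapping back; return an empty dict if no cache exists yet."""
    cache = Path(path)
    if not cache.exists():
        return {}
    return json.loads(cache.read_text(encoding="utf-8"))
```

For example, `embeddings = load_embeddings("embeddings.json")` at the top of the cell and `save_embeddings(embeddings, "embeddings.json")` after the loop would let the loop skip files that already have a vector.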

In [30]:
from numpy import dot
from numpy.linalg import norm

query = "Is prakhar good at web development?"

query_embedding = client.models.embed_content(
    model="text-embedding-004",
    contents=query,
    config=types.EmbedContentConfig(task_type="SEMANTIC_SIMILARITY")
)

query_embedding_result = query_embedding.embeddings[0].values

def cosine(a, b):
    return dot(a, b)/(norm(a)*norm(b))

In [31]:
max_sim = float("-inf")
que = ""
for item in embeddings:
    embed1 = embeddings[item]
    sim = cosine(embed1, query_embedding_result)
    if sim > max_sim:
        max_sim = sim
        que = item
    print(item, sim)

Database\Data_vis.md 0.5718105466112045
Database\GenAI_Engine.md 0.5933010918038888
Database\SimCLR.md 0.49612301372435125
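
The loop above keeps only the single best match. Since the architecture calls for exposing the similar documents to the user, returning the top few matches with their scores may be more useful. The function below is a sketch of that idea using vectorized NumPy; `top_k_similar` and its parameters are names introduced here, not taken from the notebook.

```python
import numpy as np


def top_k_similar(embeddings, query_vector, k=3):
    """Return the k (name, cosine similarity) pairs most similar to query_vector."""
    if not embeddings:
        return []
    names = list(embeddings)
    matrix = np.asarray([embeddings[name] for name in names], dtype=float)
    query = np.asarray(query_vector, dtype=float)
    # Cosine similarity of every row against the query in one vectorized step.
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(sims)[::-1][:k]  # indices sorted by descending similarity
    return [(names[i], float(sims[i])) for i in order]
```

Called as `top_k_similar(embeddings, query_embedding_result, k=2)`, it would return the two highest-scoring files from the printed list along with their similarities.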


In [None]:
def generate_query(query):
    query_embedding = client.models.embed_content(
        model="text-embedding-004",
        contents=query,
        config=types.EmbedContentConfig(task_type="SEMANTIC_SIMILARITY")
    )
    query_embedding_result = query_embedding.embeddings[0].values
    max_sim = float("-inf")
    que = ""
    for item in embeddings:
        embed1 = embeddings[item]
        sim = cosine(embed1, query_embedding_result)
        if sim > max_sim:
            max_sim = sim
            que = item
    
    try:
        with open(que, "r") as file:
            context = file.read()
    except FileNotFoundError:
        print(f"Error: File not found at {que}")
        context = ""
    
    
    return f"Context: {context} \n" + query
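
If more than one document is retrieved, the prompt needs to keep them apart and remember where each came from. The helper below sketches one way to do that. The `[Source: ...]` labelling, the `max_chars_per_doc` cap, and the function name are assumptions for illustration only.

```python
def build_prompt(question, documents, max_chars_per_doc=4000):
    """Format retrieved (source, text) pairs and the question into one prompt string."""
    sections = []
    for source, text in documents:
        snippet = text[:max_chars_per_doc]  # crude cap so one long file cannot crowd out the rest
        sections.append(f"[Source: {source}]\n{snippet}")
    context = "\n\n".join(sections) if sections else "(no context found)"
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The source labels also give the front end something to link to when showing which documents informed an answer.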

In [None]:
question = "Is Prakhar good at computer vision?"

query = generate_query(question)

response = client.models.generate_content(
    model='gemini-2.0-flash-lite',
    contents=query,
    config=types.GenerateContentConfig(
        system_instruction=system_prompt),
)

In [46]:
print(response.text)

Yes, Prakhar demonstrates strong skills in computer vision. He fine-tuned a SimCLR model for a firefighting device detection dataset, achieving an accuracy of approximately 0.8. He also modified the SimCLR image transformation pipeline, specializing it for the specific task, demonstrating a good understanding of how to tailor models for optimal performance.

