In [1]:
from resources.yosemitecommon import YosemiteNotebookCommon
note = YosemiteNotebookCommon()
note.init()
note.display_title(title="RAG", header="Retrieval Augmented Generation")
note.box_text_code(title="Goal - Simple Modular RAG", content="At the very least, you need to build a pipeline with a few stages to properly set up RAG. Below is a sample outline of the libraries that would be necessary, and it is still not the full list needed to achieve this goal. For a developer at home, building such a long pipeline for a simple concept is a huge hassle.", language="python", code=str(f"""
# Parsing Libraries \n
from parser_x import PDF
from parser_y import Text
from ... import Image, GraphParser, etc...

# NLP
from cleaner import TextCleaner, ..Tokenizer, Chunker
from transformer import SentenceTransformer, CrossEncoder, ..SimilaritySearch

# Most Cases would Require a PreExisting Database Service
# Database Service & Libraries
from database.service import DatabaseClient, SearchClient

# Vector Index & Search
from vector_library import IndexCreator, IndexSearcher
....
"""))

from dotenv import load_dotenv
load_dotenv()

True

In [2]:
note.display_subtitle("Let's start building the Database!")

In [3]:
# One Class for the RAG Pipeline
from yosemite.llms import RAG

# Create an Instance of the RAG Class
rag = RAG()

# Build the RAG Database using a directory of specified Documents
rag.build("documents/")

LLM initialized with provider: openai
Creating New Database... @ default path = './databases/db'
Loading documents from documents/...
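
For intuition, here is a minimal sketch of the kind of work a build step like this usually involves: reading text files from a directory and splitting them into overlapping chunks. This is not the yosemite implementation. `chunk_text`, `build_records`, the word-window sizes, and the use of `uuid4` for document IDs are all assumptions made for illustration.

```python
from pathlib import Path
import uuid


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping windows of words (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def build_records(directory):
    """Read every .txt file in a directory and return one record per chunk."""
    records = []
    for path in sorted(Path(directory).glob("*.txt")):
        # Assumption: one random UUID per source document.
        document_id = str(uuid.uuid4())
        text = path.read_text(encoding="utf-8")
        for chunk in chunk_text(text):
            records.append({"document_id": document_id, "chunk": chunk})
    return records
```

A real pipeline would also embed each chunk and write it to an index; this sketch stops at the chunking stage.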


In [9]:
note.display_subtitle("That's it.")
note.display_subtitle("You have a fully functional RAG Database. Now you can start querying the database for information.")

In [10]:
results = rag.search("Who is going to win the Piston Cup?")

In [11]:
for result in results:
    print(f"Document ID: {result['document_id']}")
    print(f"Chunk: {result['chunk']}")
    print(f"Relevance Score: {result['relevance_score'][1]}")
    print("---")

Document ID: 9291cc69-7306-4dff-aeb4-d33b6a51e1ea
Chunk: Oh, yeah, that... That is spectacular advice. Thank you, Mr. The King. Ladies and gentlemen, for the first time in Piston Cup history... A rookie has won the Piston Cup.
Relevance Score: 9.376338958740234
---
Document ID: 9291cc69-7306-4dff-aeb4-d33b6a51e1ea
Chunk: Lightning McQueen is gonna win the Piston Cup! Come on! You got it! You got it, Stickers! I am not comin' in behind you again, old man.
Relevance Score: 6.979335784912109
---
Document ID: 9291cc69-7306-4dff-aeb4-d33b6a51e1ea
Chunk: Will you still race for the Piston Cup? - Stickers? - Sally! Come on, give us some bolt! You're here! Thank the manufacturer! You're alive! - Mack?
Relevance Score: 6.312305927276611
---
Document ID: 9291cc69-7306-4dff-aeb4-d33b6a51e1ea
Chunk: If this gets more exciting, they're gonna have to tow me outta the booth! Right you are, Darrell. Three cars are tied for the season points lead, heading into the final race of the season. And the winn

In [7]:
note.display_subtitle("Time to Generate a Response!")
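
Generation in a RAG pipeline usually means placing the retrieved chunks into the prompt so the model answers from them. The sketch below shows one way that prompt could be assembled. Whether the `RAG` class does anything like this internally is not shown in this notebook; `build_prompt`, its `max_chunks` parameter, and the prompt wording are assumptions for illustration.

```python
def build_prompt(question, results, max_chunks=3):
    """Assemble a grounded prompt from retrieved chunks (hypothetical helper)."""
    context_lines = [
        f"[{i}] {result['chunk']}"
        for i, result in enumerate(results[:max_chunks], start=1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Asking the model to say when the context does not cover the question is a common guard for queries that fall outside the indexed documents.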

In [12]:
rag.invoke("What is Quiet-STaR learning?")

"Quiet-STaR learning is a method of machine learning that focuses on minimizing the impact of noisy or irrelevant data during the training process. It emphasizes the importance of identifying and disregarding irrelevant features or examples to improve the overall performance of the model. This approach helps to reduce the impact of noisy data, leading to more accurate and reliable predictions. It's a valuable technique in situations where data quality is a concern, and it can be particularly beneficial for enhancing the robustness of machine learning models."