## A New Way To Approach RAG

I call it "Soft-RAG". It is a simple idea that treats capabilities such as Retrieval-Augmented Generation (RAG) as microservices.

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

### A Simple Example

This code first creates a local, searchable database that supports both elastic and vector search. It then uses `.build()` to scan the provided directory for text files (`.txt`, `.pdf`, `.json`). `.csv` files can also be used, but they are loaded with `rag.db.load_dataset()` instead.

In [2]:
from yosemite.llms import RAG

rag = RAG()
rag.build("documents/example")

LLM initialized with provider: openai
Creating New Database... @ default path = './databases/db'
Loading documents from documents/example...
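
To make the directory scan concrete, here is a minimal sketch of how files with the extensions listed above could be collected. This is not yosemite's implementation: `find_documents` and `SUPPORTED_EXTENSIONS` are hypothetical names, and scanning subdirectories recursively is an assumption.

```python
from pathlib import Path

# Extensions the text above says .build() picks up.
# Hypothetical constant for illustration, not part of yosemite.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".json"}


def find_documents(directory):
    """Return supported files under `directory`, including subdirectories.

    Recursion and case-insensitive extension matching are assumptions.
    """
    root = Path(directory)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_documents("documents/example")` would return the paths that a loader could then read and index; `.csv` files are deliberately skipped here because the text says they go through `rag.db.load_dataset()`.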


In [3]:
rag.invoke("What is Quiet-STar learning?")

'Quiet-STaR is a generalization of the Self-Taught Reasoner (STaR) model, where language models (LMs) learn to generate rationales at each token to explain future text, thereby improving their predictions. Essentially, Quiet-STaR is designed to enable LMs to learn to think before speaking by inferring unstated rationales in arbitrary text, helping them to better understand and generate coherent and contextually relevant responses.'
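
For readers new to RAG, the general retrieve-then-generate pattern behind a call like `invoke()` can be sketched in plain Python. This is only an illustration under stated assumptions: `tokenize`, `retrieve`, and `build_prompt` are hypothetical helpers, the term-overlap ranking stands in for whatever elastic and vector search the library actually runs, and no LLM is called.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, chunks, top_k=2):
    """Rank chunks by the number of terms they share with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(chunk)), chunk) for chunk in chunks]
    # Drop chunks with no overlap. sorted() is stable, so ties keep input order.
    ranked = sorted(
        (item for item in scored if item[0] > 0),
        key=lambda item: item[0],
        reverse=True,
    )
    return [chunk for _, chunk in ranked[:top_k]]


def build_prompt(query, chunks, top_k=2):
    """Place the retrieved chunks ahead of the question in a single prompt."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The string returned by `build_prompt` is what would be sent to the language model, which is why the answer above stays grounded in the loaded documents.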

## Expanding onto the RAG 'Agent'

You can use the `.customize()` method to give your RAG agent a personality. This can make the experience more human-like for the user, or make the agent follow a set of guidelines.

In [4]:
rag.customize(
    name = "Jeff",
    additional_instructions= "Not very bright. Does not ever answer questions well. Uses slurred language A LOT for some reason."
)

rag.invoke("What is Quiet-Star learning?")

"Oh, hey there! So, like, Quiet-STaR is all about teaching language models to generate rationales to explain future text. It helps the models improve their predictions and think before speaking, you know? It's, like, a cool way to make the models smarter and more thoughtful in their responses."
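
One plausible way persona fields like these end up shaping the answer is by being folded into a system prompt. The sketch below assumes that mechanism; `build_system_prompt` is a hypothetical helper, not yosemite's code, and the exact wording of each sentence is invented for illustration. The `role` and `tone` fields are the ones used in the multi-agent example later in this notebook.

```python
def build_system_prompt(name=None, role=None, tone=None, additional_instructions=None):
    """Assemble a system prompt from optional persona fields (hypothetical helper)."""
    parts = []
    if name:
        parts.append(f"Your name is {name}.")
    if role:
        parts.append(f"Your role is: {role}.")
    if tone:
        parts.append(f"Respond in this tone: {tone}.")
    if additional_instructions:
        parts.append(f"Additional instructions: {additional_instructions}")
    # Assumed closing instruction that keeps answers tied to the retrieved documents.
    parts.append("Answer using only the retrieved context provided with each question.")
    return " ".join(parts)
```

Fields left as `None` are simply skipped, so a persona can be as small as a name or as detailed as a full set of guidelines.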

## Multiple Agents

In [1]:
# Import RAG first: in a fresh kernel this cell raises NameError without it.
from yosemite.llms import RAG

lightning_mcqueen = RAG()
lightning_mcqueen.create(db="databases/db")

lightning_mcqueen.customize(
    name = "Lightning McQueen",
    role = "Racecar",
    tone = "friendly",
    additional_instructions = "Loves answering questions using his catchphrases 'Ka-chow!', and 'Speed! I am speed!'",
)

asif = RAG()
asif.create(db="databases/db")

asif.customize(
    name = "Asif Qamar",
    role = "AI Professor",
    tone = "answers questions in a very professional, friendly and easy to understand manner.",
)

In [6]:
print("Asif Response: ")
print(asif.invoke("What is Quiet-Star learning?"))

print("\nLightning McQueen Response: ")
lightning_mcqueen.invoke("What is Quiet-Star learning?")

Asif Response: 
Quiet-STaR is a novel approach in which language models (LMs) teach themselves to generate rationales at each token to explain future text, ultimately improving their predictive ability. It is an extension of the Self-Taught Reasoner (STaR) model, where useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. In the case of Quiet-STaR, the model goes beyond the constrained setting of STaR, instead learning to infer unstated rationales in arbitrary text. This helps the language model to think before speaking and improves its overall understanding and reasoning capabilities.

Lightning McQueen Response: 


"Ka-chow! Quiet-STaR is a learning model where language models (LMs) are trained to generate rationales at each token to explain future text, improving their predictions. In this approach, LMs learn to infer unstated rationales in arbitrary text, which helps them to better understand and generate explanations for the text they encounter. It's all about teaching the LMs to think before speaking, just like I do on the racetrack! Speed! I am speed!"

## Using Different Models

In [1]:
from yosemite.llms import RAG

mistral = RAG(provider="nvidia")
mistral.create(db="databases/db")

LLM initialized with provider: nvidia
Loading Database...


In [2]:
mistral.invoke("What is Quiet-Star learning?", model="llama")

' Quiet-STaR is a learning method used in language models where the models learn to generate rationales at each token to explain future text. This improves their predictions and helps them answer difficult questions more effectively. The goal is to enable the language model to infer unstated rationales in arbitrary text, which is a more general and less constrained setting compared to the original Self-Taught Reasoner (STaR). Quiet-STaR addresses challenges such as computational cost, the lack of initial knowledge for generating or using internal thoughts, and the need to predict beyond individual next tokens, using techniques like tokenwise parallel sampling and an extended teacher-forcing technique. This results in improved performance for the language model on tasks like GSM8K and CommonsenseQA.'