This notebook shows a simple implementation of Retrieval-Augmented Generation (RAG) that uses an LLM to generate text in the style of Rumi's poetry.

In [1]:
import os
import tqdm as notebook_tqdm
from langchain.llms import HuggingFaceHub
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain.vectorstores import Chroma
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

In [23]:
import warnings
warnings.filterwarnings('ignore')
# Known warning: LangChain still uses the deprecated huggingface_hub InferenceApi instead of the newer InferenceClient

In [111]:
import getpass

hf_inference_api_key = getpass.getpass("Enter your HF Inference API Key:\n\n")

Enter your HF Inference API Key:

 ········


Select an LLM from the Hugging Face Hub.

In [112]:
llm = HuggingFaceHub(repo_id = "HuggingFaceH4/zephyr-7b-alpha",
                     huggingfacehub_api_token = hf_inference_api_key,
                     model_kwargs={
                                   "temperature": 0.7,
                                   "do_sample": True,
                                   "num_beams": 5,
                                   "num_beam_groups": 4,
                                   "no_repeat_ngram_size": 3,
                                   "exponential_decay_length_penalty": (8, 0.5),
                                   "repetition_penalty": 1.3
                        })

Load the text file of Rumi poems, which was also used in earlier LSTM and seq2seq projects.

In [10]:
loader = TextLoader("/Users/isaacobaid/rumi_project/rumi.txt")
document = loader.load()


In [11]:
# `documentclasstext` is a list of Document objects; return the text of each one
def json_serializable(documentclasstext):
    # Keep only page_content, which is a plain string; metadata is dropped
    return [doc.page_content for doc in documentclasstext]

fulltext = json_serializable(document)

In [98]:

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=30,
    length_function=len,
    add_start_index=True,
)

texts = text_splitter.create_documents(fulltext)

print("Number of chunks: " + str(len(texts)))
print("\nSample chunk:" + "\n")
print(texts[-10])
print()
#print(texts[-2] + "\n")
#print(texts[-1] + "\n")

Number of chunks: 1252

Sample chunk:

page_content='for what circles so perfectly.\nA secret turning in us makes the universe turn.\nHead unaware of feet,\nand feet head. Neither cares.\nThey keep turning.\nThis moment this love comes to rest in me, many beings in one being.' metadata={'start_index': 283068}
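
The effect of chunk_size and chunk_overlap can be illustrated with a plain sliding window over characters. This is only a minimal sketch: the function name chunk_text is hypothetical, and it ignores the separator-aware splitting that RecursiveCharacterTextSplitter performs (it prefers to break on paragraph and line boundaries before falling back to raw characters), so real chunk boundaries will differ.

```python
def chunk_text(text, chunk_size=300, chunk_overlap=30):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; not the LangChain algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        # Record where the chunk begins, similar in spirit to add_start_index=True
        chunks.append({"page_content": piece, "metadata": {"start_index": start}})
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "A secret turning in us makes the universe turn. " * 20
chunks = chunk_text(sample, chunk_size=300, chunk_overlap=30)
print("Number of chunks:", len(chunks))
print("Start indices:", [c["metadata"]["start_index"] for c in chunks])
```

Each chunk begins 270 characters after the previous one, so the last 30 characters of one chunk reappear at the start of the next. The overlap helps a line of verse that straddles a boundary stay retrievable from at least one chunk.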



Embeddings from HuggingFace

In [30]:
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=hf_inference_api_key, 
    model_name="sentence-transformers/all-MiniLM-l6-v2"
)

Store the embeddings in a Chroma vector database for similarity queries.

In [71]:
db = Chroma.from_texts(json_serializable(texts), embeddings)
# Maximal marginal relevance (MMR) balances similarity with diversity; k is passed via search_kwargs
retriever = db.as_retriever(search_type="mmr", search_kwargs={"k": 6})
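
To make the "mmr" search type more concrete, the sketch below implements maximal marginal relevance on toy 2-D vectors using only the standard library. It is an assumption-laden illustration, not Chroma's or LangChain's code: the helper names cosine and mmr and the toy vectors are made up here, and the lambda_mult parameter merely mirrors the name LangChain uses for the relevance/diversity trade-off.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0 if nothing picked yet)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings: documents 0 and 1 are near-duplicates, document 2 points elsewhere
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [1.0, -0.5]]

picked_mmr = mmr(query, docs, k=2, lambda_mult=0.5)
picked_similarity = mmr(query, docs, k=2, lambda_mult=1.0)  # pure relevance
print(picked_mmr, picked_similarity)
```

With lambda_mult=1.0 the selection reduces to plain similarity ranking and returns the two near-duplicates. With lambda_mult=0.5 the second pick is penalized for resembling the first, so the more distinct document is chosen instead, which is the behavior that gives the generated poem more varied context.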


Test the retriever with a sample query.

In [113]:

subject = "mystic, mountains and dreams"

retrieved_docs = retriever.invoke(subject)
retrieved_docs

[Document(page_content="of comfort and pain is better\nthan any attending ritual. That splinter of intelligence is substance.\nThe fire and water themselves accidental, done with mirrors.\n\nFor sixty years I have been forgetful,\nevery minute, but not for a second\nhas this flowing toward me stopped or slowed. I deserve nothing. Today I recognize\nthat I am the guest the mystics talk about.\nI play this living music for my host.\nEverything today is for the host.\nI saw you last night in the gathering,\nbut could not take you openly in my arms,\nso I put my lips next to your cheek, pretending to talk privately.\n\nOutside, the freezing desert night.\nThis other night inside grows warm, kindling. Let the landscape be covered with thorny crust. We have a soft garden in here.\nThe continents blasted,\ncities and little towns, everything\nbecome a scorched, blackened ball.\n\n\n The news we hear is full of grief for that future, but the real news inside here\nis there's no news at all.\nF

Set up a basic prompt template

In [57]:

template = """

A ghazal is a poetic form that consists of rhyming couplets and a refrain, with each line sharing the same meter.

Here is an example of a ghazal about vitality, eternal peace and love: 

Where the water of life flows, no illness remains. 
In the garden of union, no thorn remains. 
They say there's a door between one heart and another. 
How can there be a door where no wall remains?

Write a ghazal about {question} based on the following context: {context}

"""
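
The {question} and {context} placeholders in the template are filled in at run time. As a rough sketch of that substitution, plain str.format produces the same kind of result on a shortened copy of the template; the variable names short_template and filled, and the sample context line, are assumptions for illustration only.

```python
# Shortened copy of the template, keeping only the final instruction line
short_template = (
    "Write a ghazal about {question} based on the following context: {context}"
)

# Fill both named placeholders, as the prompt step of the chain would
filled = short_template.format(
    question="mystic, mountains and dreams",
    context="A secret turning in us makes the universe turn.",
)
print(filled)
```

Any literal curly braces added to the template text later would need to be doubled ({{ and }}) so that they are not mistaken for placeholders.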


Generate text

In [114]:

prompt = PromptTemplate.from_template(template)

model = llm

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

response = chain.invoke(subject)
print(response)

My mind wanders somewhere high above and wildly beyond
Mountains pass by beneath the folds of time
Dreams dance their own choreography like morning mist before dawn
As I sit quietly listening intently to the silent melody within

At times lost in thought like wisps of smoke escaping from afar
Yet somehow compelled to return once more
To seek solace in the stillness and simplicity of pure nature's call
And sense the sacred presence dwelling deep within every soul

Through ages past and countless generations gone
Love has remained steadfast as both mountain and song
An unrelenting force that endures far longer than mortals dare conceive
Guiding souls while teaching them humility and gratitude to receive
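
One possible refinement: the chain above passes the retriever's output, a list of Document objects, straight into {context}, so the prompt receives their printed representation, including metadata. A small formatting helper can join just the text instead. The sketch below uses a hypothetical stand-in class Doc in place of LangChain's Document so that it runs on its own; format_docs is likewise an illustrative name.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document, used only in this sketch."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    Doc("A secret turning in us makes the universe turn.", {"start_index": 0}),
    Doc("They keep turning.", {"start_index": 270}),
]
context = format_docs(retrieved)
print(context)
```

In the chain, such a helper would typically be piped after the retriever, for example {"context": retriever | format_docs, "question": RunnablePassthrough()}. This assumes the installed LangChain version coerces plain functions into runnables when they are piped, which is worth confirming against its documentation.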
