# INDEXING

### Step 0: Setting up environment variables

In [1]:
import os
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
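
The cell above only turns on LangSmith tracing. The chat and embedding models used later also need credentials. The following is a minimal sketch, assuming the keys are exported in the shell under the conventional names `OPENAI_API_KEY`, `GOOGLE_API_KEY`, and `LANGCHAIN_API_KEY`. `missing_env_vars` is a hypothetical helper written for this illustration, not part of LangChain.

```python
import os

# Assumed variable names; adjust to match the providers actually in use.
REQUIRED_VARS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "LANGCHAIN_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before running the notebook:", ", ".join(missing))
```

Reading keys from the environment keeps them out of the notebook itself, which matters if the notebook is shared or committed.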

### Step 1: Loading the documents

In [2]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('./pdfs/PDF-1.pdf')
pages = loader.load_and_split()

print(type(pages))

Ignoring wrong pointing object 15 0 (offset 0)
Ignoring wrong pointing object 17 0 (offset 0)
Ignoring wrong pointing object 19 0 (offset 0)
Ignoring wrong pointing object 26 0 (offset 0)
Ignoring wrong pointing object 28 0 (offset 0)
Ignoring wrong pointing object 41 0 (offset 0)
Ignoring wrong pointing object 43 0 (offset 0)
Ignoring wrong pointing object 61 0 (offset 0)
Ignoring wrong pointing object 94 0 (offset 0)
Ignoring wrong pointing object 181 0 (offset 0)
Ignoring wrong pointing object 191 0 (offset 0)
Ignoring wrong pointing object 216 0 (offset 0)
Ignoring wrong pointing object 218 0 (offset 0)
Ignoring wrong pointing object 220 0 (offset 0)
Ignoring wrong pointing object 226 0 (offset 0)
Ignoring wrong pointing object 241 0 (offset 0)
Ignoring wrong pointing object 284 0 (offset 0)
Ignoring wrong pointing object 381 0 (offset 0)
Ignoring wrong pointing object 729 0 (offset 0)
Ignoring wrong pointing object 731 0 (offset 0)
Ignoring wrong pointing object 733 0 (offset 0)

<class 'list'>


### Step 2: Splitting the texts into smaller chunks

In [3]:
# RecursiveCharacterTextSplitter tries separators in order (paragraphs, lines, words, characters)
# until each chunk fits within chunk_size
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=300, add_start_index=True)
chunks = text_splitter.split_documents(pages)
print(type(chunks[56]))

<class 'langchain_core.documents.base.Document'>
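
To make `chunk_size`, `chunk_overlap`, and `add_start_index` concrete, here is a deliberately simplified sketch that cuts fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers natural separators, so its boundaries will differ). `split_with_overlap` is a hypothetical helper for illustration only.

```python
def split_with_overlap(text, chunk_size=2000, chunk_overlap=300):
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        # Record the offset, playing the role of the 'start_index' metadata
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk["start_index"], chunk["text"])
```

Each window repeats the last three characters of the previous one, which is the role the 300-character overlap plays in the cell above: text that straddles a boundary still appears whole in at least one chunk.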


### Step 3: Store the chunks

In [4]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings)
print(type(vectorstore))


<class 'langchain_chroma.vectorstores.Chroma'>


In [5]:
retriever = vectorstore.as_retriever()

# Prompt for Multi-Query

In [13]:
from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_perspectives 
    | ChatOpenAI(temperature=0) 
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

In [14]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve
question = "How to be a good parent?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question})
len(docs)

8
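
`get_unique_union` deduplicates through a `set`, so the order of the returned documents is not guaranteed. If first-seen order matters, a dictionary keyed on the serialized document is one option. The sketch below uses plain dicts and `json.dumps` as stand-ins for LangChain documents and `dumps`; `unique_union_ordered` is a hypothetical name.

```python
import json


def unique_union_ordered(result_lists):
    """Flatten result lists and drop duplicates, keeping first-seen order."""
    seen = {}
    for results in result_lists:
        for doc in results:
            key = json.dumps(doc, sort_keys=True)  # stable string key for a dict
            seen.setdefault(key, doc)  # keep only the first occurrence
    return list(seen.values())


results_per_query = [
    [{"page": 0, "text": "a"}, {"page": 1, "text": "b"}],
    [{"page": 1, "text": "b"}, {"page": 2, "text": "c"}],
]
print([d["text"] for d in unique_union_ordered(results_per_query)])
```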

In [10]:
docs

[Document(page_content='Intentional Parenting', metadata={'page': 0, 'source': './pdfs/PDF-1.pdf', 'start_index': 0}),
 Document(page_content='Chapter One  \n 22\nguidelines, provide consistent, non-violent, and unconditional love, and \ndemonstrate respect for their childre n’s developmental stages and unique \nneeds within each stage. The objective of positive parenting is to teach \ndiscipline in a way that builds a child’s self-esteem and supports a mutually \nrespectful parent-child relationship without breaking the child’s spirit \n(Godfrey, 2019). The overall picture of positive parenting is that of a warm, \nthoughtful, and loving – but not permissive parent. \nQualities of Children Raised within a Positive Parenting Style \nParental use of a warm, loving, and supportive style, results in children \ndeveloping a strong sense of prosocial behavior, the ability to appropriately \nfunction in social settings, and the understanding of social conventions. \nChildren raised within a 

In [15]:
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatOpenAI(temperature=0)

final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

"To be a good parent, one should follow positive parenting practices such as providing consistent, non-violent, and unconditional love, demonstrating respect for their child's developmental stages and unique needs, and teaching discipline in a way that builds the child's self-esteem. Additionally, being warm, loving, and supportive, fostering a nurturing and empowering environment, and modeling positive behavior are essential qualities of good parenting. It is also important to be mindful, conscious, and aware of the child's behavior, focus on the present moment, and make decisions in a consistent and positive manner. Ultimately, a good parent nurtures their child's self-esteem, creativity, belief in a positive future, ability to get along with others, and sense of mastery over their environment."

# RAG-Fusion

In [25]:
from langchain.prompts import ChatPromptTemplate

# RAG-Fusion: Related
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [26]:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_rag_fusion 
    | ChatOpenAI(temperature=0)
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

In [27]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Add this list's contribution using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

question = "How to be a good parent?"
retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})
len(docs)

5
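
A small worked example may help show how the fused scores come about. The sketch below applies the same formula to toy rankings of string IDs instead of documents. `rrf_scores` is a hypothetical stand-alone version, and it keeps the 0-based rank from `enumerate` used above (RRF is commonly stated with ranks starting at 1, which shifts every score slightly but usually not the order).

```python
def rrf_scores(ranked_lists, k=60):
    """Sum 1 / (rank + k) for each item over all rankings, highest total first."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):  # rank starts at 0
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Three generated queries, each returning its own ranking of documents
rankings = [["A", "B", "C"], ["B", "A", "D"], ["B", "C", "A"]]
for doc_id, score in rrf_scores(rankings):
    print(doc_id, round(score, 5))
```

Document `B` ends up first because it is ranked at or near the top in all three lists, even though `A` wins the first ranking. With `k=60` the contribution of any single position is small, so agreement across queries matters more than one high placement.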

In [28]:
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion,
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

'To be a good parent, one can start by being intentional in their parenting approach, understanding child development from infancy through adolescence, and being aware of the practical guide to awareness integration theory provided by Nicole Jafari, Foojan Zeine, and Eileen Manoukian in the book "Intentional Parenting." Additionally, exploring career opportunities related to parenting can also contribute to being a good parent.'
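
In the chain above, `{context}` receives the list of `(document, score)` tuples exactly as `reciprocal_rank_fusion` returns it, so the prompt contains their string representation, metadata and scores included. One option is to join only the page text before it reaches the prompt. This is a sketch: `FakeDoc` is a stand-in for LangChain's `Document`, and `format_fused_docs` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in with the two attributes used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_fused_docs(scored_docs, top_n=None):
    """Join the page_content of (doc, score) pairs, best score first."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return "\n\n".join(doc.page_content for doc, _ in ranked)


scored = [(FakeDoc("low-scoring chunk"), 0.016), (FakeDoc("high-scoring chunk"), 0.049)]
print(format_fused_docs(scored, top_n=1))
```

Under that assumption, the context entry of the chain could become `retrieval_chain_rag_fusion | format_fused_docs`, mirroring the `retriever | format_docs` pattern used in the Retrieval and Generation section.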

# Decomposition

In [16]:
from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

In [17]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# LLM
llm = ChatOpenAI(temperature=0)

# Chain
generate_queries_decomposition = ( prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "How to be a good parent?"
questions = generate_queries_decomposition.invoke({"question":question})

In [18]:
questions

['1. What are effective communication strategies for parents to use with their children?',
 '2. How can parents establish and maintain a positive and supportive relationship with their children?',
 '3. What are some effective discipline techniques for parents to use when addressing challenging behavior in children?']
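
The generated sub-questions keep their list numbering (`1.`, `2.`, ...), and splitting on `"\n"` can also yield empty strings if the model emits blank lines. Both end up in the retrieval query. A small post-processing step is one way to tidy this; `clean_queries` below is a hypothetical helper, not part of LangChain.

```python
import re


def clean_queries(raw_output):
    """Split LLM output into queries, dropping blank lines and list numbering."""
    queries = []
    for line in raw_output.split("\n"):
        # Strip a leading "1.", "2)", "-" or "*" marker plus surrounding whitespace
        line = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if line:
            queries.append(line)
    return queries


raw = "1. What is positive parenting?\n\n2. How do parents set boundaries?\n"
print(clean_queries(raw))
```

It could replace the `lambda x: x.split("\n")` step at the end of the query-generation chains.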

In [19]:
# Answer each sub-question individually 

from langchain import hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# RAG prompt
prompt_rag = hub.pull("rlm/rag-prompt")

def retrieve_and_rag(question,prompt_rag,sub_question_generator_chain):
    """RAG on each sub-question"""
    
    # Use our decomposition 
    sub_questions = sub_question_generator_chain.invoke({"question":question})
    
    # Initialize a list to hold RAG chain results
    rag_results = []
    
    for sub_question in sub_questions:
        
        # Retrieve documents for each sub-question
        retrieved_docs = retriever.invoke(sub_question)
        
        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | llm | StrOutputParser()).invoke(
            {"context": retrieved_docs, "question": sub_question}
        )
        rag_results.append(answer)
    
    return rag_results,sub_questions

# Run retrieval and RAG for every sub-question generated from the original question
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)

In [20]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":context,"question":question})

'To be a good parent, it is important to utilize effective communication strategies with your children, such as being sensitive and responsive, maintaining mutually reciprocal relationships, and providing consistent and unconditional care. Establishing and maintaining a positive and supportive relationship with your children involves practicing positive parenting, which includes nurturing, empowering, and acknowledging their accomplishments. Setting boundaries and discipline in a fair and consistent manner can be achieved by providing consistent, non-violent, and unconditional love, respecting their developmental stages and unique needs, and teaching discipline in a way that builds their self-esteem. Overall, being a good parent involves being mindful, focusing on the present moment, modeling positive behaviors, and making decisions that support a respectful and loving parent-child relationship.'

# HyDE

In [6]:
from langchain.prompts import ChatPromptTemplate

# HyDE document generation
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
prompt_hyde = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_docs_for_retrieval = (
    prompt_hyde | ChatOpenAI(temperature=0) | StrOutputParser() 
)

# Run
question = "How to be a good parent?"
generate_docs_for_retrieval.invoke({"question":question})

"Parenting is a complex and multifaceted task that requires a combination of skills, knowledge, and personal qualities. Research has shown that there are several key factors that contribute to being a good parent. \n\nFirst and foremost, it is important for parents to provide a safe and nurturing environment for their children. This includes ensuring that their basic needs are met, such as food, shelter, and clothing, as well as creating a supportive and loving atmosphere in the home. Studies have shown that children who grow up in a stable and secure environment are more likely to thrive and develop into well-adjusted adults.\n\nIn addition to providing for their children's physical needs, good parents also play an active role in their children's emotional and cognitive development. This includes engaging in activities that promote learning and growth, such as reading to them, playing educational games, and encouraging them to explore their interests and talents. Research has shown th

In [7]:
# Retrieve
retrieval_chain = generate_docs_for_retrieval | retriever
retireved_docs = retrieval_chain.invoke({"question":question})
retireved_docs

[Document(page_content='their early childhood years, the effects of biology and genetics on an \nindividual’s physical development, and the impactfulness of multi-factors \nwithin bioecological systems on child-rearing and parenting. However, \nthe complexity of the contexts and the number of perspectives in the field \nof human growth and development make it difficult to establish a definitive \nconcept that characterizes the best me thod of parenting. To reach a better \nunderstanding of the complexities of raising a child, this chapter is offered \nas an exploration of the topic of parenting to enrich our understanding of \nthe science of parenting.', metadata={'page': 15, 'source': './pdfs/PDF-1.pdf', 'start_index': 1650}),
 Document(page_content='meeting the child’s biological basic needs, physical and spiritual wellbeing, \nsafety and security, and educational needs. Additionally, modern parents \nhave an obligation to create an environment that provides developmentally \nappropr
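
The idea behind HyDE is that a generated passage tends to share more vocabulary and phrasing with the stored chunks than a short question does. The toy sketch below illustrates this with word-overlap (Jaccard) similarity in place of real embeddings and a hard-coded function in place of the LLM. All names (`jaccard`, `retrieve`, `hyde_retrieve`, `fake_llm`) and the two-document corpus are assumptions made for the illustration.

```python
def tokenize(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two strings, between 0 and 1."""
    a, b = tokenize(a), tokenize(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(query_text, corpus, k=1):
    """Return the k corpus entries most similar to query_text."""
    return sorted(corpus, key=lambda doc: jaccard(query_text, doc), reverse=True)[:k]


def hyde_retrieve(question, generate_passage, corpus, k=1):
    """HyDE: search with a generated hypothetical passage instead of the question."""
    hypothetical = generate_passage(question)
    return retrieve(hypothetical, corpus, k)


def fake_llm(question):
    # Stand-in for the LLM call; a real passage would be much longer
    return "good parents provide a safe and nurturing environment"


corpus = [
    "parents provide a safe nurturing environment and consistent love",
    "how to file a tax return before the deadline",
]
question = "How to be a good parent?"
print("question only:", retrieve(question, corpus))
print("with HyDE:    ", hyde_retrieve(question, fake_llm, corpus))
```

With this toy similarity, the bare question matches the tax document on filler words ("how", "to", "a"), while the hypothetical passage matches the parenting document. Real embeddings are far more robust than word overlap, so the effect in practice is usually subtler.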

In [8]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

final_rag_chain.invoke({"context":retireved_docs,"question":question})

"To be a good parent, it is important to meet the child's biological basic needs, physical and spiritual well-being, safety and security, and educational needs. Additionally, creating an environment that provides developmentally appropriate practices and experiences is crucial. Parents should also address the psychosocial and emotional needs of the child to ensure healthy emotional development, cultural continuity, and spiritual and religious upbringing. Positive parenting, which involves caring, teaching, leading, communicating, and providing for the needs of a child consistently and unconditionally, is also essential. Building a healthy attachment style through sensitivity, responsivity, and mutually reciprocal relationships is important for the parent-child bond. Adapting to the child's developmental shifts and constantly learning and tackling new parenting challenges are also key aspects of being a good parent. Additionally, concepts such as positive parenting, mindful parenting, a

# Step-Back Prompting

In [16]:
# Few Shot Examples
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
examples = [
     {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel’s was born in what country?",
        "output": "what is Jan Sindel’s personal history?",
    },
]    
# We now transform these to example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)

In [17]:
generate_queries_step_back = prompt | ChatOpenAI(temperature=0) | StrOutputParser()
question = "How to be a good Parent?"
generate_queries_step_back.invoke({"question": question})

'What are the qualities of a good parent?'

In [18]:
# Response prompt 
from langchain_core.runnables import RunnableLambda
response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

chain.invoke({"question": question})

"Being a good parent involves a combination of factors that contribute to the overall well-being and development of your child. Here are some key aspects to consider:\n\n1. **Unconditional Love and Support**: One of the most important things you can do as a parent is to show your child unconditional love and support. This means being there for them no matter what, and letting them know that you will always be their biggest cheerleader.\n\n2. **Effective Communication**: Good communication is essential in any relationship, including the one between parent and child. Listen to your child, validate their feelings, and communicate openly and honestly with them.\n\n3. **Setting Boundaries**: While it's important to be loving and supportive, it's also crucial to set boundaries and provide structure for your child. This helps them feel safe and secure, and teaches them important life skills.\n\n4. **Positive Reinforcement**: Encouraging and praising your child for their accomplishments, no ma

# Retrieval and Generation

### Step 1: Retrieval of similar documents from the vector store


In [5]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
retrieved_docs = retriever.invoke("How to be a good parent?")
# for doc in retrieved_docs:
#     print(doc.page_content)
#     print("***************************************")
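
Conceptually, a similarity search with `k=3` embeds the query, scores every stored chunk against it, and returns the three best. The sketch below shows that idea with hand-written two-dimensional vectors and cosine similarity. The metric is an assumption for illustration and may differ from the distance function the vector store is actually configured with; `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, similarity) pairs most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings; real ones have hundreds of dimensions
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.7, 0.7], "d3": [0.0, 1.0], "d4": [-1.0, 0.0]}
print([doc_id for doc_id, _ in top_k([1.0, 0.1], doc_vecs, k=3)])
```

A production vector store avoids scoring every chunk by using an approximate nearest-neighbor index, but the ranking it aims for is the same.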

### Step 2: Generating

In [6]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo",temperature=0.4,max_tokens=1024)

from langchain import hub

prompt = hub.pull('rlm/rag-prompt')

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain.stream("How to be a good parent?"):
    print(chunk, end="", flush=True)

Being a good parent involves being intentional in your approach to parenting, which means being aware of your actions and decisions. It is important to integrate theory and practical guidance in your parenting style to ensure the best outcomes for your children. By being intentional in your parenting, you can create a nurturing and supportive environment for your children to thrive.