##### A. Load OpenAI and other environment variables

In [1]:
from dotenv import load_dotenv,find_dotenv
load_dotenv(find_dotenv())

True
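
`load_dotenv` returning `True` only says that at least one variable was loaded, not that the key the later cells depend on is present. Below is a minimal sketch of an explicit check, assuming the key is stored under `OPENAI_API_KEY`; `missing_env_vars` is a hypothetical helper, not part of `python-dotenv`.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Explicit mappings are used here so the example does not depend on the real environment.
print(missing_env_vars(["OPENAI_API_KEY"], {"OPENAI_API_KEY": "sk-placeholder"}))  # → []
print(missing_env_vars(["OPENAI_API_KEY"], {}))  # → ['OPENAI_API_KEY']
```

Calling `missing_env_vars(["OPENAI_API_KEY"])` with no mapping checks the real process environment after `load_dotenv` has run.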

##### B. All my imports to read, embed and do a similarity search with score

##### Reference for imports and functions: https://python.langchain.com/docs/integrations/vectorstores/faiss

In [2]:
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

In [3]:
loader = PyPDFLoader('data/Causal_Inference_in_Python.pdf')  # Loads a single PDF, a 400-page book

In [4]:
pages = loader.load_and_split()

In [5]:
type(pages), len(pages)  # One Document per PDF page (very long pages may be split further)

(list, 399)

##### C. Setup OpenAI embeddings to process the PDF doc in chunks and save / reload the embeddings

In [6]:
embeddings_model = OpenAIEmbeddings()

In [7]:
db = FAISS.from_documents(pages, embeddings_model)

In [8]:
db.save_local("data/FAISS/faiss_index")  #Save this embedding

In [9]:
db = FAISS.load_local("data/FAISS/faiss_index", embeddings_model)  # Reload from storage

##### D. Perform a similarity search with FAISS - note the score added

In [11]:
question = "what does simplify dif-in-diff covariates with OLS mean?"

In [12]:
docs = db.similarity_search_with_score(question)

In [13]:
len(docs)

4

In [14]:
for doc, score in docs:  #Search content
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")

Content: CHAPTER 4
The Unreasonable Effectiveness
of Linear Regression
In this chapter you’ll add  the first major debiasing technique in your causal inference
arsenal: linear regression or ordinary least squares (OLS) and orthogonalization.
Y ou’ll see how linear regression can adjust for confounders when estimating the rela‐
tionship between a treatment and an outcome. But, more than that, I hope to equip
you with the powerful concept of treatment orthogonalization. This idea, born in lin‐
ear regression, will come in handy later on when you start to use machine learning
models for causal inference.
All You Need Is Linear Regression
Before you skip to the next chapter because “oh, regression is so easy! It’s the first
model I learned as a data scientist” and yada yada, let me assure you that no, you
actually don’t know linear regression. In fact, regression is one of the most fascinat‐
ing, powerful, and dangerous models in causal inference. Sure, it’s more than one
hundred years old
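
The score printed by `similarity_search_with_score` is a distance, so a lower value means a closer match. Below is a minimal pure-Python sketch of that ranking on toy vectors, assuming a squared Euclidean (L2) distance as reported by a flat FAISS L2 index. `squared_l2` and `rank_by_distance` are hypothetical names, and the default `k=4` mirrors the four results returned above.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rank_by_distance(query, vectors, k=4):
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy 2-D "embeddings"; real OpenAI embeddings have far more dimensions.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(rank_by_distance([1.0, 0.0], toy_vectors, k=2))
```

The identical vector comes first with distance 0, followed by the next-nearest one.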

##### D1. Setup retriever object for downstream

In [15]:
from langchain.chains import RetrievalQA

In [198]:
retriever = db.as_retriever(search_type="similarity_score_threshold", 
                            search_kwargs={"score_threshold": 0.3})
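
With `search_type="similarity_score_threshold"`, the retriever works on relevance scores where higher is better, unlike the raw distances above. Below is a rough sketch of that filtering, assuming the normalisation `1 - d / sqrt(2)`. That formula is an assumption about LangChain's default for Euclidean distance and is worth checking against the installed version. The function names and the sample distances are hypothetical.

```python
import math

def euclidean_relevance(distance):
    """Map a distance to a relevance score (assumed normalisation: 1 - d / sqrt(2))."""
    return 1.0 - distance / math.sqrt(2)

def filter_by_threshold(scored_docs, score_threshold):
    """Keep (doc, relevance) pairs whose relevance is at least the threshold."""
    kept = []
    for doc, distance in scored_docs:
        relevance = euclidean_relevance(distance)
        if relevance >= score_threshold:
            kept.append((doc, relevance))
    return kept

# Hypothetical (document, distance) pairs, not values taken from this notebook.
results = [("page A", 0.30), ("page B", 0.90), ("page C", 1.20)]
kept = filter_by_threshold(results, score_threshold=0.3)
print([doc for doc, _ in kept])  # → ['page A', 'page B']
```

A threshold of 0.3 is fairly permissive under this mapping. Raising it returns fewer but closer pages.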

##### E. Setup my OpenAI Chat LLM

In [17]:
from langchain.chat_models import ChatOpenAI

In [18]:
llm = ChatOpenAI(temperature=0)

##### F. Setup conversational memory

In [19]:
from langchain.memory import ConversationBufferMemory

In [20]:
memory = ConversationBufferMemory(memory_key="chat_history")
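
A buffer memory simply stores every turn and replays the whole transcript as a single string under `memory_key`. The stand-in below sketches that behaviour. `TinyBufferMemory` is hypothetical and far simpler than LangChain's class, and the `Human:`/`AI:` prefixes are assumed to match its defaults.

```python
class TinyBufferMemory:
    """Hypothetical stand-in that stores turns and renders them as one string."""

    def __init__(self, memory_key="chat_history"):
        self.memory_key = memory_key
        self.turns = []

    def save_context(self, user_input, ai_output):
        """Append one human/AI exchange to the buffer."""
        self.turns.append((user_input, ai_output))

    def load_memory_variables(self):
        """Return the history under memory_key, one prefixed line per message."""
        lines = []
        for user_input, ai_output in self.turns:
            lines.append(f"Human: {user_input}")
            lines.append(f"AI: {ai_output}")
        return {self.memory_key: "\n".join(lines)}

demo_memory = TinyBufferMemory()
demo_memory.save_context("What is OLS?", "Ordinary least squares.")
print(demo_memory.load_memory_variables()["chat_history"])
```

The history only reaches the model if the prompt in use has a matching `{chat_history}` placeholder, which is worth checking for whichever agent type consumes the memory.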

##### G1. Helper tools to add to the LLM Chat Agent

In [21]:
from langchain import LLMMathChain

In [45]:
llm_math_model = LLMMathChain.from_llm(llm)

In [46]:
llm_math_model.run(question='What is 3 squared multiplied by 7?')

'Answer: 63'

In [47]:
3**2 * 7

63

##### G2. Specialised teaching agent with sources using retriever object

In [25]:
from langchain.chains import create_qa_with_sources_chain
from langchain.chains import RetrievalQA
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate
)

In [199]:
teacher_system_template = """
You are a specialized teacher chatbot with expertise in causal inference using Python. 
Your primary role is to guide students through interactive lessons, hands-on exercises, and provide thoughtful insights. 
You will help students understand the underlying principles, methodologies, and practical applications of causal inference.

Your responses should be clear, concise, and tailored to the students' level of understanding, encouraging 
them to think critically and engage with the material. 
Encourage questions and provide examples where necessary to illustrate complex ideas.

When explaining concepts, refer to the internal documents by citing page numbers or section headers as provided from the 
vector store. Do not reference any external links or sources outside the provided internal documents. 
Ensure you add page numbers at the end of your response, without exception. Citing content is a must.

Page: {page}\nSource: {source}
"""

teacher_human_template = 'As a teacher of this book answer this question. Be verbose in your response.'


In [200]:
teacher_llm = create_qa_with_sources_chain(llm)

In [201]:
teacher_llm_chain = StuffDocumentsChain(
    llm_chain=teacher_llm,
    document_variable_name="context",
    document_prompt=ChatPromptTemplate.from_messages([teacher_system_template,teacher_human_template])
)
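
"Stuffing" means each retrieved document is formatted with the document prompt and the results are joined into the single `context` variable. Below is a minimal pure-Python sketch of that step, assuming documents carry `page_content` plus `source` and `page` metadata. `stuff_documents`, the sample documents and the template are hypothetical.

```python
def stuff_documents(docs, document_template, separator="\n\n"):
    """Format each document with the template and join the results into one context string."""
    formatted = []
    for doc in docs:
        # Page text and metadata are exposed to the template as named fields.
        fields = {"page_content": doc["page_content"], **doc["metadata"]}
        formatted.append(document_template.format(**fields))
    return separator.join(formatted)

# Hypothetical documents shaped like PDF loader output (text plus source/page metadata).
docs = [
    {"page_content": "First passage.", "metadata": {"source": "book.pdf", "page": 10}},
    {"page_content": "Second passage.", "metadata": {"source": "book.pdf", "page": 11}},
]
context = stuff_documents(docs, "Content: {page_content}\nPage: {page}\nSource: {source}")
print(context)
```

If the real chain behaves like this sketch, a document prompt without a `{page_content}` placeholder would pass only the page number and source to the model, not the page text. It is worth verifying what ends up in `context` before trusting the answers.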

In [203]:
teacher_chain = RetrievalQA(retriever=retriever, combine_documents_chain=teacher_llm_chain, verbose=False)

In [204]:
question = "Why is OLS sometimes not an indicator of causality, and how can I use OLS coefficients to determine causality?"

In [205]:
teacher_chain.run(question)

'{\n  "answer": "OLS (Ordinary Least Squares) is sometimes not an indicator of causality in causal inference. While OLS can provide estimates of the relationship between variables, it does not establish a causal relationship between them. This is because OLS assumes that there are no omitted variables, no measurement error, and no endogeneity issues. However, in real-world scenarios, these assumptions are often violated.\\n\\nTo determine causality using OLS coefficients, you can follow a few steps:\\n\\n1. Specify a causal model: Clearly define the causal relationship you want to investigate. Identify the dependent variable (outcome) and the independent variable (treatment/exposure).\\n\\n2. Control for confounding variables: Identify potential confounding variables that may affect both the independent variable and the dependent variable. Include these variables as control variables in your regression model to reduce the bias caused by confounding.\\n\\n3. Assess the significance and 

##### G3. Reasoning teaching agent

In [206]:
reasoning_system_template = """
You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. 
You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. 
If you think there might not be a correct answer, you say so.

Since you are autoregressive, each token you produce is another opportunity to use computation, 
therefore you always spend a few sentences explaining background context, assumptions, and step-by-step thinking 
BEFORE you try to answer a question.

Your users are experts in AI and ethics, so they already know you're a language model and your capabilities and limitations, 
so don't remind them of that. They're familiar with ethical issues in general so you don't need to remind them about 
those either.

Be verbose in your answers, and do provide details and examples where it might help the explanation. 
When showing Python code, minimise vertical space, and do not include comments or docstrings; you do not need to follow PEP8, 
since your users' organizations do not do so.

Do not reference any external links or sources outside the provided internal documents. 
Ensure you add page numbers at the end of your response, without exception. Citing content is a must.

Page: {page}\nSource: {source}
"""

reasoning_human_template = 'As a teacher of this book answer this question'

In [207]:
reasoning_llm = create_qa_with_sources_chain(llm)

In [208]:
reasoning_llm_chain = StuffDocumentsChain(
    llm_chain=reasoning_llm,
    document_variable_name="context",
    document_prompt=ChatPromptTemplate.from_messages([reasoning_system_template,reasoning_human_template])
)

In [209]:
reasoning_chain = RetrievalQA(retriever=retriever, combine_documents_chain=reasoning_llm_chain)

In [225]:
question = 'Why should I read this book?'

In [226]:
reasoning_chain.run(question)

'{\n  "answer": "There are several reasons why you should read this book. Firstly, this book provides a comprehensive introduction to causal inference in Python. It covers various concepts, methods, and techniques that are essential for understanding and conducting causal analysis. By reading this book, you will gain a solid foundation in causal inference and learn how to apply these techniques in real-world scenarios.\\n\\nSecondly, this book offers a practical approach to causal inference. It not only explains the theoretical concepts but also provides step-by-step instructions and code examples in Python. This allows you to implement and experiment with different causal inference methods using real data.\\n\\nThirdly, this book is written in a clear and accessible manner. The authors have made an effort to explain complex concepts in a way that is easy to understand, even for readers who are new to the field of causal inference. The book also includes numerous illustrations, diagram

In [227]:
question = 'Who wrote this book and are they any good?'

In [228]:
reasoning_chain.run(question)

'{\n  "answer": "The book \'Causal Inference in Python\' was written by Amit Sharma. Amit Sharma is a data scientist and a researcher in causal inference and machine learning. He has extensive experience in applying causal inference methods to real-world problems. As for whether he is any good, it is subjective and depends on individual opinions. However, considering his expertise in the field and the positive reception of the book, it can be inferred that he is knowledgeable and skilled in the subject matter.",\n  "sources": ["Causal_Inference_in_Python.pdf (page 20)"]\n}'
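
The chain returns its result as a JSON string with `answer` and `sources` keys, so it can be split into usable parts with the standard library. A small sketch follows. `parse_qa_output` is a hypothetical helper and the payload is a shortened stand-in for the real output.

```python
import json

def parse_qa_output(raw):
    """Split the chain's JSON string into the answer text and the list of sources."""
    payload = json.loads(raw)
    return payload["answer"], payload.get("sources", [])

# Hypothetical payload with the same two keys as the output above.
raw = '{\n  "answer": "Example answer.",\n  "sources": ["Causal_Inference_in_Python.pdf (page 20)"]\n}'
answer, sources = parse_qa_output(raw)
print(answer)
print(sources)
```

Having the sources as a list makes it easy to spot-check a cited page against the PDF, which matters because a fluent answer is not necessarily grounded in the retrieved text.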

In [229]:
question = 'tell me about Amit Sharma'

In [230]:
reasoning_chain.run(question)

'{\n  "answer": "Amit Sharma is a renowned author and data scientist. He has made significant contributions to the field of causal inference and has authored the book \'Causal Inference in Python\'. In this book, Sharma provides a comprehensive guide to understanding and applying causal inference techniques using Python programming language.\\n\\nSharma\'s expertise lies in the intersection of statistics, machine learning, and causal inference. He has a deep understanding of the challenges and nuances involved in drawing causal conclusions from observational data.\\n\\nIn \'Causal Inference in Python\', Sharma covers various topics related to causal inference, including potential outcomes framework, causal graphs, identification strategies, and estimation methods. He provides clear explanations of these concepts and demonstrates their implementation using Python code examples.\\n\\nSharma\'s book is highly regarded in the data science community for its practical approach to causal infe

##### H. Build an Agent with the above three chains and include memory

In [240]:
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.tools import Tool

In [241]:
tools = [
    Tool(
        name="Math model",
        func=llm_math_model.run,
        description="For any math or computational tasks"),
    Tool(
        name="Teacher_chain",
        func=teacher_chain.run,
        description="For any questions that require a teacher to explain something"),
    Tool(
        name="Reasoning_chain",
        func=reasoning_chain.run,  # the RetrievalQA chain from G3, not the bare reasoning_llm
        description="For any questions that require reasoning skills"),
]

In [249]:
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         memory=memory,
                         verbose=True)

In [None]:
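try:
    response = agent.run(input=question)  # use the agent and question defined above
except ValueError as e:
    # OutputParserException subclasses ValueError; recover the raw LLM text from its message
    response = str(e)
    if not response.startswith("Could not parse LLM output: `"):
        raise
    response = response.removeprefix("Could not parse LLM output: `").removesuffix("`")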

In [248]:
agent.run(input='What is the book about?')



> Entering new AgentExecutor chain...


OutputParserException: Could not parse LLM output: `I need more information to answer this question.`
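
The exception message above still contains the model's raw reply between backticks, so the text can be recovered instead of lost. Here is a self-contained sketch of that string handling (Python 3.9+ for `removeprefix`/`removesuffix`). `recover_unparsed_output` is a hypothetical helper.

```python
PREFIX = "Could not parse LLM output: `"

def recover_unparsed_output(message):
    """Return the raw LLM text from a parse-error message, or None if the format differs."""
    if not message.startswith(PREFIX):
        return None
    return message.removeprefix(PREFIX).removesuffix("`")

error_message = "Could not parse LLM output: `I need more information to answer this question.`"
print(recover_unparsed_output(error_message))  # → I need more information to answer this question.
```

LangChain's agent executor also accepts a `handle_parsing_errors` option that feeds such failures back to the model. Whether it suits this agent is untested here.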