### Research Paper Analysis Using LangGraph, RAG, and Graphs

##### Source (Mistral cookbook): https://github.com/mistralai/cookbook/blob/main/third_party/langchain/langgraph_crag_mistral.ipynb
##### Research paper: "DETERMINANTS OF LLM-ASSISTED DECISION-MAKING" (arXiv:2402.17385)


In [2]:
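import os
from dotenv import load_dotenv

# Load API keys (e.g. OPENAI_API_KEY, TAVILY_API_KEY, NEO4J_*) from a local .env file
load_dotenv()

# Set your OpenAI API key only if .env did not provide one;
# an unconditional assignment of "" would overwrite the loaded key
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = ""  # paste your key here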

In [3]:
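# Set your Tavily API key (used by the web-search node) only if .env did not provide one
if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = ""  # paste your key here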

# Building RAG

#### Download and prepare the file

In [4]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the research paper. This may take 1-2 minutes since the PDF is large
paper_pdf = "https://arxiv.org/pdf/2402.17385"

# Create the PDF loader (PyPDFLoader returns one Document per page)
loader = PyPDFLoader(paper_pdf)

# Load the PDF document
documents = loader.load()

# Split the pages into chunks of up to 1024 characters, without overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
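To make `chunk_size=1024, chunk_overlap=0` concrete, here is a simplified sketch of the recursive idea: try the coarsest separator first (blank lines) and fall back to finer ones (newlines, spaces, then a hard cut) only for pieces that are still too long. This is an illustration under simplifying assumptions, not LangChain's actual `RecursiveCharacterTextSplitter` code, which also handles overlap, custom length functions, and other merge details; the function name `recursive_split` is hypothetical.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Hypothetical simplified splitter: chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard cut into fixed-size slices
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging pieces while the chunk still fits
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # Piece alone is too long: recurse with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, tuple(rest)))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("alpha beta\n\ngamma delta epsilon\n\nzeta", 12))
```

The paragraph-level split keeps "alpha beta" and "zeta" intact, while the over-long middle paragraph is broken at spaces, which is the behaviour that lets chunks follow the paper's own structure where possible.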

#### Load the file into a vector store

In [5]:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Embed the chunks with OpenAI embeddings and index them in an in-memory Chroma store
embedding_function = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(docs, embedding_function)

# Expose the store as a retriever (similarity search over the indexed chunks)
retriever = vectorstore.as_retriever()
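Conceptually, `vectorstore.as_retriever()` embeds the query and returns the chunks whose vectors are closest to it (LangChain's vector-store search returns 4 results by default, `k=4`). The sketch below shows that idea with cosine similarity over a toy word-count "embedding"; `embed` is a hypothetical stand-in for `OpenAIEmbeddings`, and Chroma's actual distance metric and indexing differ.

```python
import math
from collections import Counter


def embed(text):
    # Hypothetical toy embedding: lowercase word counts instead of a learned vector
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=4):
    # Rank all chunks by similarity to the query and keep the best k
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "task difficulty increases reliance on llm advice",
    "the authors are affiliated with fernfh",
    "prompt engineering shapes llm output",
]
print(top_k("how does task difficulty affect reliance", chunks, k=1))
```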


##### Test RAG

In [6]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

from langchain_core.output_parsers import StrOutputParser
from langchain_core.output_parsers import JsonOutputParser


In [7]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0.5, model_name="gpt-3.5-turbo")

In [8]:
# RAG chain
prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

QUESTION: {question}
CONTEXT: {context}
Answer:""",
    input_variables=["question", "context"],
)

chain = prompt | llm | StrOutputParser()

QUESTION = "Who are the authors of DETERMINANTS OF LLM-ASSISTED DECISION-MAKING?"
# Pass the retrieved chunks' text rather than a list of Document objects
CONTEXT = "\n\n".join(doc.page_content for doc in retriever.invoke(QUESTION))

result = chain.invoke({"question": QUESTION, "context": CONTEXT})

print(result)

The authors of "DETERMINANTS OF LLM-ASSISTED DECISION-MAKING" are Eva Eigner and Thorsten Händler from the Ferdinand Porsche Mobile University of Applied Sciences (FERNFH) in Wiener Neustadt, Austria.


In [9]:
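def format_docs(docs):
    # Concatenate the retrieved chunks' text so the prompt receives plain text
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)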
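The LCEL pipeline can be read as three plain function calls: the leading dict runs the retriever and `RunnablePassthrough()` on the same input question, the prompt fills the template, and the model plus `StrOutputParser` return a string. The sketch below traces that data flow in plain Python; `fake_retriever`, `join_chunks`, and `fake_llm` are hypothetical stubs, not LangChain objects, so it runs without API keys.

```python
def fake_retriever(question):
    # Hypothetical stand-in for `retriever`: always returns two fixed chunks
    return ["Chunk about task difficulty.", "Chunk about expertise."]


def join_chunks(chunks):
    # Turn the retrieved chunks into one context string
    return "\n\n".join(chunks)


def fake_llm(prompt_text):
    # Hypothetical stand-in for `llm | StrOutputParser()`: reports how many chunks it saw
    return f"answer based on {prompt_text.count('Chunk')} chunks"


TEMPLATE = "QUESTION: {question}\nCONTEXT: {context}\nAnswer:"


def rag_pipeline(question):
    # Step 1: the dict step feeds the same question to both branches
    inputs = {"context": join_chunks(fake_retriever(question)), "question": question}
    # Step 2: the prompt substitutes both variables into the template
    prompt_text = TEMPLATE.format(**inputs)
    # Step 3: the model and output parser turn the prompt into a plain string
    return fake_llm(prompt_text)


print(rag_pipeline("How does task difficulty affect decision-making with LLM support?"))
```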

In [10]:
rag_chain.invoke("How do Large Language Models (LLMs) enhance human decision-making processes?")

'Large Language Models (LLMs) enhance human decision-making processes by offering versatile assistance in summarizing extensive text data, generating different solutions, and identifying patterns. They can provide support in analyzing decision situations, evaluating alternatives, and simulating debates. Ultimately, LLMs can help decision-makers comprehend key insights swiftly and enhance the creation of various alternatives.'

In [11]:
rag_chain.invoke("How does task difficulty affect decision-making with LLM support?")

'Task difficulty can lead individuals to rely more on LLMs for assistance in decision-making processes. As tasks become more difficult, people tend to over-rely on decision aids like advice from algorithms or decision-support systems. However, the influence of task difficulty on user-reliance diminishes with higher expertise levels.'

In [12]:
rag_chain.invoke("Why is decision-making considered a fundamental capability in everyday life?")

'Decision-making is considered a fundamental capability in everyday life because it involves evaluating options to achieve goals, relying on skills, values, preferences, and beliefs. Situational and contextual variables, such as time pressure, also influence the decision-making process. Additionally, accountability, irreversibility, and the significance of a decision can impact the level of analytical information processing involved.'

In [13]:
rag_chain.invoke("What are some potential risks associated with the increased capabilities of LLMs?")

"Some potential risks associated with the increased capabilities of LLMs include perpetuating unfair discrimination, leakage of private data, and use for harmful purposes such as fraud or virus development. Additionally, there is a risk of overestimating the system's capabilities, leading to over-reliance or unsafe use. The lack of transparency, explainability, and reproducibility in LLM research also pose challenges and risks."

#### Define the graph state

In [14]:
from typing import Any, Dict, TypedDict


class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        keys: A dictionary mapping string keys (e.g. "question", "documents") to values.
    """

    keys: Dict[str, Any]
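Because `keys` is declared without a reducer (for example, no `Annotated[..., operator.add]`), LangGraph replaces the stored value with whatever a node returns for that key instead of merging the two. That is why every node below re-returns `question` and `documents`. The minimal sketch below imitates this last-value behaviour in plain Python; `apply_update` is a hypothetical helper, not a LangGraph function.

```python
from typing import Any, Dict, TypedDict


class GraphState(TypedDict):
    keys: Dict[str, Any]


def apply_update(state: GraphState, update: GraphState) -> GraphState:
    # Last-value semantics: each returned top-level key replaces the old value wholesale
    new_state: GraphState = {**state}
    new_state.update(update)
    return new_state


state: GraphState = {"keys": {"question": "Who wrote the paper?"}}

# A node that carries the question forward keeps it in the state
state = apply_update(state, {"keys": {"question": "Who wrote the paper?", "documents": ["d1"]}})
print(sorted(state["keys"]))  # ['documents', 'question']

# A node that returns only "documents" silently drops "question"
state = apply_update(state, {"keys": {"documents": ["d1", "d2"]}})
print(sorted(state["keys"]))  # ['documents']
```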

#### Define the graph's Nodes and Edges

In [15]:
from langchain import hub
from langchain.output_parsers.openai_tools import PydanticToolsParser
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.utils.function_calling import convert_to_openai_tool
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

### Nodes ###


def retrieve(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = retriever.invoke(question)
    return {"keys": {"documents": documents, "question": question}}


def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = state_dict["documents"]

    # Prompt
    prompt = hub.pull("rlm/rag-prompt")

    # LLM
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, streaming=True)

    # Post-processing: join the chunks' text into a single context string
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    # Chain
    rag_chain = prompt | llm | StrOutputParser()

    # Run
    generation = rag_chain.invoke({"context": format_docs(documents), "question": question})
    return {
        "keys": {"documents": documents, "question": question, "generation": generation}
    }


def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with relevant documents
    """

    print("---CHECK RELEVANCE---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = state_dict["documents"]

    # Data model
    class grade(BaseModel):
        """Binary score for relevance check."""

        binary_score: str = Field(description="Relevance score 'yes' or 'no'")

    # LLM
    model = ChatOpenAI(temperature=0, model="gpt-4-0125-preview", streaming=True)

    # Tool: OpenAI function schema generated from the grade model
    grade_tool_oai = convert_to_openai_tool(grade)

    # LLM with tool and enforced invocation
    llm_with_tool = model.bind(
        tools=[grade_tool_oai],
        tool_choice={"type": "function", "function": {"name": "grade"}},
    )

    # Parser
    parser_tool = PydanticToolsParser(tools=[grade])

    # Prompt
    prompt = PromptTemplate(
        template="""You are a grader assessing relevance of a retrieved document to a user question. \n
        Here is the retrieved document: \n\n {context} \n\n
        Here is the user question: {question} \n
        If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
        Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question.""",
        input_variables=["context", "question"],
    )

    # Chain
    chain = prompt | llm_with_tool | parser_tool

    # Score each document and keep the relevant ones
    filtered_docs = []
    search = "No"  # By default, do not supplement retrieval with web search
    for d in documents:
        score = chain.invoke({"question": question, "context": d.page_content})
        relevance = score[0].binary_score  # 'yes' or 'no' from the grader
        if relevance == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            search = "Yes"  # At least one irrelevant document: run web search

    return {
        "keys": {
            "documents": filtered_docs,
            "question": question,
            "run_web_search": search,
        }
    }


def transform_query(state):
    """
    Transform the query to produce a better question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates question key with a re-phrased question
    """

    print("---TRANSFORM QUERY---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = state_dict["documents"]

    # Create a prompt template with format instructions and the query
    prompt = PromptTemplate(
        template="""You are generating questions that are well optimized for retrieval. \n
        Look at the input and try to reason about the underlying semantic intent / meaning. \n
        Here is the initial question:
        \n ------- \n
        {question}
        \n ------- \n
        Formulate an improved question: """,
        input_variables=["question"],
    )

    # LLM
    model = ChatOpenAI(temperature=0, model="gpt-4-0125-preview", streaming=True)

    # Chain
    chain = prompt | model | StrOutputParser()
    better_question = chain.invoke({"question": question})

    return {"keys": {"documents": documents, "question": better_question}}


def web_search(state):
    """
    Web search based on the re-phrased question using Tavily API.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with appended web results
    """

    print("---WEB SEARCH---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = state_dict["documents"]

    tool = TavilySearchResults()
    docs = tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    documents.append(web_results)

    return {"keys": {"documents": documents, "question": question}}


### Edges


def decide_to_generate(state):
    """
    Determines whether to generate an answer or re-generate a question for web search.

    Args:
        state (dict): The current state of the agent, including all keys.

    Returns:
        str: Next node to call
    """

    print("---DECIDE TO GENERATE---")
    state_dict = state["keys"]
    question = state_dict["question"]
    filtered_documents = state_dict["documents"]
    search = state_dict["run_web_search"]

    if search == "Yes":
        # At least one document was graded irrelevant,
        # so rephrase the query and supplement retrieval with web search
        print("---DECISION: TRANSFORM QUERY and RUN WEB SEARCH---")
        return "transform_query"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"
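The relevance filter and routing decision can be exercised offline by swapping the GPT-4 tool-calling grader for a crude keyword check. In this sketch `keyword_grader` is a hypothetical heuristic (not the notebook's grader), while `filter_and_route` mirrors the logic of `grade_documents` and `decide_to_generate`: keep the chunks graded "yes", and route to `transform_query` (and then web search) as soon as any chunk is graded "no".

```python
def keyword_grader(question: str, doc_text: str) -> str:
    # Hypothetical heuristic: 'yes' if any question word longer than four letters appears in the chunk
    words = {w.strip("?.,!").lower() for w in question.split() if len(w.strip("?.,!")) > 4}
    text = doc_text.lower()
    return "yes" if any(w in text for w in words) else "no"


def filter_and_route(question, docs, grader=keyword_grader):
    # Keep only the chunks the grader marks as relevant
    relevant = [d for d in docs if grader(question, d) == "yes"]
    # Same rule as grade_documents: one irrelevant chunk is enough to trigger web search
    run_web_search = "Yes" if len(relevant) < len(docs) else "No"
    # Same rule as decide_to_generate
    next_node = "transform_query" if run_web_search == "Yes" else "generate"
    return relevant, next_node


docs = ["Task difficulty increases reliance on advice.", "The authors work at FERNFH."]
print(filter_and_route("How does task difficulty affect reliance?", docs))
```

Note the design consequence this exposes: a single off-topic chunk among otherwise relevant ones still sends the run through query rewriting and web search.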
    

#### Build the graph

In [16]:
import pprint

from langgraph.graph import END, StateGraph

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate
workflow.add_node("transform_query", transform_query)  # transform_query
workflow.add_node("web_search", web_search)  # web search

# Build graph
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "transform_query": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "web_search")
workflow.add_edge("web_search", "generate")
workflow.add_edge("generate", END)

# Compile
app = workflow.compile()
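The compiled graph follows a fixed topology: `retrieve` -> `grade_documents`, then a conditional edge chosen by `decide_to_generate`, and finally `generate` -> `END`. The plain-Python sketch below walks the same topology with stub nodes so both branches can be traced without API calls; `run_graph` and the lambda stubs are hypothetical and only imitate what LangGraph does when the app runs.

```python
def run_graph(nodes, edges, conditional, entry, state, end="END"):
    """Walk the graph from `entry`, following fixed edges or a router, until `end`."""
    visited = []
    current = entry
    while current != end:
        visited.append(current)
        state = nodes[current](state)  # each node returns the updated state
        if current in conditional:
            router, mapping = conditional[current]
            current = mapping[router(state)]  # conditional edge
        else:
            current = edges[current]  # fixed edge
    return state, visited


# Hypothetical stub nodes that mimic the shape of the real ones
nodes = {
    "retrieve": lambda s: {**s, "documents": ["chunk-1", "chunk-2"]},
    "grade_documents": lambda s: {**s, "run_web_search": s["simulate_irrelevant"]},
    "transform_query": lambda s: {**s, "question": s["question"] + " (rephrased)"},
    "web_search": lambda s: {**s, "documents": s["documents"] + ["web-result"]},
    "generate": lambda s: {**s, "generation": f"answer from {len(s['documents'])} docs"},
}
edges = {
    "retrieve": "grade_documents",
    "transform_query": "web_search",
    "web_search": "generate",
    "generate": "END",
}
conditional = {
    "grade_documents": (
        lambda s: "transform_query" if s["run_web_search"] == "Yes" else "generate",
        {"transform_query": "transform_query", "generate": "generate"},
    )
}

final, path = run_graph(nodes, edges, conditional, "retrieve",
                        {"question": "q", "simulate_irrelevant": "Yes"})
print(path)  # ['retrieve', 'grade_documents', 'transform_query', 'web_search', 'generate']
print(final["generation"])  # answer from 3 docs
```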

#### Run the graph

In [17]:
# Run
question = "How can we understand the factors that influence people's decisions when they are using LLMs?"
inputs = {"keys": {"question": question}}
print(f"Question: {question}\n")
for output in app.stream(inputs):
    for key, value in output.items():
        # Each streamed item maps a node name to that node's output; print a blank separator line
        print()

# Final generation: the last streamed value carries the generated answer
answer = value["keys"]["generation"]
print(f"Answer: {answer}")

Question: How can we understand the factors that influence people's decisions when they are using LLMs?

---RETRIEVE---


---CHECK RELEVANCE---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---DECIDE TO GENERATE---
---DECISION: GENERATE---

---GENERATE---

Answer: Understanding the factors that influence people's decisions when using LLMs involves awareness of determinants and their interdependencies, empowering decision-makers to improve decision quality and mitigate risks like over-reliance on LLMs. Users can enhance decision quality by formulating precise queries to LLMs, critically assessing LLM-generated output, and leveraging advantages through knowledge of influencing factors. Incorporating this understanding into training initiatives for organizations can enhance personnel development strategies and enable designers to create more tailored interfaces for users.


# GRAPHS

In [18]:
# Initialize the model
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0.5, model_name="gpt-3.5-turbo")

In [19]:
from langchain_experimental.graph_transformers import LLMGraphTransformer
llm_transformer = LLMGraphTransformer(llm=llm)

In [20]:
# Extract nodes and relationships from every chunk (one LLM call per chunk)
graph_documents = llm_transformer.convert_to_graph_documents(docs)


In [21]:
print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")

Nodes:[Node(id='Eva Eigner', type='Person'), Node(id='Thorsten Händler', type='Person'), Node(id='Ferdinand Porsche Mobile University Of Applied Sciences (Fernfh)', type='Organization'), Node(id='Wiener Neustadt', type='Location')]
Relationships:[Relationship(source=Node(id='Eva Eigner', type='Person'), target=Node(id='Ferdinand Porsche Mobile University Of Applied Sciences (Fernfh)', type='Organization'), type='AFFILIATED_WITH'), Relationship(source=Node(id='Thorsten Händler', type='Person'), target=Node(id='Ferdinand Porsche Mobile University Of Applied Sciences (Fernfh)', type='Organization'), type='AFFILIATED_WITH'), Relationship(source=Node(id='Ferdinand Porsche Mobile University Of Applied Sciences (Fernfh)', type='Organization'), target=Node(id='Wiener Neustadt', type='Location'), type='LOCATED_IN')]
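Each graph document is a set of (source, relationship, target) triples. As a plain-Python sketch of why this structure is useful, the triples printed above can be indexed so that multi-hop questions become simple lookups; once the graph is loaded into Neo4j (LangChain's `Neo4jGraph` offers `add_graph_documents` for that), the same questions would be Cypher pattern matches. The helper names `build_index` and `sources_of` are hypothetical, and the triples are copied by hand from the printed output.

```python
from collections import defaultdict

FERNFH = "Ferdinand Porsche Mobile University Of Applied Sciences (Fernfh)"

# (source, relationship, target) triples from the first graph document above
triples = [
    ("Eva Eigner", "AFFILIATED_WITH", FERNFH),
    ("Thorsten Händler", "AFFILIATED_WITH", FERNFH),
    (FERNFH, "LOCATED_IN", "Wiener Neustadt"),
]


def build_index(triples):
    """Map (target, relationship) to the list of source node ids pointing at it."""
    index = defaultdict(list)
    for source, rel, target in triples:
        index[(target, rel)].append(source)
    return index


def sources_of(index, rel, target):
    # .get avoids inserting empty entries into the defaultdict
    return sorted(index.get((target, rel), []))


index = build_index(triples)

# One hop: who is affiliated with FERNFH?
print(sources_of(index, "AFFILIATED_WITH", FERNFH))  # ['Eva Eigner', 'Thorsten Händler']

# Two hops: who is affiliated with an organization located in Wiener Neustadt?
orgs = sources_of(index, "LOCATED_IN", "Wiener Neustadt")
print(sorted(p for org in orgs for p in sources_of(index, "AFFILIATED_WITH", org)))
```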


In [22]:
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector
from neo4j import GraphDatabase
# Next, we need to define Neo4j credentials

NEO4J_USER = os.getenv("NEO4J_USER")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
NEO4J_CONNECTION_URL = os.getenv("NEO4J_CONNECTION_URL")


In [23]:
# Instantiate Neo4j vector from documents
neo4j_vector = Neo4jVector.from_documents(
    docs,
    OpenAIEmbeddings(),
    url=os.environ["NEO4J_CONNECTION_URL"],
    username=os.environ["NEO4J_USER"],
    password=os.environ["NEO4J_PASSWORD"]
)



In [24]:
# After ingesting the documents into the vector index,
# run a vector similarity search for a sample user query and retrieve the top 2 most similar chunks.

query = "Who's the author of DETERMINANTS OF LLM-ASSISTED DECISION-MAKING?"
vector_results = neo4j_vector.similarity_search(query, k=2)
for i, res in enumerate(vector_results):
    print(res.page_content)
    if i != len(vector_results)-1:
        print()
vector_result = vector_results[0].page_content

DETERMINANTS OF LLM- ASSISTED DECISION -MAKING
Eva Eigner and
 Thorsten Händler
Ferdinand Porsche Mobile University of Applied Sciences (FERNFH)
Wiener Neustadt, Austria
eva.eigner@fernfh.ac.at; thorsten.haendler@fernfh.ac.at
ABSTRACT
Decision-making is a fundamental capability in everyday life. Large Language Models
(LLMs) provide multifaceted support in enhancing human decision-making processes.
However, understanding the influencing factors of LLM-assisted decision-making is crucial
for enabling individuals to utilize LLM-provided advantages and minimize associated risks
in order to make more informed and better decisions. This study presents the results of a
comprehensive literature analysis, providing a structural overview and detailed analysis of
determinants impacting decision-making with LLM support. In particular, we explore the
effects of technological aspects of LLMs, including transparency and prompt engineering,

decision-specific determinants of LLM-assisted decision-maki