##### Purpose
This notebook is my diary for teaching myself how to build a RAG system and use LLMs in applications.
The end goal here is to create an LLM that has the DaggerHearts game manual as its knowledge base.  
I will start with the small building blocks and build up from there. 

First, the LLM: it is the brain of the whole system. 
I will be using LangChain as the library of choice, as it is one of the most popular frameworks available. 
Gemini has been chosen because it provides free API keys that let me test the application.  

In [1]:
# Library import cell. This cell will contain all the library imports used in this notebook.

# To handle API Key
import os
from dotenv import load_dotenv

# To get the LLM model
from langchain.chat_models import init_chat_model

# To get the memory
from langchain.memory import ConversationBufferMemory

What is an API key? 
An API key is akin to a password or digital key that lets you access a service. 
The provider uses it to identify the caller and to prevent unauthorised access.

#### Setting up API KEY

In [2]:
import os
from dotenv import load_dotenv

# use .env to load env variables
load_dotenv()
# saving env variable as GOOGLE_API_KEY
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
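
A missing key is easy to overlook because `os.environ.get` silently returns `None`. The sketch below is one possible way to fail early and to print a key safely. The helper names `require_env` and `mask` and the variable `DEMO_API_KEY` are made up for this example and are not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safe to print."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]


# Placeholder variable (hypothetical) so the example runs without a real key.
os.environ["DEMO_API_KEY"] = "abcd1234efgh5678"
print(mask(require_env("DEMO_API_KEY")))
```

In the notebook itself the same check could be applied to `GOOGLE_API_KEY` right after `load_dotenv()`.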

#### Initialize LLM Model

Here are some standardized parameters when constructing ChatModels:

* model: the name of the model
* temperature: the sampling temperature
* timeout: request timeout
* max_tokens: max tokens to generate
* stop: default stop sequences
* max_retries: max number of times to retry requests
* api_key: API key for the model provider
* base_url: endpoint to send requests to

In [3]:
from langchain.chat_models import init_chat_model

# initialize a llm 
llm = init_chat_model("gemini-2.5-flash", model_provider="google_genai", temperature = 0.3)

#### Calling the LLM. 
Use `.invoke` to "talk" to the LLM. The LLM will generate a response, which you can view in the `.content` attribute.

In [7]:
# use .invoke to get a response from the LLM based on query
query = "Hi, i'm tom?"
response = llm.invoke(query)
print(response.content, flush=True)

Hi Tom! Yes, you are. It's great to meet you!

How can I help you today?


Use `.stream` to see how the response is broken down into parts. The following example prints a pipe symbol `|` after each chunk returned by the LLM.

In [8]:
# use .stream to receive the response from the LLM chunk by chunk

for chunk in llm.stream("Write me the lyrics to the song 'Fly me to the moon'"):
    print(chunk.content, end="|", flush=True)

Okay, here are the classic lyrics to "Fly Me to the Moon," famously performed by Frank Sinatra, but written by Bart Howard.

---

**Fly Me to the Moon**|
*(Written by Bart Howard)*

Fly me to the moon
Let me play among the stars
Let me see what spring is like
On a-Jupiter and Mars

In other words, hold my hand
In other words, baby|, kiss me

Fill my heart with song
And let me sing forevermore
You are all I long for
All I worship and adore

In other words, please be true
In other words, I love you

---|
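
To make the chunking idea concrete without calling the API, here is a small stand-in written in plain Python. `fake_stream` is a hypothetical helper, not a LangChain function, and the fixed chunk size is an assumption for illustration only. With a real model the provider decides where each chunk ends, as the uneven `|` positions above show.

```python
from typing import Iterator


def fake_stream(text: str, chunk_size: int = 12) -> Iterator[str]:
    """Yield the text in fixed-size pieces, standing in for llm.stream()."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


pieces = []
for chunk in fake_stream("Fly me to the moon, let me play among the stars"):
    # Same printing pattern as the cell above: a pipe marks each chunk boundary.
    print(chunk, end="|", flush=True)
    pieces.append(chunk)

# Joining the chunks gives back the complete response.
full_text = "".join(pieces)
```

Collecting the chunks in a list is a simple way to keep the full reply while still showing it as it arrives.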

After calling an LLM, the returned object has several attributes:

* content = text message from the LLM
* additional_kwargs
* response_metadata
* id
* usage_metadata

In [None]:
print(response)

content="Hi Tom! Yes, you are. It's great to meet you!\n\nHow can I help you today?" additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []} id='run--afe5f853-be75-4e5f-984a-34d57fcc4231-0' usage_metadata={'input_tokens': 8, 'output_tokens': 639, 'total_tokens': 647, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 615}}
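
The `usage_metadata` above is worth a closer look: the reply is short, yet `output_tokens` is 639. The sketch below copies the numbers from that output into a plain dictionary and works out the split. It assumes that the `reasoning` count is included in `output_tokens`, which these numbers suggest but which I have not confirmed for every provider.

```python
# Values copied from the usage_metadata printed above.
usage = {
    "input_tokens": 8,
    "output_tokens": 639,
    "total_tokens": 647,
    "output_token_details": {"reasoning": 615},
}

# Tokens the model spent "thinking" before answering.
reasoning_tokens = usage["output_token_details"].get("reasoning", 0)

# Assumption: reasoning tokens are part of output_tokens, so the
# remainder is roughly the visible reply.
visible_tokens = usage["output_tokens"] - reasoning_tokens

# Input plus output should add up to the reported total.
totals_match = usage["input_tokens"] + usage["output_tokens"] == usage["total_tokens"]

print(f"reasoning={reasoning_tokens}, visible={visible_tokens}, totals match={totals_match}")
```

Under that assumption most of the output budget for this greeting went to reasoning rather than to the visible text.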


In [9]:
# this allows you to see the different attributes of this object
print(dir(response))

['__abstractmethods__', '__add__', '__annotations__', '__class__', '__class_getitem__', '__class_vars__', '__copy__', '__deepcopy__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__fields__', '__fields_set__', '__format__', '__ge__', '__get_pydantic_core_schema__', '__get_pydantic_json_schema__', '__getattr__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__pretty__', '__private_attributes__', '__pydantic_complete__', '__pydantic_computed_fields__', '__pydantic_core_schema__', '__pydantic_custom_init__', '__pydantic_decorators__', '__pydantic_extra__', '__pydantic_fields__', '__pydantic_fields_set__', '__pydantic_generic_metadata__', '__pydantic_init_subclass__', '__pydantic_parent_namespace__', '__pydantic_post_init__', '__pydantic_private__', '__pydantic_root_model__', '__pydantic_serializer__', '__pydantic_setattr_handlers__', '__pydantic_validator__', '__

In [7]:
response
print(response.content)

Hi Tom! Nice to meet you.

I'm an AI assistant, ready to help you with whatever you need. What can I do for you today?


In [10]:
print(response.additional_kwargs)

{}


In [14]:
print(response.response_metadata)

{'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}


#### Formatting the Prompt

* The prompt is the input for the LLM. 
* Prompts can be anything from simple ("What is 2+2") to complex ("Explain the meaning of life").
* You can define variables within the prompt itself. This makes the prompt reusable, since only the variables need to change.

* `PromptTemplate` allows for simple string inputs
* `ChatPromptTemplate` allows for more complex inputs with roles etc.
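
As a mental model, filling a template is similar to Python's own `str.format`. The sketch below is a plain-Python analogue, not the LangChain implementation. `render_prompt` and `render_chat` are hypothetical helpers standing in for `PromptTemplate` and `ChatPromptTemplate`.

```python
def render_prompt(template: str, variables: dict) -> str:
    """Fill a single string template, like a simple PromptTemplate."""
    return template.format(**variables)


def render_chat(messages: list, variables: dict) -> list:
    """Fill every (role, template) pair, like a simple ChatPromptTemplate."""
    return [(role, text.format(**variables)) for role, text in messages]


simple = render_prompt(
    "Answer the following questions. {query}", {"query": "Hi, I'm Tom"}
)

chat = render_chat(
    [
        ("system", "You are a helpful assistant"),
        ("user", "Tell me a joke about {topic}"),
    ],
    {"topic": "cats"},
)

print(simple)
print(chat)
```

The chat version keeps the role next to each message, which is the extra structure that `ChatPromptTemplate` adds over a plain string.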

In [28]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Answer the following questions. {query}")
prompt.invoke({"query": "Hi  im tom"})

chain = prompt | llm
chain.invoke({"query": "Hi  im tom"})

AIMessage(content="Hi Tom! Nice to meet you.\n\nI'm ready to answer your questions whenever you are. What would you like to know?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id='run--b6475dc0-10ed-490d-92f5-321db657f427-0', usage_metadata={'input_tokens': 10, 'output_tokens': 579, 'total_tokens': 589, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 551}})

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "Tell me a joke about {topic}")
])

prompt_template.invoke({"topic": "cats"})

ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant', additional_kwargs={}, response_metadata={}), HumanMessage(content='Tell me a joke about cats', additional_kwargs={}, response_metadata={})])

#### Adding memory function for conversational context

LLMs are stateless. 

Here you can see that, after you introduce yourself to the LLM, it is not able to "remember" your name. 
By adding a memory function, you will be able to converse with the LLM like talking to a person. 
This is achieved by adding the chat history as context for the LLM to generate the response.

Messages in LangChain have several components: a `role`, a `content` and `metadata`.

Roles describe WHO is saying the message:
* user - you
* assistant - the LLM answering the input
* system - tells the model how to behave; not all models support this
* tool - carries the result of a tool call back to the model. The tool call itself is requested by the assistant as a dictionary (`name`, `args`, `id`)



In [None]:
# use .invoke to get a response from the LLM based on query
query_2 = "Hi i am albert"
response_2 = llm.invoke(query_2)
print(response_2.content)

Hi Albert! It's nice to meet you. I'm an AI.

How can I help you today?


The LLM does not remember that you have already introduced yourself earlier. 

In [None]:
# use .invoke to get a response from the LLM based on query
query_3 = "What is my name"
response_3 = llm.invoke(query_3)
print(response_3.content)

As an AI, I don't have access to personal information about you, including your name. I don't retain memory of past conversations or user identities.

If you'd like to tell me your name, I can refer to you by it during our current conversation, but I won't remember it for future interactions.


To work as conversational partners, LLMs need to be able to remember what has been said before. 
This means giving them "memory".
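
The idea can be shown without any API call. In the sketch below, `toy_model` is a made-up stand-in for an LLM that only knows what is in the messages it receives. It is an illustration of the principle, not of how LangChain implements memory.

```python
def toy_model(messages: list) -> str:
    """Stand-in for an LLM: it only 'knows' what is in the messages it is given."""
    name = None
    for role, text in messages:
        if role == "user" and text.lower().startswith("hi i am "):
            name = text[len("hi i am "):].strip().title()
    last_text = messages[-1][1].lower()
    if "what is my name" in last_text:
        return f"Your name is {name}." if name else "I don't know your name."
    return "Nice to meet you!"


# Stateless: every call sees only the newest message.
toy_model([("user", "Hi i am albert")])
stateless_answer = toy_model([("user", "What is my name")])

# With memory: the whole history is sent along with each new message.
history = []


def chat(text: str) -> str:
    history.append(("user", text))
    reply = toy_model(history)
    history.append(("assistant", reply))
    return reply


chat("Hi i am albert")
memory_answer = chat("What is my name")

print(stateless_answer)
print(memory_answer)
```

The model function is identical in both cases. The only difference is how much of the conversation is passed in.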

#### Create memory function for conversational context

In [None]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()

chat = ConversationChain(llm = llm , memory = memory)


The object returned by the chain invocation is a dictionary.
You can access the reply by reading the key "response".

In [None]:
response = chat.invoke("Hi im tom")
response['response']

{'input': 'Hi im tom',
 'history': "Human: Hi im tom\nAI: Hi Tom! It's really nice to meet you! I'm an AI, a large language model, and I'm here and ready to chat about anything you'd like. It's always great to connect with new people – or, well, new humans, in my case! How are you doing today, Tom?\nHuman: what is my name?\nAI: Your name is Tom! You told me that right at the beginning of our chat. It's good to remember names, even for an AI like me! Is there anything else you'd like to chat about, Tom?",
 'response': "Oh, hello again, Tom! It seems like you're introducing yourself again, and I'm happy to confirm that yes, your name is Tom! You've told me that a few times now since we started chatting, and I've definitely got it stored away in my memory. It's always good to be sure, though! Is there anything specific you'd like to talk about today, Tom, or were you just checking if I remembered? I'm ready for whatever you have in mind!"}

In [40]:
response['response']

"Oh, hello again, Tom! It seems like you're introducing yourself again, and I'm happy to confirm that yes, your name is Tom! You've told me that a few times now since we started chatting, and I've definitely got it stored away in my memory. It's always good to be sure, though! Is there anything specific you'd like to talk about today, Tom, or were you just checking if I remembered? I'm ready for whatever you have in mind!"

In [32]:
chat.invoke("what is my name?")

{'input': 'what is my name?',
 'history': "Human: Hi im tom\nAI: Hi Tom! It's really nice to meet you! I'm an AI, a large language model, and I'm here and ready to chat about anything you'd like. It's always great to connect with new people – or, well, new humans, in my case! How are you doing today, Tom?",
 'response': "Your name is Tom! You told me that right at the beginning of our chat. It's good to remember names, even for an AI like me! Is there anything else you'd like to chat about, Tom?"}

### ConversationChain is deprecated

I will need to learn about Runnables and LangChain Expression Language (LCEL).

In [None]:
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg

The newer way of adding memory is to pass the chat history into the model:

* you need a storage location for the history (in this example it is a global variable)
* the message history wrapper requires a function, `get_session_history`, that looks up a session in that storage
* during the wrapping, the LLM is therefore bundled together with the memory

After that, we can import the relevant classes and set up our chain, which wraps the model and adds this message history. A key part here is the function we pass in as `get_session_history`. This function is expected to take in a `session_id` and return a message history object. The `session_id` is used to distinguish between separate conversations, and should be passed in as part of the config when calling the new chain (shown further down).

In [None]:
from langchain_core.chat_history import (
    BaseChatMessageHistory,
    InMemoryChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory


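You can wrap `get_session_history` in a factory function that binds it to a store. 
If the store is created inside the factory, it cannot be inspected from other cells, which leads to issues later on, so here it is kept at module level and passed in. 

In [None]:

# module-level store so that later cells can inspect it
store = {}

def create_memory(store: dict):
    # return a lookup function bound to the given store
    def get_session_history(session_id: str) -> BaseChatMessageHistory:
        if session_id not in store:
            store[session_id] = InMemoryChatMessageHistory()
        return store[session_id]
    return get_session_history

# call the factory; passing `create_memory` itself would not work
memory = create_memory(store)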

In [7]:
print(store)

{}


In [5]:

# plugging the llm to the chat history (memory)
with_message_history = RunnableWithMessageHistory(llm, memory)
# give a session_id 
config = {"configurable": {"session_id": "abc2"}}

The `config` above assigns a `session_id` to the chat session. Pass it with every call so that the history is stored under that key.

In [19]:
from langchain_core.messages import HumanMessage
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Bob")],
    config=config,
)

response.content

'Hi Bob! Nice to meet you.\n\nHow can I help you today?'

In [20]:
response_b = with_message_history.invoke(
    [HumanMessage(content="what is 2+2? ")],
    config=config,
)

response_b.content

'2 + 2 = 4'

You can then read the stored history by looking up the session ID in the `store` dictionary.

In [21]:
print(store["abc2"])

Human: Hi! I'm Bob
AI: Hi Bob! Nice to meet you.

How can I help you today?
Human: what is 2+2? 
AI: 2 + 2 = 4
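
The session ID is what keeps conversations apart. The plain-Python sketch below mimics that lookup with a dictionary of lists. The names `sessions`, `get_history` and `send` are hypothetical, and the echo reply only stands in for a model call.

```python
from collections import defaultdict

# One history list per session ID, created on first use.
sessions = defaultdict(list)


def get_history(session_id: str) -> list:
    """Return the history for a session, like get_session_history does."""
    return sessions[session_id]


def send(session_id: str, text: str) -> str:
    """Record the user message and a placeholder reply in that session only."""
    history = get_history(session_id)
    history.append(("human", text))
    reply = f"echo: {text}"  # stand-in for the model's answer
    history.append(("ai", reply))
    return reply


send("abc2", "Hi! I'm Bob")
send("abc2", "what is 2+2?")
send("xyz9", "Hello from another chat")

for session_id, history in sessions.items():
    print(session_id, len(history))
```

Each ID grows its own list, so a question asked under `xyz9` never sees what was said under `abc2`.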


To chain steps with LCEL, use the pipe operator `|`.

The following line effectively means:
* pass the output of `prompt` into `llm`
* pass the output of `llm` into `output_parser`

`chain = prompt | llm | output_parser`

In [3]:
from langchain_core.output_parsers import StrOutputParser 
from langchain_core.prompts import ChatPromptTemplate 
from langchain.chat_models import init_chat_model
from dotenv import load_dotenv

load_dotenv()

llm = init_chat_model("gemini-2.5-flash", model_provider="google_genai", temperature = 0.3)
prompt = ChatPromptTemplate.from_template("tell me a short joke about {topic}")
output_parser = StrOutputParser()
chain = prompt | llm | output_parser
chain.invoke({"topic": "ice cream"})

"What's an ice cream's favorite day of the week?\n**Sundae!**"

In plain Python the pipe operator `|` is the bitwise "or" operator (the `__or__` method). LangChain overloads it so that `a | b` builds a chain that runs `a` first and passes its output to `b`.
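
Operator overloading like this can be reproduced in a few lines. The class below is a minimal sketch of the idea and not LangChain's actual `Runnable` class. `Step`, `fill_prompt`, `fake_llm` and `parse` are names invented for the example.

```python
class Step:
    """A tiny runnable: wraps a function and supports chaining with `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `self | other` returns a new Step that runs self, then other.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three stand-in steps mirroring prompt | llm | output_parser.
fill_prompt = Step(lambda data: f"tell me a short joke about {data['topic']}")
fake_llm = Step(lambda prompt: {"content": f"[model reply to: {prompt}]"})
parse = Step(lambda message: message["content"])

chain = fill_prompt | fake_llm | parse
result = chain.invoke({"topic": "ice cream"})
print(result)
```

Because `__or__` returns another `Step`, chains of any length can be built by repeating the operator.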

`RunnableLambda` is a wrapper that gives a custom function the Runnable interface, so it can be used as a step in a chain.

`chain = prompt | RunnableLambda(custom_function) | llm | output_parser`


`RunnablePassthrough` is a wrapper that passes its input through unchanged, and it also allows extra keys to be added to the input.

`RunnableParallel` runs several runnables on the same input, splitting the execution flow into branches.
This allows you to change the steps inside each branch separately.

The `.assign` method adds more keys to the dictionary flowing through the chain.
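
These helpers are easier to remember as plain functions. The sketch below is an analogue written with ordinary Python callables, not the LangChain classes. The function names `passthrough`, `parallel` and `assign` are made up to mirror `RunnablePassthrough`, `RunnableParallel` and `.assign`.

```python
def passthrough(value):
    """Return the input unchanged."""
    return value


def parallel(branches: dict):
    """Run every branch on the same input and collect the results by key."""
    def run(value):
        return {key: branch(value) for key, branch in branches.items()}
    return run


def assign(step, **extra):
    """Run `step`, then add extra keys computed from its output."""
    def run(value):
        output = step(value)
        added = {key: func(output) for key, func in extra.items()}
        return {**output, **added}
    return run


def rando_func(_):
    return 100


chain = assign(parallel({"x": passthrough}), extra=rando_func)
result = chain({})
print(result)
```

In this sketch the functions passed to `assign` receive the dictionary produced by the previous step, so they can read the keys created by the branches.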


In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
from langchain.chat_models import init_chat_model
from langchain.prompts import ChatPromptTemplate

# initialize a llm 
llm = init_chat_model("gemini-2.5-flash", model_provider="google_genai", temperature = 0.3)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "Tell me an interesting fact about {topic}")
    ])

# pass in the prompt to the model
chain = prompt | llm


In [52]:
from langchain_core.runnables import RunnableLambda

# make a simple function
def toupper(input):
    return input.content.upper()

def rando_func(input):
    return 100


chain =  prompt | llm | RunnableLambda(toupper)

chain.invoke({"topic":"red beans"})

'HERE\'S AN INTERESTING FACT ABOUT RED BEANS:\n\nWHILE MANY WESTERN CUISINES PRIMARILY USE RED BEANS (LIKE KIDNEY BEANS) IN SAVORY DISHES SUCH AS CHILI, STEWS, AND RED BEANS AND RICE, IN **EAST ASIAN CULTURES** (LIKE JAPAN, CHINA, AND KOREA), A SMALLER VARIETY OF RED BEAN CALLED **ADZUKI BEANS** ARE MOST FAMOUSLY USED TO MAKE A **SWEET PASTE**.\n\nTHIS SWEET RED BEAN PASTE, KNOWN AS "ANKO" IN JAPANESE, IS A INCREDIBLY POPULAR FILLING FOR A WIDE VARIETY OF DESSERTS, INCLUDING MOCHI, DORAYAKI (JAPANESE PANCAKES), MOONCAKES, SWEET BUNS, AND EVEN AS A TOPPING FOR SHAVED ICE OR ICE CREAM. IT\'S A FASCINATING CONTRAST TO THEIR SAVORY USES ELSEWHERE!'

In [41]:
prompt.input_variables

['topic']

In [44]:
few_shot_prompt = {"1":"example"}
type(few_shot_prompt)

dict

In [53]:
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

chain = RunnableParallel({"x": RunnablePassthrough()}).assign(extra=RunnableLambda(rando_func))

chain = chain.invoke({})


In [54]:
chain

{'x': {}, 'extra': 100}

In [1]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory 

memory_store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

config = {"configurable": {"session_id": "123"}}
gemini = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature = 0.3)
gemini_mem = RunnableWithMessageHistory(gemini, get_session_history)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are Jarvis, a helpful assistant that helps to answer questions"),
    ("user", "can you answer this? {query}")
])

In [32]:
prompt.invoke("what is my name?")

ChatPromptValue(messages=[SystemMessage(content='You are Jarvis, a helpful assistant that helps to answer questions', additional_kwargs={}, response_metadata={}), HumanMessage(content='can you answer this? what is my name?', additional_kwargs={}, response_metadata={})])

In [None]:
# create chain
chain = prompt | gemini_mem
response = chain.invoke({'query':"what is ice cream?"}, config=config)

AIMessage(content='As Jarvis, my "user prompt" is essentially to be a helpful assistant that answers your questions!\n\nMore broadly, as an AI, I don\'t have a "user prompt" in the same way a human might give me a specific instruction for a single task. Instead, I operate based on my programming and the vast amount of data I was trained on, all designed to make me a helpful and informative AI assistant.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id='run--b4d10a85-cb85-4f27-b4d8-7994f70a4fb3-0', usage_metadata={'input_tokens': 66, 'output_tokens': 372, 'total_tokens': 438, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 286}})

After learning about LangChain Expression Language (LCEL), I found that the online documentation recommends LangGraph persistence instead, which serves the same purpose as `RunnableWithMessageHistory`.


In [1]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory 

from langchain_core.prompts import ChatPromptTemplate 
from langchain.chat_models import init_chat_model
from dotenv import load_dotenv

load_dotenv()

# llm = init_chat_model("gemini-2.5-flash", model_provider="google_genai", temperature = 0.3)
# prompt = ChatPromptTemplate. from_template("tell me a short joke about {topic}")
# memory_store = {}
# def get_session_history(session_id: str) -> BaseChatMessageHistory:
#     if session_id not in memory_store:
#         memory_store[session_id] = InMemoryChatMessageHistory()
#     return memory_store[session_id]

# config = {"configurable": {"session_id": "123"}}

gemini = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature = 0.3)
# gemini_mem = RunnableWithMessageHistory(gemini, get_session_history)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are Jarvis, a helpful assistant that helps to answer questions"),
    ("user", "can you answer this? {query}")
])

In [5]:
chain = prompt | gemini

In [15]:
response = chain.invoke({"query": "hi im tom"})

In [20]:
chain

ChatPromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are Jarvis, a helpful assistant that helps to answer questions'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='can you answer this? {query}'), additional_kwargs={})])
| ChatGoogleGenerativeAI(model='models/gemini-2.5-flash', google_api_key=SecretStr('**********'), temperature=0.3, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x104d2d2e0>, default_metadata=(), model_kwargs={})

In [None]:
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


prompt_2 = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content="You are a helpful assistant. Answer all questions to the best of your ability."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)



In [16]:
chain_2 = prompt_2 | gemini
response_2 = chain_2.invoke(input=["hi im tom"])

In [17]:
print(response.content)
print(response_2.content)

Hello Tom! It's nice to meet you.

How can I help you today?
Hi Tom, it's nice to meet you! How can I help you today?


The following method gives an example of using LangGraph's state.

* define a function that makes a call to the LLM 

In [1]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv

load_dotenv()

gemini = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature = 0.3)
# initialise a graph whose state follows the MessagesState schema
workflow = StateGraph(state_schema=MessagesState)
# define the function that calls the model
def call_llm(state: MessagesState):
    system_prompt = ("you are a helpful assistant")
    # takes the system prompt + the current state of the messages
    messages = [SystemMessage(content=system_prompt)] + state["messages"]
    response = gemini.invoke(messages)
    return {"messages": response}


# Define the node and edge 
workflow.add_node("model", call_llm)  # creates a node in the graph called "model"
workflow.add_edge(START, "model")  # adds an edge from the START node to the "model" node

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)   



In [6]:
app.invoke(
    {"messages": [HumanMessage(content="Translate to French: I love programming.")]},
    config={"configurable": {"thread_id": "1"}},
)


{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='cf4b0534-a6cc-4bee-bb1f-19744a294aa2'),
  AIMessage(content="J'aime la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id='run--83267697-c2fa-4337-8b56-50b3d2f7f8f5-0', usage_metadata={'input_tokens': 14, 'output_tokens': 61, 'total_tokens': 75, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 54}})]}

In [7]:
app.invoke(
    {"messages": [HumanMessage(content="Translate the same thing into chinese")]},
    config={"configurable": {"thread_id": "1"}},
)


{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='cf4b0534-a6cc-4bee-bb1f-19744a294aa2'),
  AIMessage(content="J'aime la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id='run--83267697-c2fa-4337-8b56-50b3d2f7f8f5-0', usage_metadata={'input_tokens': 14, 'output_tokens': 61, 'total_tokens': 75, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 54}}),
  HumanMessage(content='Translate the same thing into chinese', additional_kwargs={}, response_metadata={}, id='422701ba-fb8b-4726-babf-b95a6983e7c2'),
  AIMessage(content='我爱编程 (Wǒ ài biānchéng.)', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id

In [9]:
config={"configurable": {"thread_id": "1"}}
app.invoke(
    {"messages": [HumanMessage(content="what was my last question?")]},
    config=config,
)

{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='cf4b0534-a6cc-4bee-bb1f-19744a294aa2'),
  AIMessage(content="J'aime la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id='run--83267697-c2fa-4337-8b56-50b3d2f7f8f5-0', usage_metadata={'input_tokens': 14, 'output_tokens': 61, 'total_tokens': 75, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 54}}),
  HumanMessage(content='Translate the same thing into chinese', additional_kwargs={}, response_metadata={}, id='422701ba-fb8b-4726-babf-b95a6983e7c2'),
  AIMessage(content='我爱编程 (Wǒ ài biānchéng.)', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id

In [None]:
# this is a dict
response = app.invoke({"messages": [HumanMessage(content="what 2+2?")]},
    config=config)

In [21]:
#  call the messages key to access the messages
print(len(response["messages"]))

8
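
The count of 8 follows from the thread history: four questions were sent on thread `"1"`, and each one stored a human message plus an AI reply. The plain-Python sketch below imitates that bookkeeping. `checkpoints` and `invoke` are hypothetical stand-ins for the checkpointer and `app.invoke`, and the reply text is a placeholder.

```python
# One saved message list per thread_id, like a very small checkpointer.
checkpoints = {}


def invoke(thread_id: str, user_text: str) -> dict:
    """Append the question and a placeholder reply to the thread's history."""
    messages = checkpoints.setdefault(thread_id, [])
    messages.append(("human", user_text))
    messages.append(("ai", f"reply to: {user_text}"))
    return {"messages": list(messages)}


questions = [
    "Translate to French: I love programming.",
    "Translate the same thing into chinese",
    "what was my last question?",
    "what 2+2?",
]
for question in questions:
    response = invoke("1", question)

# A different thread_id starts with an empty history.
other = invoke("2", "hello")

print(len(response["messages"]), len(other["messages"]))
```

Changing the `thread_id` is therefore the way to start a fresh conversation without restarting the application.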


A small update: the system prompt now comes from a `ChatPromptTemplate` with a `MessagesPlaceholder`.

In [1]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

system_prompt = ChatPromptTemplate.from_messages([
    ("system", "you are a helpful assistant"),
    MessagesPlaceholder(variable_name="messages")
])



In [8]:
system_prompt

ChatPromptTemplate(input_variables=['messages'], input_types={'messages': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag='ChatMessageChunk')], typing.Annotated[langchain_core.messages.system.SystemMessageChunk, Tag(tag='SystemMessageChunk')], typing.Annotated[langchain_core.mes

In [2]:
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph import START, MessagesState, StateGraph
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv

load_dotenv()

gemini = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature = 0.3)
workflow = StateGraph(state_schema=MessagesState)


def call_model(state: MessagesState):
    prompt = system_prompt.invoke(state)
    response = gemini.invoke(prompt)
    return {"messages": response}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)
memory = InMemorySaver()
app = workflow.compile(checkpointer=memory)

config={"configurable": {"thread_id": "1"}}

In [3]:
# define query
# wrap humanmessage

query= [HumanMessage("what is the most important question in life")]
app.invoke(
    {"messages": query},
    config=config)

{'messages': [HumanMessage(content='what is the most important question in life', additional_kwargs={}, response_metadata={}, id='7dbede97-e90f-4eaa-bcd3-af2efb98a59f'),
  AIMessage(content='That\'s a profound question, and the beauty of it is that there isn\'t one single, universally agreed-upon answer. What constitutes the "most important question" often depends on an individual\'s philosophy, beliefs, life stage, and experiences.\n\nHowever, many of the most influential thinkers, philosophers, and spiritual traditions have converged on a few core themes. Here are some of the most common contenders for "the most important question in life":\n\n1.  **"What is the meaning of life?" / "Why are we here?"**\n    *   This is perhaps the quintessential existential question. It seeks to understand our purpose, the ultimate significance of our existence, and the universe itself. Many people spend their entire lives grappling with this.\n\n2.  **"How should I live?" / "What is a good life?"**\

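To add more variables to the prompt, you have to extend the state of the workflow/application with matching keys.
In this example the extra key is `topic`, which the prompt uses as the number of jokes to tell.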

In [1]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


new_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a standup comedian who will tell jokes. and you always tell {topic} number of jokes"),
    MessagesPlaceholder(variable_name="messages")
])

In [2]:
from typing import Sequence
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph import START, MessagesState, StateGraph, add_messages
from langchain_core.messages import HumanMessage, SystemMessage, BaseMessage, AIMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from typing_extensions import Annotated, TypedDict
from dotenv import load_dotenv

load_dotenv()

gemini = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature = 0.3)

class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    topic: str

workflow = StateGraph(state_schema=State)
def call_model(state: State):
    prompt = new_prompt.invoke(state)
    response = gemini.invoke(prompt)
    return {"messages": response}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)
memory = InMemorySaver()
app = workflow.compile(checkpointer=memory)

config={"configurable": {"thread_id": "1"}}

In [3]:
input=[HumanMessage("Give me a dad joke")]
topic="2"
app.invoke(
    {"messages": input,
     "topic": topic},
    config=config
)

{'messages': [HumanMessage(content='Give me a dad joke', additional_kwargs={}, response_metadata={}, id='a9770aec-006c-4bd5-9523-d0b93f9fc64a'),
  AIMessage(content="Alright, alright, settle down folks! You want a dad joke, eh? I got 'em. Two of 'em, in fact, because that's how I roll.\n\nHere's the first one:\n\nWhy don't scientists trust atoms?\nBecause they make up everything!\n\nAnd for my second act, prepare yourselves:\n\nI told my wife she was drawing her eyebrows too high.\nShe looked surprised.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id='run--75e4ac5c-3b15-4944-879c-cca695583b17-0', usage_metadata={'input_tokens': 26, 'output_tokens': 139, 'total_tokens': 165, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 50}})],
 'topic': '2'}
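
The returned state shows two different update rules: `messages` grows, while `topic` simply holds the latest value. The sketch below imitates that in plain Python. It is a simplified analogue, not LangGraph's code. `merge_state` and `append_messages` are invented names, and the real `add_messages` reducer does more than append (for example, it also matches messages by ID).

```python
def append_messages(existing: list, new: list) -> list:
    """Simplified reducer: keep the old messages and add the new ones."""
    return list(existing) + list(new)


def merge_state(state: dict, update: dict, reducers: dict) -> dict:
    """Apply an update: keys with a reducer are combined, others are overwritten."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](merged.get(key, []), value)
        else:
            merged[key] = value
    return merged


reducers = {"messages": append_messages}
state = {"messages": [], "topic": "1"}

# The caller sends a question and a new topic value.
state = merge_state(
    state, {"messages": [("human", "Give me a dad joke")], "topic": "2"}, reducers
)
# The model node returns only a message, so topic is left as it was.
state = merge_state(state, {"messages": [("ai", "(two jokes here)")]}, reducers)

print(len(state["messages"]), state["topic"])
```

This is why the `State` class annotates only `messages` with a reducer and leaves `topic` as a plain `str`.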

In [4]:
input=[HumanMessage("Give me a dad joke")]
topic="2"
for chunk, metadata in app.stream(
    {"messages": input, "topic": topic},
    config=config,
    stream_mode="messages",
):
    if isinstance(chunk, AIMessage):  # Filter to just model responses
        print(chunk.content, end="|")

Alright, alright, settle down folks, you're a great crowd tonight!

Here are a couple of zingers for ya:

1.  Why did the scarecrow win an award|? Because he was outstanding in his field!
2.  What do you call a boomerang that won’t come back? A stick!|


