# Conversation Memory for LLMs
First, we need to understand the difference between training an LLM and chatting with an LLM.
 - **Training an LLM**: This is like teaching the LLM. It learns from large datasets, a bit like reading many books, and the more it learns during training, the more capable it becomes. This guide calls that explicit memory (pre-trained data).
 - **Chatting with an LLM**: This is when we talk with the LLM. It uses what it learned during training to answer our questions or talk with us, but it doesn't remember our past chats unless we send the full chat history. This guide calls that implicit memory. By default, LLMs are stateless: each time we interact with one, we have to provide all the necessary information again. Currently, LLM chat APIs work this way; they need the full set of instructions every time.
 - **Remember**: LLMs don't remember our past chats automatically. We have to send the full instructions and context with each request if we want them to remember specific things.
 - Watch this video: https://youtu.be/MmSMAYooRas

Read more:
 - https://simonwillison.net/2024/May/29/training-not-chatting/
 - https://community.openai.com/t/does-the-open-ai-engine-with-gpt-4-model-remember-the-previous-prompt-tokens-and-respond-using-them-again-in-subsequent-requests/578148/7
 - https://community.openai.com/t/retain-past-responses-in-memory-without-sending-them-again-at-every-api-request/199647/12
 - https://community.openai.com/t/is-it-possible-to-reuse-previous-chat-history-on-the-openai-side-to-avoid-sending-repetitive-tokens/206137/6
 

First, initialize the connection to OpenAI:

In [3]:
import os
from openai import OpenAI
from dotenv import load_dotenv, find_dotenv

# Load the .env
load_dotenv(find_dotenv())
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

A Chat Completions API call in Python looks like the following. Let's ask a question:

In [16]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a helpful assistant. Your response is always in one sentence and concise."},
    {"role": "user", "content": "What is the most exciting recent development in artificial intelligence in one sentence?"},
  ]
)

print("Response:", response.choices[0].message.content)

Response: The recent development of transformer models such as GPT-3 has pushed the boundaries of natural language understanding and generation in AI.


Let's manually add the response to the list of message objects, along with a few more questions and answers. Finally, let's ask what our first question was:

In [30]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a helpful assistant. Your response is always in one sentence and concise."},
    {"role": "user", "content": "What is the most exciting recent development in artificial intelligence in one sentence?"},
    # Earlier replies are pasted back in as assistant messages.
    {"role": "assistant", "content": "The recent development of transformer models such as GPT-3 has pushed the boundaries "
     "of natural language understanding and generation in AI."},
    {"role": "user", "content": "How do large language models like GPT-4 generate human-like text?"},
    {"role": "assistant", "content": "Large language models like GPT-4 generate human-like text through unsupervised "
     "learning where they predict the next word in a sequence based on the context provided by the input text."},
    {"role": "user", "content": "What is my first question?"},  # ask what the first question was
  ]
)

print("Response:", response.choices[0].message.content)

Response: Your first question was about the most exciting recent development in artificial intelligence.


The model can answer this because every request to the Chat Completions endpoint includes the messages list, which contains the whole conversation so far.

> Including conversation history is important when user instructions refer to prior messages. Because the models have no memory of past requests, all relevant information must be supplied as part of the conversation history in each request. 

Read more: https://platform.openai.com/docs/guides/text-generation/chat-completions-api
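
One consequence of resending the history is that each request is larger than the previous one. The sketch below illustrates this offline, with no API call. It uses character counts as a rough stand-in for tokens, which is an assumption (real token counts depend on the model's tokenizer), and `payload_chars` and the sample turns are hypothetical names and data made up for the illustration.

```python
def payload_chars(messages):
    """Characters of content sent in one request (a rough proxy for tokens)."""
    return sum(len(m["content"]) for m in messages)


history = [{"role": "system", "content": "You are a helpful assistant."}]
# Hypothetical (user, assistant) pairs standing in for a real exchange.
turns = [
    ("Hello", "Hi! How can I help?"),
    ("Suggest a name for a cat", "How about Miso?"),
    ("What did I say first?", "You said hello."),
]

request_sizes = []
for user_text, assistant_text in turns:
    history.append({"role": "user", "content": user_text})
    request_sizes.append(payload_chars(history))  # what this turn's request carries
    history.append({"role": "assistant", "content": assistant_text})

for turn, size in enumerate(request_sizes, start=1):
    print(f"turn {turn}: {size} characters sent")
```

The numbers only ever go up, because nothing sent in an earlier turn is dropped from later requests.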


Let's create a helper that automatically appends our input and the LLM's response to the messages list:

In [50]:
import json  # Used later to pretty-print the messages list.

def get_completion(messages, prompt, model="gpt-3.5-turbo"):
    """Send the whole conversation plus a new prompt; record both sides in `messages`."""
    # Add the user prompt to the messages list (this mutates the caller's list).
    messages.append({"role": "user", "content": prompt})

    response = client.chat.completions.create(
        model=model,
        messages=messages,  # the full history goes out with every request
        temperature=0
    )
    reply = response.choices[0].message.content
    # Add the model's reply to the list so the next request includes it.
    messages.append({"role": "assistant", "content": reply})
    return reply

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    }
]

In [51]:
print("Response:", get_completion(messages, "My name is Aaron"))

Response: Nice to meet you, Aaron! How can I assist you today?


In [52]:
# Pretty-print the list of message objects.
print(json.dumps(messages, indent=4))

[
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "My name is Aaron"
    },
    {
        "role": "assistant",
        "content": "Nice to meet you, Aaron! How can I assist you today?"
    }
]


When making another request, we send the whole array back:

In [59]:
print("Response:", get_completion(messages, "What is my name?"))

Response: Your name is Aaron.


In [54]:
print(json.dumps(messages, indent=4))

[
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "My name is Aaron"
    },
    {
        "role": "assistant",
        "content": "Nice to meet you, Aaron! How can I assist you today?"
    },
    {
        "role": "user",
        "content": "What is my name?"
    },
    {
        "role": "assistant",
        "content": "Your name is Aaron."
    }
]
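
The helper above mutates a list that the caller owns. One alternative, sketched here, is to keep the history inside a small class. `Conversation`, `complete_fn`, and `echo_model` are hypothetical names introduced for this illustration. `complete_fn` stands for any callable that takes the message list and returns the reply text, for example a thin wrapper around `client.chat.completions.create`. The offline `echo_model` stand-in is not a real model; it is only there so the sketch runs without an API key.

```python
class Conversation:
    """Keeps the message history and resends all of it on every turn."""

    def __init__(self, complete_fn, system_prompt="You are a helpful assistant."):
        self.complete_fn = complete_fn
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, prompt):
        # Build the request from a new list, so a failed call leaves the history unchanged.
        pending = self.messages + [{"role": "user", "content": prompt}]
        reply = self.complete_fn(pending)
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply


def echo_model(messages):
    """Offline stand-in for a model: reports how many user messages it was sent."""
    user_turns = [m for m in messages if m["role"] == "user"]
    return f"I can see {len(user_turns)} user message(s)."


chat = Conversation(echo_model)
print(chat.ask("My name is Aaron"))
print(chat.ask("What is my name?"))
```

The second reply reports two user messages, which shows that the earlier turn was sent again rather than remembered by the stand-in.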


If we want to continue a conversation later, we have to save the list of messages somewhere, preferably in a database. When we start a new chat, the LLM won't remember anything unless we send the conversation history again. And if the history grows beyond what the model can accept in a single request (its context window), we have to trim it down somehow. Both are things we need to plan for.
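
Both concerns can be sketched with the standard library. The block below stores the message list as JSON in SQLite and trims older turns before a request. The table layout, the function names (`save_history`, `load_history`, `trim_history`), and the `max_messages` cutoff are all assumptions made for illustration. A real limit would more likely be based on token counts for the chosen model than on a message count.

```python
import json
import sqlite3

CREATE_TABLE = "CREATE TABLE IF NOT EXISTS conversations (id TEXT PRIMARY KEY, messages TEXT)"


def save_history(conn, conversation_id, messages):
    """Store the full message list for one conversation as a JSON string."""
    conn.execute(CREATE_TABLE)
    conn.execute(
        "INSERT OR REPLACE INTO conversations (id, messages) VALUES (?, ?)",
        (conversation_id, json.dumps(messages)),
    )
    conn.commit()


def load_history(conn, conversation_id):
    """Return the stored message list, or an empty list if nothing was saved."""
    conn.execute(CREATE_TABLE)
    row = conn.execute(
        "SELECT messages FROM conversations WHERE id = ?", (conversation_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []


def trim_history(messages, max_messages=6):
    """Keep all system messages plus the most recent non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


conn = sqlite3.connect(":memory:")  # pass a file path instead to persist across runs
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Aaron"},
    {"role": "assistant", "content": "Nice to meet you, Aaron! How can I assist you today?"},
]
save_history(conn, "demo", messages)
restored = load_history(conn, "demo")
print(restored == messages)
print(len(trim_history(restored, max_messages=2)))
```

Trimming by count is the simplest policy, but it discards information: once the message containing the user's name falls outside the window, the model can no longer answer "What is my name?".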