In [None]:
import tiktoken

# Each model tokenizes text slightly differently; the GPT tokenizer web page shows the split for a given model
encoding = tiktoken.encoding_for_model("gpt-4.1-mini")
tokens = encoding.encode("Hi my name is Rohit and I like Saag aloo")

In [2]:
tokens

[12194, 922, 1308, 382, 65416, 278, 326, 357, 1299, 8455, 348, 434, 3782]

``` 
Notice that "Saag aloo" was split into 4 tokens (" Sa", "ag", " al", "oo"), each with its own ID.

In [4]:
# Decode each token ID on its own to see the text fragment it stands for
for tid in tokens:
    t_text = encoding.decode([tid])
    print(f"{tid} = {t_text}")

12194 = Hi
922 =  my
1308 =  name
382 =  is
65416 =  Roh
278 = it
326 =  and
357 =  I
1299 =  like
8455 =  Sa
348 = ag
434 =  al
3782 = oo


# Illusion of memory
Every API call to an LLM is stateless: the model has no access to anything sent in earlier calls.

In [5]:
import os
from dotenv import load_dotenv

load_dotenv(override=True)
api_key=os.getenv("OPENAI_API_KEY")

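# Warn if the key is missing or is not a project key (those start with "sk-proj")
if not (api_key and api_key.startswith("sk-proj")):
    print("OpenAI key not found, or it does not start with 'sk-proj'")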

In [7]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I am Rohit Abhishek"}
]

In [9]:
from openai import OpenAI 
openai=OpenAI() 
response=openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)

response.choices[0].message.content

'Hello Rohit Abhishek! How can I assist you today?'

In [10]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What's my name?"}
]

response=openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)
response.choices[0].message.content

'I’m sorry, but I don’t know your name. Could you please tell me?'

``` 
Because each call is stateless, the model no longer knows my name: this request is a fresh run. If we send the earlier exchange along with the new question, the model can answer:

In [11]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Rohit Abhishek!"},
    {"role": "assistant", "content": "Hi Rohit Abhishek! How can I assist you today?"},
    {"role": "user", "content": "What's my name?"}
]

In [12]:
response=openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)
response.choices[0].message.content

'Your name is Rohit Abhishek. How can I help you further?'
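
Building the message list by hand gets tedious after a few turns. Below is a minimal sketch of one way to automate it, not the only way. The `Conversation` class, its `complete` hook, and `fake_complete` are hypothetical names introduced here for illustration. `fake_complete` is an offline stand-in for a model so the example runs without an API key; it can only "remember" what is present in the list it receives, which is the point.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running message list and resends all of it on every turn."""

    def __init__(self, complete: Callable[[List[Message]], str],
                 system_prompt: str = "You are a helpful assistant") -> None:
        # `complete` is a hypothetical hook: any function that takes the
        # full message list and returns the assistant's reply text.
        self._complete = complete
        self.messages: List[Message] = [
            {"role": "system", "content": system_prompt}
        ]

    def ask(self, user_text: str) -> str:
        # Append the new user turn, send the whole history, store the reply.
        self.messages.append({"role": "user", "content": user_text})
        reply = self._complete(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_complete(messages: List[Message]) -> str:
    """Offline stand-in for a model: it only sees what is in `messages`."""
    last = messages[-1]["content"]
    if "my name" not in last.lower():
        return "Hello! How can I assist you today?"
    # Search the earlier user turns for an introduction.
    for m in messages[:-1]:
        if m["role"] == "user" and "I am " in m["content"]:
            name = m["content"].split("I am ", 1)[1].rstrip("!. ")
            return f"Your name is {name}."
    return "I don't know your name."


chat = Conversation(fake_complete)
chat.ask("Hi! I am Rohit Abhishek")
print(chat.ask("What's my name?"))  # Your name is Rohit Abhishek.

# A fresh conversation has no history, so the name is gone.
print(Conversation(fake_complete).ask("What's my name?"))  # I don't know your name.
```

To talk to the real model instead, `complete` could be a small function that calls `openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)` and returns `response.choices[0].message.content`, exactly as in the cells above.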

## To recap

With apologies if this is obvious to you - but it's still good to reinforce:

1. Every call to an LLM is stateless
2. We pass in the entire conversation so far in the input prompt, every time
3. This gives the illusion that the LLM has memory - it apparently keeps the context of the conversation
4. But this is a trick; it's a by-product of providing the entire conversation, every time
5. An LLM just predicts the most likely next tokens in the sequence; if that sequence contains "My name is Ed" and later "What's my name?" then it will predict... Ed!

The ChatGPT product uses exactly this trick - every time you send a message, it's the entire conversation that gets passed in.

"Does that mean we have to pay extra each time for all the conversation so far?"

For sure it does. And that's what we WANT. We want the LLM to predict the next tokens in the sequence, looking back on the entire conversation. We want that compute to happen, so we need to pay the electricity bill for it!
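
To make the cost concrete, here is a rough sketch of how input tokens add up when the whole conversation is resent on every call. The function name `input_tokens_per_call` and the token counts are made up for illustration. The sketch assumes messages alternate user, assistant, user, and so on, that each user message triggers one call, and it ignores the system prompt and the few formatting tokens a real request adds per message.

```python
from typing import List


def input_tokens_per_call(message_tokens: List[int]) -> List[int]:
    """Return the input size of each API call in a growing conversation.

    `message_tokens` holds the token count of each message in order,
    alternating user, assistant, user, ... (an assumption of this sketch).
    """
    per_call = []
    running = 0
    for i, n in enumerate(message_tokens):
        running += n
        if i % 2 == 0:
            # Even positions are user messages; each one triggers a call
            # whose input is everything said so far.
            per_call.append(running)
    return per_call


# Hypothetical sizes: user messages of 10 tokens, replies of 20 tokens.
per_call = input_tokens_per_call([10, 20, 10, 20, 10])
print(per_call)       # [10, 40, 70]
print(sum(per_call))  # 120
```

Only 70 tokens of conversation exist by the third call, yet 120 input tokens have been sent in total, because earlier turns are sent again each time. Real counts for a message could be obtained with `len(encoding.encode(text))`, using the `tiktoken` encoding from the first cell.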

