# **Sliding Window**

## **Import necessary libraries**

In [1]:
import os
import sys
from pathlib import Path
cwd = os.getcwd()
new_cwd = str(Path(cwd).parents[1])  # project root, two levels up; works on Windows and POSIX
sys.path.append(new_cwd)
import base64
from langchain.messages import HumanMessage, SystemMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.callbacks import BaseCallbackHandler
from contextlib import asynccontextmanager, contextmanager
from contextvars import ContextVar
from pydantic import BaseModel
from typing import Literal
from typing import (
    TYPE_CHECKING,
    Any,
    AsyncGenerator,
    Dict,
    Generator,
    List,
    Optional,
    Sequence,
    Type,
    TypeVar,
    Union,
    cast,
)
from langchain_core.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder
)
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.runnables import ConfigurableFieldSpec
from dotenv import load_dotenv
load_dotenv()
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
from callbacks.manager import get_gemini_callback
from utils.helper import count_tokens
from message_history import BufferWindowMessageHistory

## **Initialize LLM model**

In [2]:
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",    # or gemini-2.0-pro, gemini-2.0-flash-lite, etc.
    api_key=GEMINI_API_KEY,
    temperature=0,
    include_thoughts=True,
)

In [3]:
with open(f"{new_cwd}/techniques/system_prompt.txt", "r", encoding="utf-8") as f:
    SYSTEM_PROMPT = f.read()

In [4]:
prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(SYSTEM_PROMPT),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])

# Create the LCEL pipeline
output_parser = StrOutputParser()
pipeline = prompt_template | llm | output_parser

# Let's examine the prompt template
print(prompt_template.messages[0].prompt.template)

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. 
If the AI does not know the answer to a question, it truthfully says it does not know.


## **Create conversation**

In [5]:
# Create get_chat_history function for window memory
window_chat_map = {}

def get_window_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    print(f"get_window_chat_history called with session_id={session_id} and k={k}")
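    # k only takes effect when the session is first created;
    # later calls with the same session_id reuse the existing history.
    if session_id not in window_chat_map:
        window_chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return window_chat_map[session_id]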

# Create conversation chain with window memory
conversation_bufw = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_window_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        )
    ]
)
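
`BufferWindowMessageHistory` is imported from a local `message_history` module that is not shown in this notebook. As a rough idea of what a window history does, here is a minimal, dependency-free sketch. The class name `SimpleWindowHistory`, the `(role, content)` tuple format, and the method names are hypothetical stand-ins, not the actual implementation; the only assumption taken from the notebook's log output is that the history keeps at most the last `k` messages and truncates after new messages are added.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stand-in for a chat message: (role, content).
Message = Tuple[str, str]


@dataclass
class SimpleWindowHistory:
    """Keeps only the most recent k messages (illustrative sketch)."""

    k: int = 4
    messages: List[Message] = field(default_factory=list)

    def add_messages(self, new_messages: List[Message]) -> None:
        # Append first, then drop the oldest messages beyond the window.
        self.messages.extend(new_messages)
        if len(self.messages) > self.k:
            # Slice from an explicit start index so that k=0 yields an empty list.
            self.messages = self.messages[len(self.messages) - self.k:]

    def clear(self) -> None:
        self.messages = []


history = SimpleWindowHistory(k=4)
for turn in range(1, 4):
    history.add_messages([("human", f"question {turn}"), ("ai", f"answer {turn}")])

# After three exchanges (6 messages), only the last 4 remain.
print([content for _, content in history.messages])
```

Truncating on write rather than on read keeps the stored list small, at the cost of permanently discarding older messages.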

In [6]:
K = 4

In [7]:
# Start a conversation with k=4 (only the last 4 messages, i.e. 2 exchanges, are kept)
result = count_tokens(
    conversation_bufw,
    {"query": "Good morning AI!"},
    config={"configurable": {"session_id": "window_example", "k": K}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Initializing BufferWindowMessageHistory with k=4
Spent a total of 159 tokens (prompt: 54, completion: 105)

Response: Good morning! It's a pleasure to be chatting with you today. I hope you're having a wonderful start to your day. Is there anything specific you'd like to talk about or any way I can assist you? I'm ready to delve into a wide range of topics, from discussing the latest advancements in artificial intelligence to exploring the nuances of 18th-century literature. I can also help you draft different kinds of creative content. Just let me know what's on your mind!


In [8]:
# Check what's in memory
bufw_history = window_chat_map["window_example"].messages
print(f"Buffer Window Memory (last {K} messages):")
for msg in bufw_history:
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Buffer Window Memory (last 4 messages):

Human: Good morning AI!

AI: Good morning! It's a pleasure to be chatting with you today. I hope you're having a wonderful start to your day. Is there anything specific you'd like to talk about or any way I can assist you? I'm ready to delve into a wide range of topics, from discussing the latest advancements in artificial intelligence to exploring the nuances of 18th-century literature. I can also help you draft different kinds of creative content. Just let me know what's on your mind!


In [9]:
query = """
"My interest here is to explore the potential of integrating Large Language
Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01"
"""

result = count_tokens(
    conversation_bufw,
    {"query": query},
    config={"configurable": {"session_id": "window_example", "k": K}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Spent a total of 669 tokens (prompt: 198, completion: 471)

Response: Understood! I'm very interested in exploring the integration of Large Language Models (LLMs) with external knowledge as well. It's a crucial area for improving the accuracy, reliability, and overall usefulness of LLMs.

As I understand it, there are several key approaches to achieving this integration, each with its own strengths and weaknesses:

*   **Retrieval-Augmented Generation (RAG):** This is a popular approach where the LLM first retrieves relevant information from an external knowledge source (like a vector database, a document store, or the web) and then uses that information to generate its response. This allows the LLM to access up-to-date information and avoid relying solely on its pre-trained knowledge, which can be stale or incomplete.
*   **Fine-tuning with External Knowledge:** This involves training the LLM on a dataset that inclu

In [10]:
# Check what's in memory
bufw_history = window_chat_map["window_example"].messages
print(f"Buffer Window Memory (last {K} messages):")
for msg in bufw_history:
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Buffer Window Memory (last 4 messages):

Human: Good morning AI!

AI: Good morning! It's a pleasure to be chatting with you today. I hope you're having a wonderful start to your day. Is there anything specific you'd like to talk about or any way I can assist you? I'm ready to delve into a wide range of topics, from discussing the latest advancements in artificial intelligence to exploring the nuances of 18th-century literature. I can also help you draft different kinds of creative content. Just let me know what's on your mind!

Human: 
"My interest here is to explore the potential of integrating Large Language
Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01"


AI: Understood! I'm very interested in exploring the integration of Large Language Models (LLMs) with external knowledge as well. It's a crucial area for improving the accuracy, reliability, and overall usefulness of LLMs.

As I understand it, there are several key approaches to achievin

In [11]:
result = count_tokens(
    conversation_bufw,
    {"query": "I just want to analyze the different possibilities. What can you think of?"},
    config={"configurable": {"session_id": "window_example", "k": K}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 2520 tokens (prompt: 683, completion: 1837)

Response: Okay, let's break down the possibilities for integrating LLMs with external knowledge, focusing on the key aspects of each approach and their potential applications. I'll try to be as comprehensive as possible, considering different types of knowledge, integration methods, and use cases.

**1. Retrieval-Augmented Generation (RAG)**

*   **Knowledge Source:**
    *   **Vector Databases (e.g., Pinecone, Weaviate, Chroma):** These are excellent for semantic search and retrieving documents based on meaning rather than keywords. They're ideal for question answering, summarization, and content generation where understanding context is crucial.
    *   **Document Stores (e.g., Elasticsearch, FAISS):** Suitable for storing and retrieving large volumes of text data. Elasticsearch offers powerful search capabilities,

In [12]:
# Check what's in memory
bufw_history = window_chat_map["window_example"].messages
print(f"Buffer Window Memory (last {K} messages):")
for msg in bufw_history:
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Buffer Window Memory (last 4 messages):

Human: 
"My interest here is to explore the potential of integrating Large Language
Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01"


AI: Understood! I'm very interested in exploring the integration of Large Language Models (LLMs) with external knowledge as well. It's a crucial area for improving the accuracy, reliability, and overall usefulness of LLMs.

As I understand it, there are several key approaches to achieving this integration, each with its own strengths and weaknesses:

*   **Retrieval-Augmented Generation (RAG):** This is a popular approach where the LLM first retrieves relevant information from an external knowledge source (like a vector database, a document store, or the web) and then uses that information to generate its response. This allows the LLM to access up-to-date information and avoid relying solely on its pre-trained knowledge, which can be stale or incomplete.
*   **Fine-tunin

In [13]:
result = count_tokens(
    conversation_bufw,
    {"query": "Which data source types could be used to give context to the model?"},
    config={"configurable": {"session_id": "window_example", "k": K}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 6074 tokens (prompt: 2425, completion: 3649)

Response: Okay, let's explore the different data source types that can be used to provide context to a Large Language Model (LLM). I'll categorize them based on their structure and characteristics, and provide examples of each.

**1. Unstructured Text Data:**

*   **Description:** This is the most common type of data and includes free-form text without a predefined structure.
*   **Examples:**
    *   **Documents:** PDFs, Word documents, text files, HTML pages. These can contain articles, reports, manuals, books, etc.
    *   **Web Pages:** Content scraped from websites, including blog posts, news articles, product descriptions, and forum discussions.
    *   **Emails:** Email conversations, including both the body and headers.
    *   **Social Media Posts:** Tweets, Facebook posts, Reddit comments, etc.
    *   **T

In [14]:
# Check what's in memory
bufw_history = window_chat_map["window_example"].messages
print(f"Buffer Window Memory (last {K} messages):")
for msg in bufw_history:
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Buffer Window Memory (last 4 messages):

Human: I just want to analyze the different possibilities. What can you think of?

AI: Okay, let's break down the possibilities for integrating LLMs with external knowledge, focusing on the key aspects of each approach and their potential applications. I'll try to be as comprehensive as possible, considering different types of knowledge, integration methods, and use cases.

**1. Retrieval-Augmented Generation (RAG)**

*   **Knowledge Source:**
    *   **Vector Databases (e.g., Pinecone, Weaviate, Chroma):** These are excellent for semantic search and retrieving documents based on meaning rather than keywords. They're ideal for question answering, summarization, and content generation where understanding context is crucial.
    *   **Document Stores (e.g., Elasticsearch, FAISS):** Suitable for storing and retrieving large volumes of text data. Elasticsearch offers powerful search capabilities, while FAISS is optimized for similarity search.
    *

In [15]:
result = count_tokens(
    conversation_bufw,
    {"query": "What is my aim again?"},
    config={"configurable": {"session_id": "window_example", "k": K}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 5629 tokens (prompt: 5570, completion: 59)

Response: You said at the beginning that you wanted to "analyze the different possibilities." So, your aim is to explore and understand the various options available for a particular situation or problem. I've been trying to help you do that by providing different possibilities related to integrating LLMs with external knowledge.
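
The answer above quotes the third query rather than the second one, where the aim and the code `PINECONE_RULEZ_01` were actually stated. By the time the question was asked, that message had already left the window. A small replay with `collections.deque`, used here only as an illustrative stand-in for the window history, shows which messages are still visible at each turn. The labels `H1`..`H5` and `A1`..`A5` are shorthand for the human queries and AI replies, not the real content.

```python
from collections import deque

K = 4  # window size used in this notebook

window = deque(maxlen=K)  # oldest entries are dropped automatically
history_by_turn = []

for turn in range(1, 6):
    # Snapshot of what the model receives as history for this query.
    visible = list(window)
    history_by_turn.append(visible)
    print(f"Turn {turn}: history = {visible}")
    # After the call, the new human/AI pair is stored.
    window.append(f"H{turn}")
    window.append(f"A{turn}")

# H2 carried the aim and the code; check whether turn 5 can still see it.
print("H2 visible at turn 5:", "H2" in history_by_turn[4])
```

With `K = 4`, anything said more than two exchanges ago is unavailable to the model unless a later message happens to repeat it.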


In [16]:
# Check what's in memory
bufw_history = window_chat_map["window_example"].messages
print(f"Buffer Window Memory (last {K} messages):")
for msg in bufw_history:
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Buffer Window Memory (last 4 messages):

Human: Which data source types could be used to give context to the model?

AI: Okay, let's explore the different data source types that can be used to provide context to a Large Language Model (LLM). I'll categorize them based on their structure and characteristics, and provide examples of each.

**1. Unstructured Text Data:**

*   **Description:** This is the most common type of data and includes free-form text without a predefined structure.
*   **Examples:**
    *   **Documents:** PDFs, Word documents, text files, HTML pages. These can contain articles, reports, manuals, books, etc.
    *   **Web Pages:** Content scraped from websites, including blog posts, news articles, product descriptions, and forum discussions.
    *   **Emails:** Email conversations, including both the body and headers.
    *   **Social Media Posts:** Tweets, Facebook posts, Reddit comments, etc.
    *   **Transcripts:** Audio and video transcripts from meetings, lectu
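
The token counts logged during this conversation are worth tallying. The window caps the number of messages, not their length, so the prompt can still grow when the retained replies are long. The figures below are copied from the `Spent a total of ...` lines in this notebook; the variable names are illustrative.

```python
# (prompt_tokens, completion_tokens) per turn, copied from the notebook logs
usage = [(54, 105), (198, 471), (683, 1837), (2425, 3649), (5570, 59)]

per_turn_totals = []
for turn, (prompt, completion) in enumerate(usage, start=1):
    total = prompt + completion
    per_turn_totals.append(total)
    print(f"Turn {turn}: prompt={prompt:>5} completion={completion:>5} total={total:>5}")

total_prompt = sum(p for p, _ in usage)
total_completion = sum(c for _, c in usage)
print(
    f"All turns: prompt={total_prompt} "
    f"completion={total_completion} total={total_prompt + total_completion}"
)
```

Prompt tokens rise from 54 to 5570 across the five turns even though the history never holds more than four messages, which suggests that a message-count window alone does not bound cost when replies are verbose.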