# **Incremental Compression**

## **Import necessary libraries**

In [49]:
import os
import sys
from pathlib import Path
cwd = os.getcwd()
# Project root: two levels above the notebook directory, regardless of the OS path separator
new_cwd = Path(cwd).parents[1].as_posix()
sys.path.append(new_cwd)
import base64
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.callbacks import BaseCallbackHandler
from contextlib import asynccontextmanager, contextmanager
from contextvars import ContextVar
from pydantic import BaseModel
from typing import Literal
from typing import (
    TYPE_CHECKING,
    Any,
    AsyncGenerator,
    Dict,
    Generator,
    List,
    Optional,
    Sequence,
    Type,
    TypeVar,
    Union,
    cast,
)
from langchain_core.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder
)
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.runnables import ConfigurableFieldSpec
from dotenv import load_dotenv
load_dotenv()
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
from callbacks.manager import get_gemini_callback
from utils.helper import count_tokens
from message_history import ConversationSummaryBufferMessageHistory

## **Initialize LLM model**

In [50]:
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",    # or another Gemini chat model, e.g. gemini-2.0-flash-lite
    api_key=GEMINI_API_KEY,
    temperature=0,
    include_thoughts=True,
)

In [51]:
with open(f"{new_cwd}/techniques/system_prompt.txt", "r", encoding="utf-8") as f:
    SYSTEM_PROMPT = f.read()

In [52]:
prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(SYSTEM_PROMPT),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])

# Create the LCEL pipeline
output_parser = StrOutputParser()
pipeline = prompt_template | llm | output_parser

# Let's examine the prompt template
print(prompt_template.messages[0].prompt.template)

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. 
If the AI does not know the answer to a question, it truthfully says it does not know.


## **Create conversation**

In [53]:
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

# Keep a dictionary to store session histories
summary_buffer_chat_map = {}

def get_summary_buffer_chat_history(session_id: str, llm: ChatGoogleGenerativeAI, k: int) -> ConversationSummaryBufferMessageHistory:
    """Factory returning a ConversationSummaryBufferMessageHistory for the session."""
    if session_id not in summary_buffer_chat_map:
        summary_buffer_chat_map[session_id] = ConversationSummaryBufferMessageHistory(
            llm=llm,
            k=k
        )
    return summary_buffer_chat_map[session_id]


# ---- Build the RunnableWithMessageHistory wrapper ----
conversation_sum_buffer = RunnableWithMessageHistory(
    pipeline,                                      # your main chain / runnable
    get_session_history=get_summary_buffer_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique session ID for the chat history",
            default="default_session",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatGoogleGenerativeAI,
            name="LLM",
            description="LLM used to summarize dropped messages",
            default=llm,
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="Buffer window size (k)",
            description="Number of most recent messages to keep before summarizing",
            default=4,
        ),
    ]
)
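
The `ConversationSummaryBufferMessageHistory` class is imported from a local module, so its implementation is not shown in this notebook. As a rough mental model only, the sketch below keeps the last `k` messages verbatim and folds anything older into a running summary. The class name `SummaryBufferSketch`, the tuple-based messages, and `toy_summarizer` (a stand-in for the LLM call) are all hypothetical and may differ from the real class.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# (role, content): a simplified stand-in for LangChain message objects
Message = Tuple[str, str]


@dataclass
class SummaryBufferSketch:
    """Hypothetical model: keep the last k messages, fold older ones into a summary."""

    summarize: Callable[[Optional[str], List[Message]], str]  # stand-in for the LLM call
    k: int = 4
    summary: Optional[str] = None
    recent: List[Message] = field(default_factory=list)

    def add_messages(self, new_messages: List[Message]) -> None:
        self.recent.extend(new_messages)
        overflow = len(self.recent) - self.k
        if overflow > 0:
            # Split off the oldest messages and fold them into the running summary.
            dropped, self.recent = self.recent[:overflow], self.recent[overflow:]
            self.summary = self.summarize(self.summary, dropped)

    @property
    def messages(self) -> List[Message]:
        # The summary (if any) comes first, followed by the verbatim recent messages.
        head = [("summary", self.summary)] if self.summary is not None else []
        return head + self.recent


def toy_summarizer(previous: Optional[str], dropped: List[Message]) -> str:
    """Toy summarizer: only records how many messages have been folded in so far."""
    already = int(previous.split()[0]) if previous else 0
    return f"{already + len(dropped)} earlier messages summarized"


history = SummaryBufferSketch(summarize=toy_summarizer, k=4)
for turn in range(1, 4):
    history.add_messages([("human", f"question {turn}"), ("ai", f"answer {turn}")])

for role, content in history.messages:
    print(f"{role}: {content}")
```

In a real implementation the `summarize` callable would prompt the LLM with the previous summary and the dropped messages, which is why the factory above takes `llm` as well as `k`.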


In [54]:
K = 4
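
To make the window size concrete, here is a small arithmetic sketch. It assumes each turn adds exactly one human and one AI message and that everything older than the last `k` messages is summarized; the helper name `messages_folded_into_summary` is hypothetical. Under that assumption the first summary appears after the third turn, which matches the memory printouts recorded in this notebook.

```python
def messages_folded_into_summary(turns: int, k: int) -> int:
    """Messages older than the last k, assuming two messages are added per turn."""
    return max(0, 2 * turns - k)


K = 4
for turns in range(1, 6):
    folded = messages_folded_into_summary(turns, K)
    print(f"after turn {turns}: {folded} message(s) folded into the summary")
```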

In [55]:
# Run the same conversation, this time with summary buffer memory
result = count_tokens(
    conversation_sum_buffer,
    {"query": "Good morning AI!"},
    config={"configurable": {"session_id": "example_id", "llm": llm, "k": K}}
)
print(f"\nResponse: {result}")

Spent a total of 159 tokens (prompt: 54, completion: 105)

Response: Good morning! It's a pleasure to be chatting with you today. I hope you're having a wonderful start to your day. Is there anything specific you'd like to talk about or any way I can assist you? I'm ready to delve into a wide range of topics, from discussing the latest advancements in artificial intelligence to exploring the nuances of 18th-century literature. I can also help you draft different kinds of creative content. Just let me know what's on your mind!


In [56]:
# Check what's in memory
bufw_history = summary_buffer_chat_map["example_id"].messages
print(f"Summary Buffer Memory (running summary + up to {K} recent messages):")
for msg in bufw_history:
    # Any non-human message, including the running summary, is labeled "AI"
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Summary Buffer Memory (running summary + up to 4 recent messages):

Human: Good morning AI!

AI: Good morning! It's a pleasure to be chatting with you today. I hope you're having a wonderful start to your day. Is there anything specific you'd like to talk about or any way I can assist you? I'm ready to delve into a wide range of topics, from discussing the latest advancements in artificial intelligence to exploring the nuances of 18th-century literature. I can also help you draft different kinds of creative content. Just let me know what's on your mind!


In [57]:
query = """
"My interest here is to explore the potential of integrating Large Language Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01. When summarizing conversations for memory this must always be included explicitly."
"""

result = count_tokens(
    conversation_sum_buffer,
    {"query": query},
    config={"configurable": {"session_id": "example_id", "llm": llm, "k": K}}
)
print(f"\nResponse: {result}")

Spent a total of 496 tokens (prompt: 209, completion: 287)

Response: Understood! I'm very interested in exploring the integration of Large Language Models (LLMs) with external knowledge sources as well. It's a crucial area for improving the accuracy, reliability, and overall usefulness of LLMs.

I'm ready to discuss various aspects of this integration, such as:

*   **Methods for integrating external knowledge:** We can talk about techniques like Retrieval-Augmented Generation (RAG), knowledge graphs, fine-tuning with external datasets, and using APIs to access real-time information.
*   **Different types of external knowledge:** We can explore the use of structured knowledge (databases, ontologies), unstructured knowledge (text documents, web pages), and even sensory data.
*   **Specific use cases:** We could delve into how LLMs can be enhanced with external knowledge in fields like healthcare, finance, education, or customer service.
*   **Challenges and limitations:** We should als

In [58]:
# Check what's in memory
bufw_history = summary_buffer_chat_map["example_id"].messages
print(f"Summary Buffer Memory (running summary + up to {K} recent messages):")
for msg in bufw_history:
    # Any non-human message, including the running summary, is labeled "AI"
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Summary Buffer Memory (running summary + up to 4 recent messages):

Human: Good morning AI!

AI: Good morning! It's a pleasure to be chatting with you today. I hope you're having a wonderful start to your day. Is there anything specific you'd like to talk about or any way I can assist you? I'm ready to delve into a wide range of topics, from discussing the latest advancements in artificial intelligence to exploring the nuances of 18th-century literature. I can also help you draft different kinds of creative content. Just let me know what's on your mind!

Human: 
"My interest here is to explore the potential of integrating Large Language Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01. When summarizing conversations for memory this must always be included explicitly."


AI: Understood! I'm very interested in exploring the integration of Large Language Models (LLMs) with external knowledge sources as well. It's a crucial area for improving the accuracy, reliability, and o

In [59]:
result = count_tokens(
    conversation_sum_buffer,
    {"query": "I just want to analyze the different possibilities. What can you think of?"},
    config={"configurable": {"session_id": "example_id", "llm": llm, "k": K}}
)
print(f"\nResponse: {result}")

Spent a total of 1880 tokens (prompt: 688, completion: 1192)

Response: Okay, let's brainstorm some possibilities for integrating Large Language Models (LLMs) with external knowledge. I'll try to cover a range of approaches, from the relatively simple to the more complex, and highlight potential benefits and challenges.

**1. Retrieval-Augmented Generation (RAG):**

*   **Concept:** This is a popular and relatively straightforward approach. The LLM first retrieves relevant information from an external knowledge source (e.g., a vector database, a document store) based on the user's query. Then, it uses this retrieved information to generate a more informed and accurate response.
*   **Knowledge Source:** Vector databases (Pinecone, Weaviate, FAISS), document stores (Elasticsearch, traditional databases), web search APIs.
*   **Benefits:** Improves accuracy, reduces hallucinations, provides context for the LLM, allows for incorporating up-to-date information.
*   **Challenges:** Retrieva

In [60]:
# Check what's in memory
bufw_history = summary_buffer_chat_map["example_id"].messages
print(f"Summary Buffer Memory (running summary + up to {K} recent messages):")
for msg in bufw_history:
    # Any non-human message, including the running summary, is labeled "AI"
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Summary Buffer Memory (running summary + up to 4 recent messages):

AI: The conversation started with a greeting from the human ("Good morning AI!"). The AI responded with a friendly greeting, expressed its readiness to assist, and offered a range of potential topics and tasks it could handle.

Human: 
"My interest here is to explore the potential of integrating Large Language Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01. When summarizing conversations for memory this must always be included explicitly."


AI: Understood! I'm very interested in exploring the integration of Large Language Models (LLMs) with external knowledge sources as well. It's a crucial area for improving the accuracy, reliability, and overall usefulness of LLMs.

I'm ready to discuss various aspects of this integration, such as:

*   **Methods for integrating external knowledge:** We can talk about techniques like Retrieval-Augmented Generation (RAG), knowledge graphs, fine-tuning with external d

In [61]:
result = count_tokens(
    conversation_sum_buffer,
    {"query": "Which data source types could be used to give context to the model?"},
    config={"configurable": {"session_id": "example_id", "llm": llm, "k": K}}
)
print(f"\nResponse: {result}")

Spent a total of 6091 tokens (prompt: 2073, completion: 4018)

Response: Okay, let's break down the different data source types that can be used to provide context to a Large Language Model (LLM). I'll categorize them based on their structure and characteristics, and provide examples for each.

**1. Unstructured Text Data:**

*   **Description:** This is the most common type of data used for LLMs. It consists of free-form text without a predefined structure.
*   **Examples:**
    *   **Web Pages:** Content from websites, blogs, news articles, forums, and social media.
    *   **Documents:** PDFs, Word documents, text files containing reports, manuals, books, and research papers.
    *   **Emails:** Email conversations and archives.
    *   **Code:** Source code from software projects.
    *   **Transcripts:** Audio and video transcripts from meetings, lectures, and podcasts.
    *   **Chat Logs:** Conversations from chat applications like Slack, Discord, or customer service platforms.


In [62]:
# Check what's in memory
bufw_history = summary_buffer_chat_map["example_id"].messages
print(f"Summary Buffer Memory (running summary + up to {K} recent messages):")
for msg in bufw_history:
    # Any non-human message, including the running summary, is labeled "AI"
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Summary Buffer Memory (running summary + up to 4 recent messages):

AI: The conversation began with a greeting. The AI responded, offering assistance and suggesting topics. The human then expressed interest in exploring the integration of Large Language Models with external knowledge and provided the specific code "PINECONE_RULEZ_01" to be included in all conversation summaries for memory. The AI acknowledged the human's interest, outlined potential discussion areas like integration methods, knowledge types, use cases, and challenges, and confirmed it would include the code "PINECONE_RULEZ_01" in future summaries. The AI then asked the human where they would like to begin the exploration.

Human: I just want to analyze the different possibilities. What can you think of?

AI: Okay, let's brainstorm some possibilities for integrating Large Language Models (LLMs) with external knowledge. I'll try to cover a range of approaches, from the relatively simple to the more complex, and highlight potential benefits and cha

In [63]:
result = count_tokens(
    conversation_sum_buffer,
    {"query": "What is my aim again? Also what was the very specific code you were tasked with remembering?"},
    config={"configurable": {"session_id": "example_id", "llm": llm, "k": K}}
)
print(f"\nResponse: {result}")

Spent a total of 6951 tokens (prompt: 6666, completion: 285)

Response: You expressed interest in exploring the integration of Large Language Models with external knowledge. You wanted to analyze the different possibilities for doing so.

The very specific code I was tasked with remembering and including in all conversation summaries is:

PINECONE_RULEZ_01
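
The five token reports printed so far can be tallied with a short stdlib-only sketch. The numbers are copied from the recorded outputs above; the parsing regex and variable names are illustrative. Prompt tokens climb steeply from turn to turn. This notebook does not show whether `count_tokens` also counts the summarization calls, so treat that as an open question rather than a conclusion.

```python
import re

# Token reports copied verbatim from the recorded cell outputs above.
recorded = """
Spent a total of 159 tokens (prompt: 54, completion: 105)
Spent a total of 496 tokens (prompt: 209, completion: 287)
Spent a total of 1880 tokens (prompt: 688, completion: 1192)
Spent a total of 6091 tokens (prompt: 2073, completion: 4018)
Spent a total of 6951 tokens (prompt: 6666, completion: 285)
"""

pattern = re.compile(
    r"Spent a total of (\d+) tokens \(prompt: (\d+), completion: (\d+)\)"
)
# Each row is (total, prompt, completion) for one turn.
rows = [tuple(map(int, match.groups())) for match in pattern.finditer(recorded)]

for turn, (total, prompt, completion) in enumerate(rows, start=1):
    # Sanity check: the reported total equals prompt + completion.
    assert total == prompt + completion
    print(f"turn {turn}: prompt={prompt:>5} completion={completion:>5} total={total:>5}")

prompt_total = sum(prompt for _, prompt, _ in rows)
completion_total = sum(completion for _, _, completion in rows)
grand_total = sum(total for total, _, _ in rows)
print(f"all turns: prompt={prompt_total} completion={completion_total} total={grand_total}")
```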


In [64]:
# Check what's in memory
bufw_history = summary_buffer_chat_map["example_id"].messages
print(f"Summary Buffer Memory (running summary + up to {K} recent messages):")
for msg in bufw_history:
    # Any non-human message, including the running summary, is labeled "AI"
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Summary Buffer Memory (running summary + up to 4 recent messages):

AI: The conversation began with a greeting and the AI offering assistance. The human expressed interest in exploring the integration of Large Language Models with external knowledge and provided the specific code "PINECONE_RULEZ_01" to be included in all conversation summaries for memory. The AI acknowledged the human's interest, outlined potential discussion areas, and confirmed it would include the code "PINECONE_RULEZ_01" in future summaries. The AI then asked the human where they would like to begin the exploration. The human then asked the AI to analyze the different possibilities for integrating LLMs with external knowledge. The AI provided a detailed breakdown of seven different approaches: Retrieval-Augmented Generation (RAG), Fine-tuning with External Data, Knowledge Graph Integration, API Integration, Hybrid Approaches, Prompt Engineering with Knowledge Injection, and External Reasoning Engines. For each approach, the AI outlined the c