# CogCache

This notebook shows how to use LangChain with CogCache [chat models](/docs/concepts/#chat-models). For detailed documentation of all ChatCogCache features and configurations, head to the [API reference](https://python.langchain.com/api_reference/community/chat_models/langchain_community.chat_models.cogcache.ChatCogCache.html).

CogCache has several chat models. You can find information about their latest models and their costs, context windows, and supported input types in the [CogCache docs](https://cogcache.readme.io/reference/models).


In [1]:
import getpass
import os

if not os.environ.get("COGCACHE_API_KEY"):
    os.environ["COGCACHE_API_KEY"] = getpass.getpass("Enter your CogCache API key: ")

## Installation

The LangChain CogCache integration is imported from `langchain_community`, which is distributed as the `langchain-community` package:

In [None]:
%pip install -qU langchain-community

## Instantiation

Now we can instantiate our model object and generate chat completions:

In [15]:
from langchain_community.chat_models import ChatCogCache

llm = ChatCogCache(
    api_key="YOUR_API_KEY",
    model="gpt-4o-2024-08-06",
    # temperature=0,
    # max_tokens=None,
    # n=1,
    # verbose=True,
    # timeout=None,
    # model_kwargs={"response_format": {"type": "json_object"}},
    # default_headers={"Cache-Control": "no-store"},
    # ...
    # other params...
)

## Invocation

In [4]:
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg

AIMessage(content="J'aime programmer.", additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 31, 'completion_tokens': 4, 'total_tokens': 35, 'latency_time': 0.002396106719970703}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb', 'finish_reason': 'stop', 'logprobs': None}, id='run-e7722a39-eeaf-4c71-9726-10d58a8ad87f-0')

In [5]:
print(ai_msg.content)

J'aime programmer.


## Streaming

In [6]:
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_stream = llm.stream(messages)

for ai_msg in ai_stream:
    print(ai_msg)

content='' additional_kwargs={} response_metadata={'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb'} id='run-0865242f-0dbe-4db6-94df-078df0f84c22'
content='J' additional_kwargs={} response_metadata={'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb'} id='run-0865242f-0dbe-4db6-94df-078df0f84c22'
content="'" additional_kwargs={} response_metadata={'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb'} id='run-0865242f-0dbe-4db6-94df-078df0f84c22'
content='aime' additional_kwargs={} response_metadata={'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb'} id='run-0865242f-0dbe-4db6-94df-078df0f84c22'
content=' ' additional_kwargs={} response_metadata={'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb'} id='run-0865242f-0dbe-4db6-94df-078df0f84c22'
content='programmer' additional_kwargs={} response_metadata={'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb'} id=
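
Each streamed chunk carries a fragment of the reply in its `content` attribute, so the full text can be rebuilt by concatenating the fragments. The following is a minimal sketch: `FakeChunk` and `fake_stream` are hypothetical stand-ins for the objects yielded by `llm.stream(messages)`, used so the snippet runs without an API key, and the fragments are the ones visible in the output above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk; only `content` is modeled."""

    content: str


def fake_stream() -> Iterator[FakeChunk]:
    # Fragments copied from the streamed output shown above.
    for piece in ["", "J", "'", "aime", " ", "programmer"]:
        yield FakeChunk(content=piece)


def collect_text(stream: Iterable) -> str:
    """Concatenate the `content` of every chunk into one string."""
    return "".join(chunk.content for chunk in stream)


full_text = collect_text(fake_stream())
print(full_text)
```

With a real model, `collect_text(llm.stream(messages))` should behave the same way, assuming each chunk's `content` is a plain string.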

## Chaining

We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:

In [7]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "input_language": "English",
        "output_language": "Spanish",
        "input": "I am writing spanish. I love programming.",
    }
)

AIMessage(content='Estoy escribiendo en español. Me encanta programar.', additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 31, 'completion_tokens': 11, 'total_tokens': 42, 'latency_time': 2.6073453426361084}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb', 'finish_reason': 'stop', 'logprobs': None}, id='run-2d019da0-665e-4a98-b440-11d6128ba2e0-0')

## Add parameters in individual invocation

Instead of setting all parameters at class instantiation, we can pass them in an individual invocation.

In [17]:
from langchain_community.chat_models import ChatCogCache

llm = ChatCogCache(
    api_key="YOUR_API_KEY",
)

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
llm.invoke(
    messages,
    model="gpt-4o-2024-08-06",
    # temperature=0,
    # max_tokens=None,
    # n=1
)

AIMessage(content="J'aime programmer.", additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 31, 'completion_tokens': 4, 'total_tokens': 35, 'latency_time': 0.0024280548095703125}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb', 'finish_reason': 'stop', 'logprobs': None}, id='run-aedc9d7c-78b4-4593-a5e1-85a9d36b05e9-0')

## Caching behavior

CogCache caches responses by default. A repeated or similar prompt can then be answered from the cache instead of the LLM, which reduces both latency and cost.

In [23]:
from langchain_community.chat_models import ChatCogCache

llm = ChatCogCache(
    api_key="YOUR_API_KEY",
    model="gpt-4o-2024-08-06",
    include_response_headers=True
)
messages = [
    (
        "system",
        "You are a helpful assistant.",
    ),
    ("human", "How does photosynthesis work?"),
]


### First time prompt

In [12]:
%%time

llm.invoke(messages)

CPU times: user 4.92 ms, sys: 2.66 ms, total: 7.58 ms
Wall time: 12.2 s


AIMessage(content="Photosynthesis is the process by which green plants, algae, and some bacteria convert light energy into chemical energy stored in glucose. This process primarily occurs in the chloroplasts of plant cells. Here's a simplified overview of how it works:\n\n1. **Light Absorption**: Chlorophyll, the green pigment in chloroplasts, absorbs sunlight, primarily in the blue and red wavelengths.\n\n2. **Water Splitting (Photolysis)**: The absorbed light energy is used to split water molecules (H₂O) into oxygen, protons, and electrons. This occurs in the thylakoid membranes of the chloroplasts.\n\n3. **Oxygen Release**: Oxygen (O₂) is released as a byproduct into the atmosphere.\n\n4. **Electron Transport Chain**: The electrons released from water splitting are transferred through a series of proteins embedded in the thylakoid membrane, known as the electron transport chain. This movement helps generate ATP and NADPH.\n\n5. **ATP and NADPH Formation**: The energy from the electr

### Exact prompt

In [13]:
%%time

llm.invoke(messages)

CPU times: user 3.83 ms, sys: 2.25 ms, total: 6.08 ms
Wall time: 12 ms


AIMessage(content="Photosynthesis is the process by which green plants, algae, and some bacteria convert light energy into chemical energy stored in glucose. This process primarily occurs in the chloroplasts of plant cells. Here's a simplified overview of how it works:\n\n1. **Light Absorption**: Chlorophyll, the green pigment in chloroplasts, absorbs sunlight, primarily in the blue and red wavelengths.\n\n2. **Water Splitting (Photolysis)**: The absorbed light energy is used to split water molecules (H₂O) into oxygen, protons, and electrons. This occurs in the thylakoid membranes of the chloroplasts.\n\n3. **Oxygen Release**: Oxygen (O₂) is released as a byproduct into the atmosphere.\n\n4. **Electron Transport Chain**: The electrons released from water splitting are transferred through a series of proteins embedded in the thylakoid membrane, known as the electron transport chain. This movement helps generate ATP and NADPH.\n\n5. **ATP and NADPH Formation**: The energy from the electr

### Similar prompt

In [25]:
%%time

messages = [
    (
        "system",
        "You are a helpful assistant.",
    ),
    ("human", "How photosynthesis works"),
]

llm.invoke(messages)

CPU times: user 2.89 ms, sys: 1.85 ms, total: 4.75 ms
Wall time: 144 ms


AIMessage(content="Photosynthesis is the process by which green plants, algae, and some bacteria convert light energy into chemical energy stored in glucose. This process primarily occurs in the chloroplasts of plant cells. Here's a simplified overview of how it works:\n\n1. **Light Absorption**: Chlorophyll, the green pigment in chloroplasts, absorbs sunlight, primarily in the blue and red wavelengths.\n\n2. **Water Splitting (Photolysis)**: The absorbed light energy is used to split water molecules (H₂O) into oxygen, protons, and electrons. This occurs in the thylakoid membranes of the chloroplasts.\n\n3. **Oxygen Release**: Oxygen (O₂) is released as a byproduct into the atmosphere.\n\n4. **Electron Transport Chain**: The electrons released from water splitting are transferred through a series of proteins embedded in the thylakoid membrane, known as the electron transport chain. This movement helps generate ATP and NADPH.\n\n5. **ATP and NADPH Formation**: The energy from the electr
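
To put the three wall times above in perspective, the short calculation below converts them to seconds and compares each call with the first one. The figures are the ones reported by `%%time` in this notebook run; actual timings will vary with network conditions and model load.

```python
# Wall times reported by %%time in the three cells above, in seconds.
wall_times = {
    "first prompt": 12.2,
    "exact prompt": 0.012,    # 12 ms
    "similar prompt": 0.144,  # 144 ms
}

baseline = wall_times["first prompt"]

# Speedup of each call relative to the first call.
speedups = {name: baseline / seconds for name, seconds in wall_times.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:,.0f}x")
```

In this run the exact-match call is roughly three orders of magnitude faster than the first call, and the similar-prompt call roughly 85 times faster.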

## Control caching behavior

We can control caching behavior by using the `Cache-Control` request header.

This header affects the cache behavior as follows:

- `no-store` - does not store the LLM's answer in the cache
- `no-cache` - skips cache retrieval and sends the request directly to the LLM
- `only-if-cached` - returns an answer only from the cache; if none is found, returns a 504 error code
- `public` - normal behavior: returns the answer from the cache if found; otherwise retrieves it from the LLM and stores it in the cache
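
Because the header value is a free-form string, a typo such as `no-chache` would only show up at request time. The sketch below validates the directive up front; `CACHE_CONTROL_DIRECTIVES` and `cache_control_headers` are hypothetical helper names for illustration and are not part of LangChain or CogCache.

```python
# Directives documented in the list above, with a short description of each.
CACHE_CONTROL_DIRECTIVES = {
    "no-store": "do not store the LLM answer in the cache",
    "no-cache": "skip cache retrieval and go directly to the LLM",
    "only-if-cached": "answer only from the cache; 504 if not found",
    "public": "normal behavior: use the cache, else call the LLM and store",
}


def cache_control_headers(directive: str) -> dict:
    """Return a headers dict for `directive`, rejecting unknown values early."""
    if directive not in CACHE_CONTROL_DIRECTIVES:
        allowed = ", ".join(sorted(CACHE_CONTROL_DIRECTIVES))
        raise ValueError(
            f"Unknown Cache-Control directive {directive!r}; expected one of: {allowed}"
        )
    return {"Cache-Control": directive}


headers = cache_control_headers("no-cache")
print(headers)
```

The returned dictionary has the same shape as the `default_headers` argument used in the next cell.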

In [20]:
from langchain_community.chat_models import ChatCogCache

llm = ChatCogCache(
    api_key="YOUR_API_KEY",
    model="gpt-4o-2024-08-06",
    default_headers={"Cache-Control": "no-store"}
)

llm.invoke(
    'What is the capital of Australia?',
    # headers={"Cache-Control": "no-store"} # alternative approach
)

AIMessage(content='The capital of Australia is Canberra.', additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 14, 'completion_tokens': 7, 'total_tokens': 21, 'latency_time': 2.031675338745117}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_d54531d9eb', 'finish_reason': 'stop', 'logprobs': None}, id='run-fc18b0cc-720d-4c79-9187-7adafe5541d4-0')

## Response Headers

When the model is created with `include_response_headers=True`, the response headers are available on the returned message under `response_metadata["headers"]`.

#### 1. `X-Cache`
This header indicates whether the response was served from the cache. It is not available when `Cache-Control` is set to `no-store` or `no-cache`.

Possible values:

- `hit` - the answer was retrieved from cache
- `miss` - the answer was retrieved directly from the LLM


#### 2. `CogCache-Cache-Entry-ID`
This response header holds the unique ID of the cache entry when the response is retrieved from the cache. It is not available when the response comes from the LLM.

In [12]:
from langchain_community.chat_models import ChatCogCache

llm = ChatCogCache(
    api_key="YOUR_API_KEY",
    model="gpt-4-1106-preview",
    include_response_headers=True,  # set to True to include headers in the response
)

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
response = llm.invoke(messages)
response.response_metadata["headers"]

{'date': 'Thu, 17 Oct 2024 14:50:33 GMT',
 'server': 'uvicorn',
 'x-cache': 'hit',
 'access-control-allow-origin': '*',
 'access-control-allow-credentials': 'true',
 'access-control-expose-headers': 'content-type,x-cache,cogcache-hit-type,cogcache-similarity-match-score,cogcache-hit-processing-ms,cogcache-latency-ms',
 'access-control-allow-methods': 'GET, OPTIONS, POST',
 'access-control-max-age': '600',
 'vary': 'Origin',
 'cogcache-hit-type': 'exact-match',
 'cogcache-prompt-type': 'other',
 'cogcache-cache-entry-id': 'b14b8f518b54e4ef2187972f217dacc0',
 'content-length': '1125',
 'content-type': 'application/json',
 'cogcache-hit-processing-ms': '3.1'}
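
The headers come back as a plain dictionary with lowercase keys, as the output above shows, so the cache status can be summarized with ordinary dictionary lookups. The helper below is a hypothetical convenience function, not part of LangChain or CogCache; it uses `.get()` because `x-cache` and `cogcache-cache-entry-id` may be absent.

```python
from typing import Optional, TypedDict


class CacheStatus(TypedDict):
    x_cache: Optional[str]
    served_from_cache: bool
    entry_id: Optional[str]


def summarize_cache_headers(headers: dict) -> CacheStatus:
    """Summarize the cache-related response headers described above."""
    x_cache = headers.get("x-cache")
    return {
        "x_cache": x_cache,
        "served_from_cache": x_cache == "hit",
        "entry_id": headers.get("cogcache-cache-entry-id"),
    }


# Subset of the headers from the output above.
sample_headers = {
    "x-cache": "hit",
    "cogcache-hit-type": "exact-match",
    "cogcache-cache-entry-id": "b14b8f518b54e4ef2187972f217dacc0",
}

status = summarize_cache_headers(sample_headers)
print(status)
```

With a live response, the same function could be called as `summarize_cache_headers(response.response_metadata["headers"])`.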

## API reference

For detailed documentation of all CogCache features and configurations, head to the API reference: https://cogcache.readme.io/reference/overview