# Caching in Ragas

Ragas offers a flexible caching system to speed up evaluations and testset generation by avoiding redundant computations, especially for LLM and embedding model calls. This document explains how to use both exact (disk-based) caching and the more advanced semantic caching.

## Global Cache Configuration

Ragas now features a global cache configuration managed by `ragas.config.ragas_cache`. This cache is initialized based on environment variables when Ragas is imported. You can control caching behavior by setting these variables:

*   **`RAGAS_CACHE_ENABLED`**: (boolean, default: "true") - Set to "false" to disable all caching.
*   **`RAGAS_CACHE_BACKEND`**: (string, default: "exact") - Choose between `"exact"` (for `DiskCacheBackend`) or `"semantic"` (for `SemanticCacheBackend`).
*   **`RAGAS_CACHE_DIR`**: (string, default: ".ragas_cache") - Directory for `DiskCacheBackend`.
*   **`RAGAS_SEMANTIC_CACHE_THRESHOLD`**: (float, default: 0.85) - Cosine similarity threshold for `SemanticCacheBackend`.
*   **`RAGAS_SEMANTIC_CACHE_EMBEDDING_PROVIDER`**: (string, default: "openai") - Embedding provider (e.g., "openai", "huggingface") for `SemanticCacheBackend`.
*   **`RAGAS_SEMANTIC_CACHE_EMBEDDING_MODEL_NAME`**: (string, optional) - Specific model name for the embedding provider. Defaults to standard models like "text-embedding-ada-002" for OpenAI.
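
To make the variables and their documented defaults concrete, here is a minimal sketch that reads them into a small settings object. `CacheSettings` and `load_cache_settings` are hypothetical names introduced only for illustration; they are not part of the Ragas API, and the actual parsing inside `ragas.config` may differ (for example in how it treats values other than "true" and "false").

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CacheSettings:
    """Hypothetical container for the cache variables documented above."""
    enabled: bool
    backend: str
    cache_dir: str
    semantic_threshold: float
    embedding_provider: str
    embedding_model_name: Optional[str]


def load_cache_settings(env: Mapping[str, str] = os.environ) -> CacheSettings:
    """Read the documented variables, falling back to the documented defaults."""
    backend = env.get("RAGAS_CACHE_BACKEND", "exact").strip().lower()
    if backend not in ("exact", "semantic"):
        raise ValueError(f"Unsupported RAGAS_CACHE_BACKEND: {backend!r}")
    return CacheSettings(
        # Assumption: only the literal string "false" disables caching.
        enabled=env.get("RAGAS_CACHE_ENABLED", "true").strip().lower() != "false",
        backend=backend,
        cache_dir=env.get("RAGAS_CACHE_DIR", ".ragas_cache"),
        semantic_threshold=float(env.get("RAGAS_SEMANTIC_CACHE_THRESHOLD", "0.85")),
        embedding_provider=env.get("RAGAS_SEMANTIC_CACHE_EMBEDDING_PROVIDER", "openai"),
        # Optional: None means "let the provider pick its standard model".
        embedding_model_name=env.get("RAGAS_SEMANTIC_CACHE_EMBEDDING_MODEL_NAME"),
    )


# Pass a plain dict instead of os.environ to try out a configuration.
settings = load_cache_settings(
    {"RAGAS_CACHE_BACKEND": "semantic", "RAGAS_SEMANTIC_CACHE_THRESHOLD": "0.9"}
)
print(settings)
```

Passing a mapping rather than reading `os.environ` directly keeps the sketch easy to test without touching the real process environment.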

Functions decorated with `@cacher()` (like those within LLM and embedding wrappers) will automatically use this globally configured cache.

You can also implement your own custom cacher by implementing the [CacheInterface][ragas.cache.CacheInterface].
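
As a rough sketch of what a custom cacher could look like, the block below implements an in-memory cache. To stay runnable without Ragas installed, it defines a local stand-in, `CacheLike`, instead of importing `ragas.cache.CacheInterface`. The method names `get`, `set`, and `has_key` are an assumption about that interface, so check them against your installed version before subclassing the real class.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class CacheLike(ABC):
    """Local stand-in for ragas.cache.CacheInterface (method names are assumed)."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]:
        """Return the cached value for key, or None if absent."""

    @abstractmethod
    def set(self, key: str, value: Any) -> None:
        """Store value under key."""

    @abstractmethod
    def has_key(self, key: str) -> bool:
        """Report whether key is present."""


class InMemoryCache(CacheLike):
    """Keeps entries in a dict; contents are lost when the process exits."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value

    def has_key(self, key: str) -> bool:
        return key in self._store


cache = InMemoryCache()
cache.set("some-key", {"text": "cached response"})
print(cache.has_key("some-key"), cache.get("some-key"))
```

In a real project you would subclass `CacheInterface` itself and pass the instance through the `cache` argument, the same way `DiskCacheBackend` is passed in the examples in this document.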

## Using Exact (Disk) Caching

Exact caching stores results based on an exact match of the input arguments. This is typically handled by the `DiskCacheBackend`.
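
The idea of an exact match can be sketched in a few lines: serialize the function name and arguments deterministically, hash the result, and use the hash as the lookup key. The names `make_exact_key`, `exact_cached`, and `fake_llm` below are hypothetical and exist only for this illustration; Ragas' own `@cacher()` decorator and `DiskCacheBackend` may build keys differently.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict, List


def make_exact_key(func_name: str, args: tuple, kwargs: dict) -> str:
    """Hash a deterministic JSON dump of the call, so equal inputs give equal keys."""
    payload = json.dumps(
        {"function": func_name, "args": args, "kwargs": kwargs},
        sort_keys=True,   # kwargs order must not change the key
        default=str,      # fall back to str() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def exact_cached(store: Dict[str, Any]) -> Callable:
    """Decorator that memoizes results in `store`, keyed by the exact call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = make_exact_key(func.__qualname__, args, kwargs)
            if key in store:
                return store[key]          # cache hit: skip the expensive call
            result = func(*args, **kwargs)
            store[key] = result
            return result
        return wrapper
    return decorator


calls: List[str] = []  # records every call that actually reached the "model"


@exact_cached(store={})
def fake_llm(prompt: str, temperature: float = 0.0) -> str:
    """Stand-in for an expensive LLM call."""
    calls.append(prompt)
    return f"answer to: {prompt}"


fake_llm("What is the capital of France?")  # computed
fake_llm("What is the capital of France?")  # served from the store
fake_llm("what is the capital of France?")  # different bytes, computed again
print(len(calls))  # prints 2
```

Note that the third call misses even though it differs only in capitalization. That sensitivity is what semantic caching, described later in this document, is meant to relax.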

**Default Behavior (Environment Controlled):**
If you haven't changed `RAGAS_CACHE_BACKEND` (or it's explicitly set to `"exact"`), Ragas will use `DiskCacheBackend` by default. You can control the cache directory with `RAGAS_CACHE_DIR`.

**Explicit Usage:**
You can also instantiate `DiskCacheBackend` directly and pass it to specific components if you need finer-grained control or multiple cache instances.

In [None]:
from ragas.cache import DiskCacheBackend

# Example of explicit instantiation
disk_cache = DiskCacheBackend(cache_dir=".my_custom_cache_location")

# This instance can then be passed to LLM/embedding wrappers if needed:
# from ragas.llms import LangchainLLMWrapper
# from langchain_openai import ChatOpenAI
# cached_llm = LangchainLLMWrapper(ChatOpenAI(), cache=disk_cache)

print(f"DiskCacheBackend initialized at: {disk_cache.cache_dir}")

### Evaluation Example with Exact Caching

Let's set up an evaluation. If `RAGAS_CACHE_ENABLED` is "true" and `RAGAS_CACHE_BACKEND` is "exact" (or not set), the LLM calls during evaluation will be cached to disk.

First, make sure your environment variables are set (or rely on the defaults, which select exact caching). The example below passes a `DiskCacheBackend` to `LangchainLLMWrapper` explicitly so that the cache in use is visible. In a fresh environment with default settings this is not required: the `@cacher()` decorator on the `LangchainLLMWrapper` methods would pick up the global `DiskCacheBackend` on its own.

In [None]:
# Ensure RAGAS_CACHE_ENABLED=true and RAGAS_CACHE_BACKEND=exact (or defaults)
# For demonstration, we'll explicitly pass a DiskCacheBackend instance
# In a typical scenario with env vars set, this explicit passing isn't needed for default behavior.

from ragas.cache import DiskCacheBackend  # needed if this cell runs on its own
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

# Simulate global cache being a DiskCacheBackend
# If RAGAS_CACHE_ENABLED is true and RAGAS_CACHE_BACKEND is 'exact' or unset,
# ragas.config.ragas_cache would be a DiskCacheBackend instance.
# We create one here for clarity in the example.
llm_cache_instance = DiskCacheBackend(cache_dir=".ragas_cache/llm_calls")

cached_llm = LangchainLLMWrapper(
    ChatOpenAI(model="gpt-3.5-turbo"), # Using a faster model for example
    cache=llm_cache_instance # Explicitly passing cache for this example
)

# To see cache in action, set logging (optional)
import logging
from ragas.utils import set_logging_level
set_logging_level("ragas.cache", logging.DEBUG)

In [None]:
from ragas import evaluate
from ragas.metrics import faithfulness
from datasets import Dataset

# Prepare a very small dataset for quick example
dummy_data = {
    'question': ['What is the capital of France?'],
    'answer': ['The capital of France is Paris.'],
    'contexts': [['France is a country in Western Europe. Paris is its capital.']],
    'ground_truth': ['Paris is the capital of France.']
}
eval_dataset = Dataset.from_dict(dummy_data)

print("Running evaluation (first pass)...")
results_pass1 = evaluate(
    dataset=eval_dataset,
    metrics=[faithfulness],
    llm=cached_llm
)
print(results_pass1)

print("\nRunning evaluation (second pass, should use cache)...")
results_pass2 = evaluate(
    dataset=eval_dataset,
    metrics=[faithfulness],
    llm=cached_llm
)
print(results_pass2)

assert results_pass1.to_pandas().equals(results_pass2.to_pandas()), "Results should be identical"

The first pass will make LLM calls and store them. The second pass should be significantly faster as it retrieves results from the cache. You'll see DEBUG logs from `ragas.cache` if logging is enabled, indicating cache hits.

Caching applies to testset generation as well, provided the `generator_llm` and the other components involved call methods decorated with `@cacher()` and the global cache is enabled.

## Semantic Caching

Semantic caching offers a more intelligent way to cache results, especially for LLM-generated content. Instead of relying on exact input matches, it caches based on the semantic similarity of a designated part of the input (e.g., a user's query or a document), while still requiring exact matches for other parameters (like function name or specific model settings).

**Benefits:**
*   **Resilience to Paraphrasing:** If a query is slightly rephrased but retains the same meaning, the semantic cache can still return a hit.
*   **Reduced LLM Calls:** Can significantly cut down on LLM API usage even if inputs aren't identical byte-for-byte.

The `SemanticCacheBackend` handles this. It works by:
1.  Parsing the structured key (function name, args, kwargs).
2.  Identifying a primary string argument for semantic comparison.
3.  Embedding this string argument.
4.  Comparing this embedding with stored embeddings using cosine similarity.
5.  If similarity is above a threshold, it then checks for exact matches on all other non-semantic parts of the key.
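
The five steps above can be sketched with a toy cache. This is a simplified illustration under stated assumptions, not the `SemanticCacheBackend` implementation: `ToySemanticCache`, `toy_embed`, and `make_key` are hypothetical names, the key layout (`function`, `args`, `kwargs`) is assumed, and a bag-of-words vector stands in for a real embedding model. It therefore only recognizes overlapping wording, not true paraphrases.

```python
import json
import math
import re
from collections import Counter
from typing import Any, List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model (illustration only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def make_key(function: str, args: list, kwargs: dict) -> str:
    """Assumed structured key: a JSON object with function, args, and kwargs."""
    return json.dumps({"function": function, "args": args, "kwargs": kwargs}, sort_keys=True)


class ToySemanticCache:
    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[Counter, str, Any]] = []

    @staticmethod
    def _split(key: str) -> Tuple[str, str]:
        """Steps 1-2: parse the key and pull out the first string in args."""
        parsed = json.loads(key)
        args = list(parsed["args"])
        idx = next((i for i, a in enumerate(args) if isinstance(a, str)), None)
        if idx is None:
            raise ValueError("no string argument available for semantic comparison")
        semantic_part = args.pop(idx)
        exact_part = json.dumps(
            {"function": parsed["function"], "args": args, "kwargs": parsed["kwargs"]},
            sort_keys=True,
        )
        return semantic_part, exact_part

    def set(self, key: str, value: Any) -> None:
        text, exact_part = self._split(key)
        self._entries.append((toy_embed(text), exact_part, value))

    def get(self, key: str) -> Optional[Any]:
        text, exact_part = self._split(key)
        query = toy_embed(text)                      # step 3: embed the string
        best_score, best_value = 0.0, None
        for embedding, stored_exact, value in self._entries:
            score = cosine(query, embedding)         # step 4: cosine similarity
            # Step 5: above threshold AND every non-semantic part matches exactly.
            if score >= self.threshold and stored_exact == exact_part and score > best_score:
                best_score, best_value = score, value
        return best_value


cache = ToySemanticCache(threshold=0.7)
cache.set(make_key("generate_text", ["Tell me about the Eiffel Tower"], {"temperature": 0.0}), "cached answer")

similar = make_key("generate_text", ["Tell me about the Eiffel Tower please"], {"temperature": 0.0})
other_settings = make_key("generate_text", ["Tell me about the Eiffel Tower"], {"temperature": 0.9})
print(cache.get(similar))         # close wording, same settings: hit
print(cache.get(other_settings))  # same wording, different settings: miss
```

The second lookup misses even though the prompt is identical, because the non-semantic part of the key (here the `temperature` setting) must still match exactly.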


### Configuring Semantic Caching via Environment Variables

This is the easiest way to enable semantic caching globally for your Ragas application. Set these environment variables **before importing Ragas**:

In [None]:
# Example environment variable setup (run this in your terminal or a setup script).
# In a Python notebook, set these before the kernel starts, or set them via
# os.environ and then re-import/reload the Ragas modules if they are already imported.

# import os
# os.environ["RAGAS_CACHE_ENABLED"] = "true"  # Default, but good to be explicit
# os.environ["RAGAS_CACHE_BACKEND"] = "semantic"
# os.environ["RAGAS_SEMANTIC_CACHE_EMBEDDING_PROVIDER"] = "openai" 
# os.environ["RAGAS_SEMANTIC_CACHE_EMBEDDING_MODEL_NAME"] = "text-embedding-ada-002"
# os.environ["RAGAS_SEMANTIC_CACHE_THRESHOLD"] = "0.85" # Example threshold

# After setting these, if you run an evaluation, LangchainLLMWrapper (if used)
# will automatically use SemanticCacheBackend via ragas.config.ragas_cache.

print("Imagine environment variables are set as above.")
print("If Ragas is imported now, ragas.config.ragas_cache would be a SemanticCacheBackend.")

If you had set the environment variables as shown above, running the previous evaluation example (without explicitly passing a `cache` object to `LangchainLLMWrapper`) would automatically use semantic caching for the LLM calls. The `@cacher()` decorator within the wrapper would pick up the globally configured `SemanticCacheBackend`.

### Using `SemanticCacheBackend` Directly

For more control, you can instantiate `SemanticCacheBackend` and pass it to components.

In [None]:
from ragas.cache import SemanticCacheBackend
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

# 1. Initialize your embedding model
# Ensure you have OPENAI_API_KEY set in your environment for this to work
try:
    # Imported inside the try block so that a missing 'openai' extra is actually caught
    from ragas.embeddings import OpenAIEmbeddings
    my_embed_model = OpenAIEmbeddings(model_name="text-embedding-ada-002")
except ImportError:
    print("OpenAI embeddings not available. Install with 'pip install ragas[openai]'")
    my_embed_model = None

if my_embed_model:
    # 2. Create the SemanticCacheBackend instance
    semantic_cache = SemanticCacheBackend(
        embedding_model=my_embed_model, 
        similarity_threshold=0.85 # Adjust as needed
    )

    # 3. Use this cache with an LLM wrapper
    semantic_cached_llm = LangchainLLMWrapper(
        ChatOpenAI(model="gpt-3.5-turbo"), 
        cache=semantic_cache
    )

    print("SemanticCacheBackend initialized and LLM wrapped.")

    # Example of how it might be used (conceptual)
    # Note: The LangchainLLMWrapper's methods are decorated with @cacher().
    # When you call generate_text or generate_text_with_image, 
    # the cacher will use the 'semantic_cache' instance we passed.

    # First call (original query)
    # response1 = semantic_cached_llm.generate_text(prompt="Tell me about the Eiffel Tower.")
    # print(f"Response 1: {response1.generations[0][0].text[:50]}...")

    # Second call (semantically similar query)
    # response2 = semantic_cached_llm.generate_text(prompt="What do you know regarding the Eiffel Tower?")
    # print(f"Response 2: {response2.generations[0][0].text[:50]}...")
    
    # If semantic caching worked and the prompts were deemed similar enough by the embedding model
    # and the threshold, the second call would be a cache hit (assuming other parameters like model settings are identical).
else:
    print("Skipping direct SemanticCacheBackend example as embedding model failed to load.")

**Important Considerations for Semantic Caching:**
*   **Embedding Model Choice:** The quality of your semantic cache heavily depends on the chosen embedding model. Models good at capturing nuanced semantic similarity for your domain will perform best.
*   **Threshold Tuning:** The `similarity_threshold` is critical. Too low, and you might get false positives (a cached result returned for an input that only looks similar). Too high, and you'll miss many potential cache hits. Finding a good value usually requires experimenting on inputs from your own domain.
*   **Cost of Embedding:** While caching saves on LLM calls, generating embeddings itself has a small cost. For very short, frequently changing semantic parts, this might be a factor, though usually negligible compared to LLM costs.
*   **Key Structure:** `SemanticCacheBackend` expects the cached function's arguments to be serializable into a JSON key, with the first string argument in `args` being the candidate for semantic comparison. This is handled by the `@cacher()` decorator when applied to methods like those in `LangchainLLMWrapper`.
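
One way to approach threshold tuning is to score a small set of prompt pairs that you have labeled by hand and count the two kinds of error at each candidate threshold. The sketch below does that. `sweep_thresholds` is a hypothetical helper, and the similarity scores are invented purely for illustration; in practice they would come from your chosen embedding model.

```python
from typing import List, Sequence, Tuple


def sweep_thresholds(
    scored_pairs: Sequence[Tuple[float, bool]],
    thresholds: Sequence[float],
) -> List[Tuple[float, int, int]]:
    """For each threshold, count false hits and missed hits.

    scored_pairs holds (cosine_similarity, same_meaning) for labeled prompt pairs.
    A false hit is a pair at or above the threshold that does NOT share meaning;
    a missed hit is a pair below the threshold that DOES share meaning.
    """
    rows = []
    for t in thresholds:
        false_hits = sum(1 for score, same in scored_pairs if score >= t and not same)
        missed_hits = sum(1 for score, same in scored_pairs if score < t and same)
        rows.append((t, false_hits, missed_hits))
    return rows


# Invented scores for illustration only: (similarity, do the prompts mean the same?)
labeled_pairs = [
    (0.95, True), (0.90, True), (0.86, True), (0.80, True),
    (0.83, False), (0.70, False), (0.40, False),
]

for threshold, false_hits, missed_hits in sweep_thresholds(labeled_pairs, [0.75, 0.85, 0.92]):
    print(f"threshold={threshold:.2f} false_hits={false_hits} missed_hits={missed_hits}")
```

Lower thresholds trade missed hits for false hits and higher thresholds do the opposite, so the right value depends on how costly a wrong cached answer is for your evaluation.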