## What is "prompt caching"?
Prompt caching stores the responses a language model generates for prompts, then reuses them when the same request comes up again.
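In its simplest form, such a cache is a key-value lookup on the prompt text. The sketch below assumes a plain Python dict as the store and light normalization (case and surrounding whitespace). `ExactMatchCache` is a hypothetical name, not a library class. It returns a stored answer only when the normalized prompt matches exactly, so a rephrased question misses.

```python
import hashlib
from typing import Optional


class ExactMatchCache:
    """Hypothetical prompt -> answer store keyed on a hash of the normalized prompt."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize case and surrounding whitespace, then hash the result
        normalized = prompt.strip().lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> Optional[str]:
        return self._store.get(self._key(prompt))

    def put(self, prompt: str, answer: str) -> None:
        self._store[self._key(prompt)] = answer


exact_cache = ExactMatchCache()
exact_cache.put("What is the capital of France?", "Paris")
print(exact_cache.get("  what is the capital of France?"))  # Paris
print(exact_cache.get("Can you tell me the capital city of France?"))  # None: rephrasing misses
```

This limitation on paraphrases is what a semantic cache addresses.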

Here, we use Redis to store questions and their answers. We run a semantic search over the stored entries: if the best match exceeds a given similarity threshold, we return the stored answer directly instead of making a new API call to the language model (LLM).
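The lookup logic described above can be sketched without Redis. This is a minimal, in-memory illustration that assumes cosine similarity and a fixed similarity threshold of 0.8. `toy_embed` (word counts over a tiny vocabulary), `InMemorySemanticCache`, `cached_call` and `fake_llm` are all hypothetical stand-ins: a real setup uses an embedding model and a vector index, and `RedisSemanticCache` may differ in its internals.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy embedding: word counts over a tiny fixed vocabulary.
# A real setup would call an embedding model instead.
VOCAB = ["capital", "france", "city", "germany"]


def toy_embed(text: str) -> Vector:
    words = text.lower().replace("?", "").split()
    return [float(words.count(term)) for term in VOCAB]


class InMemorySemanticCache:
    """Stores (prompt embedding, answer) pairs; returns the closest answer above a threshold."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.8):
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[Vector, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_answer, best_score = None, -1.0
        for vector, answer in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer if best_score >= self.threshold else None

    def update(self, prompt: str, answer: str) -> None:
        self.entries.append((self.embed(prompt), answer))


def cached_call(cache: InMemorySemanticCache, llm: Callable[[str], str], prompt: str) -> str:
    hit = cache.lookup(prompt)
    if hit is not None:
        return hit  # cache hit: the LLM is not called
    answer = llm(prompt)
    cache.update(prompt, answer)
    return answer


# Usage with a fake LLM that records how often it is called
llm_calls: list[str] = []


def fake_llm(prompt: str) -> str:
    llm_calls.append(prompt)
    return f"answer to: {prompt}"


semantic_cache = InMemorySemanticCache(toy_embed, threshold=0.8)
first = cached_call(semantic_cache, fake_llm, "What is the capital of France?")
second = cached_call(semantic_cache, fake_llm, "Can you tell me the capital city of France?")
third = cached_call(semantic_cache, fake_llm, "What is the capital of Germany?")
print(len(llm_calls))  # 2: the paraphrase was served from the cache
```

With these toy vectors the paraphrase scores about 0.82 against the stored prompt and is reused, while the question about Germany scores 0.5 and triggers a new call.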




In [None]:
!pip install redis
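# Start a local Redis server first. The semantic cache relies on vector search,
# so the server must include the Redis Query Engine (Redis Stack, or Redis 8+):
# docker run -d -p 6379:6379 redis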

In [44]:
# Setting up redis connection
import redis

REDIS_URL = "redis://localhost:6379"
redis_client = redis.from_url(REDIS_URL)
redis_client.ping()

True

In [2]:
# Dependencies for prompting the LLM and for semantic prompt caching
!pip install langchain-redis langchain_huggingface langchain-mistralai


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [None]:
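from langchain_redis import RedisSemanticCache
from langchain.globals import set_llm_cache
from langchain_huggingface import HuggingFaceEmbeddings

# Local embedding model used to turn each prompt into a vector
embeddings_model = HuggingFaceEmbeddings(model_name="Qwen/Qwen3-Embedding-0.6B")

# Register a Redis-backed semantic cache globally: LangChain LLM calls first
# look for a stored response whose prompt embedding is close enough to the new one
set_llm_cache(
    RedisSemanticCache(redis_url=REDIS_URL, embeddings=embeddings_model)
)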
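`RedisSemanticCache` expresses its cutoff as a vector *distance* rather than a similarity, through its `distance_threshold` parameter: a stored answer is reused when the embeddings are close enough. The sketch below illustrates that rule, assuming cosine distance (1 minus cosine similarity) and a threshold of 0.2; check the langchain-redis documentation for the actual metric and default in your version. `cosine_distance` and `is_cache_hit` are hypothetical helpers, not part of the library.

```python
import math

Vector = list[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance = 1 - cosine similarity (0 when the vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_cache_hit(distance: float, distance_threshold: float = 0.2) -> bool:
    # Assumption: a hit means the distance is at or below the threshold
    return distance <= distance_threshold


stored = [1.0, 1.0, 0.0]      # toy embedding of a cached prompt
paraphrase = [1.0, 1.0, 1.0]  # similarity ~0.82, distance ~0.18
unrelated = [1.0, 0.0, 1.0]   # similarity 0.5, distance 0.5

print(is_cache_hit(cosine_distance(stored, paraphrase)))  # True
print(is_cache_hit(cosine_distance(stored, unrelated)))   # False
```

A "similarity above s" rule is therefore the same as a "distance below 1 - s" rule. Raising `distance_threshold` lets more paraphrases hit the cache, at the cost of occasionally returning the answer to a different question.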

In [50]:
import getpass
from langchain_mistralai import ChatMistralAI

MISTRAL_APIKEY = getpass.getpass("Please enter your Mistral API key (hit enter): ")
llm = ChatMistralAI(model="mistral-small-latest", mistral_api_key=MISTRAL_APIKEY)

In [None]:
!pip install --upgrade langchain-community

In [55]:
import time

def execute_with_timing(prompt):
    """Invoke the LLM (through the semantic cache) and return (result, elapsed seconds)."""
    start_time = time.perf_counter()
    result = llm.invoke(prompt)
    elapsed = time.perf_counter() - start_time
    return result, elapsed

# Original prompt: with an empty cache, this call goes to the Mistral API and populates the cache
original_prompt = "What is the capital of France?"
result1, time1 = execute_with_timing(original_prompt)
print(f"Original query:\nPrompt: {original_prompt}\n")
print(f"{result1}\nTime: {time1:.2f} seconds\n")

# Semantically similar prompt (different wording, same meaning): expected to be a cache hit
similar_prompt = "Can you tell me the capital city of France?"
result2, time2 = execute_with_timing(similar_prompt)
print(f"Similar query:\nPrompt: {similar_prompt}\n")
print(f"{result2}\nTime: {time2:.2f} seconds\n")

print(f"Speed improvement: {time1 / time2:.2f}x faster")

15:20:26 httpx INFO   HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"
Original query:
Prompt: What is the capital of France?

content='The capital of France is **Paris**.\n\nIt is one of the most famous and visited cities in the world, known for its iconic landmarks such as the **Eiffel Tower**, **Louvre Museum**, and **Notre-Dame Cathedral**. Paris is also a major cultural, economic, and political center in Europe.\n\nWould you like to know more about Paris or France in general? 😊' additional_kwargs={} response_metadata={'token_usage': {'prompt_tokens': 10, 'total_tokens': 91, 'completion_tokens': 81}, 'model_name': 'mistral-small-latest', 'model': 'mistral-small-latest', 'finish_reason': 'stop'} id='run--2d03855d-fb2a-4aca-aaeb-550af1da81df-0' usage_metadata={'input_tokens': 10, 'output_tokens': 81, 'total_tokens': 91}
Time: 2.07 seconds

Similar query:
Prompt: Can you tell me the capital city of France?

content='The capital of France is **Paris**.\n\