# [Langchain Callbacks](https://python.langchain.com/docs/modules/callbacks/)
LangChain **callbacks** let us *stream* the execution process of OpenAI models.<br/>
The default stdout handler introduced here simply logs every event to standard output. It is implemented in the LangChain library under *conda_env/Lib/site-packages/langchain/callbacks/stdout.py*.<br/>

## [Callbacks documentation](https://python.langchain.com/docs/modules/callbacks/)

**Callbacks**:<br>
LangChain provides a callbacks system that allows you to hook into the various stages of your LLM application. This is useful for logging, monitoring, streaming, and other tasks. You can subscribe to these events by using the callbacks argument available throughout the API. 

**Callback handlers**:<br>
CallbackHandlers are objects that implement the **CallbackHandler** interface, which has a method for each event that can be subscribed to. The CallbackManager will call the appropriate method on each handler when the event is triggered.<br/>
In our example we could use the CallbackHandler interface as implemented by the `StdOutCallbackHandler` class in `common/callbacks.py`; to understand it better, however, we will implement this interface ourselves in a notebook cell. That class derives from `BaseCallbackHandler` in `langchain.callbacks.base`.
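
To make the relationship between handlers and the manager concrete, here is a minimal pure-Python sketch of the dispatch pattern described above. The names `SketchHandler`, `RecordingHandler`, and `SketchManager` are hypothetical and are not LangChain classes; the real interface lives in `langchain.callbacks.base` and has many more events.

```python
from typing import Any, Dict, List


class SketchHandler:
    """Hypothetical stand-in for a callback handler: one no-op method per event."""

    def on_chain_start(self, inputs: Dict[str, Any]) -> None:
        pass

    def on_chain_end(self, outputs: Dict[str, Any]) -> None:
        pass


class RecordingHandler(SketchHandler):
    """Overrides the events it cares about and records them in a list."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_chain_start(self, inputs: Dict[str, Any]) -> None:
        self.events.append(f"start:{sorted(inputs)}")

    def on_chain_end(self, outputs: Dict[str, Any]) -> None:
        self.events.append(f"end:{sorted(outputs)}")


class SketchManager:
    """Hypothetical manager: forwards each event to every registered handler."""

    def __init__(self, handlers: List[SketchHandler]) -> None:
        self.handlers = handlers

    def dispatch(self, event: str, payload: Dict[str, Any]) -> None:
        for handler in self.handlers:
            # Look up the method named after the event and call it
            getattr(handler, event)(payload)


recorder = RecordingHandler()
manager = SketchManager(handlers=[recorder])
manager.dispatch("on_chain_start", {"question": "What is CLP?"})
manager.dispatch("on_chain_end", {"text": "..."})
print(recorder.events)
```

A handler only needs to override the events it is interested in; every other event falls through to the no-op method of the base class.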

---------------
We will incorporate a handler for the callbacks, enabling us to observe the response as it streams and to gain insights into the Agent's reasoning process. This will prove incredibly valuable when we aim to stream the bot's responses to users and keep them informed about the ongoing process as they await the answer.

## Constants

In [1]:
import os
from dotenv import load_dotenv
# load AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, OPENAI_API_VERSION and AZURE_OPENAI_API_TYPE
# plus COMPLETION4_DEPLOYMENT, to be assigned to the MODEL string

load_dotenv("./../credentials_my.env")
MODEL = os.environ["GPT4-0613-8k"] 

from langchain.chat_models import AzureChatOpenAI
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0, max_tokens=1000)

## Let's start with Langchain HelloWorld with a normal LLM
### No handlers here for the moment

In [2]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Now we create a simple prompt template
QUESTION = "What is CLP?" # What are the approaches to Task Decomposition?
language = "Italian" # German

prompt = PromptTemplate(
    input_variables=["question", "language"],
    template='Answer the following question: "{question}". Give your response in {language}',
)

# Let's print the question using Langchain prompt
print(prompt.format(question=QUESTION, language=language))

# And finally we create our first generic chain
chain_chat1 = LLMChain(llm=llm, prompt=prompt, verbose=False) # verbose=False is the default
chain_chat1({"question": QUESTION, "language": language})

Answer the following question: "What is CLP?". Give your response in Italian


{'question': 'What is CLP?',
 'language': 'Italian',
 'text': "CLP è l'acronimo di Chilean Peso, la valuta del Cile."}

### Now we ask the same question, using the default standard handler (StdOutCallbackHandler)
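**Please note**: instead of `callbacks=[standard_handler]` we can simply write `verbose=True`, which also prints the chain events to standard output. Conversely, when `callbacks=[standard_handler]` is passed, the handler prints its output even if `verbose=False`.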

In [3]:
from langchain.callbacks import StdOutCallbackHandler

standard_handler = StdOutCallbackHandler()
# Alternative with the same output:
# LLMChain(llm=llm, prompt=prompt, verbose=False, callbacks=[standard_handler])
chain_chat2 = LLMChain(llm=llm, prompt=prompt, verbose=True)
chain_chat2({"question": QUESTION, "language": language})



> Entering new LLMChain chain...
Prompt after formatting:
Answer the following question: "What is CLP?". Give your response in Italian

> Finished chain.


{'question': 'What is CLP?',
 'language': 'Italian',
 'text': "CLP è l'acronimo di Chilean Peso, la valuta del Cile."}

## CUSTOM Callback Handler
The cell below duplicates the `StdOutCallbackHandler` class from `conda_env/Lib/site-packages/langchain/callbacks/stdout.py` and customizes `on_llm_new_token`, `on_chain_start`, and `on_tool_start` so that they *speak* Italian.

In [None]:
import sys
from typing import Any, Dict, Union
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import AgentAction

class StdOutCallbackHandler_custom(BaseCallbackHandler):
    """Callback handler that reports LLM, chain, tool, and agent events on stdout, in Italian.
    Token events are only emitted by LLMs that support streaming.
    """

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        """Run on new LLM token. Only available when streaming is enabled."""
        # Italian: "I am analyzing the new token ..."
        print(f"\n\n\033[1m> Sto analizzando il nuovo token {token}...\033[0m")
        sys.stdout.write(token)
        sys.stdout.flush()

    def on_llm_error(self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any) -> Any:
        """Run when LLM errors."""
        sys.stdout.write(f"LLM Error: {error}\n")

    def on_chain_start(self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any) -> Any:
        """Print out that we are entering a chain."""
        class_name = serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
        # Italian: "I am entering the chain ..."
        text = f"> Sto entrando nella catena {class_name}..."
        print(f"\n\n\033[1m{text}\033[0m")
        # Uppercase only the text: calling upper() on the whole message
        # would also alter the ANSI escape codes
        sys.stdout.write(f"\n\n\033[1m{text.upper()}\033[0m")
        sys.stdout.flush()

    def on_tool_start(self, serialized: Dict[str, Any], input_str: str, **kwargs: Any) -> Any:
        """Print out that a tool is starting."""
        class_name = serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
        sys.stdout.write(f"Tool: {class_name}\n")
        # Italian: "I am starting the TOOL for ..."
        text = f"> Sto avviando il TOOL per {class_name}..."
        print(f"\n\n\033[1m{text}\033[0m")
        sys.stdout.write(f"\n\n\033[1m{text.upper()}\033[0m")
        sys.stdout.flush()

    def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
        """Print the agent's reasoning log for the chosen action."""
        sys.stdout.write(f"{action.log}\n")
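
The name lookup used in `on_chain_start` and `on_tool_start` can be tried in isolation. The sketch below wraps the same expression in a hypothetical helper, `resolve_name`, and feeds it made-up `serialized` payloads; the exact shape of the dictionaries LangChain passes may differ between versions.

```python
from typing import Any, Dict


def resolve_name(serialized: Dict[str, Any]) -> str:
    """Prefer an explicit "name"; otherwise use the last element of "id"."""
    return serialized.get("name", serialized.get("id", ["<unknown>"])[-1])


# The payloads below are invented for illustration only
print(resolve_name({"name": "LLMChain"}))
print(resolve_name({"id": ["langchain", "chains", "llm", "LLMChain"]}))
print(resolve_name({}))
```

When neither key is present the handler still prints a message, using `<unknown>` as the name, instead of raising a `KeyError`.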

In [None]:
from langchain.callbacks.manager import CallbackManager
custom_handler = StdOutCallbackHandler_custom()
custom_cb_manager = CallbackManager(handlers=[custom_handler])
chain_chat3 = LLMChain(llm=llm, prompt=prompt, verbose=False, callback_manager=custom_cb_manager)
chain_chat3({"question": QUESTION, "language": language})

## Let's add [*streaming*](https://python.langchain.com/docs/modules/model_io/models/llms/streaming_llm) now!
- Upon receiving the client’s request, the server starts sending the data back in chunks or as a continuous stream. It doesn't wait for the entire response to be generated before sending it.
- As the server sends the data in chunks, the client can start receiving and processing the received data immediately. This enables real-time updates and allows the client to display or act upon the received information as it arrives.
- The connection remains open: Unlike traditional HTTP requests where the connection is closed after the server sends the response, in HTTP streaming, the connection remains open. This allows the server to continue sending data to the client over the same connection.
- Continuous data delivery: The server keeps sending data to the client as it becomes available or at a defined interval. This enables a continuous flow of data from the server to the client, similar to a stream of water flowing steadily.
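
The points above can be sketched without any network or LLM call. In this minimal example a generator plays the role of the server and a loop plays the role of the client; `stream_tokens` and `consume` are hypothetical helpers, and splitting on whitespace is only a stand-in for real tokenization.

```python
from typing import Callable, Iterator, List


def stream_tokens(text: str) -> Iterator[str]:
    """Hypothetical server side: yield the response one word at a time."""
    for word in text.split():
        yield word + " "


def consume(stream: Iterator[str], on_new_token: Callable[[str], None]) -> str:
    """Hypothetical client side: act on each chunk as soon as it arrives."""
    received: List[str] = []
    for token in stream:
        on_new_token(token)  # e.g. update the UI immediately
        received.append(token)
    return "".join(received).strip()


seen: List[str] = []
full_text = consume(stream_tokens("CLP is the Chilean Peso"), seen.append)
print(seen)
print(full_text)
```

The `on_new_token` argument plays the same role as the `on_llm_new_token` method of a callback handler: it is invoked once per chunk, before the full response is available.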

In [None]:
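# Callbacks passed to a constructor are scoped to that object only, so the handler
# is also attached to the model, which is the object that emits the token events
llm_streaming = AzureChatOpenAI(deployment_name=MODEL, temperature=0, max_tokens=1000, streaming=True, callbacks=[custom_handler])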

chain_chat4 = LLMChain(llm=llm_streaming, prompt=prompt, verbose=True, callback_manager=custom_cb_manager)

chain_chat4({"question": QUESTION, "language": language})

In [None]:
resp = chain_chat4({"question": "Write me a song about sparkling water.", "language": language})

In [None]:
# resp is a dict holding the chain inputs and the generated text
for key, value in resp.items():
    print(f"*** {key}: ***\n{value}\n\n")