# LangChain

LangChain is an open-source framework for building applications based on large language models (LLMs).
LangChain provides tools and abstractions that improve the customization, accuracy and relevance of the information generated by models. For example, developers can use LangChain components to build new prompt chains or to customize existing models. LangChain also includes components that let LLMs access new datasets without retraining.

In [None]:
!pip install langchain

In [9]:
import pandas as pd
import os
import logging

In [10]:
logging.basicConfig(level=logging.ERROR)

## Testing a Google LLM (free, Hugging Face)

Let's try a free LLM developed by Google (***google/flan-t5-large***), downloadable from Hugging Face, to run some preliminary tests.

In [None]:
!pip install git+https://github.com/huggingface/transformers
!pip install accelerate bitsandbytes

In [12]:
import torch

from langchain.llms import HuggingFacePipeline

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig, pipeline

In [None]:
model_id = 'google/flan-t5-large'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)

In [None]:
print("ST-bot\n[Type 'x' and press Enter to quit the conversation]\n\n- - -")
question = ""

while True:
    question = input("\nAsk me something:\n> ")
    if question == 'x':
        break
    # invoke() replaces the deprecated direct call local_llm(question)
    answer = local_llm.invoke(question)

    print(f"\nAssistant: {answer}")

ST-bot
[Type 'x' and press Enter to quit the conversation]

- - -

Ask me something:
> hello

Assistant: i'm a sailor

Ask me something:
> how you doin sailor?

Assistant: good

Ask me something:
> x


## Chains with memory (Q&A)

In this section we create chains with memory to hold a simple question-and-answer conversation.

### Configuration

In [None]:
!pip install anthropic
!pip install langchain-google-vertexai

In [14]:
import json

from google.oauth2 import service_account
import google.cloud.aiplatform as aiplatform

from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory, ConversationSummaryBufferMemory, ConversationBufferWindowMemory
from langchain_google_vertexai import VertexAI

import vertexai
from anthropic import AnthropicVertex

In [17]:
# Read the service-account JSON once and build the credentials from it
with open("service_account.json", encoding="utf-8") as f:
    service_account_info = json.load(f)

my_credentials = service_account.Credentials.from_service_account_info(
    service_account_info
)
project_id = service_account_info["project_id"]

# Initialize AI Platform
aiplatform.init(
    project=project_id,
    credentials=my_credentials,
)

# Initialize Vertex AI with the same project and credentials
vertexai.init(project=project_id, location="europe-west1", credentials=my_credentials)

In [18]:
llm = VertexAI(model_name="text-bison@002", max_output_tokens=50)

Let's run a quick test to check that we are connected.

In [None]:
question = input("Ask me something\n> ")
answer = llm.invoke(question)

print(answer)

### Memory

In the next sections we try three different ways of adding a memory component to our conversations.

#### Buffer memory

Keeps the whole conversation history in memory and passes it to the model at every turn.
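To make the idea concrete, here is a minimal pure-Python sketch of a conversation buffer. `SimpleBufferMemory` and `add_exchange` are hypothetical names, not LangChain's API; the class only mimics the `Human:`/`AI:` format that `ConversationBufferMemory` prints further down.

```python
class SimpleBufferMemory:
    """Hypothetical stand-in for a buffer memory: it keeps every exchange."""

    def __init__(self) -> None:
        self.exchanges = []  # list of (human message, AI reply) pairs

    def add_exchange(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    @property
    def buffer(self) -> str:
        # Render the whole history, oldest exchange first
        return "\n".join(f"Human: {human}\nAI: {ai}" for human, ai in self.exchanges)


demo_memory = SimpleBufferMemory()
demo_memory.add_exchange("I have a black cat called Meowton", "Nice name!")
demo_memory.add_exchange("What is my cat's name?", "Your cat is called Meowton.")
print(demo_memory.buffer)
```

Because the full history is pasted into every prompt, the prompt grows with each turn; the window and summary variants trade completeness for size.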

In [None]:
memory_buffer = ConversationBufferMemory()

conversation_buffer = ConversationChain(
    llm=llm,
    verbose=False,
    memory=memory_buffer
)

In [None]:
question = "I have a black cat called Meowton"
print(conversation_buffer.predict(input=question))

 Hello! That's a great name for a cat. Black cats are often seen as symbols of mystery and magic. Did you know that in ancient Egypt, cats were revered as sacred animals and were often mummified after they died?


In [None]:
question = "Can you tell me what is the name of my cat?"
print(conversation_buffer.predict(input=question))

 You mentioned that your cat's name is Meowton. Is there anything else you'd like to know about your cat?


In [None]:
print(conversation_buffer.memory.buffer)

Human: I have a black cat called Meowton
AI:  Hello! That's a great name for a cat. Black cats are often seen as symbols of mystery and magic. Did you know that in ancient Egypt, cats were revered as sacred animals and were often mummified after they died?
Human: Can you tell me what is the name of my cat?
AI:  You mentioned that your cat's name is Meowton. Is there anything else you'd like to know about your cat?


#### Buffer Window memory

Keeps in memory only the last k exchanges (human message + AI reply) of the conversation.
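A sliding window can be sketched in plain Python with a bounded `deque`. This is a hypothetical illustration (`SimpleWindowMemory` is not a LangChain class) that assumes, as in the test below, that k counts whole exchanges rather than single messages.

```python
from collections import deque


class SimpleWindowMemory:
    """Hypothetical stand-in for a window memory: it keeps only the last k exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen=k silently discards the oldest exchange when full
        self.exchanges = deque(maxlen=k)  # (human, ai) pairs

    def add_exchange(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    @property
    def buffer(self) -> str:
        return "\n".join(f"Human: {human}\nAI: {ai}" for human, ai in self.exchanges)


demo_window = SimpleWindowMemory(k=2)
demo_window.add_exchange("I have a black cat called Meowton", "...")
demo_window.add_exchange("I have a white dog called Barkton", "...")
demo_window.add_exchange("I have a red fish called Nemo", "...")
# The cat exchange has already been dropped: only the dog and the fish remain
print(demo_window.buffer)
```

With k=2, by the time a fourth question arrives the history no longer mentions Meowton, which matches the behavior observed in the test that follows.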

In [None]:
memory_window = ConversationBufferWindowMemory(k=2)

conversation_window = ConversationChain(
    llm=llm,
    verbose=False,
    memory=memory_window
)

In [None]:
question = "I have a black cat called Meowton"
print(conversation_window.predict(input=question))

 Hello! That's a great name for a cat. Black cats are often seen as symbols of mystery and magic. Did you know that in ancient Egypt, cats were revered as sacred animals and were often mummified after they died?


In [None]:
question = "I have a white dog called Barkton"
print(conversation_window.predict(input=question))

 Oh, how lovely! White dogs are often seen as symbols of purity and innocence. Did you know that the ancient Romans believed that white dogs had the power to ward off evil spirits?


In [None]:
question = "I have a red fish called Nemo"
print(conversation_window.predict(input=question))

 That's a great name for a fish! Red fish are often seen as symbols of good luck and prosperity. Did you know that in ancient China, red fish were often kept in ponds and aquariums as a way to attract wealth and good fortune


In [None]:
question = "Do you remember the name of my cat, my dog and my fish?"
print(conversation_window.predict(input=question))

 Yes, you mentioned that your dog's name is Barkton, your fish's name is Nemo, and you did not mention the name of your cat.


The model does not remember the cat's name because we set the memory window to k=2: when the fourth question is asked, the history only contains the two previous exchanges (the dog and the fish).

In [None]:
print(conversation_window.memory.buffer)

Human: I have a red fish called Nemo
AI:  That's a great name for a fish! Red fish are often seen as symbols of good luck and prosperity. Did you know that in ancient China, red fish were often kept in ponds and aquariums as a way to attract wealth and good fortune
Human: Do you remember the name of my cat, my dog and my fish?
AI:  Yes, you mentioned that your dog's name is Barkton, your fish's name is Nemo, and you did not mention the name of your cat.


#### Buffer Summary memory

Keeps the recent conversation history in memory and, once it exceeds a token limit (`max_token_limit`), uses the LLM to summarize the older messages.
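The following pure-Python sketch illustrates the idea under simplifying assumptions: `SimpleSummaryBufferMemory` is hypothetical, tokens are approximated by a whitespace word count, and `summarize` is a callback that in a real setup would ask the LLM for an updated summary (roughly the role of `predict_new_summary`, used later in this notebook).

```python
from typing import Callable, List


class SimpleSummaryBufferMemory:
    """Hypothetical sketch: recent messages stay verbatim, older ones are summarized."""

    def __init__(self, summarize: Callable[[str, List[str]], str], max_tokens: int) -> None:
        self.summarize = summarize    # (previous summary, pruned messages) -> new summary
        self.max_tokens = max_tokens  # size budget for the verbatim buffer
        self.summary = ""
        self.messages: List[str] = []

    def _size(self) -> int:
        # Rough token estimate: number of whitespace-separated words
        return sum(len(message.split()) for message in self.messages)

    def add_message(self, message: str) -> None:
        self.messages.append(message)
        pruned = []
        # Move the oldest messages out of the buffer until it fits the budget again
        # (the newest message is always kept)
        while self._size() > self.max_tokens and len(self.messages) > 1:
            pruned.append(self.messages.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)


def fake_summarize(previous: str, messages: List[str]) -> str:
    # Stand-in for an LLM call: just concatenates the pruned messages
    return " ".join([previous] + messages).strip()


demo_summary = SimpleSummaryBufferMemory(fake_summarize, max_tokens=6)
demo_summary.add_message("Human: I have a cat")  # 5 words, fits the budget
demo_summary.add_message("AI: Nice cat")         # 8 words in total, the oldest is summarized
print(demo_summary.summary)   # Human: I have a cat
print(demo_summary.messages)  # ['AI: Nice cat']
```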

In [None]:
memory_summary = ConversationSummaryBufferMemory(llm=llm)

conversation_summary = ConversationChain(
    llm=llm,
    memory=memory_summary,
    verbose=False
)

In [None]:
question = "I have a black cat called Meowton"
print(conversation_summary.predict(input=question))

 Hello! That's a great name for a cat. Black cats are often seen as symbols of mystery and magic. Did you know that in ancient Egypt, cats were revered as sacred animals and were often mummified after they died?


In [None]:
question = "I have a white dog called Barkton"
print(conversation_summary.predict(input=question))

 Oh, how lovely! White dogs are often seen as symbols of purity and innocence. Did you know that the breed of dog most commonly associated with the color white is the Samoyed, which originated in Siberia and was bred to withstand harsh Arctic conditions


In [None]:
print(conversation_summary.memory.buffer)

[HumanMessage(content='I have a black cat called Meowton'), AIMessage(content=" Hello! That's a great name for a cat. Black cats are often seen as symbols of mystery and magic. Did you know that in ancient Egypt, cats were revered as sacred animals and were often mummified after they died?"), HumanMessage(content='I have a white dog called Barkton'), AIMessage(content=' Oh, how lovely! White dogs are often seen as symbols of purity and innocence. Did you know that the breed of dog most commonly associated with the color white is the Samoyed, which originated in Siberia and was bred to withstand harsh Arctic conditions')]


Now let's get a summary of the conversation:

In [None]:
messages = memory_summary.chat_memory.messages
previous_summary = ""

# Summarize the conversation
memory_summary.predict_new_summary(messages, previous_summary)

' The human introduces their pets, a black cat named Meowton and a white dog named Barkton. The AI comments on the symbolism and historical significance associated with black cats and white dogs, respectively.'

### Test

In [None]:
memory = ConversationSummaryBufferMemory(llm=llm)

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

In [None]:
print("ST-bot\n[Say 'bye' and confirm to quit the conversation]\n\n- - -")
question = ""

while True:
    question = input("\nYou:\n> ")
    if question.lower() == "bye":
        print("\nChat terminated.")
        break

    answer = conversation.predict(input=question)
    print(f"\nAssistant:\n>{answer}")

print("\nConversation summary:")
memory.predict_new_summary(memory.chat_memory.messages, "")

# How many eggs do I need for a carbonara for 5 people?
# With carbonara, what's better between red and white wine?
# How long does it take to cook carbonara generally?
# Bye

ST-bot
[Say 'bye' and confirm to quit the conversation]

- - -

You:
> How many eggs do I need for a carbonara for 5 people?

Assistant:
> A traditional carbonara recipe usually doesn't include eggs. However, there are variations of carbonara that do include eggs. One common variation is to add one egg yolk per person, so for 5 people, you would need 5 egg yolks

You:
> With carbonara, what's better between red and white wine?

Assistant:
> White wine is typically used in carbonara, as it adds a delicate flavor that complements the richness of the dish. Red wine can also be used, but it will give the carbonara a more robust flavor. Ultimately, the choice between red and white

You:
> How long does it take to cook carbonara generally?

Assistant:
> Cooking time for carbonara can vary depending on the recipe and personal preferences. Generally, it takes around 15-20 minutes to cook carbonara. This includes the time to boil the pasta, prepare the sauce, and combine everything together.

Y

" The human asks how many eggs they need for a carbonara for 5 people. The AI says that a traditional carbonara recipe doesn't include eggs, but there are variations that do. One common variation is to add one egg yolk per person"

## RAG

### Configuration

In [None]:
!pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain langsmith pypdf sentence-transformers

In [22]:
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [23]:
import getpass

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
# Never hard-code API keys in a notebook: read the LangSmith key at runtime
os.environ['LANGCHAIN_API_KEY'] = getpass.getpass("LangSmith API key: ")

### Quick RAG test

In [30]:
# Load and read all the .pdf documents in the current directory
loader = DirectoryLoader('.', glob="*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

In [31]:
# Split the text into chunks with a fixed chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(documents)
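To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window chunker in plain Python. It is a stand-in under stated assumptions, not how `RecursiveCharacterTextSplitter` works internally: the LangChain splitter first tries to cut on separators such as paragraph breaks and spaces, while this sketch (`sliding_window_chunks` is a hypothetical helper) cuts at fixed character positions.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once a window would contain nothing new beyond the previous one
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With `chunk_size=1000` and `chunk_overlap=200`, consecutive windows in this sketch would start 800 characters apart, so text near a boundary appears in both chunks and is less likely to lose its context.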

In [32]:
# Use an existing free embedding model to generate the embeddings
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vectorstore.as_retriever()


In [33]:
# Check whether the search finds information relevant to the query in the resources (.pdf)
query = "Why Community Involvement Matters?"
matching_docs = vectorstore.similarity_search(query)

matching_docs[0]

Document(page_content='their friendship a shining beacon of warmth and joy for all who knew them.', metadata={'page': 1, 'source': 'billy.pdf'})
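Under the hood, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it (the LangChain Chroma wrapper returns 4 by default). The toy sketch below assumes cosine similarity and hand-made 3-dimensional vectors; Chroma may use a different distance metric, and `all-MiniLM-L6-v2` produces 384-dimensional embeddings. `cosine_similarity`, `top_k` and `toy_store` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int = 4) -> List[Tuple[str, float]]:
    # Score every stored chunk against the query and keep the k best ones
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings, for illustration only
toy_store = {
    "Molly the blue dog": [1.0, 0.0, 0.0],
    "Billy the red cat": [0.0, 1.0, 0.0],
    "Community clean-up drives": [0.7, 0.7, 0.0],
}
for text, score in top_k([1.0, 0.1, 0.0], toy_store, k=2):
    print(f"{score:.3f}  {text}")
```

A top-k search always returns something, even when no chunk is really relevant to the query, which may explain the off-topic chunk returned above.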

In [None]:
# Predefined RAG prompt template from the LangChain Hub
prompt = hub.pull("rlm/rag-prompt")

In [24]:
# Formatting function: joins the retrieved chunks into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

Let's test a chain that retrieves the requested information from the existing resources and reworks it into an answer.

In [None]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("Why Community Involvement Matters?")

' Community involvement is crucial because it allows individuals to pool their resources, knowledge, and creativity to create a positive impact on the environment. By working together, communities can organize clean-up drives, tree planting activities, and educational programs that not only beautify'
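The `rag_chain` above composes its steps with the `|` operator. Stripped of LangChain, the data flow can be sketched in plain Python as follows; `run_rag`, `DEMO_PROMPT` and the stub retriever/LLM are hypothetical, and the real `rlm/rag-prompt` template is worded differently.

```python
from typing import Callable, List

# Hypothetical prompt; the hub prompt used above has its own wording
DEMO_PROMPT = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def run_rag(question: str,
            retrieve: Callable[[str], List[str]],
            generate: Callable[[str], str]) -> str:
    # 1. retrieve the relevant chunks and join them (retriever | format_docs)
    context = "\n\n".join(retrieve(question))
    # 2. fill the prompt; the question is passed through unchanged (RunnablePassthrough)
    prompt_text = DEMO_PROMPT.format(context=context, question=question)
    # 3. call the model; StrOutputParser would then just return the text
    return generate(prompt_text)


# Stubs standing in for the vector store and the LLM
def fake_retrieve(question: str) -> List[str]:
    return ["chunk A", "chunk B"]


def echo_llm(prompt_text: str) -> str:
    return prompt_text  # echoes the prompt so we can inspect it


print(run_rag("Why does community involvement matter?", fake_retrieve, echo_llm))
```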

## Chains with memory (Q&A + RAG)

### Test with fixed resources

In [25]:
from langchain.prompts.prompt import PromptTemplate

In [27]:
resources = "my nose is red"

In [28]:
template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know or retrieves from the external resources.

Current conversation:
{history}{resources}
Human: {input}
AI Assistant:"""
PROMPT = PromptTemplate.from_template(template).partial(resources=resources)
memory = ConversationBufferMemory()

conversation = ConversationChain(
    prompt=PROMPT,
    llm=llm,
    verbose=False,
    memory=memory
)

In [None]:
answer = conversation.predict(input="I have a blue dog")
print(f"\nAssistant:\n>{answer}")


Assistant:
> Okay, so your dog is blue. Can you tell me more about your dog? What's its name and breed?


In [None]:
answer = conversation.predict(input="What's the color of my dog and my nose?")
print(f"\nAssistant:\n>{answer}")


Assistant:
> Based on the information you've provided, your dog is blue and your nose is red. Is there anything else you'd like to know about your dog or your nose?
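What the model actually receives can be reproduced with plain string formatting. The sketch below assumes a shortened version of the template above and uses `functools.partial` to pin `resources`, similar in spirit to `PromptTemplate.partial`; the history string follows the `Human:`/`AI:` format of `ConversationBufferMemory`. `demo_template`, `fill_prompt` and `rendered` are hypothetical names.

```python
from functools import partial

# Shortened copy of the template above (same placeholders)
demo_template = (
    "Current conversation:\n"
    "{history}{resources}\n"
    "Human: {input}\n"
    "AI Assistant:"
)

# Pin the fixed resources once; history and input are filled in at every turn
fill_prompt = partial(demo_template.format, resources="my nose is red")

rendered = fill_prompt(
    history="Human: I have a blue dog\nAI: Okay, so your dog is blue.",
    input="What's the color of my dog and my nose?",
)
print(rendered)
```

Because `{history}{resources}` has no separator, the resources end up glued to the last AI line ("...blue.my nose is red"); putting them on their own line, for example under a `Resources:` label, would likely make the prompt easier for the model to parse.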


### Test with external resources

In [None]:
# Load and read all the .pdf documents in the current directory
loader = DirectoryLoader('.', glob="*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

# Split the text into chunks with a fixed chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(documents)

# Use an existing free embedding model to generate the embeddings
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vectorstore.as_retriever()

Let's create a function that lets the model answer using both its training data and the external resources.

In [49]:
def answer_with_resources(input, conversation, template, verbose):
    # Retrieve the chunks most similar to the question
    docs = vectorstore.similarity_search(input)

    # Pass plain text to the prompt (format_docs) instead of the repr of a list of Document objects
    resources = format_docs(docs)

    # Build a prompt with the retrieved resources and assign it to the conversation
    prompt = PromptTemplate.from_template(template).partial(resources=resources)
    conversation.prompt = prompt

    # Generate the answer
    answer = conversation.predict(input=input)

    # Print the retrieved documents if requested
    if verbose:
        print(docs)

    return answer

Let's create the template, the memory and the chain for our test.

In [106]:
template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know or retrieves from the external resources.

Current conversation:
{history}{resources}
Human: {input}
AI Assistant:"""
memory = ConversationBufferMemory()

conversation = ConversationChain(
    llm=llm,
    verbose=False,
    memory=memory
)

Let's ask a generic question to see whether the model can answer with its training data:

In [None]:
answer = answer_with_resources(
    input="what can you tell me about dogs?",
    conversation=conversation,
    template=template,
    verbose=False
)

print(f"\nAssistant:\n>{answer}")


Assistant:
> Dogs, scientifically known as Canis lupus familiaris, are domesticated mammals that have been a part of human society for thousands of years. They are believed to have descended from wolves and have evolved into various breeds with distinct physical and behavioral characteristics. Dogs are


Now let's ask something specific to the external data (.pdf):

In [None]:
answer = answer_with_resources(
    input="What are the names of the blue dog and the red cat?",
    conversation=conversation,
    template=template,
    verbose=True
)

print(f"\nAssistant:\n>{answer}")

[Document(page_content="Molly the blue dogOnce upon a time, in a quaint little village nestled between rolling hills and lush forests, there lived a peculiar dog named Molly. What made Molly so unique was her striking blue fur, a rare sight that never failed to captivate anyone who laid eyes on her.Molly was not just any ordinary dog; she possessed a keen intelligence and a gentle soul that endeared her to all who knew her. Despite her unusual appearance, she was beloved by the villagers, who often sought her company during their daily walks through the countryside.One sunny afternoon, as Molly roamed the village streets, she stumbled upon a group of children playing in the town square. Intrigued by her azure coat, the children ﬂocked around her, their laughter ﬁlling the air as they showered her with affection.Among the children was a little girl named Emily, whose eyes sparkled with wonder at the sight of Molly's radiant blue fur. Instantly drawn to the enchanting dog, Emily approach

## Retrieving resources from the internet

### Configuration

In [None]:
!pip install pyduckduckgosearch
!pip install -U duckduckgo-search

In [19]:
from langchain.agents import AgentType, initialize_agent, load_tools

### Search the training data and the resources, and suggest from the internet

Let's create a function that generates answers by searching the training data and the external resources, and that adds useful suggestions related to the request, taken from online resources.

In [48]:
def answer_with_resourcers_and_suggest(input, conversation, template, verbose):
    # Answer using the training data + the external resources (.pdf)
    partial_answer = answer_with_resources(input, conversation, template, verbose)

    # Let a ReAct agent with the DuckDuckGo search tool look for related content online
    tools = load_tools(["ddg-search"], llm=conversation.llm)
    agent = initialize_agent(tools, conversation.llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
    search_result = agent.run(input)

    return partial_answer + "\n\nYou might also be interested in (from DuckDuckGo):\n" + search_result

In [288]:
answer = answer_with_resourcers_and_suggest(
    input="What are the names of the blue dog and the red cat?",
    conversation=conversation,
    template=template,
    verbose=False
)

print(f"\nAssistant:\n>{answer}")


A:
> The blue dog's name is Molly, and the red cat's name is Billy.

You might also be interested in (from DuckDuckGo):
The blue dog can be an Australian Shepherd or a Blue Heeler or Kerry Blue Terrier. The


### Search the training data and the resources; if no resource matches the user's question, search the internet

In [76]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

Let's create a function that checks how closely the user's question matches the documents. The required degree of similarity is set with the ***relevance_treshold*** parameter.

In [111]:
def check_similarity_with_treshold(input, relevance_treshold, verbose):

    texts = []
    max_score = 0

    # Measure the similarity between the request and the indexed chunks:
    # a single search returns the top-k chunks together with their relevance scores
    results = vectorstore.similarity_search_with_relevance_scores(input)

    for doc, score in results:
        # Keep the chunk only if it is relevant enough
        if score > relevance_treshold:
            texts.append(doc.page_content)
            max_score = max(max_score, score)

    if verbose:
        if max_score == 0:
            print("Couldn't retrieve relevant documents")
        else:
            print(f"Document retrieved with max relevance: {round(max_score, 3)}")

    return texts, max_score

Now let's create the function that answers with the training data plus the external resources when relevant ones are available, and otherwise with the training data plus online resources.

In [97]:
def answer_with_resourcers_or_online(input, conversation, template, relevance_treshold, verbose):
    # Retrieve the resources, if relevant
    texts, score = check_similarity_with_treshold(input, relevance_treshold, verbose)

    # No relevant information in the external resources: retrieve it from the internet
    if score == 0:
        tools = load_tools(["ddg-search"], llm=conversation.llm)
        agent = initialize_agent(tools, conversation.llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
        search_result = agent.run(input)

        if verbose:
            print(f"From internet: {search_result}")

        resources = search_result

    # Relevant information found in the external resources: use only the chunks above the threshold
    else:
        resources = "\n\n".join(texts)

    # Build the prompt with the chosen resources and generate the answer
    prompt = PromptTemplate.from_template(template).partial(resources=resources)
    conversation.prompt = prompt
    answer = conversation.predict(input=input)

    return answer
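The routing decision taken by `answer_with_resourcers_or_online` can be isolated and tested without a vector store or a search engine. In this hypothetical sketch, `pick_context` and its callbacks are stand-ins: `search_local` returns `(text, relevance score)` pairs, loosely like `similarity_search_with_relevance_scores` (which returns `Document` objects instead of plain text), and `search_web` returns a web search result. Apart from 0.319, the scores are made up.

```python
from typing import Callable, List, Tuple


def pick_context(question: str,
                 search_local: Callable[[str], List[Tuple[str, float]]],
                 search_web: Callable[[str], str],
                 threshold: float) -> Tuple[str, str]:
    # Keep only the local chunks whose relevance is above the threshold
    relevant = [text for text, score in search_local(question) if score > threshold]
    if relevant:
        return "documents", "\n\n".join(relevant)
    # Nothing relevant locally: fall back to the internet
    return "internet", search_web(question)


def fake_web(question: str) -> str:
    return "LangChain is an open-source Python library for building LLM-powered applications."


print(pick_context("Do you know the story of the red cat?",
                   lambda q: [("Billy the red cat...", 0.319), ("Molly the blue dog...", 0.12)],
                   fake_web, threshold=0.2))
print(pick_context("Do you know Langchain?",
                   lambda q: [("Molly the blue dog...", 0.05)],
                   fake_web, threshold=0.2))
```

Keeping the decision in a small pure function makes the threshold easy to tune with hand-made scores before wiring in the real retriever and agent.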

Let's ask the model whether it knows LangChain. Since the model is not updated daily, its training data alone will not produce a useful answer. It will then look through the external resources, but it will not find anything there either. Searching the online resources, it will find information relevant to the request and produce an appropriate output.

In [112]:
answer = answer_with_resourcers_or_online(
    input="Do you know Langchain?",
    conversation=conversation,
    template=template,
    relevance_treshold=0.2,
    verbose=True
)

print(f"\nAssistant:\n>{answer}")

Couldn't retrieve relevant documents
From internet: LangChain is an open-source Python library for building LLM-powered applications.

Assistant:
> Yes, I know about LangChain. LangChain is an open-source Python library that enables developers to build language model-powered applications. It provides a comprehensive set of tools and functionalities for training, fine-tuning, and deploying language models. Lang


Now let's ask something that we know is in the external resources. Since the model finds what it needs in the PDFs, it can answer without querying the online resources.

In [110]:
answer = answer_with_resourcers_or_online(
    input="Do you know the story of the red cat?",
    conversation=conversation,
    template=template,
    relevance_treshold=0.2,
    verbose=True
)

print(f"\nAssistant:\n>{answer}")

Document retrieved with max relevance: 0.319

Assistant:
> Yes, I know the story of the red cat named Billy. Billy lived in a cozy neighborhood filled with quaint streets and flower-filled gardens. Despite his fiery red fur, Billy had a gentle heart and spent his days roaming the cobblestone alleys,
