<a href="https://colab.research.google.com/github/nicolinux72/bootstrap-llama-assistant-on-personal-data/blob/main/langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bootstrap a LLaMa assistant on personal data
*How to quickly adapt an LLM so that it can also answer questions about private data.*

You can find the full article on [Medium](https://medium.com/@nicolasanti_43152/bootstrap-a-llama-assistant-on-personal-data-2-2-16062fa5aa6d).

We start by installing the required libraries with the usual pip command.

In [None]:
!pip install -qU transformers accelerate einops langchain xformers bitsandbytes faiss-gpu sentence_transformers pypdf

We conclude the setup by logging in to Hugging Face. A valid token is needed because the Llama 2 weights are gated.

In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


To load and run the model, we will once again use the Hugging Face libraries.

In [None]:
from torch import bfloat16
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)


model_name = 'meta-llama/Llama-2-7b-chat-hf'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    config=AutoConfig.from_pretrained(model_name),
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=bfloat16),
    device_map='cuda:0',
)
# switch the model to evaluation mode (inference only, no training behavior such as dropout)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_name)

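# note: this assignment shadows the imported `pipeline` function
pipeline = pipeline(
    task='text-generation',
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # needed by langchain
    # model params
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=0.1,  # randomness of responses: close to 0.0 is nearly deterministic, 1.0 is more varied
    repetition_penalty=1.1  # to avoid repeating output
)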


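To see why the model is loaded in 4-bit precision, a back-of-the-envelope estimate of the memory taken by the weights alone can help. The sketch below is only an approximation: it assumes a round figure of 7 billion parameters and ignores activations, the key/value cache, and the small overhead of quantization constants. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 7e9  # assumed round parameter count for a "7b" model

fp16_gb = weight_memory_gb(n_params, 16)  # half precision
nf4_gb = weight_memory_gb(n_params, 4)    # 4-bit quantization, as configured above

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights need roughly a quarter of the memory of the 16-bit ones, which is what makes a single consumer or Colab GPU a plausible target.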

Here is how to use the Hugging Face pipeline configured above within LangChain:

In [None]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=pipeline)

# to query the model directly, call the LangChain wrapper as below
# llm(prompt="Rainbow colors are ...")
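When querying a Llama 2 chat checkpoint directly, it usually helps to wrap the request in the instruction format the chat models were trained on: the user message goes between `[INST]` and `[/INST]`, with an optional system message inside `<<SYS>>` tags. The helper below is a minimal sketch of that format; `build_llama2_prompt` is a hypothetical name, and the beginning-of-sequence token is left to the tokenizer.

```python
def build_llama2_prompt(user_message, system_message=None):
    """Wrap a single-turn request in the Llama 2 chat instruction format."""
    if system_message:
        return (
            f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"[INST] {user_message} [/INST]"


prompt = build_llama2_prompt(
    "Rainbow colors are ...",
    system_message="Be very clear and detailed.",
)
print(prompt)
```

The resulting string could then be passed as the `prompt` argument of the `llm` object defined above.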

Upload your PDF and load it into LangChain.

In [None]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader('./AZIENDA_IT_0120_low.pdf')
documents = loader.load_and_split()
# to see the loaded content, uncomment the following line
# documents[1]


It is also possible to load documents from the web.

In [None]:
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader(["https://www.espn.com/","https://www.brt.it/it/sostenibilita/"])
documents = loader.load() + documents

# to see the loaded content, uncomment the following line
# documents[1]

Next, the documents are split into chunks. `RecursiveCharacterTextSplitter` is the recommended splitter for generic text. It is parameterized by a list of separators and tries to split on them in order until the chunks are small enough. The default list is `["\n\n", "\n", " ", ""]`. This has the effect of keeping paragraphs (and then sentences, and then words) together for as long as possible, since those are generally the most strongly related pieces of text.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
texts = text_splitter.split_documents(documents)

#for text in texts:
#  print(text)
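To make the "split on separators in order" idea concrete, here is a simplified pure-Python sketch. It is not LangChain's implementation: it ignores `chunk_overlap` and other options, and `recursive_split` is a hypothetical helper written only for illustration.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters,
    preferring coarse separators (paragraphs) over fine ones (words)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Last resort: no separator left, cut at fixed character offsets.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        # Greedily merge adjacent pieces while they still fit in one chunk.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First paragraph.\n\n"
    "Second paragraph is a bit longer than the first.\n\n"
    "Third."
)
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
```

Short paragraphs stay whole, while the long middle paragraph is broken at word boundaries because no coarser separator fits within the limit.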

FAISS is a fast, in-memory vector similarity search library from Meta, used here as the vector store.

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# use a separate variable name so the LLM's model_name is not overwritten
embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
# storing embeddings in the vector store
vectorstore = FAISS.from_documents(texts, embeddings)
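Conceptually, retrieval from a vector store amounts to finding the stored embeddings closest to the query embedding. The sketch below shows the brute-force version of that idea with Euclidean distance on toy two-dimensional vectors. It does not call FAISS, and the helper names `l2_distance` and `nearest` are hypothetical; real embeddings from `all-mpnet-base-v2` have 768 dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k=2):
    """Return the indices of the k stored vectors closest to the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]


# Toy "embeddings" standing in for three document chunks.
stored = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
top = nearest([1.0, 0.0], stored, k=2)
print(top)
```

A dedicated library such as FAISS performs the same kind of search far more efficiently on large collections of high-dimensional vectors.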

Now, we can build and use the chain.

In [None]:
from langchain.chains import ConversationalRetrievalChain
from langchain import PromptTemplate

tt = PromptTemplate.from_template("""[INST] <<SYS>>Act like an italian so answer in italian only, be very clear and detailed<</SYS>>
{content}[/INST]""")

chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)

chat_history = []

query = "Le destinazioni internazionali servite sono..."  # "The international destinations served are..."
result = chain({"question": tt.format(content=query), "chat_history": chat_history})
print(result['answer'])

print("----------------------")

chat_history = [(query, result["answer"])]
query = "Cos'è l'avviso di tentata consegna di BRT?"  # "What is BRT's attempted-delivery notice?"
result = chain({"question": tt.format(content=query), "chat_history": chat_history})
print(result['answer'])


Le destinazioni internazionali servite sono numerose e diverse, ma qui di seguito vi elenchio alcune delle principali:

* Europa: Bologna, Crespellano, Milano, Torino, Verona, Guidonia, Rovigo, Torino, Valdarno, Verona.
* Francia: Parigi, Marsiglia, Nizza, Lione, Strasburgo.
* Spagna: Madrid, Barcellona, Saragozza, Valencia, Bilbao, Barcellona.
* Inghilterra: Londra, Manchester, Birmingham, Liverpool, Glasgow.
* Germania: Berlino, Monaco di Baviera, Stoccarda, Amburgo, Colonia, Düsseldorf.
* Paesi Bassi: Amsterdam, Rotterdam, Utrecht, Eindhoven, Groninga.
* Svizzera: Zurigo, Ginevra, Losanna, Berna, Winterthur.
* Belgio: Bruxelles, Anversa, Gand, Liegi, Charleroi.
* Austria: Vienna, Graz, Salisburgh, Innsbruck, Linz.
* Polonia: Varsavia, Cracovia, Poznań, Łódź, Kraków.
* Rep. Ceca: Praga, Brno, Plzeň, Ostrava, Pilsen.
* Slovacchia: Bratislava, Košice, Prešov, Trnava.
* Ungheria: Budapest, Debrecen, Miskolc, Pécs, Szeged.
* Grecia: Atene, Tessalonica, Salonicco, Patrasso, Rodi.
* Turch