How to use a local LLM, downloaded from Hugging Face, for question answering (Q&A) over web documents.

## Installs, Imports and Configuration

In [1]:
import json

In [2]:
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

## QA Pipeline
See https://python.langchain.com/docs/use_cases/question_answering/

### Loading Documents
Use the `WebBaseLoader` to fetch websites.

In [3]:
pages = ["https://en.wikipedia.org/wiki/Vincent_van_Gogh",
         "https://medium.com/@anastasia_mze/vincent-van-gogh-from-a-bum-to-a-master-743aec22755e",
         "https://www.history.com/news/7-things-you-may-not-know-about-vincent-van-gogh",
         "https://arthinkal.com/vincent-van-gogh-a-brief-biography-1853-1890/",
         "https://www.metmuseum.org/toah/hd/gogh/hd_gogh.htm"]

loader = WebBaseLoader(pages)
docs = loader.load()

Display document metadata (optional, uncomment to run):

In [4]:
# print("Document metadata:\n")
# for i in range(len(docs)):
#     print(20*"-")
#     print(docs[i].metadata["source"])
#     print(docs[i].metadata["title"])
#     print(docs[i].metadata["language"])

### Split
Split documents into chunks with the `RecursiveCharacterTextSplitter`.

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

Display info on splits:

In [6]:
# print("Split metadata:\n")
# print(f"Number of splits: {len(splits)}\n")
# for i in range(15,20):
#     print(20*"-")
#     print(splits[i].metadata, "\n")
#     print(splits[i].page_content)
    
#     print("\n")
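
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter tries a list of separators (paragraphs, lines, words) so that chunks end on natural boundaries, whereas this sketch cuts at fixed character offsets. The function name `sliding_window_chunks` and the sample text are made up for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample input, small enough to inspect by eye.
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last four characters of the previous one, which is the same idea as the 200-character overlap used above: a sentence cut at a chunk boundary still appears whole in at least one chunk.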

### Embed in Vector Store
Embed the splits into a Chroma vector database using `HuggingFaceEmbeddings`.

In [7]:
hf_embeddings = HuggingFaceEmbeddings()
vectorstore = Chroma.from_documents(documents=splits, embedding=hf_embeddings)

Display embedding settings:

In [8]:
# display(hf_embeddings.schema())
# print(hf_embeddings.schema()["description"])
# display(hf_embeddings.schema()["properties"])

Query the vector store and return relevance scores:

In [9]:
question = "Where was Vincent van Gogh born?"

# qa_docs = vectorstore.similarity_search(question)
qa_docs = vectorstore.similarity_search_with_relevance_scores(question)

In [10]:
# each result is a (document, score) tuple
for doc, score in qa_docs:
    print("-" * 40)
    print(doc.metadata["title"])
    print("-" * 40)
    print(doc.page_content)
    print("\n")
    print(f"Similarity score: {score}")
    print("\n")

----------------------------------------
Vincent van Gogh - Wikipedia
----------------------------------------
Vincent Willem van Gogh was born on 30 March 1853 in Groot-Zundert, in the predominantly Catholic province of North Brabant in the Netherlands.[24] He was the oldest surviving child of Theodorus van Gogh (1822–1885), a minister of the Dutch Reformed Church, and his wife, Anna Cornelia Carbentus (1819–1907). Van Gogh was given the name of his grandfather and of a brother stillborn exactly a year before his birth.[note 2] Vincent was a common name in the Van Gogh family. The name had been borne


Similarity score: 0.6875122015924975


----------------------------------------
Vincent van Gogh - A Brief Biography (1853-1890) - Arthinkal Magazine
----------------------------------------
Early Life and Education
Vincent van Gogh was born on 30th March 1853 in Groot-Zundert, a municipality and town in the south of the Netherlands.
Van Gogh was the oldest surviving child of Theodorus 
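
The similarity search above can be pictured as ranking chunk vectors by how closely they point in the same direction as the question vector. The sketch below shows that idea with cosine similarity on tiny hand-made vectors. It is an illustration only: the vectors, the chunk labels, and the helper names are invented, and the score that LangChain reports for Chroma depends on the store's distance metric and normalization, so the numbers will not match the output above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k (label, score) pairs with the highest cosine similarity."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors have hundreds of dimensions.
toy_vectors = {
    "birthplace chunk": [0.9, 0.1, 0.0],
    "painting chunk": [0.1, 0.9, 0.2],
    "museum chunk": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

top = rank_by_similarity(toy_query, toy_vectors, k=2)
for label, score in top:
    print(f"{label}: {score:.3f}")
```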

### Generate Response
See https://www.markhneedham.com/blog/2023/06/23/hugging-face-run-llm-model-locally-laptop/

Useful models on Hugging Face:
* google/flan-t5-small (https://huggingface.co/google/flan-t5-small)
* google/flan-t5-base (https://huggingface.co/google/flan-t5-base)

Models downloaded via `HuggingFacePipeline.from_model_id` are cached in `<user>/.cache/huggingface/hub/`.

First, download a model from Hugging Face (depending on the model size, this can take a while):

In [11]:
llm = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-base",
    task="text2text-generation",
    model_kwargs={"temperature": 0, "max_length": 1000}
)

In [12]:
# display model properties (optional)
# dict(llm)

Ask a question:

In [13]:
# prepare LLM chain
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())

# use LLM chain to ask a question
qa_chain({"query": question})



{'query': 'Where was Vincent van Gogh born?', 'result': 'Groot-Zundert'}
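
With its default settings, `RetrievalQA.from_chain_type` uses the "stuff" chain type: the retrieved chunks are concatenated into a single prompt together with the question, and that prompt is sent to the LLM once. The sketch below imitates this step in plain Python. The template wording and the function name `build_stuffed_prompt` are assumptions for illustration; LangChain's built-in prompt is worded differently.

```python
from typing import List

# Hypothetical template, not LangChain's actual prompt text.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuffed_prompt(question: str, chunks: List[str]) -> str:
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Shortened stand-ins for the chunks a retriever might return.
retrieved_chunks = [
    "Vincent Willem van Gogh was born on 30 March 1853 in Groot-Zundert.",
    "Van Gogh was the oldest surviving child of Theodorus van Gogh.",
]
prompt = build_stuffed_prompt("Where was Vincent van Gogh born?", retrieved_chunks)
print(prompt)
```

Because every retrieved chunk ends up in the same prompt, the prompt length grows with both the chunk size and the number of chunks retrieved.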

## Playground
Try different questions to test the LLM.

In [14]:
question = "Who was Vincent van Gogh?"
#question = "Who was van Gogh's best friend?"
#question = "When was van Gogh born?"
#question = "When did van Gogh die?"
#question = "Who was Pablo Picasso?"

result = qa_chain({"query": question})
display(result)

Token indices sequence length is longer than the specified maximum sequence length for this model (538 > 512). Running this sequence through the model will result in indexing errors


{'query': 'Who was Vincent van Gogh?',
 'result': 'Dutch post-impressionist painter'}
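
The warning in the output above says the prompt was tokenized to 538 tokens, more than the 512 the tokenizer reports as the model's maximum. Two common ways to shrink the prompt are to retrieve fewer chunks (for example by passing `search_kwargs={"k": 2}` to `vectorstore.as_retriever`) or to use a smaller `chunk_size`. The sketch below shows the underlying idea of keeping only as many ranked chunks as fit a budget. It is a rough illustration: it counts whitespace-separated words, which usually undercounts the subword tokens a real tokenizer produces, and the function name and sample data are invented.

```python
from typing import List


def fit_chunks_to_budget(
    chunks: List[str], max_tokens: int, reserved_tokens: int = 0
) -> List[str]:
    """Keep chunks in ranked order until the approximate token budget is used up.

    reserved_tokens is the room set aside for the question and prompt template.
    """
    budget = max_tokens - reserved_tokens
    kept = []
    used = 0
    for chunk in chunks:
        cost = len(chunk.split())  # crude estimate: one token per word
        if used + cost > budget:
            break  # adding this chunk would exceed the budget
        kept.append(chunk)
        used += cost
    return kept


# Hypothetical ranked chunks, most relevant first.
ranked_chunks = [
    "one two three four",
    "five six seven",
    "eight nine ten eleven twelve",
]
kept = fit_chunks_to_budget(ranked_chunks, max_tokens=10, reserved_tokens=2)
print(kept)
```

For an accurate count, the word-splitting line would need to be replaced with a call to the model's own tokenizer.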