### LangChain local LLM RAG example
Utilising LangChain v0.1

This notebook demonstrates using LangChain for Retrieval-Augmented Generation (RAG) on Linux with NVIDIA CUDA. The LLMs are run locally with Ollama.

Models tested:
- Llama 2
- Mistral 7B
- Mixtral 8x7B
- Neural Chat 7B
- Orca 2
- Phi-2
- Solar 10.7B
- Yi 34B


See the [README.md](README.md) file for help on how to set up your environment to run this.

In [1]:
# Select your model here: put the name of the model in the ollama_model_name variable
# Make sure you have pulled (or run) the model so Ollama has downloaded it; Ollama then loads it automatically when it is needed

# Ollama installation (if you haven't done it yet): $ curl https://ollama.ai/install.sh | sh
# The Ollama server must be running for LangChain to use its models; to test that a model can be run: $ ollama run mistral:7b-instruct-q6_K

ollama_model_name = "orca2:13b-q5_K_S"
# "llama2:7b-chat-q6_K"
# "mistral:7b-instruct-q6_K"
# "mixtral:8x7b-instruct-v0.1-q4_K_M"
# "neural-chat:7b-v3.3-q6_K"
# "orca2:13b-q5_K_S"
# "phi" or try "phi:chat"
# "solar:10.7b-instruct-v1-q5_K_M"
# Can't run "yi:34b-chat-q3_K_M" or "yi:34b-chat-q4_K_M": inference never finished
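To check from Python that the chosen model has already been pulled, one option is Ollama's local REST API. The sketch below assumes the server's default address (`http://localhost:11434`) and its `/api/tags` endpoint, which lists pulled models under a `"models"` key; the helper names `fetch_local_models` and `is_model_pulled` are hypothetical. Only the standard library is used.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; change it if your server listens elsewhere
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def fetch_local_models(url: str = OLLAMA_TAGS_URL) -> dict:
    """Return the parsed JSON from Ollama's list-local-models endpoint (needs a running server)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def is_model_pulled(tags_response: dict, model_name: str) -> bool:
    """True if model_name appears among the pulled models in a /api/tags response."""
    names = {m.get("name") for m in tags_response.get("models", [])}
    # Untagged pulls (e.g. "phi") are listed as "<name>:latest", so accept the bare name too
    return model_name in names or f"{model_name}:latest" in names
```

With the server running, `is_model_pulled(fetch_local_models(), ollama_model_name)` should return `True` before you run the rest of the notebook.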

In [2]:
# Load the LLM with Ollama, setting the temperature low so it's not too creative

from langchain_community.llms import Ollama
llm = Ollama(model=ollama_model_name, temperature=0.1)
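The `temperature` parameter rescales the model's next-token scores before sampling. The sketch below is a simplified, hypothetical illustration of that single step (Ollama's sampler also applies other settings, such as top-k and top-p): dividing the scores by a small temperature such as 0.1 pushes almost all of the probability onto the highest-scoring token, which is why a low value makes the answers less "creative".

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
print([round(p, 3) for p in softmax_with_temperature(logits, 1.0)])  # probability spread out
print([round(p, 3) for p in softmax_with_temperature(logits, 0.1)])  # nearly all on token 0
```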

In [3]:
# Quick test of the LLM with a general question before we start doing RAG
llm.invoke("why is the sky blue?")

# Note: This line did not complete for Yi-34B; inference never finishes, although the same prompt works fine when run directly in Ollama (cause still to be investigated)

'The sky appears blue because of a process called Rayleigh scattering, which occurs when sunlight interacts with the molecules of air and other gases in the atmosphere. Sunlight is composed of different colors of light, which have different wavelengths and frequencies. When sunlight enters the atmosphere, it encounters more air molecules and smaller particles than when it reaches outer space. These molecules and particles scatter the light in all directions, but they scatter it more efficiently at shorter wavelengths, such as blue and violet. This means that these colors are scattered more widely across the sky, making it appear blue to our eyes. However, sunlight also contains red and orange light, which have longer wavelengths and are scattered less by the atmosphere. These colors reach us from the horizon or from objects that reflect sunlight, such as the moon or clouds, making them appear reddish or orange. This is why the sky looks different at sunrise and sunset, when the sun is 

In [4]:
# Embeddings are generated by the same Ollama-loaded model

from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model=ollama_model_name)

In [5]:
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader('Data', glob="**/*.docx")
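`DirectoryLoader` picks up every file under `Data` that matches the glob pattern; as far as we know it then parses each file with `UnstructuredFileLoader` by default, so the `unstructured` package with Word support needs to be installed. The sketch below uses only `pathlib` and a throwaway directory with a hypothetical file layout to show what `"**/*.docx"` matches: `.docx` files at the top level and in subfolders, but nothing else.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "Data"
    (data / "drafts").mkdir(parents=True)
    # Hypothetical file layout, created empty just to exercise the pattern
    for name in ["story1.docx", "drafts/story2.docx", "notes.txt"]:
        (data / name).touch()
    # "**" matches the directory itself and all its subdirectories
    matched = sorted(p.relative_to(data).as_posix() for p in data.glob("**/*.docx"))

print(matched)  # ['drafts/story2.docx', 'story1.docx']
```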

In [6]:
# Load documents

docs = loader.load()

In [7]:
# Ensure we have the right number of Word documents loaded

len(docs)

3

In [8]:
# Split them up into chunks using a Text Splitter

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
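`RecursiveCharacterTextSplitter` tries a list of separators in order (by default paragraph breaks, line breaks, spaces, then individual characters) and only falls back to a finer one when a piece is still larger than `chunk_size`; we believe its defaults are `chunk_size=4000` and `chunk_overlap=200`. The function below is a simplified, hypothetical sketch of that idea, not LangChain's code: it has no overlap between chunks and drops the separator at chunk boundaries.

```python
def recursive_split(text, chunk_size=4000, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters, preferring coarse separators.

    Simplified sketch: no chunk overlap, and separators at chunk boundaries are dropped.
    The separators list should end with "", which matches any text.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    # Use the first (coarsest) separator that actually occurs in the text
    idx, sep = next((i, s) for i, s in enumerate(separators) if s in text)
    finer = separators[idx + 1:]
    pieces = text.split(sep) if sep else list(text)

    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # This piece alone is too big: split it again with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "para one is here.\n\npara two is a bit longer than one.\n\nthree"
print(recursive_split(text, chunk_size=20))
# ['para one is here.', 'para two is a bit', 'longer than one.', 'three']
```

The second paragraph is too long for one 20-character chunk, so only that paragraph is re-split on spaces; the others stay whole.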

In [9]:
# Embed the chunks and store the vectors in a FAISS index

from langchain_community.vectorstores import FAISS

vector = FAISS.from_documents(documents, embeddings)
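`FAISS.from_documents` embeds every chunk and stores the vectors in an index. As we understand it, LangChain's FAISS wrapper defaults to an exact (flat) L2 index, and `as_retriever()` returns the 4 nearest chunks. The sketch below performs the same kind of search by brute force in plain Python, with made-up 3-dimensional vectors standing in for the much larger Ollama embeddings; `l2_distance` and `top_k` are hypothetical names.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=4):
    """Return indices of the k stored vectors closest to query_vec, nearest first."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return ranked[:k]


# Hypothetical toy "embeddings"; real ones have many more dimensions
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.8, 0.2, 0.1], [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]
print(top_k(query, doc_vecs, k=2))  # [0, 2]
```

FAISS does this far faster over large collections, but for a flat index the result is the same exhaustive nearest-neighbour ranking.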

In [10]:
# Prepare the prompt and then the chain

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

if ollama_model_name in ("phi", "phi:chat"):
    # Phi-2 prompt is less flexible
    prompt_template = """Instruct: With this context\n\n{context}\n\nQuestion: {input}\nOutput:"""

else:
    prompt_template = """You are a story teller, answering questions in an excited, insightful, and empathetic way. Answer the question based only on the provided context:

    <context>
    {context}
    </context>

    Question: {input}"""

prompt = ChatPromptTemplate.from_template(prompt_template)
document_chain = create_stuff_documents_chain(llm, prompt)
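The "stuff" chain concatenates every retrieved chunk into the `{context}` slot of the prompt and sends the whole prompt to the LLM in a single call. The sketch below reproduces that formatting with `str.format`, using the Phi-2 template from the cell above; the `"\n\n"` separator between chunks is what we believe `create_stuff_documents_chain` uses by default, and `stuff_prompt` and the sample chunks are hypothetical. Because all chunks go into one prompt, the retrieved text has to fit within the model's context window.

```python
# Hypothetical retrieved chunks standing in for Document.page_content values
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]

phi_template = "Instruct: With this context\n\n{context}\n\nQuestion: {input}\nOutput:"


def stuff_prompt(template, chunks, question, separator="\n\n"):
    """'Stuff' every chunk into one prompt: join them and fill the template's slots."""
    return template.format(context=separator.join(chunks), input=question)


print(stuff_prompt(phi_template, retrieved, "What happens first?"))
```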

In [11]:
# The LangChain chain
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), config={'run_name': 'format_inputs'})
| ChatPromptTemplate(input_variables=['context', 'input'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], template='You are a story teller, answering questions in an excited, insightful, and empathetic way. Answer the question based only on the provided context:\n\n    <context>\n    {context}\n    </context>\n\n    Question: {input}'))])
| Ollama(model='orca2:13b-q5_K_S', temperature=0.1)
| StrOutputParser(), config={'run_name': 'stuff_documents_chain'})

In [12]:
# Create the retriever and LangChain retriever chain

from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
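`create_retrieval_chain` passes the `"input"` question to the retriever, adds the retrieved documents under `"context"`, and then adds the document chain's output under `"answer"`, which is why the question loop reads `response["answer"]`. The plain-Python sketch below mimics that data flow; `make_retrieval_chain` and the lambda stand-ins for the FAISS retriever and the LLM are hypothetical.

```python
from typing import Callable, Dict, List


def make_retrieval_chain(retrieve: Callable[[str], List[str]],
                         answer: Callable[[List[str], str], str]) -> Callable[[Dict], Dict]:
    """Compose a retriever and an answering step, keeping the intermediate results."""
    def invoke(inputs: Dict) -> Dict:
        context = retrieve(inputs["input"])                  # like the 'retrieve_documents' step
        result = {**inputs, "context": context}
        result["answer"] = answer(context, inputs["input"])  # like the stuff documents chain
        return result
    return invoke


# Hypothetical stand-ins for the FAISS retriever and the LLM-backed document chain
chain = make_retrieval_chain(
    retrieve=lambda q: ["chunk about " + q],
    answer=lambda ctx, q: f"Answer to '{q}' using {len(ctx)} chunk(s)",
)
response = chain({"input": "Who is the main character?"})
print(sorted(response))  # ['answer', 'context', 'input']
```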

In [13]:
# Chain now incorporates the retriever
retrieval_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7f310d909270>), config={'run_name': 'retrieve_documents'})
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), config={'run_name': 'format_inputs'})
            | ChatPromptTemplate(input_variables=['context', 'input'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], template='You are a story teller, answering questions in an excited, insightful, and empathetic way. Answer the question based only on the provided context:\n\n    <context>\n    {context}\n    </context>\n\n    Question: {input}'))])
            | Ollama(model='orca2:13b-q5_K_S', temperature=0.1)
    

In [14]:
# Here are our test questions

TestQuestions = [
    "Summarise the story for me",
    "Who was the main protagonist?",
    "Did they have any children? If so, what were their names?",
    "Did anything eventful happen?",
    "Who are the main characters?",
    "What do you think happens next in the story?"
]

In [15]:
# If you want to see what's happening under the hood, set debug to True

from langchain.globals import set_debug

# set_debug(True)

In [16]:
qa_pairs = []

for index, question in enumerate(TestQuestions, start=1):
    question = question.strip() # Clean up

    print(f"\n{index}/{len(TestQuestions)}: {question}")

    response = retrieval_chain.invoke({"input": question})

    qa_pairs.append((question, response["answer"])) # Add to our output array

    # This break stops the loop after the first question; comment it out to run all the questions
    break 


1/6: Summarise the story for me


In [17]:
# Print out the questions and answers

for index, (question, answer) in enumerate(qa_pairs, start=1):
    print(f"{index}/{len(qa_pairs)} {question}\n\n{answer}\n\n--------\n")

1/1 Summarise the story for me

Key points:
- Thundertooth is a talking dinosaur who travels through time and ends up in a futuristic city
- He meets Mayor Grace and the citizens, who help him find food without harming anyone
- He starts a toy factory with his family that produces magical widgets
- He saves the city from a meteor threat with the help of his family's unique talents

Summary:
Thundertooth is a dinosaur who can talk and travel through time. He arrives in a futuristic city where he meets Mayor Grace, who welcomes him and helps him find food that does not harm anyone. He starts a toy factory with his family that makes amazing widgets that delight the people. When a meteor threatens the city, Thundertooth and his family use their abilities to divert it and save the day. They become heroes and symbols of unity in the city.

--------

