# Not a Lawyer | Local

The Not a Lawyer project aims to build an AI assistant chatbot that has read German laws. It is a RAG (retrieval-augmented generation) flow: it first builds a database of chunks from the provided legal documents. Then, using semantic search, it finds the chunks most relevant to the user's question. Finally, it passes those chunks, together with the question, to a text generation model, which generates the answer.
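
The three RAG steps can be illustrated with a minimal, dependency-free sketch. It is not the implementation used in this notebook: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the helper names are hypothetical, and the sample chunks are made-up placeholder sentences rather than actual legal text.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Step 2: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Step 3: combine the retrieved chunks and the question into one prompt."""
    context = "\n\n".join(context_chunks)
    return f"Question: {question}\n\nLaw: {context}"


# Step 1: the "database" of chunks (placeholder sentences, not real law text)
chunks = [
    "A residence permit is issued for a limited period.",
    "The EU Blue Card is issued to skilled workers with a university degree.",
    "Fees are charged for issuing a visa.",
]

top = retrieve("Who gets an EU Blue Card?", chunks, k=1)
rag_prompt = build_prompt("Who gets an EU Blue Card?", top)
print(rag_prompt)
```

In the notebook itself, the embedding model, Chroma, and the LLM take over these roles.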

This notebook contains the local version of the chatbot: both the vector database and the text generation model are stored locally. Only the embedding model is remote: we're using the `HuggingFaceInferenceAPIEmbeddings` class from the `langchain` library to access the Hugging Face Inference API.

## Setup

* Embedding Model: Hugging Face sentence-transformers/all-MiniLM-l6-v2 -> https://python.langchain.com/docs/integrations/text_embedding/huggingfacehub 
* Vectorstore: Chroma 
* LLM: Ollama (mixtral:8x7b-instruct-v0.1-q3_K_M)


### Step 1: Import Libraries

In [None]:
! pip install langchain chromadb

In [None]:
import os
from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain.llms import Ollama
from langchain.text_splitter import RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter
from langchain.vectorstores import Chroma

from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate 

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough


### Step 2: Define llm and embedding models

FIRST: go to your terminal, and type `ollama run mixtral:8x7b-instruct-v0.1-q3_K_M` (the model tag must match the one passed to `Ollama(...)` below)

In [None]:
# FOR THIS TO WORK, YOU NEED TO:
# 1. Download ollama from ollama.ai 
# 2. Download the mixtral model used below
# 3. Run it with the following terminal command: ollama run mixtral:8x7b-instruct-v0.1-q3_K_M
llm = Ollama(model='mixtral:8x7b-instruct-v0.1-q3_K_M')

In [None]:
# Note: This is not 100% local, but you can modify this code to use a local embedding model
HF_API_KEY = os.environ.get("HF_API_KEY")

hf_embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HF_API_KEY, model_name="sentence-transformers/all-MiniLM-l6-v2"
)

# set huggingface embeddings 
embedding = hf_embeddings

### Step 3: Process Data & Set up Vector Database with Chroma

#### Step 3.1: Define the data (URLs in this case) for the vector database

In [None]:
# German Residence Laws

aufentv =  "https://www.gesetze-im-internet.de/aufenthv/BJNR294510004.html"
aufenthg = "https://www.gesetze-im-internet.de/aufenthg_2004/BJNR195010004.html"
urls = [aufentv, aufenthg]

#### Step 3.2: Split by HTML Headers
*Note:* This is one of many ways to skin this cat. You could also use different types of splitters, e.g. by paragraphs, by sentences, by words, etc.

Which splitting strategy works best still needs to be tested as part of the evaluation.

In [None]:
headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

docs = []

for url in urls:
    html_header_splits = html_splitter.split_text_from_url(url)
    docs += html_header_splits


chunk_size = 1000
chunk_overlap = 200
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap
)

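# Split the header-level documents from ALL URLs into smaller chunks
# (passing html_header_splits here would only cover the last URL in the loop)
splits = text_splitter.split_documents(docs)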


In [None]:
len(splits)
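
To build intuition for what `chunk_size` and `chunk_overlap` mean, here is a minimal sketch of a plain sliding window over characters. It is an illustration only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines, so its real output will differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters, where each window
    repeats the last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)
```

With `chunk_size=1000` and `chunk_overlap=200` as above, consecutive chunks share up to 200 characters, which reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.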

#### Step 3.3: Create a vector database with Chroma
This takes ~40 seconds.

In [None]:
# This takes about 40 seconds to run
vectorstore = Chroma.from_documents(
    documents = splits, 
    embedding=embedding
    )

### Step 4: Set up the Prompt
I'll use a variety of prompts to see what works best.

In [None]:
template = """
        You are a wonderful, careful, and professional question-answering AI assistant knowledgeable in reading German law and explaining it to non-legal people. 

        You will be provided with a question and some legal context.        
        
        Please answer the question to the best of your ability using only the provided legal texts.

        Below the answer, please list out all the referenced sources (i.e. legal paragraphs backing up your claims)

        Let's think step by step. Here is the question, and here is the law. What is the answer?

        ---- Start User Question ----
        Question: {question}
        ---- End User Question ----

        ---- Start Law Context ----
        Law: {context}
        ---- End Law Context ----

        If you can't find the answer in the texts provided, or if there are no texts provided, say only: "I'm sorry, but I don't know the answer to this question."

        Helpful Answer with Sources:

        """
prompt = PromptTemplate.from_template(template)


In [None]:
# Set the retriever to use the vectorstore

retriever = vectorstore.as_retriever()

In [None]:
# LangSmith Tags
tags = ["plain","runnables","htmlheadersplitter", "chroma", "ollama:mixtral:8x7b-instruct-v0.1-q3_K_M", "hf_embeddings"]

In [None]:
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt 
    | llm
    | StrOutputParser()    
).with_config({"tags": tags})

In [None]:
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
).with_config({"tags": tags})
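
In both chains above, the retriever's output (a list of `Document` objects) is passed directly into `{context}`, so the prompt receives the string representation of that list, metadata and all. A common pattern is to join the documents into plain text first. The sketch below assumes the documents carry the `Header 1` to `Header 3` metadata keys defined in Step 3.2; `FakeDoc` is a hypothetical stand-in for a LangChain `Document`, used only so the example runs without any dependencies, and the sample texts are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two Document attributes used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join retrieved documents into one context string, prefixing each
    chunk with its header path so the model can cite its sources."""
    parts = []
    for doc in docs:
        header = " > ".join(
            doc.metadata[key]
            for key in ("Header 1", "Header 2", "Header 3")
            if key in doc.metadata
        )
        parts.append(f"[{header}]\n{doc.page_content}" if header else doc.page_content)
    return "\n\n".join(parts)


sample = [
    FakeDoc("Placeholder text of a first section.", {"Header 1": "Act A", "Header 2": "Section 1"}),
    FakeDoc("Placeholder text without headers."),
]
formatted = format_docs(sample)
print(formatted)
```

To try it in the chains, the context entry could be written as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`. Whether this improves the answers is something to check in the evaluation.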

In [None]:
question = "What is the maximum duration of a residence permit?"
rag_chain.invoke(question)
chain.invoke(question)

#### Get the Response from the Prompt

In [None]:

response = chain.invoke("How do I get a blue card?")

In [None]:
# This displays it nicely
print(response)

-----

#### Step 4.1: MultiQueryRetriever

In [None]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm=llm, 
    retriever = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(), llm=llm),
    chain_type="stuff", # options are "stuff" "refine" or "map_reduce"
    chain_type_kwargs={"prompt": prompt}
)
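
Conceptually, a multi-query retriever asks the LLM for several rephrasings of the user's question, runs the retriever once per rephrasing, and keeps the unique union of the results. The sketch below shows only the merge step, under simplifying assumptions: the query variants are hard-coded instead of LLM-generated, chunks are plain strings, and `multi_query_retrieve` and `fake_index` are hypothetical names.

```python
def multi_query_retrieve(query_variants, retrieve_fn):
    """Run retrieve_fn for every query variant and return the unique
    union of all retrieved chunks, preserving first-seen order."""
    seen = set()
    merged = []
    for query in query_variants:
        for chunk in retrieve_fn(query):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


# Hypothetical retrieval results for two phrasings of the same question
fake_index = {
    "How do I get a blue card?": ["chunk A", "chunk B"],
    "What are the requirements for an EU Blue Card?": ["chunk B", "chunk C"],
}

merged = multi_query_retrieve(list(fake_index), fake_index.get)
print(merged)
```

The chain defined above can then be queried with `qa_chain_mr.invoke({"query": question})`; as far as I know, `RetrievalQA` returns a dict whose `"result"` key holds the answer.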