# Not a German Lawyer 

A Jupyter notebook to help navigate German residency law. This project uses local embeddings and models to do RAG (Retrieval-Augmented Generation) over the German residency law. This means that the model runs locally on the computer, the embeddings are computed locally, and the querying is done locally.

You can ask questions like:

* What are the requirements for a Blue Card?
* What are the requirements for a student visa?
* What are the requirements for a work visa?

I took the Aufenthaltsgesetz (Residence Act) and the Aufenthaltsverordnung (Residence Ordinance) from Gesetze im Internet as XML and loaded them as LangChain documents using the Unstructured XML loader.

### Still to do

* play with different ways of splitting the document into sections, different document loaders
* use the HTML loader to load the document and split by headings
* create an array of questions to test the model
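
For the last item, the question list could be a plain Python list that is looped over for every model. A minimal sketch, assuming a hypothetical `ask(model, question)` callable that returns the answer as a string (the notebook's own `test_llm` prints instead of returning, so it would need a small change to be used this way):

```python
from itertools import product

# Questions taken from the examples at the top of this README
questions = [
    "What are the requirements for a Blue Card?",
    "What are the requirements for a student visa?",
    "What are the requirements for a work visa?",
]
models = ["llama2", "mistral"]

def run_all(ask, models, questions):
    """Ask every question of every model and collect the answers."""
    return {(m, q): ask(m, q) for m, q in product(models, questions)}

# Stand-in for a real model call, used only so the sketch runs on its own
def fake_ask(model, question):
    return f"[{model}] answer to: {question}"

answers = run_all(fake_ask, models, questions)
print(len(answers), "answers collected")
```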

## Project Steps

1. Load the XML files into a LangChain document
2. Split the document into sections
3. Embeddings
4. Vector Store
5. LLM Setup (Prompt Template & Querying)
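
The five steps can be sketched end to end without any of the libraries used below. This is a toy illustration only: the bag-of-words `embed` function, the in-memory list used as a vector store, and the placeholder texts are all stand-ins invented for the example, not what LangChain, Ollama, or Qdrant do internally.

```python
import math
import re
from collections import Counter

def split_into_sections(text):
    """Step 2: split on blank lines (a stand-in for a real text splitter)."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]

def embed(text):
    """Step 3: toy 'embedding' as a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(store, question, k=1):
    """Step 4: return the k stored sections most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [section for section, _ in ranked[:k]]

def build_prompt(question, sections):
    """Step 5: stuff the retrieved sections into a prompt for the LLM."""
    context = "\n".join(sections)
    return f"Question: {question}\nLaw: {context}\nHelpful Answer with Sources:"

# Step 1: placeholder text standing in for the loaded XML documents
document = "Placeholder section about student visas.\n\nPlaceholder section about work permits."
store = [(section, embed(section)) for section in split_into_sections(document)]

question = "What do I need for a student visa?"
top_sections = retrieve(store, question)
prompt = build_prompt(question, top_sections)
print(prompt)
```

In the notebook, each stand-in is replaced by a real component: the XML loader, a LangChain text splitter, an embedding model, a vector store, and a local LLM.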

### 1: Use LangChain Unstructured XML Loader to Load in the German Residence Law

In [None]:
from langchain.document_loaders import UnstructuredXMLLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain.embeddings import OllamaEmbeddings, OpenAIEmbeddings, HuggingFaceInferenceAPIEmbeddings
from langchain.vectorstores import Chroma, Qdrant
from langchain.chains import RetrievalQA, ConversationalRetrievalChain   
from langchain.prompts import PromptTemplate 
from langchain.llms import Ollama, OpenAI

import time

### 2: Load & Split the Text

In [None]:
# Aufenthaltsverordnung (Residence Ordinance)
# source: https://www.gesetze-im-internet.de/aufenthv/BJNR294510004.html
file = "german-law/laws/Aufenthaltsverordnung/BJNR294510004.xml"

# Aufenthaltsgesetz (Residence Act)
aufenthg = "german-law/laws/Aufenthalt-BJNR195010004.xml"

# Load the Aufenthaltsverordnung XML file with UnstructuredXMLLoader (default mode, not mode="elements")
loader = UnstructuredXMLLoader(file_path=file)
old_docs = loader.load()

In [None]:
len(old_docs)

In [None]:
files = [file, aufenthg]

In [None]:
# Load both XML files into a single list of documents
docs = []
for path in files:  # `path` avoids shadowing the `file` variable defined above
    loader = UnstructuredXMLLoader(file_path=path)
    docs += loader.load()

In [None]:
type(docs[1])

**Recursive Character Text Splitter**

Use the recursive character text splitter to split the texts into chunks of up to 10,000 characters, with an overlap of 1,000 characters between neighbouring chunks.
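
How recursive splitting works can be sketched in plain Python: try the coarsest separator first, and fall back to finer ones only for pieces that are still too long. This is a simplified illustration, not LangChain's implementation. The function name `recursive_split` is made up for the example, and overlap is left out.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut at fixed character positions
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

sample = "First paragraph.\n\nSecond paragraph is quite a bit longer than the first one.\n\nThird."
chunks = recursive_split(sample, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Paragraph boundaries are kept where possible, and only the long middle paragraph is broken at word boundaries.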

In [None]:
# Try with the RecursiveCharacterTextSplitter

r_text_splitter = RecursiveCharacterTextSplitter(chunk_size = 10000, chunk_overlap  = 1000)
r_texts = r_text_splitter.split_documents(docs)
# r_texts_old = r_text_splitter.create_documents([docs[0].page_content])


In [None]:
r_texts[0]

In [None]:
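# r_texts_old is only defined if the commented-out create_documents line in the splitter cell is enabled
# type(r_texts_old)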

In [None]:
print(type(r_texts))
print(type(r_texts[0]))

In [None]:
# Try with the CharacterTextSplitter

c_text_splitter = CharacterTextSplitter(chunk_size = 1500, chunk_overlap  = 150)
c_texts = c_text_splitter.create_documents([docs[0].page_content])
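
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a fixed-width sliding window. This is only a sketch of the overlap idea: as far as I know, LangChain's `CharacterTextSplitter` splits on a separator first and then merges the pieces, which this ignores. The function name `chunk_with_overlap` is made up for the example.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

windows = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(windows)
```

Each window starts with the last character of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.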


In [None]:
# Read the Hugging Face API key from the environment
import os
HF_API_KEY = os.getenv("HF_API_KEY")


### 3: Create Vectorstore

In [None]:
# embedding model: sentence-transformers/all-MiniLM-l6-v2 (via the Hugging Face Inference API)

embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HF_API_KEY, model_name="sentence-transformers/all-MiniLM-l6-v2"
)


In [None]:
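# Hugging Face Inference API embeddings (defined above), Chroma as vectorstore
# Note: the variable is named openai_vectorstore, but no OpenAI embeddings are used here
openai_vectorstore = Chroma.from_documents(documents=r_texts, embedding=embeddings)
retriever = openai_vectorstore.as_retriever()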

### Local Embeddings
(Note: takes about 9 minutes / document)

In [None]:
# Ollama embeddings (llama2), Qdrant as vectorstore
# Note: Chroma did not work here. Ollama creates 4096-dimensional vectors, and a Chroma collection only accepts
# vectors of the dimension it was first filled with (1536 is the size of OpenAI's text-embedding-ada-002 vectors)

# loader = TextLoader("/Users/ingrid/Developer/GitHub/lawyer/README.md")
# docs = loader.load()

# test_text_splitter = CharacterTextSplitter(chunk_size = 1500, chunk_overlap  = 150)
# test_texts = test_text_splitter.create_documents([docs[0].page_content])

# REMEMBER: set the documents= to the docs that you want to embed (this function is expensive)

ollama_vectorstore = Qdrant.from_documents(
    documents=r_texts, 
    embedding=OllamaEmbeddings(
        model="llama2",
        show_progress=True,
        ),
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="texts",
)
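
The dimension mismatch mentioned in the comment above can be reproduced with a toy collection. The sketch assumes, as I understand Chroma to behave, that a collection fixes its vector dimension on the first insert. `TinyCollection` is a hypothetical class written for this example and not part of any library.

```python
class TinyCollection:
    """Hypothetical in-memory collection that fixes its dimension on first insert."""

    def __init__(self):
        self.dimension = None
        self.vectors = []

    def add(self, vector):
        if self.dimension is None:
            self.dimension = len(vector)  # the first vector decides the dimension
        elif len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")
        self.vectors.append(list(vector))

collection = TinyCollection()
collection.add([0.0] * 384)  # e.g. the size of an all-MiniLM-L6-v2 vector

try:
    collection.add([0.0] * 4096)  # e.g. the size of a llama2 embedding from Ollama
except ValueError as error:
    mismatch = str(error)
    print(mismatch)
```

Using a separate collection per embedding model avoids the problem regardless of which vector store is used.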


In [None]:
ollama_retriever = ollama_vectorstore.as_retriever()

### 4: LLM Setup

**LLM Setup**

In [None]:
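# Option 1: use a local Ollama model
# llm = Ollama(model='llama2')

# Option 2: use LM Studio's local OpenAI-compatible server
# (assumes the server is listening on port 1234; LangChain's OpenAI class may still require OPENAI_API_KEY to be set)
# Note: test_llm below creates its own Ollama model and does not use this llm
llm = OpenAI(base_url="http://localhost:1234/v1")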

**Retrieval QA Prompt**

#### Let the Not a Lawyer be a Not Lawyer

Improve prompt: 
* check that returned snippets are relevant to answering the question
* instruct the model on the formatting of the result.
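
The first improvement could be done before the prompt is built, by dropping snippets whose similarity score is too low. A minimal sketch, assuming the retriever returns `(text, score)` pairs where a higher score means more similar; the threshold of 0.5, the function names, and the fallback message are placeholders chosen for the example:

```python
def filter_relevant(scored_snippets, min_score=0.5, max_snippets=4):
    """Keep the best-scoring snippets that reach min_score."""
    kept = [(text, score) for text, score in scored_snippets if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[:max_snippets]]

def build_context(snippets):
    """Number the snippets so the model can cite them as sources."""
    if not snippets:
        return "No relevant legal text was found."
    return "\n\n".join(f"[{i}] {text}" for i, text in enumerate(snippets, start=1))

relevant = filter_relevant([("snippet a", 0.9), ("snippet b", 0.2), ("snippet c", 0.7)])
context = build_context(relevant)
print(context)
```

Numbering the snippets also helps with the second point, since the prompt can then ask the model to cite sources by number.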

In [None]:
# Takes a vectorstore, an Ollama model name, and a question; prints the answer and the elapsed time.
# TODO: log which LLM and embeddings were used as tags, and allow rating each output (good, not good, or a free-form comment)
import time
def test_llm(vectorstore, model, question):

    start = time.time()

    # build prompt 
    template = """
        You are a helpful question-answering AI assistant. You will be provided a ### Question ### and some $$$ legal texts $$$ that may be relevant. 
        
        Start your response by providing an overview of the Question provided by the user. 
        

        Below the answer, list out all the referenced sources (i.e. legal paragraphs backing up your claims)
        
        ### Question: {question} ###

    
        $$$ Law: {context} $$$

        Let's think step by step. 


        Helpful Answer with Sources:

        """

    # create prompt template
    QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

    # set qa chain
    qa_chain_mr = RetrievalQA.from_chain_type(
        Ollama(model=model), 
        retriever = vectorstore.as_retriever(),
        chain_type="stuff", # options are "stuff" "refine" or "map_reduce"
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
    )

    # get the result
    result = qa_chain_mr({"query": question})

    # print the result
    print(result["result"])

    end = time.time()
    elapsed_time = end - start
    print("The function took", elapsed_time, "seconds to run.")
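
The logging idea from the comment at the top of this cell could look roughly like the following. It is a sketch, not part of the notebook: `RunRecord` and `timed_run` are hypothetical names, and `answer_fn` stands in for a version of `test_llm` that returns the answer instead of printing it.

```python
import json
import time
from dataclasses import asdict, dataclass

@dataclass
class RunRecord:
    llm: str
    embeddings: str
    question: str
    answer: str
    seconds: float
    rating: str = "unrated"  # e.g. "good" or "not good"
    comment: str = ""

def timed_run(answer_fn, llm, embeddings, question):
    """Call answer_fn(question) and wrap the result in a RunRecord."""
    start = time.perf_counter()
    answer = answer_fn(question)
    return RunRecord(llm, embeddings, question, answer, time.perf_counter() - start)

# Stub answer function so the sketch runs without a model
record = timed_run(lambda q: "stub answer", "mistral", "llama2",
                   "What are the requirements for a Blue Card?")
record.rating = "good"
line = json.dumps(asdict(record))  # one JSON line per run, easy to append to a file
print(line)
```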


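## Use Ollama to Install the Local Models You Want to Use

Run the following commands in your terminal to install the models you want to use. `ollama run` downloads a model the first time it is called and then starts an interactive session; `ollama pull <model>` only downloads it.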

`ollama run llama2`

`ollama run mistral`

In [None]:
frage = "How can I move to germany to study? I'm from the United states. I have applied, but don't know if i'll be accepted to university"
test_llm(ollama_vectorstore, 'llama2', frage)

In [None]:
frage = "How can I move to germany? I'm from the United states."
test_llm(ollama_vectorstore, 'mistral', frage)

In [None]:
frage = "I just got a job in Germany paying me 80,000 euros annually. What are my options for a residence permit?"
test_llm(ollama_vectorstore, 'mistral', frage)

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(openai_vectorstore, 'llama2', frage)

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(openai_vectorstore, 'openhermes2.5-mistral:7b-q5_K_M', frage)

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(openai_vectorstore, 'mistral', frage)

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(ollama_vectorstore, 'mistral', frage)

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(ollama_vectorstore, 'llama2', frage)

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(ollama_vectorstore, 'openhermes2.5-mistral:7b-q5_K_M', frage)

In [None]:
frage = "How can a resident of Germany obtain citizenship?"
test_llm(ollama_vectorstore, 'mistral', frage)

In [None]:
test_llm(ollama_vectorstore, 'llama2', frage)

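### Findings:

Recursive Character Text Splitter
 * mistral: 19.5s
 * llama2: 26.2s

Character Text Splitter
 * mistral: 26.5s
 * llama2: 79.7s

Conclusion: in these runs, mistral answered faster than llama2 with both splitters, and both models were faster on the recursively split text. The cause has not been investigated.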
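
The size of the gap is easier to see as a ratio. The snippet below only does arithmetic on the four timings listed above; the labels `"recursive"` and `"character"` are my shorthand for the two splitters.

```python
# Timings in seconds, copied from the findings above
timings = {
    ("recursive", "mistral"): 19.5,
    ("recursive", "llama2"): 26.2,
    ("character", "mistral"): 26.5,
    ("character", "llama2"): 79.7,
}

# How much longer llama2 took than mistral, per splitter
ratios = {
    splitter: timings[(splitter, "llama2")] / timings[(splitter, "mistral")]
    for splitter in ("recursive", "character")
}

for splitter, ratio in ratios.items():
    print(f"{splitter}: llama2 took {ratio:.2f}x as long as mistral")
```

These are single measurements, so repeating each run a few times would show how much of the difference is noise.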

-----

### Set up memory

In [None]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

**Define a function to run the conversational retrieval chain (including memory)**

In [None]:
def test_llm_inkl_memory(vectorstore, model, question):

    retriever=vectorstore.as_retriever()
    qa = ConversationalRetrievalChain.from_llm(
        Ollama(model=model),
        retriever=retriever,
        memory=memory
    )
    result = qa({"question": question}) 
    print(result['answer'])
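
What the memory adds can be shown with a toy buffer. A conversational retrieval chain typically rewrites a follow-up into a standalone question using the chat history before retrieving, which is why a vague follow-up can still find the right paragraphs. The sketch below only builds such a rewrite prompt; `ChatHistory`, `condense_prompt`, and the prompt wording are invented for the example and are not LangChain's own.

```python
class ChatHistory:
    """Minimal stand-in for a conversation buffer."""

    def __init__(self):
        self.turns = []

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)

def condense_prompt(history, follow_up):
    """Build a prompt asking the LLM to make the follow-up self-contained."""
    return (
        "Given the conversation below, rephrase the follow-up as a standalone question.\n\n"
        f"{history.as_text()}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )

history = ChatHistory()
history.add("How do I get a Blue Card?", "stub answer about Blue Cards")
prompt = condense_prompt(history, "How do I get one if I haven't had one before?")
print(prompt)
```

Because the notebook's `memory` object is global, every call to `test_llm_inkl_memory` shares the same history, even across different models and vector stores.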

In [None]:
query = "can i travel outside the EU with a blue card valid for less than 6 months?"

test_llm_inkl_memory(ollama_vectorstore, 'mistral', query)

In [None]:
# Get the answer
question = "How do I get a bluecard?"
test_llm_inkl_memory(openai_vectorstore, 'mistral', question)

In [None]:

question = "I don't already have a bluecard, but I just got a job offer for 100k. Can I get a bluecard?"
test_llm_inkl_memory(openai_vectorstore, 'mistral', question)

In [None]:

question = "How do i get one if i haven't had one before?"
test_llm_inkl_memory(openai_vectorstore, 'mistral', question)