# German Lawyer 

A Jupyter notebook to help navigate German residency law. This project uses local embeddings and models to do RAG (Retrieval-Augmented Generation) over the German residency law. This means that the model runs locally on the computer, the embeddings are computed locally, and the querying is done locally.

You can ask questions like:

* What are the requirements for a Blue Card?
* What are the requirements for a student visa?
* What are the requirements for a work visa?

I took the Aufenthaltsgesetz (Residence Act) and the Aufenthaltsverordnung (Residence Ordinance) from Gesetze im Internet as XML and loaded them as LangChain documents with the Unstructured XML loader.

## Project Steps

1. Load the XML files into LangChain documents
2. Split the documents into chunks
3. Create embeddings for the chunks
4. Store the embeddings in a vector store
5. Set up the LLM (prompt template and querying)
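The five steps above can be sketched without any libraries, which makes it easier to see what each stage contributes. This is a toy illustration, not what LangChain does internally: the bag-of-words `embed` function stands in for a real embedding model, the helper names are made up for this sketch, and `law_text` is placeholder text rather than actual legal wording.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=80):
    """Pack whole sentences into chunks of at most chunk_size characters."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, store, k=1):
    """Return the k chunks whose embeddings are most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Fill a prompt with the question and the retrieved context."""
    context = "\n".join(context_chunks)
    return f"Question: {question}\n\nContext: $$$ {context} $$$\n\nAnswer:"


# Placeholder text for illustration only, not actual legal wording.
law_text = (
    "A student visa requires proof of university admission. "
    "A work visa requires a concrete job offer. "
    "A Blue Card requires a university degree and a minimum salary."
)

question = "What are the requirements for a Blue Card?"
chunks = split_into_chunks(law_text)                  # steps 1-2: load and split
store = [(chunk, embed(chunk)) for chunk in chunks]   # steps 3-4: embed and store
top_chunks = retrieve(question, store)                # step 5a: retrieve context
prompt = build_prompt(question, top_chunks)           # step 5b: prompt for the LLM
print(prompt)
```

In the notebook itself, the embedding model, the vector store, and the LLM call replace `embed`, `store`, and the final `print`.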

### 1: Use LangChain Unstructured XML Loader to Load in the German Residence Law

In [1]:
from langchain.document_loaders import UnstructuredXMLLoader, TextLoader

from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

from langchain.embeddings import OllamaEmbeddings, OpenAIEmbeddings
from langchain.vectorstores import Chroma, Qdrant

from langchain.llms import Ollama

from langchain.chains import RetrievalQA   
from langchain.prompts import PromptTemplate 

import time

from langchain.chains import ConversationalRetrievalChain


### 2: Load & Split the Text

In [2]:
# German Residence Ordinance (Aufenthaltsverordnung, AufenthV)
# source: https://www.gesetze-im-internet.de/aufenthv/BJNR294510004.html
file = "german-law/laws/Aufenthaltsverordnung/BJNR294510004.xml"

# Load the AufenthV XML file with UnstructuredXMLLoader
loader = UnstructuredXMLLoader(file_path=file)
docs = loader.load()

**Recursive Character Text Splitter**

Use the recursive character text splitter to split the text into chunks of up to 1,500 characters, with a 150-character overlap between consecutive chunks.

In [3]:
# Try with the RecursiveCharacterTextSplitter

r_text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
r_texts = r_text_splitter.create_documents([docs[0].page_content])


In [4]:
# Try with the CharacterTextSplitter

c_text_splitter = CharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
c_texts = c_text_splitter.create_documents([docs[0].page_content])


Created a chunk of size 7988, which is longer than the specified 1500
Created a chunk of size 3185, which is longer than the specified 1500
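The warnings above come from the way the two splitters differ. `CharacterTextSplitter` splits on a single separator (a blank line by default), so a passage without that separator stays in one piece however long it is. `RecursiveCharacterTextSplitter` falls back to progressively finer separators until the pieces fit. A simplified, library-free sketch of the two strategies follows; it ignores chunk overlap, and the function names are made up for illustration.

```python
def split_single_separator(text, chunk_size, separator="\n\n"):
    """Split on one separator only; oversized pieces are kept whole."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = current + separator + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may itself be longer than chunk_size
    if current:
        chunks.append(current)
    return chunks


def split_recursive(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Try coarse separators first, then finer ones for pieces that are too long."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    separator, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        if len(piece) > chunk_size:
            # Flush what we have, then split the oversized piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_recursive(piece, chunk_size, finer))
            continue
        candidate = current + separator + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# One short paragraph followed by a long one that contains no blank line.
text = "Short paragraph." + "\n\n" + ("word " * 40).strip()

single_chunks = split_single_separator(text, chunk_size=50)
recursive_chunks = split_recursive(text, chunk_size=50)

print("single separator:", [len(c) for c in single_chunks])
print("recursive:       ", [len(c) for c in recursive_chunks])
```

The single-separator version returns the long paragraph as one oversized chunk, which mirrors the 7988- and 3185-character chunks reported above, while the recursive version keeps every chunk within the limit.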


### 3: Create Vectorstore

In [5]:
# OpenAI Embeddings, Chroma as vectorstore
openai_vectorstore = Chroma.from_documents(documents=r_texts, embedding=OpenAIEmbeddings())
retriever = openai_vectorstore.as_retriever()

In [6]:
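# Ollama Embeddings (openhermes2.5), Qdrant as vectorstore
# Note: Chroma did not work here. Ollama creates 4096-dimensional vectors for this model, while the Chroma
# collection created above already holds 1536-dimensional OpenAI embeddings. The probable cause is that both
# calls would share Chroma's default collection, and a collection only accepts vectors of one dimension.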

# loader = TextLoader("/Users/ingrid/Developer/GitHub/lawyer/README.md")
# docs = loader.load()

test_text_splitter = CharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
test_texts = test_text_splitter.create_documents([docs[0].page_content])


ollama_vectorstore = Qdrant.from_documents(
    documents=test_texts, 
    embedding=OllamaEmbeddings(
        model="openhermes2.5-mistral:7b-q5_K_M",
        show_progress=True,
        ),
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="texts",
)


Created a chunk of size 7988, which is longer than the specified 1500
Created a chunk of size 3185, which is longer than the specified 1500
OllamaEmbeddings: 100%|██████████| 1/1 [00:09<00:00,  9.78s/it]
OllamaEmbeddings: 100%|██████████| 64/64 [03:02<00:00,  2.85s/it]
OllamaEmbeddings: 100%|██████████| 63/63 [02:53<00:00,  2.76s/it]
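The dimension problem mentioned in the cell comment can be illustrated with a toy collection. The sketch assumes, as a simplification, that a vector collection fixes its dimensionality when the first vector is written and rejects anything else afterwards. `ToyCollection` is a hypothetical class written for this example and is not part of Chroma or Qdrant.

```python
class ToyCollection:
    """Minimal in-memory vector collection that locks its dimension on first insert."""

    def __init__(self, name):
        self.name = name
        self.dimension = None
        self.vectors = []

    def add(self, vector):
        if self.dimension is None:
            # The first vector decides the dimensionality of the collection.
            self.dimension = len(vector)
        elif len(vector) != self.dimension:
            raise ValueError(
                f"Collection {self.name!r} expects dimension {self.dimension}, "
                f"got {len(vector)}"
            )
        self.vectors.append(list(vector))


# Writing both kinds of embeddings into one shared collection fails ...
shared = ToyCollection("shared")
shared.add([0.0] * 1536)          # e.g. an OpenAI-sized embedding
try:
    shared.add([0.0] * 4096)      # e.g. an Ollama-sized embedding
    error_message = ""
except ValueError as err:
    error_message = str(err)
print(error_message)

# ... while a separate collection per embedding model works.
separate = ToyCollection("ollama")
separate.add([0.0] * 4096)
print(separate.dimension)
```

If that is what happened here, giving each embedding model its own collection (or its own store, as done above with Qdrant) avoids the clash.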


In [7]:
ollama_retriever = ollama_vectorstore.as_retriever()

### 4: LLM Setup

**LLM Setup**

In [8]:
# Use the openhermes2.5-mistral model served by Ollama
llm = Ollama(model='openhermes2.5-mistral:7b-q5_K_M')

**Retrieval QA Prompt**

#### Let the Not a Lawyer be a Not Lawyer

In [9]:
# Define a function that takes a vector store, an Ollama model name, and a question, then prints the answer and the elapsed time.
# TODO: log which LLM and embeddings were used as tags, and allow outputs to be rated (good, not good) or commented on.
import time
def test_llm(vectorstore, model, question):

    start = time.time()

    # build prompt 
    template = """
        You are a professional, courteous, helpful AI legal assistant for question-answering tasks about residency law for people living in, or considering moving to Germany. 
        Use the following pieces of retrieved context from the German Law (delimited in $$$ $$$) to answer the question. Always cite the source of your answer.
        If you don't know the answer, just say that you don't know. Do not make anything up!
        Always cite the source of your answer! And, don't forget to empathize with the user - they are probably stressed out and need help!
        Question: {question} 

        Context: $$$ {context} $$$

        Answer:

        """

    # create prompt template
    QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

    # set qa chain
    qa_chain_mr = RetrievalQA.from_chain_type(
        Ollama(model=model), 
        retriever = vectorstore.as_retriever(),
        chain_type="stuff", 
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
    )

    # get the result
    result = qa_chain_mr({"query": question})

    # print the result
    print(result["result"])

    end = time.time()
    elapsed_time = end - start
    print("The function took", elapsed_time, "seconds to run.")
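The comment at the top of this cell mentions wanting to tag each run with the LLM and embeddings used and to rate the outputs. One possible way to do that is to append each run to a JSON Lines file. The sketch below is only a suggestion: `log_run`, `load_runs`, the record fields, and the file name are all hypothetical, and the logged answers and timings are placeholders rather than real results.

```python
import json
import tempfile
import time
from pathlib import Path

ALLOWED_RATINGS = {None, "good", "not good"}


def log_run(log_path, model, embeddings, question, answer, elapsed_seconds,
            rating=None, comment=""):
    """Append one run to a JSON Lines file and return the stored record."""
    if rating not in ALLOWED_RATINGS:
        raise ValueError("rating must be 'good', 'not good', or None")
    record = {
        "timestamp": time.time(),
        "tags": {"model": model, "embeddings": embeddings},
        "question": question,
        "answer": answer,
        "elapsed_seconds": round(elapsed_seconds, 2),
        "rating": rating,
        "comment": comment,
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_runs(log_path, model=None):
    """Read all logged runs, optionally filtered by the model tag."""
    with Path(log_path).open(encoding="utf-8") as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    return [run for run in runs if model is None or run["tags"]["model"] == model]


# Placeholder entries for illustration; answers and timings are not real results.
log_file = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_file, "mistral", "OllamaEmbeddings",
        "What are the requirements for a Blue Card?",
        "(answer text returned by the chain)", 1.0,
        rating="not good", comment="placeholder entry")
log_run(log_file, "llama2", "OpenAIEmbeddings",
        "What are the requirements for a Blue Card?",
        "(answer text returned by the chain)", 2.0)

mistral_runs = load_runs(log_file, model="mistral")
print(len(mistral_runs), mistral_runs[0]["rating"])
```

To use it with `test_llm`, the function would need to return the answer and elapsed time instead of only printing them.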


In [10]:
frage = "How can I move to germany? I'm from the United states."
test_llm(openai_vectorstore, 'openhermes2.5-mistral:7b-q5_K_M', frage)



Here is a step-by-step guide to moving to Germany from the United States:

1. Determine your reason for moving to Germany: Some common reasons include work, study, family reunification, or asylum. This will help you understand which visa category applies to your situation and what requirements you need to fulfill. 

2. Check the visa requirements: The type of visa you need depends on the purpose of your stay in Germany. For example, if you want to work in Germany, you would need a work visa; if you're moving to study, you would need a student visa. You can find detailed information about visas on the official website of the Federal Foreign Office (Auswärtiges Amt) or the German embassy/consulate in your country.

3. Prepare the necessary documents: Gather all the required documents for your visa application, such as proof of health insurance, a valid passport, and evidence of your financial means to support yourself during your stay in Germany. The specific documents you need will depe

In [11]:
frage = "I just got a job in Germany paying me 80,000 euros annually. What are my options for a residence permit?"
test_llm(ollama_vectorstore, 'mistral', frage)

OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  1.05it/s]



I'm an AI language model and I can help you find information about legal requirements for foreigners in Germany. However, I do not have access to specific laws or regulations that may be relevant to your query. Can you please provide more context or specify the particular information you are looking for?
The function took 13.631399154663086 seconds to run.


In [12]:
frage = "What are the requirements for a Blue Card?"
test_llm(openai_vectorstore, 'llama2', frage)

 As a professional, courteous, and helpful AI legal assistant, I can provide you with the requirements for obtaining a Blue Card in Germany based on the retrieved context from the German Law.

According to Section 30a of the German Residence Act (Aufenthaltsgesetz - AufenthV), an individual may be exempt from the requirement of having a job offer or proof of sufficient financial resources if they are holders of a Blue Card EU and have been residing in Germany for at least six months. The individual must also meet the requirements mentioned in Section 18g of the German Residence Act, which includes being a highly qualified worker or having a job offer from a German employer.

To be eligible for a Blue Card EU, an individual must meet the following requirements:

1. Be a citizen of a non-EU country and have a valid passport.
2. Have a job offer from a German employer or be able to demonstrate that they are highly qualified workers in their field.
3. Meet the financial requirements for li

In [13]:
frage = "What are the requirements for a Blue Card?"
test_llm(openai_vectorstore, 'openhermes2.5-mistral:7b-q5_K_M', frage)

What are the requirements for a Blue Card?

Citation: $$$ 

To obtain a Blue Card, an individual must meet certain criteria. They include:

1. The person should have resided in the EU with a valid Blue Card for at least six months and was issued a Blue Card from another EU member state immediately before receiving the current Blue Card. The same applies to their family members holding a residence permit for family reunification issued by the same country as the Blue Card holder.
2. Applications for the Blue Card and family reunification residence permits must be submitted within one month after entering Germany, provided that the conditions outlined in § 30a for re-entry into Germany are met.
3. If they apply for an extension of an ICT (Intra-Company Transfer) card under § 19 of the Residence Act.
4. They hold a valid residence permit of another EU member state issued in accordance with Directive 2014/66/EU on conditions for entry and residence of third-country nationals in the context

In [14]:
frage = "What are the requirements for a Blue Card?"
test_llm(openai_vectorstore, 'mistral', frage)


The requirements for a Blue Card include the following:

1. The person must have been in possession of a Blaue Karte EU from another member state of the European Union for at least six months.
2. The person must have been an inhabitant of a Blaue Karte EU from another member state of the European Union for at least six months immediately prior to being issued a new Blaue Karte EU.
3. The person's family members who are in possession of an Aufenthaltstitel zum Familiennachzug, which was issued by the same country as the Blue Card EU, must also meet these requirements.
4. All applications for the Blue Card EU must be made to the Austrian government before the person leaves Austria.

To qualify for a Blue Card EU, an applicant must:

1. Possess a university degree or have completed at least five years of professional training or experience in a recognized profession.
2. Be under the age of 40 and not possess a permanent residence in another member state of the European Union.
3. Demonstr

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(ollama_vectorstore, 'mistral', frage)

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(ollama_vectorstore, 'llama2', frage)

In [None]:
frage = "What are the requirements for a Blue Card?"
test_llm(ollama_vectorstore, 'openhermes2.5-mistral:7b-q5_K_M', frage)

In [None]:
frage = "How can a resident of Germany obtain citizenship?"
test_llm(ollama_vectorstore, 'mistral', frage)

In [None]:
test_llm(ollama_vectorstore, 'llama2', frage)

### Findings:

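Recursive character text splitter
 * mistral: 19.5 s
 * llama2: 26.2 s

Character text splitter
 * mistral: 26.5 s
 * llama2: 79.7 s

Conclusion: mistral answered faster than llama2 with both splitters, and both models answered faster on chunks from the recursive character text splitter. I have not investigated why. One possible factor is the oversized chunks (up to 7,988 characters) that the character text splitter produced, since the "stuff" chain puts every retrieved chunk into the prompt. The two vector stores also use different embeddings (OpenAI vs. Ollama), so these numbers do not isolate the effect of the splitter.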
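Single-run timings like the ones above can be noisy, for example when the first call to a model includes loading it into memory. A small harness that discards a warm-up run and averages several repeats gives steadier numbers. This is a generic sketch: `time_runs` is a hypothetical helper, and `fake_query` stands in for a call such as `test_llm(...)`.

```python
import statistics
import time


def time_runs(fn, repeats=3, warmup=1):
    """Call fn several times and return the mean, spread, and raw durations in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not measured
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "runs": durations,
    }


def fake_query():
    """Stand-in workload; replace with the real chain call when benchmarking."""
    return sum(range(100_000))


stats = time_runs(fake_query, repeats=3, warmup=1)
print(f"mean: {stats['mean']:.4f} s, stdev: {stats['stdev']:.4f} s")
```

Running the same question through each model and splitter combination with this kind of harness would make the comparison easier to trust.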

### Set up memory

In [None]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

**Define a function to run the conversational retrieval chain (including memory)**

In [None]:
def test_llm_inkl_memory(vectorstore, model, question):

    retriever=vectorstore.as_retriever()
    qa = ConversationalRetrievalChain.from_llm(
        Ollama(model=model),
        retriever=retriever,
        memory=memory
    )
    result = qa({"question": question}) 
    print(result['answer'])
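Conceptually, a conversational retrieval chain keeps the earlier turns and uses them to rewrite a follow-up question into a standalone one before retrieving. The library-free sketch below shows that idea only. `BufferMemory` and `build_condense_prompt` are hypothetical names, the prompt wording is an approximation rather than LangChain's actual prompt, and the stored answer is a placeholder.

```python
class BufferMemory:
    """Keeps the full chat history as a list of (role, text) pairs."""

    def __init__(self):
        self.messages = []

    def save(self, question, answer):
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def as_text(self):
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


def build_condense_prompt(memory, follow_up):
    """Build a prompt asking the LLM to rewrite the follow-up as a standalone question."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{memory.as_text()}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


chat_memory = BufferMemory()
# Placeholder answer; in the notebook this would be the chain's output.
chat_memory.save("How do I get a bluecard?", "(answer text returned by the chain)")

condense_prompt = build_condense_prompt(
    chat_memory, "How do i get one if i haven't had one before?"
)
print(condense_prompt)
```

Without the history, the retriever would only see "one" and could not tell that the question is about the Blue Card. Note that the notebook's `memory` object is defined once at module level, so every call to `test_llm_inkl_memory` shares the same history regardless of which model or vector store is passed in.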

In [None]:
# Get the answer
question = "How do I get a bluecard?"
test_llm_inkl_memory(openai_vectorstore, 'mistral', question)

In [None]:

question = "I don't already have a bluecard, but I just got a job offer for 100k. Can I get a bluecard?"
test_llm_inkl_memory(openai_vectorstore, 'mistral', question)

In [None]:

question = "How do i get one if i haven't had one before?"
test_llm_inkl_memory(openai_vectorstore, 'mistral', question)