# German Lawyer 

A simple Jupyter notebook to help navigate German residence law. The project does retrieval-augmented generation (RAG) over the law with local models: the language model runs on your own computer (via Ollama), and the embeddings and the querying are computed locally as well.

You can ask questions like:

* What are the requirements for a Blue Card?
* What are the requirements for a student visa?
* What are the requirements for a work visa?

I downloaded the Aufenthaltsgesetz (Residence Act) and the Aufenthaltsverordnung (Residence Ordinance) as XML from Gesetze im Internet and loaded them as LangChain documents with the UnstructuredXMLLoader. The cells below work with the Aufenthaltsverordnung.

## Project Steps

1. Load the XML files into a LangChain document
2. Split the document into sections
3. Embeddings
4. Vector Store
5. LLM Setup (Prompt Template & Querying)

In [91]:
import pandas as pd
import numpy as np
from langchain.document_loaders import DirectoryLoader, UnstructuredXMLLoader

### 1: Load the German Residence Law with LangChain's UnstructuredXMLLoader

In [92]:
# German Residence Ordinance (Aufenthaltsverordnung, AufenthV)
# source: https://www.gesetze-im-internet.de/aufenthv/BJNR294510004.html
file = "german-law/laws/Aufenthaltsverordnung/BJNR294510004.xml"

In [93]:
# load German Residence Law XML file with UnstructuredXMLLoader
loader = UnstructuredXMLLoader(file_path=file)
docs = loader.load()

# print the first 400 characters of the loaded document (the loader returns the whole file as a single document)
docs[0].page_content[:400]

'290 AufenthV Inhaltsübersicht Kapitel 1 Allgemeine Bestimmungen § 1 Begriffsbestimmungen Kapitel 2 Einreise und Aufenthalt im Bundesgebiet Abschnitt 1 Passpflicht für Ausländer § 2 Erfüllung der Passpflicht durch Eintragung in den Pass eines gesetzlichen Vertreters § 3 Zulassung nichtdeutscher amtlicher Ausweise als Passersatz § 4 Deutsche Passersatzpapiere für Ausländer § 5 Allgemeine Voraussetzu'
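`UnstructuredXMLLoader` returns the ordinance as one continuous string (the output above runs straight from the table of contents into the sections), so the section structure that the prompt later asks the model to cite is no longer explicit. If you want one chunk per section instead, here is a minimal sketch with the standard-library `xml.etree.ElementTree`. It assumes the Gesetze im Internet layout, where each `<norm>` element carries its label (e.g. `§ 1`) in `metadaten/enbez` and its text under `textdaten`; check those element names against the downloaded file. `split_by_section` is a hypothetical helper, not part of LangChain.

```python
import xml.etree.ElementTree as ET


def split_by_section(xml_text):
    """Return (section_label, section_text) pairs, one per <norm> element.

    Assumes the Gesetze im Internet layout: each <norm> holds its label
    (e.g. "§ 1") in metadaten/enbez and its body under textdaten.
    """
    root = ET.fromstring(xml_text)
    sections = []
    for norm in root.iter("norm"):
        label = norm.findtext("metadaten/enbez")
        body = norm.find("textdaten")
        if label is None or body is None:
            continue  # skip entries without a section label or text
        # itertext() walks all nested elements (<P>, <DL>, ...); join and normalise whitespace
        text = " ".join(" ".join(body.itertext()).split())
        sections.append((label.strip(), text))
    return sections


sample_xml = """<dokumente>
  <norm><metadaten><enbez>§ 1</enbez></metadaten>
    <textdaten><text><Content><P>Erster Absatz.</P><P>Zweiter Absatz.</P></Content></text></textdaten>
  </norm>
  <norm><metadaten><jurabk>AufenthV</jurabk></metadaten></norm>
</dokumente>"""

print(split_by_section(sample_xml))  # [('§ 1', 'Erster Absatz. Zweiter Absatz.')]
```

Each `(label, text)` pair could then be wrapped in a LangChain `Document` with the label stored in `metadata`, so that retrieved chunks carry their section number.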

### 2: Split the Text

**Recursive Character Text Splitter**

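Split the text into chunks twice so the two approaches can be compared: once with the `RecursiveCharacterTextSplitter` (chunk size 10,000 characters, overlap 1,000) and once with the `CharacterTextSplitter` (chunk size 5,000, overlap 500).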

In [94]:
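# Try the RecursiveCharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=10000,    # target maximum length per chunk, in characters
    chunk_overlap=1000   # characters shared by consecutive chunks
)

# create_documents wraps each chunk in a LangChain Document
r_texts = text_splitter.create_documents([docs[0].page_content])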
print(len(r_texts))

20
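`chunk_size` caps the length of each chunk and `chunk_overlap` is the number of characters that consecutive chunks share, so a short sentence falling on a chunk boundary is more likely to appear whole in at least one chunk. The real splitters also try to cut at separators such as paragraph breaks; the sketch below ignores that and only illustrates the size/overlap arithmetic as a plain fixed-width window. `sliding_window_chunks` is a hypothetical helper for illustration.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows; consecutive windows share chunk_overlap characters.

    Illustration only: LangChain's splitters also look for separators and do not
    cut at arbitrary character positions like this.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


demo_chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, which is why a larger overlap produces more (and more redundant) chunks for the same text.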


In [95]:
# Try with the CharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size=5000,
    chunk_overlap=500
)

texts = text_splitter.create_documents([docs[0].page_content])
print(len(texts))

Created a chunk of size 7988, which is longer than the specified 5000


40
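The warning comes from how `CharacterTextSplitter` works: by default it splits on a single separator (`"\n\n"`) and then merges the pieces up to `chunk_size`, so a piece that is already longer than `chunk_size` cannot be cut any further. `RecursiveCharacterTextSplitter` instead falls back through a list of separators (by default `"\n\n"`, `"\n"`, `" "`, `""`). The following is a simplified sketch of that fallback idea, not LangChain's actual implementation (which also applies `chunk_overlap` and differs in details); `recursive_split` is a hypothetical name.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive splitter: a hypothetical sketch without chunk overlap.

    Splits on the coarsest separator first; any piece that is still longer than
    chunk_size is split again with the next, finer separator. Small neighbouring
    pieces are merged back together while they fit into chunk_size.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        raise ValueError("no separator left to split on")
    sep, finer = separators[0], separators[1:]
    # "" is the last resort: split into single characters
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # too long on its own: flush what we have, then recurse with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
        else:
            chunks.append(current)  # chunk is full: start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample_text = "aaaa bbbb\n\ncccc dddd eeee"
print(sample_text.split("\n\n"))         # ['aaaa bbbb', 'cccc dddd eeee']  (2nd piece: 14 > 10 chars)
print(recursive_split(sample_text, 10))  # ['aaaa bbbb', 'cccc dddd', 'eeee']
```

Splitting on `"\n\n"` alone leaves a 14-character piece, the same situation as the 7988-character chunk in the warning above; the recursive version falls back to spaces and keeps every chunk within the limit.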


### 3: Embedding

In [30]:
from langchain.embeddings import OllamaEmbeddings

In [31]:
# function to set the embeddings model used 

def set_embeddings_model(model):
    embeddings_model = OllamaEmbeddings(model=model)
    return embeddings_model

In [32]:
set_embeddings_model('mistral')

OllamaEmbeddings(base_url='http://localhost:11434', model='mistral', embed_instruction='passage: ', query_instruction='query: ', mirostat=None, mirostat_eta=None, mirostat_tau=None, num_ctx=None, num_gpu=None, num_thread=None, repeat_last_n=None, repeat_penalty=None, temperature=None, stop=None, tfs_z=None, top_k=None, top_p=None, model_kwargs=None)

In [33]:
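# NOTE: the result of set_embeddings_model('mistral') above was not assigned, so this line
# defines the embeddings model actually used below. Without a model argument,
# OllamaEmbeddings defaults to model='llama2' (see the printed configuration below).
embeddings_model = OllamaEmbeddings()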

In [34]:
print(OllamaEmbeddings())

base_url='http://localhost:11434' model='llama2' embed_instruction='passage: ' query_instruction='query: ' mirostat=None mirostat_eta=None mirostat_tau=None num_ctx=None num_gpu=None num_thread=None repeat_last_n=None repeat_penalty=None temperature=None stop=None tfs_z=None top_k=None top_p=None model_kwargs=None


In [35]:
print(OllamaEmbeddings(model='mistral'))

base_url='http://localhost:11434' model='mistral' embed_instruction='passage: ' query_instruction='query: ' mirostat=None mirostat_eta=None mirostat_tau=None num_ctx=None num_gpu=None num_thread=None repeat_last_n=None repeat_penalty=None temperature=None stop=None tfs_z=None top_k=None top_p=None model_kwargs=None
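The printed configuration shows two prefixes, `embed_instruction='passage: '` and `query_instruction='query: '`. In the LangChain version used here, `OllamaEmbeddings` appears to build each input with an f-string: `embed_documents` prepends the first prefix to every text and `embed_query` prepends the second to the query. That is also why `embed_documents` should receive plain strings: formatting a `Document` object into an f-string embeds its string form (including the field names and metadata), not just the law text. The sketch below imitates only the prefixing, with a hypothetical stand-in `Doc` class; it does not call Ollama.

```python
from dataclasses import dataclass, field

EMBED_INSTRUCTION = "passage: "  # value printed in the configuration above
QUERY_INSTRUCTION = "query: "


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def document_inputs(texts):
    # imitates the prefixing for documents: f"{embed_instruction}{text}"
    return [f"{EMBED_INSTRUCTION}{t}" for t in texts]


def query_input(text):
    # imitates the prefixing for queries: f"{query_instruction}{text}"
    return f"{QUERY_INSTRUCTION}{text}"


sample_docs = [Doc("§ 1 Begriffsbestimmungen")]
print(document_inputs(sample_docs))                                 # the object's repr, not just the text
print(document_inputs([d.page_content for d in sample_docs]))      # ['passage: § 1 Begriffsbestimmungen']
print(query_input("Blue Card"))                                     # prints: query: Blue Card
```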


**Apply the embedding model**

In [36]:
# Apply to the character-split texts (embed_documents expects strings, so pass page_content)
embeddings = embeddings_model.embed_documents([t.page_content for t in texts])

In [37]:
len(embeddings[0])

4096
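Each chunk is now a 4096-dimensional vector. Retrieval then amounts to embedding the question and returning the chunks whose vectors are most similar to it. A minimal sketch of that ranking with NumPy, using cosine similarity and made-up 3-dimensional vectors in place of real embeddings. The Qdrant store built in the next step performs an equivalent nearest-neighbour search internally (LangChain's Qdrant wrapper uses cosine distance by default, as far as I know); `top_k_cosine` is a hypothetical helper, and its `k=4` default mirrors LangChain's usual default number of retrieved documents.

```python
import numpy as np


def top_k_cosine(query_vec, doc_vecs, k=4):
    """Return indices and scores of the k document vectors most similar to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # cosine similarity = dot product of L2-normalised vectors
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order.tolist(), scores[order].tolist()


doc_vecs = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.7, 0.7, 0.0]]
idx, scores = top_k_cosine([1.0, 0.1, 0.0], doc_vecs, k=2)
print(idx)  # [0, 2]
```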

In [38]:
# Apply to the recursively split texts (again passing the chunk strings, not Document objects)
embeddings_r_texts = embeddings_model.embed_documents([t.page_content for t in r_texts])

### 4: Vector Store: define the database to use

In [51]:
from langchain.vectorstores import Qdrant

**Qdrant texts from non-recursive splitter**

Note: `Qdrant.from_documents` computes the embeddings itself while building the collection, using the `embedding` model passed to it, so the vectors computed in step 3 are not reused here.

In [52]:
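# Create a function that takes in the
#  1. embeddings model
#  2. texts
#  3. a name for the collection
# and outputs an in-memory Qdrant vector store
# (sketch of the intended helper; the name is my own, and the cells below still call Qdrant.from_documents directly)

def create_qdrant_db(embeddings_model, texts, collection_name):
    return Qdrant.from_documents(
        documents=texts,
        embedding=embeddings_model,
        location=":memory:",  # local mode with in-memory storage only
        collection_name=collection_name,
    )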

In [53]:
qdrant_texts = Qdrant.from_documents(
    documents=texts,
    embedding=embeddings_model,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="texts",
)

**Qdrant texts from recursive splitter**

In [54]:
qdrant_r_texts = Qdrant.from_documents(
    documents=r_texts,
    embedding=embeddings_model,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="r_texts",
)

### 5: LLM Setup


In [61]:
from langchain.llms import Ollama

In [62]:
# llm = Ollama(model='llama2', temperature=0)

In [63]:
# Return an Ollama LLM for the given model name (temperature=0 for more deterministic answers)

def set_model(model):
    llm = Ollama(model=model, temperature=0)
    return llm

# TO CALL THIS FUNCTION:
# set_model('mistral')
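Under the hood the `Ollama` wrapper sends HTTP requests to the local Ollama server (the `base_url` printed earlier is `http://localhost:11434`). For debugging it can help to see what such a request looks like. The sketch below only builds a request for Ollama's `/api/generate` endpoint with the standard library and does not send it; the endpoint and JSON fields (`model`, `prompt`, `stream`, `options.temperature`) follow Ollama's REST API documentation, and `build_generate_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_generate_request(model, prompt, base_url="http://localhost:11434", temperature=0):
    """Build (but do not send) a non-streaming request to Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a stream of chunks
        "options": {"temperature": temperature},  # temperature=0, as in set_model()
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("mistral", "What is a Blue Card?")
print(req.full_url)  # http://localhost:11434/api/generate
# To actually send it (requires a running Ollama server with the model pulled):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```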

**Retrieval QA Prompt**

In [64]:
from langchain.chains import RetrievalQA   

In [65]:
from langchain.prompts import PromptTemplate 

# build prompt 
template = """
    You are an empathetic and helpful legal advisor with intimate knowledge of German law.
    Use the following pieces of context to answer the question at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer. 
    Use ten sentences maximum. Keep the answer as easy to understand for the average person and 
    as concise as possible. 
    Always reference the laws that pertain to the answer so your client can reference them later. 
    Reference these inline and at the end (e.g. Paragraph 9 refers to the amount of time you need to live in xyz)
    
    {context}
    Question: {question}
    Helpful Answer:
    """
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
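With `chain_type="stuff"`, `RetrievalQA` "stuffs" all retrieved chunks into the `{context}` slot of this template and sends the result to the LLM in a single call. A plain-Python sketch of that assembly step; the `stuff_prompt` helper, the shortened template and the `"\n\n"` joiner are assumptions for illustration, and LangChain's exact document formatting may differ.

```python
TEMPLATE = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def stuff_prompt(template, retrieved_chunks, question, separator="\n\n"):
    """Concatenate all retrieved chunks into {context} and fill in {question}."""
    context = separator.join(retrieved_chunks)
    return template.format(context=context, question=question)


prompt = stuff_prompt(
    TEMPLATE,
    ["§ 1 Begriffsbestimmungen",
     "§ 2 Erfüllung der Passpflicht durch Eintragung in den Pass eines gesetzlichen Vertreters"],
    "What are the requirements for a Blue Card?",
)
print(prompt)
```

Because every retrieved chunk ends up in one prompt, the chunk size chosen in step 2 (up to 10,000 characters for `r_texts`) directly determines how long each prompt sent to the model is.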


In [67]:
qa_chain_mr = RetrievalQA.from_chain_type(
    set_model('mistral'),
    retriever=qdrant_r_texts.as_retriever(),
    chain_type="stuff",
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

**Define Question**

In [68]:
question = "How can a resident of Germany obtain citizenship?"

In [69]:
result = qa_chain_mr({"query": question})


In [70]:
import time

In [71]:
# Run a RetrievalQA chain for the given vector store, model name and question,
# then print the answer and how long the whole call took.
# Idea: log which LLM and embeddings were used as tags, and let me rate each output
# (good, not good, or a comment).

def test_llm(vectorstore, model, question):

    start = time.time()

    # set qa chain
    qa_chain_mr = RetrievalQA.from_chain_type(
        set_model(model), 
        retriever = vectorstore.as_retriever(),
        chain_type="stuff", 
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
    )

    # get the result
    result = qa_chain_mr({"query": question})

    # print the result
    print(result["result"])

    end = time.time()
    elapsed_time = end - start
    print("The function took", elapsed_time, "seconds to run.")
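The comment above `test_llm` mentions logging which model and vector store produced each answer and rating the outputs. One way to do that is to append one row per run to a CSV file with the standard-library `csv` module. The `log_result` helper, the file layout and the column names below are my own assumptions, not part of the notebook.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ["timestamp", "model", "vectorstore", "question", "answer", "seconds", "rating"]


def log_result(path, model, vectorstore_name, question, answer, seconds, rating=""):
    """Append one run to a CSV file so runs can be compared and rated later.

    rating is free text, e.g. "good", "not good" or a short comment.
    """
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()  # write the header only once
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": model,
            "vectorstore": vectorstore_name,
            "question": question,
            "answer": answer,
            "seconds": round(seconds, 2),
            "rating": rating,
        })
```

Inside `test_llm` it could be called after `elapsed_time` is computed, e.g. `log_result("runs.csv", model, "qdrant_r_texts", question, result["result"], elapsed_time)`. The vector store name is passed as a string because the function only receives the store object; the `rating` column can be filled in by hand later.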


In [96]:
frage = "What are the requirements for a Blue Card?"
test_llm(qdrant_r_texts, 'mistral', frage)

A Blue Card is a type of residence permit issued to highly skilled non-EU workers who wish to work and live in Germany for an extended period. It is typically issued for up to four years and can be renewed for another four years if the worker meets certain requirements, such as earning at least €45,000 per year or holding a PhD in a specialized field.

To apply for a Blue Card, you must first find a job in Germany that sponsors the card and is willing to pay the required salary. Once you have a job offer, you can apply for the Blue Card through the German Federal Office for Migration and Refugees (BAMF). To be eligible, you must meet the following requirements:

* Have a bachelor's degree or equivalent in a specialized field
* Speak at least B2 level of German or another EU language
* Earn at least €45,000 per year in your first year of employment (this can include salary and any additional income you receive from other sources)

If you meet these requirements, BAMF will review your ap

In [86]:
frage = "How can a resident of Germany obtain citizenship?"
test_llm(qdrant_r_texts, 'mistral', frage)


It appears that you are asking about how a resident of Germany can obtain citizenship. In general, the process for obtaining German citizenship involves meeting certain requirements and going through an application and examination process. Here are some basic steps:

1. Eligibility: To be eligible for German citizenship, you must have lived in Germany for a certain period of time, usually 8-10 years, depending on your circumstances. You must also have a good conduct record and not have any criminal convictions.
2. Application: Once you meet the eligibility requirements, you can submit an application for citizenship to the appropriate government agency in Germany. This typically involves providing personal information, including your identity, address, and employment status.
3. Examination: After submitting your application, you will need to undergo an examination to determine your knowledge of German language, culture, and history. This exam is usually taken at a local government offi

In [79]:
test_llm(qdrant_r_texts, 'llama2', frage)

 To obtain German citizenship, you will need to meet the eligibility requirements and go through the application process. Here are the general steps:

1. Meet the eligibility requirements: You must be at least 18 years old, have been living in Germany for at least 3 years (or 2 years if you are married to a German citizen), and have a good knowledge of the German language and culture. Additionally, you must renounce any other citizenship you may hold.
2. Determine your status: If you were born in Germany to non-German parents or acquired citizenship through birth in Germany, you may be eligible for German citizenship automatically. However, if you were born abroad or acquired citizenship through naturalization, you will need to apply for citizenship.
3. Gather required documents: You will need to provide proof of your identity, as well as documents showing your residence and immigration status in Germany. You may also be required to provide certificates from your home country or previo

**Testing qdrant_texts**

In [80]:
test_llm(qdrant_texts, 'mistral', frage)


In order to become a citizen of Germany, a person must meet certain requirements and follow specific procedures. These requirements include having lived in Germany for a certain period of time, having good conduct and not having been convicted of certain crimes, and having a basic knowledge of German language and culture. The specific procedures involve applying for citizenship through the local authorities where the applicant resides, providing all required documents, and taking an oath of allegiance. It's also important to note that there are different types of citizenship in Germany, such as naturalization and acquired citizenship, which have different requirements and procedures.
The function took 14.861018180847168 seconds to run.


In [81]:
test_llm(qdrant_texts, 'llama2', frage)

 To obtain German citizenship, a resident of Germany must meet certain eligibility requirements and go through a process that involves several steps. Here is an overview of the main requirements and procedures:

Eligibility Requirements:

1. Age: The applicant must be at least 18 years old (or 16 years old if they have been living in Germany for at least three years).
2. Residence: The applicant must have lived in Germany for at least eight years (or six years if they have been married to a German citizen or have been living in Germany as a refugee or subsidiary protection beneficiary).
3. Language Skills: The applicant must have a basic knowledge of the German language.
4. Cultural Knowledge: The applicant must have a basic understanding of German culture and society.
5. Good Character: The applicant must be of good character and not have any criminal convictions or outstanding warrants.

Step 1: Register with the Residence Office (Aufenthaltstelle)
The first step is to register with 

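### Findings

Response times per configuration:

Recursive character text splitter (`qdrant_r_texts`)
* mistral: 19.5s
* llama2: 26.2s

Character text splitter (`qdrant_texts`)
* mistral: 26.5s
* llama2: 79.7s

Conclusion: in these runs, mistral was faster than llama2, and queries against the recursive-splitter store were faster than queries against the character-splitter store. Why? Not clear yet. These are single measurements and the timings vary between runs (the `qdrant_texts` + mistral cell above printed 14.9s), so the differences should be confirmed with repeated runs.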
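Since each figure comes from a single run, it may be worth repeating each configuration a few times and comparing medians before drawing conclusions; the first call to a model can also include the time Ollama needs to load it into memory. A minimal sketch of a timing helper using `time.perf_counter`; `time_runs` is a hypothetical name, and in the notebook you would pass it e.g. `lambda: test_llm(qdrant_r_texts, 'mistral', frage)`.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times; return the median and all durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), durations


# Stand-in workload so the sketch runs without Ollama
median, all_runs = time_runs(lambda: sum(range(100_000)), repeats=3)
print(f"median {median:.4f}s over {len(all_runs)} runs")
```

Reporting the median rather than the mean keeps one slow outlier (such as a cold model load) from dominating the comparison.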

### Set up memory

In [194]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

**Conversational Retrieval Chain**

In [195]:
from langchain.chains import ConversationalRetrievalChain


**Define a function to run the conversational retrieval chain (including memory)**

In [200]:
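def test_llm_inkl_memory(vectorstore, model, question):
    # `memory` is the module-level ConversationBufferMemory defined above, so the chat
    # history accumulates across calls until memory.clear() is called
    retriever = vectorstore.as_retriever()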
    qa = ConversationalRetrievalChain.from_llm(
        set_model(model),
        retriever=retriever,
        memory=memory
    )
    result = qa({"question": question}) 
    print(result['answer'])
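With memory attached, `ConversationalRetrievalChain` works in two steps: when there is chat history, it first asks the LLM to rewrite the history plus the new question into a standalone question, and then runs retrieval and answering on that standalone question. The sketch below shows only the bookkeeping in plain Python: a buffer of past turns and the condense prompt built from it. The `ChatBuffer` class and `condense_prompt` helper are hypothetical, and the prompt wording is modelled loosely on LangChain's condense-question prompt rather than copied from it.

```python
class ChatBuffer:
    """Hypothetical minimal stand-in for ConversationBufferMemory: stores every turn."""

    def __init__(self):
        self.turns = []  # list of (question, answer) pairs

    def save(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def condense_prompt(buffer, follow_up):
    """Build the prompt asking the LLM to turn history + follow-up into a standalone question."""
    if not buffer.turns:
        return None  # no history yet: the question is used as-is
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{buffer.as_text()}\n"
        f"Follow Up Input: {follow_up}\n"
        "Standalone question:"
    )


memory_sketch = ChatBuffer()  # separate name so the notebook's `memory` is not overwritten
print(condense_prompt(memory_sketch, "What about a Blue Card?"))  # None: first turn
memory_sketch.save("Can I live in Germany if I get a job?", "(previous answer)")
print(condense_prompt(memory_sketch, "What about a Blue Card?"))
```

Because the history is replayed into this extra LLM call on every follow-up question, each turn with memory costs more time than a plain `RetrievalQA` query.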

In [203]:
# Set the question
question = "Does the content contain information about what happens if you get a job and want to live in germany?"


In [204]:
# Get the answer
test_llm_inkl_memory(qdrant_r_texts, 'mistral', question)


I don't have information on the specific requirements and processes for obtaining permanent residency in Germany, as there is no direct reference to such a topic in the given context. However, I can suggest that you look up more detailed information on the topic from official government websites or legal resources, such as the Bundesamt für Auswanderung und Flüchtlinge (Federal Office for Migration and Refugees) or the German Ministry of the Interior.
