In [87]:
import pandas as pd
import numpy as np
from langchain.document_loaders import DirectoryLoader, UnstructuredXMLLoader


In [88]:
file = "german-law/laws/Aufenthaltsverordnung/BJNR294510004.xml"

### Try: XML Loader

In [92]:
# load XML with UnstructuredXMLLoader
loader = UnstructuredXMLLoader(file_path = file)
docs = loader.load()

In [93]:
len(docs)

1

In [100]:
docs[0].page_content[:400]

'290 AufenthV Inhaltsübersicht Kapitel 1 Allgemeine Bestimmungen § 1 Begriffsbestimmungen Kapitel 2 Einreise und Aufenthalt im Bundesgebiet Abschnitt 1 Passpflicht für Ausländer § 2 Erfüllung der Passpflicht durch Eintragung in den Pass eines gesetzlichen Vertreters § 3 Zulassung nichtdeutscher amtlicher Ausweise als Passersatz § 4 Deutsche Passersatzpapiere für Ausländer § 5 Allgemeine Voraussetzu'

In [98]:
type(docs)

list
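The loader returns a single `Document` that holds the entire ordinance, so all sectioning is left to the splitter. The XML file itself appears to mark each section already. The sketch below assumes the gesetze-im-internet.de layout, in which each `<norm>` element carries a section label (`<enbez>`, e.g. `§ 1`) under `<metadaten>` and the body under `<textdaten>`. Under that assumption, a per-section split could be done with the standard library. The sample XML and the helper `split_by_norm` are hypothetical, so check the tag names against the real file before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a law XML file; tag names are an assumption
# based on the gesetze-im-internet.de format.
SAMPLE_XML = """<dokumente>
  <norm>
    <metadaten><jurabk>AufenthV</jurabk><enbez>§ 1</enbez><titel>Begriffsbestimmungen</titel></metadaten>
    <textdaten><text><Content><P>Text von Paragraph 1.</P></Content></text></textdaten>
  </norm>
  <norm>
    <metadaten><jurabk>AufenthV</jurabk><enbez>§ 2</enbez></metadaten>
    <textdaten><text><Content><P>Text von Paragraph 2.</P></Content></text></textdaten>
  </norm>
</dokumente>"""


def split_by_norm(xml_text):
    """Return one (label, text) pair per <norm> that has a section label and a body."""
    root = ET.fromstring(xml_text)
    sections = []
    for norm in root.iter("norm"):
        label = norm.findtext("metadaten/enbez")
        textdaten = norm.find("textdaten")
        if label is None or textdaten is None:
            continue  # skip entries without a section label or body
        # Collapse all nested text nodes into one whitespace-normalized string.
        body = " ".join(" ".join(textdaten.itertext()).split())
        sections.append((label, body))
    return sections


for label, body in split_by_norm(SAMPLE_XML):
    print(label, "->", body)
```

Each pair could then be wrapped in its own LangChain `Document` with the label stored in `metadata`, which would give every chunk a section reference that the prompt could cite.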

**Recursive Character Text Splitter**

Use the recursive character text splitter to split the text into chunks of at most 1000 characters, with 200 characters of overlap between neighbouring chunks.

In [120]:
# Try with the RecursiveCharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap  = 200
)

r_texts = text_splitter.create_documents([docs[0].page_content])
print(len(r_texts))

254
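`RecursiveCharacterTextSplitter` tries a list of separators from coarse to fine (by default paragraph breaks, line breaks, spaces, then single characters) and only falls back to a finer separator for pieces that are still too long. The following is a simplified pure-Python sketch of that idea. It is not LangChain's implementation: it ignores `chunk_overlap` and does not keep separators the way the library may. `recursive_split` is a hypothetical helper name.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces no longer than chunk_size, trying coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece alone is too long: recurse with the next, finer separator.
            chunks.extend(recursive_split(part, chunk_size, tuple(rest)))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaaa bbbb\n\ncccc dddd eeee", 10))  # ['aaaa bbbb', 'cccc dddd', 'eeee']
```

Because a too-long piece is always re-split with a finer separator, no chunk ends up longer than `chunk_size`, which matches the absence of warnings in the cell above.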


In [121]:
# Try with the CharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap  = 200
)

texts = text_splitter.create_documents([docs[0].page_content])
print(len(texts))

Created a chunk of size 1500, which is longer than the specified 1000
Created a chunk of size 1500, which is longer than the specified 1000
Created a chunk of size 1500, which is longer than the specified 1000
Created a chunk of size 1500, which is longer than the specified 1000
Created a chunk of size 1500, which is longer than the specified 1000
Created a chunk of size 7988, which is longer than the specified 1000
Created a chunk of size 1413, which is longer than the specified 1000
Created a chunk of size 1181, which is longer than the specified 1000
Created a chunk of size 1231, which is longer than the specified 1000
Created a chunk of size 1124, which is longer than the specified 1000
Created a chunk of size 1359, which is longer than the specified 1000
Created a chunk of size 1252, which is longer than the specified 1000
Created a chunk of size 1357, which is longer than the specified 1000
Created a chunk of size 1474, which is longer than the specified 1000
Created a chunk of s

130
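The warnings above follow from how `CharacterTextSplitter` works. It splits on a single separator (`"\n\n"` by default) and merges the resulting pieces up to `chunk_size`, but it has no finer separator to fall back on. Any piece that is already longer than `chunk_size`, such as the 7988-character one, is therefore kept whole. This is presumably also why it produces fewer chunks (130 vs. 254). The following is a simplified sketch of that behaviour, without overlap. `single_separator_split` is a hypothetical helper, not the library code.

```python
def single_separator_split(text, chunk_size, separator="\n\n"):
    """Merge separator-delimited pieces up to chunk_size; oversized pieces stay whole."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # an oversized piece is accepted as-is
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "short\n\n" + "x" * 25 + "\n\nend"
print([len(c) for c in single_separator_split(text, 10)])  # [5, 25, 3]
```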


**Embedding**

In [73]:
from langchain.embeddings import OllamaEmbeddings

In [74]:
embeddings_model = OllamaEmbeddings()  # no model given, so the default (llama2, see the printed config below) is used

In [148]:
print(OllamaEmbeddings())

base_url='http://localhost:11434' model='llama2' embed_instruction='passage: ' query_instruction='query: ' mirostat=None mirostat_eta=None mirostat_tau=None num_ctx=None num_gpu=None num_thread=None repeat_last_n=None repeat_penalty=None temperature=None stop=None tfs_z=None top_k=None top_p=None model_kwargs=None


**Apply the embedding model**

In [122]:
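# Apply to the character-split texts; embed_documents expects plain strings,
# so pass each chunk's page_content rather than the Document objects
embeddings = embeddings_model.embed_documents([t.page_content for t in texts])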

In [149]:
len(embeddings[0])

4096
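Each llama2 embedding has 4096 dimensions. As a rough estimate, assuming the vectors are stored as 32-bit floats and ignoring index and payload overhead, the raw vector storage for the two collections works out as follows. `raw_vector_bytes` is a hypothetical helper.

```python
DIM = 4096           # embedding size reported above
BYTES_PER_FLOAT = 4  # assumption: float32 storage


def raw_vector_bytes(n_chunks, dim=DIM, bytes_per_float=BYTES_PER_FLOAT):
    """Bytes needed for n_chunks dense vectors, excluding any index overhead."""
    return n_chunks * dim * bytes_per_float


for name, n_chunks in [("r_texts", 254), ("texts", 130)]:
    print(f"{name}: {raw_vector_bytes(n_chunks) / 1e6:.2f} MB")  # 4.16 MB and 2.13 MB
```

Both are small enough for the in-memory (`":memory:"`) Qdrant mode used below.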

In [124]:
# Apply to the recursively split texts (again passing the chunk strings)
embeddings_r_texts = embeddings_model.embed_documents([t.page_content for t in r_texts])

**Vector Store: define the database to use**

In [150]:
from langchain.vectorstores import Qdrant

**Qdrant texts from non-recursive splitter**

In [153]:
qdrant_texts = Qdrant.from_documents(
    documents=texts,
    embedding=embeddings_model,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="texts",
)

**Qdrant texts from recursive splitter**

In [154]:
qdrant_r_texts = Qdrant.from_documents(
    documents=r_texts,
    embedding=embeddings_model,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="r_texts",
)
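Conceptually, the retriever returned by `as_retriever()` embeds the question with the same model and returns the top `k` chunks whose vectors are most similar to it. LangChain's Qdrant wrapper uses cosine distance by default, and `k` can be set via `as_retriever(search_kwargs={"k": ...})`. The following NumPy sketch shows that ranking step on toy 3-dimensional vectors. It illustrates the idea and is not how Qdrant indexes vectors internally. `top_k_cosine` is a hypothetical helper.

```python
import numpy as np


def top_k_cosine(query_vec, doc_vecs, k=4):
    """Return indices of the k rows of doc_vecs most cosine-similar to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # Normalize so that a dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    # argsort is ascending, so reverse it to put the highest scores first.
    return np.argsort(scores)[::-1][:k].tolist()


# Toy "embeddings" standing in for the 4096-dimensional ones.
doc_vecs = [[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0], [0, 0, 1]]
print(top_k_cosine([1, 0, 0], doc_vecs, k=2))  # [0, 2]
```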

**LLM Setup**

In [61]:
from langchain.llms import Ollama

In [62]:
# llm = Ollama(model='llama2', temperature=0)

In [176]:
# Return an Ollama LLM for the given model name; temperature=0 keeps answers (near-)deterministic

def set_model(model):
    llm = Ollama(model=model, temperature=0)
    return llm

# TO CALL THIS FUNCTION:
# set_model('mistral')

**Retrieval QA Prompt**

In [155]:
from langchain.chains import RetrievalQA   

In [170]:
from langchain.prompts import PromptTemplate 

# build prompt 
template = """
    You are an empathetic and helpful legal advisor with intimate knowledge of German law.
    Use the following pieces of context to answer the question at the end. 
    If you don't know the answer, just say that you don't know; don't try to make up an answer. 
    Use ten sentences maximum. Keep the answer easy for the average person to understand and 
    as concise as possible. 
    Always reference the laws that pertain to the answer so your client can reference them later. 
    Reference these inline and at the end (e.g. Paragraph 9 refers to the amount of time you need to live in xyz)
    
    {context}
    Question: {question}
    Helpful Answer:
    """
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
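`PromptTemplate.from_template` treats the `{...}` placeholders in the string as the prompt's input variables, which here are `context` and `question`. The standard library's `string.Formatter` can recover the same names, which gives a quick check that a template exposes exactly the variables the chain will fill. Any literal braces in such a template would need to be doubled. `template_variables` is a hypothetical helper.

```python
from string import Formatter


def template_variables(template):
    """Return the placeholder names in a str.format-style template, in order, without duplicates."""
    names = []
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        if field and field not in names:
            names.append(field)
    return names


example = "Use the context.\n{context}\nQuestion: {question}\nHelpful Answer:"
print(template_variables(example))  # ['context', 'question']
```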


In [171]:
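# `model` is not defined earlier in the notebook; "llama2" is an assumption
# (it is the model in the commented-out cell above)
model = "llama2"

qa_chain_mr = RetrievalQA.from_chain_type(
    set_model(model),
    retriever=qdrant_r_texts.as_retriever(),
    chain_type="stuff",
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)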
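With `chain_type="stuff"`, all retrieved chunks go into the single `{context}` slot of the prompt, and the LLM is called once. The following is a minimal sketch of that step. It assumes the chunks are joined with blank lines, although LangChain's actual separator may differ. The function name and the toy template are hypothetical.

```python
def stuff_prompt(template, chunks, question, separator="\n\n"):
    """Fill a {context}/{question} template with all retrieved chunks at once."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


toy_template = "Answer using the context.\n{context}\nQuestion: {question}\nHelpful Answer:"
prompt = stuff_prompt(toy_template, ["§ 1 ...", "§ 2 ..."], "Who needs a passport?")
print(prompt)
```

Because everything goes into one prompt, the combined length of the retrieved chunks has to fit the model's context window. Larger chunks, such as the oversized ones from `CharacterTextSplitter`, make that limit tighter.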

**Define Question**

In [172]:
question = "How can a resident of Germany obtain citizenship?"

In [None]:
result = qa_chain_mr({"query": question})


In [174]:
# The print() function ensures we see paragraphs, not '\n'
print(result["result"])

 As a legal advisor with intimate knowledge of German law, I can provide you with information on how a resident of Germany can obtain citizenship. According to Section 2 of the Citizenship Act (Bundeszugehörigkeitsgesetz), a person can acquire German citizenship through naturalization if they meet certain requirements.

To be eligible for citizenship, the applicant must:

1. Have been living in Germany for at least 6 years immediately prior to the application (Paragraph 9 of the Citizenship Act).
2. Be at least 18 years old (Paragraph 2 of the Citizenship Act).
3. Have a good command of the German language (Paragraph 3 of the Citizenship Act).
4. Have a basic knowledge of the Constitution and the laws of Germany (Paragraph 4 of the Citizenship Act).
5. Be of good character and not have any criminal convictions or outstanding debts to the state (Paragraph 5 of the Citizenship Act).

If these requirements are met, the applicant can submit an application for citizenship to the competent a

In [182]:
import time

In [183]:
# Define a function that takes a vector store, a model name, and a question,
# prints the answer, and reports how long the call took.
# Ideally, it should also log which LLM and embeddings were used as tags, so that
# outputs can be categorized (good, not good, or with a comment).

def test_llm(vectorstore, model, question):

    start = time.time()

    # set qa chain
    qa_chain_mr = RetrievalQA.from_chain_type(
        set_model(model), 
        retriever = vectorstore.as_retriever(),
        chain_type="stuff", 
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
    )

    # get the result
    result = qa_chain_mr({"query": question})

    # print the result
    print(result["result"])

    end = time.time()
    elapsed_time = end - start
    print(f"The function took {elapsed_time:.1f} seconds to run.")
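The comment above asks for each run to be tagged with the LLM and vector store used, plus a rating. One lightweight option is to append one JSON object per run to a file (JSON Lines), which can later be loaded into pandas. The helpers `log_run` and `load_runs`, their field names, and the file name are hypothetical.

```python
import json
import time
from pathlib import Path


def log_run(path, model, vectorstore_name, question, answer, elapsed_s, rating=None, comment=""):
    """Append one run as a JSON object on its own line."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "model": model,
        "vectorstore": vectorstore_name,
        "question": question,
        "answer": answer,
        "elapsed_s": round(elapsed_s, 1),
        "rating": rating,  # e.g. "good", "not good", or None if not rated yet
        "comment": comment,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_runs(path):
    """Read all logged runs back as a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Inside `test_llm`, a call such as `log_run("runs.jsonl", model, "qdrant_r_texts", question, result["result"], elapsed_time)` after the timing would record each run. The vector store name is passed as a string here for simplicity.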


In [184]:
frage = "How can a resident of Germany obtain citizenship?"
test_llm(qdrant_r_texts, 'mistral', frage)


As of my knowledge cutoff in 2021, a resident of Germany who wishes to obtain citizenship must fulfill certain requirements and go through a series of processes.

First, the individual must have lived in Germany for a minimum of eight years. This requirement is set forth in Section 67 of the German Civil Code (Bürgergesetzbuch).

Second, the applicant must demonstrate that they have good character and are capable of upholding the rights and duties of German citizenship. This includes having a clean criminal record and demonstrating that they have integrated into German society.

Third, the individual must take an oath of allegiance to the Federal Republic of Germany. This requirement is set forth in Section 68 of the German Civil Code.

Finally, the applicant must submit their application to the appropriate federal or state authority and undergo a citizenship ceremony.

It's important to note that this information is subject to change and may not be up-to-date with the current laws in

In [185]:
test_llm(qdrant_r_texts, 'llama2', frage)

 As a legal advisor with intimate knowledge of German law, I can provide you with information on how a resident of Germany can obtain citizenship. According to Section 1 of the Citizenship Act (Aufenthaltsgesetz), a person who has been residing in Germany for at least eight years can apply for citizenship. This period may be reduced to five years if the applicant is married to a German citizen or has lived in Germany for at least 10 years.

To obtain citizenship, the applicant must meet certain requirements, including:

* Being at least 18 years old
* Having good moral character and no criminal record
* Proving basic knowledge of the German language and society
* Passing a citizenship test

The application process typically involves submitting an application to the relevant authorities, along with supporting documents and fees. The application will be reviewed and processed by the authorities, who may request additional information or documentation during the processing period. If the 

**Testing qdrant_texts**

In [188]:
test_llm(qdrant_texts, 'mistral', frage)


To become a German citizen, you will need to go through a process called naturalization. There are several requirements that you must meet in order to apply for citizenship, including the following:

1. Age: You must be at least 18 years old to apply for naturalization.
2. Length of Residence: You must have been living in Germany for at least six years and a half (7 jahre und 6 Monate). During this time, you must have had no interruptions in your residence and must have spent at least two years and six months (2 Jahre und 6 Monate) during this period.
3. Knowledge of German: You must be able to demonstrate a good command of the German language both in speech and writing. This can be done through an interview, examination or test, or by providing proof of your proficiency such as a diploma from a German language course.
4. Integration: You must demonstrate that you have integrated well into German society and are able to take on the rights and responsibilities of citizenship. This incl

In [189]:
test_llm(qdrant_texts, 'llama2', frage)

 As an empathetic and helpful legal advisor, I must inform you that acquiring German citizenship is a complex process that involves several steps and requirements. Here are the general guidelines to obtain German citizenship as a resident of Germany:

1. Meet the Eligibility Criteria: To be eligible for German citizenship, you must meet certain requirements such as being at least 18 years old, having lived in Germany for at least eight years (or 10 years if you are married to a German citizen), and having a good knowledge of the German language.
2. Obtain Permanent Residence: Before applying for citizenship, you must hold a permanent residence permit (Aufenthaltstitel). You can apply for this permit after living in Germany for at least five years as a foreign national.
3. Pass the Citizenship Test: The next step is to pass a written and oral test on German history, culture, and politics. The test is administered by the authorities, and you will need to score at least 60% to pass.
4. Me
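To support the good/not-good rating, one crude check is whether the laws an answer names actually occur in the retrieved context. The index contains only the AufenthV file, so a law named in the answer but absent from the context (for example, one answer above labels the Aufenthaltsgesetz as the Citizenship Act) hints that the citation came from the model rather than from retrieval. The retrieved chunks can be obtained by building the chain with `return_source_documents=True` and reading `result["source_documents"]`. The following substring-based sketch takes the list of law names supplied by hand. `ungrounded_citations` is a hypothetical helper.

```python
def ungrounded_citations(answer, context, law_names):
    """Return the law names mentioned in the answer but absent from the retrieved context."""
    answer_l, context_l = answer.lower(), context.lower()
    return [name for name in law_names
            if name.lower() in answer_l and name.lower() not in context_l]


context = "AufenthV § 2 Erfüllung der Passpflicht ..."
answer = "According to the Staatsangehörigkeitsgesetz and AufenthV § 2 ..."
print(ungrounded_citations(answer, context, ["AufenthV", "Staatsangehörigkeitsgesetz", "BGB"]))
# ['Staatsangehörigkeitsgesetz']
```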

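### Findings:

Elapsed time per query (as printed by `test_llm`):

Recursive text splitter (`qdrant_r_texts`)
 * mistral: 19.5 s
 * llama2: 26.2 s

Character text splitter (`qdrant_texts`)
 * mistral: 26.5 s
 * llama2: 79.7 s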
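The relative slowdowns can be computed directly from these numbers. Each configuration was timed only once, and the first call to a model may include loading time in Ollama, so the ratios are indicative only. The longer prompts produced by the oversized `CharacterTextSplitter` chunks are one plausible factor behind the larger gap for that splitter.

```python
# Elapsed times in seconds from the findings above.
timings = {
    ("recursive", "mistral"): 19.5,
    ("recursive", "llama2"): 26.2,
    ("character", "mistral"): 26.5,
    ("character", "llama2"): 79.7,
}

for splitter in ("recursive", "character"):
    ratio = timings[(splitter, "llama2")] / timings[(splitter, "mistral")]
    print(f"{splitter}: llama2 took {ratio:.2f}x as long as mistral")  # 1.34x, 3.01x

for model in ("mistral", "llama2"):
    ratio = timings[("character", model)] / timings[("recursive", model)]
    print(f"{model}: character splitter took {ratio:.2f}x as long as recursive")  # 1.36x, 3.04x
```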