<br>

<h1 style="text-align:center;">Legal GPT</h1>

<br>

## Introduction

---

This project is a legal research assistant built on OpenAI's API and a Pinecone vector database. It uses Retrieval-Augmented Generation (RAG): passages relevant to a question are retrieved from indexed legal documents and passed to the language model as context, so that legal professionals and researchers can search and analyze those documents more efficiently.


<br>

## Initial Setup

---

In this section, we will install the necessary dependencies, import the required libraries, and set up the environment variables for OpenAI and Pinecone.

In [1]:
!pip install -r requirements.txt

In [2]:
# Import Libraries
from langchain.text_splitter import RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter
from langchain_community.vectorstores import Pinecone
from langchain.prompts import PromptTemplate 
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
import gradio as gr
import openai
import os

In [4]:
# Set environment variables
os.environ["OPENAI_API_KEY"] = "****"
os.environ["PINECONE_API_KEY"] = "****"
os.environ["PINECONE_INDEX"] = "dutch-law"
os.environ["PINECONE_ENVIRONMENT"] = "us-east-1"
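Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the keys are already exported in the shell that starts the notebook; `require_env` is a hypothetical helper and not part of LangChain, OpenAI, or Pinecone:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of the notebook would then fail early, before any request is made, if the key is missing.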

<br>

## LLM and Embedding

---

In this section, we will configure the LLM and the embedding model, both provided by OpenAI.

In [5]:
# Define the model
llm = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0)

# Define the embeddings
embedding = OpenAIEmbeddings()

<br>

## Preprocess Data

---

In this section, we will retrieve and preprocess the text data.

In [6]:
# Dutch Laws URLs
urls = [
    "https://wetten.overheid.nl/BWBR0011823/2024-01-01",
    "https://wetten.overheid.nl/BWBR0011825/2024-04-17",
    "https://wetten.overheid.nl/BWBR0003738/2023-10-01",
    "https://wetten.overheid.nl/BWBR0044770/2023-01-01",
]

In [7]:
# Initialize the splitter (based on HTML headers)
html_splitter = HTMLHeaderTextSplitter(
    headers_to_split_on = [("h1", "Header 1"), ("h2", "Header 2"), ("h3", "Header 3")]
)

# Initialize the splitter (based on characters)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
)

# Process each URL: split by HTML headers first, then into overlapping character chunks
splits_l = []
for url in urls:
    splits = html_splitter.split_text_from_url(url)
    splits = text_splitter.split_documents(splits)
    splits_l.extend(splits)

splits_l[:3]

[Document(page_content='Direct naar content  \nMenu  \nU bent nu hier: Wettenbank'),
 Document(page_content='Andere sites binnen Overheid.nl  \nAndere sites binnen Overheid.nl Eenvoudig zoeken Uitgebreid zoeken Zoeken in EU-richtlijnen', metadata={'Header 2': 'Primaire navigatie'}),
 Document(page_content='Berichten over uw Buurt')]
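The first three chunks shown above are site navigation rather than legal text. One possible way to reduce such noise before upserting is to drop chunks that start with a known navigation phrase. The sketch below assumes a small phrase list copied from the sample output (not exhaustive) and uses a stand-in `Chunk` class in place of LangChain's `Document`; `drop_navigation` is a hypothetical helper:

```python
from dataclasses import dataclass, field

# Phrases taken from the sample output above; illustrative and non-exhaustive.
NAV_MARKERS = (
    "Direct naar content",
    "Andere sites binnen Overheid.nl",
    "Berichten over uw Buurt",
)


@dataclass
class Chunk:
    """Stand-in for LangChain's Document (same attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def drop_navigation(chunks):
    """Keep only chunks whose text does not start with a known navigation phrase."""
    return [
        c for c in chunks
        if not c.page_content.strip().startswith(NAV_MARKERS)
    ]
```

Since the helper only reads `page_content`, it could be applied to the real list as `splits_l = drop_navigation(splits_l)`.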

<br>

## Vector Store (Pinecone)

---

In this section, we will insert or update (upsert) the preprocessed text data into Pinecone. If the index already contains the data, we will simply load it.

In [8]:
# Flag for upserting
upsert_flag = False

# Upsert the documents to the vector store
if upsert_flag:
    vectorstore = Pinecone.from_documents(documents=splits_l, embedding=embedding, index_name="dutch-law")

In [9]:
# Load the existing index as a vector store (reusing the embedding model defined earlier)
vectorstore = Pinecone.from_existing_index("dutch-law", embedding)

In [10]:
# Connect retriever to the vector store
retriever = vectorstore.as_retriever()
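Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are most similar to it. The toy sketch below illustrates that idea with cosine similarity over hand-written 3-dimensional vectors. The vectors, texts, and the `top_k` helper are invented for illustration and do not reflect Pinecone's internals or the similarity metric configured on the index:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, vector) pairs standing in for embedded chunks
store = [
    ("integration exam", [0.9, 0.1, 0.0]),
    ("citizenship revocation", [0.1, 0.9, 0.0]),
    ("site navigation", [0.0, 0.1, 0.9]),
]
print(top_k([0.8, 0.2, 0.0], store, k=2))
```

With LangChain, the number of chunks returned can be set when creating the retriever, for example `vectorstore.as_retriever(search_kwargs={"k": 4})`.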

<br>

## Prompt Template

---

In this section, we will build a prompt template.

In [11]:
template = """
      ### INSTRUCTIONS:
      As a polite and professional AI assistant, your task is to address user queries effectively.

      PLEASE ENSURE TO:
      (0) Carefully read and understand the question and its context.
      (1) Begin your response by affirming the question to clarify understanding.
      (2) Provide a comprehensive, clear, and well-sourced answer. If information is unavailable, explain the limitation with: "I couldn't find the required information based on the resources available."
      (3) Cite all references inline to support your claims.
      (4) Review your response for accuracy, helpfulness, and clarity before finalizing.

      Proceed in a systematic manner.

      ### QUERY:
      #### Question: {question}

      #### Context: {context}

      ### RESPONSE:
      #### Detailed Answer with References:
"""

In [12]:
# create prompt template
prompt = PromptTemplate.from_template(template)

prompt

PromptTemplate(input_variables=['context', 'question'], template='\n      ### INSTRUCTIONS:\n      As a polite and professional AI assistant, your task is to address user queries effectively.\n\n      PLEASE ENSURE TO:\n      (0) Carefully read and understand the question and its context.\n      (1) Begin your response by affirming the question to clarify understanding.\n      (2) Provide a comprehensive, clear, and well-sourced answer. If information is unavailable, explain the limitation with: "I couldn\'t find the required information based on the resources available."\n      (3) Cite all references inline to support your claims.\n      (4) Review your response for accuracy, helpfulness, and clarity before finalizing.\n\n      Proceed in a systematic manner.\n\n      ### QUERY:\n      #### Question: {question}\n\n      #### Context: {context}\n\n      ### RESPONSE:\n      #### Detailed Answer with References:\n')
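When the chain runs, the two placeholders are replaced with the user's question and the retrieved text. A rough illustration using plain `str.format` on a shortened stand-in template; the notebook itself uses `PromptTemplate`, and the sample question and context below are invented:

```python
# Shortened stand-in for the template above, for illustration only
short_template = "#### Question: {question}\n\n#### Context: {context}\n"

filled = short_template.format(
    question="What does the integration exam consist of?",
    context="(retrieved legal text would appear here)",
)
print(filled)
```

The equivalent call on the real object is `prompt.format(question=..., context=...)`, which is handy for inspecting exactly what the model will receive.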

<br>

## Create Chain

---

In this section, we will create a chain that we can invoke later on.

In [13]:
# Chain the retriever, prompt, language model, and output parser
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain

{
  context: VectorStoreRetriever(tags=['Pinecone', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.pinecone.Pinecone object at 0x000001D207DB7E50>),
  question: RunnablePassthrough()
}
| PromptTemplate(input_variables=['context', 'question'], template='\n      ### INSTRUCTIONS:\n      As a polite and professional AI assistant, your task is to address user queries effectively.\n\n      PLEASE ENSURE TO:\n      (0) Carefully read and understand the question and its context.\n      (1) Begin your response by affirming the question to clarify understanding.\n      (2) Provide a comprehensive, clear, and well-sourced answer. If information is unavailable, explain the limitation with: "I couldn\'t find the required information based on the resources available."\n      (3) Cite all references inline to support your claims.\n      (4) Review your response for accuracy, helpfulness, and clarity before finalizing.\n\n      Proceed in a systematic manner.\n\n      ### QUERY:\n
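In the chain above, the retriever's output (a list of documents) is passed straight into the `{context}` placeholder, so the prompt receives the documents' string representation, metadata included. A common LCEL pattern is to join the chunk texts first. The sketch below shows such a helper; `format_docs` and the stand-in `Doc` class are assumptions for illustration and are not part of the original notebook:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for LangChain's Document; only page_content is used here."""
    page_content: str


def format_docs(docs):
    """Join the text of the retrieved chunks, separated by blank lines."""
    return "\n\n".join(d.page_content for d in docs)
```

In the chain, the helper could be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`.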

In [14]:
# Invoke the chain
answer = chain.invoke("In Netherlands, what is the integration exam consists of?")
print(answer)

The integration exam in the Netherlands consists of two main components: a test of Dutch language proficiency and an examination of knowledge of Dutch society.

The Dutch language proficiency test assesses the reading, listening, and speaking skills of the individual. The exam program ensures that the individual who successfully completes the integration exam has the required skills in the Dutch language at the A1 level of the European Framework for Modern Foreign Languages (Vreemdelingenbesluit 2000).

The examination of knowledge of Dutch society covers various aspects, including:
1. Knowledge of the Netherlands, including topography, history, and state structure.
2. Understanding of housing, education, employment, healthcare, and integration in the Netherlands.
3. Awareness of rights and obligations upon arrival in the Netherlands.
4. Understanding of the rights, obligations, and common social norms of others in the Netherlands (Vreemdelingenbesluit 2000).

The specific content and 

In [15]:
# Invoke the chain
answer = chain.invoke("In Netherlands, is it possible to appeal against a decision of the District Court?")
print(answer)

Based on the information provided, it is possible to appeal against a decision of the District Court in the Netherlands. According to the Rijkswet op het Nederlanderschap, Article 22a states that a decision to revoke Dutch citizenship can be appealed directly to the District Court in The Hague or to the Court of First Instance of the Joint Court of Justice of Aruba, Curaçao, Sint Maarten, and of Bonaire, Sint Eustatius, and Saba. The appeal must be filed within four weeks of the decision.

Additionally, Article 94 of the Vreemdelingenwet 2000 also mentions the right to appeal against a decision to impose a deprivation of liberty measure. It states that the individual is deemed to have filed an appeal against the decision once the court has been notified, unless the individual has already filed an appeal.

Therefore, based on the provided legal documents, it is clear that there is a provision for appealing against decisions of the District Court in the Netherlands.

References:
- Rijksw

<br>

## UI with Gradio

---

In this section, we will set up the UI with Gradio.

In [16]:
# Create a function to use in Gradio (it takes a question and returns an answer)
def get_answer(question):
    answer = chain.invoke(question)
    return answer
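Because the interface below calls this function directly, blank input or an API failure would surface as a raw error in the UI. A hedged sketch of a more defensive wrapper; `make_safe_answer` and its messages are hypothetical, and `invoke` stands for any callable that takes a question string, such as `chain.invoke`:

```python
def make_safe_answer(invoke):
    """Wrap a question-answering callable with basic input and error handling."""
    def safe_answer(question):
        # Skip the (paid) model call entirely for empty or whitespace-only input
        if not question or not question.strip():
            return "Please enter a question."
        try:
            return invoke(question.strip())
        except Exception as exc:  # show the failure in the UI instead of crashing
            return f"Something went wrong: {exc}"
    return safe_answer
```

It could be used as `get_answer = make_safe_answer(chain.invoke)`.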

In [17]:
# Create and run the Gradio interface
iface = gr.Interface(
    fn=get_answer,
    # placeholder shows hint text without submitting it as the actual question
    inputs=gr.Textbox(placeholder="Enter your question"),
    # live=True re-runs the chain whenever the input changes
    live=True,
    outputs="markdown",
    title="AI-Powered Dutch Lawyer",
    description="Pose any questions about Dutch laws and receive prompt answers from an AI assistant specialized in Dutch integration laws.",
    examples=[
        ["In Netherlands, what is the integration exam consists of?"],
    ],
    theme=gr.themes.Soft(),
    allow_flagging="never",
)

iface.launch(share=True)

Running on local URL:  http://127.0.0.1:7860

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.




### THE END