In [1]:
# MIT LICENCE
#
# Copyright 2025 Massimiliano Carli
# Permission is hereby granted, free of charge, to any person obtaining a 
# copy of this software and associated documentation files (the “Software”), 
# to deal in the Software without restriction, including without limitation 
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the 
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in 
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.

# 👨‍⚖️ LAWI, the EU Law AI Assistant

author: Massimiliano Carli

This notebook builds an AI legal assistant for analysing two key EU regulations:
- **GDPR** (General Data Protection Regulation)
- **AI Act**

The assistant delivers legal insights drawn directly from the official texts, which are stored as documents in ChromaDB.

The legal assistant works alongside two other agents: one performs a RAG task that queries the articles of the two regulations above, and the other uses Google Search to provide additional information. All agents are then combined into one final AI system using LangGraph.

### Implemented Features

- **✅ Agents**  
  Two agents are implemented:
  - A **RAG agent** using an internal ChromaDB-based retrieval system.
  - A **Google search agent** using Google Search to retrieve useful information.

- **✅ Vector Search / Vector Store / Vector Database and Embeddings**  
  Two separate ChromaDB collections (GDPR and AI Act) are created, populated in batches, and queried using the custom embedding function, covering the vector database and search aspects.

- **✅ Retrieval-Augmented Generation (RAG)**  
  The **RAG agent** retrieves regulation content and feeds it into the generative process to produce a final response.

- **✅ Controlled Generation**  
  Generation parameters (e.g., `temperature=0.2`, `top_p=0.5`) are set for both the RAG agent and the Google search agent, ensuring consistent and controlled outputs.

- **✅ Structured Output**  
  Both agents follow an iterative ReAct strategy and use tool calls for multistep reasoning. Specific prompt templates are used, and final answers are extracted via `<finish>` tags.

- **✅ Function Calling**  
  A decorator (`@tool`) is employed to expose functions like `rag_agent_as_tool` and `google_search_agent_as_tool` for use within the agent framework, supporting function calling within the execution process.
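
The `<finish>` tag extraction mentioned under Structured Output could look roughly like the sketch below. The helper name `extract_final_answer` and the choice to return `None` when no tag is found are assumptions made for illustration; the agents in this notebook may handle it differently.

```python
import re
from typing import Optional

# Hypothetical helper: pull the final answer out of a ReAct-style transcript.
FINISH_PATTERN = re.compile(r"<finish>(.*?)</finish>", re.DOTALL)


def extract_final_answer(model_output: str) -> Optional[str]:
    """Return the text inside the last <finish>...</finish> pair, or None."""
    matches = FINISH_PATTERN.findall(model_output)
    if not matches:
        return None
    return matches[-1].strip()


sample = (
    "Thought: I have enough information.\n"
    "Action: <finish>GDPR Article 9 prohibits this processing "
    "unless an exception applies.</finish>"
)
print(extract_final_answer(sample))
```

Using a non-greedy match with `re.DOTALL` keeps multi-line answers intact, and returning `None` lets the caller decide whether to re-prompt the model when the tags are missing.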

## **IMPORTANT!**

The app built in this notebook takes **user input** through a **text box** ([Python's `input`](https://docs.python.org/3/library/functions.html#input)). I tried to bypass inputs as much as possible to provide end-to-end execution, but I cannot predict what the LLM will decide (it may still ask some follow-up questions). Keep an eye on the last line, where `.invoke(...)` is called, in case you need to interact with the app.

## Dependencies and Setup

In [2]:
!pip uninstall -qy kfp jupyterlab libpysal thinc spacy fastai ydata-profiling google-cloud-bigquery google-generativeai
!pip install   -qU "google-genai==1.7.0" "chromadb==0.6.3" "langgraph==0.3.21" "langchain-google-genai==2.1.2" "langgraph-prebuilt==0.1.7"

In [3]:
from google import genai
from google.genai import types

import re
from IPython.display import display, Markdown, Image
from typing import Annotated, Literal, Dict, List
from typing_extensions import TypedDict

genai.__version__

'1.7.0'

In [4]:
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
client = genai.Client(api_key=GOOGLE_API_KEY)

In [5]:
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

If you received an error response along the lines of `No user secrets exist for kernel id ...`, then you need to add your API key via `Add-ons`, `Secrets` **and** enable it.

![Screenshot of the checkbox to enable GOOGLE_API_KEY secret](https://storage.googleapis.com/kaggle-media/Images/5gdai_sc_3.png)
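
The `is_retriable` predicate from cell `In [5]` is later passed to `retry.Retry` so that only rate-limit (429) and service-unavailable (503) errors are retried. The sketch below illustrates that idea in plain Python. It is not the implementation of `google.api_core.retry`; `FakeAPIError`, `retry_if`, and the backoff parameters are hypothetical names and values chosen for the example.

```python
import time
from functools import wraps


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error carrying an HTTP status code."""

    def __init__(self, code: int):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(exc: Exception) -> bool:
    # Same idea as the notebook's predicate: retry only on 429 and 503.
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


def retry_if(predicate, max_attempts=4, initial_delay=0.01):
    """Retry the wrapped call with exponential backoff while predicate(exc) holds."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on non-retriable errors.
                    if attempt == max_attempts or not predicate(exc):
                        raise
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator


calls = {"count": 0}


@retry_if(is_retriable)
def flaky_embed():
    # Fails twice with a rate-limit error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAPIError(429)
    return "ok"


print(flaky_embed(), "after", calls["count"], "attempts")
```

Errors that the predicate rejects (for example a 400) propagate immediately, which avoids wasting quota on requests that can never succeed.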

---

## 🕸️ Scraping GDPR and AI Act

BeautifulSoup is used to scrape and clean legal content from official EU websites. Custom functions isolate the relevant HTML containers, parse out titles, subtitles, and content, and store the results in structured dictionaries.



In [6]:
import re
import requests
from bs4 import BeautifulSoup

def download_regulation_html(url: str) -> str:
    """
    Downloads the HTML content from the given URL.

    Args:
        url: URL pointing to the regulation HTML page.

    Returns:
        Raw HTML content as a string.

    Raises:
        requests.exceptions.HTTPError: If the HTTP request fails.
    """
    response = requests.get(url)
    response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
    return response.text


def extract_articles_gdpr_html(html_text: str) -> dict:
    """
    Extracts GDPR regulation articles from HTML content.

    Processes the container with id `L_2016119EN.01000101.doc` to extract article divs.
    Each article div (`class="eli-subdivision"`) is parsed to collect:
      - Article number from div IDs (e.g., ".art_52" becomes "ART_52")
      - Title from `<p class="oj-ti-art">`
      - Content from all `<p class="oj-normal">` paragraphs.

    Args:
        html_text: HTML content of the GDPR page.

    Returns:
        Dictionary with keys like `ART_<number>`, each containing 'title' and 'content'.

    Raises:
        ValueError: If the regulation container is not found.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    articles = {}

    # Isolate GDPR articles within the regulation container
    regulation_container = soup.find("div", id="L_2016119EN.01000101.doc")
    if not regulation_container:
        raise ValueError("Regulation container not found.")

    # Process each article subdivision
    subdivision_divs = regulation_container.find_all("div", class_="eli-subdivision")
    for div in subdivision_divs:
        div_id = div.get("id", "")
        # Match div IDs like ".art_52" to extract article numbers
        if m := re.search(r"\.art_(\d+)", div_id):
            article_number = m.group(1)
            article_key = f"ART_{article_number}"

            # Extract title from the header tag
            header_tag = div.find("p", class_="oj-ti-art")
            article_title = header_tag.get_text(strip=True) if header_tag else "No Title"

            # Aggregate content paragraphs
            content_paragraphs = div.find_all("p", class_="oj-normal")
            article_content = "\n".join(p.get_text(strip=True) for p in content_paragraphs)

            articles[article_key] = {"title": article_title, "content": article_content}

    return articles


def extract_annexes_gdpr_html(html_text: str) -> dict:
    """
    Extracts GDPR annexes from HTML content.

    Processes the container with id `L_2016119EN.01013201.doc` to extract annex divs.
    Each annex div (`class="eli-container"`) is parsed to collect:
      - Annex number from IDs (e.g., ".anx_I" becomes "ANX_I")
      - Title from `<p class="oj-doc-ti">`
      - Subtitle from `<p class="oj-ti-grseq-1">` (optional)
      - Content from all `<p class="oj-normal">` paragraphs.

    Args:
        html_text: HTML content of the GDPR page.

    Returns:
        Dictionary with keys like `ANX_<roman_numeral>`, each containing 'title',
        'subtitle', and 'content'.

    Raises:
        ValueError: If the annex container is not found.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    annexes = {}

    # Isolate annexes within their container
    annex_container = soup.find("div", id="L_2016119EN.01013201.doc")
    if not annex_container:
        raise ValueError("Annex container not found.")

    # Process each annex div
    annex_divs = annex_container.find_all("div", class_="eli-container")
    for div in annex_divs:
        div_id = div.get("id", "")
        # Match div IDs ending with roman numerals (e.g., ".anx_I")
        if m := re.search(r"\.anx_([IVXLCDM]+)$", div_id):
            annex_number = m.group(1)
            annex_key = f"ANX_{annex_number}"

            # Extract title and optional subtitle
            title_tag = div.find("p", class_="oj-doc-ti")
            annex_title = title_tag.get_text(strip=True) if title_tag else "No Title"
            subtitle_tag = div.find("p", class_="oj-ti-grseq-1")
            annex_subtitle = subtitle_tag.get_text(strip=True) if subtitle_tag else ""

            # Aggregate content paragraphs
            content_paragraphs = div.find_all("p", class_="oj-normal")
            annex_content = "\n".join(p.get_text(strip=True) for p in content_paragraphs)

            annexes[annex_key] = {
                "title": annex_title,
                "subtitle": annex_subtitle,
                "content": annex_content,
            }

    return annexes


def download_and_extract_gdpr_content(url: str) -> dict:
    """
    Downloads GDPR HTML and extracts articles + annexes.

    Args:
        url: URL to the GDPR HTML page.

    Returns:
        Dictionary with two keys:
        - "articles": GDPR articles (keys like "ART_52")
        - "annexes": GDPR annexes (keys like "ANX_I")
    """
    html_text = download_regulation_html(url)
    return {
        "articles": extract_articles_gdpr_html(html_text),
        "annexes": extract_annexes_gdpr_html(html_text),
    }


def extract_articles_ai_act_html(html_text: str) -> dict:
    """
    Extracts AI Act articles from HTML content.

    Processes the container with id `enc_1` to extract article divs.
    Each article div (`class="eli-subdivision"`) is parsed to collect:
      - Article number from div IDs (e.g., "art_113" becomes "ART_113")
      - Title from `<p class="oj-ti-art">`
      - Content from all `<p class="oj-normal">` paragraphs.

    Args:
        html_text: HTML content of the AI Act page.

    Returns:
        Dictionary with keys like `ART_<number>`, each containing 'title' and 'content'.

    Raises:
        ValueError: If the article container is not found.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    articles = {}

    # Isolate AI Act articles within their container
    container = soup.find("div", id="enc_1")
    if not container:
        raise ValueError("Article container 'enc_1' not found.")

    # Process each article subdivision
    article_divs = container.find_all("div", class_="eli-subdivision")
    for div in article_divs:
        div_id = div.get("id", "")
        # Match div IDs like "art_113" to extract article numbers
        if m := re.match(r"art_(\d+)$", div_id):
            article_number = m.group(1)
            article_key = f"ART_{article_number}"

            # Extract title from header tag
            header_tag = div.find("p", class_="oj-ti-art")
            article_title = header_tag.get_text(strip=True) if header_tag else "No Title"

            # Aggregate content paragraphs (including nested in tables)
            content_paragraphs = div.find_all("p", class_="oj-normal")
            article_content = "\n".join(p.get_text(strip=True) for p in content_paragraphs)

            articles[article_key] = {"title": article_title, "content": article_content}

    return articles


def extract_annexes_ai_act_html(html_text: str) -> dict:
    """
    Extracts AI Act annexes from HTML content.

    Searches for all divs with class `eli-container` and IDs matching `anx_<roman_numeral>`.
    Each annex div is parsed to collect:
      - Annex number from IDs (e.g., "anx_I" becomes "ANX_I")
      - Title from the first `<p class="oj-doc-ti">`
      - Subtitle from the next sibling `<p class="oj-doc-ti">` (if present)
      - Content from all `<p class="oj-normal">` paragraphs.

    Args:
        html_text: HTML content of the AI Act page.

    Returns:
        Dictionary with keys like `ANX_<roman_numeral>`, each containing 'title',
        'subtitle', and 'content'.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    annexes = {}

    # Find all potential annex divs in the HTML
    annex_divs = soup.find_all("div", class_="eli-container")
    for div in annex_divs:
        div_id = div.get("id", "")
        # Match div IDs like "anx_I" to extract roman numerals
        if m := re.match(r"anx_([IVXLCDM]+)$", div_id):
            annex_number = m.group(1)
            annex_key = f"ANX_{annex_number}"

            # Extract title and subtitle
            title_tag = div.find("p", class_="oj-doc-ti")
            annex_title = title_tag.get_text(strip=True) if title_tag else "No Title"
            subtitle_tag = title_tag.find_next_sibling("p", class_="oj-doc-ti") if title_tag else None
            annex_subtitle = subtitle_tag.get_text(strip=True) if subtitle_tag else ""

            # Aggregate content paragraphs (including nested in tables)
            content_paragraphs = div.find_all("p", class_="oj-normal")
            annex_content = "\n".join(p.get_text(strip=True) for p in content_paragraphs)

            annexes[annex_key] = {
                "title": annex_title,
                "subtitle": annex_subtitle,
                "content": annex_content,
            }

    return annexes


def download_and_extract_ai_act_content(url: str) -> dict:
    """
    Downloads AI Act HTML and extracts articles + annexes.

    Args:
        url: URL to the AI Act HTML page.

    Returns:
        Dictionary with two keys:
        - "articles": AI Act articles (keys like "ART_113")
        - "annexes": AI Act annexes (keys like "ANX_I")
    """
    html_text = download_regulation_html(url)
    return {
        "articles": extract_articles_ai_act_html(html_text),
        "annexes": extract_annexes_ai_act_html(html_text),
    }
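
The GDPR and AI Act article extractors above differ mainly in how they match div IDs: the GDPR version uses `re.search` with a leading dot and no end anchor, while the AI Act version uses `re.match` with `$`. The sketch below isolates the two patterns to show the practical difference. The div IDs are hypothetical examples, not values taken from the EUR-Lex pages.

```python
import re

# Patterns copied from the two article extractors above.
GDPR_ARTICLE_ID = re.compile(r"\.art_(\d+)")
AI_ACT_ARTICLE_ID = re.compile(r"art_(\d+)$")


def gdpr_key(div_id):
    m = GDPR_ARTICLE_ID.search(div_id)   # may match anywhere in the ID
    return f"ART_{m.group(1)}" if m else None


def ai_act_key(div_id):
    m = AI_ACT_ARTICLE_ID.match(div_id)  # must match the whole ID
    return f"ART_{m.group(1)}" if m else None


# Hypothetical div IDs, for illustration only.
print(gdpr_key("doc.art_52"))         # ART_52
print(gdpr_key("art_52"))             # None (no leading dot)
print(ai_act_key("art_113"))          # ART_113
print(ai_act_key("art_113.par_1"))    # None (trailing segment)
print(gdpr_key("doc.art_52.par_1"))   # ART_52 (search has no end anchor)
```

One consequence worth keeping in mind: if a page contained nested subdivision IDs such as the last example, the unanchored GDPR pattern would map them to the same `ART_<number>` key, and a later div would overwrite an earlier one in the `articles` dictionary.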

In [7]:
# Direct URLs
GDPR_URL = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=OJ:L:2016:119:FULL"
AI_ACT_URL = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=OJ:L_202401689"

# Download regulations and extract their content
law = {
    "GDPR": download_and_extract_gdpr_content(GDPR_URL),
    "AI_ACT": download_and_extract_ai_act_content(AI_ACT_URL)
}

In [8]:
# Test if the content was extracted by printing an arbitrary entry
def test_extracters():
    print("GDPR Test on Article 23")
    print(law["GDPR"]["articles"]["ART_23"]['content'][:300])
    
    print("\nAI ACT Test on Annex III")
    print(law["AI_ACT"]["annexes"]["ANX_III"]['content'][:300])

test_extracters()

GDPR Test on Article 23
1.   Union or Member State law to which the data controller or processor is subject may restrict by way of a legislative measure the scope of the obligations and rights provided for in Articles 12 to 22 and Article 34, as well as Article 5 in so far as its provisions correspond to the rights and obl

AI ACT Test on Annex III
High-risk AI systems pursuant to Article 6(2) are the AI systems listed in any of the following areas:
1.
Biometrics, in so far as their use is permitted under relevant Union or national law:
(a)
remote biometric identification systems.
This shall not include AI systems intended to be used for biome


## 🔢 Embedding Legal Content

Once the legal text is extracted, it is converted into vector embeddings with a Gemini embedding model and stored in ChromaDB.

A batch processing helper ensures efficient handling of large volumes of documents.

Embeddings are later used for semantic search and legal query responses.
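
The batching step can be illustrated without ChromaDB. The sketch below reuses the same slicing idea as the notebook's `batch_generator` and shows how document IDs stay unique across batches. The helper `assign_batch_ids` and the sample documents are hypothetical; unlike the cells that follow, the sketch derives the ID offset from `batch_size` rather than a fixed 100.

```python
def batch_generator(data, batch_size=100):
    """Yield consecutive slices of `data` with at most `batch_size` items."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]


def assign_batch_ids(documents, batch_size=100):
    """Return (ids, batch) pairs whose IDs are unique across all batches."""
    batches = []
    for batch_idx, batch in enumerate(batch_generator(documents, batch_size)):
        # Offset each batch by the number of documents that came before it.
        ids = [str(batch_idx * batch_size + i) for i in range(len(batch))]
        batches.append((ids, batch))
    return batches


# Hypothetical documents standing in for the scraped articles.
docs = [f"Article {n}" for n in range(1, 8)]
for ids, batch in assign_batch_ids(docs, batch_size=3):
    print(ids, batch)
```

Tying the offset to `batch_size` keeps IDs collision-free even if the batch size is changed later.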

In [9]:
# --- Custom Gemini Embedding Function --- #
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings

EMBEDDING_MODEL = "models/text-embedding-004"

class GeminiEmbeddingFunction(EmbeddingFunction):
    
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model=EMBEDDING_MODEL,
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]


# --- Batch Processing Helper --- #
def batch_generator(data, batch_size=100):
    """Split data into chunks of specified size"""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]


# --- GDPR DB --- #
def create_gdpr_db(law_data):
    """Create and populate GDPR ChromaDB collection with batch processing"""
    embed_fn = GeminiEmbeddingFunction()
    embed_fn.document_mode = True

    # Initialize ChromaDB client and collection
    chroma_client = chromadb.Client()
    db = chroma_client.get_or_create_collection(
        name="GDPR", 
        embedding_function=embed_fn
    )

    # Prepare documents with batch processing
    documents = []
    for item_type in ["articles", "annexes"]:
        for k, v in law_data["GDPR"][item_type].items():
            doc = f"GDPR {v['title']}\n\n"
            if item_type == "annexes" and v['subtitle']:
                doc += f"{v['subtitle']}\n\n"
            doc += v['content']
            documents.append(doc)

    # Process in batches of 100
    for batch_idx, batch in enumerate(batch_generator(documents)):
        batch_ids = [str(batch_idx * 100 + i) for i in range(len(batch))]
        db.add(documents=batch, ids=batch_ids)
        
    return db, embed_fn


# --- AI ACT DB --- #
def create_ai_act_db(law_data):
    """Create and populate AI Act ChromaDB collection with batch processing"""
    embed_fn = GeminiEmbeddingFunction()
    embed_fn.document_mode = True

    # Initialize ChromaDB client and collection
    chroma_client = chromadb.Client()
    db = chroma_client.get_or_create_collection(
        name="AI_ACT", 
        embedding_function=embed_fn
    )

    # Prepare documents with batch processing
    documents = []
    for item_type in ["articles", "annexes"]:
        for k, v in law_data["AI_ACT"][item_type].items():
            doc = f"AI ACT {v['title']}\n\n"
            if item_type == "annexes" and v['subtitle']:
                doc += f"{v['subtitle']}\n\n"
            doc += v['content']
            documents.append(doc)

    # Process in batches of 100
    for batch_idx, batch in enumerate(batch_generator(documents)):
        batch_ids = [str(batch_idx * 100 + i) for i in range(len(batch))]
        db.add(documents=batch, ids=batch_ids)
        
    return db, embed_fn

# Store documents in the respective ChromaDB
db_gdpr, embed_fn_gdpr = create_gdpr_db(law)
db_ai_act, embed_fn_ai_act = create_ai_act_db(law)

A utility function is defined to query the DB more easily (it will become a tool for one of our agents later).

In [10]:
def query_chroma_db(query_text: str, source: Literal["GDPR", "AI_ACT"]) -> str:
    """
    Queries the specified law document database with the given query text.

    Args:
        query_text: Natural language query string.
        source: The database to query, either "GDPR" or "AI_ACT".

    Returns:
        A string containing formatted document chunks with article numbers and source.
    """

    n_results = 1
    
    # Validate the source and select the matching collection and embedding function
    if source == "GDPR":
        collection = db_gdpr
        query_embed_fn = embed_fn_gdpr
    elif source == "AI_ACT":
        collection = db_ai_act
        query_embed_fn = embed_fn_ai_act
    else:
        raise ValueError(f"Invalid source named {source}. Choose either 'GDPR' or 'AI_ACT'.")

    query_embed_fn.document_mode = False  # Critical for query embeddings

    # Execute query with query-specific embedding function
    results = collection.query(
        query_texts=[query_text],
        n_results=n_results,
    )

    # Process results
    content = results['documents'][0]
    content = '\n\n'.join(content)
    print(content)
    
    return content
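
Conceptually, `collection.query(..., n_results=1)` embeds the query and returns the stored document whose embedding is closest to it. The toy sketch below shows that idea with cosine similarity over hand-written vectors. The vectors, document names, and `top_match` helper are hypothetical, and the metric ChromaDB actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_match(query_vec, store):
    """Return the document whose embedding is most similar to query_vec."""
    return max(store, key=lambda doc: cosine_similarity(query_vec, store[doc]))


# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
store = {
    "GDPR Article 9": [0.9, 0.1, 0.0],
    "GDPR Article 22": [0.1, 0.9, 0.1],
    "GDPR Article 33": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of a health-data question
print(top_match(query, store))
```

With `n_results = 1`, only the single best match reaches the agent, so the ReAct loop has to issue follow-up queries to see related articles.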

In [11]:
# Test whether ChromaDB stored the documents correctly
def test_gdpr_database(gdpr_question):
    
    gdpr_results = query_chroma_db(gdpr_question, "GDPR")

    print(f"Q: {gdpr_question}")
    print(gdpr_results)

def test_ai_act_database(ai_act_question):

    ai_act_results = query_chroma_db(ai_act_question, "AI_ACT")

    print(f"Q: {ai_act_question}")
    print(ai_act_results)

test_gdpr_database("Can I share data regarding the health of my clients with third parties without their consent?")
test_ai_act_database("By predicting the religion of a user with an AI system, in which risk level do I end up?")

GDPR Article 9

1.   Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited.
2.   Paragraph 1 shall not apply if one of the following applies:
(a)
the data subject has given explicit consent to the processing of those personal data for one or more specified purposes, except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject;
(b)
processing is necessary for the purposes of carrying out the obligations and exercising specific rights of the controller or of the data subject in the field of employment and social security and social protection law in so far as it is authorised by Union or Member State law or a 

---

## 🧠 Agent-Based Architecture

The system is implemented using LangGraph, with two agents: one for Retrieval-Augmented Generation (RAG) and one for Google Search. Each agent is wrapped in a function so that the main front-end chatbot can call it as a tool; in other words, the user interacts with a chatbot that uses the two agents as if they were tools. This design pattern was chosen to streamline development within a single notebook. In a full project split across multiple Python files, each agent would likely be a separate module, possibly with an orchestrator managing their interactions. That structure would arguably be more appropriate, but because this project must be self-contained in one notebook, the agents-as-tools pattern was preferred.

Both agents are instructed with a prompt based on the ReAct framework, including a one-shot example. They in turn have access to tools of their own (to query ChromaDB and Google Search), and they can reflect on the information they retrieve before generating the final output.
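
The Thought, Action, Observation cycle described above can be sketched without any LLM or LangGraph dependency. In the example below, `scripted_model` and `fake_query_chroma_db` are hypothetical stand-ins for the real model and retrieval tool, and `run_react` is an illustrative loop, not the notebook's actual agent implementation.

```python
import re

# Matches lines such as: Action: query_chroma_db("some query", "GDPR")
ACTION_RE = re.compile(r'Action:\s*query_chroma_db\("([^"]*)",\s*"(GDPR|AI_ACT)"\)')
FINISH_RE = re.compile(r"<finish>(.*?)</finish>", re.DOTALL)


def fake_query_chroma_db(query_text, source):
    # Hypothetical stand-in for the real retrieval tool.
    return f"[{source}] top document for '{query_text}'"


def scripted_model(transcript):
    # Hypothetical stand-in for the LLM: searches once, then finishes.
    if "Observation:" not in transcript:
        return 'Thought: start with GDPR.\nAction: query_chroma_db("health data", "GDPR")'
    return "Thought: enough context.\nAction: <finish>See GDPR Article 9.</finish>"


def run_react(question, model, tool, max_steps=5):
    """Alternate model turns and tool calls until a <finish> answer appears."""
    transcript = f"Request: {question}"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += "\n" + reply
        if (done := FINISH_RE.search(reply)):
            return done.group(1).strip(), transcript
        if (action := ACTION_RE.search(reply)):
            observation = tool(action.group(1), action.group(2))
            transcript += f"\nObservation: {observation}"
    return None, transcript


answer, trace = run_react("Can I share health data?", scripted_model, fake_query_chroma_db)
print(answer)
```

The `max_steps` cap is a design choice in this sketch: it bounds the number of tool calls if the model never emits a `<finish>` tag.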

In [12]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from langchain_core.tools import tool
from langgraph.prebuilt import ToolNode

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

In [13]:
# === RAG Agent Prompt Template ===

RAG_INSTRUCTIONS = """
You are an EU law retrieval expert using precise multi-step queries for an embeddings database. 
Your task is to respond to user request analyzing the GDPR and AI ACT through embedding databases.

Follow this protocol:

1. Initial Triage
- ALWAYS start with GDPR unless question explicitly mentions AI systems
- Query BOTH regulations when question involves automated processing

2. Iterative Retrieval Process: 
Solve your task interleaving Thought, Action, Observation steps. 
- *Thought* can reason about the current situation, and produce a natural language query to use for the database
- *Observation* is understanding relevant information from an Action's output
- *Action* can be one of two types:
 (1) query_chroma_db(query_produced_before), which searches on the GDPR and AI ACT official resources using `query_chroma_db`
 (2) <finish>answer</finish>, which returns the answer and finishes the task.
- Repeat the above steps multiple times to clarify any doubt,
- Follow the given order Thought → Action(<search>) → Observation → Thought → Action(<search>) → Observation → ... → Action(<finish>)
- If there is any issue using the tool, explain clearly and in detail what you encountered

3. How to query
- Formulate atomic legal concept queries
- Execute `query_chroma_db` with:
    - query: your natural language query 
    - source: "GDPR"|"AI_ACT" (exact match)
    - Example: query_chroma_db("Storage of sensitive data", "GDPR")
- Retrieve the result and analyze cross-references 
(e.g. "if in result, GDPR Article 9 is cited" → trigger a new search for "GDPR Article 9")

4. Response Requirements
In your response provide a formal, law-backed analysis.
Remember:
- ALWAYS cite article numbers with regulation prefix (GDPR, AI ACT)
- NEVER combine results from different regulations

IT IS MANDATORY TO WRAP YOUR SUMMARY RESPONSE IN <finish>...</finish> TAGS.
"""

example="""
-----------

Example:

Request: "Analyze rules regarding AI-driven credit scoring for EU customers"

Thought: I need to first analyze the user’s question through GDPR and AI Act lenses. Starting with GDPR’s automated decision-making rules.
Action: query_chroma_db("automated decision-making", "GDPR")
Observation: The query returned the following result:

Art. 22 GDPR

The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.
Paragraph 1 shall not apply if the decision:
is necessary for entering into, or performance of, a contract between the data subject and a data controller;
is authorised by Union or Member State law to which the controller is subject and which also lays down suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests; or
is based on the data subject’s explicit consent.
In the cases referred to in points (a) and (c) of paragraph 2, the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.
Decisions referred to in paragraph 2 shall not be based on special categories of personal data referred to in Article 9(1), unless point (a) or (g) of Article 9(2) applies and suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests are in place.

Thought: Under Art. 22, users have the right not to be subject to a solely automated decision. It also cites Art. 9, so I need to look at that one too.
Action: query_chroma_db("Article 9", "GDPR")
Observation: The query returned the following:

Art. 9 GDPR

1. Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation shall be prohibited.
2. Paragraph 1 shall not apply if one of the following applies:
(a) the data subject has given explicit consent to the processing of those personal data for one or more specified purposes, except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject;
(b) processing is necessary for the purposes of carrying out the obligations and exercising specific rights of the controller or of the data subject in the field of employment and social security and social protection law in so far as it is authorised by Union or Member State law or a collective agreement pursuant to Member State law providing for appropriate safeguards for the fundamental rights and the interests of the data subject;
(c) processing is necessary to protect the vital interests of the data subject or of another natural person where the data subject is physically or legally incapable of giving consent;
(d) processing is carried out in the course of its legitimate activities with appropriate safeguards by a foundation, association or any other not-for-profit body with a political, philosophical, religious or trade union aim and on condition that the processing relates solely to the members or to former members of the body or to persons who have regular contact with it in connection with its purposes and that the personal data are not disclosed outside that body without the consent of the data subjects;
(e) processing relates to personal data which are manifestly made public by the data subject;
(f) processing is necessary for the establishment, exercise or defence of legal claims or whenever courts are acting in their judicial capacity;
(g) processing is necessary for reasons of substantial public interest, on the basis of Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject;
(h) processing is necessary for the purposes of preventive or occupational medicine, for the assessment of the working capacity of the employee, medical diagnosis, the provision of health or social care or treatment or the management of health or social care systems and services on the basis of Union or Member State law or pursuant to contract with a health professional and subject to the conditions and safeguards referred to in paragraph 3;
(i) processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards of quality and safety of health care and of medicinal products or medical devices, on the basis of Union or Member State law which provides for suitable and specific measures to safeguard the rights and freedoms of the data subject, in particular professional secrecy;
(j) processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.
3. Personal data referred to in paragraph 1 may be processed for the purposes referred to in point (h) of paragraph 2 when those data are processed by or under the responsibility of a professional subject to the obligation of professional secrecy under Union or Member State law or rules established by national competent bodies or by another person also subject to an obligation of secrecy under Union or Member State law or rules established by national competent bodies.
4. Member States may maintain or introduce further conditions, including limitations, with regard to the processing of genetic data, biometric data or data concerning health.

Thought: This processing is generally prohibited, but I fall under exception 2.b. Now let's have a look at the AI Act, as it may classify credit scoring as high-risk.
Action: query_chroma_db("AI system credit scoring classification", "AI_ACT")
Observation: The query returned the following:

Annex III AI ACT

High-risk AI systems pursuant to Article 6(2) are the AI systems listed in any of the following areas:

1. Biometrics, in so far as their use is permitted under relevant Union or national law:
(a) remote biometric identification systems. This shall not include AI systems intended to be used for biometric verification the sole purpose of which is to confirm that a specific natural person is the person he or she claims to be;
(b) AI systems intended to be used for biometric categorisation, according to sensitive or protected attributes or characteristics based on the inference of those attributes or characteristics;
(c) AI systems intended to be used for emotion recognition.

2. Critical infrastructure: AI systems intended to be used as safety components in the management and operation of critical digital infrastructure, road traffic, or in the supply of water, gas, heating or electricity. Related: Recital 55

3. Education and vocational training:
(a) AI systems intended to be used to determine access or admission or to assign natural persons to educational and vocational training institutions at all levels;
(b) AI systems intended to be used to evaluate learning outcomes, including when those outcomes are used to steer the learning process of natural persons in educational and vocational training institutions at all levels;
(c) AI systems intended to be used for the purpose of assessing the appropriate level of education that an individual will receive or will be able to access, in the context of or within educational and vocational training institutions at all levels;
(d) AI systems intended to be used for monitoring and detecting prohibited behaviour of students during tests in the context of or within educational and vocational training institutions at all levels.

4. Employment, workers management and access to self-employment:
(a) AI systems intended to be used for the recruitment or selection of natural persons, in particular to place targeted job advertisements, to analyse and filter job applications, and to evaluate candidates;
(b) AI systems intended to be used to make decisions affecting terms of work-related relationships, the promotion or termination of work-related contractual relationships, to allocate tasks based on individual behaviour or personal traits or characteristics or to monitor and evaluate the performance and behaviour of persons in such relationships.

5. Access to and enjoyment of essential private services and essential public services and benefits:
(a) AI systems intended to be used by public authorities or on behalf of public authorities to evaluate the eligibility of natural persons for essential public assistance benefits and services, including healthcare services, as well as to grant, reduce, revoke, or reclaim such benefits and services;
(b) AI systems intended to be used to evaluate the creditworthiness of natural persons or establish their credit score, with the exception of AI systems used for the purpose of detecting financial fraud;
(c) AI systems intended to be used for risk assessment and pricing in relation to natural persons in the case of life and health insurance;
(d) AI systems intended to evaluate and classify emergency calls by natural persons or to be used to dispatch, or to establish priority in the dispatching of, emergency first response services, including by police, firefighters and medical aid, as well as of emergency healthcare patient triage systems.

6. Law enforcement, in so far as their use is permitted under relevant Union or national law:
(a) AI systems intended to be used by or on behalf of law enforcement authorities, or by Union institutions, bodies, offices or agencies in support of law enforcement authorities or on their behalf to assess the risk of a natural person becoming the victim of criminal offences;
(b) AI systems intended to be used by or on behalf of law enforcement authorities or by Union institutions, bodies, offices or agencies in support of law enforcement authorities as polygraphs or similar tools;
(c) AI systems intended to be used by or on behalf of law enforcement authorities, or by Union institutions, bodies, offices or agencies, in support of law enforcement authorities to evaluate the reliability of evidence in the course of the investigation or prosecution of criminal offences;
(d) AI systems intended to be used by law enforcement authorities or on their behalf or by Union institutions, bodies, offices or agencies in support of law enforcement authorities for assessing the risk of a natural person offending or re-offending not solely on the basis of the profiling of natural persons as referred to in Article 3(4) of Directive (EU) 2016/680, or to assess personality traits and characteristics or past criminal behaviour of natural persons or groups;
(e) AI systems intended to be used by or on behalf of law enforcement authorities or by Union institutions, bodies, offices or agencies in support of law enforcement authorities for the profiling of natural persons as referred to in Article 3(4) of Directive (EU) 2016/680 in the course of the detection, investigation or prosecution of criminal offences.

7. Migration, asylum and border control management, in so far as their use is permitted under relevant Union or national law:
(a) AI systems intended to be used by or on behalf of competent public authorities or by Union institutions, bodies, offices or agencies as polygraphs or similar tools;
(b) AI systems intended to be used by or on behalf of competent public authorities or by Union institutions, bodies, offices or agencies to assess a risk, including a security risk, a risk of irregular migration, or a health risk, posed by a natural person who intends to enter or who has entered into the territory of a Member State;
(c) AI systems intended to be used by or on behalf of competent public authorities or by Union institutions, bodies, offices or agencies to assist competent public authorities for the examination of applications for asylum, visa or residence permits and for associated complaints with regard to the eligibility of the natural persons applying for a status, including related assessments of the reliability of evidence;
(d) AI systems intended to be used by or on behalf of competent public authorities, or by Union institutions, bodies, offices or agencies, in the context of migration, asylum or border control management, for the purpose of detecting, recognising or identifying natural persons, with the exception of the verification of travel documents.

8. Administration of justice and democratic processes:
(a) AI systems intended to be used by a judicial authority or on their behalf to assist a judicial authority in researching and interpreting facts and the law and in applying the law to a concrete set of facts, or to be used in a similar way in alternative dispute resolution; Related: Recital 61
(b) AI systems intended to be used for influencing the outcome of an election or referendum or the voting behaviour of natural persons in the exercise of their vote in elections or referenda. This does not include AI systems to the output of which natural persons are not directly exposed, such as tools used to organise, optimise or structure political campaigns from an administrative or logistical point of view.

Thought: I have enough information to provide an answer. To summarize, I found Article 22 on automated decision-making, then Article 9 on special categories of data, and Annex III of the AI Act, which lists credit scoring as high-risk. I can write the summary and finish here.
Action: <finish>Summary: Under GDPR, AI-driven credit scoring systems must comply with Article 22, which prohibits decisions based solely on automated processing that significantly affect individuals—unless explicit consent is given, the processing is contractually necessary, or authorized by law (Art. 22(2)). If sensitive data (e.g., health, ethnicity) are involved, Article 9 imposes further restrictions unless exceptions like public interest or explicit consent apply (Art. 9(2)). From the AI Act perspective, credit scoring is explicitly classified as high-risk (Annex III, §5(b)), triggering strict requirements for transparency, risk management, and human oversight. Therefore, deploying AI-driven credit scoring in the EU requires compliance with both GDPR's protections for individual rights and the AI Act’s regulatory framework for high-risk systems.</finish>
"""

@tool
def rag_agent_as_tool(request: str) -> str:
    """
    Performs legal analysis on a user request using a multi-step RAG agent over GDPR and AI ACT original documents.
    
    Parameters:
        request (str): The user's legal query.
    
    Returns:
        str: A summarized legal answer based on retrieved EU regulation articles.
    """
    
    # Compose the full prompt by combining the base instructions with the user's request
    request = RAG_INSTRUCTIONS + "\n\n---\n\n" + "Solve this request: " + request

    # Configure generation parameters and tools for the RAG agent
    config = types.GenerateContentConfig(
        tools=[query_chroma_db],  # Only this tool is allowed for information retrieval
        temperature=0.2,          # Low temperature for factual/legal precision
        top_p=0.5                 # Controlled sampling for diversity
    )
    
    # Generate content using the Gemini model with specified config and prompt
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        config=config,
        contents=request
    )

    # Extract text (response.text can be None when the model returns no text parts)
    content = response.text or ""
    # Attempt to extract the final answer enclosed in <finish> tags
    finish_match = re.search(r"<finish>(.*?)</finish>", content, re.DOTALL)
    if finish_match:
        summary = finish_match.group(1).strip()  # Extract the actual answer
        thought_process = content[:finish_match.start()].strip()  # Extract reasoning steps before final answer
    else:
        summary = content  # Fallback: return whole response
    
    return summary
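
The `<finish>` parsing used in the tool above can be exercised on its own, without calling the model. The following is a minimal sketch and not the notebook's implementation: `extract_finish` is a hypothetical helper name, and the sample reply is made up for illustration. It returns both the final answer and the reasoning that precedes it, and falls back to the whole text when no tag is found.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the tool: non-greedy match, DOTALL so answers may span lines
FINISH_PATTERN = re.compile(r"<finish>(.*?)</finish>", re.DOTALL)

def extract_finish(content: Optional[str]) -> Tuple[str, str]:
    """Split a model reply into (summary, thought_process). Hypothetical helper."""
    text = content or ""  # Treat a missing reply as empty text
    match = FINISH_PATTERN.search(text)
    if match:
        summary = match.group(1).strip()
        thought_process = text[:match.start()].strip()
        return summary, thought_process
    # Fallback: no <finish> tag, return the whole reply as the summary
    return text.strip(), ""

# Made-up reply, for illustration only
sample = "Thought: check Article 22.\nAction: <finish>Summary: see GDPR Article 22.</finish>"
summary, thoughts = extract_finish(sample)
print(summary)
print(thoughts)
```

Returning the reasoning alongside the summary makes it easy to log the intermediate steps when `verbose_mode` is enabled, while the tool itself still hands back only the summary.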

In [14]:
# === Google Search Agent Prompt Template ===

GOOGLE_SEARCH_INSTRUCTIONS = """
You are a Google search expert in EU law retrieval, tasked with providing accurate and formal legal analyses on the General Data Protection Regulation (GDPR) and the Artificial Intelligence Act (AI Act).

Your approach is structured and iterative, following these steps:

1. Initial Triage:
   - By default, begin with GDPR-focused queries unless the user explicitly refers to AI systems.
   - When the question involves automated processing or intersects with both fields, query each regulation separately.

2. Iterative Retrieval Process:
   - Your work should alternate between Thought, Action, and Observation steps.
   - *Thought*: Reason about the user's request and decide on the next query.
   - *Action*: Perform a structured Google search.
       There are two types of actions:
       (a) **Search Action**: Use a search query command formatted as follows:
           <search>"[Source]" "[query terms]" [additional filters]</search>
           - Replace `[Source]` with either "GDPR" or "AI Act" for exact match filtering.
           - "[query terms]" should capture the legal concepts or articles under investigation.
           - Include additional filters like `site:europa.eu`, `site:ec.europa.eu`, or `intitle:"opinion"` for higher precision.
       (b) **Completion Action**: When your analysis is complete, output:
           <finish>your final answer here</finish>
   - *Observation*: Analyze the results returned by your search. If the output includes citations (e.g., references to specific articles), ensure that your next search or the final summary incorporates these details.

3. Query Strategy:
   - Break the legal inquiry into atomic queries focusing on specific issues (e.g., data protection exceptions, high-risk categorization, necessary safeguards).
   - Use search strings that target official legal texts and opinions.
   - Example query components include:
       - Official EU portals (e.g., "site:europa.eu", "site:eur-lex.europa.eu")
       - File filters if needed (e.g., `filetype:pdf`) for accessing draft proposals or formal documents.

4. Final Response Requirements:
   - Provide a formal, legally sound analysis citing relevant articles (e.g., GDPR Article 9, AI Act Annex III) directly with the regulation prefix.
   - Ensure your answer is self-contained and does not merge information between the two separate regulations.
   - Your final response must strictly adhere to the legal and formal tone required for EU law queries.

IT IS MANDATORY TO WRAP YOUR SUMMARY RESPONSE IN <finish>...</finish> TAGS.

-----------

Example:

Request: “Can an AI system use health data under the AI Act?”
Thought: “This requires investigating AI Act guidelines on the processing of health-related data, possibly with cross-reference to GDPR for sensitive data rules.”
Action:
    <search>"AI Act" "health data processing" official EDPS site:.europa.eu intitle:"opinion" after:2023-01-01</search>
Observation:
    The search returns EDPS Opinion 03/2023 on AI Act implementations referencing GDPR Article 9(2)(g) exemptions.
Thought: “Now verify the GDPR Article 9(2) exceptions that might apply to AI Act compliance.”
Action:
    <search>GDPR Article 9(2)(g) "legal basis" official site:eur-lex.europa.eu filetype:pdf</search>
Observation:
    Results clarify that GDPR exceptions only apply where Union or State law provides necessary safeguards.
Thought: “Review AI Act Annex III for classifications regarding high-risk health applications.”
Action:
    <search>"AI Act" Annex III "health" site:ec.europa.eu intitle:"proposal" filetype:pdf</search>
Observation:
    Findings include a 2024 Commission working document listing biometric systems (Annex III(1)) and medical devices (Annex III(5)) as high-risk, requiring GDPR Article 9 adherence.
Thought: “Finalize the analysis summarizing these findings.”
Action:
    <finish>Based on the EDPS opinion and the cited official documents, an AI system processing health data is subject to the AI Act's high-risk requirements, which align with GDPR Article 9 restrictions on processing sensitive data, subject to specific legal exemptions provided by national legislation.</finish>
"""

@tool
def google_search_agent_as_tool(query: str) -> str:
    """
    Tool interface to execute structured, filtered Google searches for EU law queries using generative models.

    The function builds a legal research prompt using a predefined instructional template, sends it to the
    Gemini model (configured with Google search tools), parses the response, and returns a concise legal summary.

    Args:
        query (str): A natural language query related to the GDPR or AI Act.

    Returns:
        str: A summarized legal analysis derived from official EU documents and search findings.
    """

    print("> Entered Google Search Agent")
    
    # Concatenate the structured instructions with the actual query
    request = GOOGLE_SEARCH_INSTRUCTIONS + "\n\n---\n\n" + "Solve this request: " + query

    # Configure the generative model to use Google Search as a tool with tuned generation parameters
    config = types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
        temperature=0.2,  # Low temperature for more factual outputs
        top_p=0.5         # Moderate sampling diversity
    )

    # Send the query and tool configuration to the Gemini model
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        config=config,
        contents=request
    )

    # Extract full response text (response.text can be None when no text parts are returned)
    content = response.text or ""

    # Attempt to extract the <finish> tag content as the final answer
    finish_match = re.search(r"<finish>(.*?)</finish>", content, re.DOTALL)
    if finish_match:
        summary = finish_match.group(1).strip()  # Extract the final legal answer
        thought_process = (
            content[:finish_match.start()]
        ).strip()  # Extract everything before <finish> as the reasoning path

    else:
        summary = content  # Fallback: return whole response

    return summary
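
The search-action format described in the prompt above (`<search>"[Source]" "[query terms]" [additional filters]</search>`) can be illustrated with a small string builder. This is only a sketch under stated assumptions: `build_search_action` is a hypothetical helper that is not part of the notebook, and the model composes these strings itself at run time.

```python
from typing import Sequence

ALLOWED_SOURCES = {"GDPR", "AI Act"}  # The two regulations named in the prompt

def build_search_action(source: str, terms: str, filters: Sequence[str] = ()) -> str:
    """Format a search action in the shape the prompt asks for. Hypothetical helper."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"source must be one of {sorted(ALLOWED_SOURCES)}")
    # Quote the source and the query terms for exact matching; filters are appended verbatim
    parts = [f'"{source}"', f'"{terms}"', *filters]
    return "<search>" + " ".join(parts) + "</search>"

action = build_search_action(
    "AI Act",
    "health data processing",
    ["site:europa.eu", 'intitle:"opinion"'],
)
print(action)
```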

The final system and interface are defined next. The front chatbot is instructed to ask the user for extra information when something is unclear, and then to call the tools at its disposal (which wrap the two agents above) to back up its final response.

In [19]:
# === System Constants and Instructions ===

CHATBOT_INSTRUCTIONS = """
You are an EU law chatbot interface designed to interact with users and retrieve accurate, detailed, and formal legal responses. Your goal is to analyze each user's query and decide whether to use one of the following specialized tools:

- The RAG Agent (rag_agent_as_tool) for retrieving technical or context-rich information from our internal database.

- The Google Search Agent (google_search_agent_as_tool) for obtaining authoritative legal data, references, and documents from official EU sources.



Your responsibilities include:

1. Analyzing the user’s input and determining the legal or technical context of the query.
2. Deciding which tool (or combination of tools) is best suited to retrieve the necessary information.
   - **Mandatorily utilizing the RAG Agent (rag_agent_as_tool) to retrieve relevant information from our internal database for every query.**
3. Formulating precise search queries if needed, using an iterative approach:
   - Start by identifying the key legal issues or technical details.
   - Use available tools to fetch authoritative content.
   - If additional clarification is needed, request it from the user.
4. Synthesizing a final, formal, and well-supported answer that cites the proper legal references (e.g., GDPR or AI Act articles) as applicable.
5. Keeping the dialogue professional and ensuring that each response is concise, reliable, and directly addresses the user’s question.

When you receive a user's request:

- Evaluate its focus (e.g., GDPR, AI Act, or technical details).
- If the query is legal in nature, consider using the Google Search Agent.
- If the inquiry requires retrieval from our technical database or context-specific data, consider using the RAG Agent.
- If the question is ambiguous or broad, ask for clarification before proceeding.

Your integrated response should combine any retrieved information into a coherent answer, ensuring that you:

- Avoid mixing information from different tools unless explicitly needed.
- Clearly document each step of your thought process, including which tool was used for what purpose.
- Ensure that your final response meets the high standards of precision and formality expected in EU law and technical discussions.
"""

WELCOME_MESSAGE = "Hi, I am LAWI, your personal EU Law Assistant. I can help you navigate the complicated GDPR and AI Act regulations. How can I help you today?"

# === State Definition ===

class State(TypedDict):
    """Defines the state structure used by the chatbot system."""
    
    # List of chat messages with custom behavior via add_messages annotation
    messages: Annotated[list, add_messages]
    
    # Flag to enable/disable detailed logging
    verbose_mode: bool
    
    # Flag to control whether messages should be rendered in markdown format
    markdown_dialogue: bool

    # Flag to control whether Human pre-made messages should be used (default for end-to-end compatibility)
    interactive_dialogue: bool

    # Flag to control whether the END instruction in the human router should be skipped
    not_skip_END: bool
    
    # Flag indicating whether the conversation has concluded
    finished: bool

# === Tool Setup ===

tools = [rag_agent_as_tool, google_search_agent_as_tool]
tools_node = ToolNode(tools)

# === Chatbot Initialization ===

chatbot = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=GOOGLE_API_KEY,
)
chatbot = chatbot.bind_tools(tools)

# === Node Functions ===

def chatbot_node(state: State) -> State:
    """
    Handles interaction with the AI chatbot.
    Invokes the chatbot with user messages or provides initial instructions/welcome message.
    """
    if state.get("verbose_mode", True):
        print("> log:\tEntered chatbot_node")

    messages = state["messages"]

    if messages:
        output = [chatbot.invoke(messages)]
    else:
        output = [
            SystemMessage(content=CHATBOT_INSTRUCTIONS),
            AIMessage(content=WELCOME_MESSAGE)
        ]

    return state | {"messages": messages + output}

def human_node(state: State) -> State:
    """
    Handles user input.
    Allows resetting the chat history or exiting the session.
    """
    if state.get("finished", False):
        return state  # Skip processing if already finished

    if state.get("verbose_mode", True):
        print("> log:\tEntered human_node")

    messages = state["messages"]
    if messages:
        last_message = messages[-1]
        if state.get("markdown_dialogue", False):
            display(Markdown(f"Model: {getattr(last_message, 'content', last_message)}"))
        else:
            print(f"Model: {getattr(last_message, 'content', last_message)}")

    # Generate user input based on interaction mode
    if state.get("interactive_dialogue", False):
        user_input = input("User: ")
    else:
        # Predefined message and auto-close after sending
        user_input = """
        How must I store religious data in the case I want to share it with a third-party AI research lab located outside the EU for the purpose of training a new predictive algorithm?
        Consider this is data regarding freedom of expression of religion beliefs, where I got explicit consent from the users, the lab is in the US, and I already have security measure in place (encryption, end-to-end).
        """
        display(Markdown(f"User: {user_input}"))
        output = [HumanMessage(content=user_input)]
        # Immediately mark as finished after sending auto-message
        return state | {"messages": messages + output, "finished": True, }

    # Handle special commands
    if user_input.lower() in {"r", "reset", "clear", "clear history"}:
        return state | {"messages": []}

    if user_input.lower() in {"q", "quit", "exit", "goodbye", "bye"}:
        return state | {"finished": True}

    output = [HumanMessage(content=user_input)]
    return state | {"messages": messages + output}

def route_human_to_end(state: State) -> Literal["chatbot", "__end__"]:
    """
    Determines the next node after human input.
    Ends the session if 'finished' flag is set; otherwise continues to chatbot.
    """
    if state.get("finished", False) and not state.get("not_skip_END", True):
        if state.get("verbose_mode", True):
            print("> log:\tClosing chat")
        return END
    else:
        if state.get("verbose_mode", True):
            print("> log:\tRouting human to chatbot")
        if not state.get("interactive_dialogue", True):
            state["not_skip_END"] = True
        return "chatbot"

def route_chatbot_to_tools(state: State) -> Literal["human", "tools"]:
    """
    Determines whether the chatbot response requires tool invocation.
    Routes to tools if tool_calls are present; otherwise returns to human.
    """
    last_message = state["messages"][-1]

    if hasattr(last_message, "tool_calls") and len(last_message.tool_calls) > 0:
        if state.get("verbose_mode", True):
            print(f"> log:\tRouting chatbot to tools:{last_message.tool_calls}")
        return "tools"

    if state.get("verbose_mode", True):
        print("> log:\tRouting chatbot to human")

    return "human"

# === Graph Definition ===

# Define and build the state graph
graph = StateGraph(State)

# Add nodes representing the chatbot, human, and tools logic
graph.add_node("chatbot", chatbot_node)
graph.add_node("human", human_node)
graph.add_node("tools", tools_node)

# Define graph flow edges
graph.add_edge(START, "chatbot")
graph.add_edge("tools", "chatbot")

# Add conditional routing
graph.add_conditional_edges("chatbot", route_chatbot_to_tools)
graph.add_conditional_edges("human", route_human_to_end)

# Compile the system
system = graph.compile()
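
To see the control flow the graph encodes without calling any model, the routing decision taken after the chatbot node can be mimicked in plain Python. This is only a sketch: `FakeMessage` and `next_node_after_chatbot` are hypothetical stand-ins, not LangGraph or LangChain types, and the example messages are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class FakeMessage:
    """Stand-in for a chat message; only the fields the router inspects."""
    content: str
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)

def next_node_after_chatbot(messages: List[FakeMessage]) -> str:
    """Mirror of the chatbot router: go to 'tools' if the last message requests any."""
    last = messages[-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    return "human"

plain_reply = FakeMessage(content="Could you clarify which regulation you mean?")
tool_request = FakeMessage(
    content="",
    tool_calls=[{"name": "rag_agent_as_tool", "args": {"request": "Article 9"}}],
)

print(next_node_after_chatbot([plain_reply]))                # human
print(next_node_after_chatbot([plain_reply, tool_request]))  # tools
```

After the `tools` node runs, the fixed edge `tools -> chatbot` returns control to the chatbot, so the loop continues until the model replies without requesting a tool.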

In [20]:
# === System Execution ===

# Initialize the session state
state = {
    "messages": [],
    "verbose_mode": False,  # Switch to True to see logs and internal thinking of the agents
    "markdown_dialogue": True,
    "interactive_dialogue": False,  # Default for end-to-end runs; switch to True to chat yourself. If you encounter any issue, switch this to True anyway (who knows how an LLM could respond!)
    "finished": False
}

# Set configuration
config = {
    "recursion_limit": 100
}

# Start the interactive loop
display(Markdown("##### Instructions:\n- type `q` to quit chat\n- type `r` to reset chat\n\n**NOTE**: I tried to design the system to produce pre-made user questions, but no one can predict how an LLM could respond, so it may happen it will still ask for your interaction!\n\n"))
print()
display(Markdown("##### Chat:"))
state = system.invoke(state, config) 

##### Instructions:
- type `q` to quit chat
- type `r` to reset chat

**NOTE**: I tried to design the system to produce pre-made user questions, but no one can predict how an LLM could respond, so it may happen it will still ask for your interaction!






##### Chat:

Model: Hi, I am LAWI, your personal EU Law Assistant. I can help you navigate the complicated GDPR and AI Act regulations. How can I help you today?

User: 
        How must I store religious data in the case I want to share it with a third-party AI research lab located outside the EU for the purpose of training a new predictive algorithm?
        Consider this is data regarding freedom of expression of religion beliefs, where I got explicit consent from the users, the lab is in the US, and I already have security measure in place (encryption, end-to-end).
        

> Entered Google Search Agent


ChatGoogleGenerativeAIError: Invalid argument provided to Gemini: 400 Unable to submit request because it has an empty text parameter. Add a value to the parameter and try again. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini
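
The 400 error above reports an empty text parameter. One plausible cause, which is an assumption here and was not verified against this run, is that a message with empty text reached the chatbot, for example a tool result that came back as an empty string. A minimal guard that a tool could apply before returning might look like the following; `ensure_text` and the fallback wording are hypothetical.

```python
from typing import Optional

# Hypothetical fallback wording, returned instead of an empty tool result
FALLBACK = "No answer could be retrieved for this request."

def ensure_text(text: Optional[str], fallback: str = FALLBACK) -> str:
    """Return stripped text, or a non-empty fallback when the text is None or blank."""
    cleaned = (text or "").strip()
    return cleaned if cleaned else fallback

print(ensure_text("  Summary: see GDPR Article 9.  "))
print(ensure_text(None))
```

Applied as `return ensure_text(summary)` at the end of each tool, this would keep empty strings out of the message history, although it would not help if the empty text originates elsewhere.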

## 🧢 Future Work

The project above should be read as an MVP: its capabilities are still limited, and several features remain to be implemented.

Future work on this project could involve:
- Expanding the range of legal documents and regulatory texts integrated into the system, adding Recitals, European court opinions, and possibly statements from the DPAs of different Member States
- Evolving the LangGraph system into a properly structured project, split across several files, where agents become actual nodes of the graph and different patterns are implemented, such as a meta-LLM orchestrating the system
- Letting the user leverage the multi-modal nature of Gemini by providing their own documents to analyze (privacy policies, Data Processing Agreements, etc.)
- Developing new agents to draft Data Processing Agreements or evaluate the risk of AI systems
- Fine-tuning the LLM to produce better responses