# Agentic Workflows for GDPR Privacy Compliance


## Project Overview
This project automates the creation of GDPR-compliant privacy policies using Large Language Models (LLMs). It includes a feedback loop between a policy generator and a compliance evaluator to iteratively improve the policy until it meets GDPR standards.


### Workflow Diagram

<img src="Simple Flowchart Infographic Graph.png" alt="Flowchart" width="600">


### How It Works

**Policy Generation:**
An LLM generates a privacy policy for your application.

**GDPR Compliance Check:**
A separate evaluator LLM (`gpt-4o-mini` in the code below) checks the generated policy for compliance with the General Data Protection Regulation (GDPR).

**Suggestion and Drafting:**
If the policy is not GDPR-compliant, the evaluator LLM suggests improvements and saves both the current policy and suggestions as a draft in a CSV file using a tool.

**Policy Refinement:**
The suggestions are sent back to the policy generator LLM, which attempts to revise the policy accordingly.

**Iterative Review:**

- If the revised policy still does not pass the compliance check, the evaluator LLM continues to provide feedback and saves additional drafts.

- This process repeats until the policy meets GDPR compliance standards.

**Finalization:**
Once the policy passes the compliance check, it is finalized and saved as a PDF file.
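
The control flow described above can be sketched without any API calls. In this minimal sketch, `generate`, `evaluate`, and `revise` are hypothetical stand-ins for the LLM calls (they are not part of the notebook), and `max_rounds` is an assumed safety cap so the loop cannot run forever.

```python
from typing import Callable, Tuple


def refine_until_compliant(
    generate: Callable[[], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 10,
) -> Tuple[str, bool, int]:
    """Run the generate -> evaluate -> revise loop.

    Returns (policy, compliant, rounds_used).
    """
    policy = generate()
    for round_number in range(1, max_rounds + 1):
        compliant, suggestion = evaluate(policy)
        if compliant:
            return policy, True, round_number
        # Not compliant: feed the evaluator's suggestion back to the generator.
        policy = revise(policy, suggestion)
    return policy, False, max_rounds


# Hypothetical stand-ins for the LLM calls, for illustration only.
def fake_generate() -> str:
    return "We collect data."


def fake_evaluate(policy: str) -> Tuple[bool, str]:
    if "legal basis" in policy:
        return True, ""
    return False, "State the legal basis for processing."


def fake_revise(policy: str, suggestion: str) -> str:
    return policy + " Our legal basis is consent."


final_policy, ok, rounds = refine_until_compliant(fake_generate, fake_evaluate, fake_revise)
print(ok, rounds)
```

Passing the three steps in as functions keeps the loop itself independent of any particular model or client library.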

### What You'll Learn
**How to structure LLM responses using specific data types**

**How to make LLMs interact with external tools and functions**

**How to build an iterative feedback loop between multiple LLMs for automated text generation and validation**

### Requirements
Save your `OPENAI_API_KEY` in a `.env` file.
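
As a small safeguard, it can help to fail early with a clear message when the key is missing. The helper below is only a sketch: `require_env` is a hypothetical name, and it reads the process environment directly (in the notebook, `load_dotenv()` is what copies the `.env` entries into it).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return value


# Demonstration with a placeholder value (not a real key).
os.environ["DEMO_API_KEY"] = "sk-placeholder"
print(require_env("DEMO_API_KEY"))
```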

#### You also need to install the required libraries

The next cell imports `pydantic`, `openai`, `python-dotenv`, `fpdf`, and `pandas`.


In [13]:
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
from fpdf import FPDF
import pandas as pd
import json
import os
load_dotenv()
client = OpenAI()

In [4]:
gdpr_checklist = """
1. Data Inventory and Mapping 
 - Identify all personal data your organization collects, stores, processes, or shares.
 - Document data flows: how data enters, moves through, and exits your systems.
 - Classify types of personal data (e.g., name, email, IP address, etc.).
 - Identify the legal basis for each type of data processing.

2. Privacy Policies and Notices
 - Ensure privacy policies are clear, accessible, and written in plain language.
 - Include required information: types of data collected, processing purposes, legal basis, data retention, rights, and contact details.
 - Provide privacy notices at the point of data collection.

3. Legal Bases and Consent
 - Determine the lawful basis for each processing activity (e.g., consent, contract, legal obligation, legitimate interests).
 - Ensure consent is freely given, specific, informed, and unambiguous.
 - Provide an easy method for individuals to withdraw consent.

4. Data Subject Rights
Have procedures to handle data subject rights requests, including:
 - Right of access
 - Right to rectification
 - Right to erasure (right to be forgotten)
 - Right to restriction of processing
 - Right to data portability
 - Right to object
Ensure responses to requests are made within one month.

5. Security Measures
 - Implement technical and organizational measures to protect personal data.
 - Encrypt data in transit and at rest where appropriate.
 - Regularly review and test security controls.
 - Maintain an incident response and breach notification plan.

6. Data Minimization and Retention
 - Collect only the data necessary for specified purposes.
 - Define and enforce data retention schedules.
 - Securely delete or anonymize data that is no longer needed.

7. Contracts and Data Processors
 - Ensure data processing agreements are in place with all third-party processors.
 - Include GDPR-required clauses in processor contracts.
 - Monitor and audit third-party processor compliance.

8. Records of Processing Activities (ROPA)
 - Maintain a record of all processing activities if required (e.g., for organizations with over 250 employees or high-risk processing).
 - Include purpose of processing, data categories, recipients, international transfers, retention periods, and security measures.

9. International Data Transfers
 - Identify any transfers of personal data outside the EU/EEA.
 - Ensure appropriate safeguards are in place (e.g., Standard Contractual Clauses, adequacy decisions, Binding Corporate Rules).

10. Data Breach Notification
 - Have a process to detect, investigate, and report personal data breaches.
 - Notify the supervisory authority within 72 hours of becoming aware of a breach.
 - Notify affected individuals without undue delay if there is a high risk to their rights.

11. Training and Awareness
 - Provide GDPR training for employees handling personal data.
 - Conduct regular awareness updates and refresher training.

12. Data Protection Officer (DPO)
 - Appoint a DPO if required under GDPR (e.g., public authorities or organizations conducting large-scale monitoring or sensitive data processing).
 - Ensure the DPO is involved in relevant matters and their contact information is published.

13. Data Protection Impact Assessments (DPIA)
 - Conduct DPIAs for high-risk processing activities.
 - Include assessment of risks to rights and freedoms, and measures to address those risks.
 - Consult with the supervisory authority where necessary.

"""

gdpr_system_prompt = (
    "You are a General Data Protection Regulation (GDPR) expert who checks whether privacy policies comply with the GDPR.\n"
    # The tool registered below is named save_reasons, so the prompt refers to it by that name.
    "** If the privacy policy does not comply with the GDPR checklist, use the save_reasons tool to save the draft version; "
    "make sure you provide all the arguments of the tool call.\n"
    "** You must provide the function name inside tool calls.\n\n"
    f"Now evaluate the following privacy policy based on the GDPR checklist below:{gdpr_checklist}"
)

In [14]:
#print(gdpr_system_prompt)

## Specify the data type with the Pydantic library
- The LLM will respond in a specific structure (the `Evaluator` class defined below).
- The value of **gdpr_compliant** is a boolean (true or false).
- The value of **suggestion** is a string.

In [15]:
class Evaluator(BaseModel):
    gdpr_compliant : bool
    suggestion : str

### Policy Generator 

- This function generates a privacy policy for a specific application.
- I used `gpt-4o`; feel free to use another model.


In [7]:
def policy_generator(application_name): 
    privacy_policy_prompt = "Generate a privacy policy for " +application_name
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user","content": privacy_policy_prompt}])
    return completion.choices[0].message.content


### Compliance Checker
- This function checks the privacy policy against the General Data Protection Regulation (GDPR) checklist.
- It uses a tool if needed (see the tools section below).
- It generates a response in a specific format (an `Evaluator` object).
- For simplicity, think of the response as JSON: `gdpr_compliant` is either true or false, and `suggestion` is a string.
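
To make the expected shape concrete, here is a dependency-free sketch of what a reply looks like and how it could be checked by hand. In the notebook, Pydantic and the `response_format` argument do this work; `parse_evaluation` below is a hypothetical helper, and the sample JSON is made up for illustration.

```python
import json
from typing import Tuple


def parse_evaluation(raw: str) -> Tuple[bool, str]:
    """Parse an evaluator reply and check the two expected fields."""
    data = json.loads(raw)
    compliant = data["gdpr_compliant"]
    suggestion = data["suggestion"]
    if not isinstance(compliant, bool):
        raise TypeError("gdpr_compliant must be true or false")
    if not isinstance(suggestion, str):
        raise TypeError("suggestion must be a string")
    return compliant, suggestion


# Made-up reply in the shape the Evaluator class describes.
sample_reply = '{"gdpr_compliant": false, "suggestion": "Add a data retention period."}'
compliant, suggestion = parse_evaluation(sample_reply)
print(compliant)
print(suggestion)
```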

In [8]:
def compliance_checker(gdpr_system_prompt, gdpr_user_prompt):
    messages = [{"role": "system", "content": gdpr_system_prompt}] + [{"role": "user", "content": gdpr_user_prompt}]
    response = client.beta.chat.completions.parse(model="gpt-4o-mini", messages=messages, response_format=Evaluator, tools = tools)
    return response

### Generate Again
- If the compliance checker is not satisfied with the privacy policy, it responds with `gdpr_compliant = False` or calls the `save_reasons` tool.
- The compliance checker also provides the reason for non-compliance.
- We then run the **generate_again** function, passing the LLM the reasons provided by `compliance_checker`.

In [9]:
def generate_again(suggestion, gdpr_user_prompt):
    # Build the revision prompt: the rejected policy, the evaluator's reason, then the instruction.
    prompt = (
        "This privacy policy does not comply with the General Data Protection Regulation (GDPR):\n"
        + gdpr_user_prompt
        + "\nHere is the reason for non-compliance:\n"
        + suggestion
        + "\nPlease rewrite the policy and fix these issues so that it becomes GDPR compliant."
    )
    completion = client.chat.completions.create(model="gpt-4o",
        messages=[{"role": "user","content": prompt}])
    return completion.choices[0].message.content

### LLM Function calling and Tools

- Each function should be described as a JSON schema (a Python dict here) and provided to the LLM.
- Depending on the situation, the LLM replies with the tool it wants to use.
- We capture the tool name and arguments and run the function on our own machine.
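
The dispatch step in the last bullet can be sketched as follows. This is an illustration under assumptions: `TOOL_REGISTRY`, `run_tool_call`, and the in-memory stand-in for `save_reasons` are hypothetical, and an explicit registry is used instead of a `globals()` lookup so that the model can only trigger functions you have listed.

```python
import json
from typing import Any, Callable, Dict, List

saved_rows: List[Dict[str, str]] = []


def save_reasons(compliance_check: str, policy: str) -> None:
    """Stand-in for the real tool: keep the draft in memory instead of a CSV."""
    saved_rows.append({"compliance_check": compliance_check, "policy": policy})


# Only functions listed here can be triggered by a tool call.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"save_reasons": save_reasons}


def run_tool_call(name: str, arguments_json: str, **extra: Any) -> Any:
    """Look up the requested tool and call it with the decoded arguments."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool requested: {name}")
    arguments = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**arguments, **extra)


# The name and the JSON string mimic what a tool call carries.
run_tool_call(
    "save_reasons",
    '{"compliance_check": "No retention period stated."}',
    policy="Draft policy text",
)
print(saved_rows)
```

The `extra` keyword arguments cover values the model does not supply itself, such as the current policy text.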

In [10]:
# Write a json description of the tool that is understandable by openai
# visit this link to see more examples, https://platform.openai.com/docs/guides/function-calling?api-mode=chat

save_draft_json = {
    "name": "save_reasons",
    "description": "Use this tool if the privacy policy doesnot comply with provided GDPR checklist",
    "parameters": {
        "type": "object",
        "properties": {
            "compliance_check": {
                "type": "string",
                "description": "Reason why the policy is not compliant with the GDPR checklist"
            }
        },
        "required": ["compliance_check"],
        "additionalProperties": False, 
    },
    "strict": True 
}

tools = [{"type": "function", "function": save_draft_json}]
# This tools list is passed to the LLM; see the compliance_checker function above, where it is sent with the request.


# These functions run on your own system: save_reasons when the LLM requests the tool, save_pdf once the policy passes.
def save_pdf(privacy_policy, output_filename="output.pdf"):
    pdf = FPDF()
    pdf.set_title("My PDF Title")
    pdf.set_auto_page_break(auto=True, margin=15)
    pdf.add_page()
    
    
    # Set and print headline
    pdf.set_font("Arial", 'B', 16)
    pdf.cell(0, 10, "Privacy Policy", ln=True, align='C')  # ln=True moves to the next line
    pdf.ln(10) 

    pdf.set_font("Arial", size=12)
    # Split text into lines
    lines = privacy_policy.split('\n')

    for line in lines:
        pdf.multi_cell(0, 10, txt=line)

    pdf.output(output_filename)
    print(f"PDF successfully created: {output_filename}")
    
    
    
    
def save_reasons(compliance_check, policy):
    csv_path = "policy_compliance.csv"

    # Check if the CSV file exists
    if os.path.exists(csv_path):
        # Load existing file
        df = pd.read_csv(csv_path)
        print("CSV file loaded.")
    else:
        # Create new DataFrame with the required columns
        df = pd.DataFrame(columns=["compliance_check", "policy"])
        df.to_csv(csv_path, index=False)
        print("New CSV file created.")
        
    
    new_entry = {
        "compliance_check": compliance_check,
        "policy": policy,
    }
        
    new_row_df = pd.DataFrame([new_entry])
    df = pd.concat([df, new_row_df], ignore_index=True)
    df.to_csv(csv_path, index=False)
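
For readers who prefer not to load the whole file on every save, the same append-a-row behavior could be sketched with the standard library's `csv` module. This is an alternative, not what the notebook does; `append_draft` is a hypothetical name, and the file is written to a temporary directory for the demonstration.

```python
import csv
import os
import tempfile

FIELDNAMES = ["compliance_check", "policy"]


def append_draft(csv_path: str, compliance_check: str, policy: str) -> None:
    """Append one draft row, writing the header first if the file is new."""
    is_new = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerow({"compliance_check": compliance_check, "policy": policy})


demo_path = os.path.join(tempfile.mkdtemp(), "policy_compliance.csv")
append_draft(demo_path, "Missing legal basis.", "Draft 1")
append_draft(demo_path, "Missing retention period.", "Draft 2\nwith a second line")

# Read the drafts back to confirm both rows survived, including the multi-line one.
with open(demo_path, newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(len(rows))
```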

In [11]:
tools

[{'type': 'function',
  'function': {'name': 'save_reasons',
   'description': 'Use this tool if the privacy policy doesnot comply with provided GDPR checklist',
   'parameters': {'type': 'object',
    'properties': {'compliance_check': {'type': 'string',
      'description': 'Reason why the policy is not compliant with the GDPR checklist'}},
    'required': ['compliance_check'],
    'additionalProperties': False},
   'strict': True}}]

In [12]:
import json

# policy_generator generates a privacy policy for the given application name.
# The policy is appended to a prompt so that it can be fed to the GDPR compliance check.
gdpr_user_prompt = "Here is the complete privacy policy\n" + policy_generator("Large Language Model based Chat Bot")

counter = 0
max_iterations = 10 

while True:
    
    # Feed the privacy policy to the compliance checker and capture the response.
    compliance_str = compliance_checker(gdpr_system_prompt, gdpr_user_prompt)
    choice = compliance_str.choices[0]
 
    finish_reason = getattr(choice, 'finish_reason', None)
    # If there is no finish reason, the LLM has not asked to save a draft; the notebook treats this as a signal to stop.
    if finish_reason is None:
        print("No finish reason. Exiting loop.")
        break
        
        
    # If the privacy policy is not GDPR compliant, the LLM calls the save_reasons tool,
    # which saves the current draft together with the reason for non-compliance.
    # The reason and the policy are then passed to generate_again, where the LLM tries to fix the issues.
    # The loop repeats until the policy becomes GDPR compliant or max_iterations is reached.
    if finish_reason == "tool_calls":
        tool_calls = choice.message.tool_calls
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            arguments_str = tool_call.function.arguments
            arguments = json.loads(arguments_str)

            if function_name in globals():
                arguments["policy"] = gdpr_user_prompt
                globals()[function_name](arguments["compliance_check"], arguments["policy"])

            # Generate again using the modified input
            gdpr_user_prompt = generate_again(arguments["compliance_check"], gdpr_user_prompt)

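    else:
        # No tool call: read the structured reply directly.
        evaluation = json.loads(choice.message.content)
        if evaluation["gdpr_compliant"]:
            print(f"Finished with reason: {finish_reason}")
            print("and its compliant")
            save_pdf(gdpr_user_prompt)
            break
        else:
            print("no tool call")
            # Still not compliant: revise using the evaluator's suggestion
            # instead of re-checking the same unchanged policy.
            gdpr_user_prompt = generate_again(evaluation["suggestion"], gdpr_user_prompt)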
    
    counter += 1
    print(counter)
    if counter >= max_iterations:
        print("Reached maximum number of iterations. Stopping to avoid infinite loop.")
        break

CSV file loaded.
1
no tool call
2
CSV file loaded.
3
Finished with reason: stop
and its compliant
PDF successfully created: output.pdf
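
One caveat for the PDF step: as far as I know, the built-in FPDF fonts such as Arial only cover Latin-1 characters, and LLM output often contains curly quotes or bullets outside that range. A hedged workaround is to normalize the text before writing it; `to_latin1` below is a hypothetical helper, not part of the notebook.

```python
def to_latin1(text: str) -> str:
    """Swap common typographic characters, then replace anything Latin-1 cannot encode with '?'."""
    replacements = {
        "\u2018": "'", "\u2019": "'",   # curly single quotes
        "\u201c": '"', "\u201d": '"',   # curly double quotes
        "\u2013": "-", "\u2014": "-",   # en and em dashes
        "\u2022": "*",                  # bullet
    }
    for original, replacement in replacements.items():
        text = text.replace(original, replacement)
    # Any remaining unsupported character becomes "?".
    return text.encode("latin-1", errors="replace").decode("latin-1")


print(to_latin1("\u201cYour data\u201d \u2013 caf\u00e9 \u2022 \u4f60"))
```

Calling it on each line before `pdf.multi_cell(...)` would be one way to apply it inside `save_pdf`.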


In [19]:
print(gdpr_user_prompt)

Here is a revised version of the privacy policy addressing the shortcomings and ensuring GDPR compliance:

---

**Privacy Policy for Large Language Model-Based Chatbot**

**Effective Date: [Date]**

**1. Introduction**

Welcome to our Chatbot, powered by advanced Large Language Model (LLM) technology (referred to as "Chatbot"). Your privacy is important to us. This Privacy Policy outlines our practices concerning the collection, use, and sharing of your information through the Chatbot service. By using the Chatbot, you consent to the terms of this Privacy Policy.

**2. Data Inventory and Classification**

We maintain a comprehensive data inventory to map and classify all personal data processed through our systems. This includes understanding the flow of data and ensuring that each type of data is categorized appropriately.

**3. Information We Collect**

- **User Interaction Data**: Includes chat logs and user inputs.
- **Usage Information**: Interaction frequency, session duration, f