# 📓 The GenAI Revolution Cookbook

**Title:** How to Set an OpenAI Data Retention Policy for Prompts, Logs

**Description:** Achieve GDPR-ready GenAI retention: classify prompts, outputs, and logs, configure OpenAI/Azure settings, and automate lifecycle deletion with audits and reporting.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



## Why retention readiness matters

Most AI programs stall on data retention: security teams block production, auditors flag gaps, and enterprise deals wait on compliance sign-off. Without clear policies for prompts, outputs, logs, and fine-tuning datasets, you face audit exposure, rising storage costs, and delayed ROI. Organizations that define retention windows, automate deletion, and track compliance cut security review cycles by 40%, avoid regulatory fines, and unlock enterprise sales faster.

This guide gives you a practical, audit-ready playbook to design and drive GenAI data retention. You'll learn the 4-step rollout, governance roles to assign, acceptance criteria to sign off, and KPIs to track. Intended for AI program leads, security leads, and platform owners using Azure and OpenAI, this guide assumes your organization has basic data classification and is ready to operationalize retention controls.

## Step 1: Classify data and set retention policy

Start by categorizing every GenAI artifact by sensitivity and business purpose. Assign each category a default retention window and require every resource to carry an `expires_at` tag. This clarity prevents "nobody's job" delays and creates a single source of truth for audits.

**Key decisions for leaders:**

- **Approve default retention windows** by category: prompts (7–30 days), outputs (30–90 days), logs (90–365 days), fine-tuning datasets (90–180 days). Align windows with legal minimums, business need, and storage cost.
- **Mandate tagging**: Every prompt, output, log, and file must have `expires_at`, `data_type`, `owner`, and `sensitivity` metadata. No exceptions.
- **Define legal hold process**: Specify who can place holds, how to flag resources, and SLA to lift holds after case closure.
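
The tagging mandate above can be sketched as a small policy table plus a helper that stamps the required metadata at creation time. The `RETENTION_DAYS` values are assumptions picked from within each approved range, and `build_tags` and `missing_tags` are hypothetical helper names, not part of any SDK.

```python
from datetime import datetime, timedelta, timezone

# Assumed defaults, each chosen from within the ranges listed above.
RETENTION_DAYS = {
    "prompt": 30,
    "output": 90,
    "log": 365,
    "finetune_dataset": 90,
}
REQUIRED_TAGS = ("expires_at", "data_type", "owner", "sensitivity")


def build_tags(data_type, owner, sensitivity, created_at=None):
    """Return the required retention tags for a new resource."""
    if data_type not in RETENTION_DAYS:
        raise ValueError(f"No retention window defined for data_type={data_type!r}")
    created_at = created_at or datetime.now(timezone.utc)
    expires_at = created_at + timedelta(days=RETENTION_DAYS[data_type])
    return {
        "expires_at": expires_at.isoformat(),
        "data_type": data_type,
        "owner": owner,
        "sensitivity": sensitivity,
    }


def missing_tags(tags):
    """List required tags that are absent or empty."""
    return [name for name in REQUIRED_TAGS if not tags.get(name)]


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
tags = build_tags("prompt", "product-ux", "internal", created_at=created)
print(tags["expires_at"])
print(missing_tags({"owner": "platform-sre"}))
```

Rejecting unknown `data_type` values at creation time keeps untagged resources from entering the system in the first place, which is usually cheaper than finding them later in an audit.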

**Acceptance criteria:**

- ≥98% of assets tagged with required metadata within 30 days.
- Zero items >7 days past expiry without an approved exception.
- Legal hold workflow documented and tested.
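
One way to check the first two criteria is to compute them from an inventory export. The sketch below assumes each asset is a dict of tags (a hypothetical inventory format) with an optional `exception_approved` flag; the 98% and 7-day thresholds come from the criteria above.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_TAGS = ("expires_at", "data_type", "owner", "sensitivity")


def tag_coverage(assets):
    """Fraction of assets carrying every required tag."""
    if not assets:
        return 1.0
    tagged = sum(all(a.get(t) for t in REQUIRED_TAGS) for a in assets)
    return tagged / len(assets)


def overdue_without_exception(assets, now, grace_days=7):
    """Assets more than grace_days past expiry with no approved exception."""
    cutoff = now - timedelta(days=grace_days)
    overdue = []
    for a in assets:
        if not a.get("expires_at") or a.get("exception_approved"):
            continue
        if datetime.fromisoformat(a["expires_at"]) < cutoff:
            overdue.append(a)
    return overdue


now = datetime(2025, 6, 30, tzinfo=timezone.utc)
inventory = [
    {"id": "a1", "expires_at": "2025-07-15T00:00:00+00:00", "data_type": "prompt",
     "owner": "product-ux", "sensitivity": "internal"},
    {"id": "a2", "expires_at": "2025-06-01T00:00:00+00:00", "data_type": "log",
     "owner": "platform-sre", "sensitivity": "internal"},
    {"id": "a3", "data_type": "output", "owner": "sales-ops"},  # untagged asset
]
coverage = tag_coverage(inventory)
overdue = overdue_without_exception(inventory, now)
print(f"coverage={coverage:.0%} meets_98={coverage >= 0.98}")
print("overdue:", [a["id"] for a in overdue])
```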

**Owner assignment:**

Name a data owner per category: Product/UX for prompts, the business domain owner for outputs, Platform/SRE for logs, and Data/ML for files and datasets. Owners approve retention windows, approve exceptions, and maintain deletion runbooks. Clear ownership prevents "nobody's job" delays that create compliance exposure. For practical strategies on managing GenAI tooling adoption and building governance structures, see our guide on [managing GenAI tooling adoption for technical teams](/article/ai-powered-tools-for-software-development-how-to-lead-adoption-2).

**For Engineering:**

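Implement tagging at creation time. At the time of writing, the OpenAI Files API does not accept custom metadata on upload, so record the retention tags in your own registry, keyed by the returned file ID. For example, when uploading a fine-tuning file:

In [None]:
# Purpose: Upload a fine-tuning dataset and record retention metadata for audit and automated deletion
import json
import openai
import os
from datetime import datetime, timedelta, timezone

openai.api_key = os.environ["OPENAI_API_KEY"]

# Assumption: a local JSONL file stands in for the retention registry (hypothetical path);
# use a durable store such as a database table in production.
REGISTRY_PATH = "retention_registry.jsonl"

expires_at = (datetime.now(timezone.utc) + timedelta(days=90)).isoformat()

with open("dataset.jsonl", "rb") as f:
    file = openai.files.create(file=f, purpose="fine-tune")

# files.create has no metadata parameter, so the tags are stored alongside the file ID
record = {
    "file_id": file.id,
    "expires_at": expires_at,
    "data_type": "finetune_dataset",
    "owner": "data-ml-team",
    "sensitivity": "internal",
    "legal_hold": False,
}
with open(REGISTRY_PATH, "a") as registry:
    registry.write(json.dumps(record) + "\n")

print(f"Uploaded file {file.id}, expires {expires_at}")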

## Step 2: Configure vendor retention controls

GenAI vendors store data in their systems. You must configure their retention settings to match your policy and ensure data doesn't linger beyond your windows.

**Key decisions for leaders:**

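- **Turn off training opt-ins and minimize vendor retention**: Confirm that no production organization has opted in to sharing API data for model training with OpenAI, Anthropic, or any other vendor. Where you are eligible, request zero-retention modes in place of the vendor's default abuse-monitoring retention (up to 30 days for the OpenAI API at the time of writing), and approve only those modes for production.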
- **Select compliant regions**: Choose Azure regions and OpenAI data processing locations that meet your data residency and sovereignty requirements (e.g., EU for GDPR, US for certain compliance frameworks).
- **Approve gateway logging scope**: Define what API gateways (Azure API Management, Kong) log (headers, bodies, tokens) and for how long. Balance audit needs with privacy.

**Acceptance criteria:**

- Vendor training opt-ins disabled in all production environments.
- All API calls routed through approved regions.
- Gateway logs retention ≤365 days, with PII redaction enabled.

**For Engineering:**

Training opt-in and data processing region are controlled by your organization settings and vendor agreement, not by request parameters. The client only needs to target the right organization and endpoint:

In [None]:
# Purpose: Call OpenAI under the organization whose training and data-region settings match your policy
import openai
import os

openai.api_key = os.environ["OPENAI_API_KEY"]
openai.organization = os.environ.get("OPENAI_ORG_ID")

# Training stays disabled as long as the organization has not opted in (verify in your org settings)
# If your agreement specifies a regional endpoint, set the base URL accordingly
# Example: openai.base_url = "https://api.openai.com/v1/"  # Adjust per vendor guidance

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    # No training opt-in; data processed per your org's region settings
)
print(response.choices[0].message.content)

For Azure API Management, configure diagnostic settings to send logs to Log Analytics with a 365-day retention policy and enable PII redaction.
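
To make the redaction requirement concrete, the sketch below shows one way to scrub a log record before it is persisted. It is an illustration only and not Azure API Management configuration: the `redact_record` helper, the header list, and the regular expressions are assumptions, and simple patterns like these will miss many kinds of PII.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed key shape: "sk-" followed by 20 or more token characters.
API_KEY_RE = re.compile(r"sk-[A-Za-z0-9_-]{20,}")
SENSITIVE_HEADERS = {"authorization", "api-key", "cookie"}


def redact_text(text):
    """Mask email addresses and key-like strings in free text."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return API_KEY_RE.sub("[REDACTED_KEY]", text)


def redact_record(record):
    """Return a copy of a log record with sensitive headers and body text masked."""
    headers = {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in record.get("headers", {}).items()
    }
    return {**record, "headers": headers, "body": redact_text(record.get("body", ""))}


sample = {
    "headers": {"Authorization": "Bearer abc123", "Content-Type": "application/json"},
    "body": "Contact jane.doe@example.com about the renewal.",
}
clean = redact_record(sample)
print(clean)
```

Returning a copy instead of mutating the input keeps the original request available to the handler while only the redacted version reaches the log sink.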

## Step 3: Automate lifecycle management and deletion

Manual deletion doesn't scale. Automate discovery and purging of expired resources using scheduled functions, storage lifecycle policies, and alerting.

**Key decisions for leaders:**

- **Approve automation scope**: Define which categories are automated (e.g., prompts, outputs, logs) and which require manual review (e.g., datasets on legal hold).
- **Set deletion SLOs**: Require time-to-delete ≤24 hours after expiry, with alerting for violations.
- **Approve exception process**: Define who can request extensions, approval workflow, and maximum extension duration (e.g., 30 days, renewable once).
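
The exception rules can be encoded so that the register rejects requests outside policy. This sketch assumes the example limits above (30 days per extension, renewable once) and a hypothetical in-memory `ExceptionRecord`; a real register would live in a ticketing system or database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

MAX_EXTENSION_DAYS = 30  # example limit from the policy above
MAX_EXTENSIONS = 2       # the initial extension plus one renewal


@dataclass
class ExceptionRecord:
    resource_id: str
    owner: str
    reason: str
    expires_at: datetime
    approvals: list = field(default_factory=list)


def extend(record, days, approver):
    """Extend a resource's expiry if the request is within policy."""
    if days <= 0 or days > MAX_EXTENSION_DAYS:
        raise ValueError(f"Extension must be 1-{MAX_EXTENSION_DAYS} days")
    if len(record.approvals) >= MAX_EXTENSIONS:
        raise ValueError("Extension already renewed once; escalate instead")
    record.expires_at += timedelta(days=days)
    record.approvals.append(approver)
    return record


# "file-123" is a placeholder resource ID.
rec = ExceptionRecord("file-123", "data-ml-team", "open customer dispute",
                      datetime(2025, 3, 1, tzinfo=timezone.utc))
extend(rec, 30, "dpo")
extend(rec, 30, "dpo")
try:
    extend(rec, 30, "dpo")
except ValueError as err:
    print("rejected:", err)
print(rec.expires_at.date())
```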

**Acceptance criteria:**

- Automated deletion running daily for all categories.
- >99% of expired items deleted within 24 hours.
- Exception register maintained with owner, reason, and expiry date.

**For Engineering:**

Create a daily Azure Function that lists and deletes OpenAI resources older than `expires_at`. The function uses the Files and Assistants APIs to enumerate and purge expired items, logging every deletion to Log Analytics for audit. For a step-by-step approach to delivering successful AI agent projects, including aligning teams and iterating on processes, check out our [roadmap to successful AI agent projects](/article/your-step-by-step-roadmap-to-successful-ai-agent-projects-6).

First, securely load API keys in your environment (example shown for Colab; adapt for Azure Key Vault in production):

In [None]:
# Purpose: Securely load OpenAI and Anthropic API keys from Colab secrets for use in deletion workflows
import os
from google.colab import userdata
from google.colab.userdata import SecretNotFoundError

keys = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]
missing = []
for k in keys:
    value = None
    try:
        value = userdata.get(k)
    except SecretNotFoundError:
        pass

    os.environ[k] = value if value is not None else ""

    if not os.environ[k]:
        missing.append(k)

if missing:
    raise EnvironmentError(f"Missing keys: {', '.join(missing)}. Add them in Colab → Settings → Secrets.")

print("All keys loaded.")

Install the OpenAI SDK:

In [None]:
# Purpose: Install the OpenAI Python SDK for API access
!pip install --quiet openai

Implement the deletion function that lists OpenAI files, looks up each file's retention tags, filters those past their `expires_at` timestamp and not on legal hold, and deletes them with full logging. Because the Files API stores no custom metadata, the tags are read from a registry keyed by file ID (a local JSONL file in this example):

In [None]:
# Purpose: List and delete OpenAI files older than their expires_at timestamp, logging each deletion for audit
# Assumption: retention tags live in a JSONL registry keyed by file_id (hypothetical path below)

import json
import logging
import os
import time
from datetime import datetime, timezone

import openai

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

openai.api_key = os.environ["OPENAI_API_KEY"]
REGISTRY_PATH = "retention_registry.jsonl"

def load_registry(path=REGISTRY_PATH):
    """
    Load retention records from the registry file.

    Returns:
        dict: Mapping of file ID to its metadata record (later records win).
    """
    registry = {}
    try:
        with open(path) as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    registry[record["file_id"]] = record
    except FileNotFoundError:
        logging.warning(f"Registry {path} not found; no files are eligible for deletion.")
    return registry

def parse_expires_at(metadata):
    """
    Parse the expires_at field from a retention record.

    Args:
        metadata (dict): Retention record for one file.

    Returns:
        datetime or None: Parsed UTC datetime if present and valid, else None.
    """
    expires_at = metadata.get("expires_at")
    if not expires_at:
        return None
    try:
        if isinstance(expires_at, (int, float)):
            return datetime.fromtimestamp(expires_at, tz=timezone.utc)
        parsed = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
        # Treat timestamps without an offset as UTC so comparisons do not fail
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    except Exception as e:
        logging.warning(f"Could not parse expires_at '{expires_at}': {e}")
        return None

def list_expired_files():
    """
    List OpenAI files whose expires_at is in the past and not on legal hold.

    Returns:
        list: (file, metadata) tuples eligible for deletion.
    """
    expired_files = []
    now = datetime.now(timezone.utc)
    registry = load_registry()
    try:
        files = list(openai.files.list())  # iterating the page follows pagination
    except Exception as e:
        logging.error(f"Failed to list files: {e}")
        return []

    for file in files:
        metadata = registry.get(file.id)
        if metadata is None:
            logging.warning(f"File {file.id} has no retention record; skipping.")
            continue
        # legal_hold may be stored as a bool or a string
        if str(metadata.get("legal_hold", "false")).lower() == "true":
            continue
        expires_at = parse_expires_at(metadata)
        if expires_at and expires_at < now:
            expired_files.append((file, metadata))
    return expired_files

def delete_file(file_id):
    """
    Delete an OpenAI file by ID and log the result.

    Args:
        file_id (str): The OpenAI file ID.

    Returns:
        The API deletion response, or a dict with error info on failure.
    """
    try:
        resp = openai.files.delete(file_id)
        logging.info(f"Deleted file {file_id}: {resp}")
        return resp
    except Exception as e:
        logging.error(f"Failed to delete file {file_id}: {e}")
        return {"error": str(e)}

def main():
    """
    Main function to find and delete expired OpenAI files.

    - Lists all files.
    - Filters those with expires_at < now and not on legal hold.
    - Deletes each, logging the action for audit.
    - Idempotent: safe to rerun.
    """
    expired_files = list_expired_files()
    if not expired_files:
        logging.info("No expired files to delete.")
        return

    for file, metadata in expired_files:
        logging.info(f"Deleting file {file.id} (type: {file.purpose}, expires_at: {metadata.get('expires_at')})")
        delete_file(file.id)
        time.sleep(0.5)  # brief pause between deletions to stay clear of rate limits

if __name__ == "__main__":
    main()

For Azure Blob Storage, apply a lifecycle policy to delete fine-tuning datasets after 90 days:

```json
{
  "rules": [
    {
      "name": "delete-expired-finetune",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["datasets/finetune/"]
        },
        "actions": {
          "baseBlob": {
            "delete": {
              "daysAfterModificationGreaterThan": 90
            }
          }
        }
      }
    }
  ]
}
```
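
If several categories live in the same storage account, the policy document can be generated from a retention table instead of edited by hand. The sketch below only builds the JSON. The blob prefixes and the `PREFIX_RETENTION_DAYS` mapping are assumptions, and the rule shape mirrors the example above.

```python
import json

# Assumed blob prefixes and windows; align them with your approved policy.
PREFIX_RETENTION_DAYS = {
    "datasets/finetune/": 90,
    "logs/gateway/": 365,
}


def build_lifecycle_policy(prefix_days):
    """Build one delete rule per prefix, in the same shape as the policy above."""
    rules = []
    for prefix, days in prefix_days.items():
        name = "delete-expired-" + prefix.strip("/").replace("/", "-")
        rules.append({
            "name": name,
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                "actions": {
                    "baseBlob": {"delete": {"daysAfterModificationGreaterThan": days}}
                },
            },
        })
    return {"rules": rules}


policy = build_lifecycle_policy(PREFIX_RETENTION_DAYS)
print(json.dumps(policy, indent=2))
```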

Set up Azure Monitor alerts to flag resources past expiration using a KQL query:

```kusto
// Purpose: Query to flag resources past their expiration and not on legal hold for retention violation alerts
RetentionResources
| where legal_hold == false and todatetime(expires_at) < now()
| project resource_id, data_type, owner, expires_at, days_past_expiry = datetime_diff("day", now(), todatetime(expires_at))
```

## Step 4: Build audit dashboards and exception tracking

Leadership needs visibility into retention compliance, deletion performance, and active exceptions. Create dashboards that show real-time status and link to runbooks and owners for immediate action.

**Key decisions for leaders:**

- **Approve dashboard KPIs**: Data by category and sensitivity, items past expiration, deletion success rates, average time-to-delete, active legal holds, and regions.
- **Set reporting cadence**: Monthly updates to Security leadership, DPO, Legal, and Product leads, including dashboard snapshot, exception register, and remediation plan.
- **Define SLOs**: Time-to-delete ≤24 hours, >99% resources with required tags, ≤5 active exceptions, MTTD of violations <1 hour, audit reconciliation success rate >99.5%.

**Acceptance criteria:**

- Dashboard live and refreshed daily.
- All stakeholders have access and links to runbooks.
- Exception register maintained with owner, reason, and expiry date.

**For Engineering:**

Create an Azure Monitor Workbook that shows retention metrics. If you want to assess the business impact of your AI retention and governance efforts, see our guide on [measuring the ROI of AI in business](/article/measuring-the-roi-of-ai-in-business-frameworks-and-case-studies-2).

Configure the workbook to query Log Analytics for:

- Total resources by category and sensitivity
- Items past expiration (grouped by owner)
- Deletion success rate (successful deletes / total attempts)
- Average time-to-delete (time between expiry and deletion)
- Active legal holds (count and list)
- Resources by region
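
Before building the workbook, it can help to pin down how each deletion metric is calculated. The sketch below computes deletion success rate, average time-to-delete, and the share of attempts completed within the 24-hour SLO from a list of audit records. The record fields (`expires_at`, `deleted_at`, `success`) are an assumed schema, not one defined by Azure Monitor.

```python
from datetime import datetime, timedelta, timezone

SLO = timedelta(hours=24)


def deletion_kpis(attempts):
    """Summarize deletion audit records into dashboard metrics."""
    succeeded = [a for a in attempts if a["success"]]
    delays = [a["deleted_at"] - a["expires_at"] for a in succeeded]
    total = len(attempts)
    return {
        "success_rate": len(succeeded) / total if total else None,
        "avg_hours_to_delete": (
            sum(d.total_seconds() for d in delays) / len(delays) / 3600
            if delays else None
        ),
        "within_slo_rate": sum(d <= SLO for d in delays) / total if total else None,
    }


def ts(day, hour):
    """Build a UTC timestamp in June 2025 for the sample data."""
    return datetime(2025, 6, day, hour, tzinfo=timezone.utc)


audit_log = [
    {"expires_at": ts(1, 0), "deleted_at": ts(1, 6), "success": True},
    {"expires_at": ts(1, 0), "deleted_at": ts(2, 12), "success": True},
    {"expires_at": ts(2, 0), "deleted_at": None, "success": False},
    {"expires_at": ts(3, 0), "deleted_at": ts(3, 2), "success": True},
]
kpis = deletion_kpis(audit_log)
print(kpis)
```

Counting failed attempts in the denominator of `within_slo_rate` is a deliberate choice here: an item that was never deleted has also missed the SLO.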

Add links to runbooks and owner contact info so leaders can act without hunting. Refresh the workbook daily and share the URL with stakeholders.

## Conclusion

Retention readiness is the foundation of compliant, scalable GenAI adoption. By classifying data, configuring vendors, automating deletion, and tracking compliance, you unblock production, reduce audit risk, and demonstrate responsible AI stewardship. Start with a 7-day action plan: confirm org training settings are disabled, pick compliant regions, assign category owners, approve default retention windows, turn on logging controls, and schedule a mock audit. With clear policies, automation, and dashboards in place, you'll cut security review cycles, avoid fines, and accelerate enterprise deals.