# **Policy Liability Hotspot Detection**


# Domain: Legal and Compliance
## Project Overview
This project implements an AI-driven framework for identifying legal and regulatory risks in internal policies.
It detects risky clauses, evaluates potential compliance failures, assigns risk levels and numeric scores,
and generates structured outputs for dashboards or reports.


# Problem Statement & Challenges
Organizations maintain numerous internal policies, which may contain ambiguous or risky clauses.
Manual review is time-consuming, inconsistent, and difficult to scale.

**Challenges:**
- Detecting vague, over-promising, or unrestricted policy statements
- Assessing risk consistently across multiple policies
- Generating structured, actionable outputs
- Handling audio-recorded policies and multi-modal inputs


# AI Models & Techniques
- **Models:**
  - `gemini-2.5-flash` → text reasoning for risk assessment
  - `whisper-1` → audio transcription of policies
  - `gpt-4o-mini-tts` → optional text-to-speech for review
- **Techniques:**
  - Zero-Shot Prompting (Template 1)
  - Few-Shot Prompting (Template 2)
  - Multi-modal processing (audio transcription + text analysis) (Template 3)
  - Structured JSON / tabular outputs for automation
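
The model list mentions optional `gpt-4o-mini-tts` playback, but no cell shows it. The following is a minimal sketch, assuming the findings are dicts with the keys of Template 3's JSON schema and that `client` is an OpenAI Python SDK (v1) client. The helper names `build_narration` and `narrate_findings`, the voice `"alloy"`, and the output path are illustrative assumptions. Only `build_narration` runs offline; `narrate_findings` makes an API call.

```python
from pathlib import Path


def build_narration(findings):
    """Turn risk findings (dicts with Template 3's JSON keys) into a short spoken summary."""
    if not findings:
        return "No risky policy sections were flagged."
    # Read the highest-risk findings first
    ordered = sorted(findings, key=lambda f: f["risk_score"], reverse=True)
    noun = "section was" if len(ordered) == 1 else "sections were"
    parts = [f"{len(ordered)} risky {noun} flagged."]
    for f in ordered:
        summary = f["section_summary"].rstrip(".")
        parts.append(f"{f['risk_level']} risk, score {f['risk_score']}: {summary}.")
    return " ".join(parts)


def narrate_findings(client, findings, out_path="policy_review.mp3"):
    """Hypothetical helper: speak the narration with gpt-4o-mini-tts and save it to disk."""
    response = client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice="alloy",  # assumed voice choice
        input=build_narration(findings),
    )
    Path(out_path).write_bytes(response.content)  # MP3 is the endpoint's default format
    return out_path


# Offline example using two of Template 2's calibration examples (no API call)
findings = [
    {"section_summary": "Security incidents should be reported when feasible.",
     "risk_level": "Medium", "risk_score": 55,
     "reason": "Ambiguous reporting timelines may fail regulatory expectations."},
    {"section_summary": "Employees may access all customer data without restriction.",
     "risk_level": "High", "risk_score": 90,
     "reason": "Unrestricted access violates data minimization and access control requirements."},
]
print(build_narration(findings))
```

Sorting by score before narrating means a reviewer hears the most serious findings first.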


# Features & Capabilities Implemented
- Automated risk detection for policy statements
- Assigns **risk levels** (Low / Medium / High) and **numeric scores**
- Supports **zero-shot** and **few-shot** analysis of text policies, as well as audio-transcribed policies
- Generates **structured JSON** for dashboards or reporting
- Optional TTS playback for quick review
- Improves scoring consistency through calibrated few-shot examples (Template 2)
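
Because each finding carries both a level and a score, the two can disagree. The following is a minimal consistency check. It assumes hypothetical cut-offs (below 40 is Low, 40 to 69 is Medium, 70 and above is High) that happen to agree with Template 2's calibration examples (20 Low, 55 Medium, 90 High). None of the prompts actually define these thresholds.

```python
# Hypothetical cut-offs (floor score, level); tune them to your organization's risk appetite
LEVEL_THRESHOLDS = [(70, "High"), (40, "Medium"), (0, "Low")]


def level_for_score(score):
    """Map a 0-100 risk score to a level using the hypothetical cut-offs above."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    for floor, level in LEVEL_THRESHOLDS:
        if score >= floor:
            return level


def is_consistent(level, score):
    """True when the model's stated level matches the level implied by its score."""
    return level.strip().capitalize() == level_for_score(score)


print(level_for_score(55))        # Medium
print(is_consistent("High", 75))  # True
print(is_consistent("Low", 90))   # False
```

Flagging mismatched rows for human review is safer than silently overwriting either value.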


# Data Flow Architecture
1. Input: Policy text or audio recording
2. Processing:
   - Template 1: Zero-shot risk detection
   - Template 2: Few-shot calibrated risk evaluation
   - Template 3: Audio transcription → multi-modal risk analysis
3. Output:
   - Section Summary | Risk Level | Risk Score | Reason
   - JSON structured output for automation
   - Optional TTS playback
4. Integration:
   - Compliance dashboards, reports, or alerts
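
The output stage can be pictured as one record per flagged section, holding the four fields listed above. The following is a minimal sketch using a standard-library dataclass. The class name `RiskFinding`, the `alerts` helper, and the alert threshold of 70 are illustrative assumptions and are not part of the original pipeline.

```python
import json
from dataclasses import dataclass, asdict

VALID_LEVELS = {"Low", "Medium", "High"}


@dataclass
class RiskFinding:
    """One flagged policy section: Section Summary | Risk Level | Risk Score | Reason."""
    section_summary: str
    risk_level: str
    risk_score: int
    reason: str

    def __post_init__(self):
        # Reject values outside the schema the prompts ask for
        if self.risk_level not in VALID_LEVELS:
            raise ValueError(f"unexpected risk level: {self.risk_level!r}")
        if not 0 <= self.risk_score <= 100:
            raise ValueError(f"risk score out of range: {self.risk_score}")


def to_json(findings):
    """Serialize findings in the same shape as Template 3's JSON schema."""
    return json.dumps({"policy_risks": [asdict(f) for f in findings]}, indent=2)


def alerts(findings, threshold=70):
    """Findings at or above the (hypothetical) alert threshold, highest score first."""
    hits = [f for f in findings if f.risk_score >= threshold]
    return sorted(hits, key=lambda f: f.risk_score, reverse=True)


# Two rows taken from Template 2's sample output
findings = [
    RiskFinding("Employees may access customer data as needed", "High", 75,
                "'As needed' is subjective and may not meet least-privilege requirements."),
    RiskFinding("The organization strives to comply with all applicable laws", "Low", 20,
                "Aspirational language without an enforceable commitment."),
]
print(to_json(findings))
print([f.section_summary for f in alerts(findings)])
```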


## Use Case Description

Template 1 enables compliance teams to quickly identify risky policy statements without relying on prior examples.
The AI evaluates the policy text for vague language, over-promising, or unrestricted access, and assigns a risk level and numeric score.
Ideal for **rapid, on-the-fly policy screening** or reviewing new policies with unknown structures.


**Prompt Template 1:**
Zero-shot prompting with instructions to identify risky clauses and assign numeric scores (0–100).


```python
# --------------------------------------------
# Task 2 - Template 1
# Policy Liability Hotspot Detection
# Technique: Zero-Shot Prompting + Risk Scoring + Role-Based Prompting
# --------------------------------------------

def policy_liability_zero_shot(policy_text):
    """
    Identifies risky policy sections and assigns each a risk level and a numeric score (0-100).

    Returns the model's raw text response: one pipe-delimited row per flagged section.
    """

    prompt = f"""
You are acting as a legal compliance reviewer for an organization.

Analyze the following internal policy and identify any sections that may create
legal, regulatory, or contractual risk.

Policy Text:
{policy_text}

Instructions:
- Focus on vague language, over-promising, unrestricted access, missing safeguards, or regulatory exposure.
- Do not rewrite the policy.
- Only flag sections that introduce risk.
- Assign a numeric risk score from 0 to 100.

Output strictly in this format:
Section Summary | Risk Level (Low/Medium/High) | Risk Score (0-100) | Reason
"""

    # ---- LLM CALL ----
    # Assumes `client` is an OpenAI-compatible client initialized in an earlier cell
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2  # Low temperature for more stable risk classification
    )

    return response.choices[0].message.content
```


## Sample Input

```python
sample_policy_text = """
Employees may access customer data as needed to perform their duties.
The company will ensure all data is handled securely.
Incident reports should be submitted when possible.
The organization strives to comply with all applicable laws.
"""
```


## Sample Output

```python
output = policy_liability_zero_shot(sample_policy_text)
print(output)
```


```text
Employees may access customer data as needed to perform their duties. | High | 95 | This section grants broad, undefined access to sensitive customer data without specifying controls, roles, or the principle of least privilege. It creates significant regulatory risk (e.g., GDPR, CCPA, HIPAA, PCI DSS) by failing to implement necessary access restrictions and accountability.
The company will ensure all data is handled securely. | High | 90 | This is an absolute and potentially over-promising statement. "Will ensure" implies a 100% guarantee, which is unrealistic in cybersecurity. It lacks specific details on *how* security will be ensured, creating contractual risk if a breach occurs and regulatory exposure by not demonstrating concrete security measures.
Incident reports should be submitted when possible. | High | 98 | The phrase "when possible" introduces ambiguity and discretion, undermining the critical need for mandatory and timely incident reporting. This directly conflicts with re
```
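
Templates 1 and 2 return pipe-delimited rows rather than JSON, so a dashboard has to parse them before use. The following is a minimal sketch. It assumes one finding per line and that only the Reason column may itself contain `|` characters. Lines that do not split into four fields with an integer score (for example, a header row or a markdown separator) are skipped rather than guessed at. The function name `parse_pipe_rows` is hypothetical.

```python
def parse_pipe_rows(text):
    """Parse 'Section Summary | Risk Level | Risk Score | Reason' rows into dicts."""
    findings = []
    for line in text.splitlines():
        # Drop optional outer pipes, then split into at most four fields
        # so that a '|' inside the reason stays intact
        parts = [p.strip() for p in line.strip().strip("|").split("|", 3)]
        if len(parts) != 4:
            continue
        summary, level, score, reason = parts
        if not score.isdigit():
            continue  # header row, markdown separator, or malformed line
        findings.append({
            "section_summary": summary,
            "risk_level": level,
            "risk_score": int(score),
            "reason": reason,
        })
    return findings


sample = """Section Summary | Risk Level | Risk Score | Reason
The organization strives to comply with all applicable laws. | Low | 20 | Aspirational language without a clear, enforceable commitment to compliance."""
print(parse_pipe_rows(sample))
```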

**Use Case Description:**
Template 2 allows regulatory analysts to assess policy risks using **multiple annotated examples**.
It promotes **consistent scoring and interpretation** across similar policy statements.
The AI identifies risky clauses, assigns risk levels and numeric scores, and provides concise justifications.
Best suited for **organizations with recurring policy patterns** or when consistent risk calibration is required.



**Prompt Template 2:**
Few-shot prompting with annotated examples showing risk levels and numeric scores.
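
Storing the calibration examples as data, instead of hard-coding them in the prompt string, makes it easier to add or re-score examples as the organization's risk appetite changes. The following is a minimal sketch that renders the same three examples used in the Template 2 prompt into its example layout. The names `CALIBRATION_EXAMPLES` and `render_examples` are hypothetical.

```python
CALIBRATION_EXAMPLES = [
    {"statement": "Employees may access all customer data without restriction.",
     "level": "High", "score": 90,
     "reason": "Unrestricted access violates data minimization and access control requirements."},
    {"statement": "The organization aims to comply with applicable laws.",
     "level": "Low", "score": 20,
     "reason": "Aspirational language without enforceable commitment."},
    {"statement": "Security incidents should be reported when feasible.",
     "level": "Medium", "score": 55,
     "reason": "Ambiguous reporting timelines may fail regulatory expectations."},
]


def render_examples(examples):
    """Render calibration examples in the same layout as Template 2's prompt."""
    blocks = []
    for i, ex in enumerate(examples, start=1):
        blocks.append(
            f"Example {i}:\n"
            f'Policy Statement: "{ex["statement"]}"\n'
            f"Risk Level: {ex['level']}\n"
            f"Risk Score: {ex['score']}\n"
            f"Reason: {ex['reason']}"
        )
    return "\n\n".join(blocks)


print(render_examples(CALIBRATION_EXAMPLES))
```

The rendered text could then be interpolated into the prompt in place of the hand-written example block.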

```python
# --------------------------------------------
# Task 2 - Template 2
# Policy Liability Hotspot Detection
# Technique: Few-Shot Prompting + Risk Scoring
# --------------------------------------------

def policy_liability_few_shot(policy_text):
    """
    Uses calibrated few-shot examples to identify risky policy clauses
    and assign more consistent legal risk scores.
    """

    prompt = f"""
You are a senior legal and compliance analyst.

Below are examples of how internal policy statements are evaluated for legal risk.

Example 1:
Policy Statement: "Employees may access all customer data without restriction."
Risk Level: High
Risk Score: 90
Reason: Unrestricted access violates data minimization and access control requirements.

Example 2:
Policy Statement: "The organization aims to comply with applicable laws."
Risk Level: Low
Risk Score: 20
Reason: Aspirational language without enforceable commitment.

Example 3:
Policy Statement: "Security incidents should be reported when feasible."
Risk Level: Medium
Risk Score: 55
Reason: Ambiguous reporting timelines may fail regulatory expectations.

Now analyze the following internal policy.

Policy Text:
{policy_text}

Instructions:
- Identify only policy statements that introduce legal or regulatory risk.
- Assign a Risk Level (Low / Medium / High).
- Assign a Risk Score from 0 to 100.
- Provide a concise reason.

Output strictly in this format:
Section Summary | Risk Level | Risk Score | Reason
"""

    # Assumes `client` is the same OpenAI-compatible client used by Template 1
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.15  # Slightly lower than Template 1 for more consistent scoring
    )

    return response.choices[0].message.content
```


## Sample Input

```python
sample_policy_text = """
Employees may access customer data as needed to perform their duties.
The company will ensure all data is handled securely.
Incident reports should be submitted when possible.
The organization strives to comply with all applicable laws.
"""
```


## Sample Output

```python
output = policy_liability_few_shot(sample_policy_text)
print(output)
```


```text
Employees may access customer data as needed to perform their duties. | High | 75 | "As needed" is subjective and likely fails to meet "least privilege" or "minimum necessary" requirements for data access.
Incident reports should be submitted when possible. | High | 80 | "When possible" is highly ambiguous and fails to meet specific, often short, regulatory timelines for incident reporting.
The organization strives to comply with all applicable laws. | Low | 20 | Aspirational language without a clear, enforceable commitment to compliance.
```
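
Because Templates 1 and 2 were run on the same sample input, their scores can be compared directly to see how far they drift. The following sketch compares the scores reported in the two sample outputs for the statements both templates flagged. Template 1's output is truncated, but its scores are still visible. The helper name `score_drift` is hypothetical.

```python
def score_drift(scores_a, scores_b):
    """Absolute score difference for every section flagged by both runs."""
    shared = scores_a.keys() & scores_b.keys()
    return {section: abs(scores_a[section] - scores_b[section]) for section in sorted(shared)}


# Scores copied from the two sample outputs in this notebook
zero_shot = {
    "Employees may access customer data as needed to perform their duties.": 95,
    "The company will ensure all data is handled securely.": 90,
    "Incident reports should be submitted when possible.": 98,
}
few_shot = {
    "Employees may access customer data as needed to perform their duties.": 75,
    "Incident reports should be submitted when possible.": 80,
    "The organization strives to comply with all applicable laws.": 20,
}

for section, diff in score_drift(zero_shot, few_shot).items():
    print(f"{diff:>3}  {section}")
```

Matching on exact summary text works here only because both templates copied the statements verbatim. Template 3's paraphrased summaries would need a looser match. The drift is also notable because Template 2's own calibration example scored the near-identical "when feasible" clause at 55 (Medium).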


**Use Case Description:**
Template 3 supports compliance analysis of **spoken or audio-recorded policies**.
The policy is first transcribed with `whisper-1`, and `gemini-2.5-flash` then evaluates each statement for legal or regulatory risk.
Outputs include structured JSON with section summaries, risk levels, scores, and reasons.
This is useful for **processing meeting recordings, briefings, or verbal policy updates** without manual transcription.


**Prompt Template 3:**
- Step 1: Audio transcription (whisper-1)
- Step 2: Risk evaluation using structured JSON

```python
def transcribe_policy_audio(audio_path):
    """
    Transcribes a spoken policy recording to text using whisper-1.
    """

    # Assumes `client` is initialized in an earlier cell and can serve `whisper-1`
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1"
        )

    return transcription.text
```
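
OpenAI's transcription endpoint limits uploads to 25 MB and accepts a fixed set of audio formats. Because of this, a long recording may need checking (and splitting or compressing) before `transcribe_policy_audio` is called. The following is a minimal pre-flight check using only the standard library. The helper name, the choice to raise an error instead of splitting the file, and the byte interpretation of "25 MB" are assumptions.

```python
import os

# Assumes "25 MB" means 25 * 1024 * 1024 bytes
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm"}


def check_audio_file(audio_path):
    """Return the file size in bytes, or raise ValueError if the upload would likely be rejected."""
    ext = os.path.splitext(audio_path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    size = os.path.getsize(audio_path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{audio_path} is {size / 1_048_576:.1f} MB; split or compress it first")
    return size
```

Calling `check_audio_file(audio_path)` before `transcribe_policy_audio(audio_path)` surfaces an oversized or unsupported file locally instead of as an API error.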


```python
def policy_liability_multimodal(policy_text):
    """
    Performs legal risk analysis on a transcribed policy.

    Returns the model's raw response text, which the prompt asks to be
    JSON matching the `policy_risks` schema.
    """

    prompt = f"""
You are a senior legal compliance expert.

Follow these steps:
1. Identify policy statements with legal or regulatory implications.
2. Evaluate how each statement could cause compliance failure or liability.
3. Assign a risk level (Low / Medium / High).
4. Assign a numeric risk score (0–100).

Policy Text:
{policy_text}

Return ONLY valid JSON in the following schema:

{{
  "policy_risks": [
    {{
      "section_summary": "string",
      "risk_level": "Low | Medium | High",
      "risk_score": number,
      "reason": "string"
    }}
  ]
}}
"""

    # Assumes `client` is the OpenAI-compatible client used by the other templates
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1
    )

    return response.choices[0].message.content
```
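
The prompt asks for "ONLY valid JSON", but chat models sometimes wrap JSON in a markdown code fence anyway, as the sample output for this template does. Because `policy_liability_multimodal` returns the raw string, calling `json.loads` on it directly can fail. The following is a minimal sketch of a tolerant parser that strips an optional fence and checks the keys required by the prompt's schema. The name `parse_policy_risks` is hypothetical.

```python
import json
import re

REQUIRED_KEYS = {"section_summary", "risk_level", "risk_score", "reason"}
# Captures the payload inside an optional ```json ... ``` (or bare ```) wrapper
FENCE_RE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL)


def parse_policy_risks(raw):
    """Parse Template 3's raw response into a list of finding dicts, checking the schema."""
    match = FENCE_RE.match(raw)
    payload = json.loads(match.group(1) if match else raw)
    risks = payload.get("policy_risks") if isinstance(payload, dict) else None
    if not isinstance(risks, list):
        raise ValueError("response has no 'policy_risks' list")
    for item in risks:
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"finding is missing keys: {sorted(missing)}")
        if item["risk_level"] not in {"Low", "Medium", "High"}:
            raise ValueError(f"unexpected risk level: {item['risk_level']!r}")
    return risks


# Illustrative fenced response (shortened, not real model output)
raw = '```json\n{"policy_risks": [{"section_summary": "Scope of applicability", "risk_level": "High", "risk_score": 90, "reason": "Misreading the scope leads to non-compliance."}]}\n```'
print(parse_policy_risks(raw)[0]["risk_level"])  # High
```

In the notebook's flow, this would be applied as `parse_policy_risks(json_output)` after Step 2. A truncated response would still raise `json.JSONDecodeError`, which is the desired outcome.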


## Sample Input
Audio file (`regulation_audio.wav`) containing the following spoken text:
This regulation applies to organizations that provide digital services to individuals located in the European Union, and that process personal data such as names, email addresses, IP addresses, or online identifiers.

## Sample Output


```python
# Path to the audio file in Colab's local /content directory
audio_path = "/content/regulation_audio.wav"

# Step 1: Transcription
policy_text = transcribe_policy_audio(audio_path)
print("TRANSCRIBED POLICY:\n", policy_text)

# Step 2: Risk analysis (returns the model's raw JSON string)
json_output = policy_liability_multimodal(policy_text)
print("\nSTRUCTURED RISK OUTPUT:\n", json_output)
```


````text
TRANSCRIBED POLICY:
 This regulation applies to organizations that provide digital services to individuals located in the European Union, and that process personal data such as names, email addresses, IP addresses, or online identifiers.

STRUCTURED RISK OUTPUT:
 ```json
{
  "policy_risks": [
    {
      "section_summary": "The policy defines the scope of applicability based on providing digital services to individuals located in the European Union and processing personal data, including indirect identifiers like IP addresses and online identifiers.",
      "risk_level": "High",
      "risk_score": 90,
      "reason": "Failure to accurately identify whether an organization falls within the scope of this regulation (due to misinterpreting 'digital services,' 'individuals located in the EU,' or the broad definition of 'personal data' which includes IP addresses and online identifiers) will result in complete non-compliance with all its requirements. This foundational error means the orga
````

# Summary & Reusability
- All three templates provide modular, reusable pipelines for policy risk detection.
- Zero-shot suits rapid checks, few-shot improves score calibration, and the multi-modal pipeline handles audio inputs.
- Outputs are structured for dashboards, reports, or automated compliance alerts.
- Can be extended to new policy types, regions, or compliance standards with minimal modifications.
- Supports more consistent and scalable regulatory risk analysis, with structured outputs that can be retained for audit review.
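
One way to extend the templates to new regions or standards is to pass the relevant frameworks into the prompt instead of editing the prompt text by hand. The following is a minimal sketch built on Template 1's prompt. The `frameworks` parameter, the function name `build_zero_shot_prompt`, and the wording of the added instruction are assumptions. GDPR and CCPA are used as examples only because they appear in Template 1's sample output.

```python
def build_zero_shot_prompt(policy_text, frameworks=()):
    """Template 1's prompt, optionally scoped to specific compliance frameworks (hypothetical)."""
    scope = ""
    if frameworks:
        # Extra instruction line inserted after the "Focus on ..." bullet
        scope = "- Assess exposure specifically under: " + ", ".join(frameworks) + ".\n"
    return f"""
You are acting as a legal compliance reviewer for an organization.

Analyze the following internal policy and identify any sections that may create
legal, regulatory, or contractual risk.

Policy Text:
{policy_text}

Instructions:
- Focus on vague language, over-promising, unrestricted access, missing safeguards, or regulatory exposure.
{scope}- Do not rewrite the policy.
- Only flag sections that introduce risk.
- Assign a numeric risk score from 0 to 100.

Output strictly in this format:
Section Summary | Risk Level (Low/Medium/High) | Risk Score (0-100) | Reason
"""


print(build_zero_shot_prompt("Incident reports should be submitted when possible.", ["GDPR", "CCPA"]))
```

With no frameworks given, the prompt is identical to Template 1's. The result can be sent through the same `client.chat.completions.create` call.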
