# EU AI Act readiness
Overview of risk categories, obligations, and documentation practices.

In [1]:
# Imports
import json
from datetime import datetime, timedelta, timezone

## Learning goals
- Classify systems: prohibited, high-risk, limited-risk, minimal-risk.
- Know high-risk requirements: risk management, data governance, technical documentation, logging, transparency, human oversight, robustness, and cybersecurity.
- Draft minimal documentation for a high-risk system.
- Map overlaps and tensions with GDPR (legal basis, fairness, transparency).

### Why this matters
The EU AI Act introduces a **risk-based approach**: the higher the risk to fundamental rights or safety, the stricter the obligations. Understanding these categories is crucial for compliance planning. Many systems will be "minimal risk" (e.g., spam filters), while high-risk systems (e.g., hiring, medical, credit) face heavier obligations.

> Educational material only. Always verify requirements against the latest consolidated text, guidance, and delegated acts.

## Risk ladder
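- **Prohibited:** subliminal, manipulative, or deceptive techniques causing significant harm; exploitation of vulnerabilities (age, disability, social or economic situation); social scoring (by public or private actors); emotion recognition in workplaces and educational institutions (except for medical or safety reasons).
- **High-risk:** safety components of regulated products (e.g., medical devices), biometrics, critical infrastructure, employment/education, essential private/public services (e.g., credit scoring), law enforcement (under conditions), migration/asylum/border control, administration of justice.
- **Limited-risk:** systems with transparency duties (e.g., chatbots must disclose that they are AI; deepfakes must be labelled as such).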
- **Minimal-risk:** most other use cases; follow voluntary codes and good practice.
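
The ladder is ordered: when a system matches several rungs, the strictest one governs. A minimal sketch of that precedence rule, assuming a hypothetical hand-curated mapping from use-case tags to tiers (the tag names are illustrative labels, not legal categories):

```python
# Tiers from strictest to least strict; the strictest matching tier wins.
TIER_ORDER = ["prohibited", "high", "limited", "minimal"]

# Hypothetical, hand-curated tag -> tier mapping (illustrative only; a real
# assessment reads Art 5, Art 6 with Annex III, and Art 50 against the actual use).
TAG_TIERS = {
    "social_scoring": "prohibited",
    "emotion_recognition_workplace": "prohibited",
    "candidate_ranking": "high",
    "exam_proctoring": "high",
    "chatbot": "limited",
    "deepfake_generation": "limited",
    "spam_filter": "minimal",
}

def classify(tags):
    """Return the strictest tier among the tags, or flag the case for manual review."""
    if not tags or any(tag not in TAG_TIERS for tag in tags):
        # Never default silently to "minimal": unknown uses need a human assessment.
        return "needs manual review"
    return min((TAG_TIERS[tag] for tag in tags), key=TIER_ORDER.index)

print(classify(["chatbot", "candidate_ranking"]))  # high
print(classify(["spam_filter"]))                    # minimal
print(classify(["video_game_npc"]))                 # needs manual review
```

The tier is only a starting point: obligations can stack, so a high-risk recruitment chatbot would still carry its chatbot disclosure duty on top of the high-risk requirements.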

## Quick reference
- **Risk levels:**
    - **Unacceptable risk:** banned (e.g., social scoring, subliminal manipulation).
    - **High-risk:** strict obligations (e.g., medical devices, recruitment, critical infrastructure).
    - **Limited risk:** transparency obligations (e.g., chatbots disclosed as AI).
    - **Minimal risk:** no new obligations for most use cases (e.g., spam filters, video games).
- **Key roles:** Provider (developer), Deployer (user), Importer, Distributor.
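- **Penalties:** fines depend on the infringement and are capped at the higher of a fixed amount or a share of worldwide annual turnover: up to €35M or 7% for prohibited practices, and up to €15M or 3% for most other obligations (for SMEs, the lower of the two applies).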
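
The "whichever is higher" cap is simple arithmetic. A minimal sketch for one penalty tier, assuming turnover is the worldwide annual figure in euros; the SME branch encodes our reading of Art 99(6) ("whichever is lower") and should be verified against the text:

```python
def max_fine_eur(turnover_eur, fixed_cap_eur, pct_cap, is_sme=False):
    """Upper bound of an administrative fine for one penalty tier.

    pct_cap is a whole percentage (7 means 7%). Non-SMEs: whichever is higher.
    SMEs: whichever is lower (our reading of Art 99(6); verify before relying on it).
    """
    turnover_based = turnover_eur * pct_cap / 100
    if is_sme:
        return min(fixed_cap_eur, turnover_based)
    return max(fixed_cap_eur, turnover_based)

# Prohibited-practice tier: EUR 35M or 7% of worldwide annual turnover.
print(max_fine_eur(2_000_000_000, 35_000_000, 7))  # 140000000.0 (7% is higher)
print(max_fine_eur(100_000_000, 35_000_000, 7))    # 35000000 (fixed amount is higher)
```

These are ceilings, not expected amounts: authorities set the actual fine case by case.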

## High-risk obligations checklist
1.  **Risk Management System:** Continuous iterative process.
2.  **Data Governance:** Bias mitigation, representativeness.
3.  **Technical Documentation:** Full details for conformity assessment.
4.  **Record Keeping (Logging):** Traceability of system functioning.
5.  **Transparency:** Instructions for use, accuracy metrics.
6.  **Human Oversight:** Tools for human intervention.
7.  **Accuracy, Robustness, Cybersecurity:** Resilient to errors and attacks.

The code cells below show simplified, illustrative sketches of some of these obligations, not production compliance tooling.

## Main AI Act Articles (Reference)
- **Prohibited Practices:** Art 5.
- **High-Risk Classification:** Art 6 & Annex III.
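- **Risk Management System:** Art 9.
- **Data Governance:** Art 10 (training, validation, and testing data).
- **Technical Documentation:** Art 11 & Annex IV.
- **Record Keeping (Logging):** Art 12.
- **Transparency:** Art 13 (instructions for use); Art 50 (transparency duties for limited-risk systems such as chatbots).
- **Human Oversight:** Art 14.
- **Accuracy, Robustness, Cybersecurity:** Art 15.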
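
To turn the checklist into something trackable, one option is a small gap report that maps each checklist item to its article (Art 9 to Art 15, as we read them) and lists what is still open. A minimal sketch; the obligation keys and the `status` values are hypothetical:

```python
# Checklist item -> article (Art 9 to Art 15 of the AI Act, as we read them).
HIGH_RISK_OBLIGATIONS = {
    "risk_management": "Art 9",
    "data_governance": "Art 10",
    "technical_documentation": "Art 11",
    "record_keeping": "Art 12",
    "transparency": "Art 13",
    "human_oversight": "Art 14",
    "accuracy_robustness_cybersecurity": "Art 15",
}

def gap_report(status):
    """Return (obligation, article) pairs not yet marked as done in `status`."""
    return [
        (obligation, article)
        for obligation, article in HIGH_RISK_OBLIGATIONS.items()
        if not status.get(obligation, False)
    ]

# Hypothetical progress for the hiring system used in this notebook
status = {"data_governance": True, "record_keeping": True, "human_oversight": True}
for obligation, article in gap_report(status):
    print(f"TODO {article}: {obligation}")
```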

### Quick classification exercise
Describe a system below and classify it. Note why it is or is not high-risk, and what obligations apply.

In [2]:
scenario = {
    'name': 'AI system for candidate ranking in hiring',
    'provider': 'HR-Tech-Solutions Inc.',
    'deployer': 'Global Corp',
    'risk_level': 'high',
    'why': 'employment use case (Annex III)',
    'obligations': [
        'bias testing and representativeness checks (Art 10)',
        'logging and traceability (Art 12)',
        'human oversight on final decisions (Art 14)',
        'transparency notice to candidates (Art 13)',
        'cybersecurity robustness (Art 15)'
    ]
}
print("--- System Classification Card ---")
for k, v in scenario.items():
    if isinstance(v, list):
        print(f"{k.upper()}:")
        for item in v:
            print(f"  - {item}")
    else:
        print(f"{k.upper()}: {v}")

--- System Classification Card ---
NAME: AI system for candidate ranking in hiring
PROVIDER: HR-Tech-Solutions Inc.
DEPLOYER: Global Corp
RISK_LEVEL: high
WHY: employment use case (Annex III)
OBLIGATIONS:
  - bias testing and representativeness checks (Art 10)
  - logging and traceability (Art 12)
  - human oversight on final decisions (Art 14)
  - transparency notice to candidates (Art 13)
  - cybersecurity robustness (Art 15)


In [3]:
# Data Governance: check for representation imbalance (Art 10)
# Toy dataset for illustration only
training_data = [
    {'id': 1, 'gender': 'M', 'hired': True},
    {'id': 2, 'gender': 'M', 'hired': False},
    {'id': 3, 'gender': 'M', 'hired': True},
    {'id': 4, 'gender': 'F', 'hired': False},
    {'id': 5, 'gender': 'M', 'hired': True},
    {'id': 6, 'gender': 'F', 'hired': True},
    {'id': 7, 'gender': 'M', 'hired': False},
    {'id': 8, 'gender': 'NB', 'hired': False},
]

def check_representativeness(data, sensitive_attr, threshold=0.7):
    """Count records per group, print a text bar chart, and warn if one group exceeds `threshold`."""
    counts = {}
    for row in data:
        val = row.get(sensitive_attr, 'Unknown')
        counts[val] = counts.get(val, 0) + 1

    total = len(data)
    print(f"--- Data Governance Audit: {sensitive_attr} ---")
    print(f"Total Records: {total}")
    if total == 0:
        # Guard against division by zero on an empty dataset
        print("\n[WARNING] No records to audit.")
        return

    for k, v in counts.items():
        pct = v / total
        # ASCII bar: 20 characters correspond to 100%
        bar = "|" * int(pct * 20)
        print(f"  {k:<10} | {bar:<20} | {pct:.1%}")

    # Rough heuristic: flag if any single group exceeds the threshold share
    if max(counts.values()) / total > threshold:
        print(f"\n[WARNING] Significant imbalance detected. One group dominates >{threshold:.0%} of data.")
        print("   Action required: Collect more diverse data or apply re-sampling.")
    else:
        print("\n[OK] Balance looks acceptable for this small sample.")

check_representativeness(training_data, 'gender')

--- Data Governance Audit: gender ---
Total Records: 8
  M          | ||||||||||||         | 62.5%
  F          | |||||                | 25.0%
  NB         | ||                   | 12.5%

[OK] Balance looks acceptable for this small sample.


**What this code does:** Runs a simple representativeness check by counting how many records fall into each group (here: `gender`).
It prints a text bar chart and warns if one group makes up more than 70% of the dataset. This is a very rough heuristic for Article 10-style data governance checks: with only 8 records it cannot support real conclusions, and it says nothing about outcomes per group.
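
Counting group sizes shows *who* is in the data, not *how* each group fares. A complementary sketch compares positive-outcome (here: `hired`) rates per group and the ratio of the lowest to the highest rate. The 0.8 flag is the US "four-fifths" rule of thumb from employment-selection guidance, used here only as an illustrative threshold; the AI Act sets no such number:

```python
# Same toy records as the representativeness check, repeated so this cell runs on its own
training_data = [
    {'id': 1, 'gender': 'M', 'hired': True},
    {'id': 2, 'gender': 'M', 'hired': False},
    {'id': 3, 'gender': 'M', 'hired': True},
    {'id': 4, 'gender': 'F', 'hired': False},
    {'id': 5, 'gender': 'M', 'hired': True},
    {'id': 6, 'gender': 'F', 'hired': True},
    {'id': 7, 'gender': 'M', 'hired': False},
    {'id': 8, 'gender': 'NB', 'hired': False},
]

def selection_rates(data, group_attr, outcome_attr):
    """Share of records with a positive outcome, per group."""
    totals, positives = {}, {}
    for row in data:
        group = row.get(group_attr, 'Unknown')
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if row.get(outcome_attr) else 0)
    return {group: positives[group] / totals[group] for group in totals}

rates = selection_rates(training_data, 'gender', 'hired')
highest = max(rates.values())
# Lowest-to-highest ratio; NaN if no group has any positive outcome
ratio = min(rates.values()) / highest if highest > 0 else float('nan')

for group, rate in rates.items():
    print(f"  {group:<4} selection rate: {rate:.1%}")
print(f"Lowest/highest ratio: {ratio:.2f}" + ("  [FLAG] below 0.8" if ratio < 0.8 else ""))
```

With a single `NB` record, that group's rate is statistically meaningless; the point is the shape of the check, not the numbers.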

In [4]:
# Prohibited Practices Check
def check_prohibited(features: list) -> str:
    # Keyword -> Article 5(1) point, as numbered in the adopted Regulation (EU) 2024/1689
    prohibited_keywords = {
        'subliminal': 'Art 5(1)(a) - Subliminal techniques',
        'social_scoring': 'Art 5(1)(c) - Social scoring (public or private actors)',
        'biometric_categorization_sensitive': 'Art 5(1)(g) - Biometric categorization (race, politics, etc.)',
        'emotion_recognition_workplace': 'Art 5(1)(f) - Emotion recognition in workplace/education',
        'exploitation_vulnerable': 'Art 5(1)(b) - Exploitation of vulnerable groups'
    }
    
    print(f"Scanning features: {features}")
    violations = []
    for f in features:
        if f in prohibited_keywords:
            violations.append(prohibited_keywords[f])
            
    if violations:
        print("\n[RED FLAG] System contains PROHIBITED practices:")
        for v in violations:
            print(f"  - {v}")
        return "PROHIBITED"
    else:
        print("\n[PASS] No prohibited features detected based on keywords.")
        return "PERMITTED"

check_prohibited(['candidate_ranking', 'emotion_recognition_workplace', 'automated_interview'])

Scanning features: ['candidate_ranking', 'emotion_recognition_workplace', 'automated_interview']

[RED FLAG] System contains PROHIBITED practices:
  - Art 5(1)(f) - Emotion recognition in workplace/education


'PROHIBITED'

**What this code does:** Scans a list of system features for keywords that *might* indicate an Article 5 prohibited practice.
This is a simplified demo: real assessments depend on how the system works in context, not just feature names.
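
Whether `emotion_recognition_workplace` really applies depends on facts a keyword cannot capture. A minimal sketch of a context-based screen for Art 5(1)(f), assuming our reading that it covers systems inferring emotions from biometric data in workplaces or educational institutions, with an exception for medical or safety reasons. The `AIUseContext` fields are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class AIUseContext:
    """Hypothetical description of how a system is used (fields are illustrative)."""
    infers_emotions: bool       # infers emotions or intentions of people
    uses_biometric_data: bool   # e.g., facial expressions, voice
    setting: str                # "workplace", "education", or anything else
    purpose: str                # e.g., "hiring", "medical", "safety"

def screen_art_5_1_f(ctx):
    """Rough screening for Art 5(1)(f); the verdict still needs legal review."""
    if not (ctx.infers_emotions and ctx.uses_biometric_data):
        return "not emotion recognition: check other Art 5 points"
    if ctx.setting not in {"workplace", "education"}:
        return "not prohibited by 5(1)(f): check Annex III and Art 50(3)"
    if ctx.purpose in {"medical", "safety"}:
        return "possible medical/safety exception: legal review needed"
    return "likely prohibited under Art 5(1)(f)"

print(screen_art_5_1_f(AIUseContext(True, True, "workplace", "hiring")))
# likely prohibited under Art 5(1)(f)
print(screen_art_5_1_f(AIUseContext(True, True, "workplace", "safety")))
# possible medical/safety exception: legal review needed
```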

## Logging example
Record each decision (Art 12 record-keeping) so that outcomes can be traced during post-market monitoring and incident investigation.

In [5]:
# Logging example
def log_decision(user_id, input_summary, model_version, output, store):
    timestamp = datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')
    entry = {
        'timestamp': timestamp,
        'event_id': f"evt-{len(store)+1:04d}",
        'user_id': user_id,
        'input_summary': input_summary,
        'model_version': model_version,
        'output': output,
        'status': 'success'
    }
    store.append(entry)

audit_log = []
# Simulate a few decisions
log_decision('cand-123', {'cv_length': 2, 'keywords': 15}, 'v1.2.0', {'score': 0.71, 'rank': 'B'}, audit_log)
log_decision('cand-124', {'cv_length': 5, 'keywords': 40}, 'v1.2.0', {'score': 0.92, 'rank': 'A'}, audit_log)
log_decision('cand-125', {'cv_length': 1, 'keywords': 5}, 'v1.2.0', {'score': 0.33, 'rank': 'C'}, audit_log)

print(f"--- Audit Log (Total Entries: {len(audit_log)}) ---")
print(json.dumps(audit_log, indent=2))

--- Audit Log (Total Entries: 3) ---
[
  {
    "timestamp": "2026-01-19T08:56:06.936291Z",
    "event_id": "evt-0001",
    "user_id": "cand-123",
    "input_summary": {
      "cv_length": 2,
      "keywords": 15
    },
    "model_version": "v1.2.0",
    "output": {
      "score": 0.71,
      "rank": "B"
    },
    "status": "success"
  },
  {
    "timestamp": "2026-01-19T08:56:06.936315Z",
    "event_id": "evt-0002",
    "user_id": "cand-124",
    "input_summary": {
      "cv_length": 5,
      "keywords": 40
    },
    "model_version": "v1.2.0",
    "output": {
      "score": 0.92,
      "rank": "A"
    },
    "status": "success"
  },
  {
    "timestamp": "2026-01-19T08:56:06.936332Z",
    "event_id": "evt-0003",
    "user_id": "cand-125",
    "input_summary": {
      "cv_length": 1,
      "keywords": 5
    },
    "model_version": "v1.2.0",
    "output": {
      "score": 0.33,
      "rank": "C"
    },
    "status": "success"
  }
]
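
Logs are only useful as evidence if later edits can be detected. The AI Act does not prescribe a technique; one common option, shown here as a minimal sketch, is hash chaining: each entry stores a SHA-256 hash covering its own content and the previous entry's hash. `append_chained` and `verify_chain` are hypothetical helpers:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body):
    # Canonical JSON (sorted keys) so the same content always hashes the same way
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_chained(store, entry):
    """Append `entry` with a hash that also covers the previous entry's hash."""
    prev_hash = store[-1]["hash"] if store else GENESIS
    body = {**entry, "prev_hash": prev_hash}
    store.append({**body, "hash": _digest(body)})

def verify_chain(store):
    """Return False if any entry was edited, reordered, or removed (except at the tail)."""
    prev_hash = GENESIS
    for record in store:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body.get("prev_hash") != prev_hash or _digest(body) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True

chain = []
append_chained(chain, {"event_id": "evt-0001", "user_id": "cand-123", "score": 0.71})
append_chained(chain, {"event_id": "evt-0002", "user_id": "cand-124", "score": 0.92})
print(verify_chain(chain))   # True
chain[0]["score"] = 0.99     # simulate tampering with a stored decision
print(verify_chain(chain))   # False
```

Two design caveats: deleting the newest entries goes unnoticed unless the latest hash is stored elsewhere, and a retention purge breaks the chain at the front, so a real system would need to record a checkpoint hash when purging.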


In [6]:
# Write your notes here
aiact_analysis = {
    'classification': 'High-Risk / Prohibited / Limited?',
    'reasoning': '...',
    'key_obligations': [
        'Human Oversight (Art 14)',
        '...'
    ],
    'transparency_notice_needed': True
}
aiact_analysis

{'classification': 'High-Risk / Prohibited / Limited?',
 'reasoning': '...',
 'key_obligations': ['Human Oversight (Art 14)', '...'],
 'transparency_notice_needed': True}

**What this code does:** A template to capture your system classification and a draft compliance plan (what applies, why, and what you need to implement).

## Mini-case: Classify and comply
- **System:** A "Smart Proctoring" tool for universities that uses webcam video to detect "suspicious behavior" (gaze tracking, background noise) and flags students for potential exam termination.
- **Task:**
    1. Is this high-risk? (Check Annex III: education.)
    2. Is it prohibited? (Check Article 5: biometric categorization or emotion recognition?)
    3. What obligations apply? (Transparency, accuracy, human oversight.)

Document your reasoning (for example, in the `aiact_analysis` template above) and compare it with Solution 1 at the end of this notebook.

In [7]:
# Transparency: Instructions for Use
system_metadata = {
    'name': 'RecruitAI-v1',
    'provider': 'HR-Tech-Solutions Inc.',
    'purpose': 'Assist in ranking CVs based on keyword matching',
    'accuracy': '85% on test set B (demographically balanced)',
    'limitations': 'Not valid for executive roles or creative portfolios',
    'human_oversight': 'Human recruiter must verify all rejections before sending emails'
}

def generate_notice(meta):
    return f"""
    ================================================================
                       SYSTEM TRANSPARENCY NOTICE                    
    ================================================================
     SYSTEM:    {meta['name']:<43} 
     PROVIDER:  {meta['provider']:<43} 
    ----------------------------------------------------------------
     PURPOSE:                                                       
     {meta['purpose']:<62} 
    ----------------------------------------------------------------
     PERFORMANCE:                                                   
     {meta['accuracy']:<62} 
    ----------------------------------------------------------------
     [!] LIMITATIONS:                                                
     {meta['limitations']:<62} 
    ----------------------------------------------------------------
     [O] HUMAN OVERSIGHT:                                            
     {meta['human_oversight']:<62} 
    ================================================================
    """
print(generate_notice(system_metadata))


    ================================================================
                       SYSTEM TRANSPARENCY NOTICE                    
    ================================================================
     SYSTEM:    RecruitAI-v1                                
     PROVIDER:  HR-Tech-Solutions Inc.                      
    ----------------------------------------------------------------
     PURPOSE:                                                       
     Assist in ranking CVs based on keyword matching                
    ----------------------------------------------------------------
     PERFORMANCE:                                                   
     85% on test set B (demographically balanced)                   
    ----------------------------------------------------------------
     [!] LIMITATIONS:                                                
     Not valid for executive roles or creative portfolios           
    ----------------------------------------------------------------
     [O] HUMAN OVERSIGHT:                                            
     Human recruiter must verify all rejections before sending emails
    ================================================================

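**What this code does:** Generates a simplified Article 13 "instructions for use" summary, i.e. the information a provider must give to deployers. Informing the affected candidates themselves is a separate, deployer-side duty (e.g., Article 26(11) for Annex III systems that make or assist decisions about people).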
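
The fixed-width padding above does not truncate or wrap, so the human-oversight text (64 characters) overflows the 62-character field. A minimal variant that wraps each value with the standard-library `textwrap.wrap`; `generate_wrapped_notice` is a hypothetical name and the layout is simplified:

```python
import textwrap

# Same fields as system_metadata above, repeated so this cell runs on its own
meta = {
    'name': 'RecruitAI-v1',
    'provider': 'HR-Tech-Solutions Inc.',
    'purpose': 'Assist in ranking CVs based on keyword matching',
    'accuracy': '85% on test set B (demographically balanced)',
    'limitations': 'Not valid for executive roles or creative portfolios',
    'human_oversight': 'Human recruiter must verify all rejections before sending emails',
}

def generate_wrapped_notice(meta, width=62):
    """Render the notice so that no line is longer than `width` characters."""
    sections = [
        ("SYSTEM", meta["name"]),
        ("PROVIDER", meta["provider"]),
        ("PURPOSE", meta["purpose"]),
        ("PERFORMANCE", meta["accuracy"]),
        ("[!] LIMITATIONS", meta["limitations"]),
        ("[O] HUMAN OVERSIGHT", meta["human_oversight"]),
    ]
    lines = ["=" * width, "SYSTEM TRANSPARENCY NOTICE".center(width), "=" * width]
    for label, value in sections:
        lines.append(f"{label}:")
        # textwrap.wrap splits on whitespace; [""] keeps a line for empty values
        lines.extend(textwrap.wrap(value, width=width) or [""])
        lines.append("-" * width)
    lines[-1] = "=" * width  # close the box instead of ending on a separator
    return "\n".join(lines)

print(generate_wrapped_notice(meta))
```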

## Exercises
1) Replace the scenario with your own and classify it.
2) Extend `log_decision` with a minimal retention policy (e.g., drop entries older than N days).
3) Draft a one-page technical documentation outline: purpose, data, models, metrics, foreseeable risks, contact point.
4) Note how GDPR applies: lawful basis, transparency, data subject rights, and if a DPIA is needed.

## Solutions

The following cells contain example solutions for the exercises above.

### Solution 1: Mini-case classification

**System:** "Smart Proctoring" tool (webcam analysis for suspicious behavior).

1.  **Classification:** **High-risk**.
    *   **Reasoning:** It falls under Annex III, point 3 (education and vocational training), which explicitly lists systems used for monitoring and detecting prohibited behaviour of students during tests (point 3(d)).
2.  **Prohibited?**
    *   **Check:** Does it use "emotion recognition"? If it infers emotional states (stress, anxiety) to determine cheating, it may be prohibited in education under Article 5(1)(f), which exempts only medical or safety uses. If it only tracks gaze, movement, or noise, it is likely high-risk but not prohibited.
3.  **Obligations:**
    *   **Transparency:** Students must know they are being monitored and how the system flags behavior.
    *   **Accuracy:** Validate to reduce false positives (e.g., looking away to think flagged as cheating).
    *   **Human oversight:** The system should *flag* for review, not automatically terminate the exam. A human should make the final decision.
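
The "flag, don't terminate" rule can be enforced in code rather than left to policy. A minimal sketch, assuming hypothetical `ProctoringFlag`, `review_flag`, and `may_terminate` names: the model only produces flags, and termination is possible only after a named human records that decision:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProctoringFlag:
    """Hypothetical record of one flag raised by the proctoring system."""
    student_id: str
    reason: str
    model_score: float
    reviewer: Optional[str] = None
    decision: Optional[str] = None  # set only through review_flag()

ALLOWED_DECISIONS = {"terminate_exam", "dismiss_flag"}

def review_flag(flag, reviewer, decision):
    """Record a named human reviewer's decision on a flag."""
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    flag.reviewer = reviewer
    flag.decision = decision
    return flag

def may_terminate(flag):
    """Termination requires a human decision; a high model score alone is never enough."""
    return flag.reviewer is not None and flag.decision == "terminate_exam"

flag = ProctoringFlag("stu-042", "gaze off-screen for 40 s", model_score=0.97)
print(may_terminate(flag))  # False: no human review yet
review_flag(flag, "invigilator-7", "dismiss_flag")
print(may_terminate(flag))  # False: the reviewer dismissed the flag
```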

### Solution 2: Logging with a retention policy

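We extend the log function to remove entries older than a defined retention period. The 30-day window in the test is for demonstration only: as we read Articles 19(1) and 26(6), logs of high-risk systems must be kept for at least six months unless other Union or national law provides otherwise, while GDPR storage limitation (Article 5(1)(e)) limits how long the personal data in those logs may be kept.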

In [8]:
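def log_decision_with_retention(user_id, input_summary, model_version, output, store, retention_days=183):
    # Default of ~6 months: assumed minimum for high-risk logs (our reading of Art 19(1) / 26(6))
    # 1. Add new entry (same field names as log_decision, including model_version)
    timestamp = datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')
    entry = {
        'timestamp': timestamp,
        'user_id': user_id,
        'input_summary': input_summary,
        'model_version': model_version,
        'output': output
    }
    store.append(entry)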
    
    # 2. Apply retention (cleanup)
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    
    # Filter list to keep only recent logs
    # In a real DB, this would be a DELETE query
    original_count = len(store)
    store[:] = [
        log for log in store
        if datetime.fromisoformat(log['timestamp'].replace('Z', '+00:00')) > cutoff
    ]
    removed = original_count - len(store)
    
    if removed > 0:
        print(f"[MAINTENANCE] Removed {removed} old log entries.")

# Test it
audit_store = []

# Add an old entry manually
old_timestamp = (datetime.now(timezone.utc) - timedelta(days=40)).isoformat().replace('+00:00', 'Z')
audit_store.append({'timestamp': old_timestamp, 'data': 'old'})

log_decision_with_retention('user-new', 'test', 'v1', 'result', audit_store, retention_days=30)  # shortened window for the demo only
print(f"Current log size: {len(audit_store)}")

[MAINTENANCE] Removed 1 old log entries.
Current log size: 1
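
Purging on every write scans the whole log each time and makes the cutoff depend on the wall clock, which is awkward to test. A minimal alternative sketch: a separate `purge_logs` helper (hypothetical name) with an injectable `now`, which also refuses windows shorter than an assumed six-month floor:

```python
from datetime import datetime, timedelta, timezone

# Assumption: ~6 months, our reading of the minimum in Art 19(1) / Art 26(6).
MIN_RETENTION_DAYS = 183

def purge_logs(store, retention_days, now=None):
    """Delete entries older than `retention_days`; return how many were removed."""
    if retention_days < MIN_RETENTION_DAYS:
        raise ValueError(
            f"retention_days={retention_days} is below the assumed minimum of {MIN_RETENTION_DAYS}"
        )
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    before = len(store)
    store[:] = [
        entry for entry in store
        if datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00")) > cutoff
    ]
    return before - len(store)

now = datetime(2026, 1, 1, tzinfo=timezone.utc)  # fixed clock for a reproducible demo
store = [
    {"timestamp": (now - timedelta(days=200)).isoformat(), "event_id": "evt-old"},
    {"timestamp": (now - timedelta(days=10)).isoformat(), "event_id": "evt-new"},
]
print(purge_logs(store, retention_days=183, now=now))  # 1
print([e["event_id"] for e in store])                  # ['evt-new']
```

In a database, the same logic would typically run as a scheduled `DELETE` job rather than inside the write path.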


### Solution 3: Technical documentation outline

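For a high-risk system, Article 11 requires technical documentation drawn up before the system is placed on the market or put into service, and kept up to date; Annex IV lists its required contents. A minimal outline includes: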

1.  **System description:**
    *   Intended purpose (what does it do?).
    *   Intended users (who uses it?).
    *   Hardware/software requirements.
2.  **Development process:**
    *   **Data:** Source of training data, bias checks performed, data lineage.
    *   **Model:** Architecture choice, training methodology, validation metrics (accuracy, F1-score).
3.  **Risk management:**
    *   List of foreseeable risks (e.g., bias against non-native speakers).
    *   Mitigation measures (e.g., human-in-the-loop review).
4.  **Monitoring:**
    *   How the system is logged.
    *   Post-market monitoring plan.
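
A draft outline like this can be checked for completeness before review. A minimal sketch, assuming hypothetical section and field names that mirror the outline above (they are not Annex IV's wording):

```python
# Hypothetical section/field names mirroring the outline above (not Annex IV's wording).
REQUIRED_FIELDS = {
    "system_description": ["intended_purpose", "intended_users", "hw_sw_requirements"],
    "development_process": ["data", "model"],
    "risk_management": ["foreseeable_risks", "mitigations"],
    "monitoring": ["logging", "post_market_plan"],
}

def missing_fields(doc):
    """List 'section.field' paths that are absent or empty in `doc`."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        content = doc.get(section, {})
        for field in fields:
            if not content.get(field):
                missing.append(f"{section}.{field}")
    return missing

draft = {
    "system_description": {
        "intended_purpose": "Rank CVs for recruiter review",
        "intended_users": "HR recruiters at deployer organisations",
    },
    "risk_management": {
        "foreseeable_risks": ["bias against non-native speakers"],
        "mitigations": ["human-in-the-loop review"],
    },
}
for path in missing_fields(draft):
    print("MISSING:", path)
```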

---

### Solution 4: GDPR overlaps

The AI Act and GDPR complement each other:

*   **Lawful basis (GDPR Article 6):** You still need a lawful basis (e.g., consent or legitimate interests) to process the *personal data* used to train or run the AI.
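*   **Automated decision-making (GDPR Article 22):** People have the right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects; where such decisions are permitted, safeguards include the right to obtain human intervention. The AI Act reinforces this through its human oversight requirement (Article 14).
*   **Special category data (AI Act Article 10(5) vs GDPR Article 9):** Bias detection may require sensitive data such as ethnicity. The AI Act exceptionally allows providers to process such data where strictly necessary for bias detection and correction, subject to safeguards, which must still be reconciled with GDPR's general prohibition on processing special category data.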
*   **DPIA (GDPR Article 35):** A high-risk AI system will often require a data protection impact assessment (DPIA) because it can involve systematic and extensive evaluation of people.
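
A first screening for the DPIA question can follow the Article 29 Working Party DPIA guidelines (WP248 rev.01, endorsed by the EDPB), which list nine criteria and suggest that processing meeting two or more of them will usually require a DPIA. A minimal sketch of that rule of thumb; the short criterion keys are our own labels, and a single criterion can still be enough in some cases:

```python
# The nine WP248 criteria; the short keys are our own labels, not official names.
WP248_CRITERIA = {
    "evaluation_or_scoring",
    "automated_decision_with_significant_effect",
    "systematic_monitoring",
    "sensitive_or_highly_personal_data",
    "large_scale_processing",
    "matching_or_combining_datasets",
    "vulnerable_data_subjects",
    "innovative_technology",
    "prevents_exercising_rights_or_service",
}

def dpia_likely_required(criteria_met):
    """Rule of thumb from WP248: two or more criteria usually means a DPIA is needed."""
    met = set(criteria_met)
    unknown = met - WP248_CRITERIA
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return len(met) >= 2

# Our own (debatable) reading of the Smart Proctoring mini-case
proctoring = {"systematic_monitoring", "evaluation_or_scoring", "innovative_technology"}
print(dpia_likely_required(proctoring))  # True
```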