# LLM Evaluation on Anonymized Data

## What & Why
This notebook evaluates **LLM privacy risks** by testing whether LLMs can:
1. **Impute missing/generalized values** - Predict original values that were anonymized (e.g., `[30-39]` → `35`, `jung` ("young") → specific age)
2. **Predict target variables** - Classify outcomes (income, diabetes) from privacy-protected features

**Why this matters**: High accuracy on either task indicates that an LLM can recover sensitive information from anonymized data, which means the anonymization protects less than intended.
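To make the imputation task concrete, here is a minimal sketch of how an anonymized cell could be labeled by value type and how a numeric guess could be checked against a generalized range. The function names `classify_value` and `within_interval` are hypothetical, and the bracketed range format is assumed from the `[30-39]` example above; the notebook's own script may handle this differently.

```python
import re

# Matches bracketed numeric ranges such as "[30-39]".
# Assumption: ranges always use this exact format.
INTERVAL_RE = re.compile(r"^\[(\d+)-(\d+)\]$")


def classify_value(anon_value: str, orig_value: str) -> str:
    """Label an anonymized cell as 'missing', 'interval', 'generalized', or 'unchanged'."""
    anon_value = anon_value.strip()
    if anon_value == "?":
        return "missing"
    if INTERVAL_RE.match(anon_value):
        return "interval"
    if anon_value != orig_value.strip():
        # Any other difference is treated as a categorical generalization.
        return "generalized"
    return "unchanged"


def within_interval(anon_value: str, guess: str) -> bool:
    """Return True if a numeric guess lies inside a bracketed range."""
    match = INTERVAL_RE.match(anon_value.strip())
    if match is None:
        return False
    try:
        number = float(guess)
    except ValueError:
        return False
    low, high = int(match.group(1)), int(match.group(2))
    return low <= number <= high


print(classify_value("[30-39]", "35"))   # interval
print(classify_value("jung", "23"))      # generalized
print(classify_value("?", "Private"))    # missing
print(within_interval("[30-39]", "35"))  # True
```

A guess that lands inside the range but misses the exact value is still a weaker form of leakage, so it can be useful to track exact matches and in-range matches separately.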

## How
1. **Load two datasets**: Anonymized data (with `?` and generalized values) + Original data (ground truth)
2. **Define task-specific prompts**: Separate configurations for imputation and prediction with embedded system prompts
3. **Call LLM API**: Use a parameterized function that queries Mistral-Small-24B hosted at kiara.sc.uni-leipzig.de
4. **Compare predictions vs ground truth**: Calculate accuracy, analyze by value type and column
5. **Scale evaluation**: Start with 10-record samples, then run full 500-record evaluations
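The comparison in step 4 can be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual scoring code: the record layout (`column`, `value_type`, `predicted`, `truth`) and the function names are hypothetical, and matching is assumed to be case-insensitive after trimming whitespace.

```python
from collections import defaultdict


def _matches(record) -> bool:
    # Assumption: a prediction counts as correct when it equals the ground
    # truth after trimming whitespace and ignoring case.
    return str(record["predicted"]).strip().lower() == str(record["truth"]).strip().lower()


def overall_accuracy(records):
    """Return the fraction of correct predictions, or None if there are none."""
    records = list(records)
    if not records:
        return None  # nothing to evaluate (empty results)
    return sum(_matches(r) for r in records) / len(records)


def accuracy_by_group(records):
    """Return accuracy keyed by (column, value_type)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        key = (r["column"], r["value_type"])
        totals[key] += 1
        if _matches(r):
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


sample = [
    {"column": "age", "value_type": "interval", "predicted": "35", "truth": "35"},
    {"column": "age", "value_type": "interval", "predicted": "31", "truth": "35"},
    {"column": "workclass", "value_type": "missing", "predicted": "private", "truth": "Private"},
]
print(overall_accuracy(sample))
print(accuracy_by_group(sample))
```

Returning `None` for an empty result set keeps a run that matched no records from being reported as 0% accuracy.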

## Datasets
- **Adult Dataset**: Income prediction (<=50K vs >50K) from census data - 14 features including age, education, occupation
- **Diabetes Dataset**: Diabetes risk (yes/no) from health survey data - 21 features including BMI, age, physical health

## 1. Setup and Configuration

In [1]:
!pip install aiohttp python-dotenv



Start with a small sample to test the pipeline, then scale up to larger evaluations.

⚠️ **Warning**: A full evaluation can take a long time and makes many API calls!
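Because a run issues many API calls, requests are typically grouped into batches and sent under a concurrency cap; the run configuration in this notebook reports a batch size of 10 and a concurrency of 20. How `llm_evaluation.py` actually schedules its requests is not shown here, so the following is only a sketch of one common pattern using `asyncio`. The names `make_batches`, `run_batches`, and `fake_llm_call` are hypothetical, and `fake_llm_call` is a stand-in that makes no network request.

```python
import asyncio


def make_batches(records, batch_size):
    """Split a list of records into consecutive batches of at most batch_size."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


async def fake_llm_call(batch):
    # Placeholder for a real API request (hypothetical); yields control once
    # and returns one dummy prediction per record.
    await asyncio.sleep(0)
    return [f"prediction-for-{record_id}" for record_id in batch]


async def run_batches(records, batch_size=10, concurrency=20):
    """Process all batches while allowing at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(batch):
        async with semaphore:
            return await fake_llm_call(batch)

    batches = make_batches(records, batch_size)
    # gather preserves the order of the batches in its result list.
    results = await asyncio.gather(*(worker(b) for b in batches))
    return [prediction for batch_result in results for prediction in batch_result]


predictions = asyncio.run(run_batches(list(range(25)), batch_size=10, concurrency=2))
print(len(predictions))  # 25
```

Lowering `concurrency` is the simplest way to reduce load on the API during a first test run.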

In [2]:
# Optional flags for selecting which records to evaluate:
# --n-samples 100
# --record-ids 32347,456
# --ids-file llm_evaluation/33-33-34_results/adult_train_imputation_missing_ids.json

In [3]:
!python llm_evaluation.py --percentage 33-33-34 --datasets Adult --input-dir ./data --batch-size 10 --record-ids 32347

LLM PRIVACY RISK EVALUATION
Configuration:
  Percentage: 33-33-34
  Datasets: Adult
  Sample size: 100
  Concurrency: 20
  Batch Size: 10
  Partition: None (full dataset)
  Results directory: llm_evaluation/33-33-34_results
  API: https://kiara.sc.uni-leipzig.de
  Model: vllm-mistral-small-24b-instruct-2501

Input files to be used:
- Dataset: Adult
    anon_train: ./data/adult/generalization/33-33-34/adult_train.csv
    anon_test: ./data/adult/generalization/33-33-34/adult_test.csv
    orig_train: ./data/adult/adult_train.csv
    orig_test: ./data/adult/adult_test.csv
Processing specific record ids: found 0 matching rows for provided ids
Created 0 batches from 0 records (batch_size=10)

⏱️  Time elapsed: 0.00 minutes

ADULT IMPUTATION (train) EVALUATION RESULTS
Total predictions: 0
⚠️  No predictions to evaluate (empty results)

✓ Results saved to llm_evaluation/33-33-34_results/adult_train_imputation_results_ids_replay.csv
Processing specific record ids: found 1 matching rows for prov