# LLM Evaluation on Anonymized Data
## Notice: This notebook is for testing only. For full runs, use `llm_runs.sh` or `llm.job` from the project root.

## What & Why
This notebook evaluates whether LLMs can:
1. **Impute missing/generalized values** - Predict original values that were anonymized (e.g., `[30-39]` → `35`, `young` → specific age)
2. **Predict target variables** - Classify outcomes (label inference) from privacy-protected features
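To make the two kinds of anonymized cells concrete, here is a minimal sketch of how a cell could be labelled as missing, generalized, or unchanged by comparing it with the ground truth. The function names, the `"original"` label, and the interval pattern are assumptions for illustration; they are not taken from `llm_evaluation.py`.

```python
import re

# Assumed pattern for interval-style generalizations such as "[30-39]".
INTERVAL = re.compile(r"^\[\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*\]$")


def classify_value(anon_value, orig_value):
    """Label one anonymized cell by comparing it with the original cell.

    Hypothetical helper: "?" counts as missing, any other difference
    from the original counts as generalized.
    """
    anon = str(anon_value).strip()
    if anon == "?":
        return "missing"
    if anon != str(orig_value).strip():
        return "generalized"
    return "original"


def interval_contains(anon_value, orig_value):
    """Return True if anon_value is an interval like "[30-39]" that
    contains the numeric original value; False for anything else."""
    match = INTERVAL.match(str(anon_value).strip())
    if not match:
        return False
    low, high = float(match.group(1)), float(match.group(2))
    return low <= float(orig_value) <= high


print(classify_value("?", "35"))        # a missing cell
print(classify_value("[30-39]", "35"))  # a generalized cell
print(interval_contains("[30-39]", 35))
```

A categorical generalization such as `young` has no numeric bounds, so `interval_contains` returns `False` for it; checking those would need the generalization hierarchy used during anonymization.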

## How it works
1. **Load two datasets**: Anonymized data (with `?` and generalized values) + Original data (ground truth)
2. **Define task-specific prompts**: Separate configurations for imputation and prediction with embedded system prompts
3. **Call LLM API**: A parameterized function that reads the LLM API URL and model name from `.env_llm` (in the project root)
4. **Compare predictions vs ground truth**: Calculate accuracy, analyze by value type and column
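The comparison in step 4 can be sketched as follows. This is an illustrative outline under assumptions, not the project's implementation: the record layout (`column`, `value_type`, `predicted`, `truth`), the function names, and the exact-match rule after whitespace stripping are all hypothetical, and the toy records are made up.

```python
from collections import defaultdict


def accuracy_report(records):
    """Count (correct, total) overall, per value type, and per column.

    Assumes each record is a dict with the hypothetical keys
    "column", "value_type", "predicted", and "truth".
    """
    overall = [0, 0]  # [correct, total]
    by_type = defaultdict(lambda: [0, 0])
    by_column = defaultdict(lambda: [0, 0])
    for rec in records:
        # Assumed scoring rule: exact string match after stripping whitespace.
        hit = str(rec["predicted"]).strip() == str(rec["truth"]).strip()
        buckets = (overall, by_type[rec["value_type"]], by_column[rec["column"]])
        for bucket in buckets:
            bucket[0] += int(hit)
            bucket[1] += 1
    return {
        "overall": tuple(overall),
        "by_type": {k: tuple(v) for k, v in by_type.items()},
        "by_column": {k: tuple(v) for k, v in by_column.items()},
    }


def fmt(correct, total):
    """Format a (correct, total) pair as a percentage with the raw counts."""
    if total == 0:
        return "n/a (0/0)"
    return f"{100 * correct / total:.2f}% ({correct}/{total})"


# Made-up records, only to show the call shape.
toy_records = [
    {"column": "age", "value_type": "generalized", "predicted": "35", "truth": "35"},
    {"column": "age", "value_type": "missing", "predicted": "40", "truth": "41"},
    {"column": "education", "value_type": "missing", "predicted": "Bachelors", "truth": "Bachelors"},
]
report = accuracy_report(toy_records)
print("Accuracy:", fmt(*report["overall"]))
for value_type, counts in sorted(report["by_type"].items()):
    print(f"  {value_type}: {fmt(*counts)}")
```

Exact matching is a strict choice for numeric columns such as `fnlwgt`; a tolerance or interval check may be more appropriate there, depending on how the evaluation defines a correct imputation.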

## 1. Setup and Configuration

In [1]:
!pip install aiohttp python-dotenv



Start with a small sample to test, then scale up to larger evaluations.

⚠️ **Warning**: This may take significant time and make many API calls!

In [None]:
# some useful options
# limit samples: --n-samples 100 
# list of specific ids: --record-ids 32347,456
# file with ids: --ids-file llm_evaluation/33-33-34_results/adult_train_imputation_missing_ids.json

In [None]:
!python llm_evaluation.py --percentage 33-33-34 --datasets Adult --input-dir ../data --batch-size 10 --n-samples 2

LLM PRIVACY RISK EVALUATION
Configuration:
  Percentage: 33-33-34
  Datasets: Adult
  Sample size: 2
  Concurrency: 20
  Batch Size: 2
  Partition: None (full dataset)
  Results directory: ../llm_evaluation/33-33-34_results
  API: https://kiara.sc.uni-leipzig.de
  Model: vllm-mistral-small-24b-instruct-2501

Input files to be used:
- Dataset: Adult
    anon_train: ../data/adult/generalization/33-33-34/adult_train.csv
    anon_test: ../data/adult/generalization/33-33-34/adult_test.csv
    orig_train: ../data/adult/adult_train.csv
    orig_test: ../data/adult/adult_test.csv
Created 1 batches from 2 records (batch_size=2)

⏱️  Time elapsed: 0.06 minutes

ADULT IMPUTATION (train) EVALUATION RESULTS
Total predictions: 13
Correct predictions: 4
Accuracy: 30.77%

By value type:
  generalized: 20.00% (1/5)
  missing: 37.50% (3/8)

By column:
  capital-gain: 100.00% (1/1)
  education: 0.00% (0/1)
  fnlwgt: 0.00% (0/1)
  hours-per-week: 100.00% (2/2)
  marital-status: 0.00% (0/1)
  occupation: 0