# Getting Started with PyRevealed

This notebook introduces the core concepts and API of PyRevealed for revealed preference analysis.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Install if needed: pip install pyrevealed
from pyrevealed import (
    ConsumerSession,
    check_garp,
    compute_aei,
    compute_mpi,
    recover_utility,
    get_integrity_score,
)

## 1. Creating a ConsumerSession

The `ConsumerSession` is the fundamental data structure. It holds:
- **Prices**: T x N matrix (T observations, N goods)
- **Quantities**: T x N matrix of chosen bundles

In [None]:
# Example: Consumer faces different prices over 3 shopping trips
# 2 goods: Apples (A) and Bananas (B)

prices = np.array([
    [1.0, 2.0],  # Trip 0: Apples=$1, Bananas=$2
    [2.0, 1.0],  # Trip 1: Apples=$2, Bananas=$1
    [1.5, 1.5],  # Trip 2: Equal prices
])

quantities = np.array([
    [4.0, 1.0],  # Trip 0: Bought 4 Apples, 1 Banana
    [1.0, 4.0],  # Trip 1: Bought 1 Apple, 4 Bananas
    [2.0, 2.0],  # Trip 2: Bought equal amounts
])

session = ConsumerSession(prices=prices, quantities=quantities)

print(f"Number of observations: {session.num_observations}")
print(f"Number of goods: {session.num_goods}")
print(f"\nExpenditure at each trip: {session.own_expenditures}")
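
To see where these expenditure figures come from, here is a minimal plain-Python sketch that runs without PyRevealed. It assumes `own_expenditures` is the per-trip dot product of prices and quantities, which the attribute name suggests but this notebook does not confirm. The helper `cross_expenditure` is a hypothetical name introduced only for illustration.

```python
# Same data as above, as plain lists so the sketch has no dependencies.
prices = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
quantities = [[4.0, 1.0], [1.0, 4.0], [2.0, 2.0]]


def cross_expenditure(prices, quantities):
    """Return E where E[i][j] is the cost of bundle j at the prices of trip i."""
    return [
        [sum(p * q for p, q in zip(p_i, x_j)) for x_j in quantities]
        for p_i in prices
    ]


E = cross_expenditure(prices, quantities)
own = [E[t][t] for t in range(len(E))]  # diagonal: what was actually spent

for row in E:
    print(row)
print("Own expenditures:", own)
```

Every revealed preference test in the rest of this notebook can be phrased in terms of this matrix: bundle `j` was affordable on trip `i` whenever `E[i][j] <= E[i][i]`.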

## 2. Checking GARP Consistency

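GARP (Generalized Axiom of Revealed Preference) tests whether choices are consistent with utility maximization: if bundle A is revealed preferred to bundle B, directly or through a chain of choices, then B must never be chosen when A was strictly cheaper.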

In [None]:
result = check_garp(session)

print(f"Is consistent (satisfies GARP): {result.is_consistent}")
print(f"Number of violations: {result.num_violations}")
print(f"Computation time: {result.computation_time_ms:.2f} ms")

### Understanding the Logic

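This data is **consistent** because the revealed preferences never form a cycle:
- Every trip cost $6. The Trip 0 and Trip 1 bundles would each have cost $9 at the other trip's prices and $7.50 at Trip 2's prices, so neither was ever affordable on another trip
- The Trip 2 bundle cost exactly $6 at Trip 0 and Trip 1 prices, so it was affordable on both trips and passed over; it is never revealed preferred to anything else
- Intuitively, the consumer shifted toward whichever good was cheaper, which is rational behavior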
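
The consistency claim can also be checked from first principles. The following is a minimal sketch of a textbook GARP test (direct revealed preference, transitive closure, then a search for strict reversals). It is not PyRevealed's implementation, and `garp_violations` is a hypothetical helper name; a production version would normally compare with a small numerical tolerance rather than exact equality.

```python
def garp_violations(prices, quantities):
    """Return pairs (i, j) where bundle i is revealed preferred to bundle j
    (possibly through a chain) although j was chosen when i was strictly cheaper."""
    n = len(prices)
    # cost[i][j]: what bundle j would have cost at the prices of observation i
    cost = [
        [sum(p * q for p, q in zip(prices[i], quantities[j])) for j in range(n)]
        for i in range(n)
    ]
    # Direct revealed preference: j was affordable when i was chosen
    R = [[cost[i][j] <= cost[i][i] for j in range(n)] for i in range(n)]
    # Transitive closure (Warshall's algorithm)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    # Violation: i is revealed preferred to j, yet j is strictly preferred to i
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if R[i][j] and cost[j][i] < cost[j][j]
    ]


prices = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
quantities = [[4.0, 1.0], [1.0, 4.0], [2.0, 2.0]]
print(garp_violations(prices, quantities))  # empty list: no violations
```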

## 3. Detecting Violations

Let's create inconsistent data to see how violations are detected.

In [None]:
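# Inconsistent data: on each trip the consumer buys the bundle that costs more,
# even though the other trip's bundle was strictly cheaper at those prices.
# (Two different equal-cost bundles at identical prices would violate WARP/SARP but not GARP.)
inconsistent_prices = np.array([
    [1.0, 2.0],  # Trip 0: Apples=$1, Bananas=$2
    [2.0, 1.0],  # Trip 1: Apples=$2, Bananas=$1
])

inconsistent_quantities = np.array([
    [1.0, 4.0],  # Trip 0: spent $9, although (4, 1) would have cost only $6
    [4.0, 1.0],  # Trip 1: spent $9, although (1, 4) would have cost only $6
])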

bad_session = ConsumerSession(prices=inconsistent_prices, quantities=inconsistent_quantities)
bad_result = check_garp(bad_session)

print(f"Is consistent: {bad_result.is_consistent}")
print(f"Violations found: {bad_result.violations}")

## 4. Computing the Afriat Efficiency Index (AEI)

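AEI measures how close the behavior is to consistency. It is the largest factor e ≤ 1 such that, when every budget is shrunk to e times the actual expenditure, the remaining revealed preferences satisfy GARP:
- **AEI = 1.0**: Perfectly consistent
- **AEI < 1.0**: Some inconsistency (lower = larger violations)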

In [None]:
# Consistent session
aei_good = compute_aei(session)
print(f"Consistent data AEI: {aei_good.efficiency_index:.4f}")

# Inconsistent session
aei_bad = compute_aei(bad_session)
print(f"Inconsistent data AEI: {aei_bad.efficiency_index:.4f}")
print(f"Waste fraction: {aei_bad.waste_fraction:.2%}")
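
To make the index less of a black box, here is a minimal sketch that approximates it by bisection on the budget-scaling factor. This is one common way to compute it, not necessarily what `compute_aei` does internally. The cost matrix is a hypothetical two-trip example, the helper names are made up for illustration, and printing `1 - AEI` as waste assumes that is what `waste_fraction` reports.

```python
def satisfies_garp(cost, e=1.0):
    """GARP with every budget scaled by e; cost[i][j] = prices of i times bundle j."""
    n = len(cost)
    R = [[e * cost[i][i] >= cost[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):  # transitive closure
        for i in range(n):
            for j in range(n):
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    return not any(
        R[i][j] and e * cost[j][j] > cost[j][i]
        for i in range(n)
        for j in range(n)
    )


def afriat_efficiency(cost, iterations=40):
    """Approximate the largest e in [0, 1] at which the scaled data pass GARP."""
    if satisfies_garp(cost, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: passes at lo, fails at hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if satisfies_garp(cost, mid):
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical data: each trip cost 9, while the other trip's bundle cost 6
cost = [[9.0, 6.0], [6.0, 9.0]]
aei = afriat_efficiency(cost)
print(f"AEI ~ {aei:.4f}, waste ~ {1 - aei:.2%}")
```

Bisection works because shrinking budgets can only remove revealed preferences, so once the data pass at some e they pass at every smaller e. For this matrix the result should come out near 2/3: budgets have to shrink from 9 to 6 before the cheaper bundle stops being strictly affordable.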

## 5. Computing the Money Pump Index (MPI)

MPI measures the exploitable inconsistency: how much money an arbitrageur could extract from the consumer by trading against a cycle of inconsistent choices.

In [None]:
mpi_result = compute_mpi(bad_session)

print(f"Money Pump Index: {mpi_result.mpi_value:.4f}")
print(f"Worst violation cycle: {mpi_result.worst_cycle}")
print(f"Total expenditure: ${mpi_result.total_expenditure:.2f}")
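
As a rough illustration of the idea, the sketch below computes a money pump figure for one cycle as the total overspend along the cycle divided by total spending on the cycle. This follows a common textbook formulation; `compute_mpi` may normalize or aggregate across cycles differently, and both the cost matrix and the helper name `money_pump_index` are hypothetical.

```python
def money_pump_index(cost, cycle):
    """cost[i][j] = prices of observation i times bundle j.
    cycle lists observation indices, e.g. [0, 1] means 0 -> 1 -> 0."""
    savings = 0.0  # money an arbitrageur could pocket along the cycle
    spent = 0.0    # what the consumer actually paid on those observations
    for pos, i in enumerate(cycle):
        nxt = cycle[(pos + 1) % len(cycle)]
        savings += cost[i][i] - cost[i][nxt]
        spent += cost[i][i]
    return savings / spent


# Hypothetical two-trip violation: each trip cost 9 while the other bundle cost 6
cost = [[9.0, 6.0], [6.0, 9.0]]
mpi = money_pump_index(cost, [0, 1])
print(f"MPI for cycle 0 -> 1 -> 0: {mpi:.4f}")
```

Here the consumer overspends by 3 on each of two trips out of 18 in total, so the index is 1/3.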

## 6. Quick Integrity Score

For quick checks, use `get_integrity_score()`, which returns just the AEI value.

In [None]:
score = get_integrity_score(session)
print(f"Integrity score: {score:.4f}")

# Example use: bot detection (0.85 is an illustrative cutoff; tune it for your data)
if score < 0.85:
    print("WARNING: Potentially bot-like behavior!")
else:
    print("Behavior appears consistent with rational decision-making.")

## 7. Recovering the Utility Function

If the data is consistent, we can recover utility values and multipliers that rationalize the choices.

In [None]:
utility_result = recover_utility(session)

if utility_result.success:
    print("Utility recovery successful!")
    print(f"\nUtility values at each observation: {utility_result.utility_values}")
    print(f"Marginal utility of money: {utility_result.lagrange_multipliers}")
else:
    print(f"Recovery failed: {utility_result.lp_status}")
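
The `lp_status` and `lagrange_multipliers` fields suggest the recovery solves a linear program, most likely Afriat's inequalities, though this notebook does not say so explicitly. Under that assumption, the sketch below checks whether a candidate set of utility values and multipliers satisfies those inequalities for the three-trip data from Section 1. It only verifies a solution; it does not solve the LP. The helper name and the hand-picked candidate values are hypothetical.

```python
def satisfies_afriat(cost, utilities, multipliers, tol=1e-9):
    """Check U_j <= U_i + lambda_i * (p_i.x_j - p_i.x_i) for all i, j,
    with every lambda_i strictly positive. cost[i][j] = p_i . x_j."""
    n = len(cost)
    inequalities_hold = all(
        utilities[j] <= utilities[i] + multipliers[i] * (cost[i][j] - cost[i][i]) + tol
        for i in range(n)
        for j in range(n)
    )
    return inequalities_hold and all(m > 0 for m in multipliers)


# Cross-expenditure matrix of the three shopping trips from Section 1
cost = [[6.0, 9.0, 6.0], [9.0, 6.0, 6.0], [7.5, 7.5, 6.0]]

utilities = [2.0, 2.0, 1.0]    # hand-picked candidate, not library output
multipliers = [1.0, 1.0, 1.0]  # hand-picked candidate, not library output
print(satisfies_afriat(cost, utilities, multipliers))
```

Any values that pass this check rationalize the data, so the numbers returned by a solver need not match a hand-picked candidate.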

## Summary

PyRevealed provides tools to:

1. **`check_garp()`**: Test if behavior is consistent with rationality
2. **`compute_aei()`**: Measure degree of consistency (0 to 1)
3. **`compute_mpi()`**: Measure exploitable inconsistency
4. **`recover_utility()`**: Reconstruct the underlying utility function
5. **`get_integrity_score()`**: Quick consistency check

## 8. Loading Real-World Datasets (Prest Examples)

The [prest](https://github.com/prestsoftware/prest) project provides example datasets for revealed preference analysis. Let's load them for exploration.

In [None]:
import pandas as pd

BASE_URL = "https://raw.githubusercontent.com/prestsoftware/prest/master/docs/src/_static/examples/"

DATASETS = [
    "budgetary.csv",
    "estimation-models-defaults.csv",
    "estimation-models-no-defaults.csv",
    "general-defaults-128.csv",
    "general-defaults.csv",
    "general-hybrid.csv",
    "general-merging.csv",
    "general-no-defaults-128.csv",
    "general-no-defaults.csv",
    "general-stochastic-consistency.csv",
    "integrity.csv",
]

# Load all datasets into a dictionary keyed by a Python-friendly name,
# e.g. "general-no-defaults.csv" -> "general_no_defaults"
datasets = {}
for name in DATASETS:
    key = name.replace(".csv", "").replace("-", "_")
    datasets[key] = pd.read_csv(BASE_URL + name)
    print(f"Loaded {name}: {datasets[key].shape}")

### Dataset Overview

The datasets fall into two categories:
1. **Budgetary data** (`budgetary.csv`): Classic price/quantity data - works directly with PyRevealed
2. **Menu choice data** (all others): Discrete choice from menus - requires different analysis

In [None]:
# Explore the budgetary dataset (compatible with PyRevealed)
print("BUDGETARY DATASET")
print("=" * 50)
print(f"Shape: {datasets['budgetary'].shape}")
print(f"Columns: {list(datasets['budgetary'].columns)}")
print(f"\nSubjects: {datasets['budgetary']['Subject'].unique()}")
print("\nSample rows:")
datasets['budgetary'].head(3)

### Analyzing Budgetary Data with PyRevealed

The budgetary dataset has `Price1` to `Price6` and `Demand1` to `Demand6` columns. Below we run GARP and AEI on each subject.

In [None]:
# Analyze each subject's consistency
budgetary = datasets['budgetary']
price_cols = [f"Price{i}" for i in range(1, 7)]
demand_cols = [f"Demand{i}" for i in range(1, 7)]

results = []
for subject in budgetary['Subject'].unique():
    subject_data = budgetary[budgetary['Subject'] == subject]
    # Subject-specific names keep the `session`, `prices`, and `quantities`
    # defined in Section 1 from being overwritten
    subj_prices = subject_data[price_cols].to_numpy()
    subj_quantities = subject_data[demand_cols].to_numpy()

    # Drop goods whose price is never positive (some subjects have fewer goods)
    valid_cols = (subj_prices > 0).any(axis=0)
    subj_prices = subj_prices[:, valid_cols]
    subj_quantities = subj_quantities[:, valid_cols]

    subj_session = ConsumerSession(prices=subj_prices, quantities=subj_quantities)
    garp_result = check_garp(subj_session)
    aei_result = compute_aei(subj_session)

    results.append({
        'Subject': subject,
        'Observations': subj_session.num_observations,
        'Goods': subj_session.num_goods,
        'GARP Consistent': garp_result.is_consistent,
        'Violations': garp_result.num_violations,
        'AEI (Integrity)': aei_result.efficiency_index
    })

results_df = pd.DataFrame(results)
results_df

### Menu Choice Datasets (Exploration Only)

These datasets contain discrete choice data (menu, choice, optional default). They require WARP/SARP analysis rather than GARP.

In [None]:
# Overview of menu choice datasets
menu_datasets = ['general_defaults', 'general_no_defaults', 'integrity']

for name in menu_datasets:
    df = datasets[name]
    print(f"\n{name.upper()}")
    print("=" * 50)
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    print(f"Subjects: {df['subject'].nunique()}")
    print("\nSample:")
    print(df.head(3).to_string())
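
For a feel of what consistency testing looks like on menu data, here is a minimal sketch of a WARP check. It uses a hypothetical hand-written list of (menu, choice) pairs rather than the prest CSV layout, whose column format is not assumed here, and `warp_violations` is a made-up helper name.

```python
def warp_violations(observations):
    """observations: list of (menu, choice) pairs, where menu is a set.
    Returns index pairs (a, b) whose choices contradict each other."""
    violations = []
    for a, (menu_a, choice_a) in enumerate(observations):
        for b in range(a + 1, len(observations)):
            menu_b, choice_b = observations[b]
            # Each choice was available when the other one was picked
            if choice_a != choice_b and choice_b in menu_a and choice_a in menu_b:
                violations.append((a, b))
    return violations


# Hypothetical subject: picks x over y, then y although x is still on the menu
observations = [
    ({"x", "y"}, "x"),
    ({"x", "y", "z"}, "y"),
    ({"y", "z"}, "y"),
]
print(warp_violations(observations))
```

Only the first two observations clash: `x` was chosen with `y` available, and `y` was chosen with `x` available. The third observation is compatible with both.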