# NEP → GAA Budget Anomaly Detection

This notebook analyzes how proposed government budgets (NEP) change after approval (GAA).
It identifies unusual budget adjustments to support transparency and public understanding.

NEP = Proposed Budget  
GAA = Approved Budget


## Background

Government budgets go through a review process before final approval.
While proposed (NEP) and approved (GAA) budgets are public, understanding how and where changes happen is difficult.

This notebook focuses on:
- Identifying large or unusual changes
- Comparing regions fairly
- Avoiding false alarms
- Explaining results in simple language


## Dataset Overview

The dataset contains budget records with:
- Budget type (NEP or GAA)
- Fiscal year
- Budget amount
- Agency and region
- Budget classification and object codes

The main analysis covers every agency in the loaded files; a single flagged item is examined in detail near the end.


## Key Definitions

- NEP (National Expenditure Program): Proposed government budget
- GAA (General Appropriations Act): Final approved budget
- Budget Drift: Difference between the approved and proposed amounts (GAA minus NEP)
- Anomaly: A budget adjustment that is unusually large or inconsistent with peers or history

Anomalies indicate items that may require review, not wrongdoing.


## Data Preparation

We separate NEP and GAA records and aggregate them to a comparable level
to ensure fair comparison.


In [127]:
import pandas as pd

fiscal_years = [2020, 2021, 2022, 2023, 2024, 2025, 2026]

# Load each fiscal year's cleaned file and combine them in a single concat
df = pd.concat(
    [pd.read_parquet(f'cleaned_budget_{year}.parquet') for year in fiscal_years],
    ignore_index=True
)

nep = df[df['budget_type'] == 'NEP']
gaa = df[df['budget_type'] == 'GAA']

# Aggregate to comparable level
group_cols = [
    'fiscal_year',
    'department_code',
    'department_name',
    'full_agency_code',
    'agency_name',
    'region_code',
    'region_description',
    'uacs_object_code',
    'uacs_sub_object_name'
]

nep_agg = nep.groupby(group_cols)['budget_amount'].sum().reset_index()
gaa_agg = gaa.groupby(group_cols)['budget_amount'].sum().reset_index()

merged = nep_agg.merge(
    gaa_agg,
    on=group_cols,
    suffixes=('_nep', '_gaa'),
    how='outer'
)

In [128]:
# Record which side of the outer merge is missing before filling amounts with 0
merged['unapproved_budget'] = merged['budget_amount_gaa'].isna()  # in NEP, not in GAA
merged['inserted_budget'] = merged['budget_amount_nep'].isna()    # in GAA, not in NEP
merged['budget_amount_gaa'] = merged['budget_amount_gaa'].fillna(0)
merged['budget_amount_nep'] = merged['budget_amount_nep'].fillna(0)
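
The outer merge and the two flags can be checked on a toy example. The following is a minimal sketch, not the notebook's data: the `item` key and the amounts are made up for illustration. It also uses the `indicator=True` option of `DataFrame.merge`, which records where each row came from, as one possible way to derive the same flags.

```python
import pandas as pd

# Hypothetical toy data: 'item' stands in for the full list of group columns.
nep_toy = pd.DataFrame({"item": ["A", "B"], "budget_amount": [100.0, 50.0]})
gaa_toy = pd.DataFrame({"item": ["A", "C"], "budget_amount": [120.0, 30.0]})

toy = nep_toy.merge(
    gaa_toy,
    on="item",
    how="outer",
    suffixes=("_nep", "_gaa"),
    indicator=True,  # adds a '_merge' column: left_only / right_only / both
)

# Same flags as in the notebook, derived here from the merge indicator.
toy["unapproved_budget"] = toy["_merge"] == "left_only"   # in NEP, absent from GAA
toy["inserted_budget"] = toy["_merge"] == "right_only"    # in GAA, absent from NEP

# Fill amounts only after the flags are recorded.
amount_cols = ["budget_amount_nep", "budget_amount_gaa"]
toy[amount_cols] = toy[amount_cols].fillna(0)

print(toy.drop(columns="_merge"))
```

Item `B` appears only in the proposal and item `C` only in the approved budget, so each should carry exactly one of the two flags.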

## Measuring Budget Changes

We calculate:
- Absolute change (difference in amount)
- Percentage change (relative adjustment)

Small changes are common and expected.


In [129]:
# Calculate drift (GAA minus NEP); a zero NEP amount is replaced by 1 to avoid division by zero
merged['abs_change'] = merged['budget_amount_gaa'] - merged['budget_amount_nep']
merged['pct_change'] = merged['abs_change'] / merged['budget_amount_nep'].replace(0, 1)
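
Because `replace(0, 1)` divides by 1 whenever the NEP amount is zero, `pct_change` for an inserted item equals its GAA amount rather than a ratio (for example, 39749.0 in the sample output further down). The sketch below contrasts that behavior with one possible alternative that leaves the ratio undefined in that case. The helper name `safe_pct_change` and the toy values are hypothetical, and this is not the notebook's method.

```python
import numpy as np
import pandas as pd

def safe_pct_change(nep: pd.Series, gaa: pd.Series) -> pd.Series:
    """Relative change (GAA - NEP) / NEP, NaN where NEP is zero."""
    denom = nep.where(nep != 0)  # zero NEP becomes NaN
    return (gaa - nep) / denom

# Hypothetical toy amounts; the middle item has no NEP proposal.
nep_toy = pd.Series([100.0, 0.0, 200.0])
gaa_toy = pd.Series([130.0, 500.0, 150.0])

as_in_notebook = (gaa_toy - nep_toy) / nep_toy.replace(0, 1)
alternative = safe_pct_change(nep_toy, gaa_toy)

print(as_in_notebook.tolist())
print(alternative.tolist())
```

With NaN in place, a comparison such as `pct_change > 0.3` evaluates to False, so under this alternative inserted items would have to be caught by the `inserted_budget` flag or the absolute-change check instead.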

## Interpreting Budget Adjustments

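- NEP > GAA by more than 5%: Budget reduction
- NEP < GAA by more than 5%: Budget increase
- Change within ±5%: No significant change (approved essentially as proposed)
- Item present only in NEP: Unapproved budget
- Item present only in GAA: Inserted budget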


In [130]:
def classify_adjustment(row):
    if row['unapproved_budget']:
        return "Unapproved Budget"
    elif row['inserted_budget']:
        return "Inserted Budget"
    elif row['pct_change'] > 0.05:
        return "Budget Increase"
    elif row['pct_change'] < -0.05:
        return "Budget Reduction"
    else:
        return "No Significant Change"

merged['adjustment_type'] = merged.apply(classify_adjustment, axis=1)
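
Row-wise `apply` can be slow on a few hundred thousand rows. As a hedged alternative, the same labels can be produced with `numpy.select`, which evaluates the conditions column-wise and keeps the first match, mirroring the order of the `if`/`elif` chain. The toy frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, one per expected label.
toy = pd.DataFrame({
    "unapproved_budget": [True, False, False, False, False],
    "inserted_budget":   [False, True, False, False, False],
    "pct_change":        [-1.0, 500.0, 0.20, -0.10, 0.01],
})

# Conditions are checked in order; the first match wins, as in the if/elif chain.
conditions = [
    toy["unapproved_budget"],
    toy["inserted_budget"],
    toy["pct_change"] > 0.05,
    toy["pct_change"] < -0.05,
]
labels = [
    "Unapproved Budget",
    "Inserted Budget",
    "Budget Increase",
    "Budget Reduction",
]
toy["adjustment_type"] = np.select(
    conditions, labels, default="No Significant Change"
)

print(toy)
```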

## Anomaly Detection Strategy (Summary)

This project uses multiple anomaly signals to reduce false positives.
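A budget item is flagged only when it meets at least two of these four conditions:

- Meaningful budget change (threshold check)
- Statistically unusual compared to peers (z-score)
- Different from the pattern across regions
- Inconsistent with historical behavior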

Final decisions are based on a combined anomaly score.


### Threshold Selection

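Thresholds were chosen to balance sensitivity and false positives: an item passes the threshold check when its change exceeds 30% of the NEP amount or 10,000,000 in absolute terms (in the dataset's amount units).
They are conservative by design and intended for review support, not enforcement.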


In [131]:
merged['anomaly_threshold'] = (
    (merged['pct_change'].abs() > 0.3) |
    (merged['abs_change'].abs() > 10_000_000)
)

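### Z-Score (Statistical Outliers)

Logic: compare each item's percentage change to its peers (same fiscal year, same UACS object code) and flag items that lie more than 2 standard deviations from the peer mean.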

In [132]:
merged['z_score'] = merged.groupby(
    ['fiscal_year', 'uacs_object_code']
)['pct_change'].transform(
    lambda x: (x - x.mean()) / x.std(ddof=0)
)

merged['anomaly_zscore'] = merged['z_score'].abs() > 2
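
Two edge cases are worth keeping in mind. When every item in a peer group has the same `pct_change`, the standard deviation is 0 and the division yields NaN, which never exceeds 2. And with the population standard deviation (`ddof=0`), a value can be at most sqrt(n - 1) standard deviations from the mean of a group of n items, so groups with fewer than 6 items cannot produce |z| > 2. The sketch below makes the zero-spread case explicit; the helper `group_zscore` and the toy data are hypothetical, not part of the notebook.

```python
import numpy as np
import pandas as pd

def group_zscore(values: pd.Series) -> pd.Series:
    """Population z-score within one peer group; 0 when the group has no spread."""
    std = values.std(ddof=0)
    if std == 0 or np.isnan(std):
        return pd.Series(0.0, index=values.index)
    return (values - values.mean()) / std

# Hypothetical peer groups: X has one clear outlier, Y has no spread at all.
toy = pd.DataFrame({
    "fiscal_year": [2020] * 12,
    "uacs_object_code": ["X"] * 10 + ["Y"] * 2,
    "pct_change": [0.0] * 9 + [10.0] + [0.5, 0.5],
})

toy["z_score"] = toy.groupby(
    ["fiscal_year", "uacs_object_code"]
)["pct_change"].transform(group_zscore)

toy["anomaly_zscore"] = toy["z_score"].abs() > 2

print(toy)
```

In group X the mean is 1 and the population standard deviation is 3, so the outlier scores z = 3 and is the only flagged row.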

### Regional Comparison

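Items whose change stays within ±5% (`MIN_CHANGE`) are treated as normal outcomes
of the approval process and are not flagged as anomalies.
Larger changes are flagged when they deviate by more than 2 standard deviations from the mean of all items sharing the same fiscal year and object code, across regions.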

In [133]:
merged['region_mean'] = merged.groupby(
    ['fiscal_year', 'uacs_object_code']
)['pct_change'].transform('mean')

merged['region_std'] = merged.groupby(
    ['fiscal_year', 'uacs_object_code']
)['pct_change'].transform('std')

MIN_CHANGE = 0.05

merged['region_anomaly'] = (
    (merged['pct_change'].abs() > MIN_CHANGE) &
    ((merged['pct_change'] - merged['region_mean']).abs() >
     2 * merged['region_std'])
)

### Historical Consistency Check

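Logic: compare each item's drift against the mean drift of the same agency, region, and object code across all loaded fiscal years, and flag deviations of more than 2 standard deviations.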

In [134]:
merged['historical_mean'] = merged.groupby(
    ['full_agency_code', 'region_code', 'uacs_object_code']
)['pct_change'].transform('mean')

merged['historical_std'] = merged.groupby(
    ['full_agency_code', 'region_code', 'uacs_object_code']
)['pct_change'].transform('std')

merged['historical_anomaly'] = (
    (merged['pct_change'] - merged['historical_mean']).abs() >
    2 * merged['historical_std']
)
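
Note that the group mean and standard deviation above include the year being tested, so a single extreme year pulls the baseline toward itself. With five observations and the sample standard deviation, for instance, no value can sit more than about 1.79 standard deviations from a mean that includes it. The sketch below contrasts the whole-group statistics with a baseline built from prior years only. The `item` key, the `prior_*` column names, and the values are hypothetical, and the prior-years variant is an alternative, not the notebook's method.

```python
import pandas as pd

# Hypothetical history for one agency/region/object-code combination.
toy = pd.DataFrame({
    "item": ["A"] * 5,
    "fiscal_year": [2020, 2021, 2022, 2023, 2024],
    "pct_change": [0.0, 0.1, 0.0, 0.1, 2.0],
})
toy = toy.sort_values(["item", "fiscal_year"]).reset_index(drop=True)
grouped = toy.groupby("item")["pct_change"]

# Whole-group statistics, as in the notebook: the current year is included.
toy["historical_mean"] = grouped.transform("mean")
toy["historical_std"] = grouped.transform("std")
toy["historical_anomaly"] = (
    (toy["pct_change"] - toy["historical_mean"]).abs()
    > 2 * toy["historical_std"]
)

# Prior-years-only statistics: shift() removes each year from its own baseline.
toy["prior_mean"] = grouped.transform(lambda s: s.shift().expanding().mean())
toy["prior_std"] = grouped.transform(lambda s: s.shift().expanding().std())
toy["prior_anomaly"] = (
    (toy["pct_change"] - toy["prior_mean"]).abs() > 2 * toy["prior_std"]
)

print(toy[["fiscal_year", "pct_change", "historical_anomaly", "prior_anomaly"]])
```

In this toy series the 2024 jump is missed by the whole-group check but caught by the prior-years baseline. The first two years have too little history for a standard deviation, so they are never flagged.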

## Final Anomaly Scoring

In [135]:
merged['anomaly_score'] = (
    merged['anomaly_threshold'].astype(int) +
    merged['anomaly_zscore'].astype(int) +
    merged['historical_anomaly'].astype(int) +
    merged['region_anomaly'].astype(int)
)

merged['is_anomaly'] = merged['anomaly_score'] >= 2
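
The score is simply the number of signals that fired. A minimal sketch on hypothetical flag combinations shows the mapping from score to flag and to the review bands described in the interpretation that follows; the `review_level` column is an illustrative addition, not part of the notebook.

```python
import pandas as pd

signal_cols = [
    "anomaly_threshold",
    "anomaly_zscore",
    "historical_anomaly",
    "region_anomaly",
]

# Hypothetical rows with 0, 1, 2 and 4 signals triggered.
toy = pd.DataFrame(
    [
        [False, False, False, False],
        [True, False, False, False],
        [True, True, False, False],
        [True, True, True, True],
    ],
    columns=signal_cols,
)

# Summing booleans row-wise counts the triggered signals.
toy["anomaly_score"] = toy[signal_cols].sum(axis=1)
toy["is_anomaly"] = toy["anomaly_score"] >= 2

# Bands: 0-1 normal, 2-3 needs review, 4 high-risk.
toy["review_level"] = pd.cut(
    toy["anomaly_score"],
    bins=[-1, 1, 3, 4],
    labels=["Normal", "Needs review", "High-risk anomaly"],
)

print(toy[["anomaly_score", "is_anomaly", "review_level"]])
```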


### Interpretation

- Score 0–1: Normal
- Score 2–3: Needs review
- Score 4: High-risk anomaly

## Natural Language Explanation

In [136]:
def explain(row):
    reasons = []
    if row['anomaly_threshold']:
        reasons.append("large budget change")
    if row['anomaly_zscore']:
        reasons.append("statistically unusual compared to peers")
    if row['historical_anomaly']:
        reasons.append("inconsistent with historical trends")
    if row['region_anomaly']:
        reasons.append("differs from other regions")
    return '' if not reasons else "Flagged due to: " + ", ".join(reasons)

merged['explanation'] = merged.apply(explain, axis=1)

In [137]:
# Save the results
merged.to_parquet('detected_anomaly.parquet')

## Reviewing Sample Results

The following samples show:
- Random non-anomalous records
- Random detected anomalies

This helps validate that the detection logic behaves as expected.


In [63]:
pd.set_option('display.max_columns', None)
# Sample some non-anomalies that still triggered exactly one signal
merged[(~merged['is_anomaly']) & (merged['explanation'] != '')].sample(5)

Unnamed: 0,fiscal_year,department_code,department_name,full_agency_code,agency_name,region_code,region_description,uacs_object_code,uacs_sub_object_name,budget_amount_nep,budget_amount_gaa,unapproved_budget,inserted_budget,abs_change,pct_change,adjustment_type,anomaly_threshold,z_score,anomaly_zscore,region_mean,region_std,region_anomaly,historical_mean,historical_std,historical_anomaly,anomaly_score,is_anomaly,explanation
312563,2025,20,Department of Social Welfare and Development (...,20008,National Commission on Indigenous Peoples,3,Region III - Central Luzon,5010214001,Year-End Bonus - Civilian,4278.0,0.0,True,False,-4278.0,-1.0,Unapproved Budget,False,-0.104041,False,72.11162,703.101122,False,-0.166667,0.408248,True,1,False,Flagged due to: inconsistent with historical t...
27131,2020,14,Department of the Interior and Local Governmen...,14006,Philippine National Police,2,Region II - Cagayan Valley,5020301002,Office Supplies Expenses,34463.0,52992.0,False,False,18529.0,0.537649,Budget Increase,True,-0.036925,False,298.350031,8069.716853,False,0.064688,0.242372,False,1,False,Flagged due to: large budget change
342391,2026,15,Department of Justice (DOJ),15010,Public Attorney's Office,13,National Capital Region (NCR),5020501000,Postage and Courier Expenses,269.0,0.0,True,False,-269.0,-1.0,Unapproved Budget,False,-0.202395,False,-0.960648,0.194543,False,-0.142857,0.377964,True,1,False,Flagged due to: inconsistent with historical t...
307283,2025,18,Department of Public Works and Highways (DPWH),18001,Office of the Secretary,16,Region XIII - CARAGA,5060403004,Water Supply Systems,30000.0,56000.0,False,False,26000.0,0.866667,Budget Increase,True,0.40718,False,0.313131,1.368846,False,68417.382523,162735.749373,False,1,False,Flagged due to: large budget change
338172,2026,10,Department of Environment and Natural Resource...,10003,Mines and Geosciences Bureau,8,Region VIII - Eastern Visayas,5020309000,"Fuel, Oil and Lubricants Expenses",949.0,0.0,True,False,-949.0,-1.0,Unapproved Budget,False,-0.106479,False,-0.988789,0.105345,False,-0.142857,0.377964,True,1,False,Flagged due to: inconsistent with historical t...


In [83]:
# Sample some detected anomalies where all four signals triggered
merged[merged['is_anomaly'] & (merged['anomaly_score'] >= 4)].sample(10)

Unnamed: 0,fiscal_year,department_code,department_name,full_agency_code,agency_name,region_code,region_description,uacs_object_code,uacs_sub_object_name,budget_amount_nep,budget_amount_gaa,unapproved_budget,inserted_budget,abs_change,pct_change,adjustment_type,anomaly_threshold,z_score,anomaly_zscore,region_mean,region_std,region_anomaly,historical_mean,historical_std,historical_anomaly,anomaly_score,is_anomaly,explanation
33672,2020,17,Department of National Defense (DND),17009,Philippine Navy ( Naval Forces ),13,National Capital Region (NCR),5021002000,Intelligence Expenses,0.0,39749.0,False,True,39749.0,39749.0,Inserted Budget,True,5.0,True,1529.191,7795.342,True,5678.286,15023.77,True,4,True,"Flagged due to: large budget change, statistic..."
63975,2021,7,Department of Education (DepEd),7001,Office of the Secretary,13,National Capital Region (NCR),5021203000,Security Services,17793.0,47793.0,False,False,30000.0,1.686056,Budget Increase,True,16.974759,True,0.008506266,0.09890393,True,0.09800804,0.7932577,True,4,True,"Flagged due to: large budget change, statistic..."
8678,2020,8,State Universities and Colleges (SUCs),8005,Polytechnic University of the Philippines,13,National Capital Region (NCR),5021399099,"Other Property, Plant and Equipment",350.0,550.0,False,False,200.0,0.5714286,Budget Increase,True,3.321826,True,-0.02680251,0.1806529,True,0.0952381,0.2332847,True,4,True,"Flagged due to: large budget change, statistic..."
33694,2020,17,Department of National Defense (DND),17009,Philippine Navy ( Naval Forces ),13,National Capital Region (NCR),5029905003,Rents - Motor Vehicles,0.0,5189.0,False,True,5189.0,5189.0,Inserted Budget,True,17.776386,True,16.36871,291.4434,True,864.8333,2118.4,True,4,True,"Flagged due to: large budget change, statistic..."
53640,2020,35,Budgetary Support to Government Corporations (...,35056,Philippine National Railways,13,National Capital Region (NCR),5021404001,Subsidy Support to Operations of GOCCs,0.0,318000.0,False,True,318000.0,318000.0,Inserted Budget,True,5.591517,True,11943.21,55259.71,True,53000.44,129822.7,True,4,True,"Flagged due to: large budget change, statistic..."
121784,2022,10,Department of Environment and Natural Resource...,10001,Office of the Secretary,9,Region IX - Zamboanga Peninsula,5021304099,Other Structures,1800.0,2700.0,False,False,900.0,0.5,Budget Increase,True,6.886144,True,0.003582408,0.07230231,True,0.08333333,0.2041241,True,4,True,"Flagged due to: large budget change, statistic..."
190472,2023,18,Department of Public Works and Highways (DPWH),18001,Office of the Secretary,3,Region III - Central Luzon,5060403001,Road Networks,0.0,11268151.0,False,True,11268151.0,11268150.0,Inserted Budget,True,2.253085,True,2311839.0,4013173.0,True,1878026.0,4600203.0,True,4,True,"Flagged due to: large budget change, statistic..."
33615,2020,17,Department of National Defense (DND),17009,Philippine Navy ( Naval Forces ),13,National Capital Region (NCR),5010206004,Laundry Allowance - Magna Carta Benefits for P...,0.0,365.0,False,True,365.0,365.0,Inserted Budget,True,12.328828,True,2.385621,29.5085,True,60.83333,149.0106,True,4,True,"Flagged due to: large budget change, statistic..."
222451,2024,7,Department of Education (DepEd),7001,Office of the Secretary,13,National Capital Region (NCR),5010299004,Special Hardship Allowance - Civilian,12383.0,312383.0,False,False,300000.0,24.22676,Budget Increase,True,3.872983,True,1.514173,6.056691,True,4.037794,9.890534,True,4,True,"Flagged due to: large budget change, statistic..."
33678,2020,17,Department of National Defense (DND),17009,Philippine Navy ( Naval Forces ),13,National Capital Region (NCR),5021305001,Machinery,0.0,143289.0,False,True,143289.0,143289.0,Inserted Budget,True,13.784049,True,750.1864,10368.03,True,23881.5,58497.49,True,4,True,"Flagged due to: large budget change, statistic..."


To look more closely at one detected anomaly, we pull the raw records behind a flagged item: Department of National Defense, NCR, object code 5029905003 (Rents - Motor Vehicles), fiscal year 2020.

In [116]:
analysis_deeper = df[
    (df['fiscal_year'] == 2020)
    & (df['department_name'] == 'Department of National Defense (DND)')
    & (df['region_description'] == 'National Capital Region (NCR)')
    & (df['uacs_object_code'] == '5029905003')
].sort_values(by=['fiscal_year'], ascending=False)
analysis_deeper

Unnamed: 0,budget_id,budget_type,fiscal_year,budget_amount,budget_description,funding_code,funding_source,department_code,department_name,abbreviation,agency_code,full_agency_code,agency_name,org_code,org_name,region_code,region_description,prexc_fpap_id,uacs_object_code,uacs_classification,uacs_sub_class,uacs_group,uacs_object_name,uacs_sub_object_name
452786,GAA-2020-0000452787,GAA,2020,1928.0,"Development, implementation and monitoring of ...",1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,1,17001,Office of the Secretary - Proper,170010000000,Office of the Secretary - Proper,13,National Capital Region (NCR),310100100001000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
452805,GAA-2020-0000452806,GAA,2020,2000.0,"Development, implementation and monitoring of ...",1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,1,17001,Office of the Secretary - Proper,170010000000,Office of the Secretary - Proper,13,National Capital Region (NCR),310100100002000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
452983,GAA-2020-0000452984,GAA,2020,650.0,General management and supervision,1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,3,17003,National Defense College of the Philippines,170030000000,National Defense College of the Philippines,13,National Capital Region (NCR),100000100001000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
453067,GAA-2020-0000453068,GAA,2020,500.0,Conduct of graduate level and other courses of...,1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,3,17003,National Defense College of the Philippines,170030000000,National Defense College of the Philippines,13,National Capital Region (NCR),310200100001000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
453133,GAA-2020-0000453134,GAA,2020,0.0,General management and supervision,1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,4,17004,Office of Civil Defense,170040000000,Office of Civil Defense,13,National Capital Region (NCR),100000100001000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
453169,GAA-2020-0000453170,GAA,2020,2270.0,"Enhancement, Capacity Development and Mobiliza...",1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,4,17004,Office of Civil Defense,170040000000,Office of Civil Defense,13,National Capital Region (NCR),310101100001000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
453222,GAA-2020-0000453223,GAA,2020,1500.0,Empowering Sectors on DRRM for Resiliency,1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,4,17004,Office of Civil Defense,170040000000,Office of Civil Defense,13,National Capital Region (NCR),310102100001000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
453289,GAA-2020-0000453290,GAA,2020,580.0,General management and supervision,1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,5,17005,Philippine Veterans Affairs Office (PVAO) - Pr...,170050000000,Philippine Veterans Affairs Office (PVAO) - Pr...,13,National Capital Region (NCR),100000100001000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
453361,GAA-2020-0000453362,GAA,2020,0.0,Payment of veterans' benefits,1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,5,17005,Philippine Veterans Affairs Office (PVAO) - Pr...,170050000000,Philippine Veterans Affairs Office (PVAO) - Pr...,13,National Capital Region (NCR),310100100002000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles
453460,GAA-2020-0000453461,GAA,2020,205.0,Provide assistance in empowering of veterans o...,1101101,Regular Agency Fund - General Fund - New Gener...,17,Department of National Defense (DND),DND,5,17005,Philippine Veterans Affairs Office (PVAO) - Pr...,170050000000,Philippine Veterans Affairs Office (PVAO) - Pr...,13,National Capital Region (NCR),310200100001000,5029905003,Expenses,Maintenance and Other Operating Expenses,Other Maintenance and Operating Expenses,Rent/Lease Expenses,Rents - Motor Vehicles


In [122]:
nep = analysis_deeper[analysis_deeper['budget_type'] == 'NEP']
gaa = analysis_deeper[analysis_deeper['budget_type'] == 'GAA']

# Aggregate to comparable level
group_cols = [
    'budget_description',
    'funding_source',
    'full_agency_code',
    'agency_name',
    'org_code',
    'org_name',
]

nep_agg = nep.groupby(group_cols)['budget_amount'].sum().reset_index()
gaa_agg = gaa.groupby(group_cols)['budget_amount'].sum().reset_index()

merged_analysis = nep_agg.merge(
    gaa_agg,
    on=group_cols,
    suffixes=('_nep', '_gaa'),
    how='outer'
)
merged_analysis['unapproved_budget'] = merged_analysis['budget_amount_gaa'].isna()
merged_analysis['inserted_budget'] = merged_analysis['budget_amount_nep'].isna()
merged_analysis['budget_amount_gaa'] = merged_analysis['budget_amount_gaa'].fillna(0)
merged_analysis['budget_amount_nep'] = merged_analysis['budget_amount_nep'].fillna(0)

In [123]:
merged_analysis[merged_analysis['budget_amount_nep'] != merged_analysis['budget_amount_gaa']].sort_values(by=['budget_amount_gaa','budget_amount_nep'], ascending=False)

Unnamed: 0,budget_description,funding_source,full_agency_code,agency_name,org_code,org_name,budget_amount_nep,budget_amount_gaa,unapproved_budget,inserted_budget
10,Force-Level Support Services,Regular Agency Fund - General Fund - New Gener...,17009,Philippine Navy ( Naval Forces ),170091700001,Philippine Navy,0.0,4212.0,False,True
5,"Enhancement, Capacity Development and Mobiliza...",Regular Agency Fund - General Fund - New Gener...,17004,Office of Civil Defense,170040000000,Office of Civil Defense,1500.0,2270.0,False,False
15,General management and supervision,Regular Agency Fund - General Fund - New Gener...,17009,Philippine Navy ( Naval Forces ),170091700001,Philippine Navy,0.0,477.0,False,True
7,Force Development,Regular Agency Fund - General Fund - New Gener...,17009,Philippine Navy ( Naval Forces ),170091700001,Philippine Navy,0.0,384.0,False,True
8,Force Sustainment,Regular Agency Fund - General Fund - New Gener...,17009,Philippine Navy ( Naval Forces ),170091700001,Philippine Navy,0.0,116.0,False,True


In [124]:
merged_analysis[merged_analysis['budget_amount_nep'] != merged_analysis['budget_amount_gaa']].sort_values(by=['budget_amount_gaa','budget_amount_nep'], ascending=False).nunique()

budget_description    5
funding_source        1
full_agency_code      2
agency_name           2
org_code              2
org_name              2
budget_amount_nep     2
budget_amount_gaa     5
unapproved_budget     1
inserted_budget       2
dtype: int64

In [126]:
merged_analysis[merged_analysis['budget_amount_nep'] != merged_analysis['budget_amount_gaa']].sort_values(by=['budget_amount_gaa','budget_amount_nep'], ascending=False)[['budget_amount_gaa','budget_amount_nep','unapproved_budget','inserted_budget']].sum()

budget_amount_gaa    7459.0
budget_amount_nep    1500.0
unapproved_budget       0.0
inserted_budget         4.0
dtype: float64

## Limitations

- This analysis does not determine intent or correctness
- Results should be reviewed alongside official documents
- Some budget changes may be policy-driven or emergency-related


## Summary

This notebook demonstrates a transparent and explainable approach
to identifying unusual budget adjustments between NEP and GAA.

Future improvements include:
- Multi-year trend analysis
- Interactive dashboards
- Integration with Snowflake


In [64]:
sample

Unnamed: 0,department_name,budget_amount_nep,budget_amount_gaa
9,Department of Agriculture (DA),622290202.0,547659593.0
10,Department of Budget and Management (DBM),14611023.0,13952386.0
1,Autonomous Region in Muslim Mindanao (ARMM),0.0,0.0
37,State Universities and Colleges (SUCs),594100620.0,647452064.0
6,Commission on Human Rights (CHR),6124744.0,6191895.0
