52 changes: 52 additions & 0 deletions .github/workflows/policy_engine_checks.yml
@@ -0,0 +1,52 @@
name: Policy Engine (Go) Checks

on:
  workflow_dispatch:
  pull_request:
    paths:
      - "policy-engine/**"
      - ".github/workflows/policy_engine_checks.yml"
  merge_group:
    types: [checks_requested]
  push:
    branches:
      - "main"
      - "release-**"
    paths:
      - "policy-engine/**"
      - ".github/workflows/policy_engine_checks.yml"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

jobs:
  go-checks:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: policy-engine
    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: "1.23"
          cache-dependency-path: policy-engine/go.mod

      - name: Verify module
        run: go mod tidy && git diff --exit-code go.mod go.sum

      - name: Format
        run: test -z "$(gofmt -l .)"

      - name: Build
        run: go build ./...

      - name: Vet
        run: go vet ./...

      - name: Test
        run: go test ./... -v -count=1
1 change: 1 addition & 0 deletions .gitignore
@@ -148,6 +148,7 @@ htmlcov/
.cache
nosetests.xml
coverage.xml
test_report.xml
*.cover
.hypothesis/
.pytest_cache/
4 changes: 4 additions & 0 deletions changelog/7926-go-pbac-policy-engine.yaml
@@ -0,0 +1,4 @@
type: Added
description: Go PBAC policy engine library for high-throughput evaluation, plus `fides pbac` CLI commands for purpose and access policy evaluation
pr: 7926
labels: []
162 changes: 162 additions & 0 deletions pbac/README.md
@@ -0,0 +1,162 @@
# PBAC demo fixtures

Sample data for the `fides-pbac` CLI (Go standalone binary, see
`policy-engine/README.md`). Each `.txt` file in `entries/` holds one
identity's SQL queries. The CLI takes the identity from `--identity`,
extracts table references from the SQL with `pkg/sqlextract`, and runs
them through the full PBAC pipeline (purpose evaluation, then policy
filtering).

All domains use RFC 2606 reserved `.example` suffixes so this is safe
to commit to the public repo.

## Cast

| Identity | Consumer | Purposes |
|---|---|---|
| `alice@demo.example`, `priya@demo.example` | Analytics Team | `analytics` |
| `bob@demo.example`, `maria@demo.example` | Marketing Team | `marketing` |
| `dave@demo.example` | Onboarding | *none declared* |
| `carol@demo.example` | *not registered* | |

| Purpose | `data_use` |
|---|---|
| `analytics` | `analytics.reporting` |
| `marketing` | `marketing.advertising` |
| `billing` | `essential.service.payment_processing` |

| Dataset (`fides_key`) | Dataset `data_purposes` | Collections |
|---|---|---|
| `sales` | `billing` | `orders`, `invoices` (`invoices` adds `analytics` at the collection level) |
| `events` | `analytics` | `page_views` |
| `marketing` | `marketing` | `campaigns` |

Tables are assumed to be **globally unique across datasets**, so the CLI
resolves queries by bare table name. `SELECT ... FROM orders` and
`SELECT ... FROM warehouse.archive.orders` both resolve to whichever
dataset declares an `orders` collection. A query naming a table that
isn't a declared collection (e.g. `cold_storage`) produces an
`UNCONFIGURED_DATASET` gap identified by the query's qualified name.
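
The resolution rule above can be sketched in Go. This is a minimal illustration, not the CLI's implementation: `collectionIndex`, `bareTable`, and `resolveTable` are hypothetical names, the index is hard-coded from the demo fixtures, and quoted identifiers are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// collectionIndex maps a bare collection name to the fides_key of the
// dataset that declares it. Hypothetical: hard-coded from the demo fixtures.
var collectionIndex = map[string]string{
	"orders":     "sales",
	"invoices":   "sales",
	"page_views": "events",
	"campaigns":  "marketing",
}

// bareTable returns the last dot-separated segment of a table reference,
// so "warehouse.archive.orders" becomes "orders".
func bareTable(ref string) string {
	if i := strings.LastIndex(ref, "."); i >= 0 {
		return ref[i+1:]
	}
	return ref
}

// resolveTable returns the owning dataset key and true when the bare table
// name is a declared collection. Otherwise it returns the qualified name
// unchanged and false, which the caller would report as a gap.
func resolveTable(ref string) (string, bool) {
	ds, ok := collectionIndex[bareTable(ref)]
	if !ok {
		return ref, false
	}
	return ds, true
}

func main() {
	for _, ref := range []string{"orders", "warehouse.archive.orders", "cold_storage"} {
		if ds, ok := resolveTable(ref); ok {
			fmt.Printf("%s -> dataset %s\n", ref, ds)
		} else {
			fmt.Printf("%s -> gap UNCONFIGURED_DATASET (%s)\n", ref, ds)
		}
	}
}
```

Because the lookup key is the bare name only, two datasets declaring the same collection name would collide, which is why the fixtures assume globally unique table names.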

## Purposes at three levels

`data_purposes` can appear on the dataset, each collection, and each
field. They stack additively:

```
effective_purposes(dataset.collection)
    = dataset.data_purposes
    ∪ collection.data_purposes
    ∪ union(field.data_purposes for each field in collection)
```

The engine currently evaluates at collection granularity (the CLI
extracts tables, not individual columns from SELECT lists), so
field-level purposes fold into their owning collection's effective
set. A column-aware extractor would let field-level purposes gate
individual SELECTs, but that's out of scope today.

`sales.invoices` demonstrates the collection layer: the dataset is
`billing`, the collection adds `analytics`, so analytics-team queries
against invoices pass the purpose check at the engine without needing
any policy override.
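
A small Go sketch of the additive stacking described above. The `Dataset`, `Collection`, and `Field` structs and the `effectivePurposes` helper are hypothetical stand-ins for illustration, not the engine's actual types; `demoSales` reproduces only the purpose-related parts of the `sales` fixture.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, trimmed-down shapes holding only what the union needs.
type Field struct {
	Name         string
	DataPurposes []string
}

type Collection struct {
	Name         string
	DataPurposes []string
	Fields       []Field
}

type Dataset struct {
	FidesKey     string
	DataPurposes []string
	Collections  []Collection
}

// effectivePurposes unions dataset-, collection-, and field-level purposes
// for one collection and returns them sorted. If the collection is not
// found, only the dataset-level purposes are returned.
func effectivePurposes(ds Dataset, collection string) []string {
	set := map[string]struct{}{}
	for _, p := range ds.DataPurposes {
		set[p] = struct{}{}
	}
	for _, c := range ds.Collections {
		if c.Name != collection {
			continue
		}
		for _, p := range c.DataPurposes {
			set[p] = struct{}{}
		}
		for _, f := range c.Fields {
			for _, p := range f.DataPurposes {
				set[p] = struct{}{}
			}
		}
	}
	out := make([]string, 0, len(set))
	for p := range set {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

// demoSales mirrors the purpose declarations of the sales fixture.
func demoSales() Dataset {
	return Dataset{
		FidesKey:     "sales",
		DataPurposes: []string{"billing"},
		Collections: []Collection{
			{Name: "orders"},
			{
				Name:         "invoices",
				DataPurposes: []string{"analytics"},
				Fields:       []Field{{Name: "amount", DataPurposes: []string{"analytics"}}},
			},
		},
	}
}

func main() {
	fmt.Println(effectivePurposes(demoSales(), "orders"))   // [billing]
	fmt.Println(effectivePurposes(demoSales(), "invoices")) // [analytics billing]
}
```

Using a set makes the repeated `analytics` declaration (collection and field) collapse to a single entry.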

## Access policies

`policies/allow-analytics-on-billing-data.yml` shows a realistic ALLOW
override. It matches any violation where the dataset's purposes resolve
to `essential.service.payment_processing` (the `billing` purpose's
`data_use`) and suppresses the violation.

Policy evaluation only runs on purpose violations. Compliant queries
and coverage gaps pass through unchanged, because gaps represent missing
configuration, not access decisions.

## File layout

```
pbac/
  consumers/   one YAML per consumer   (top-level key: consumer:)
  purposes/    one YAML per purpose    (top-level key: purpose:)
  datasets/    fideslang Dataset YAML  (top-level key: dataset:)
  policies/    one YAML per policy     (top-level key: policy:)
  entries/     one .txt per identity, raw SQL separated by semicolons
```

## Invocation

```bash
fides-pbac --config pbac/ --identity alice@demo.example pbac/entries/alice.txt
fides-pbac --config pbac/ --identity bob@demo.example pbac/entries/bob.txt
fides-pbac --config pbac/ --identity carol@demo.example pbac/entries/carol.txt
fides-pbac --config pbac/ --identity dave@demo.example pbac/entries/dave.txt
```

## Expected outcomes

| File | Query | Outcome |
|---|---|---|
| `alice.txt` | `SELECT ... FROM page_views ...` | **compliant** (analytics ∩ analytics) |
| `alice.txt` | `SELECT ... FROM orders ...` | violation **suppressed** by `allow-analytics-on-billing-data` |
| `alice.txt` | `SELECT ... FROM invoices ...` | **compliant** via collection-level `analytics` on `sales.invoices` |
| `alice.txt` | `SELECT ... FROM campaigns ...` | **violation stands** — no matching policy |
| `bob.txt` | `SELECT ... FROM cold_storage ...` | **gap** `UNCONFIGURED_DATASET` |
| `carol.txt` | `SELECT ... FROM page_views` | **gap** `UNRESOLVED_IDENTITY` |
| `dave.txt` | `SELECT ... FROM page_views ...` | **gap** `UNCONFIGURED_CONSUMER` |
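
The outcomes in the table can be reproduced with a rough Go sketch of the purpose check before any policy filtering. The `Consumer` struct, the `classify` function, and the order in which gaps are checked are assumptions made for illustration. In particular, the fixtures say the engine emits `UNRESOLVED_IDENTITY` and the CLI reclassifies it to `UNCONFIGURED_CONSUMER`, while this sketch collapses both steps into one function.

```go
package main

import "fmt"

// Consumer is a hypothetical, trimmed-down consumer record.
type Consumer struct {
	Name     string
	Purposes []string
}

// classify returns the pre-policy outcome for one dataset access.
// c is nil when no consumer lists the identity; datasetKnown is false
// when the table did not resolve to a declared collection.
// The check order is an assumption, not taken from the engine.
func classify(c *Consumer, datasetPurposes []string, datasetKnown bool) string {
	switch {
	case c == nil:
		return "UNRESOLVED_IDENTITY"
	case len(c.Purposes) == 0:
		return "UNCONFIGURED_CONSUMER"
	case !datasetKnown:
		return "UNCONFIGURED_DATASET"
	}
	declared := map[string]bool{}
	for _, p := range c.Purposes {
		declared[p] = true
	}
	// Compliant when consumer purposes and dataset purposes intersect.
	for _, p := range datasetPurposes {
		if declared[p] {
			return "COMPLIANT"
		}
	}
	return "VIOLATION"
}

func main() {
	alice := &Consumer{Name: "Analytics Team", Purposes: []string{"analytics"}}
	bob := &Consumer{Name: "Marketing Team", Purposes: []string{"marketing"}}
	dave := &Consumer{Name: "Onboarding"}

	fmt.Println(classify(alice, []string{"analytics"}, true)) // page_views
	fmt.Println(classify(alice, []string{"billing"}, true))   // orders, before policy filtering
	fmt.Println(classify(bob, nil, false))                    // cold_storage
	fmt.Println(classify(nil, []string{"analytics"}, true))   // carol
	fmt.Println(classify(dave, []string{"analytics"}, true))  // dave
}
```

A `VIOLATION` result is the only outcome that would go on to policy evaluation; the three gap values pass through unchanged.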

## Schema notes

**Consumer YAML**. A consumer represents a group (or individual) that
accesses data, with a list of identities under `members`. Every member
email resolves to the same consumer, so an identity match in the CLI is
"`identity` appears in some consumer's `members` list."

```yaml
consumer:
  - name: Analytics Team
    members:
      - alice@demo.example
      - priya@demo.example
    purposes: [analytics]
```

If the same identity appears in multiple consumers, the last one loaded
wins.
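
The "last one loaded wins" behavior falls out naturally from building a member-to-consumer map in load order, as in this Go sketch. `Consumer` and `buildIdentityIndex` are hypothetical names, and exact, case-sensitive string matching on the identity is an assumption.

```go
package main

import "fmt"

// Consumer is a hypothetical in-memory form of one consumer YAML entry.
type Consumer struct {
	Name     string
	Members  []string
	Purposes []string
}

// buildIdentityIndex maps each member identity to its consumer.
// Consumers are processed in slice order, so a later consumer that
// lists the same identity overwrites the earlier mapping.
func buildIdentityIndex(consumers []Consumer) map[string]*Consumer {
	idx := make(map[string]*Consumer)
	for i := range consumers {
		for _, m := range consumers[i].Members {
			idx[m] = &consumers[i]
		}
	}
	return idx
}

func main() {
	idx := buildIdentityIndex([]Consumer{
		{Name: "Analytics Team", Members: []string{"alice@demo.example", "priya@demo.example"}, Purposes: []string{"analytics"}},
		{Name: "Marketing Team", Members: []string{"bob@demo.example", "maria@demo.example"}, Purposes: []string{"marketing"}},
	})

	if c, ok := idx["alice@demo.example"]; ok {
		fmt.Println("alice ->", c.Name)
	}
	if _, ok := idx["carol@demo.example"]; !ok {
		fmt.Println("carol -> not registered")
	}
}
```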

**Purpose YAML** mirrors `fidesplus/seed/pbac/data.py::PURPOSES`:

```yaml
purpose:
  - fides_key: analytics
    name: Product Analytics
    data_use: analytics.reporting
    data_categories: [user.behavior]
```

**Dataset YAML** is standard fideslang. `data_purposes` can be declared
at the dataset, collection, and field levels; they stack additively
(see "Purposes at three levels" above). `sales.invoices` demonstrates
the collection layer — `sales` is `billing` at the dataset level,
`invoices` adds `analytics` at the collection level, so analytics-team
queries against `invoices` pass the purpose check directly.

**Policy YAML** matches `pbac.AccessPolicy`:

```yaml
policy:
  - key: allow-analytics-on-billing-data
    priority: 100
    enabled: true
    decision: ALLOW
    match:
      data_use:
        any:
          - essential.service.payment_processing
    unless: []
    action:
      message: ...
```

Match blocks key on the `data_use` of the dataset being accessed
(the CLI resolves dataset purposes to their `data_use` via the
purposes/ directory before calling the policy engine).
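
The matching step can be sketched in Go under several stated assumptions: `demoPolicy` is a hypothetical simplification and not `pbac.AccessPolicy`, `data_use` values are compared by exact string equality, policies with a higher `priority` are tried first, and `unless` is ignored entirely.

```go
package main

import (
	"fmt"
	"sort"
)

// demoPolicy is a hypothetical, flattened view of one policy YAML entry.
type demoPolicy struct {
	Key      string
	Priority int
	Enabled  bool
	Decision string   // "ALLOW" or another decision value
	AnyOf    []string // match.data_use.any
}

// purposeDataUse mirrors the purpose -> data_use table in this README.
var purposeDataUse = map[string]string{
	"analytics": "analytics.reporting",
	"marketing": "marketing.advertising",
	"billing":   "essential.service.payment_processing",
}

// dataUses resolves dataset purposes to their data_use values,
// skipping purposes that are not defined.
func dataUses(purposes []string) []string {
	var out []string
	for _, p := range purposes {
		if du, ok := purposeDataUse[p]; ok {
			out = append(out, du)
		}
	}
	return out
}

// suppressed returns the key of the first enabled policy (highest
// priority first, an assumption) whose AnyOf contains one of the
// dataset's data_use values, and whether that policy's decision is ALLOW.
func suppressed(policies []demoPolicy, datasetPurposes []string) (string, bool) {
	sorted := append([]demoPolicy(nil), policies...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority > sorted[j].Priority })
	uses := dataUses(datasetPurposes)
	for _, p := range sorted {
		if !p.Enabled {
			continue
		}
		for _, want := range p.AnyOf {
			for _, du := range uses {
				if du == want {
					return p.Key, p.Decision == "ALLOW"
				}
			}
		}
	}
	return "", false
}

// demoPolicies holds the single policy shipped with the fixtures.
func demoPolicies() []demoPolicy {
	return []demoPolicy{{
		Key:      "allow-analytics-on-billing-data",
		Priority: 100,
		Enabled:  true,
		Decision: "ALLOW",
		AnyOf:    []string{"essential.service.payment_processing"},
	}}
}

func main() {
	key, ok := suppressed(demoPolicies(), []string{"billing"})
	fmt.Println("orders:", key, ok)
	key, ok = suppressed(demoPolicies(), []string{"marketing"})
	fmt.Println("campaigns:", key, ok)
}
```

With the demo fixtures this reproduces the two violation rows in the expected-outcomes table: the `orders` violation matches the policy and is suppressed, while the `campaigns` violation matches nothing and stands.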
8 changes: 8 additions & 0 deletions pbac/consumers/analytics-team.yml
@@ -0,0 +1,8 @@
consumer:
  - name: Analytics Team
    description: Product analysts running reporting queries.
    members:
      - alice@demo.example
      - priya@demo.example
    purposes:
      - analytics
8 changes: 8 additions & 0 deletions pbac/consumers/marketing-team.yml
@@ -0,0 +1,8 @@
consumer:
  - name: Marketing Team
    description: Marketing team managing advertising campaigns.
    members:
      - bob@demo.example
      - maria@demo.example
    purposes:
      - marketing
8 changes: 8 additions & 0 deletions pbac/consumers/onboarding-unconfigured.yml
@@ -0,0 +1,8 @@
consumer:
  - name: Onboarding
    description: Registered consumer with no declared purposes. Any member
      of this consumer produces UNCONFIGURED_CONSUMER gaps until purposes
      are declared.
    members:
      - dave@demo.example
    purposes: []
24 changes: 24 additions & 0 deletions pbac/datasets/campaigns.yml
@@ -0,0 +1,24 @@
dataset:
  - fides_key: marketing
    organization_fides_key: default_organization
    name: Marketing Campaigns
    description: Campaign definitions and targeting rules.
    data_categories:
      - user.contact
    data_purposes:
      - marketing
    collections:
      - name: campaigns
        description: Campaign definitions.
        data_categories:
          - user.contact
        fields:
          - name: campaign_id
            data_categories:
              - system.operations
          - name: name
            data_categories:
              - system.operations
          - name: audience_rule
            data_categories:
              - user.contact
24 changes: 24 additions & 0 deletions pbac/datasets/events.yml
@@ -0,0 +1,24 @@
dataset:
  - fides_key: events
    organization_fides_key: default_organization
    name: Product Events
    description: Page views and behavioral events.
    data_categories:
      - user.behavior
    data_purposes:
      - analytics
    collections:
      - name: page_views
        description: Page view events from the web application.
        data_categories:
          - user.behavior
        fields:
          - name: user_id
            data_categories:
              - user.unique_id
          - name: page_path
            data_categories:
              - user.behavior
          - name: event_date
            data_categories:
              - system.operations
48 changes: 48 additions & 0 deletions pbac/datasets/sales.yml
@@ -0,0 +1,48 @@
dataset:
  - fides_key: sales
    organization_fides_key: default_organization
    name: Sales
    description: Order and invoice records.
    data_categories:
      - user.financial
    data_purposes:
      - billing
    collections:
      - name: orders
        description: Customer order records.
        data_categories:
          - user.financial
        fields:
          - name: order_id
            data_categories:
              - system.operations
          - name: customer_id
            data_categories:
              - user.unique_id
          - name: total
            data_categories:
              - user.financial
          - name: order_date
            data_categories:
              - system.operations
      - name: invoices
        description: Invoice records tied to orders. Also used by the
          Analytics Team for revenue reconciliation, so the collection
          declares analytics as an additional purpose beyond the dataset's
          billing default.
        data_categories:
          - user.financial
        data_purposes:
          - analytics
        fields:
          - name: invoice_id
            data_categories:
              - system.operations
          - name: order_id
            data_categories:
              - system.operations
          - name: amount
            data_categories:
              - user.financial
            data_purposes:
              - analytics
19 changes: 19 additions & 0 deletions pbac/entries/alice.txt
@@ -0,0 +1,19 @@
-- alice@demo.example is registered as the Analytics Team (purpose: analytics).
--
-- Compliant: analytics ∩ events.data_purposes = {analytics}
SELECT user_id, page_path FROM page_views WHERE event_date = '2026-04-14';

-- Purpose mismatch at dataset level (analytics vs billing) — SUPPRESSED
-- by the allow-analytics-on-billing-data policy (matches data_use
-- essential.service.payment_processing).
SELECT customer_id, total FROM orders WHERE order_date = '2026-04-14';

-- Compliant via COLLECTION-level purpose: the sales dataset is billing
-- at the dataset level, but the invoices collection adds analytics. Any
-- analytics-team query against invoices passes the purpose check directly
-- without needing a policy override.
SELECT invoice_id, amount FROM invoices WHERE amount > 100;

-- Purpose mismatch (analytics vs marketing) — NO matching policy, so
-- this violation stands.
SELECT campaign_id, name FROM campaigns LIMIT 10;
6 changes: 6 additions & 0 deletions pbac/entries/bob.txt
@@ -0,0 +1,6 @@
-- bob@demo.example is registered as the Marketing Team (purpose: marketing).
--
-- Gap (UNCONFIGURED_DATASET): cold_storage is not a declared collection
-- in any dataset under pbac/datasets/, so table resolution falls through
-- to the qualified name and the engine records a dataset gap.
SELECT archive_key FROM cold_storage LIMIT 10;
6 changes: 6 additions & 0 deletions pbac/entries/carol.txt
@@ -0,0 +1,6 @@
-- carol@demo.example is NOT registered as a consumer.
--
-- Gap (UNRESOLVED_IDENTITY): no consumer in pbac/consumers/ lists
-- carol in its members, so identity resolution returns nothing and
-- every dataset access is recorded as an identity gap.
SELECT COUNT(*) FROM page_views;
6 changes: 6 additions & 0 deletions pbac/entries/dave.txt
@@ -0,0 +1,6 @@
-- dave@demo.example is registered (Onboarding) but declares no purposes.
--
-- Gap (UNCONFIGURED_CONSUMER): the engine emits UNRESOLVED_IDENTITY and
-- the CLI reclassifies it to UNCONFIGURED_CONSUMER because the consumer
-- was found but its purposes list is empty.
SELECT page_path FROM page_views LIMIT 100;