## Gantry

**A Python DICOM Object Model and Redaction Toolkit.**

Gantry provides a high-performance, object-oriented interface for managing, analyzing, and de-identifying DICOM datasets. It is designed for large-scale ingestion, precise pixel redaction, and strict PHI compliance.

1. **Ingest**: Load raw data into the managed session index.
2. **Examine**: Inventory the cohort and equipment.
3. **Configure**: Define privacy tags and redaction rules.
4. **Audit (Target)**: Measure PHI risks against the configuration.
5. **Backup**: (Optional) Securely lock original identities for reversibility.
6. **Anonymize**: Apply remediation to metadata (in-memory).
7. **Redact**: Scrub pixel data for specific machines (in-memory).
8. **Verify**: Re-audit the session to ensure a clean state.
9. **Report**: Generate a signed Compliance Report (Manifest, Exceptions, Audit Trail).
10. **Export**: Write clean DICOM files to disk.

In [1]:
%pip install --force-reinstall "git+https://github.com/kvnlng/Gantry.git"

Collecting git+https://github.com/kvnlng/Gantry.git
  Cloning https://github.com/kvnlng/Gantry.git to /private/var/folders/9_/t_m12zps0xx_3k059_29tbj40000gn/T/pip-req-build-40nvr92e
  Running command git clone --filter=blob:none --quiet https://github.com/kvnlng/Gantry.git /private/var/folders/9_/t_m12zps0xx_3k059_29tbj40000gn/T/pip-req-build-40nvr92e
  Resolved https://github.com/kvnlng/Gantry.git to commit 29abe16c28dc1270cc20d4b75fff3466b573a400
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pydicom>=2.4.0 (from gantry==0.5.3)
  Using cached pydicom-3.0.1-py3-none-any.whl.metadata (9.4 kB)
Collecting numpy>=1.20.0 (from gantry==0.5.3)
  Using cached numpy-2.4.1-cp314-cp314t-macosx_14_0_arm64.whl.metadata (6.6 kB)
Collecting tqdm>=4.65.0 (from gantry==0.5.3)
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting cryptography>=41.0.0 (

### Initialize a Session

Gantry uses a **persistent session** to manage your workflow. Unlike a script that runs once and keeps no state, a Session creates a local SQLite database (e.g. `gantry.db`; this walkthrough uses `cohort.db`) to index your data. This lets you pause, resume, and audit your work without re-scanning thousands of files.

In [3]:
from gantry import Session
session = Session("cohort.db")

Initializing new session at cohort.db...
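
The persistence idea behind a session can be sketched with the standard-library `sqlite3` module. This is only an illustration of why an on-disk index survives between runs; the table name, columns, and the `open_index` helper are hypothetical and are not Gantry's actual schema or API.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_index(db_path):
    """Open (or create) a small SQLite index. The schema is hypothetical."""
    conn = sqlite3.connect(str(db_path))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS instances ("
        "path TEXT PRIMARY KEY, patient_id TEXT, modality TEXT)"
    )
    return conn


db_path = Path(tempfile.mkdtemp()) / "demo_index.db"

# First run: index one file, commit, and close as if the script had exited.
conn = open_index(db_path)
conn.execute(
    "INSERT OR REPLACE INTO instances VALUES (?, ?, ?)",
    ("scan/001.dcm", "PID-0001", "CT"),
)
conn.commit()
conn.close()

# Later run: reopen the same file. The index is still there, so no re-scan is needed.
conn = open_index(db_path)
rows = conn.execute("SELECT path, patient_id, modality FROM instances").fetchall()
conn.close()
print(rows)
```

Because the index lives in a file rather than in process memory, a second process can pick up where the first left off.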


### Ingest & Examine

Ingestion builds a lightweight **metadata index** of your DICOM files. Gantry scans your folders recursively, extracting patient/study/series information into the database *without moving or modifying your original files*. It is resilient to nested directories and non-DICOM clutter.

In [4]:
session.ingest("comprehensive_dicoms")
session.save() # Persist the index to disk

Ingesting from 'comprehensive_dicoms'...


Ingesting: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 140/140 [00:00<00:00, 516.75it/s]

Ingestion complete. Saved session state.
Summary:
  - 140 Patients
  - 140 Studies
  - 140 Series
  - 140 Instances
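
A recursive scan that tolerates non-DICOM clutter can be sketched as follows. It relies on the DICOM Part 10 file layout (a 128-byte preamble followed by the four bytes `DICM`); the helper names are hypothetical, and Gantry's own detection logic may differ (for example, it may also accept files that lack the preamble).

```python
import tempfile
from pathlib import Path


def looks_like_dicom(path):
    """Return True if the file has the 'DICM' marker after the 128-byte preamble."""
    try:
        with open(path, "rb") as f:
            f.seek(128)
            return f.read(4) == b"DICM"
    except OSError:
        return False


def find_dicom_files(root):
    """Walk `root` recursively and return sorted paths that look like DICOM files."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and looks_like_dicom(p)
    )


# Build a throwaway tree: one DICOM-like file in a nested folder, one text file.
root = Path(tempfile.mkdtemp())
(root / "a" / "b").mkdir(parents=True)
(root / "a" / "b" / "img.dcm").write_bytes(b"\x00" * 128 + b"DICM" + b"payload")
(root / "notes.txt").write_text("not an image")

found = find_dicom_files(root)
print([p.name for p in found])
```

The scan only reads a few bytes per file and never writes to the source tree, which matches the "index without moving or modifying" behavior described above.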





In [5]:
# Print a summary of the cohort and equipment
session.examine()


Inventory Summary:
 Patients:  140
 Studies:   140
 Series:    140
 Instances: 140

Equipment Inventory:
 - Agfa - CR 30-X (Count: 5)
 - Agfa - DX-D (Count: 5)
 - Carestream - Classic (Count: 5)
 - Carestream - DRX (Count: 5)
 - Fuji - FCR (Count: 5)
 - Fuji - FDR (Count: 5)
 - GE - Discovery (Count: 5)
 - GE - Discovery MI (Count: 5)
 - GE - Innova (Count: 5)
 - GE - NM 830 (Count: 5)
 - GE - Precision (Count: 5)
 - GE - Revolution (Count: 5)
 - GE - Senographe (Count: 5)
 - GE - Voluson (Count: 5)
 - Gantry - ReportSystem (Count: 5)
 - Gantry - Test (Count: 5)
 - Generic - ScreenCapture (Count: 5)
 - Hologic - Selenia (Count: 5)
 - Philips - Brilliance (Count: 5)
 - Philips - EPIQ (Count: 5)
 - Philips - Ingenia (Count: 5)
 - Siemens - Acuson (Count: 5)
 - Siemens - Artis (Count: 5)
 - Siemens - Biograph (Count: 5)
 - Siemens - Luminos (Count: 5)
 - Siemens - Magnetom (Count: 5)
 - Siemens - Somatom (Count: 5)
 - Siemens - Symbia (Count: 5)


### Create Configuration

Before changing anything, define your privacy rules. Use `create_config` to generate a scaffold based on your inventory.

In [6]:
# Create a default configuration file
session.create_config("config.yaml")

# Load the configuration (rules, tags, jitter)
session.load_config("config.yaml")

Scaffolded Unified Config to config.yaml
Loading configuration from config.yaml...
Configuration Loaded:
 - 140 Machine Redaction Rules
 - 29 PHI Tags
 - Date Jitter: -365 to -1 days
 - Remove Private Tags: True
Tip: Run .audit() to check PHI, or .redact_pixels() to apply redaction.
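
The "Date Jitter: -365 to -1 days" setting can be illustrated with a small date-shifting sketch. It assumes one fixed offset per patient, derived from a secret, so that intervals between a patient's studies are preserved. That is a common design, but it is an assumption here; the functions `jitter_days` and `shift_date` are hypothetical and not part of Gantry.

```python
import hashlib
from datetime import date, timedelta


def jitter_days(patient_id, secret, low=-365, high=-1):
    """Map a patient ID to a fixed offset in [low, high] days (inclusive)."""
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode("utf-8")).digest()
    span = high - low + 1
    return low + int.from_bytes(digest[:8], "big") % span


def shift_date(d, patient_id, secret):
    """Shift a date by the patient's fixed jitter offset."""
    return d + timedelta(days=jitter_days(patient_id, secret))


study = date(2021, 6, 15)
followup = date(2021, 7, 15)
shifted_study = shift_date(study, "PID-0001", "demo-secret")
shifted_followup = shift_date(followup, "PID-0001", "demo-secret")

# Both dates move back by the same number of days, so the gap is unchanged.
print(shifted_study, shifted_followup, shifted_followup - shifted_study)
```

A negative-only range, as in the loaded configuration, guarantees that shifted dates never land in the future.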


In [7]:
# Run an audit to find PHI
report = session.audit() 
session.save_analysis(report)

print(f"Found {len(report)} potential PHI issues.")

Scanning PHI: 100%|███████████████████████████████████████████████████████████████████████████████████| 140/140 [00:00<00:00, 246931.27it/s]

Found 1960 potential PHI issues.
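
Conceptually, an audit compares each instance's metadata against the configured list of PHI tags and records one finding per populated tag. The sketch below shows that idea on plain dictionaries; the `PHI_TAGS` subset, the record layout, and the `audit_records` function are hypothetical and do not reflect Gantry's internal data model.

```python
# Hypothetical subset of PHI tag keywords; a real configuration lists many more.
PHI_TAGS = {"PatientName", "PatientID", "PatientBirthDate"}


def audit_records(records, phi_tags):
    """Return one finding per (instance, PHI tag) pair that holds a non-empty value."""
    findings = []
    for rec in records:
        for tag in sorted(phi_tags):
            value = rec.get(tag)
            if value:
                findings.append(
                    {"instance": rec["SOPInstanceUID"], "tag": tag, "value": value}
                )
    return findings


records = [
    {"SOPInstanceUID": "1.2.3", "PatientName": "Doe^Jane", "PatientID": "PID-1"},
    {"SOPInstanceUID": "1.2.4", "PatientName": "", "Modality": "CT"},
]
findings = audit_records(records, PHI_TAGS)
print(len(findings), [f["tag"] for f in findings])
```

Findings scale with instances times populated PHI tags, which is why a cohort of 140 instances can yield well over a thousand findings.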





### Backup Identity (Optional)

To enable reversible anonymization, generate a cryptographic key and "lock" the original patient identities into a secure, encrypted DICOM tag. This must be done *before* anonymization.

In [8]:
# Enable encryption (generates 'gantry.key')
session.enable_reversible_anonymization()

# Cryptographically lock identities for all patients found in the audit.
# Optional: specify which tags to preserve (defaults to Name, ID, DOB, Sex, Accession):
# session.lock_identities(report, tags_to_lock=["0010,0010", "0010,0020", "0010,0030"])
session.lock_identities(report)

session.save()

Locking Identities: 100%|█████████████████████████████████████████████████████████████████████████| 140/140 [00:00<00:00, 11190.14patient/s]


Gantry supports **Exploratory Data Analysis (EDA)**. You can interrogate your cohort using Pandas and perform targeted exports based on metadata criteria.

In [9]:
df = session.export_dataframe(expand_metadata=True)
df.loc[df.Modality == 'CT', ['PatientID', 'PatientName']][0:10]



Exported metadata to export_metadata.csv


| | PatientID | PatientName |
|---|---|---|
| 0 | PID-42589 | Pierce^Glenn |
| 21 | PID-89664 | Ellis^David |
| 27 | PID-00467 | Anderson^Jennifer |
| 33 | PID-48145 | Smith^Aaron |
| 36 | PID-37610 | Murphy^Cindy |
| 38 | PID-84046 | Miller^Amanda |
| 46 | PID-91157 | Welch^Leonard |
| 47 | PID-85557 | Brandt^Mike |
| 61 | PID-58693 | Adams^Jennifer |
| 80 | PID-20628 | Rivera^Clarence |


**Anonymize**: Strips or replaces metadata tags (PatientID, Names, Dates) based on your config.

In [10]:
# Apply metadata remediation (anonymization) using the findings
session.anonymize(report)
session.save()

Anonymizing Metadata: 100%|█████████████████████████████████████████████████████████████████████| 1960/1960 [00:00<00:00, 48105.16finding/s]

Anonymized/Remediated None tags according to policy.





In [11]:
df = session.export_dataframe(expand_metadata=True)
df.loc[df.Modality == 'CT', ['PatientID', 'PatientName']][0:10]

Exported metadata to export_metadata.csv


| | PatientID | PatientName |
|---|---|---|
| 0 | ANON_8c1c7e63cdec | ANONYMIZED |
| 21 | ANON_0d373fb19710 | ANONYMIZED |
| 27 | ANON_0e295da9c9c3 | ANONYMIZED |
| 33 | ANON_20390a10b4cc | ANONYMIZED |
| 36 | ANON_34d8b3092e05 | ANONYMIZED |
| 38 | ANON_6de36b3bddc9 | ANONYMIZED |
| 46 | ANON_894a95a83c29 | ANONYMIZED |
| 47 | ANON_25923c79f1b5 | ANONYMIZED |
| 61 | ANON_e73e8b4ea3c3 | ANONYMIZED |
| 80 | ANON_dec057f22ceb | ANONYMIZED |
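
The replacement IDs above have the shape `ANON_` plus 12 hexadecimal characters. One common way to produce stable pseudonyms of that shape is a keyed hash (HMAC), sketched below. This is an assumption for illustration only: the source does not say how Gantry derives its IDs, and `pseudonymize` is a hypothetical helper.

```python
import hashlib
import hmac


def pseudonymize(patient_id, key, prefix="ANON_", length=12):
    """Derive a stable pseudonym from a patient ID using HMAC-SHA256."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:length]


key = b"demo-key-not-for-production"
first = pseudonymize("PID-42589", key)
again = pseudonymize("PID-42589", key)
other = pseudonymize("PID-89664", key)

# The same input always maps to the same pseudonym; different inputs differ.
print(first, again == first, other != first)
```

A keyed hash keeps the mapping consistent across studies for the same patient while making it impractical to reverse without the key.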


### Recover Identity (Optional)

If you have a valid key (`gantry.key`) and need to retrieve the original identity of an anonymized patient:

In [None]:
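# Restore the original identity for each anonymized patient in the DataFrame.
# Iterating over unique IDs avoids repeated calls for patients with several rows.
for anon_id in df["PatientID"].unique():
    session.recover_patient_identity(anon_id, restore=True)

session.save()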

In [None]:
df = session.export_dataframe(expand_metadata=True)
df.loc[df.Modality == 'CT', ['PatientID', 'PatientName']][0:10]

**Redact**: Loads pixel data and scrubs burned-in PHI from defined regions.

In [12]:
# Apply pixel redaction rules (requires config to be loaded)
session.redact()

No matching images found for any loaded rules.
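
Region-based pixel redaction amounts to overwriting a rectangle of the image with a fill value. The sketch below does this on a plain nested list so that it runs without extra dependencies; real pixel data would typically be a NumPy array, and `redact_region` with its coordinate convention is hypothetical rather than Gantry's rule format.

```python
def redact_region(pixels, top, left, bottom, right, fill=0):
    """Return a copy of a 2D grid with rows [top, bottom) and columns [left, right) filled."""
    out = [row[:] for row in pixels]
    for r in range(max(top, 0), min(bottom, len(out))):
        row = out[r]
        for c in range(max(left, 0), min(right, len(row))):
            row[c] = fill
    return out


# A 4x6 image of constant value 9; blank the top-left 1x4 banner.
image = [[9] * 6 for _ in range(4)]
clean = redact_region(image, top=0, left=0, bottom=1, right=4)
for row in clean:
    print(row)
```

Returning a copy leaves the input untouched, which mirrors the in-memory behavior described in the workflow: nothing reaches disk until export.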


**Export**: The final "Gatekeeper". Writes clean files to a new directory. Setting `safe=True` ensures the export halts if any verification checks fail (e.g., corrupt images or missing codecs).

In [13]:
df = session.export_dataframe(expand_metadata=True)
cohort_df = df[df.Modality == 'CT']
# Export only safe (clean) data to a new folder
# compression="j2k" optionally compresses the output to JPEG 2000
session.export("export_clean", subset=cohort_df, safe=True, compression="j2k")

Exported metadata to export_metadata.csv


Scanning PHI: 100%|██████████████████████████████████████████████████████████████████████████████████████| 140/140 [00:00<00:00, 670.74it/s]


Preparing export plan...
Saving pending changes to free memory...


Releasing Memory: 100%|█████████████████████████████████████████████████████████████████████████████| 140/140 [00:00<00:00, 2150925.13img/s]

Memory Cleanup: Released 140 images from RAM.
Exporting 15 images from 140 patients...



Exporting: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 17.28it/s]

Done.
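
The gatekeeper idea (verify first, write only if everything passes) can be sketched as below. It assumes an all-or-nothing policy in which a single failed check blocks the whole export; the source only states that the export halts on failure, so the exact policy, the `gated_export` function, and the `no_plain_ids` check are hypothetical.

```python
import tempfile
from pathlib import Path


class ExportBlocked(Exception):
    """Raised when at least one item fails verification."""


def gated_export(items, out_dir, check):
    """Write every item only if all of them pass `check`; otherwise write nothing."""
    failures = sorted(name for name, payload in items.items() if not check(payload))
    if failures:
        raise ExportBlocked(f"verification failed for: {failures}")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, payload in items.items():
        (out / name).write_bytes(payload)
    return len(items)


def no_plain_ids(payload):
    # Hypothetical check: reject payloads that still contain an original-style ID.
    return b"PID-" not in payload


out_dir = Path(tempfile.mkdtemp()) / "export_clean"

try:
    gated_export({"a.bin": b"ANON_1", "b.bin": b"PID-42589"}, out_dir, no_plain_ids)
    blocked = False
except ExportBlocked:
    blocked = True

written = gated_export({"a.bin": b"ANON_1"}, out_dir, no_plain_ids)
print(blocked, written)
```

Running all checks before the first write means a failed export leaves no partial output directory behind.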





### Compliance Report

Generate single-step, audit-ready Markdown reports for HIPAA/GDPR documentation. Reports include:

- **Cohort Manifest**: Summary of all processed patients/studies.
- **Audit Trail**: Aggregated counts of every action (Anonymize, Redact, Export).
- **Exceptions**: Explicit listing of any warnings or errors encountered.
- **Validation Status**: Automatic `PASS`/`REVIEW_REQUIRED` grading.
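
The report sections listed above can be pictured as a simple aggregation over an event log. The sketch below counts actions, collects anything that is not `ok`, and grades the run. The event format and the `summarize` function are hypothetical, and the grading rule (any exception means `REVIEW_REQUIRED`) is an assumption about how such a status might be derived.

```python
from collections import Counter


def summarize(events):
    """Aggregate (action, status) events into an audit trail, exceptions, and a grade."""
    audit_trail = Counter(action for action, _ in events)
    exceptions = [(action, status) for action, status in events if status != "ok"]
    grade = "PASS" if not exceptions else "REVIEW_REQUIRED"
    return {
        "audit_trail": dict(audit_trail),
        "exceptions": exceptions,
        "validation_status": grade,
    }


clean_run = summarize([("Anonymize", "ok"), ("Anonymize", "ok"), ("Export", "ok")])
flagged_run = summarize([("Anonymize", "ok"), ("Export", "warning: missing codec")])
print(clean_run["validation_status"], flagged_run["validation_status"])
```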

In [None]:
# Generate a Markdown report
session.generate_report("compliance_report.md")

In [None]:
from IPython.display import Markdown, display

with open('compliance_report.md', 'r', encoding='utf-8') as f:
    content = f.read()
display(Markdown(content))

In [None]:
session.generate_manifest("manifest.html", format="html")

In [None]:
session.generate_manifest("manifest.json", format="json")