# Petrinex Data Loader - Databricks Example

Load Alberta Petrinex data (Volumetrics, NGL) using the unified PetrinexClient.

**Features:**
- Memory-efficient incremental loading
- Unity Catalog compatible (no `ANY FILE` privilege needed)
- Handles ZIP extraction, encoding, and malformed rows automatically
- Progress tracking with row counts

| Data Type | Description |
|-----------|-------------|
| `Vol` | Conventional Volumetrics (oil & gas production) |
| `NGL` | NGL and Marketable Gas Volumes |

## Setup

In [None]:
# Install directly from GitHub
%pip install git+https://github.com/guanjieshen/petrinex-python-api.git

# Or install from a specific branch
# %pip install git+https://github.com/guanjieshen/petrinex-python-api.git@feature/ngl-gas-support

from petrinex import PetrinexClient

## Initialize Client

In [None]:
# Create client for Volumetrics data
client = PetrinexClient(spark=spark, jurisdiction="AB", data_type="Vol")

print("✓ Client initialized")
print(f"  Data type: {client.data_type}")
print(f"  Jurisdiction: {client.jurisdiction}")

## Load Data

Two date options:
- `updated_after="2025-12-01"` → files updated AFTER this date (incremental)
- `from_date="2021-01-01"` → ALL data from this production month onwards

In [None]:
# Load files updated after 2025-12-01 (incremental load)
df = client.read_spark_df(updated_after="2025-12-01")

print(f"\n✅ Loaded {df.count():,} rows")
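
The `updated_after` value in the cell above is a fixed date. For a rolling window such as "the last 30 days", the cutoff can be computed at run time instead. The helper below is a minimal sketch: `rolling_cutoff` is a hypothetical name and not part of the `petrinex` package, and it assumes `updated_after` accepts a `YYYY-MM-DD` string as in the example above.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_cutoff(days: int = 30, today: Optional[date] = None) -> str:
    """Return the ISO date (YYYY-MM-DD) that lies `days` before `today`.

    Hypothetical helper, not part of the petrinex package.
    `today` defaults to the current date and can be overridden for testing.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    reference = today if today is not None else date.today()
    return (reference - timedelta(days=days)).isoformat()


# Cutoff for files updated in the last 30 days, relative to the run date
cutoff = rolling_cutoff(30)
print(f"Using updated_after={cutoff}")

# Assumed usage with the client created earlier in this notebook:
# df = client.read_spark_df(updated_after=cutoff)
```

With a pinned reference date, `rolling_cutoff(30, today=date(2025, 12, 31))` returns `"2025-12-01"`, which matches the fixed date used above.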

## Explore Data

In [None]:
# Show schema
df.printSchema()

# Show sample data
display(df.limit(10))

## (Optional) Load NGL Data

In [None]:
# Uncomment to load NGL and Marketable Gas data:

# ngl_client = PetrinexClient(spark=spark, data_type="NGL")
# ngl_df = ngl_client.read_spark_df(updated_after="2025-12-01")
# print(f"\n✅ NGL data: {ngl_df.count():,} rows")

## (Optional) Save to Delta Table

In [None]:
# Uncomment to save to Delta:

# df.write.format("delta") \
#   .mode("overwrite") \
#   .saveAsTable("main.petrinex.volumetrics")
# 
# print("✓ Saved to Delta table")