# Petrinex Data Loader - Databricks Example

Load Alberta Petrinex data (Volumetrics, NGL) using the unified PetrinexClient.

**Features:**
- Memory-efficient incremental loading
- Unity Catalog compatible (no `ANY FILE` privilege needed)
- Handles ZIP extraction, encoding, and malformed rows automatically
- Progress tracking with row counts

| Data Type | Description |
|-----------|-------------|
| `Vol` | Conventional Volumetrics (oil & gas production) |
| `NGL` | NGL and Marketable Gas Volumes |
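
The automatic handling listed under Features (ZIP extraction, encoding, malformed rows) can be pictured with a standard-library sketch. This is not the library's implementation: the helper name `read_zipped_csv`, the encoding fallback order, and the rule that a row is malformed when its width differs from the header are all assumptions made for illustration.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes, encodings=("utf-8", "cp1252")):
    """Return (header, good_rows, skipped_count) for the first CSV in a ZIP.

    Hypothetical helper: illustrates the idea only, not PetrinexClient internals.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Pick the first CSV member; raises StopIteration if there is none.
        name = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        raw = archive.read(name)

    # Try each encoding in order and keep the first that decodes cleanly.
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        # Last resort: substitute replacement characters rather than fail.
        text = raw.decode(encodings[0], errors="replace")

    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    good_rows, skipped = [], 0
    for row in reader:
        # Assumption: a row is malformed when its width differs from the header.
        if len(row) == len(header):
            good_rows.append(row)
        else:
            skipped += 1
    return header, good_rows, skipped
```

Counting skipped rows instead of silently dropping them makes it possible to report how much of a file was discarded.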

## Setup

In [None]:
# Install directly from GitHub
%pip install git+https://github.com/guanjieshen/petrinex-python-api.git

# Or install from a specific branch
# %pip install git+https://github.com/guanjieshen/petrinex-python-api.git@feature/ngl-gas-support

from petrinex import PetrinexClient

## Initialize Client

In [None]:
# Create client for Volumetrics data
client = PetrinexClient(spark=spark, jurisdiction="AB", data_type="Vol")

print("✓ Client initialized")
print(f"  Data type: {client.data_type}")
print(f"  Jurisdiction: {client.jurisdiction}")

## Load Data

Date options:
- `updated_after="2025-12-01"` → files updated AFTER this date (incremental)
- `from_date="2021-01-01"` → ALL data from this production month onwards
- `end_date="2023-12-31"` → optional end date (use with `from_date` for date ranges)
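
To make the difference between the date options concrete, here is a small pure-Python sketch that filters hypothetical file records the way the options are described above. The record layout (`production_month`, `updated`) and the function `select_files` are assumptions for illustration; the sketch also assumes `end_date` is inclusive and simply applies every option that is passed, which may differ from how the library combines them.

```python
from datetime import date


def select_files(files, updated_after=None, from_date=None, end_date=None):
    """Filter hypothetical file records by the three date options.

    Each record is a dict with 'production_month' ("YYYY-MM") and
    'updated' (a datetime.date). Not part of the petrinex package.
    """
    selected = []
    for record in files:
        # updated_after: keep files updated strictly after the given date.
        if updated_after is not None:
            if record["updated"] <= date.fromisoformat(updated_after):
                continue
        # from_date: keep production months from that month onwards.
        if from_date is not None and record["production_month"] < from_date[:7]:
            continue
        # end_date: assumed inclusive upper bound on the production month.
        if end_date is not None and record["production_month"] > end_date[:7]:
            continue
        selected.append(record)
    return selected


files = [
    {"production_month": "2020-12", "updated": date(2025, 11, 30)},
    {"production_month": "2021-01", "updated": date(2025, 12, 1)},
    {"production_month": "2023-12", "updated": date(2025, 12, 15)},
    {"production_month": "2024-01", "updated": date(2026, 1, 2)},
]

# Incremental: selects on the update date, regardless of production month.
incremental = select_files(files, updated_after="2025-12-01")

# Historical range: selects on the production month, regardless of update date.
historical = select_files(files, from_date="2021-01-01", end_date="2023-12-31")
```

The key point is that `updated_after` looks at when a file changed, while `from_date` and `end_date` look at which production month the file covers.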

For large loads (20+ files), use `uc_table` to write directly to a Delta table:
- Avoids memory issues and Spark Connect timeouts
- Each file is written immediately (no accumulation in memory)
- Safety: only appends to tables created by this library
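
The "written immediately, no accumulation" behaviour can be sketched as a loop that hands each file's rows to a sink before reading the next file. This is a simplified illustration, not the library's code: `load_incrementally`, `read_file`, and `write_batch` are hypothetical names, and in practice the sink would be an append to the Delta table named by `uc_table`.

```python
def load_incrementally(file_names, read_file, write_batch):
    """Read each file and write its rows before moving on to the next one.

    Hypothetical sketch: read_file(name) returns a list of rows and
    write_batch(name, rows) persists them (for example, a table append).
    """
    total = 0
    for name in file_names:
        rows = read_file(name)    # only one file's rows are held at a time
        write_batch(name, rows)   # persisted before the next file is read
        total += len(rows)
        print(f"{name}: {len(rows):,} rows (running total {total:,})")
    return total
```

Because nothing is kept after each write, peak memory depends on the largest single file rather than on the total size of the load.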

In [None]:
# Standard load (good for small data - under 20 files)
df = client.read_spark_df(updated_after="2025-12-01")

# For large loads, write directly to a Unity Catalog table (recommended for 20+ files)
# Creates the table if it doesn't exist; validates and appends if it does
# df = client.read_spark_df(
#     from_date="2020-01-01",
#     uc_table="main.petrinex.volumetrics"
# )

# To replace existing data, truncate first:
# spark.sql("TRUNCATE TABLE main.petrinex.volumetrics")
# df = client.read_spark_df(from_date="2020-01-01", uc_table="main.petrinex.volumetrics")

print(f"\n✅ Loaded {df.count():,} rows")

## Explore Data

In [None]:
# Show schema
df.printSchema()

# Show sample data
display(df.limit(10))

## (Optional) Download Files to Local Directory


In [None]:
# Download Petrinex files to local directory (e.g., for archival or offline analysis)
# Files are extracted from ZIP and saved as CSVs in subdirectories by production month
# Example: /dbfs/petrinex_data/2025-12/Vol_2025-12.csv
# Uncomment to download:

# paths = client.download_files(
#     output_dir="/dbfs/petrinex_data",  # Use /dbfs/ prefix for Databricks DBFS
#     updated_after="2025-12-01"
# )
# print(f"✓ Downloaded {len(paths)} file(s)")
#
# # Example: download historical range
# # paths = client.download_files(
# #     output_dir="/dbfs/petrinex_data",
# #     from_date="2021-01-01",
# #     end_date="2023-12-31"
# # )
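
Once files have been downloaded, it can be handy to check what landed in each production-month subdirectory. The sketch below assumes the layout shown in the comment above (`<output_dir>/<YYYY-MM>/<file>.csv`); `files_by_month` is a hypothetical helper built only on `pathlib`, not part of the petrinex package.

```python
from collections import defaultdict
from pathlib import Path


def files_by_month(output_dir):
    """Map each production-month subdirectory name to its sorted CSV file names.

    Assumes one subdirectory per production month directly under output_dir.
    """
    grouped = defaultdict(list)
    for csv_path in sorted(Path(output_dir).glob("*/*.csv")):
        grouped[csv_path.parent.name].append(csv_path.name)
    return dict(grouped)
```

For example, `files_by_month("/dbfs/petrinex_data")` would return a dictionary keyed by month folder names such as `2025-12`, provided the download used that output directory.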


## (Optional) Load NGL Data

In [None]:
# Uncomment to load NGL and Marketable Gas data:

# ngl_client = PetrinexClient(spark=spark, data_type="NGL")
# ngl_df = ngl_client.read_spark_df(updated_after="2025-12-01")
# print(f"\n✅ NGL data: {ngl_df.count():,} rows")

## (Optional) Save to Delta Table

In [None]:
# Uncomment to save to Delta:

# df.write.format("delta") \
#   .mode("overwrite") \
#   .saveAsTable("main.petrinex.volumetrics")
# 
# print("✓ Saved to Delta table")