# CSV Batch Reader - Read from Volume and Write to Delta Table

This notebook demonstrates **batch ingestion**: it reads CSV files from a Unity Catalog Volume and writes them to a Delta table.

## Key Features:
- **Batch read** using standard Spark CSV reader for one-time or scheduled processing
- **Schema inference**: column names come from the header row and column types are detected from the data (`inferSchema`)
- **Single batch operation**: all files under the source path are read and written together

## Difference from Streaming:
- **Batch**: Reads all data at once, completes, and stops (this notebook)
- **Streaming**: Continuously monitors for new data and processes incrementally (use Auto Loader with `readStream`)
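For contrast, the streaming counterpart might look roughly like the sketch below. It assumes a Databricks runtime where Auto Loader (`cloudFiles`) is available. The function name and the `checkpoint_path` parameter are hypothetical and are not used elsewhere in this notebook.

```python
def start_streaming_csv_load(spark, source_path, target_table, checkpoint_path):
    """Sketch: incrementally ingest CSV files with Auto Loader (Databricks only)."""
    stream_df = (
        spark.readStream
        .format("cloudFiles")                                  # Auto Loader source
        .option("cloudFiles.format", "csv")                    # underlying file format
        .option("cloudFiles.schemaLocation", checkpoint_path)  # where the inferred schema is stored
        .option("header", "true")
        .load(source_path)
    )
    # The checkpoint lets the stream remember which files it has already processed.
    return (
        stream_df.writeStream
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )
```

Unlike the batch cells in this notebook, the returned streaming query keeps running until it is stopped.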


## Configuration Variables

Define all paths and table names as variables for easy customization:


In [None]:
# Catalog, schema, and table configuration
CATALOG = "jpg"
SCHEMA = "default"
TABLE_NAME = "csv_batch"
FULL_TABLE_NAME = f"{CATALOG}.{SCHEMA}.{TABLE_NAME}"

# Volume paths configuration
VOLUME_BASE = f"/Volumes/{CATALOG}/{SCHEMA}"
SOURCE_PATH = f"{VOLUME_BASE}/csvs"


print(f"Source Path: {SOURCE_PATH}")
print(f"Target Table: {FULL_TABLE_NAME}")


## Read CSV Files Using Spark Batch Reader

For batch processing, use the standard Spark CSV reader. This reads all files at once and is ideal for one-time loads or scheduled batch jobs.

**Note**: Auto Loader (`cloudFiles`) is designed for streaming with `readStream`, not batch `read` operations.
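Schema inference is convenient, but Spark needs an extra pass over the data to work out column types. When the file layout is known in advance, a declared schema can be supplied instead. The sketch below is one way to do that. The column names and types in `EXAMPLE_SCHEMA_DDL` are hypothetical placeholders, and `read_csv_with_schema` is an illustrative helper, not part of this notebook.

```python
# Hypothetical columns; replace with the actual layout of the CSV files.
EXAMPLE_SCHEMA_DDL = "id INT, name STRING, created_at TIMESTAMP"


def read_csv_with_schema(spark, path, schema_ddl=EXAMPLE_SCHEMA_DDL):
    """Batch-read CSV files using a declared schema instead of inferSchema."""
    return (
        spark.read
        .format("csv")
        .option("header", "true")
        .option("sep", ",")
        .schema(schema_ddl)  # DDL-formatted string, so no inference pass is needed
        .load(path)
    )
```

In this notebook it could be called as `df = read_csv_with_schema(spark, SOURCE_PATH)`, assuming the declared columns match the files.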


In [None]:
# Read CSV files from volume using standard Spark batch reader
df = spark.read \
  .format("csv") \
  .option("header", "true") \
  .option("sep", ",") \
  .option("inferSchema", "true") \
  .load(SOURCE_PATH)

# Display the data
display(df)

# Show the row count and the inferred schema
# (count() triggers a full scan of the source files)
print(f"Total rows: {df.count()}")
df.printSchema()


## Write to Delta Table (Batch)

Write all the data to a Delta table in a single batch operation.


In [None]:
# Write to Delta table in batch mode
df.write \
  .format("delta") \
  .mode("overwrite") \
  .saveAsTable(FULL_TABLE_NAME)

print(f"Successfully wrote all data to {FULL_TABLE_NAME}")


## Alternative Write Modes

The code above uses `mode("overwrite")`, which replaces the entire contents of the table. The available save modes are:

- **`append`**: Add the new rows to the existing table
- **`overwrite`**: Replace the entire table (used above)
- **`ignore`**: Write only if the table doesn't exist; otherwise skip the write without raising an error
- **`error` or `errorifexists`**: Raise an error if the table already exists (default)
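Because a mistyped mode only fails once the write is attempted, a small wrapper that checks the mode first can be handy in scheduled jobs. The following is a minimal sketch under that assumption. `write_delta` and `VALID_SAVE_MODES` are hypothetical names introduced here for illustration.

```python
# The save modes listed above; Spark treats mode strings case-insensitively.
VALID_SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def write_delta(df, table_name, mode="errorifexists"):
    """Validate the save mode, then write the DataFrame to a Delta table."""
    normalized = mode.lower()
    if normalized not in VALID_SAVE_MODES:
        raise ValueError(f"Unsupported save mode: {mode!r}")
    df.write.format("delta").mode(normalized).saveAsTable(table_name)
    return normalized
```

For example, `write_delta(df, FULL_TABLE_NAME, mode="append")` would behave like the append cell in the next section.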

### Example: Append Mode


In [None]:
# Alternative: Append mode (uncomment to use)
# This will add new rows to the existing table without deleting old data

# df.write \
#   .format("delta") \
#   .mode("append") \
#   .saveAsTable(FULL_TABLE_NAME)
# 
# print(f"Successfully appended data to {FULL_TABLE_NAME}")
