# Databricks Utilities (dbutils) Deep Dive

**Objective:**
In this notebook, we will master **Databricks Utilities (dbutils)**. These are built-in libraries provided by Databricks to perform common tasks efficiently within notebooks.

**Agenda:**
1.  Introduction to `dbutils`.
2.  **File System Utilities (`fs`):** Listing, copying, creating directories, and reading files.
3.  **Widget Utilities (`widgets`):** Creating input parameters and making notebooks dynamic.
4.  Overview of other utilities (`secrets`, `notebook`).

**Note:** `dbutils` is available in Python, R, and Scala notebooks.

## 1. Exploring Available Utilities
To see all available utilities, we can use the `.help()` command on the main object.

In [None]:
# List all available utilities
dbutils.help()

## 2. File System Utilities (dbutils.fs)
This module provides a programmatic interface to interact with the Databricks File System (DBFS), External Locations, and Volumes. It mimics standard shell commands like `ls`, `cp`, `mv`, etc.

In [None]:
# Check help specifically for file system utilities
dbutils.fs.help()

### Listing Files (`ls`)
Paths without a scheme are resolved against DBFS, so `ls("/")` lists the DBFS root (`dbfs:/`).
*Tip: Wrap it in `display()` for a formatted table view.*

In [None]:
# List root directory
display(dbutils.fs.ls("/"))

# You can also look into local driver file system using 'file:/'
# display(dbutils.fs.ls("file:/tmp"))
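The prefix of a path decides which storage `dbutils.fs` talks to. A minimal sketch of that rule in plain Python follows. `classify_path` is a hypothetical helper written only for illustration (it is not part of `dbutils`), and it covers just the locations mentioned in this notebook.

```python
def classify_path(path: str) -> str:
    """Return which storage a dbutils.fs-style path points at (illustrative only)."""
    if path.startswith("file:/"):
        return "driver-local"   # local disk of the driver node
    if path.startswith("/Volumes/"):
        return "volume"         # Unity Catalog Volume
    if path.startswith("dbfs:/") or path.startswith("/"):
        return "dbfs"           # no scheme: dbutils.fs assumes dbfs:/
    return "other"              # anything else, e.g. a cloud storage URI


for p in ["/", "file:/tmp", "/Volumes/dev/bronze/managed_vol/input/csv"]:
    print(p, "->", classify_path(p))
```

The order of the checks matters: `/Volumes/...` also starts with `/`, so it has to be tested before the generic DBFS case.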

### Practical Scenario: Moving Data
Let's simulate a real-world scenario:
1.  Download a CSV file from the internet to the driver's local storage using `%sh`.
2.  Create a directory in a Unity Catalog Volume (or DBFS) using `dbutils`.
3.  Copy the file from local driver to that storage using `dbutils`.

In [None]:
%sh
# Step 1: Download a file to local driver (/tmp)
# Note: the %sh magic command must be the first line of the cell
wget https://media.githubusercontent.com/media/subhamkharwal/pyspark-zero-to-hero/refs/heads/master/datasets/emp.csv -O /tmp/emp.csv
ls -ltr /tmp/emp.csv

In [None]:
# Step 2: Create a directory in your Volume (or DBFS)
# Replace the path below with your actual Volume path created in the previous lesson
# Example: /Volumes/<catalog>/<schema>/<volume>/input/csv
dest_path = "/Volumes/dev/bronze/managed_vol/input/csv" 

# Creating directory
dbutils.fs.mkdirs(dest_path)
print(f"Directory created: {dest_path}")

In [None]:
# Step 3: Copy file from Local Driver (file:/) to Volume
source_file = "file:/tmp/emp.csv"
target_file = f"{dest_path}/emp.csv"

# Usage: dbutils.fs.cp(source, destination, recurse=False)
dbutils.fs.cp(source_file, target_file)

print("File copied successfully.")

In [None]:
# Verify the file exists
display(dbutils.fs.ls(dest_path))
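`dbutils.fs.ls` returns a list of entries with `path`, `name`, `size`, and `modificationTime` fields, so the listing can also be checked in code rather than by eye. The sketch below assumes that shape and uses a `namedtuple` stand-in (hypothetical, with made-up sample entries) so it runs outside Databricks. `summarize_csv_files` is an illustrative helper, not a `dbutils` function.

```python
from collections import namedtuple

# Stand-in for the entries returned by dbutils.fs.ls (fields assumed as described above).
FileInfo = namedtuple("FileInfo", ["path", "name", "size", "modificationTime"])


def summarize_csv_files(entries):
    """Return (file_names, total_bytes) for entries whose name ends in .csv."""
    csv_entries = [e for e in entries if e.name.lower().endswith(".csv")]
    return [e.name for e in csv_entries], sum(e.size for e in csv_entries)


# Hypothetical listing; in the notebook you would pass dbutils.fs.ls(dest_path).
sample = [
    FileInfo("dbfs:/example/a.csv", "a.csv", 120, 0),
    FileInfo("dbfs:/example/b.txt", "b.txt", 30, 0),
    FileInfo("dbfs:/example/sub/", "sub/", 0, 0),
]
names, total_bytes = summarize_csv_files(sample)
print(names, total_bytes)
```

In the notebook, an empty `names` list after the copy step would indicate that the file did not land where expected.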

### Reading File Head (`head`)
You can read the first few bytes of a file to preview its content.

In [None]:
# Read up to the first 100 bytes of the file (returned as a string)
dbutils.fs.head(target_file, 100)
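Because `head` returns a plain string, the preview can be handed to ordinary Python tools. As a rough sketch, the helper below pulls the column names out of the first line of a CSV preview. `preview_columns` and the sample text are hypothetical and do not reflect the actual contents of `emp.csv`.

```python
import csv
import io


def preview_columns(preview: str) -> list:
    """Parse the first line of a CSV preview string into column names."""
    first_line = preview.splitlines()[0] if preview else ""
    # csv.reader handles quoted column names; an empty preview yields [].
    return next(csv.reader(io.StringIO(first_line)), [])


# Hypothetical preview text; in the notebook: dbutils.fs.head(target_file, 100)
sample_preview = "id,name,city\n1,Ada,Lon"
print(preview_columns(sample_preview))
```

Keep in mind that the preview is cut at the byte limit, so the last row (or a very long header line) may be truncated.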

---
## 3. Widget Utilities (dbutils.widgets)
Widgets allow you to parameterize your notebooks. You can pass inputs to your code dynamically.

Common types: `text`, `dropdown`, `combobox`, `multiselect`.

In [None]:
# Explore widget options
dbutils.widgets.help()

In [None]:
# Create a Text Input Widget
# Syntax: dbutils.widgets.text(name, defaultValue, label)
dbutils.widgets.text("input_cust_id", "10000", "Customer ID")

print("Widget created. Look at the top of the notebook to see the input box.")

In [None]:
# Retrieve the value from the widget
cust_id = dbutils.widgets.get("input_cust_id")

print(f"The selected Customer ID is: {cust_id}")
print(f"Type of input: {type(cust_id)}") # Note: Widgets always return strings
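Since the widget value arrives as a string, it usually needs converting and validating before it is used in a filter or calculation. A minimal sketch, assuming a customer ID must be a positive whole number (that rule and the helper name `parse_customer_id` are assumptions for illustration):

```python
def parse_customer_id(raw: str) -> int:
    """Convert a widget string to a positive int, raising ValueError otherwise."""
    text = raw.strip()
    try:
        value = int(text)
    except ValueError:
        raise ValueError(f"Customer ID must be a whole number, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"Customer ID must be greater than zero, got {value}")
    return value


# In the notebook: parse_customer_id(dbutils.widgets.get("input_cust_id"))
print(parse_customer_id("10000"))
```

Failing early with a clear message is generally easier to debug than letting a malformed value reach a query.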

### Using Widgets in SQL
You can reference widget values directly in SQL cells.

*   **DBR 15.1 and below:** Use `${widget_name}` or `$widget_name` (text substitution)
*   **DBR 15.2 and above:** Use `:widget_name` (named parameter markers); the dollar syntax is deprecated there

In [None]:
%sql
-- Example using dollar syntax (the widget value is substituted as text before the query runs)
SELECT '${input_cust_id}' AS selected_id_dollar_syntax;

In [None]:
# Cleanup: Remove all widgets to keep the UI clean
dbutils.widgets.removeAll()

## 4. Other Utilities Overview

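1.  **`dbutils.secrets`**: Used to read credentials (passwords, keys) from Databricks secret scopes, which can optionally be backed by Azure Key Vault. Secrets are created outside the notebook (CLI or REST API); `dbutils.secrets` only retrieves them. *We will cover this in the DevOps/Security section.*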
2.  **`dbutils.notebook`**: Used to chain notebooks together (e.g., `dbutils.notebook.run("child_notebook", 60)`, where `60` is the timeout in seconds) to build workflows. *We will cover this in the Jobs & Workflows section.*
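`dbutils.notebook.run` takes a notebook path, a timeout in seconds, and a dictionary of string arguments that the child notebook reads through its widgets. The sketch below wraps that call so non-string values are converted first. `run_child` and `fake_run` are hypothetical names, the 60-second default is arbitrary, and the runner is injected so the example also works outside Databricks.

```python
def run_child(run_fn, path, params, timeout_seconds=60):
    """Call run_fn(path, timeout_seconds, arguments) with all keys and values as strings."""
    arguments = {str(k): str(v) for k, v in params.items()}
    return run_fn(path, timeout_seconds, arguments)


def fake_run(path, timeout_seconds, arguments):
    # Local stand-in for dbutils.notebook.run so the sketch runs anywhere.
    return f"{path}|{timeout_seconds}|{sorted(arguments.items())}"


# In a notebook you would pass dbutils.notebook.run instead of fake_run.
print(run_child(fake_run, "child_notebook", {"input_cust_id": 10000}))
```

Inside the child notebook, `dbutils.widgets.get("input_cust_id")` would then return the value as a string.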

## Summary
*   `dbutils` is your Swiss Army knife for Databricks notebooks.
*   Use `dbutils.fs` for moving files between local driver, DBFS, and Object Storage/Volumes.
*   Use `dbutils.widgets` to make your notebooks interactive and reusable.