# Databricks Workflows & Orchestration
## Zero to Hero: Jobs, Task Values, If/Else, and For Each Loops

**Objective:** Learn how to orchestrate complex data pipelines using Databricks Workflows. We will move beyond chaining notebooks with `dbutils.notebook.run` and use the native Jobs UI to handle dependencies, conditions, and loops.

### Agenda
1.  **Task Values:** Passing variables between different tasks in a Job.
2.  **If/Else Condition:** Branching logic based on task outputs.
3.  **For Each Loop:** Running a task dynamically for a list of inputs.
4.  **Repair Run:** How to re-run only failed tasks.

---
### Setup: Create 3 Helper Notebooks
To build this workflow, first create three separate notebooks in your workspace. Copy each code block below into a notebook with the name given in its heading.

### Notebook A: `01_get_run_day`
**Purpose:** Accepts a date parameter, determines the day of the week (e.g., "Sun", "Mon"), and passes this value to the next task.

**Copy this code into a new notebook named `01_get_run_day`:**

In [None]:
# --- CODE FOR NOTEBOOK: 01_get_run_day ---

from pyspark.sql.functions import to_timestamp, date_format

# 1. Create a widget to accept input date (Format: yyyy-MM-dd-HH-mm-ss)
dbutils.widgets.text("input_date", "")
input_date_str = dbutils.widgets.get("input_date")

print(f"Input Date: {input_date_str}")

# 2. Logic to determine the Day of the Week (Sun, Mon, Tue...)
# We create a temporary dataframe to use Spark SQL functions
df = spark.createDataFrame([{"date_str": input_date_str}])

# Convert the string to a timestamp and extract the day name
# Note: In Spark 3+, the 'E' pattern yields the abbreviated day of the week (Sun, Mon, ...)
result_df = df.select(date_format(to_timestamp("date_str", "yyyy-MM-dd-HH-mm-ss"), "E").alias("day"))

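# Collect the result to a Python variable
day_of_week = result_df.first()["day"]

# Fail fast if the date could not be parsed (e.g., empty or wrongly formatted input)
if day_of_week is None:
    raise ValueError(f"Could not parse input_date: {input_date_str!r}")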

print(f"Calculated Day: {day_of_week}")

# 3. SET TASK VALUE
# This allows other tasks in the Workflow to read this value
dbutils.jobs.taskValues.set(key="input_day", value=day_of_week)
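
For a single date string, the same result can be computed on the driver without building a DataFrame. The sketch below is a minimal standard-library alternative, assuming the input follows the `yyyy-MM-dd-HH-mm-ss` format used above. `day_abbrev` and `DAY_NAMES` are hypothetical names introduced for illustration and are not part of the notebook.

```python
from datetime import datetime

# Fixed English abbreviations, so the result does not depend on the system locale.
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def day_abbrev(input_date_str: str) -> str:
    """Return the abbreviated weekday for a 'yyyy-MM-dd-HH-mm-ss' string."""
    parsed = datetime.strptime(input_date_str, "%Y-%m-%d-%H-%M-%S")
    return DAY_NAMES[parsed.weekday()]  # weekday(): Monday is 0, Sunday is 6

print(day_abbrev("2024-10-27-13-00-00"))  # prints "Sun"
```

An empty or malformed string raises `ValueError` here, which surfaces a bad parameter as soon as the task starts.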

### Notebook B: `02_process_data`
**Purpose:** Simulates the actual data processing. We will run this inside a **For Each** loop to process different departments.

**Copy this code into a new notebook named `02_process_data`:**

In [None]:
# --- CODE FOR NOTEBOOK: 02_process_data ---

# 1. Receive parameters
dbutils.widgets.text("dept", "general")
department = dbutils.widgets.get("dept")

print(f"Processing data for Department: {department}")

# Simulate processing logic
import time
time.sleep(5) # Sleeping to simulate work

print(f"Successfully processed {department} data.")

### Notebook C: `03_else_condition`
**Purpose:** This task runs if the condition is FALSE (i.e., if it is not Sunday). It demonstrates how to **GET** values passed from previous tasks.

**Copy this code into a new notebook named `03_else_condition`:**

In [None]:
# --- CODE FOR NOTEBOOK: 03_else_condition ---

# 1. GET TASK VALUE from the first task
# "01_set_day" is the Task Name we will define in the Workflow UI
# "input_day" is the key we set in Notebook A
# debugValue is returned when the notebook runs interactively, outside a job
prev_task_day = dbutils.jobs.taskValues.get(taskKey="01_set_day", key="input_day", default="Unknown", debugValue="Unknown")

print("Condition Not Met.")
print(f"The calculated day was: {prev_task_day}. No processing required.")
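
Task values only exist inside a job run, which makes the set/get handoff hard to try out interactively. As a rough illustration of the semantics, the sketch below uses a hypothetical in-memory stand-in, `FakeTaskValues`. It is not the Databricks implementation; it only mimics the `set` and `get` call shapes used in Notebooks A and C.

```python
class FakeTaskValues:
    """Hypothetical local stand-in for dbutils.jobs.taskValues (illustration only)."""

    def __init__(self):
        self._store = {}          # maps (task_key, key) -> value
        self.current_task = None  # name of the task that is "running"

    def set(self, key, value):
        # Values are stored under the task that set them.
        self._store[(self.current_task, key)] = value

    def get(self, taskKey, key, default=None):
        # Readers name the upstream task explicitly; fall back to the default.
        return self._store.get((taskKey, key), default)


task_values = FakeTaskValues()

# Task 01_set_day publishes a value...
task_values.current_task = "01_set_day"
task_values.set(key="input_day", value="Mon")

# ...and a downstream task reads it by task name and key.
task_values.current_task = "03_else_logic"
print(task_values.get(taskKey="01_set_day", key="input_day", default="Unknown"))
print(task_values.get(taskKey="wrong_task", key="input_day", default="Unknown"))
```

The second lookup shows why the task name in the Workflow has to match the `taskKey` argument: with a mismatched name, this stand-in finds nothing and falls back to the default.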

## Creating the Workflow (UI Steps)

Now that the code is ready, follow these steps to build the Job:

### Step 1: Create the Job
1. Go to **Workflows** -> **Create Job**.
2. Name the Job: `Process_Emp_Data_By_Dept`.

### Step 2: Add First Task (Get Day)
1. **Task Name:** `01_set_day` (Important: Must match the `taskKey` used in Notebook C).
2. **Type:** Notebook.
3. **Path:** Select `01_get_run_day`.
4. **Parameters:**
   - Key: `input_date`
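   - Value: a hardcoded date such as `2024-10-27-13-00-00`, or a dynamic value reference such as `{{job.start_time.iso_date}}`. Note that `iso_date` resolves to a `yyyy-MM-dd` string, so if you use it, the parse pattern in `01_get_run_day` must be changed to match.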
5. Click **Create**.

### Step 3: Add Logic (If/Else Condition)
1. Click the **+** icon next to `01_set_day` to add a downstream task.
2. Select **If/else condition**.
3. **Task Name:** `check_day`.
4. **Condition:**
   - Left operand type: `Dynamic Value`
   - Value: `{{tasks.01_set_day.values.input_day}}` (This reads the value set by `dbutils.jobs.taskValues.set`).
   - Operator: `==` (Equals).
   - Right operand: `Sun` (String).
5. Click **Create**.

### Step 4: Add True Path (For Each Loop)
1. Click the **+** on the **True** branch of the condition.
2. Select **For each**.
3. **Task Name:** `process_data_loop`.
4. **Inputs:** `["sales", "office"]` (JSON Array).
5. **Task to run:** Select Notebook.
   - **Path:** Select `02_process_data`.
   - **Parameters:**
     - Key: `dept`
     - Value: `{{input}}` (This passes the current item from the loop array).
6. Click **Create**.
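
Conceptually, the For each task parses the JSON array and runs the nested task once per element, substituting `{{input}}` in the parameters. A minimal sketch of that expansion follows. `expand_for_each` is a hypothetical helper written for illustration, not a Databricks API.

```python
import json

def expand_for_each(inputs_json: str, parameters: dict) -> list:
    """Return one resolved parameter dict per element of the JSON array."""
    items = json.loads(inputs_json)
    runs = []
    for item in items:
        # Replace the {{input}} placeholder with the current loop item.
        resolved = {k: v.replace("{{input}}", str(item)) for k, v in parameters.items()}
        runs.append(resolved)
    return runs

for params in expand_for_each('["sales", "office"]', {"dept": "{{input}}"}):
    print(params)
```

Each resulting dictionary corresponds to one run of `02_process_data`, with `dept` set to the current item.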

### Step 5: Add False Path (Else Condition)
1. Click the **+** on the **False** branch.
2. **Task Name:** `03_else_logic`.
3. **Type:** Notebook.
4. **Path:** Select `03_else_condition`.
5. Click **Create**.
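
The same graph can also be described declaratively. The sketch below builds a Python dictionary shaped like a Jobs API job definition for the tasks created in Steps 2 to 5. Treat the field names as assumptions to check against the Jobs API reference, not as a verified payload. The `NOTEBOOK_DIR` path and the nested task key `process_data_iteration` are placeholders, and compute settings are omitted.

```python
import json

NOTEBOOK_DIR = "/Workspace/Users/<you>"  # placeholder path, adjust to your workspace

job_definition = {
    "name": "Process_Emp_Data_By_Dept",
    "tasks": [
        {
            "task_key": "01_set_day",
            "notebook_task": {
                "notebook_path": f"{NOTEBOOK_DIR}/01_get_run_day",
                "base_parameters": {"input_date": "2024-10-27-13-00-00"},
            },
        },
        {
            # If/else condition: compares the task value with the string "Sun"
            "task_key": "check_day",
            "depends_on": [{"task_key": "01_set_day"}],
            "condition_task": {
                "op": "EQUAL_TO",
                "left": "{{tasks.01_set_day.values.input_day}}",
                "right": "Sun",
            },
        },
        {
            # True branch: For each loop over the departments
            "task_key": "process_data_loop",
            "depends_on": [{"task_key": "check_day", "outcome": "true"}],
            "for_each_task": {
                "inputs": json.dumps(["sales", "office"]),
                "task": {
                    "task_key": "process_data_iteration",  # hypothetical name
                    "notebook_task": {
                        "notebook_path": f"{NOTEBOOK_DIR}/02_process_data",
                        "base_parameters": {"dept": "{{input}}"},
                    },
                },
            },
        },
        {
            # False branch
            "task_key": "03_else_logic",
            "depends_on": [{"task_key": "check_day", "outcome": "false"}],
            "notebook_task": {"notebook_path": f"{NOTEBOOK_DIR}/03_else_condition"},
        },
    ],
}

print(json.dumps(job_definition, indent=2))
```

Reading the `depends_on` entries top to bottom reproduces the graph drawn in the UI: one linear dependency followed by two branches keyed on the condition outcome.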

---

## Running & Repairing

1. **Run Now:** Click **Run now**.
   - If today is Sunday (or if you passed a Sunday date), the **True** path runs.
   - You will see the loop process "sales" and "office". Iterations run one at a time unless you raise the **Concurrency** setting on the For each task.
   - If not Sunday, the **False** path runs.

2. **Repair Run:**
   - If a task fails (e.g., because of a typo in the notebook), fix the notebook code first.
   - Go to the Job Run page.
   - Click **Repair Run**.
   - Select the failed tasks. Databricks will *skip* the tasks that already succeeded and re-run only the failed tasks and any tasks that depend on them.
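
To make the repair behavior concrete, the sketch below computes which tasks would run again for this job's graph, assuming the rule is "failed tasks plus everything downstream of them". It illustrates that selection rule only and is not Databricks' implementation. `tasks_to_rerun` is a hypothetical helper.

```python
def tasks_to_rerun(depends_on: dict, failed: set) -> set:
    """Return the failed tasks plus every task that depends on them, directly or not."""
    rerun = set(failed)
    changed = True
    while changed:
        changed = False
        for task, upstream in depends_on.items():
            # A task is pulled in once any of its upstream tasks is being re-run.
            if task not in rerun and rerun.intersection(upstream):
                rerun.add(task)
                changed = True
    return rerun

# Dependency graph of the job built above: task -> tasks it depends on.
graph = {
    "01_set_day": [],
    "check_day": ["01_set_day"],
    "process_data_loop": ["check_day"],
    "03_else_logic": ["check_day"],
}

print(sorted(tasks_to_rerun(graph, {"process_data_loop"})))
print(sorted(tasks_to_rerun(graph, {"01_set_day"})))
```

A failure in `process_data_loop` selects only that task, while a failure in `01_set_day` selects the whole graph. The sketch ignores If/else outcomes, so it over-approximates for a graph with branches.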