This tutorial guides you through setting up and using Databricks Asset Bundles to manage Delta Live Tables pipelines.

- PART 1 - Delta Live Tables - Simple Tutorial
- PART 2 - Delta Live Tables with Asset Bundles

# PART 1 - Delta Live Tables with Databricks - Simplified Tutorial ✅

This notebook will guide you through the process of creating a Delta Live Tables (DLT) pipeline in the Databricks workspace UI, complete with explanations for each step. Part 2 repeats the exercise with Databricks Asset Bundles.

### Step 1: Set Up Your Databricks Environment

#### 1.1 Create a Cluster
1. **Log in to Databricks:**
   - Open your web browser and navigate to your Databricks workspace.
   - Log in with your credentials.

2. **Create a New Cluster:**
   - On the left sidebar, click on "Clusters."
   - Click on the "Create Cluster" button.
   - Enter a name for your cluster (e.g., `DeltaLiveCluster`).
   - Choose the Databricks runtime version (e.g., 10.4 LTS).
   - Select the appropriate instance type and number of workers based on your subscription.
   - Click on "Create Cluster."

#### 1.2 Create a Notebook
1. **Create a New Notebook:**
   - In the workspace, create a new Python notebook (e.g., `DeltaLiveTableTutorial`) where you'll add the pipeline code.


### Step 2: Define the Input Data

#### 2.1 Prepare Sample JSON Data
1. **Create a Sample JSON File:**
   - Open a text editor on your computer.
   - Copy and paste the following JSON content into the file, then save it as `input.json`:
```json
[
  {"id": 1, "name": "Alice", "age": 30},
  {"id": 2, "name": "Bob", "age": 25},
  {"id": 3, "name": "Charlie", "age": 35}
]
```

#### 2.2 Upload JSON File to DBFS

##### Upload the File:
- In Databricks, click on "Data" on the left sidebar.
- Click on "Add Data" > "Upload File."
- Click on "Browse," select the input.json file from your computer, and click "Open."
- Choose a location to upload the file (e.g., /FileStore/tables/input.json).
- Click on "Next" and then "Preview & Confirm."
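
As an alternative to typing the sample file by hand, it could be generated and sanity-checked with a short script before uploading. This is only a sketch: the function names `write_sample_file` and `check_sample_file` are made up for illustration, and the records are the three from step 2.1.

```python
import json
from pathlib import Path

# The three sample records from step 2.1.
SAMPLE_RECORDS = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]


def write_sample_file(path="input.json"):
    """Write the sample records as a JSON array and return the file path."""
    target = Path(path)
    target.write_text(json.dumps(SAMPLE_RECORDS, indent=2), encoding="utf-8")
    return target


def check_sample_file(path):
    """Reload the file and confirm every record has exactly the expected keys."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return all(set(record) == {"id", "name", "age"} for record in records)
```

Calling `write_sample_file()` in the directory you upload from produces `input.json`, and `check_sample_file("input.json")` confirms that the file parses as JSON before you upload it.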

### Step 3: Create a Delta Live Table Pipeline
#### 3.1 Create and Configure a Delta Live Table Pipeline
##### Navigate to Delta Live Tables:
- On the left sidebar, click on "Delta Live Tables."
- Click on "Create Pipeline."
- Enter a name for your pipeline (e.g., `SimpleDeltaLiveTablePipeline`).
- Configure the pipeline's compute settings. Delta Live Tables pipelines run on their own pipeline-managed compute, so the `DeltaLiveCluster` from Step 1.1 is only needed for notebook work such as the query in Step 6.
- Set the target schema to `default` (or any database name you prefer).
- In the "Notebook Libraries" section, add the path to the notebook you created earlier (e.g., `/Workspace/DeltaLiveTableTutorial`).

### Step 4: Define the Pipeline Code
#### 4.1 Write the Code in Your Notebook
##### Open the Notebook:
- Navigate to the notebook you created (DeltaLiveTableTutorial).
- Copy and paste the following code into the notebook cells:

```python
from pyspark.sql.functions import col
import dlt

# Step 1: Read the input JSON file.
# Spark readers take the dbfs:/ URI; the /dbfs/ prefix is only for local file APIs.
input_path = "dbfs:/FileStore/tables/input.json"

# Define the input data as a Delta Live Table
@dlt.table
def input_data():
    # The sample file is one JSON array spread over several lines,
    # so it has to be read with the multiLine option enabled.
    return (spark.read
                .option("multiLine", True)
                .json(input_path)
                .select(col("id"), col("name"), col("age")))

# Step 2: Perform a simple transformation
@dlt.table
def transformed_data():
    return (dlt.read("input_data")
                .withColumn("age_in_5_years", col("age") + 5))

# Step 3: Write the transformed data to a Delta table
@dlt.table
def output_data():
    return dlt.read("transformed_data")
```

##### Explanation:
- We read the JSON file from DBFS and define it as a Delta Live Table named input_data.
- We perform a simple transformation by adding 5 years to the age column and store the result in a Delta Live Table named transformed_data.
- We write the transformed data to a Delta table named output_data.
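
To see what the three stages should produce without starting a pipeline, the same flow can be sketched in plain Python over the sample records. This is a stand-in that uses lists of dictionaries instead of Spark DataFrames, so it illustrates the logic only, not how DLT executes it.

```python
# Plain-Python stand-in for the three pipeline stages; no Spark or DLT required.
SAMPLE_ROWS = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]


def input_data(rows):
    """Keep only the id, name, and age fields, as the select() call does."""
    return [{key: row[key] for key in ("id", "name", "age")} for row in rows]


def transformed_data(rows):
    """Add the age_in_5_years column to every row."""
    return [{**row, "age_in_5_years": row["age"] + 5} for row in rows]


def output_data(rows):
    """Pass the transformed rows through unchanged."""
    return list(rows)


result = output_data(transformed_data(input_data(SAMPLE_ROWS)))
for row in result:
    print(row)
```

Each output row carries the original three fields plus `age_in_5_years`, which is what the query in Step 6 should show for the sample file.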

### Step 5: Deploy and Run the Pipeline
#### 5.1 Deploy the Pipeline
##### Deploy:

- Go back to the Delta Live Tables UI.
- Click on the pipeline you created (SimpleDeltaLiveTablePipeline).
- Click "Start" to deploy and run the pipeline.

#### 5.2 Run the Pipeline
##### Monitor the Pipeline:
- Monitor the progress of your pipeline in the Delta Live Tables UI.
- You can see the status of each table and the overall pipeline.

### Step 6: Verify the Output
#### 6.1 Query the Delta Table
##### Run a Query:
- Open a new notebook or use the existing one.
- Run the following SQL query to verify the output:

```sql
SELECT * FROM transformed_data;
```

### Step 7: Schedule the Pipeline (Optional)
#### 7.1 Set Up Scheduling
##### Configure Scheduling:
- In the Delta Live Tables UI, click on your pipeline (SimpleDeltaLiveTablePipeline).
- Click on "Edit Settings."
- Under the "Schedule" section, configure the frequency and timing for the pipeline to run automatically.
- Click "Save."

You have successfully created, deployed, and run a Delta Live Table pipeline in Databricks. This tutorial covers the basics, and you can now explore more advanced features and configurations based on your requirements.

# PART 2 - Delta Live Tables with Asset Bundles ✅

This tutorial will guide you through the process of creating a Delta Live Table (DLT) pipeline using Databricks Asset Bundles, complete with explanations for each step.


## Requirements
- Databricks CLI version 0.218.0 or above.
- The remote workspace must have workspace files enabled.
- (Optional) Install a Python module to support local pipeline development.
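
The version requirement can be checked with a small helper. The sketch below compares a version string against the 0.218.0 minimum. It assumes you pass in whatever text your CLI prints for its version; the sample strings in the usage note are illustrative, and the exact output format of your CLI may differ.

```python
import re

# Minimum Databricks CLI version named in the requirements above.
MINIMUM_CLI_VERSION = (0, 218, 0)


def parse_version(text):
    """Extract the first major.minor.patch triple found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM_CLI_VERSION):
    """Return True when the version in the text is at least the minimum."""
    return parse_version(text) >= minimum
```

For example, `meets_minimum("Databricks CLI v0.218.0")` returns `True`, while `meets_minimum("Databricks CLI v0.205.2")` returns `False`. Tuples are compared element by element, so 0.218.0 correctly ranks above 0.99.0.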


## Step 1: Set up authentication

Use the Databricks CLI to initiate OAuth token management locally by running the following command for each target workspace:

```bash
databricks auth login --host <workspace-url>
```

Follow the on-screen instructions to log in to your Databricks workspace.




## Step 2: Create the bundle

Switch to a directory on your local development machine and use the Databricks CLI to run the bundle init command:


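```bash
databricks bundle init
```

When prompted, choose a template and a project name. The files listed in Step 3 correspond to the `default-python` template with its Delta Live Tables pipeline option included.
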
## Step 3: Explore the bundle

Switch to the root directory of your newly created bundle and open this directory with your preferred IDE. Files of particular interest include:
- `databricks.yml`
- `resources/<project-name>_job.yml`
- `resources/<project-name>_pipeline.yml`
- `src/dlt_pipeline.ipynb`
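
A quick way to confirm that the generated bundle contains these files is to check for them from a script. The helper names below are hypothetical, and the file list is simply the one given above, so it may not match what other templates or CLI versions generate.

```python
from pathlib import Path


def expected_bundle_files(project_name):
    """Return the relative paths of the files listed in this step."""
    return [
        "databricks.yml",
        f"resources/{project_name}_job.yml",
        f"resources/{project_name}_pipeline.yml",
        "src/dlt_pipeline.ipynb",
    ]


def missing_bundle_files(bundle_root, project_name):
    """Return the expected files that do not exist under bundle_root."""
    root = Path(bundle_root)
    return [
        relative
        for relative in expected_bundle_files(project_name)
        if not (root / relative).is_file()
    ]
```

Running `missing_bundle_files(".", "<project-name>")` from the bundle root, with your own project name substituted, returns an empty list when all four files are present.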


## Step 4: Validate the project’s bundle configuration file

From the root directory, use the Databricks CLI to run the bundle validate command:

```bash
databricks bundle validate
```

## Step 5: Deploy the local project to the remote workspace

Use the Databricks CLI to run the bundle deploy command:

```bash
databricks bundle deploy -t dev
```


Check if the local notebook and pipeline were deployed in your Databricks workspace.



## Step 6: Run the deployed project

From the root directory, use the Databricks CLI to run the bundle run command:

```
databricks bundle run -t dev <project-name>_pipeline
```

Open your Databricks workspace using the Update URL provided in the terminal.
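
Steps 4 to 6 can also be chained from a script so that a failed validation stops the deploy. The sketch below only assembles the three commands shown above and hands them to `subprocess.run`. The function names are hypothetical, and the resource key you pass must match the one in your own bundle.

```python
import subprocess


def bundle_commands(target, resource_key):
    """Build the validate, deploy, and run commands in the order used above."""
    return [
        ["databricks", "bundle", "validate"],
        ["databricks", "bundle", "deploy", "-t", target],
        ["databricks", "bundle", "run", "-t", target, resource_key],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner(argv, check=True)
```

With the CLI installed and authenticated, `run_all(bundle_commands("dev", "<project-name>_pipeline"))` executes the three steps from the bundle root. The `runner` parameter exists so that the sequence can be exercised without invoking the CLI.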

## Step 7: Clean up

From the root directory, use the Databricks CLI to run the bundle destroy command:

```
databricks bundle destroy -t dev
```

Confirm the deletion requests when prompted.

## Step 8: Create the bundle manually

Create or identify an empty directory on your development machine. Create a file named `dlt-wikipedia-python.py` in this directory with the following code:

```python
# Databricks notebook source
# The marker above must be the first line so that Databricks treats this .py file as a notebook.
import dlt
from pyspark.sql.functions import desc, expr

json_path = "/databricks-datasets/wikipedia-datasets/data-001/clickstream/raw-uncompressed-json/2015_2_clickstream.json"

@dlt.table(comment="The raw wikipedia clickstream dataset, ingested from /databricks-datasets.")
def clickstream_raw():
    return spark.read.format("json").load(json_path)

@dlt.table(comment="Wikipedia clickstream data cleaned and prepared for analysis.")
@dlt.expect("valid_current_page_title", "current_page_title IS NOT NULL")
@dlt.expect_or_fail("valid_count", "click_count > 0")
def clickstream_prepared():
    return (
        dlt.read("clickstream_raw")
            .withColumn("click_count", expr("CAST(n AS INT)"))
            .withColumnRenamed("curr_title", "current_page_title")
            .withColumnRenamed("prev_title", "previous_page_title")
            .select("current_page_title", "click_count", "previous_page_title")
    )

@dlt.table(comment="A table containing the top pages linking to the Apache Spark page.")
def top_spark_referrers():
    return (
        dlt.read("clickstream_prepared")
            .filter(expr("current_page_title == 'Apache_Spark'"))
            .withColumnRenamed("previous_page_title", "referrer")
            .sort(desc("click_count"))
            .select("referrer", "click_count")
            .limit(10)
    )
```
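
The two expectations on `clickstream_prepared` behave differently: `@dlt.expect` keeps rows that violate the constraint and records the violation, while `@dlt.expect_or_fail` stops the update. The plain-Python sketch below mimics that behavior and the top-referrers query on a few rows. The helper names and the `ExpectationFailed` exception are invented for illustration, and the sketch assumes `n` always holds a numeric string, which the real `CAST` does not require.

```python
class ExpectationFailed(Exception):
    """Stand-in for a pipeline update that fails on expect_or_fail."""


def prepare(raw_rows):
    """Rename and cast fields, applying both expectations row by row."""
    prepared = []
    warnings = 0
    for row in raw_rows:
        record = {
            "current_page_title": row.get("curr_title"),
            "click_count": int(row["n"]),
            "previous_page_title": row.get("prev_title"),
        }
        # Mirrors expect_or_fail("valid_count", ...): a violation aborts the run.
        if not record["click_count"] > 0:
            raise ExpectationFailed(f"valid_count violated by {record}")
        # Mirrors expect("valid_current_page_title", ...): the row is kept,
        # and the violation is only counted.
        if record["current_page_title"] is None:
            warnings += 1
        prepared.append(record)
    return prepared, warnings


def top_referrers(prepared, page="Apache_Spark", limit=10):
    """Return the highest-count referrers for one page, largest first."""
    matches = [row for row in prepared if row["current_page_title"] == page]
    matches.sort(key=lambda row: row["click_count"], reverse=True)
    return [
        {"referrer": row["previous_page_title"], "click_count": row["click_count"]}
        for row in matches[:limit]
    ]
```

Feeding `prepare` a row with a missing `curr_title` raises the warning count but keeps the row, whereas a row with `n` equal to `"0"` raises `ExpectationFailed`.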


## Step 9: Add a bundle configuration schema file to the project

Generate the Databricks Asset Bundle configuration JSON schema file:

```
databricks bundle schema > bundle_config_schema.json
```

Add the following comment to the beginning of your bundle configuration file:

```yaml
# yaml-language-server: $schema=bundle_config_schema.json
```
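
If the configuration file already exists, the comment can be added programmatically. The sketch below prepends it only when it is not already the first line. `ensure_schema_comment` is a hypothetical helper, and the schema file name is the one generated above.

```python
from pathlib import Path

# The comment that points the YAML language server at the generated schema.
SCHEMA_COMMENT = "# yaml-language-server: $schema=bundle_config_schema.json"


def ensure_schema_comment(config_path):
    """Prepend the schema comment if missing; return True when the file changed."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8")
    if text.startswith(SCHEMA_COMMENT):
        return False
    path.write_text(SCHEMA_COMMENT + "\n" + text, encoding="utf-8")
    return True
```

Calling `ensure_schema_comment("databricks.yml")` twice is safe: the second call finds the comment already in place and leaves the file untouched.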

## Step 10: Set up authentication

Set up authentication between the Databricks CLI on your development machine and your Databricks workspace using the same command as in Step 1.

## Step 11: Add a bundle configuration file to the project

Create a `databricks.yml` file with the following content, replacing `<workspace-url>` with your workspace URL:

```
# yaml-language-server: $schema=bundle_config_schema.json
bundle:
  name: dlt-wikipedia

resources:
  pipelines:
    dlt-wikipedia-pipeline:
      name: dlt-wikipedia-pipeline
      development: true
      continuous: false
      channel: "CURRENT"
      photon: false
      libraries:
        - notebook:
            path: ./dlt-wikipedia-python.py
      edition: "ADVANCED"
      clusters:
        - label: "default"
          num_workers: 1

targets:
  development:
    workspace:
      host: <workspace-url>
```
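
Replacing the `<workspace-url>` placeholder by hand is easy to get wrong, so the substitution could be scripted with a basic check on the URL. This is a sketch only: `fill_workspace_url` is a hypothetical helper, the `https` requirement is an assumption about workspace URLs, and the URL in the usage note is a made-up example.

```python
from urllib.parse import urlparse

# Placeholder used in the databricks.yml template above.
PLACEHOLDER = "<workspace-url>"


def fill_workspace_url(template, workspace_url):
    """Return the template with the placeholder replaced by a checked URL."""
    parsed = urlparse(workspace_url)
    # Assumption: a workspace URL is an https URL with a host name.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"not an https URL: {workspace_url!r}")
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain the placeholder")
    return template.replace(PLACEHOLDER, workspace_url)
```

For instance, `fill_workspace_url("host: <workspace-url>", "https://example.cloud.databricks.com")` returns `"host: https://example.cloud.databricks.com"`, while a value without a scheme raises `ValueError`.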

## Step 12: Validate the project's bundle configuration file

Use the Databricks CLI to run the bundle validate command:

```bash
databricks bundle validate
```

## Step 13: Deploy the local project to the remote workspace

Use the Databricks CLI to run the bundle deploy command:

```bash
databricks bundle deploy -t development
```

Check if the local notebook and pipeline were deployed in your Databricks workspace.

## Step 14: Run the deployed project

Use the Databricks CLI to run the bundle run command:

```bash
databricks bundle run -t development dlt-wikipedia-pipeline
```

## Step 15: Clean up

Use the Databricks CLI to run the bundle destroy command:

```bash
databricks bundle destroy -t development
```

Confirm the deletion requests when prompted.

This completes the tutorial.