
**Your Lakeflow Declarative Pipeline has been installed and started for you!** Open the <a dbdemos-pipeline-id="pipeline-bike" href="#joblist/pipelines/8b0431e3-e311-4c5b-b870-7907d348f854" target="_blank">Bike Rental Declarative Pipeline</a> to see it in action.<br/>
*(Note: The pipeline starts automatically once the initialization job completes, which might take a few minutes. Check the installation logs for more details.)*

## Streaming Tables

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/declarative-pipelines/declarative-pipelines-1.png?raw=true" width="400px" style="width:400px; float: right;" />

A streaming table is a Delta table with additional support for streaming or incremental data processing. A streaming table can be targeted by one or more flows in an ETL pipeline.

Streaming tables are a good choice for data ingestion for the following reasons:
* Each input row is handled only once, which models the vast majority of ingestion workloads (that is, by appending or upserting rows into a table).
* They can handle large volumes of append-only data.

Streaming tables are also a good choice for low-latency streaming transformations for the following reasons:
* They can reason over rows and windows of time.
* They handle high volumes of data.
* They deliver results with low latency.
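
As a rough illustration of the ingestion case described above, a streaming table can be declared in SQL along the following lines. This is a minimal sketch, not the demo's actual definition: the table name `rides_raw_sketch`, the volume path, and the JSON format are assumptions made for illustration.

```sql
-- Hypothetical bronze streaming table; name, path, and format are assumptions.
CREATE OR REFRESH STREAMING TABLE rides_raw_sketch
COMMENT 'Raw ride events ingested incrementally (illustrative sketch)'
AS
SELECT *
FROM STREAM read_files(
  -- Assumed landing location for the raw files
  '/Volumes/main/dbdemos_pipeline_bike/raw_data/rides',
  format => 'json'
);
```

Because the source is read with `STREAM`, each pipeline update should pick up only the files that arrived since the previous update, which is the "each input row is handled only once" behavior listed above.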


## Creating our bronze streaming tables
Take a look at [bronze.sql]($./01-bronze.sql) to see how we create our bronze tables `maintenance_logs_raw`, `rides_raw`, `weather_raw`, and `customers_cdc_raw`.

# Enriching our data with AI functions

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/declarative-pipelines/declarative-pipelines-2.png?raw=true" width="400px" style="width:400px; float: right;" />

Now that we've got our raw data loaded, let's enrich it in our silver layer.

Our maintenance logs include an unstructured field, `issue_description`, which isn't very useful for analytics as is. Let's use the `ai_classify` function to assign each issue to a category for reporting.

In [0]:
%sql
select
  date(reported_time) as maintenance_date,
  maintenance_id,
  bike_id,
  reported_time,
  resolved_time,
  issue_description,
  -- Classify issues as either related to brakes, chains/pedals, tires or something else
  ai_classify(issue_description, array("brakes", "chains_pedals", "tires", "other")) as issue_type
from
  main.dbdemos_pipeline_bike.maintenance_logs_raw
limit 10

## Creating our silver Streaming Tables enriched with AI
Take a look at [silver.sql]($./02-silver.sql) to see how we create our silver tables `maintenance_logs`, `rides`, `weather`, and `customers` (SCD Type 2 using Auto CDC).
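
For orientation, an SCD Type 2 table fed by Auto CDC might be declared roughly as follows. Treat this as a sketch under stated assumptions rather than the contents of the silver file: the target and flow names, the key column `customer_id`, and the `operation` and `operation_date` columns are all hypothetical, since the schema of `customers_cdc_raw` is not shown here.

```sql
-- Hypothetical target table; the real one is defined in the silver file.
CREATE OR REFRESH STREAMING TABLE customers_sketch;

-- Flow that applies the change feed to the target, keeping history (SCD Type 2).
CREATE FLOW customers_cdc_flow_sketch AS AUTO CDC INTO
  customers_sketch
FROM STREAM(customers_cdc_raw)
KEYS (customer_id)                           -- assumed primary key column
APPLY AS DELETE WHEN operation = 'DELETE'    -- assumed change-type column and value
SEQUENCE BY operation_date                   -- assumed ordering column for changes
COLUMNS * EXCEPT (operation, operation_date) -- drop CDC metadata from the target
STORED AS SCD TYPE 2;
```

`SEQUENCE BY` is what lets the flow order changes that arrive late or out of order, and `STORED AS SCD TYPE 2` keeps previous versions of each row instead of overwriting them.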

# Incrementally process aggregations with Materialized Views

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/declarative-pipelines/declarative-pipelines-3.png?raw=true" width="400px" style="width:400px; float: right;" />


Like standard views, materialized views are the result of a query, and you access them the same way you would a table. Unlike standard views, which recompute results on every query, materialized views cache the results and refresh them on a specified interval. Because a materialized view is precomputed, queries against it can run much faster than queries against regular views.

A materialized view is a declarative pipeline object. It includes a query that defines it, a flow to update it, and the cached results for fast access. A materialized view:
* Tracks changes in upstream data.
* On trigger, incrementally processes the changed data and applies the necessary transformations.
* Keeps the output table in sync with the source data, based on a specified refresh interval.

Materialized views are a good choice for many transformations:
* You apply reasoning over cached results instead of rows. In fact, you simply write a query.
* They are always correct. All required data is processed, even if it arrives late or out of order.
* They are often incremental. Databricks will try to choose the appropriate strategy that minimizes the cost of updating a materialized view. 


When the pipeline that defines a materialized view is triggered, the view is brought up to date automatically. Databricks attempts to process only the data that must be processed, so updates are often incremental and can be far less costly than a full recomputation. The view always shows the correct result, even when that requires recomputing the query from scratch.
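
To make this concrete, here is a minimal sketch of a materialized view that aggregates the silver `maintenance_logs` table. The view name `maintenance_issues_daily_sketch` is hypothetical, and the sketch assumes the silver table exposes the `maintenance_date` and `issue_type` columns computed in the query shown earlier; it is not one of the demo's gold tables.

```sql
-- Hypothetical gold-style aggregate; the view name and columns are assumptions.
CREATE OR REFRESH MATERIALIZED VIEW maintenance_issues_daily_sketch AS
SELECT
  maintenance_date,
  issue_type,
  count(*) AS issue_count   -- number of issues per day and category
FROM maintenance_logs
GROUP BY maintenance_date, issue_type;
```

Note that the definition is just a query: there is no explicit logic for detecting new rows or merging results. Whether a given update runs incrementally or as a full recomputation is decided by Databricks.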


## Creating our gold Materialized Views
Take a look at [gold.sql]($../transformations/03-gold.sql) to see how we create our gold tables `bikes`, `stations`, and `maintenance_events`.

## Analyzing your business metrics
You have everything you need! Once ready, open your <a  dbdemos-dashboard-id="bike-rental" href='/sql/dashboardsv3/01f0a995542a13c799f175ff307a2b12' target="_blank">Bike Rental Business Dashboard</a> to track all your insights in real time!


### Next: tracking data quality

Lakeflow Declarative Pipelines makes it easy to track your data quality and set alerts when something is wrong! Open the [02-Pipeline-event-monitoring]($../explorations/02-Pipeline-event-monitoring) notebook for more details.