# Explore data quality metrics from the pipeline event log

Each pipeline can be configured to save its metrics to a table in Unity Catalog. From this table we can see what the pipeline is doing and the quality of the data passing through it.
You can query the expectations directly as a SQL table with Databricks SQL to track your expectation metrics and send alerts as required.
This notebook extracts and analyzes expectation metrics to build such KPIs.



## Your event log is now available as a table within your schema!
Publishing the event log to a table is simply an option in the pipeline configuration menu.

In [0]:
%sql
SELECT
  *
FROM
  main.dbdemos_pipeline_bike.pipeline_bike_event_logs
LIMIT 10

The `details` column contains metadata about each event sent to the event log, stored as a JSON blob. Using `parse_json` and the `VARIANT` data type, we can explore it as if it were an object. The available fields depend on the event type. Some examples include:
* `user_action` Events occur when taking actions like creating the pipeline
* `flow_definition` Events occur when a pipeline is deployed or updated and have lineage, schema, and execution plan information
  * `output_dataset` and `input_datasets` - output table/view and its upstream table(s)/view(s)
  * `flow_type` - whether this is a complete or append flow
  * `explain_text` - the Spark explain plan
* `flow_progress` Events occur when a data flow starts running or finishes processing a batch of data
  * `metrics` - currently contains `num_output_rows`
  * `data_quality` - contains an array of the results of the data quality rules for this particular dataset
    * `dropped_records`
    * `expectations`
      * `name`, `dataset`, `passed_records`, `failed_records`
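
To make the nesting above concrete, here is a minimal Python sketch that walks a `flow_progress` payload. The sample values (the expectation name, dataset name, and counts) are invented for illustration, and the helper `extract_expectations` is hypothetical; only the field names come from the list above.

```python
import json

# Hypothetical sample of a `details` value for a flow_progress event.
# Field names follow the list above; all values are invented for illustration.
sample_details = json.dumps({
    "flow_progress": {
        "status": "RUNNING",
        "metrics": {"num_output_rows": 1000},
        "data_quality": {
            "dropped_records": 12,
            "expectations": [
                {
                    "name": "valid_ride_id",
                    "dataset": "rides_silver",
                    "passed_records": 988,
                    "failed_records": 12,
                },
            ],
        },
    }
})


def extract_expectations(details_json):
    """Return the expectation results in a details JSON string, or [] if absent."""
    details = json.loads(details_json)
    data_quality = details.get("flow_progress", {}).get("data_quality") or {}
    return data_quality.get("expectations") or []


for exp in extract_expectations(sample_details):
    print(exp["name"], exp["dataset"], exp["passed_records"], exp["failed_records"])
```

Events of other types (for example `user_action`) carry no `flow_progress` key, so the helper returns an empty list for them instead of raising.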
  

In [0]:
%sql
SELECT
  details:flow_definition.output_dataset,
  details:flow_definition.input_datasets,
  details:flow_definition.flow_type,
  details:flow_definition.schema,
  details:flow_definition
FROM main.dbdemos_pipeline_bike.pipeline_bike_event_logs
WHERE details:flow_definition IS NOT NULL
ORDER BY timestamp
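
Because each `flow_definition` event names its output and its inputs, the rows returned by this query are enough to rebuild a simple lineage graph. The sketch below is one possible way to do that in Python once the rows are collected. The helper `lineage_edges` and the dataset names are hypothetical, and it assumes each entry of `input_datasets` is either an object with a `name` key or a plain string; check your own event log before relying on that shape.

```python
def lineage_edges(flow_definitions):
    """Return (upstream, downstream) pairs from parsed flow_definition objects."""
    edges = []
    for flow in flow_definitions:
        output = flow["output_dataset"]
        for upstream in flow.get("input_datasets") or []:
            # Assumption: an entry is either {"name": "..."} or a bare string.
            name = upstream["name"] if isinstance(upstream, dict) else str(upstream)
            edges.append((name, output))
    return edges


# Invented example rows, shaped like the parsed flow_definition column.
example_flows = [
    {"output_dataset": "rides_bronze", "input_datasets": None},
    {"output_dataset": "rides_silver", "input_datasets": [{"name": "rides_bronze"}]},
    {"output_dataset": "rides_gold", "input_datasets": ["rides_silver"]},
]

for upstream, downstream in lineage_edges(example_flows):
    print(f"{upstream} -> {downstream}")
```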


In [0]:
%sql
-- One row per expectation result for each running flow_progress event
SELECT
  e.origin.update_id,
  ex.value:name::string AS name,
  ex.value:dataset::string AS dataset,
  ex.value:passed_records::long AS passed_records,
  ex.value:failed_records::long AS failed_records
FROM
  main.dbdemos_pipeline_bike.pipeline_bike_event_logs e,
  LATERAL variant_explode(parse_json(e.details:flow_progress:data_quality:expectations:[ * ])) AS ex
WHERE
  e.event_type = 'flow_progress'
  AND e.details:flow_progress:status = 'RUNNING'
  AND e.details:flow_progress:data_quality:expectations IS NOT NULL
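
The rows from this query can be rolled up into a simple KPI such as the failure rate per expectation, which is a natural basis for an alert. The following Python sketch shows one way to do that on rows already fetched from the query. The helpers `failure_rates` and `breaches`, the sample rows, and the 5% threshold are all illustrative assumptions, not part of the pipeline or of any Databricks API.

```python
from collections import defaultdict


def failure_rates(rows):
    """Aggregate failed / (passed + failed) per (dataset, expectation name)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [passed, failed]
    for row in rows:
        key = (row["dataset"], row["name"])
        totals[key][0] += row["passed_records"]
        totals[key][1] += row["failed_records"]
    rates = {}
    for key, (passed, failed) in totals.items():
        total = passed + failed
        # Avoid dividing by zero when an expectation saw no records.
        rates[key] = failed / total if total else 0.0
    return rates


def breaches(rates, threshold):
    """Return the sorted keys whose failure rate exceeds the threshold."""
    return sorted(key for key, rate in rates.items() if rate > threshold)


# Invented rows with the same columns as the query above.
example_rows = [
    {"dataset": "rides_silver", "name": "valid_ride_id",
     "passed_records": 90, "failed_records": 10},
    {"dataset": "rides_silver", "name": "valid_ride_id",
     "passed_records": 95, "failed_records": 5},
    {"dataset": "stations_silver", "name": "valid_station",
     "passed_records": 200, "failed_records": 0},
]

rates = failure_rates(example_rows)
for dataset, name in breaches(rates, threshold=0.05):
    print(f"ALERT: {dataset}.{name} failure rate {rates[(dataset, name)]:.1%}")
```

In practice the same aggregation could be expressed directly in SQL and wired to a Databricks SQL alert; the Python version is only meant to make the arithmetic explicit.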

## Tracking data quality as an AI/BI dashboard

Let's use Databricks AI/BI dashboards to monitor our pipeline and data ingestion.

- Open the <a  dbdemos-dashboard-id="data-quality" href='/sql/dashboardsv3/01f0a99554a21eca84cf62cd2c8823b1' target="_blank">Bike Rental Data Monitoring Dashboard</a> to track your data quality metrics and add alerts based on your requirements.
- Open the <a  dbdemos-dashboard-id="operational" href='/sql/dashboardsv3/01f0a99554651cb280bc3da6714779ae' target="_blank">Bike Rental Operational Pipeline Dashboard</a> to track your pipeline events and costs.