
# AI / BI On Databricks: Your AI-powered Lakehouse is the best Warehouse

Traditional Data Warehouses can’t keep up with the variety of data and use cases. Business agility requires reliable, real-time data, with insight from ML models.

Working with the lakehouse unlocks traditional BI analysis as well as real-time applications with a direct connection to all of your data, while remaining fully secured.

With an AI assistant infused throughout the BI stack, Databricks makes it easy for analysts and business users to extract insight from their data.  
<br>

<img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/dbsql.png" width="700px" style="float: left" />

<div style="float: left; margin-top: 240px; font-size: 23px">
  Instant, elastic compute<br>
  Lower TCO with Serverless<br>
  Zero management<br><br>

  Governance layer - row level<br><br>

  Your data. Your schema (star, data vault…)
</div>


# BI & Data Warehousing with Databricks SQL

<img style="float: right; margin-top: 10px" width="500px" src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/retail/lakehouse-churn/lakehouse-retail-c360-churn-3.png" />

Our datasets are now properly ingested, secured, of high quality, and easily discoverable within our organization.

Let's explore how Databricks SQL supports your Data Analyst team with interactive BI, and start analyzing our customer churn.

To start with Databricks SQL, open the SQL view in the top-left menu.

You'll be able to:

- Create a SQL Warehouse to run your queries
- Use DBSQL to build your own dashboards
- Plug in any BI tool (Tableau, Power BI, ...) to run your analysis




## Creating a SQL Warehouse

<img style="float: right; margin-top: 20px" width="600px" src="./Images/Data_Warehouse/Create warehouse.png"/>


A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL.

- You can create a SQL Warehouse from the SQL Warehouses section under SQL in the left pane

- For this workshop, let's [Create a SQL Warehouse](/sql/warehouses) to execute workloads such as data ingestion, queries, visualizations and dashboards
- Click on Create SQL Warehouse
- Choose a Name for your warehouse. Let’s call it data pioneer whs
- Let's pick X-Small as the Cluster size
- Set the Auto-Stop to 10 minutes. Databricks serverless offers fast start and stop. In this case our warehouse will shut down after 10 minutes of inactivity
- Autoscaling will dynamically scale your warehouse to support hundreds of concurrent users and usage spikes
- We will create a Serverless Warehouse, since it has all the latest features and offers fast performance with instant start and stop
- Click Create




## Manage Permissions on SQL Warehouse

<img style="float: right; margin-top: 20px" width="600px" src="./Images/Data_Warehouse/Manage Permissions.png"/>

- In the Manage Permissions section, you can allow other users or groups to access your warehouse. 



## Monitoring of SQL Warehouse

 <img style="float: right; margin-top: 20px" width="600px" src="./Images/Data_Warehouse/Monitoring.png"/>

- Navigate to the Monitoring tab from the Overview section
- The Monitoring tab gives you a graphical view of your SQL Warehouse usage
- This is where you can review the query load and see how the warehouse automatically scales its clusters based on your workload

## Our SQL Warehouse is now ready for use!



## Explore Data from Catalog Explorer

You can access all your data from Catalog on the left:
- Tables (saved under catalogs and databases)
- Volumes (containing direct file access)
- AI models
- Functions

- We already have our Catalog, Schema and Table saved for this workshop so let's navigate to Catalog (to be updated)
- Within this Catalog, let's select the <_Catalog Name_> that contains all our Schemas with customer, order, and churn data

## Intelligent AI-Powered Search Experience
Databricks provides an intelligent, unified search capability that simplifies the discovery of all the assets you need for your data and AI projects
- You can use the search box at the top to perform a Full Page search
- The search returns all the tables, notebooks, jobs, and queries that use the searched keyword
- Improved relevance and popularity: Search uses popularity signals, based on how often other users in your workspace interact with specific assets, to improve how objects are ranked
- See [this blog post](https://www.databricks.com/blog/adding-intelligence-to-databricks-search) for the complete list of search capabilities


## Creating your first Query 


<img style="float: right; margin-top: 20px" width="600px" src="./Images/Data_Warehouse/FirstQuery.png"/>


Our users can now start running SQL queries using the SQL editor and add new visualizations.

The SQL Editor consists of 3 main panes:
- Schema Browser
- Query Pane
- Results Pane

By leveraging auto-completion and the schema browser, we can start running ad hoc queries on top of our data.

- Navigate to SQL Editor from the left panel
- From the right pane click on Catalog to explore your catalog, schema, and tables
- At the top of the Query Pane, specify your catalog and schema. This lets you reference tables by name alone, without having to write the full `catalog.schema.tablename` every time you run a query
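
As a minimal sketch of the difference, the two statements below should return the same count. They assume the `main` catalog and `pallavi_dbdemos_retail_c360` schema used in this workshop's cells, so substitute your own names if they differ.

```sql
-- Fully qualified: works regardless of the catalog and schema selected in the editor
SELECT COUNT(*) AS user_count
FROM main.pallavi_dbdemos_retail_c360.churn_users;

-- Unqualified: resolves against the catalog and schema selected at the top of the Query Pane
SELECT COUNT(*) AS user_count
FROM churn_users;
```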


## Take Advantage of Databricks AI Assistant


<img style="float: right; margin-top: 20px" width="700px" src="./Images/Data_Warehouse/DBAssitant.png"/>


Databricks AI Assistant provides an always-available expert to help answer your questions and deploy your projects faster.

Using the Assistant, you can get help writing and explaining queries, debugging code, creating visualizations, and much more.

- From the top-left corner, toggle the Databricks Assistant
- In the text box below, type "/"
- Now you can ask the Assistant to explain the current code, find tables, or even optimize a query that you have written
- Let's enter /findTables and see what results we get
- We will be prompted to provide our search parameters. Let's find tables with churn features
- The Assistant will help us discover the churn feature table in the respective catalog and schema

In [0]:
USE catalog main;
USE schema pallavi_dbdemos_retail_c360;

## Copy and paste the query below in the SQL Editor and analyze the results


In [0]:
-- Retrieve a summary of customer purchase behavior, including total spend, order count, and average order value.
SELECT 
    u.user_id,
    u.firstname,
    u.lastname,
    u.country,
    COUNT(o.order_id) AS total_orders,
    SUM(o.amount) AS total_spent,
    ROUND(AVG(o.amount)) AS avg_order_value,
    MAX(o.creation_date) AS last_order_date
FROM churn_users u
LEFT JOIN churn_orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.firstname, u.lastname, u.country
ORDER BY COUNT(o.order_id) DESC
LIMIT 1000;

In [0]:
-- Query to analyze all the data in the churn_features table

SELECT *
FROM churn_features;


## Create your first Visualization 
- From the Query Editor, click on +
- Select Visualization
- Select Visualization type **Bar**
- Select **canal** for the X-axis
- Select **user_id** as the Y-axis column
- Select **count** as the Y-axis aggregation
- Select **country** for the Group By column
- Click Save to add the visualization to the query

<img src="./Images/Data_Warehouse/Count of users by canal and country.png" alt="description"/>
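
For reference, the chart above corresponds roughly to the aggregation below. This is a sketch that assumes the visualization is built on the `churn_features` query from the previous step; the alias `user_count` is introduced here only for illustration.

```sql
-- Count of users per canal, split by country (the same grouping the bar chart applies)
SELECT
    canal,
    country,
    COUNT(user_id) AS user_count
FROM churn_features
GROUP BY canal, country
ORDER BY canal, country;
```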

## Add Parameters to Your Query

Query parameters allow you to make your queries more dynamic and flexible by inserting variable values at runtime. Instead of hard-coding specific values into your queries, you can define parameters to filter data or modify output based on user input. 

This approach improves query reuse, enhances security by preventing SQL injection, and enables more efficient handling of diverse data scenarios.

You can add different kinds of parameters such as:
- Text
- Number
- Dropdown List
- Query based Dropdown List
- Date and Time

For today's workshop we will add a Text parameter and a Dropdown List parameter to our query

- Copy and paste the code in the cell below into your SQL Editor
- From the gear icon next to the country parameter, select Type as Dropdown List
- Provide the values for the dropdown list in the Values box, one value per line
- Click OK
- Your parameterized query is created. Now you can select different values for the gender and country parameters and observe the result set


<img style="float: right; width: 50px; margin-left: 20px;" src="./Images/Data_Warehouse/DropdownParamter.png" alt="description"/>

In [0]:
-- Copy and paste the following query into your SQL Editor, then select different parameter values and run it

SELECT
    canal,
    COUNT(user_id) AS total_customers,
    SUM(CASE WHEN churn = 1 THEN 1 ELSE 0 END) AS churned_customers,
    ROUND(SUM(CASE WHEN churn = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(user_id), 2) AS churn_rate
FROM churn_features
WHERE gender = :gender
  AND country = :country
  AND canal IS NOT NULL
GROUP BY canal
ORDER BY churn_rate DESC;



<img style="float: right; width: 50px; margin-left: 20px;" src="./Images/Data_Warehouse/Parameterized Query.png" alt="description"/>

## Save Query
- In the SQL Editor, click Save in the top-right corner
- Update the Name of the query in the dialog box
- Select the folder where you want to save the query, or click on + to create a new folder
- For today's session, let's click on + and create a new folder
- Name the folder "Data Pioneer Lab"
- Select Create
- Click Save

<img style="float: right; width: 50px; margin-left: 20px;" src="./Images/Data_Warehouse/Save1.png" alt="description"/>

## Schedule Query

You can use scheduled query executions to update your dashboards or enable routine alerts.

To set the schedule:

- In the Query Editor, click Schedule > Add schedule to open a menu with schedule settings
- Choose when to run the query
- Click Create. Your query will run automatically according to the schedule

<img style="float: right; width: 50px; margin-left: 20px;" src="./Images/Data_Warehouse/ScheduleQuery.png" alt="description"/>

## Share Query

You can share a query with collaborators. To share a query with users and groups:

- Click the Share button at the top right to open the Sharing dialog
- Search for and select the groups and users, and assign the permission level
- Click Add
- In the Sharing settings > Credentials field at the bottom, select either Run as viewer or Run as owner

<img style="float: right; width: 50px; margin-left: 20px;" src="./Images/Data_Warehouse/ShareQuery.png" alt="description"/>

## Going further with DBSQL & Databricks Warehouse

Databricks SQL offers much more and provides full data warehouse capabilities.

<img style="float: right" width="400px" src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-dbsql-pk-fk.png" />

### Data modeling

Comprehensive data modeling. Save your data based on your requirements: Data vault, Star schema, Inmon...

Databricks lets you create PK/FK constraints and identity (auto-increment) columns. To explore this in a dedicated demo, run `dbdemos.install('identity-pk-fk')`
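
A minimal sketch of what this can look like is shown below. The table, column, and constraint names are hypothetical and not part of the workshop dataset; the sketch also assumes Unity Catalog tables, where primary and foreign key constraints are informational rather than enforced.

```sql
-- Hypothetical dimension table with an auto-incrementing surrogate key
CREATE TABLE IF NOT EXISTS demo_dim_customer (
    customer_sk BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
    user_id     STRING NOT NULL,
    country     STRING,
    CONSTRAINT demo_dim_customer_pk PRIMARY KEY (customer_sk)
);

-- Hypothetical fact table referencing the dimension through a foreign key
CREATE TABLE IF NOT EXISTS demo_fact_order (
    order_sk    BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
    customer_sk BIGINT NOT NULL,
    amount      DECIMAL(10, 2),
    CONSTRAINT demo_fact_order_pk PRIMARY KEY (order_sk),
    CONSTRAINT demo_fact_order_customer_fk
        FOREIGN KEY (customer_sk) REFERENCES demo_dim_customer (customer_sk)
);
```

Because the identity columns are generated, inserts into these tables would list only the remaining columns and let Databricks assign the key values.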

### Data ingestion made easy with DBSQL & dbt

Turnkey capabilities allow analysts and analytics engineers to easily ingest data from anywhere, from cloud storage to enterprise applications such as Salesforce, Google Analytics, or Marketo, using Fivetran. It’s just one click away.

Then, simply manage dependencies and transform data in place with built-in ETL capabilities on the Lakehouse (Delta Live Tables), or using your favorite tools like dbt on Databricks SQL for best-in-class performance.

### Query federation

Need to access cross-system data? Databricks SQL query federation lets you define data sources outside of Databricks (e.g. PostgreSQL).
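
As a hedged sketch, federating a PostgreSQL database might look like the following. Every name here (connection, host, secret scope and keys, database, catalog, and table) is a placeholder chosen for illustration, and the statements assume a Unity Catalog workspace where you have permission to create connections.

```sql
-- Hypothetical connection to an external PostgreSQL server;
-- credentials are read from a secret scope rather than written inline
CREATE CONNECTION IF NOT EXISTS demo_postgres_conn TYPE postgresql
OPTIONS (
    host 'postgres.example.com',
    port '5432',
    user secret('demo_scope', 'pg_user'),
    password secret('demo_scope', 'pg_password')
);

-- Expose one PostgreSQL database as a foreign catalog
CREATE FOREIGN CATALOG IF NOT EXISTS demo_postgres_catalog
USING CONNECTION demo_postgres_conn
OPTIONS (database 'demo_db');

-- Query the external table with the usual catalog.schema.table naming
SELECT *
FROM demo_postgres_catalog.public.demo_table
LIMIT 10;
```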

### Materialized view

Avoid expensive queries and materialize your tables. The engine will recompute only what's required when your data is updated.
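
For example, the churn aggregation used earlier in this workshop could be materialized along these lines. This is a sketch only: the view name `demo_churn_by_canal` is hypothetical, and whether a refresh is incremental depends on the query and the underlying tables.

```sql
-- Precompute churn counts per canal so dashboards don't re-aggregate on every load
CREATE MATERIALIZED VIEW IF NOT EXISTS demo_churn_by_canal AS
SELECT
    canal,
    COUNT(user_id) AS total_customers,
    SUM(CASE WHEN churn = 1 THEN 1 ELSE 0 END) AS churned_customers
FROM churn_features
WHERE canal IS NOT NULL
GROUP BY canal;

-- Bring the view up to date after churn_features changes
REFRESH MATERIALIZED VIEW demo_churn_by_canal;
```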