
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Databricks Platform

Demonstrate basic functionality and identify terms related to working in the Databricks workspace.


##### Objectives
1. Execute code in multiple languages
1. Create documentation cells
1. Access DBFS (Databricks File System)
1. Create database and table
1. Query table and plot results
1. Add notebook parameters with widgets


##### Databricks Notebook Utilities
- <a href="https://docs.databricks.com/notebooks/notebooks-use.html#language-magic" target="_blank">Magic commands</a>: **`%python`**, **`%scala`**, **`%sql`**, **`%r`**, **`%sh`**, **`%md`**
- <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a>: **`dbutils.fs`** (**`%fs`**), **`dbutils.notebook`** (**`%run`**), **`dbutils.widgets`**
- <a href="https://docs.databricks.com/notebooks/visualizations/index.html" target="_blank">Visualization</a>: **`display`**, **`displayHTML`**

### Setup
Run classroom setup to <a href="https://docs.databricks.com/data/databricks-file-system.html#mount-storage" target="_blank">mount</a> Databricks training datasets and create your own database for BedBricks.

Use the **`%run`** magic command to run another notebook from within this notebook.

In [0]:
%run ../Includes/Classroom-Setup

Python interpreter will be restarted.
Python interpreter will be restarted.



Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03"

Validating the locally installed datasets...(4 seconds)

Predefined tables in "da_sergio_salgado_4613_asp":
  -none-

Predefined paths variables:
  DA.paths.user_db:     dbfs:/mnt/dbacademy-users/sergio.salgado@n.world/apache-spark-programming-with-databricks/database.db
  DA.paths.datasets:    dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03
  DA.paths.working_dir: dbfs:/mnt/dbacademy-users/sergio.salgado@n.world/apache-spark-programming-with-databricks
  DA.paths.checkpoints: dbfs:/mnt/dbacademy-users/sergio.salgado@n.world/apache-spark-programming-with-databricks/_checkpoints
  DA.paths.sales:       dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03/ecommerce/sales/sales.delta
  DA.paths.users:       dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03/ecommerce/users/users.delta
  DA.paths.events:

### Execute code in multiple languages
Run default language of notebook

In [0]:
%scala
println("Run default language")

Run language specified by language magic commands: **`%python`**, **`%scala`**, **`%sql`**, **`%r`**

In [0]:
%python
print("Run python")

Run python


In [0]:
%scala
println("Run scala")

In [0]:
%sql
select "Run SQL"

Run SQL
Run SQL


In [0]:
%r
print("Run R", quote=FALSE)
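
To make the idea of a language magic concrete, here is a minimal sketch of how a cell's first line could be mapped to a language. The helper **`split_magic`** and the **`KNOWN_MAGICS`** set are hypothetical names for illustration only, the default language is assumed to be Python, and this is not how Databricks itself dispatches cells.

```python
# Hypothetical helper: illustrates the concept only, not Databricks internals.
KNOWN_MAGICS = {"python", "scala", "sql", "r", "sh", "md", "fs", "run"}


def split_magic(cell: str, default_language: str = "python"):
    """Return (language, body) for a cell, honoring a leading %magic."""
    first_line, _, rest = cell.partition("\n")
    stripped = first_line.strip()
    if stripped.startswith("%"):
        # The magic name ends at the first space; anything after it on the
        # same line (as in `%sh ps | grep 'java'`) belongs to the body.
        magic, _, inline = stripped[1:].partition(" ")
        if magic in KNOWN_MAGICS:
            body = "\n".join(part for part in (inline, rest) if part)
            return magic, body
    # No recognized magic: the cell runs in the notebook's default language.
    return default_language, cell


print(split_magic('%sql\nselect "Run SQL"'))
print(split_magic('print("Run default language")'))
```

A cell without a magic command falls through to the default language, which is why the same notebook can mix languages cell by cell.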

Run shell commands on the driver using the magic command: **`%sh`**

In [0]:
%sh ps | grep 'java'

  273 ?        00:01:14 java
  473 ?        00:06:47 java


Render HTML using the function: **`displayHTML`** (available in Python, Scala, and R)

In [0]:
html = """<h1 style="color:orange;text-align:center;font-family:Courier">Render HTML</h1>"""
displayHTML(html)

## Create documentation cells
Render cell as <a href="https://www.markdownguide.org/cheat-sheet/" target="_blank">Markdown</a> using the magic command: **`%md`**

Below are some examples of how you can use Markdown to format documentation. Click this cell and press **`Enter`** to view the underlying Markdown syntax.


# Heading 1
### Heading 3
> block quote

1. **bold**
2. *italicized*
3. ~~strikethrough~~

---

- <a href="https://www.markdownguide.org/cheat-sheet/" target="_blank">link</a>
- `code`

```
{
  "message": "This is a code block",
  "method": "https://www.markdownguide.org/extended-syntax/#fenced-code-blocks",
  "alternative": "https://www.markdownguide.org/basic-syntax/#code-blocks"
}
```

![Spark Logo](https://files.training.databricks.com/images/Apache-Spark-Logo_TM_200px.png)

| Element         | Markdown Syntax |
|-----------------|-----------------|
| Heading         | `# H1` `## H2` `### H3` `#### H4` `##### H5` `###### H6` |
| Block quote     | `> blockquote` |
| Bold            | `**bold**` |
| Italic          | `*italicized*` |
| Strikethrough   | `~~strikethrough~~` |
| Horizontal Rule | `---` |
| Code            | ``` `code` ``` |
| Link            | `[text](https://www.example.com)` |
| Image           | `![alt text](image.jpg)`|
| Ordered List    | `1. First Item` <br> `2. Second Item` <br> `3. Third Item` |
| Unordered List  | `- First Item` <br> `- Second Item` <br> `- Third Item` |
| Code Block      | ```` ``` ```` <br> `code block` <br> ```` ``` ````|
| Table           |<code> &#124; col &#124; col &#124; col &#124; </code> <br> <code> &#124;---&#124;---&#124;---&#124; </code> <br> <code> &#124; val &#124; val &#124; val &#124; </code> <br> <code> &#124; val &#124; val &#124; val &#124; </code> <br>|

## Access DBFS (Databricks File System)
The <a href="https://docs.databricks.com/data/databricks-file-system.html" target="_blank">Databricks File System</a> (DBFS) is a virtual file system that allows you to treat cloud object storage as though it were local files and directories on the cluster.

Run file system commands on DBFS using the magic command: **`%fs`**

<br/>
<img src="https://files.training.databricks.com/images/icon_hint_24.png"/>
Replace the instances of <strong>FILL_IN</strong> in the cells below with your email address:

In [0]:
%fs mounts

mountPoint,source,encryptionType
/databricks-datasets,databricks-datasets,sse-s3
/databricks/mlflow-tracking,databricks/mlflow-tracking,sse-s3
/databricks-results,databricks-results,sse-s3
/databricks/mlflow-registry,databricks/mlflow-registry,sse-s3
/,DatabricksRoot,sse-s3


In [0]:
%fs ls

path,name,size,modificationTime
dbfs:/databricks-datasets/,databricks-datasets/,0,0
dbfs:/databricks-results/,databricks-results/,0,0
dbfs:/delta/,delta/,0,0
dbfs:/mnt/,mnt/,0,0
dbfs:/tmp/,tmp/,0,0
dbfs:/user/,user/,0,0


In [0]:
%fs ls dbfs:/tmp

path,name,size,modificationTime
dbfs:/tmp/hive/,hive/,0,0
dbfs:/tmp/sergio.salgado@n.world.txt,sergio.salgado@n.world.txt,69,1664180604000


In [0]:
%fs put dbfs:/tmp/sergio.salgado@n.world.txt "This is a test of the emergency broadcast system, this is only a test" --overwrite=true

In [0]:
%fs head dbfs:/tmp/sergio.salgado@n.world.txt

In [0]:
%fs ls dbfs:/tmp

path,name,size,modificationTime
dbfs:/tmp/hive/,hive/,0,0
dbfs:/tmp/sergio.salgado@n.world.txt,sergio.salgado@n.world.txt,69,1664379092000


**`%fs`** is shorthand for the <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a> module: **`dbutils.fs`**

In [0]:
%fs help

Run file system commands on DBFS using DBUtils directly

In [0]:
dbutils.fs.ls("dbfs:/tmp")

Out[7]: [FileInfo(path='dbfs:/tmp/hive/', name='hive/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/tmp/sergio.salgado@n.world.txt', name='sergio.salgado@n.world.txt', size=69, modificationTime=1664379092000)]

Visualize results in a table using the Databricks <a href="https://docs.databricks.com/notebooks/visualizations/index.html#display-function-1" target="_blank">display</a> function

In [0]:
files = dbutils.fs.ls("dbfs:/tmp")
display(files)

path,name,size,modificationTime
dbfs:/tmp/hive/,hive/,0,0
dbfs:/tmp/sergio.salgado@n.world.txt,sergio.salgado@n.world.txt,69,1664379092000


Let's take one more look at our temp file...

In [0]:
file_name = "dbfs:/tmp/sergio.salgado@n.world.txt"
contents = dbutils.fs.head(file_name)

print("-"*80)
print(contents)
print("-"*80)

--------------------------------------------------------------------------------
This is a test of the emergency broadcast system, this is only a test
--------------------------------------------------------------------------------
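
The **`size`** and **`modificationTime`** columns in the listings above can be checked against this file. The sketch below is plain Python and assumes that **`size`** is the file length in bytes and that **`modificationTime`** is milliseconds since the Unix epoch. Both assumptions are consistent with the output shown, but treat them as assumptions rather than a statement of the API contract.

```python
from datetime import datetime, timezone

# Values copied from the listing and file contents shown above.
contents = "This is a test of the emergency broadcast system, this is only a test"
listed_size = 69
listed_modification_time = 1664379092000

# Assumption: size is the byte length of the file (UTF-8 encoded here).
size_in_bytes = len(contents.encode("utf-8"))

# Assumption: modificationTime is milliseconds since the Unix epoch.
modified_at = datetime.fromtimestamp(listed_modification_time / 1000, tz=timezone.utc)

print(size_in_bytes == listed_size)
print(modified_at.isoformat())
```

The same conversion can be applied to each **`FileInfo`** returned by **`dbutils.fs.ls`** when a human-readable timestamp is needed.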


## Our First Table

The table's files are located in the path identified by **`DA.paths.events`** (a variable we created for you).

We can see those files by running the following cell:

In [0]:
files = dbutils.fs.ls(DA.paths.events)
display(files)

path,name,size,modificationTime
dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03/ecommerce/events/events.delta/_delta_log/,_delta_log/,0,0
dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03/ecommerce/events/events.delta/part-00000-eb68ecaf-f8e1-4820-9513-24e158ed1e22-c000.snappy.parquet,part-00000-eb68ecaf-f8e1-4820-9513-24e158ed1e22-c000.snappy.parquet,75373205,1664180503000
dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03/ecommerce/events/events.delta/part-00001-e9be20a6-591a-4c06-9284-36d33f8bb378-c000.snappy.parquet,part-00001-e9be20a6-591a-4c06-9284-36d33f8bb378-c000.snappy.parquet,75384788,1664180510000
dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03/ecommerce/events/events.delta/part-00002-5793eed4-8dea-4287-abe1-a8ed30032f86-c000.snappy.parquet,part-00002-5793eed4-8dea-4287-abe1-a8ed30032f86-c000.snappy.parquet,75393846,1664180515000
dbfs:/mnt/dbacademy-datasets/apache-spark-programming-with-databricks/v03/ecommerce/events/events.delta/part-00003-3c9024f7-5419-45b5-873d-4756e510a797-c000.snappy.parquet,part-00003-3c9024f7-5419-45b5-873d-4756e510a797-c000.snappy.parquet,75295715,1664180520000


## But, Wait!
I cannot use variables in SQL commands.

With the following trick you can!

Declare the Python variable as a Spark configuration parameter, which SQL commands can then access:

In [0]:
spark.conf.set("whatever.events", DA.paths.events)

<img src="https://files.training.databricks.com/images/icon_note_24.png"> In the above example we use **`whatever.`** to give our variable a "namespace".

This is so that we don't accidentally overwrite other configuration parameters.

You will see our usage of the "DA" namespace throughout this course, as in **`DA.paths.some_file`**.
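
As a rough mental model of the trick, the following toy sketch mimics **`${name}`** substitution with an ordinary dictionary. Spark performs the real substitution itself; the **`substitute`** helper, the **`conf`** dictionary, and the example path are all hypothetical and exist only to show why a namespaced key keeps your value separate from other settings.

```python
import re


def substitute(sql: str, conf: dict) -> str:
    """Replace each ${key} in sql with conf[key]; leave unknown keys untouched."""
    def lookup(match):
        key = match.group(1)
        return conf.get(key, match.group(0))

    return re.sub(r"\$\{([^}]+)\}", lookup, sql)


# Hypothetical configuration: the key carries a namespace prefix ("whatever.")
# and the path is a made-up placeholder, not a real dataset location.
conf = {"whatever.events": "dbfs:/example/events.delta"}

query = "SELECT * FROM delta.`${whatever.events}`"
print(substitute(query, conf))
```

Because the lookup is by full key, **`whatever.events`** cannot collide with an unrelated parameter that happens to be called **`events`**.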

## Create table
Run <a href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/index.html#sql-reference" target="_blank">Databricks SQL Commands</a> to create a table named **`events`** using BedBricks event files on DBFS.

In [0]:
%sql
CREATE TABLE IF NOT EXISTS events
USING DELTA
OPTIONS (path = "${whatever.events}");

This table was saved in the database created for you in classroom setup.

See the database name printed below.

In [0]:
print(f"Database Name: {DA.db_name}")

Database Name: da_sergio_salgado_4613_asp


... or even the tables in that database:

In [0]:
%sql
SHOW TABLES IN ${DA.db_name}

database,tableName,isTemporary
da_sergio_salgado_4613_asp,events,False


View your database and table in the Data tab of the UI.

## Query table and plot results
Use SQL to query the **`events`** table

In [0]:
%sql
SELECT * FROM events

device,ecommerce,event_name,event_previous_timestamp,event_timestamp,geo,items,traffic_source,user_first_touch_timestamp,user_id
macOS,"List(null, null, null)",warranty,1593878899217692.0,1593878946592107,"List(Montrose, MI)",List(),google,1593878899217692,UA000000107379500
Windows,"List(null, null, null)",press,1593876662175340.0,1593877011756535,"List(Northampton, MA)",List(),google,1593876662175340,UA000000107359357
macOS,"List(null, null, null)",add_item,1593878792892652.0,1593878815459100,"List(Salinas, CA)","List(List(null, M_STAN_T, Standard Twin Mattress, 595.0, 595.0, 1))",youtube,1593878455472030,UA000000107375547
iOS,"List(null, null, null)",mattresses,1593878178791663.0,1593878809276923,"List(Everett, MA)",List(),facebook,1593877903116176,UA000000107370581
Windows,"List(null, null, null)",mattresses,,1593878628143633,"List(Cottage Grove, MN)",List(),google,1593878628143633,UA000000107377108
Windows,"List(null, null, null)",main,,1593878634344194,"List(Medina, MN)",List(),youtube,1593878634344194,UA000000107377161
iOS,"List(null, null, null)",main,,1593877936171803,"List(Mount Pleasant, UT)",List(),direct,1593877936171803,UA000000107370851
macOS,"List(null, null, null)",main,,1593876843215329,"List(Piedmont, AL)",List(),instagram,1593876843215329,UA000000107360961
Android,"List(null, null, null)",warranty,1593878529774474.0,1593879213196400,"List(Rancho Santa Margarita, CA)",List(),instagram,1593878529774474,UA000000107376205
Windows,"List(null, null, null)",main,,1593876713246514,"List(Elyria, OH)",List(),facebook,1593876713246514,UA000000107359805
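
The timestamp columns in this output are large integers. The sketch below assumes they are microseconds since the Unix epoch, which is an inference from their magnitude rather than something stated in this lesson, and converts the first row's **`event_timestamp`** to a readable UTC datetime. The helper name **`micros_to_datetime`** is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch (assumed unit) to UTC."""
    # Integer timedelta arithmetic avoids float rounding on 16-digit values.
    return EPOCH + timedelta(microseconds=micros)


# event_timestamp of the first row in the output above.
first_event = micros_to_datetime(1593878946592107)
print(first_event.isoformat())
```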


Run the query below and then <a href="https://docs.databricks.com/notebooks/visualizations/index.html#plot-types" target="_blank">plot</a> results by selecting the bar chart icon.

In [0]:
%sql
SELECT traffic_source, SUM(ecommerce.purchase_revenue_in_usd) AS total_revenue
FROM events
GROUP BY traffic_source

traffic_source,total_revenue
instagram,16177893.0
direct,12704560.0
youtube,8044326.0
email,78800000.29999994
facebook,24797837.0
google,47218429.0
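
The query returns rows in no particular order. As a small sketch of what the bar chart makes visible, the plain-Python snippet below ranks the result rows shown above by **`total_revenue`**. The values are copied from that output, and the variable names are illustrative.

```python
# Rows copied from the query output above: (traffic_source, total_revenue).
results = [
    ("instagram", 16177893.0),
    ("direct", 12704560.0),
    ("youtube", 8044326.0),
    ("email", 78800000.29999994),
    ("facebook", 24797837.0),
    ("google", 47218429.0),
]

# Sort descending by revenue so the largest source comes first.
ranked = sorted(results, key=lambda row: row[1], reverse=True)

for source, revenue in ranked:
    print(f"{source:<10} {revenue:>16,.2f}")
```

In SQL, the same ordering could be requested by adding **`ORDER BY total_revenue DESC`** to the query.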


## Add notebook parameters with widgets
Use <a href="https://docs.databricks.com/notebooks/widgets.html" target="_blank">widgets</a> to add input parameters to your notebook.

Create a text input widget using SQL.

In [0]:
%sql
CREATE WIDGET TEXT state DEFAULT "CA"

Access the current value of the widget using the function **`getArgument`**

In [0]:
%sql
SELECT *
FROM events
WHERE geo.state = getArgument("state")

device,ecommerce,event_name,event_previous_timestamp,event_timestamp,geo,items,traffic_source,user_first_touch_timestamp,user_id
iOS,"List(null, null, null)",main,,1593585213296597,"List(Maywood, CA)",List(),google,1593585213296597,UA000000106459980
Chrome OS,"List(null, null, null)",add_item,1593617875873686.0,1593617890744265,"List(Montebello, CA)","List(List(null, M_STAN_Q, Standard Queen Mattress, 1045.0, 1045.0, 1))",direct,1593617794549174,UA000000106546041
Windows,"List(null, null, null)",faq,1593611874936826.0,1593612750857998,"List(Thousand Oaks, CA)",List(),direct,1593611874936826,UA000000106514480
Android,"List(null, null, null)",email_coupon,1593596669844153.0,1593599082242990,"List(Concord, CA)",List(),facebook,1593596023564390,UA000000106467706
Windows,"List(null, null, null)",mattresses,,1593613959440326,"List(Wasco, CA)",List(),facebook,1593613959440326,UA000000106524831
iOS,"List(null, null, null)",mattresses,,1593613533986229,"List(Gonzales, CA)",List(),google,1593613533986229,UA000000106522612
macOS,"List(null, null, null)",email_coupon,1593607311848884.0,1593607373302649,"List(Clovis, CA)",List(),instagram,1593607311848884,UA000000106495066
macOS,"List(null, null, null)",main,,1593610056825027,"List(Huntington Beach, CA)",List(),instagram,1593610056825027,UA000000106506085
Linux,"List(null, null, null)",cart,1593579614666895.0,1593580024336837,"List(Yuba City, CA)","List(List(null, P_FOAM_K, King Foam Pillow, 79.0, 79.0, 1))",google,1593575941938442,UA000000106457711
Windows,"List(null, null, null)",mattresses,1593609096278044.0,1593762827101654,"List(Los Angeles, CA)",List(),email,1593607374002709,UA000000106495287


Remove the text widget

In [0]:
%sql
REMOVE WIDGET state

To create widgets in Python, Scala, and R, use the DBUtils module: **`dbutils.widgets`**

In [0]:
dbutils.widgets.text("name", "Brickster", "Name")
dbutils.widgets.multiselect("colors", "orange", ["red", "orange", "black", "blue"], "Favorite Color?")

Access the current value of the widget using the **`dbutils.widgets`** function **`get`**

In [0]:
name = dbutils.widgets.get("name")
colors = dbutils.widgets.get("colors").split(",")

html = "<div>Hi {}! Select your color preference.</div>".format(name)
for c in colors:
    html += """<label for="{}" style="color:{}"><input type="radio"> {}</label><br>""".format(c, c, c)

displayHTML(html)
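
The cell above relies on the multiselect widget returning its selections as one comma-separated string. The sketch below isolates that step in a pure function so it can be tried without a cluster. The helper name **`render_color_choices`** is hypothetical, and escaping the values with **`html.escape`** is an added precaution for user-supplied input, not part of the original cell.

```python
from html import escape


def render_color_choices(name: str, colors_value: str) -> str:
    """Build the greeting HTML from a name and a comma-separated color string."""
    # An empty selection yields "", so drop empty entries after splitting.
    colors = [c for c in colors_value.split(",") if c]

    parts = ["<div>Hi {}! Select your color preference.</div>".format(escape(name))]
    for c in colors:
        safe = escape(c, quote=True)
        parts.append(
            '<label style="color:{0}"><input type="radio" name="color" value="{0}"> {0}</label><br>'.format(safe)
        )
    return "".join(parts)


# Stand-in values for what the "name" and "colors" widgets might hold.
html = render_color_choices("Brickster", "orange,blue")
print(html)
```

In the notebook, the two arguments would come from **`dbutils.widgets.get("name")`** and **`dbutils.widgets.get("colors")`**, and the result would be passed to **`displayHTML`**.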

Remove all widgets

In [0]:
dbutils.widgets.removeAll()

### Clean up classroom
Clean up any temp files, tables and databases created by this lesson

In [0]:
DA.cleanup()

Resetting the learning environment...
...dropping the database "da_sergio_salgado_4613_asp"...(2 seconds)
...removing the working directory "dbfs:/mnt/dbacademy-users/sergio.salgado@n.world/apache-spark-programming-with-databricks"...(0 seconds)

Validating the locally installed datasets...(4 seconds)


&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>