
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Databricks Platform

Demonstrate basic functionality and identify terms related to working in the Databricks workspace.


##### Objectives
1. Execute code in multiple languages
1. Create documentation cells
1. Access DBFS (Databricks File System)
1. Create database and table
1. Query table and plot results
1. Add notebook parameters with widgets


##### Databricks Notebook Utilities
- <a href="https://docs.databricks.com/notebooks/notebooks-use.html#language-magic" target="_blank">Magic commands</a>: **`%python`**, **`%scala`**, **`%sql`**, **`%r`**, **`%sh`**, **`%md`**
- <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a>: **`dbutils.fs`** (**`%fs`**), **`dbutils.notebook`** (**`%run`**), **`dbutils.widgets`**
- <a href="https://docs.databricks.com/notebooks/visualizations/index.html" target="_blank">Visualization</a>: **`display`**, **`displayHTML`**

### Setup
Run classroom setup to <a href="https://docs.databricks.com/data/databricks-file-system.html#mount-storage" target="_blank">mount</a> Databricks training datasets and create your own database for BedBricks.

Use the **`%run`** magic command to run another notebook from within this notebook.

In [0]:
%run ../Includes/Classroom-Setup

Deleted the working directory dbfs:/user/kucukoglu_ersan@student.ceu.edu/dbacademy/aspwd/asp_1_1_databricks_platform


Your working directory is
dbfs:/user/kucukoglu_ersan@student.ceu.edu/dbacademy/aspwd

The source for this dataset is
wasbs://courseware@dbacademy.blob.core.windows.net/apache-spark-programming-with-databricks/v02/

Skipping install of existing dataset to
dbfs:/user/kucukoglu_ersan@student.ceu.edu/dbacademy/aspwd/datasets


Out[5]: DataFrame[key: string, value: string]

### Execute code in multiple languages
Run code in the notebook's default language

In [0]:
print("Run default language")

Run default language


Run code in the language specified by a language magic command: **`%python`**, **`%scala`**, **`%sql`**, **`%r`**

In [0]:
%python
print("Run python")

Run python


In [0]:
%scala
println("Run scala")

In [0]:
%sql
select "Run SQL"

Run SQL
Run SQL


In [0]:
%r
print("Run R", quote=FALSE)
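
The cells above suggest a simple rule: if the first line of a cell is a language magic, the rest of the cell runs in that language; otherwise the notebook's default language is used. The sketch below mimics that rule in plain Python. It is only an illustration: **`split_magic`** and **`LANGUAGE_MAGICS`** are hypothetical names, not part of any Databricks API, and the real notebook dispatch may work differently.

```python
# Hypothetical helper: not a Databricks API, only an illustration of the idea.
LANGUAGE_MAGICS = {"%python": "python", "%scala": "scala", "%sql": "sql", "%r": "r"}


def split_magic(cell_source, default_language="python"):
    """Return (language, code) for a cell, honoring a leading language magic."""
    stripped = cell_source.lstrip()
    first_line, _, rest = stripped.partition("\n")
    parts = first_line.split(None, 1)
    magic = parts[0] if parts else ""
    if magic in LANGUAGE_MAGICS:
        # Code may follow the magic on the same line or start on the next line.
        inline = parts[1] if len(parts) > 1 else ""
        code = "\n".join(piece for piece in (inline, rest) if piece)
        return LANGUAGE_MAGICS[magic], code
    # No language magic: the cell runs in the notebook's default language.
    return default_language, cell_source


print(split_magic('%scala\nprintln("Run scala")'))
print(split_magic('print("Run default language")'))
```

Other magics such as **`%run`** or **`%sh`** are not language magics, so this sketch treats them as ordinary cell content.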

Run shell commands on the driver using the magic command: **`%sh`**

In [0]:
%sh ps | grep 'java'

  276 ?        00:01:05 java
  480 ?        00:03:24 java


Render HTML using the function: **`displayHTML`** (available in Python, Scala, and R)

In [0]:
html = """<h1 style="color:orange;text-align:center;font-family:Courier">Render HTML</h1>"""
displayHTML(html)

## Create documentation cells
Render cell as <a href="https://www.markdownguide.org/cheat-sheet/" target="_blank">Markdown</a> using the magic command: **`%md`**

Below are some examples of how you can use Markdown to format documentation. Click this cell and press **`Enter`** to view the underlying Markdown syntax.


# Heading 1
### Heading 3
> block quote

1. **bold**
2. *italicized*
3. ~~strikethrough~~

---

- <a href="https://www.markdownguide.org/cheat-sheet/" target="_blank">link</a>
- `code`

```
{
  "message": "This is a code block",
  "method": "https://www.markdownguide.org/extended-syntax/#fenced-code-blocks",
  "alternative": "https://www.markdownguide.org/basic-syntax/#code-blocks"
}
```

![Spark Logo](https://files.training.databricks.com/images/Apache-Spark-Logo_TM_200px.png)

| Element         | Markdown Syntax |
|-----------------|-----------------|
| Heading         | `# H1` `## H2` `### H3` `#### H4` `##### H5` `###### H6` |
| Block quote     | `> blockquote` |
| Bold            | `**bold**` |
| Italic          | `*italicized*` |
| Strikethrough   | `~~strikethrough~~` |
| Horizontal Rule | `---` |
| Code            | ``` `code` ``` |
| Link            | `[text](https://www.example.com)` |
| Image           | `![alt text](image.jpg)`|
| Ordered List    | `1. First Item` <br> `2. Second Item` <br> `3. Third Item` |
| Unordered List  | `- First Item` <br> `- Second Item` <br> `- Third Item` |
| Code Block      | ```` ``` ```` <br> `code block` <br> ```` ``` ````|
| Table           |<code> &#124; col &#124; col &#124; col &#124; </code> <br> <code> &#124;---&#124;---&#124;---&#124; </code> <br> <code> &#124; val &#124; val &#124; val &#124; </code> <br> <code> &#124; val &#124; val &#124; val &#124; </code> <br>|

## Access DBFS (Databricks File System)
The <a href="https://docs.databricks.com/data/databricks-file-system.html" target="_blank">Databricks File System</a> (DBFS) is a virtual file system that allows you to treat cloud object storage as though it were local files and directories on the cluster.
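
To make the "as though it were local files" idea concrete, the sketch below converts between the **`dbfs:/`** form of a path and a local-style path. It assumes DBFS is exposed on the driver under a **`/dbfs`** mount, which may not hold on every cluster configuration. The helper names **`to_local_path`** and **`to_dbfs_uri`** are hypothetical and are not part of DBUtils.

```python
# Hypothetical helpers for illustration; they are not part of dbutils.
DBFS_SCHEME = "dbfs:"
FUSE_ROOT = "/dbfs"  # assumption: the driver exposes DBFS under this local mount


def to_local_path(dbfs_path):
    """Convert 'dbfs:/a/b' (or '/a/b') to the local-style path '/dbfs/a/b'."""
    if dbfs_path.startswith(DBFS_SCHEME):
        dbfs_path = dbfs_path[len(DBFS_SCHEME):]
    return FUSE_ROOT + "/" + dbfs_path.lstrip("/")


def to_dbfs_uri(local_path):
    """Convert '/dbfs/a/b' back to 'dbfs:/a/b'."""
    if local_path != FUSE_ROOT and not local_path.startswith(FUSE_ROOT + "/"):
        raise ValueError(f"not under {FUSE_ROOT}: {local_path}")
    return DBFS_SCHEME + "/" + local_path[len(FUSE_ROOT):].lstrip("/")


print(to_local_path("dbfs:/databricks-datasets/README.md"))
print(to_dbfs_uri("/dbfs/databricks-datasets/README.md"))
```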

Run file system commands on DBFS using the magic command: **`%fs`**

In [0]:
%fs ls

path,name,size,modificationTime
dbfs:/databricks-datasets/,databricks-datasets/,0,0
dbfs:/databricks-results/,databricks-results/,0,0
dbfs:/user/,user/,0,0


In [0]:
%fs ls /databricks-datasets

path,name,size,modificationTime
dbfs:/databricks-datasets/,databricks-datasets/,0,0
dbfs:/databricks-datasets/COVID/,COVID/,0,0
dbfs:/databricks-datasets/README.md,README.md,976,1532468253000
dbfs:/databricks-datasets/Rdatasets/,Rdatasets/,0,0
dbfs:/databricks-datasets/SPARK_README.md,SPARK_README.md,3359,1455043490000
dbfs:/databricks-datasets/adult/,adult/,0,0
dbfs:/databricks-datasets/airlines/,airlines/,0,0
dbfs:/databricks-datasets/amazon/,amazon/,0,0
dbfs:/databricks-datasets/asa/,asa/,0,0
dbfs:/databricks-datasets/atlas_higgs/,atlas_higgs/,0,0


In [0]:
%fs head /databricks-datasets/README.md

In [0]:
%fs mounts

mountPoint,source,encryptionType
/databricks-datasets,databricks-datasets,sse-s3
/databricks/mlflow-tracking,databricks/mlflow-tracking,sse-s3
/databricks-results,databricks-results,sse-s3
/databricks/mlflow-registry,databricks/mlflow-registry,sse-s3
/,DatabricksRoot,sse-s3


**`%fs`** is shorthand for the <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a> module: **`dbutils.fs`**

In [0]:
%fs help
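
One way to picture the shorthand is as a text rewrite from a **`%fs`** line to the matching **`dbutils.fs`** call. The sketch below does only that rewrite and returns the call as a string; it does not touch DBFS. **`fs_magic_to_call`** is a hypothetical name, and the real magic may parse its arguments differently.

```python
import shlex


def fs_magic_to_call(line):
    """Translate a '%fs <command> <args...>' line into dbutils.fs call text."""
    tokens = shlex.split(line)
    if len(tokens) < 2 or tokens[0] != "%fs":
        raise ValueError("expected a line of the form '%fs <command> [args]'")
    command, args = tokens[1], tokens[2:]
    # Each argument becomes a quoted Python string literal.
    quoted_args = ", ".join(repr(arg) for arg in args)
    return f"dbutils.fs.{command}({quoted_args})"


print(fs_magic_to_call("%fs ls /databricks-datasets"))
print(fs_magic_to_call("%fs head /databricks-datasets/README.md"))
print(fs_magic_to_call("%fs mounts"))
```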

Run file system commands on DBFS using DBUtils directly

In [0]:
dbutils.fs.ls("/databricks-datasets")

Out[12]: [FileInfo(path='dbfs:/databricks-datasets/', name='databricks-datasets/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/databricks-datasets/COVID/', name='COVID/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/databricks-datasets/README.md', name='README.md', size=976, modificationTime=1532468253000),
 FileInfo(path='dbfs:/databricks-datasets/Rdatasets/', name='Rdatasets/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/databricks-datasets/SPARK_README.md', name='SPARK_README.md', size=3359, modificationTime=1455043490000),
 FileInfo(path='dbfs:/databricks-datasets/adult/', name='adult/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/databricks-datasets/airlines/', name='airlines/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/databricks-datasets/amazon/', name='amazon/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/databricks-datasets/asa/', name='asa/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/databricks-datasets/atlas_higgs/', name='

Visualize results in a table using the Databricks <a href="https://docs.databricks.com/notebooks/visualizations/index.html#display-function-1" target="_blank">display</a> function

In [0]:
files = dbutils.fs.ls("/databricks-datasets")
display(files)
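
The raw **`size`** and **`modificationTime`** values can be hard to read at a glance. The sketch below formats one entry, assuming that **`modificationTime`** is in milliseconds since the Unix epoch and that names ending in **`/`** are directories, as the listing above suggests. The **`FileInfo`** namedtuple here is a local stand-in so the example runs outside Databricks, and **`describe`** is a hypothetical helper.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Local stand-in for the FileInfo objects returned by dbutils.fs.ls.
FileInfo = namedtuple("FileInfo", ["path", "name", "size", "modificationTime"])


def describe(info):
    """Summarize one entry: kind, size in bytes, and modification time in UTC."""
    kind = "dir" if info.name.endswith("/") else "file"
    # Assumption: modificationTime is milliseconds since the Unix epoch.
    modified = datetime.fromtimestamp(info.modificationTime / 1000, tz=timezone.utc)
    return f"{kind:4} {info.size:>6} B  {modified:%Y-%m-%d %H:%M:%S}  {info.name}"


# Values copied from the README.md entry in the listing above.
readme = FileInfo("dbfs:/databricks-datasets/README.md", "README.md", 976, 1532468253000)
print(describe(readme))
```

Directory entries in the listing report a modification time of 0, so under this assumption they would display as the start of the epoch rather than a meaningful date.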

## Our First Table

Our first table is located in the path identified by **`events_path`** (a variable we created for you).

We can see those files by running the following cell.

In [0]:
files = dbutils.fs.ls(events_path)
display(files)

path,name,size,modificationTime
dbfs:/user/kucukoglu_ersan@student.ceu.edu/dbacademy/aspwd/datasets/events/events.delta/_delta_log/,_delta_log/,0,0
dbfs:/user/kucukoglu_ersan@student.ceu.edu/dbacademy/aspwd/datasets/events/events.delta/part-00000-eb68ecaf-f8e1-4820-9513-24e158ed1e22-c000.snappy.parquet,part-00000-eb68ecaf-f8e1-4820-9513-24e158ed1e22-c000.snappy.parquet,75373205,1650894657000
dbfs:/user/kucukoglu_ersan@student.ceu.edu/dbacademy/aspwd/datasets/events/events.delta/part-00001-e9be20a6-591a-4c06-9284-36d33f8bb378-c000.snappy.parquet,part-00001-e9be20a6-591a-4c06-9284-36d33f8bb378-c000.snappy.parquet,75384788,1650894664000
dbfs:/user/kucukoglu_ersan@student.ceu.edu/dbacademy/aspwd/datasets/events/events.delta/part-00002-5793eed4-8dea-4287-abe1-a8ed30032f86-c000.snappy.parquet,part-00002-5793eed4-8dea-4287-abe1-a8ed30032f86-c000.snappy.parquet,75393846,1650894669000
dbfs:/user/kucukoglu_ersan@student.ceu.edu/dbacademy/aspwd/datasets/events/events.delta/part-00003-3c9024f7-5419-45b5-873d-4756e510a797-c000.snappy.parquet,part-00003-3c9024f7-5419-45b5-873d-4756e510a797-c000.snappy.parquet,75295715,1650894674000


## But, Wait!
SQL commands cannot read Python variables directly.

With the following trick, they can!

Set the Python variable as a Spark configuration property, which SQL commands can then reference as **`${c.events_path}`**:

In [0]:
spark.sql(f"SET c.events_path = {events_path}")

Out[7]: DataFrame[key: string, value: string]

## Create table
Run <a href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/index.html#sql-reference" target="_blank">Databricks SQL Commands</a> to create a table named **`events`** using BedBricks event files on DBFS.

In [0]:
%sql
CREATE TABLE IF NOT EXISTS events
USING DELTA
OPTIONS (path = "${c.events_path}");
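
The **`${c.events_path}`** reference can be thought of as text substitution: the placeholder is replaced by the configured value before the statement runs, which is why it sits inside quotes. The sketch below is a rough emulation of that idea in plain Python, not Spark's actual implementation. **`substitute`** is a hypothetical helper and the path in **`conf`** is a made-up stand-in for the real value of **`events_path`**.

```python
import re

# Matches ${name} references such as ${c.events_path}.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def substitute(sql_text, conf):
    """Replace each ${key} in sql_text with conf[key]; unknown keys are left untouched."""
    return _REFERENCE.sub(lambda m: conf.get(m.group(1), m.group(0)), sql_text)


# Hypothetical path, standing in for the value of events_path.
conf = {"c.events_path": "dbfs:/example/events.delta"}
statement = 'CREATE TABLE IF NOT EXISTS events USING DELTA OPTIONS (path = "${c.events_path}")'
print(substitute(statement, conf))
```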

This table was saved in the database created for you in the classroom setup. See the database name printed below.

In [0]:
print(database_name)

dbacademy_kucukoglu_ersan_student_ceu_edu_aspwd_asp_1_1_databricks_platform


View your database and table in the Data tab of the UI.

## Query table and plot results
Use SQL to query the **`events`** table

In [0]:
%sql
SELECT * FROM events

Run the query below and then <a href="https://docs.databricks.com/notebooks/visualizations/index.html#plot-types" target="_blank">plot</a> results by selecting the bar chart icon.

In [0]:
%sql
SELECT traffic_source, SUM(ecommerce.purchase_revenue_in_usd) AS total_revenue
FROM events
GROUP BY traffic_source
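
For readers less familiar with SQL aggregation, the sketch below computes the same kind of result in plain Python: one total per **`traffic_source`**, read from the nested **`ecommerce.purchase_revenue_in_usd`** field, with NULL revenues skipped the way **`SUM`** skips them. The rows are hypothetical and only shaped like the **`events`** table; they are not real BedBricks data.

```python
# Hypothetical rows shaped like the events table; the values are made up.
rows = [
    {"traffic_source": "email", "ecommerce": {"purchase_revenue_in_usd": 100.0}},
    {"traffic_source": "email", "ecommerce": {"purchase_revenue_in_usd": None}},
    {"traffic_source": "google", "ecommerce": {"purchase_revenue_in_usd": 50.5}},
    {"traffic_source": "direct", "ecommerce": {"purchase_revenue_in_usd": None}},
]


def total_revenue_by_source(events):
    """Emulate SELECT traffic_source, SUM(revenue) ... GROUP BY traffic_source."""
    totals = {}
    for event in events:
        source = event["traffic_source"]
        revenue = event["ecommerce"]["purchase_revenue_in_usd"]
        totals.setdefault(source, None)  # every source gets a group
        if revenue is not None:  # SUM skips NULL inputs
            totals[source] = (totals[source] or 0.0) + revenue
    return totals


print(total_revenue_by_source(rows))
```

A source whose revenues are all NULL keeps a total of **`None`**, mirroring the NULL that **`SUM`** returns for such a group.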

## Add notebook parameters with widgets
Use <a href="https://docs.databricks.com/notebooks/widgets.html" target="_blank">widgets</a> to add input parameters to your notebook.

Create a text input widget using SQL.

In [0]:
%sql
CREATE WIDGET TEXT state DEFAULT "CA"

Access the current value of the widget using the function **`getArgument`**

In [0]:
%sql
SELECT *
FROM events
WHERE geo.state = getArgument("state")

Remove the text widget

In [0]:
%sql
REMOVE WIDGET state

To create widgets in Python, Scala, and R, use the DBUtils module: **`dbutils.widgets`**

In [0]:
dbutils.widgets.text("name", "Brickster", "Name")
dbutils.widgets.multiselect("colors", "orange", ["red", "orange", "black", "blue"], "Colors")

Access the current value of the widget using the **`dbutils.widgets`** function **`get`**

In [0]:
name = dbutils.widgets.get("name")
colors = dbutils.widgets.get("colors").split(",")

# Build one radio-button label per selected color
html = "<div>Hi {}! Select your color preference.</div>".format(name)
for c in colors:
    html += """<label for="preference" style="color:{}"><input type="radio"> {}</label><br>""".format(c, c)

displayHTML(html)
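
Because the text widget accepts free-form input, it may be worth escaping values before placing them in HTML. The sketch below separates the parsing of the comma-separated multiselect value from the HTML construction so both can be tried outside a notebook. **`parse_multiselect`** and **`greeting_html`** are hypothetical helpers; in a notebook, their inputs would come from **`dbutils.widgets.get`** and the result would be passed to **`displayHTML`**.

```python
from html import escape


def parse_multiselect(raw):
    """Split a comma-separated widget value into a list, dropping empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def greeting_html(name, colors):
    """Build the greeting markup, escaping user-supplied text."""
    parts = [f"<div>Hi {escape(name)}! Select your color preference.</div>"]
    for color in colors:
        safe = escape(color, quote=True)
        parts.append(
            f'<label style="color:{safe}">'
            f'<input type="radio" name="preference"> {safe}</label><br>'
        )
    return "".join(parts)


print(greeting_html("Brickster", parse_multiselect("red,orange")))
```

Giving every radio input the same **`name`** attribute is a design choice in this sketch so that only one color can be selected at a time.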

Remove all widgets

In [0]:
dbutils.widgets.removeAll()

### Clean up classroom
Clean up any temporary files, tables, and databases created by this lesson.

In [0]:
classroom_cleanup()

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>