
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Getting Started with the Databricks Platform

This notebook provides a hands-on review of some of the basic functionality of the Databricks Data Science and Engineering Workspace.

## Learning Objectives
By the end of this lab, you should be able to:
- Rename a notebook and change the default language
- Attach a cluster
- Use the **`%run`** magic command
- Run Python and SQL cells
- Create a Markdown cell

# Renaming a Notebook

Changing the name of a notebook is easy. Click on the name at the top of this page, then make changes to the name. To make this notebook easier to find later, append a short test string to the end of the existing name.

# Attaching a cluster

Executing cells in a notebook requires computing resources, which are provided by clusters. The first time you execute a cell in a notebook, you will be prompted to attach to a cluster if one is not already attached.

Attach a cluster to this notebook now by clicking the dropdown near the top-left corner of this page. Select the cluster you created previously. This will clear the execution state of the notebook and connect the notebook to the selected cluster.

Note that the dropdown menu provides the option of starting or restarting the cluster as needed. You can also detach and re-attach to a cluster in a single operation. This is useful for clearing the execution state when needed.

# Using %run

Complex projects of any type can benefit from the ability to break them down into simpler, reusable components.

In the context of Databricks notebooks, this facility is provided through the **`%run`** magic command.

When one notebook calls another with **`%run`**, the called notebook's variables, functions, and code blocks become part of the calling notebook's programming context.

Consider this example:

**`Notebook_A`** has four commands:
  1. **`name = "John"`**
  2. **`print(f"Hello {name}")`**
  3. **`%run ./Notebook_B`**
  4. **`print(f"Welcome back {full_name}")`**

**`Notebook_B`** has only one command:
  1. **`full_name = f"{name} Doe"`**

If we run **`Notebook_B`** on its own, it will fail to execute because the variable **`name`** is not defined in **`Notebook_B`**.

Likewise, one might think that **`Notebook_A`** would fail because it uses the variable **`full_name`**, which is not defined in **`Notebook_A`**, but it doesn't!

What actually happens is that the two notebooks are merged together as we see below and **then** executed:
1. **`name = "John"`**
2. **`print(f"Hello {name}")`**
3. **`full_name = f"{name} Doe"`**
4. **`print(f"Welcome back {full_name}")`**

This produces the expected output:
* **`Hello John`**
* **`Welcome back John Doe`**
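
One way to picture this merging behavior in plain Python is to execute both notebooks' code against a single shared namespace. The sketch below is only an analogy, not how Databricks implements **`%run`**: `exec` stands in for the magic command, and `NOTEBOOK_B`, `run_notebook_a`, and `run_notebook_b_alone` are hypothetical names introduced for illustration.

```python
# Hypothetical stand-in for the single command in Notebook_B.
NOTEBOOK_B = 'full_name = f"{name} Doe"'


def run_notebook_a(shared):
    """Mimic Notebook_A: define name, greet, 'run' Notebook_B, greet again."""
    shared["name"] = "John"
    greetings = [f"Hello {shared['name']}"]
    # Analogy for `%run ./Notebook_B`: execute its code in the same namespace,
    # so it can read `name` and leave `full_name` behind for the caller.
    exec(NOTEBOOK_B, shared)
    greetings.append(f"Welcome back {shared['full_name']}")
    return greetings


def run_notebook_b_alone():
    """Mimic running Notebook_B by itself, where `name` was never defined."""
    try:
        exec(NOTEBOOK_B, {})
    except NameError as err:
        return f"NameError: {err}"
    return "ok"


print(run_notebook_a({}))      # ['Hello John', 'Welcome back John Doe']
print(run_notebook_b_alone())  # a message beginning with "NameError:"
```

The shared dictionary plays the role of the notebook's execution state: anything defined before the **`%run`** is visible to the called notebook, and anything the called notebook defines is visible afterwards.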

The folder that contains this notebook contains a subfolder named **`ExampleSetupFolder`**, which in turn contains a notebook called **`example-setup`**. 

This simple notebook declares the variable **`my_name`**, sets it to **`None`** and then creates a DataFrame called **`example_df`**. 

Open the **`example-setup`** notebook and modify it so that **`my_name`** is not **`None`** but rather your name (or anyone's name) enclosed in quotes, and so that the following two cells execute without throwing an **`AssertionError`**.

In [0]:
%run ./ExampleSetupFolder/example-setup

In [0]:
assert my_name is not None, "Name is still None"
print(my_name)

## Run a Python cell

Run the following cell to verify that the **`example-setup`** notebook was executed by displaying the **`example_df`** DataFrame. This table consists of 16 rows of increasing values.

In [0]:
display(example_df)

# Detach and Reattach a Cluster

While attaching to clusters is a fairly common task, sometimes it is useful to detach and re-attach in one single operation. The main side-effect this achieves is clearing the execution state. This can be useful when you want to test cells in isolation, or you simply want to reset the execution state.

Revisit the cluster dropdown. In the menu item representing the currently attached cluster, select the **Detach & Re-attach** link.

Notice that the output from the cell above remains since results and execution state are unrelated, but the execution state is cleared. This can be verified by attempting to re-run the cell above. This fails, since the **`example_df`** variable has been cleared, along with the rest of the state.
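
Before re-running cells after a reset, it can help to check which names are still defined. The following is a minimal sketch in plain Python: `missing_names` is a hypothetical helper, and the dictionary is a stand-in for the notebook's execution state rather than a real Databricks object.

```python
def missing_names(namespace, required=("my_name", "example_df")):
    """Return the required names that are not defined in the given namespace."""
    return [name for name in required if name not in namespace]


# Stand-in for the execution state after the setup notebook has run.
state = {"my_name": "Ada", "example_df": object()}
print(missing_names(state))  # []

# Stand-in for Detach & Re-attach: every variable is discarded.
state.clear()
print(missing_names(state))  # ['my_name', 'example_df']
```

In a Python notebook cell, calling `missing_names(globals())` should give the same kind of answer for the live session, assuming the notebook's variables live in the cell's global namespace.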

# Change Language

Notice that the default language for this notebook is set to Python. Change this by clicking the **Python** button to the right of the notebook name. Change the default language to SQL.

Notice that the Python cells are automatically prepended with a **`%python`** magic command to maintain validity of those cells. This operation also clears the execution state.

# Create a Markdown Cell

Add a new cell below this one. Populate it with some Markdown that includes at least the following elements:
* A header
* Bullet points
* A link (using your choice of HTML or Markdown conventions)

## Run a SQL cell

Run the following cell to query a Delta table using SQL. This executes a simple query against a table that is backed by a Databricks-provided example dataset included in all DBFS installations.

In [0]:
%sql
SELECT * FROM delta.`/databricks-datasets/nyctaxi-with-zipcodes/subsampled`
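
The same path-based query can also be issued from a Python cell. The sketch below only builds the SQL text: `delta_path_query` is a hypothetical helper, and the commented-out call assumes the `spark` session that Databricks notebooks provide.

```python
def delta_path_query(path, limit=None):
    """Build a SELECT over a Delta table addressed by its storage path."""
    if "`" in path:
        # A backtick would end the quoted path early and change the query.
        raise ValueError("path must not contain backticks")
    query = f"SELECT * FROM delta.`{path}`"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


query = delta_path_query(
    "/databricks-datasets/nyctaxi-with-zipcodes/subsampled", limit=10
)
print(query)

# In a Databricks Python cell (assumes the built-in `spark` session):
# display(spark.sql(query))
```

Adding a `LIMIT` is a design choice for exploration: it keeps the displayed result small without changing which table is read.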

Execute the following cell to view the underlying files backing this table.

In [0]:
files = dbutils.fs.ls("/databricks-datasets/nyctaxi-with-zipcodes/subsampled")
display(files)
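
The entries returned by `dbutils.fs.ls` expose fields such as `path`, `name`, and `size`, so a listing can be summarized with ordinary Python. The sketch below is runnable outside Databricks because it uses a hypothetical `FileEntry` stand-in and made-up sample entries; it does not reproduce the real contents of the dataset folder. `summarize_listing` is likewise a hypothetical helper.

```python
from collections import namedtuple

# Hypothetical stand-in for a listing entry; only path, name, and size are modeled.
FileEntry = namedtuple("FileEntry", ["path", "name", "size"])


def summarize_listing(entries):
    """Report whether a transaction log is present and total the Parquet data files."""
    has_log = any(e.name.startswith("_delta_log") for e in entries)
    data_files = [e for e in entries if e.name.endswith(".parquet")]
    return {
        "has_delta_log": has_log,
        "data_file_count": len(data_files),
        "data_bytes": sum(e.size for e in data_files),
    }


# Made-up sample entries for illustration only.
sample = [
    FileEntry("dbfs:/example/table/_delta_log/", "_delta_log/", 0),
    FileEntry("dbfs:/example/table/part-00000.snappy.parquet", "part-00000.snappy.parquet", 1024),
    FileEntry("dbfs:/example/table/part-00001.snappy.parquet", "part-00001.snappy.parquet", 2048),
]
print(summarize_listing(sample))
# {'has_delta_log': True, 'data_file_count': 2, 'data_bytes': 3072}
```

In the notebook, passing the `files` list from the cell above to `summarize_listing(files)` should work the same way, assuming each entry exposes `name` and `size`.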

# Review Changes

Assuming you have imported this material into your workspace using a Databricks Repo, open the Repo dialog by clicking the **`published`** branch button at the top-left corner of this page. You should see three changes:
1. **Removed** with the old notebook name
1. **Added** with the new notebook name
1. **Modified** for creating a markdown cell above

Use the dialog to revert the changes and restore this notebook to its original state.

## Wrapping Up

By completing this lab, you should now feel comfortable manipulating notebooks, creating new cells, and running notebooks within notebooks.

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>