# Work with Notebooks

**Technical Accomplishments:**
- Set the stage for learning on the Databricks platform
- Demonstrate how to develop & execute code within a notebook
- Introduce the Databricks File System (DBFS)
- Introduce `dbutils`
- Review the various "Magic Commands"
- Review various built-in commands that facilitate working with notebooks

### Feeling Lost?
The [Databricks Unified Support Portal](https://help.databricks.com/s/) is a great place to search forums and documentation for Databricks and Spark.

Databricks also offers [multiple tiers for dedicated support](https://databricks.com/support).

##![Spark Logo Tiny](https://files.training.databricks.com/images/wiki-book/general/logo_spark_tiny.png) Scala, Python, R, SQL

* Each notebook has a default language: **Scala**, **Python**, **SQL**, or **R**
* Run the cell below using one of the following options:
  * **CTRL+ENTER** or **CMD+RETURN**
  * **SHIFT+ENTER** or **SHIFT+RETURN** to run the cell and move to the next one
  * Using **Run Cell**, **Run All Above** or **Run All Below** as seen here<br/><img style="box-shadow: 5px 5px 5px 0px rgba(0,0,0,0.25); border: 1px solid rgba(0,0,0,0.25);" src="https://files.training.databricks.com/images/notebook-cell-run-cmd.png"/>

Feel free to tweak the code below if you like:

In [0]:
print("I'm running Python!")

##![Spark Logo Tiny](https://files.training.databricks.com/images/wiki-book/general/logo_spark_tiny.png) Magic Commands
* Magic Commands are specific to Databricks notebooks
* They are very similar to the Magic Commands found in comparable notebook products, such as Jupyter
* They are built-in commands that work independently of the notebook's default language
* A single percent (%) symbol at the start of a cell identifies a Magic Command
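To make the "leading percent sign" rule concrete, here is a small illustrative sketch in Python. The helper `split_magic` is hypothetical: it only mirrors the rule described above and is not how Databricks actually parses cells.

```python
from typing import Optional, Tuple

def split_magic(cell: str) -> Tuple[Optional[str], str]:
    """Split a cell into (magic_name, body); magic_name is None for a plain cell."""
    if not cell.startswith("%"):
        # No leading "%": the cell runs in the notebook's default language.
        return None, cell
    first_line, _, rest = cell.partition("\n")
    # The magic name is the first token after "%", e.g. "sh" in "%sh ps".
    name, _, inline_args = first_line[1:].partition(" ")
    # Text after the name on the first line is part of the body, as in "%sh ps".
    body = (inline_args + "\n" + rest).strip()
    return name, body

print(split_magic("%sh ps | grep 'java'"))
print(split_magic('%scala\n\nprintln("Hello Scala!")'))
print(split_magic('print("I\'m running Python!")'))
```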

### Magic Command: &percnt;sh
For example, **&percnt;sh** allows us to execute shell commands on the driver.

In [0]:
%sh ps | grep 'java'
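For comparison, a Python cell can do something similar with the standard-library `subprocess` module. This is a minimal sketch, assuming a POSIX shell is available on the driver; the helper name `grep_command_output` is hypothetical, and the filtering happens in Python rather than with `grep`.

```python
import subprocess
from typing import List

def grep_command_output(command: str, pattern: str) -> List[str]:
    """Run `command` through the shell and keep only output lines containing `pattern`."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    # Filter in Python instead of piping through grep.
    return [line for line in result.stdout.splitlines() if pattern in line]

# On the driver, a rough equivalent of `%sh ps | grep 'java'` would be:
#   grep_command_output("ps", "java")
# Portable demo that does not depend on which processes happen to be running:
print(grep_command_output("printf 'java 123\\npython 456\\n'", "java"))
```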

### Magic Command: Other Languages
Additional Magic Commands allow for the execution of code in languages other than the notebook's default:
* **&percnt;python**
* **&percnt;scala**
* **&percnt;sql**
* **&percnt;r**

In [0]:
%scala

println("Hello Scala!")

In [0]:
%python

print("Hello Python!")

In [0]:
%r

print("Hello R!", quote=FALSE)

In [0]:
%sql

select "Hello SQL!"

Hello SQL!
Hello SQL!


### Magic Command: &percnt;md

Our favorite Magic Command **&percnt;md** allows us to render Markdown in a cell:
* Double click this cell to begin editing it
* Then hit `Esc` to stop editing

# Title One
## Title Two
### Title Three

This is a test of the emergency broadcast system. This is only a test.

This is text with a **bold** word in it.

This is text with an *italicized* word in it.

This is an ordered list
0. one
0. two
0. three

This is an unordered list
* apples
* peaches
* bananas

Links/Embedded HTML: <a href="http://bfy.tw/19zq" target="_blank">What is Markdown?</a>

Images:
![Spark Engines](https://files.training.databricks.com/images/Apache-Spark-Logo_TM_200px.png)

And of course, tables:

| Name  | Age | Sex    |
|-------|-----|--------|
| Tom   | 32  | Male   |
| Mary  | 29  | Female |
| Dick  | 73  | Male   |
| Sally | 55  | Female |

### Magic Command: &percnt;run
* You can run a notebook from another notebook by using the Magic Command **%run**
* All variables & functions defined in that other notebook will become available in your current notebook

For example, the following cell should fail to execute because the variable `username` has not yet been declared:

In [0]:
print("username: " + username)

But we can declare it, along with a handful of other variables and functions, by running this cell:

In [0]:
%run "./Includes/Classroom-Setup"

In this case, the notebook `Classroom-Setup` declares the following:
  * The variable `username`
  * The variable `userhome`
  * The function `assertSparkVersion(..)`
  * And others...

In [0]:
print("username: " + username)
print("userhome: " + userhome)

We will use those variables and functions throughout this class.
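Conceptually, `%run` behaves like an inline include: the other notebook's code runs in the current context, so its definitions land in your namespace. The sketch below mimics that effect for a plain Python file with `exec`. It is only an analogy (Databricks runs notebooks, not `.py` files), and the helper `run_into`, the file name, and the values assigned to `username` and `userhome` are all hypothetical.

```python
import pathlib
import tempfile

def run_into(path: str, namespace: dict) -> None:
    """Execute the Python source at `path` inside `namespace`, like an inline include."""
    source = pathlib.Path(path).read_text()
    exec(compile(source, path, "exec"), namespace)

# Hypothetical stand-in for ./Includes/Classroom-Setup with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    setup = pathlib.Path(tmp) / "classroom_setup.py"
    setup.write_text('username = "student"\nuserhome = "dbfs:/user/" + username\n')
    # Running into globals() makes the definitions visible in this module.
    run_into(str(setup), globals())

print("username: " + username)
print("userhome: " + userhome)
```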

One of the other things `Classroom-Setup` does for us is to mount all the datasets needed for this class into the Databricks File System.

##![Spark Logo Tiny](https://files.training.databricks.com/images/wiki-book/general/logo_spark_tiny.png) Databricks File System - DBFS
* DBFS is a layer over a cloud-based object store
* Files in DBFS are persisted to the object store
* The lifetime of files in DBFS is **NOT** tied to the lifetime of our cluster

### Mounting Data into DBFS
* Mounting other object stores into DBFS gives Databricks users access via the file system
* This is just one of many techniques for pulling data into Spark
* The datasets needed for this class have already been mounted for us with the call to `%run "./Includes/Classroom-Setup"`
* We will confirm that in just a few minutes

See also <a href="https://docs.azuredatabricks.net/user-guide/dbfs-databricks-file-system.html" target="_blank">Databricks File System - DBFS</a>.

### Databricks Utilities - dbutils
* You can access the DBFS through the Databricks Utilities class (and other file IO routines).
* An instance of DBUtils is already declared for us as `dbutils`.
* For in-notebook documentation on DBUtils you can execute the command `dbutils.help()`.

See also <a href="https://docs.azuredatabricks.net/user-guide/dbutils.html" target="_blank">Databricks Utilities - dbutils</a>
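Besides `dbutils`, clusters commonly expose DBFS on the driver's local file system under `/dbfs`, which lets ordinary Python file APIs read DBFS files. The sketch below assumes that local mount is available (this depends on the cluster configuration) and only converts between the `dbfs:/` URIs that `dbutils` reports and the corresponding `/dbfs/` local paths; the helper names are hypothetical.

```python
def dbfs_uri_to_local(uri: str) -> str:
    """Map a 'dbfs:/...' URI (as in FileInfo.path) to a '/dbfs/...' local path."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs URI: {uri!r}")
    return "/dbfs/" + uri[len(prefix):]

def local_to_dbfs_uri(path: str) -> str:
    """Inverse mapping: '/dbfs/...' back to 'dbfs:/...'."""
    prefix = "/dbfs/"
    if not path.startswith(prefix):
        raise ValueError(f"not under /dbfs: {path!r}")
    return "dbfs:/" + path[len(prefix):]

local = dbfs_uri_to_local("dbfs:/mnt/training/Chicago-Crimes-2018.csv")
print(local)
# Assuming the /dbfs local mount exists on the driver, plain file IO would then work:
#   with open(local) as f:
#       print(f.readline())
```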

In [0]:
dbutils.help()

Additional help is available for each sub-utility:
* `dbutils.fs.help()`
* `dbutils.meta.help()`
* `dbutils.notebook.help()`
* `dbutils.widgets.help()`

Let's take a look at the file system utilities, `dbutils.fs`:

In [0]:
dbutils.fs.help()

### dbutils.fs.mounts()
* As previously mentioned, all our datasets should already be mounted
* We can use `dbutils.fs.mounts()` to verify that assertion
* This method returns a collection of `MountInfo` objects, one for each mount

In [0]:
mounts = dbutils.fs.mounts()

for mount in mounts:
  print(mount.mountPoint + " >> " + mount.source)

print("-"*80)
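Rather than eyeballing the printed list, you could turn the check into an assertion. In this sketch, `assert_mounted` is a hypothetical helper and `FakeMountInfo` is a stand-in that only models the two attributes used above (`mountPoint` and `source`), so the code can run without a cluster; the sample mount sources are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for MountInfo with only the attributes used above.
FakeMountInfo = namedtuple("FakeMountInfo", ["mountPoint", "source"])

def assert_mounted(mounts, mount_point: str) -> str:
    """Return the source backing `mount_point`, or raise AssertionError if it is not mounted."""
    wanted = mount_point.rstrip("/")
    for mount in mounts:
        if mount.mountPoint.rstrip("/") == wanted:
            return mount.source
    raise AssertionError(f"{mount_point} is not mounted")

# Made-up sample data; real mount sources will differ.
sample = [
    FakeMountInfo("/mnt/other", "s3a://example-bucket/other"),
    FakeMountInfo("/mnt/training", "s3a://example-bucket/training"),
]
print(assert_mounted(sample, "/mnt/training/"))
# On Databricks: assert_mounted(dbutils.fs.mounts(), "/mnt/training")
```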

### dbutils.fs.ls(..)
* Now we can use `dbutils.fs.ls(..)` to view the contents of the `/mnt/training` mount
* This method returns a collection of `FileInfo` objects, one for each item in the specified directory

See also <a href="https://docs.azuredatabricks.net/api/latest/dbfs.html#dbfsfileinfo" target="_blank">FileInfo</a>

In [0]:
files = dbutils.fs.ls("/mnt/training/")

for fileInfo in files:
  print(fileInfo.path)

print("-"*80)
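Because `dbutils.fs.ls(..)` lists only one directory level, a recursive listing has to call it repeatedly. This sketch assumes, as the listings in this notebook suggest, that directory entries have a name ending in `/`. The `ls` function is passed in so the code can be tried with a hypothetical in-memory tree (`FakeFileInfo` and the sample paths are made up) instead of a real mount. Walking a large mount such as `/mnt/training/` this way may be slow.

```python
from collections import namedtuple

# Hypothetical stand-in for FileInfo with the fields shown in the listing.
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name", "size"])

def walk(ls, path: str):
    """Yield every file (not directory) under `path`, depth first."""
    for info in ls(path):
        if info.name.endswith("/"):
            # Directory entry: recurse into it.
            yield from walk(ls, info.path)
        else:
            yield info

# Made-up in-memory tree standing in for dbutils.fs.ls.
tree = {
    "dbfs:/mnt/training/": [
        FakeFileInfo("dbfs:/mnt/training/airbnb/", "airbnb/", 0),
        FakeFileInfo("dbfs:/mnt/training/Chicago-Crimes-2018.csv", "Chicago-Crimes-2018.csv", 5201668),
    ],
    "dbfs:/mnt/training/airbnb/": [
        FakeFileInfo("dbfs:/mnt/training/airbnb/example.csv", "example.csv", 10),
    ],
}

for f in walk(tree.__getitem__, "dbfs:/mnt/training/"):
    print(f.path, f.size)
# On Databricks: walk(dbutils.fs.ls, "/mnt/training/")
```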

### display(..)

Besides printing each item returned from `dbutils.fs.ls(..)`, we can also pass that collection to another Databricks-specific command called `display(..)`.

In [0]:
files = dbutils.fs.ls("/mnt/training/")

display(files)

| path | name | size (bytes) |
|------|------|--------------|
| dbfs:/mnt/training/301/ | 301/ | 0 |
| dbfs:/mnt/training/Chicago-Crimes-2018.csv | Chicago-Crimes-2018.csv | 5201668 |
| dbfs:/mnt/training/City-Data.delta/ | City-Data.delta/ | 0 |
| dbfs:/mnt/training/City-Data.parquet/ | City-Data.parquet/ | 0 |
| dbfs:/mnt/training/EDGAR-Log-20170329/ | EDGAR-Log-20170329/ | 0 |
| dbfs:/mnt/training/StatLib/ | StatLib/ | 0 |
| dbfs:/mnt/training/UbiqLog4UCI/ | UbiqLog4UCI/ | 0 |
| dbfs:/mnt/training/_META/ | _META/ | 0 |
| dbfs:/mnt/training/adventure-works/ | adventure-works/ | 0 |
| dbfs:/mnt/training/airbnb/ | airbnb/ | 0 |


The `display(..)` command also offers many other capabilities:
* Presenting up to 1000 records
* Exporting data as CSV
* Rendering many different kinds of graphs
* Rendering geo-located data on a world map

And as we will see later, it is also an excellent tool for previewing our data in a notebook.

### Magic Command: &percnt;fs

There is at least one more trick for looking at the DBFS.

It is a wrapper around `dbutils.fs` and it is the Magic Command known as **&percnt;fs**.

The following cell is equivalent to the previous call, `display(dbutils.fs.ls("/mnt/training"))`; there is no real difference between the two.

In [0]:
%fs ls /mnt/training

| path | name | size (bytes) |
|------|------|--------------|
| dbfs:/mnt/training/301/ | 301/ | 0 |
| dbfs:/mnt/training/Chicago-Crimes-2018.csv | Chicago-Crimes-2018.csv | 5201668 |
| dbfs:/mnt/training/City-Data.delta/ | City-Data.delta/ | 0 |
| dbfs:/mnt/training/City-Data.parquet/ | City-Data.parquet/ | 0 |
| dbfs:/mnt/training/EDGAR-Log-20170329/ | EDGAR-Log-20170329/ | 0 |
| dbfs:/mnt/training/StatLib/ | StatLib/ | 0 |
| dbfs:/mnt/training/UbiqLog4UCI/ | UbiqLog4UCI/ | 0 |
| dbfs:/mnt/training/_META/ | _META/ | 0 |
| dbfs:/mnt/training/adventure-works/ | adventure-works/ | 0 |
| dbfs:/mnt/training/airbnb/ | airbnb/ | 0 |
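To illustrate what "a wrapper around `dbutils.fs`" means, the hypothetical sketch below splits a `%fs` line into a method name and arguments and calls that method on a file-system object. It is not Databricks' implementation: it ignores flags such as `-r`, and it uses a fake object (`FakeFs`) in place of `dbutils.fs`.

```python
import shlex

def run_fs_magic(fs, line: str):
    """Translate e.g. 'ls /mnt/training' into fs.ls('/mnt/training')."""
    command, *args = shlex.split(line)
    # Look up the method by name; unknown commands raise AttributeError.
    method = getattr(fs, command)
    return method(*args)

class FakeFs:
    """Hypothetical stand-in for dbutils.fs that just reports the call it received."""
    def ls(self, path):
        return [f"ls called with {path}"]

print(run_fs_magic(FakeFs(), "ls /mnt/training"))
# On Databricks, roughly: display(run_fs_magic(dbutils.fs, "ls /mnt/training"))
```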


##![Spark Logo Tiny](https://files.training.databricks.com/images/wiki-book/general/logo_spark_tiny.png) Learning More

We encourage you to explore the documentation to learn more about the various features of the Databricks platform and notebooks.
* <a href="https://docs.azuredatabricks.net/user-guide/index.html" target="_blank">User Guide</a>
* <a href="https://docs.databricks.com/user-guide/getting-started.html" target="_blank">Getting Started with Databricks</a>
* <a href="https://docs.azuredatabricks.net/user-guide/notebooks/index.html" target="_blank">User Guide / Notebooks</a>
* <a href="https://docs.databricks.com/user-guide/notebooks/index.html#importing-notebooks" target="_blank">Importing notebooks - Supported Formats</a>
* <a href="https://docs.azuredatabricks.net/administration-guide/index.html" target="_blank">Administration Guide</a>
* <a href="https://docs.databricks.com/user-guide/clusters/index.html" target="_blank">Cluster Configuration</a>
* <a href="https://docs.azuredatabricks.net/api/index.html" target="_blank">REST API</a>
* <a href="https://docs.azuredatabricks.net/release-notes/index.html" target="_blank">Release Notes</a>
* <a href="https://docs.azuredatabricks.net" target="_blank">And much more!</a>