# Welcome to PixieDust

This notebook introduces PixieDust, a notebook extension that makes data visualization easy.

*This ToC only appears for the local version, where hrefs will work*
1. PixieDust Part 1 - Easy Visualization (this one)
1. [PixieDust Part 2 - Working with External Data]()
1. ...
1. [PixieDust - Contribute]()

## Get started

Load up the [PixieDust documentation](https://ibm-cds-labs.github.io/pixiedust/), and if you're new to notebooks, you should probably orient yourself to [notebook basics](http://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb). To use this notebook, all you really need to do is run cells; **Shift + Enter** will run the cell your cursor is in.

In [19]:
# Make sure you have the latest version of PixieDust installed on your system
# Only run this cell if you did _not_ install PixieDust from source
# To upgrade to the latest version, uncomment the next line and run this cell
#!pip install --user --upgrade pixiedust
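
If you'd rather check which version is installed before upgrading, here is a minimal sketch using Python's standard `importlib.metadata` module (Python 3.8 or later). The helper name `installed_version` is our own, not part of PixieDust.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed PixieDust version, or None if no installed package is found
print(installed_version("pixiedust"))
```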

Now that you have PixieDust installed and up-to-date on your system, you need to import it into this notebook. This is the last dependency before you can play with PixieDust.

In [26]:
# Run this cell
import pixiedust

Once you see the success message output from running the previous cell, you're all set.

## Behold, display()!

In the next cell, we'll build a very simple dataset and store it in a variable. 

In [21]:
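# Build a SQL context for a Spark dataframe
# (assumes `sc`, the SparkContext, is already defined by your Spark notebook kernel)
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
# Create a Spark dataframe and assign it to the variable `df`
df = sqlContext.createDataFrame(
    [("Green", 75),
     ("Blue", 25)],
    ["Colors", "%"])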

The data in the variable we just created is ready to be displayed, without any code other than the call to `display()`.

In [22]:
# Run this cell to display the dataframe above as a pie chart
display(df)

After running the cell above, you should see a **Spark dataframe displayed as a pie chart**, along with some controls to tweak the display. This dataframe is very simple, so the controls don't have much to offer yet.

In the next cell, we'll pass more interesting data to `display()`, which will also offer more advanced controls.

In [23]:
df2 = sqlContext.createDataFrame(
[(2010, 'Camping Equipment', 3),
 (2010, 'Golf Equipment', 1),
 (2010, 'Mountaineering Equipment', 1),
 (2010, 'Outdoor Protection', 2),
 (2010, 'Personal Accessories', 2),
 (2011, 'Camping Equipment', 4),
 (2011, 'Golf Equipment', 5),
 (2011, 'Mountaineering Equipment',2),
 (2011, 'Outdoor Protection', 4),
 (2011, 'Personal Accessories', 2),
 (2012, 'Camping Equipment', 5),
 (2012, 'Golf Equipment', 5),
 (2012, 'Mountaineering Equipment', 3),
 (2012, 'Outdoor Protection', 5),
 (2012, 'Personal Accessories', 3),
 (2013, 'Camping Equipment', 8),
 (2013, 'Golf Equipment', 5),
 (2013, 'Mountaineering Equipment', 3),
 (2013, 'Outdoor Protection', 8),
 (2013, 'Personal Accessories', 4)],
["year","category","unique_customers"])

# This time, we'll combine the dataframe and display() in the same cell.
# Run it and we'll dive in 
display(df2)

## display() controls

### Renderers
This chart should have rendered differently from the first one. First, it's a bar chart, not a pie chart. Second, it was rendered by a different renderer: [Bokeh](http://bokeh.pydata.org/en/0.10.0/index.html). (The first one was rendered by [matplotlib](http://matplotlib.org/).)

To toggle between these renderers, use the `Renderers` control at the top right of the display output. Other things to note:
1. [Bokeh](http://bokeh.pydata.org/en/0.10.0/index.html) is interactive; play with the controls along the top of the chart, e.g., zoom, save
1. [Matplotlib](http://matplotlib.org/) is static; you can save the image as a PNG

### Chart options

1. **Chart types**: At the top left, you should see an option to display the dataframe as a table. Next to it is a dropdown menu where you can switch between chart types, including bar charts, pie charts, scatter plots, and so on.
1. **Options**: Click the `Options` button to explore other display configurations; e.g., clustering
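
To get a feel for what an aggregation option does to a chart, here is a plain-Python sketch, independent of PixieDust, that groups a few of the `df2` rows by `year` and computes the sum and average of `unique_customers`. It assumes `year` is the key field and `unique_customers` is the value field; the helper `aggregate_by_year` is a hypothetical name, and this is not how PixieDust computes its charts internally.

```python
from collections import defaultdict

# A subset of the df2 rows: (year, category, unique_customers)
rows = [
    (2010, "Camping Equipment", 3),
    (2010, "Golf Equipment", 1),
    (2010, "Mountaineering Equipment", 1),
    (2010, "Outdoor Protection", 2),
    (2010, "Personal Accessories", 2),
    (2011, "Camping Equipment", 4),
    (2011, "Golf Equipment", 5),
    (2011, "Mountaineering Equipment", 2),
    (2011, "Outdoor Protection", 4),
    (2011, "Personal Accessories", 2),
]


def aggregate_by_year(rows):
    """Return {year: (sum, average)} of unique_customers for each year."""
    grouped = defaultdict(list)
    for year, _category, customers in rows:
        grouped[year].append(customers)
    return {
        year: (sum(values), sum(values) / len(values))
        for year, values in grouped.items()
    }


# One bar per year: its height depends on which aggregation you pick
for year, (total, average) in sorted(aggregate_by_year(rows).items()):
    print(year, "sum:", total, "average:", average)
```

Summing gives one total per year, while averaging gives the typical value per category, so the same data can produce quite different bar heights.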

## Loading External Data
So far, we've worked with hard-coded data. Good for demo purposes, but not what most people will want to analyze. Now, let's load external data from an addressable `URL`.

In [24]:
# PixieDust's API offers a bunch of commands
# pixiedust.sampleData() makes it possible to load a CSV; e.g., a CSV on GitHub
df3 = pixiedust.sampleData("https://github.com/ibm-cds-labs/open-data/raw/master/cars/cars.csv")
display(df3)

You should see a scatter plot above, rendered by Bokeh. Look at the `Renderers` control at the top right. If you see an option for **Seaborn**, give it a try. If you don't see it, Seaborn isn't installed on your system. No problem, just install it by running the next cell.

In [20]:
# To install Seaborn, uncomment the next line, and then run this cell
#!pip install --user seaborn

*If you installed Seaborn, you'll also need to restart your notebook kernel and run the `import pixiedust` cell again. Find **Restart** in the **Kernel** menu above.*
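
To check from Python whether Seaborn is visible to the current kernel, here is a minimal sketch using the standard library. The helper name `is_installed` is our own, not part of PixieDust or Seaborn.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


print("seaborn available:", is_installed("seaborn"))
```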

## Next steps
By now you've seen a few ways to visualize data with notebooks and PixieDust. There's much more to explore:

1. asdf (this notebook)
1. [asdf]()
1. [asdf]()
1. [asdf]()
1. [asdf]()
1. [asdf]()

---

Advanced concepts, for another notebook?

metadata
PixieDust's `display()` automatically captures your selections in the metadata hiding behind this cell. You can review this metadata by selecting View/Cell Toolbar/Edit Metadata. For the cell above, the metadata formats the output as a pie chart.

```
{
  "pixiedust": {
    "displayParams": {
      "handlerId": "pieChart",
      "keyFields": "Foo",
      "valueFields": "bar",
      "aggregation": "AVG",
      "rowCount": "100"
    }
  }
}
```
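
Since this metadata is plain JSON, you can inspect it with Python's standard `json` module. A minimal sketch, assuming you've copied the metadata above into a string (the variable names `cell_metadata` and `params` are our own):

```python
import json

# The cell metadata shown above, pasted in as a JSON string
cell_metadata = """
{
  "pixiedust": {
    "displayParams": {
      "handlerId": "pieChart",
      "keyFields": "Foo",
      "valueFields": "bar",
      "aggregation": "AVG",
      "rowCount": "100"
    }
  }
}
"""

params = json.loads(cell_metadata)["pixiedust"]["displayParams"]

# handlerId names the chart type chosen in the display() controls
print(params["handlerId"])
# Values are stored as strings, so convert before doing arithmetic
print(int(params["rowCount"]))
```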
The display API and params; i.e., pie chart hard-coded in