
# Welcome to PixieDust

This notebook features an introduction to [PixieDust](https://ibm-watson-data-lab.github.io/pixiedust/index.html), the Python library that makes data visualization easy. 

This notebook runs on Python 2.7 and 3.5, with Spark 2.1.

## <a id="toc"></a>Table of Contents

 * [Get started](#part_one)
 * [Load text data from remote sources](#part_two)
 * [Mix Scala and Python on the same notebook](#part_three)
 * [Add Spark packages and run inside your notebook](#part_four)
 * [Stash your data](#part_five)
 * [Contribute](#contribute)


<hr>

# <a id="part_one"></a>Get started

This introduction is pretty straightforward, but it wouldn't hurt to load up the [PixieDust documentation](https://ibm-watson-data-lab.github.io/pixiedust/) so it's handy. 

New to notebooks? Don't worry. Here's all you need to know to run this introduction:

1. Make sure this notebook is in Edit mode.
1. To run code cells, put your cursor in the cell and press **Shift + Enter**.
1. The cell number will change to **[\*]** to indicate that it is currently executing. (When starting with notebooks, it's best to run cells in order, one at a time.)

In [None]:
# To confirm you have the latest version of PixieDust on your system, run this cell
!pip install -U --no-deps pixiedust

Now that you have PixieDust installed and up-to-date on your system, you need to import it into this notebook. This is the last dependency before you can play with PixieDust.

In [2]:
import pixiedust

Pixiedust database opened successfully


If you get a message telling you that you're not running the latest version of PixieDust, restart the kernel from the **Kernel** menu and rerun the `import pixiedust` command. (Any time you restart the kernel, rerun the `import pixiedust` command.)

## Behold, display()

In the next cell, build a simple dataset and store it in a variable. 

In [3]:
# Build the SQL context (the DataFrame below is created through the `spark` session)
from pyspark.sql import SQLContext
sqlContext=SQLContext(sc) 
# Create the Spark dataframe, passing in some data, and assign it to a variable
df = spark.createDataFrame(
[("Green", 75),
 ("Blue", 25)],
["Colors","%"])

The data in the variable `df` is ready to be visualized, without any further code other than the call to `display()`.

In [4]:
# display the dataframe above as a pie chart
display(df)

After running the cell above, you should see a Spark DataFrame displayed as a **pie chart**, along with some controls to tweak the display. All that came from passing the DataFrame variable to `display()`.

In the next cell, you'll pass more interesting data to `display()`, which will also offer more advanced controls.

In [5]:
# create another DataFrame, in a new variable
df2 = spark.createDataFrame(
[(2010, 'Camping Equipment', 3),
 (2010, 'Golf Equipment', 1),
 (2010, 'Mountaineering Equipment', 1),
 (2010, 'Outdoor Protection', 2),
 (2010, 'Personal Accessories', 2),
 (2011, 'Camping Equipment', 4),
 (2011, 'Golf Equipment', 5),
 (2011, 'Mountaineering Equipment',2),
 (2011, 'Outdoor Protection', 4),
 (2011, 'Personal Accessories', 2),
 (2012, 'Camping Equipment', 5),
 (2012, 'Golf Equipment', 5),
 (2012, 'Mountaineering Equipment', 3),
 (2012, 'Outdoor Protection', 5),
 (2012, 'Personal Accessories', 3),
 (2013, 'Camping Equipment', 8),
 (2013, 'Golf Equipment', 5),
 (2013, 'Mountaineering Equipment', 3),
 (2013, 'Outdoor Protection', 8),
 (2013, 'Personal Accessories', 4)],
["year","category","unique_customers"])

# This time, we've combined the dataframe and display() call in the same cell
# Run this cell 
display(df2)

## display() controls

### Renderers
The chart above, like the first one, is rendered by matplotlib. With PixieDust, you have other options. To toggle between renderers, use the `Renderers` control at top right of the display output:
1. [Bokeh](http://bokeh.pydata.org/en/0.10.0/index.html) is interactive; play with the controls along the top of the chart, for example, zoom and save
1. [Matplotlib](http://matplotlib.org/) is static; you can save the image as a PNG

### Chart options

1. **Chart types**: At top left, you should see an option to display the dataframe as a table. You should also see a dropdown menu with other chart options, including bar charts, pie charts, scatter plots, and so on.
1. **Options**: Click the `Options` button to explore other display configurations; for example, clustering and aggregation.

Here's more on [customizing `display()` output](https://ibm-watson-data-lab.github.io/pixiedust/displayapi.html).

## Load External Data
So far, you've worked with data hard-coded into this notebook. Now, load external data (CSV) from a URL.

In [None]:
# load a CSV with pixiedust.sampleData()
df3 = pixiedust.sampleData("https://github.com/ibm-watson-data-lab/open-data/raw/master/cars/cars.csv")
display(df3)

You should see a scatterplot above, rendered again by matplotlib. Find the `Renderer` menu at top-right. You should see options for **Bokeh** and **Seaborn**. If you don't see Seaborn, it's not installed on your system. No problem, just install it by running the next cell.

In [7]:
# To install Seaborn, uncomment the next line, and then run this cell
#!pip install --user seaborn

*If you installed Seaborn, you'll need to also restart your notebook kernel, and run the cell to `import pixiedust` again. Find **Restart** in the **Kernel** menu above.*
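
If you'd rather check from code whether Seaborn is importable before looking for it in the `Renderer` menu, here is a minimal sketch. It assumes Python 3.4 or later, and the helper name `is_installed` is hypothetical (it is not part of PixieDust).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical helper, not part of PixieDust.
print("seaborn installed:", is_installed("seaborn"))
```

The check only looks the module up; it doesn't import it, so it's cheap to run before deciding whether to uncomment the install line above.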

End of chapter. [Return to table of contents](#toc)
<hr>


# <a id="part_two"></a>Load text data from remote sources


Data files commonly reside in remote sources, such as public or private marketplaces or GitHub repositories. You can load comma-separated values (CSV) data files using PixieDust's `sampleData` method. 

## Prerequisites

If you haven't already, import PixieDust. Follow the instructions in [Get started](#part_one).

## Load data

To load a data set, run `pixiedust.sampleData` and specify the data set URL:

In [8]:
homes = pixiedust.sampleData("https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv")

Downloading 'https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv' from https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv
Downloaded 102051 bytes
Creating pySpark DataFrame for 'https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv'. Please wait...
Loading file using 'SparkSession'
Successfully created pySpark DataFrame for 'https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv'


The `pixiedust.sampleData` method loads the data into an [Apache Spark DataFrame](https://spark.apache.org/docs/latest/sql-programming-guide.html#datasets-and-dataframes), which you can inspect and visualize using `display()`.

## Inspect and preview the loaded data

To inspect the automatically inferred schema and preview a small subset of the data, you can use the _DataFrame Table_ view, as shown in this preconfigured example: 

In [9]:
display(homes)

PROPERTY TYPE,ADDRESS,CITY,STATE,ZIP,PRICE,BEDS,BATHS,LOCATION,SQFT,LOT SIZE,YEAR BUILT,DAYS ON MARKET,URL,SOURCE,LISTING ID,LATITUDE,LONGITUDE
Condo/Co-op,30 Winchester St #3,Brookline,MA,2446,1400000,3.0,3.0,Coolidge Corner,1504.0,,1915.0,66.0,http://www.redfin.com/MA/Brookline/30-Winchester-St-02446/unit-3/home/105251020,MLS PIN,58480309.0,42.3420632,-71.1257602
Single Family Residential,2 Wellington Way,Bedford,MA,1730,1150000,4.0,3.5,Wellington Way,3531.0,43560.0,2012.0,58.0,http://www.redfin.com/MA/Bedford/2-Wellington-Way-01730/home/41363649,MLS PIN,59806880.0,42.5029123,-71.2849657
Condo/Co-op,1 Franklin St #1008,Boston,MA,2110,2049000,2.0,2.0,Midtown,1476.0,,2016.0,59.0,http://www.redfin.com/MA/Boston/1-Franklin-St-02108/unit-1008/home/109481369,MLS PIN,62725868.0,42.35631,-71.05945
Single Family Residential,1 Wilshire Rd,Newbury,MA,1951,2225000,4.0,5.5,Wilshire Road,4214.0,18138.0,2014.0,58.0,http://www.redfin.com/MA/Newbury/1-Wilshire-Rd-01951/home/105539600,MLS PIN,59011440.0,42.7796754,-70.8476708
Townhouse,170 Harvard St Unit 1,Newton,MA,2460,1100000,4.0,3.5,Newtonville,2388.0,10089.0,1910.0,66.0,http://www.redfin.com/MA/Newton/170-Harvard-St-02460/unit-1/home/109313528,MLS PIN,62550577.0,42.3468986,-71.2005455
Single Family Residential,1 Jerusalem Ln,Cohasset,MA,2025,1437000,4.0,3.5,Jerusalem Road/Atlantic Avenue/Jerusalem Lane Cul De Sac,2724.0,9443.0,2000.0,66.0,http://www.redfin.com/MA/Cohasset/1-Jerusalem-Ln-02025/home/8835487,MLS PIN,60777396.0,42.259862,-70.811424
Single Family Residential,34 Crestwood Rd,Marblehead,MA,1945,2997000,1.0,5.0,,8509.0,34400.0,2012.0,,http://www.redfin.com/MA/Marblehead/34-Crestwood-Rd-01945/home/11768413,,,42.501777,-70.877002
Single Family Residential,217 Forest St,Winchester,MA,1890,1155000,4.0,3.5,Muraco School District,3779.0,9234.0,2013.0,72.0,http://www.redfin.com/MA/Winchester/217-Forest-St-01890/home/110057079,MLS PIN,56198065.0,42.4714541,-71.1156268
Single Family Residential,1 Denny St,Westborough,MA,1581,1100000,6.0,3.5,Westborough,3394.0,239580.0,1917.0,79.0,http://www.redfin.com/MA/Westborough/1-Denny-St-01581/home/16634032,MLS PIN,60054100.0,42.259306,-71.611866
Single Family Residential,23 Laurel Hill Ln,Winchester,MA,1890,1475000,5.0,4.0,Winchester,4037.0,12475.0,2014.0,80.0,http://www.redfin.com/MA/Winchester/23-Laurel-Hill-Ln-01890/home/11439968,MLS PIN,58047231.0,42.4724111,-71.1196572
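
To get a feel for what an automatically inferred schema means, here is a small plain-Python sketch that guesses a type for each column of a CSV snippet. It only illustrates the idea: Spark's own inference handles more cases (missing values, dates, and so on), and the helper `infer_type` is a hypothetical name, not a PixieDust or Spark function.

```python
import csv
import io

# Three columns and two rows copied from the preview above.
SAMPLE = """CITY,PRICE,BATHS
Brookline,1400000,3.0
Bedford,1150000,3.5
"""


def infer_type(values):
    """Pick the narrowest type that fits every value: int, then float, then str."""
    for caster in (int, float):
        try:
            for value in values:
                caster(value)
            return caster.__name__
        except ValueError:
            continue
    return "str"


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
schema = {name: infer_type([row[name] for row in rows]) for name in rows[0]}
print(schema)
```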


## Simple visualization using bar charts

With PixieDust `display()`, you can visually explore the loaded data using built-in charts, such as bar charts, line charts, scatter plots, or maps.

To explore a data set:
* choose the desired chart type from the dropdown
* configure chart options
* configure display options

You can analyze the average home price for each city by choosing: 
* chart type: bar chart
* chart options
    * _Options > Keys_: `CITY`
    * _Options > Values_: `PRICE`
    * _Options > Aggregation_: `AVG`
 
Run the next cell to review the results. 

In [10]:
display(homes)
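
If you're curious what the `AVG` aggregation computes, the following plain-Python sketch groups the ten preview rows shown earlier by `CITY` and averages `PRICE`. It is an illustration only, not how PixieDust or Spark performs the aggregation, and it covers just the preview rows rather than the full data set.

```python
from collections import defaultdict
from statistics import mean

# (CITY, PRICE) pairs copied from the ten preview rows shown earlier.
listings = [
    ("Brookline", 1400000), ("Bedford", 1150000), ("Boston", 2049000),
    ("Newbury", 2225000), ("Newton", 1100000), ("Cohasset", 1437000),
    ("Marblehead", 2997000), ("Winchester", 1155000),
    ("Westborough", 1100000), ("Winchester", 1475000),
]

# Group prices under their city (the chart's key), then average each group.
prices_by_city = defaultdict(list)
for city, price in listings:
    prices_by_city[city].append(price)

average_price = {city: mean(prices) for city, prices in prices_by_city.items()}
print(average_price["Winchester"])
```

Winchester is the only city that appears twice in the preview, so it is the only bar whose height differs from a single listing price.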

## Explore the data

You can change the display **Options** so you can continue to explore the loaded data set without having to pre-process the data. 

For example, change: 
* _Options > Keys_ to `YEAR BUILT` and 
* _Options > Aggregation_ to `COUNT` 

Now you can find out how old the listed properties are:

In [11]:
display(homes)
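
`COUNT` simply tallies how many rows share each key. As a rough sketch, assuming only the ten preview rows shown earlier, the same tally in plain Python looks like this:

```python
from collections import Counter

# YEAR BUILT values copied from the ten preview rows shown earlier.
years_built = [1915, 2012, 2016, 2014, 1910, 2000, 2012, 2013, 1917, 2014]

# Counter maps each distinct year to the number of listings built that year.
listings_per_year = Counter(years_built)
for year, count in sorted(listings_per_year.items()):
    print(year, count)
```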

## Use sample data sets

PixieDust comes with a set of curated data sets that you can use to get familiar with the different chart types and options. 

Type `pixiedust.sampleData()` to display those data sets.

In [12]:
pixiedust.sampleData()

Id,Name,Topic,Publisher
1,Car performance data,Transportation,IBM
2,"Sample retail sales transactions, January 2009",Economy & Business,IBM Cloud Data Services
3,Total population by country,Society,IBM Cloud Data Services
4,GoSales Transactions for Naive Bayes Model,Leisure,IBM
5,Election results by County,Society,IBM
6,"Million dollar home sales in Massachusetts, USA Feb 2017 through Jan 2018",Economy & Business,Redfin.com
7,"Boston Crime data, 2-week sample",Society,City of Boston


The home sales data set you loaded earlier is one of the samples. Therefore, you could have loaded it by specifying the displayed data set id as the parameter: `homes = pixiedust.sampleData(6)`

If your data isn't stored in CSV files, you can load it into a DataFrame from any supported Spark [data source](https://spark.apache.org/docs/latest/sql-programming-guide.html#data-sources). See [these Python code snippets](https://apsportal.ibm.com/docs/content/analyze-data/python_load.html) for more information.

End of chapter. [Return to table of contents](#toc)
<hr>

# <a id="part_three"></a>Mix Scala and Python on the same notebook

Python has a rich ecosystem of modules including plotting with matplotlib, data structure and analysis with pandas, machine learning, and natural language processing. However, data scientists working with Spark might occasionally need to call out to code written in Scala or Java, for example, one of the hundreds of libraries available on `spark-packages.org`. Unfortunately, Jupyter Python notebooks do not currently provide a way to call out to Scala or Java code. As a result, a typical workaround is to first use a Scala notebook to run the Scala code, persist the output somewhere like a Hadoop Distributed File System, create another Python notebook, and reload the data. This is obviously inefficient and awkward.

As you'll see in this notebook, PixieDust provides a solution to this problem by letting users write and run Scala code directly in its own cell. It also lets you share variables between Python and Scala, in both directions.

## Define a few simple variables in Python

In [13]:
pythonString = "Hello From Python"
pythonInt = 20

## Import the PixieDust module

If you haven't already, import PixieDust. Follow the instructions in [Get started](#part_one).

## Use the Python variables in Scala code
PixieDust makes all variables defined in the Python scope available to Scala using the following rules:

* Primitive types are mapped to the Scala equivalent: for example, Python strings become Scala strings, Python integers become Scala integers, and so on.
* Some complex types are mapped as follows: PySpark SQLContext, DataFrame, and RDD are mapped to their Scala Spark equivalents. Python GraphFrames are mapped to their Scala equivalents. PixieDust will add more mappings as needed.
* Python classes are currently not converted and therefore cannot be used in Scala.
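
The rules above can be pictured as a lookup from a Python value to a Scala type name. The sketch below is purely illustrative: `scala_type_for` is a hypothetical helper, it does not reflect PixieDust's actual implementation, and the `Boolean` and `Double` entries are assumptions that this notebook does not state. Complex types such as DataFrames are left out.

```python
def scala_type_for(value):
    """Return an illustrative Scala type name, or None if the value is not converted."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"  # assumption: not stated in this notebook
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Double"  # assumption: not stated in this notebook
    if isinstance(value, str):
        return "String"
    return None  # for example, instances of user-defined Python classes


class Point(object):
    """A user-defined class, which the rules above say is not converted."""


for sample in ("Hello From Python", 20, Point()):
    print(type(sample).__name__, "->", scala_type_for(sample))
```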

The PixieDust Scala Bridge requires the environment variable SCALA_HOME to be defined and pointing at a Scala install:

In [14]:
%%scala
print(pythonString)
print(pythonInt + 10)

Hello From Python
30
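
If the Scala cell fails instead of printing the output above, one thing to check is the `SCALA_HOME` setting. The following sketch reports whether the variable is defined and points at an existing directory. The function name `check_scala_home` is hypothetical, and the check does not verify that the directory actually contains a working Scala install.

```python
import os


def check_scala_home(environ=os.environ):
    """Return a short status message about the SCALA_HOME setting."""
    scala_home = environ.get("SCALA_HOME")
    if not scala_home:
        return "SCALA_HOME is not set"
    if not os.path.isdir(scala_home):
        return "SCALA_HOME does not point to a directory: {}".format(scala_home)
    return "SCALA_HOME looks usable: {}".format(scala_home)


print(check_scala_home())
```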


## Define a variable in Scala and use it in Python
In this section, you'll create a variable in Scala and use it in Python.

**Note:** only variables that are prefixed with two underscores ( `__` ) are available for use in Python.

In [15]:
%%scala
val __scalaString = "Hello From Scala"
val __scalaInt = 5

In [16]:
# using Scala variable in Python
print(__scalaString)
print(__scalaInt + 10)

Hello From Scala
15


In this chapter, you've seen how easy it is to intersperse Scala and Python in the same notebook.
Continue exploring this powerful functionality by using more complex Scala libraries!

End of chapter. [Return to table of contents](#toc)
<hr>

# <a id="part_four"></a> Add Spark packages and run inside your notebook

PixieDust Package Manager helps you install Spark packages inside your notebook. This is especially useful when you're working in a hosted cloud environment without access to configuration files. Use PixieDust Package Manager to install:

- a Spark package from `spark-packages.org`
- a package from the Maven search repository
- a jar file directly from a URL

> **Note:** After you install a package, you must restart the kernel and import PixieDust again.


## View list of packages
To see the packages installed on your system, run the following command:

In [17]:
import pixiedust
pixiedust.printAllPackages()

graphframes:graphframes:0.5.0-spark2.1-s_2.11 => /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/graphframes-0.5.0-spark2.1-s_2.11.jar
com.typesafe.scala-logging:scala-logging-api_2.11:2.1.2 => /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/scala-logging-api_2.11-2.1.2.jar
com.typesafe.scala-logging:scala-logging-slf4j_2.11:2.1.2 => /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/scala-logging-slf4j_2.11-2.1.2.jar
direct.download:https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar:1.0 => /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/streaming-twitter-assembly-1.6.jar
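
Each line of the listing above pairs a package identifier with the path of the downloaded jar. The identifiers follow a `group:artifact:version` pattern. As a sketch, the hypothetical helper `parse_coordinates` below (not part of PixieDust) splits an identifier into those three parts, taking care of identifiers whose middle part is a URL containing colons.

```python
from collections import namedtuple

Coordinates = namedtuple("Coordinates", ["group_id", "artifact_id", "version"])


def parse_coordinates(identifier):
    """Split 'group:artifact:version' into its three parts.

    The artifact part may itself contain colons (for example a URL), so the
    group is taken up to the first colon and the version after the last one.
    """
    group_id, rest = identifier.split(":", 1)
    artifact_id, version = rest.rsplit(":", 1)
    return Coordinates(group_id, artifact_id, version)


print(parse_coordinates("graphframes:graphframes:0.5.0-spark2.1-s_2.11"))
```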


## Add a package from spark-packages.org

The command you use to install GraphFrames depends on your Spark version.

In [18]:
if sc.version.startswith('1.6.'):  # Spark 1.6
    pixiedust.installPackage("graphframes:graphframes:0.5.0-spark1.6-s_2.11")
elif sc.version.startswith('2.'):  # Spark 2.1, 2.0
    pixiedust.installPackage("graphframes:graphframes:0.5.0-spark2.1-s_2.11")


pixiedust.installPackage("com.typesafe.scala-logging:scala-logging-api_2.11:2.1.2")
pixiedust.installPackage("com.typesafe.scala-logging:scala-logging-slf4j_2.11:2.1.2")

Package already installed: graphframes:graphframes:0.5.0-spark2.1-s_2.11
Package already installed: com.typesafe.scala-logging:scala-logging-api_2.11:2.1.2
Package already installed: com.typesafe.scala-logging:scala-logging-slf4j_2.11:2.1.2


<pixiedust.packageManager.package.Package at 0x7f83f5cdee50>

> **Note:** After you install a package, you must restart the kernel and import PixieDust again. You'll also need to run `pixiedust.installPackage` again before that package can be used. You can do this by running the two code cells above again after you have restarted the kernel.

## View the updated list of packages

Run `printAllPackages` again to see that GraphFrames is now in your list:

In [19]:
pixiedust.printAllPackages()

graphframes:graphframes:0.5.0-spark2.1-s_2.11 => /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/graphframes-0.5.0-spark2.1-s_2.11.jar
com.typesafe.scala-logging:scala-logging-api_2.11:2.1.2 => /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/scala-logging-api_2.11-2.1.2.jar
com.typesafe.scala-logging:scala-logging-slf4j_2.11:2.1.2 => /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/scala-logging-slf4j_2.11-2.1.2.jar
direct.download:https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar:1.0 => /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/streaming-twitter-assembly-1.6.jar


## Display a GraphFrames data sample

Even if GraphFrames is already installed, running the install command loads the Python module that comes with the package. Run the following cell and PixieDust displays a sample graph data set. On the upper left of the display, click the table dropdown and switch between views of nodes and edges. 

In [21]:
from graphframes import GraphFrame

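try:
    # Spark 2.x: SparkSession is the entry point for creating DataFrames
    from pyspark.sql import SparkSession
    sqlcontext = SparkSession.builder.getOrCreate()
except ImportError:
    # Spark 1.6: SparkSession does not exist, so fall back to SQLContext
    from pyspark.sql import SQLContext
    sqlcontext = SQLContext(sc)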

# Vertex DataFrame
v = sqlcontext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)
], ["id", "name", "age"])

# Edge DataFrame
e = sqlcontext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "friend"),
  ("d", "a", "friend"),
  ("a", "e", "friend")
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

display(g)

## Install from Maven
To install a package from the [Apache Maven search repository](https://maven.apache.org/), visit the project and find the `groupId` and `artifactId` for the package that you want. Enter them in the following installation command. [See instructions for the installPackage command](https://ibm-cds-labs.github.io/pixiedust/packagemanager.html#install-from-maven-search-repository). For example, the following cell installs Apache Commons CSV: 

In [22]:
pixiedust.installPackage("org.apache.commons:commons-csv:0")

Downloading package org.apache.commons:commons-csv:1.5 to /gpfs/fs01/user/sf9b-795b2b888c32b6-772f4e1cd93d/data/libs/commons-csv-1.5.jar

Package org.apache.commons:commons-csv:1.5 downloaded successfully
Please restart Kernel to complete installation of the new package
Successfully added package org.apache.commons:commons-csv:1.5


<pixiedust.packageManager.package.Package at 0x7f85129c9310>

In [None]:
# List the installed packages again to confirm that commons-csv was added
pixiedust.printAllPackages()

## Install a jar file directly from a URL 
    
To install a jar file that is not packaged in a Maven repository, provide its URL. 

In [23]:
pixiedust.installPackage("https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar")

Package already installed: https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar


<pixiedust.packageManager.package.Package at 0x7f83f5cd0290>

## Follow the tutorial

To understand what you can do with this jar file, read David Taieb's latest [Realtime Sentiment Analysis of Twitter Hashtags with Spark](https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-spark-7ee6ca5c1585#.2iblfu58c) tutorial.

## Uninstall a package

It's just as easy to get rid of a package you installed. Just run the command `pixiedust.uninstallPackage("<<mypackage>>")`. For example, you can uninstall Apache Commons CSV:

In [24]:
pixiedust.uninstallPackage("org.apache.commons:commons-csv:0")

Successfully deleted package org.apache.commons:commons-csv:1.5


## Restart the kernel and import PixieDust

After uninstalling a package, <font color="red">restart the kernel</font> and import PixieDust before continuing.

In [1]:
# import pixiedust after restarting kernel
import pixiedust

Pixiedust database opened successfully


End of chapter. [Return to table of contents](#toc)
<hr>

# <a id="part_five"></a> Stash Your Data

With PixieDust, you also have the option to export the data from your notebook to external sources.
The output of the `display` API includes a toolbar that contains a **Download** button.

<img style="margin:10px 0" src="https://ibm-watson-data-lab.github.io/pixiedust/_images/downloadfile.png">




## Stash to Cloudant

You can save the data directly into a [Cloudant](https://cloudant.com/) or [CouchDB](https://couchdb.apache.org/) database.

**Prerequisite:** Collect your database connection information: the database host, user name, and password.  
  
If your Cloudant instance was provisioned in [IBM Cloud](https://console.ng.bluemix.net/catalog/services/cloudant-nosql-db/), you can find the connectivity information in the **Service Credentials** tab.

To stash to Cloudant:

1. From the toolbar in the `display` output, click the **Download** button.  
2. Choose **Stash to Cloudant** from the menu. 
3. Click the dropdown to see the list of available connections and select an existing connection or add a new connection:  
    1. Click the **`+`** plus button to add a new connection.
    1. Enter your Cloudant database credentials in JSON format.  
    1. If you are stashing to CouchDB, include the protocol. See the [sample credentials format](#Sample-Credentials-Format) below.
    1. Click **OK**.
    1. Select the new connection.
4. Click **Submit**.


### Sample Credentials Format  

#### CouchDB
```
{
    "name": "local-couchdb-connection",
    "credentials": {
        "username": "couchdbuser",
        "password": "password",
        "protocol": "http",
        "host": "127.0.0.1:5984",
        "port": 5984,
        "url": "http://couchdbuser:password@127.0.0.1:5984"
    }
}
```

#### Cloudant
```
{
    "name": "remote-cloudant-connection",
    "credentials": {
        "username": "username-ibmcloud",
        "password": "password",
        "host": "host-ibmcloud.cloudant.com",
        "port": 443,
        "url": "https://username-ibmcloud:password@host-ibmcloud.cloudant.com"
    }
}
```
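
Before pasting credentials into the connection dialog, it can help to confirm that the JSON parses and that the `url` field agrees with the other fields. The sketch below does that for the Cloudant sample. The helper `expected_url` is hypothetical, and it assumes the protocol defaults to `https` when the field is omitted, as in the Cloudant sample above.

```python
import json

# The Cloudant sample from above, as a JSON string.
connection_json = """
{
    "name": "remote-cloudant-connection",
    "credentials": {
        "username": "username-ibmcloud",
        "password": "password",
        "host": "host-ibmcloud.cloudant.com",
        "port": 443,
        "url": "https://username-ibmcloud:password@host-ibmcloud.cloudant.com"
    }
}
"""


def expected_url(credentials):
    """Rebuild the url field from the protocol, username, password, and host.

    Assumption: the protocol defaults to https when it is omitted.
    """
    protocol = credentials.get("protocol", "https")
    return "{}://{}:{}@{}".format(
        protocol, credentials["username"], credentials["password"], credentials["host"]
    )


connection = json.loads(connection_json)  # raises ValueError on malformed JSON
credentials = connection["credentials"]
print(expected_url(credentials) == credentials["url"])
```

The same function reproduces the CouchDB sample URL when `"protocol": "http"` is present. Real passwords that contain characters such as `@` or `:` would need percent-encoding before being placed in a URL, which this sketch does not do.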


## Download as a file

Alternatively, you can choose to save the data set to various file formats (for example, CSV, JSON, XML, and so on).

To save a data set as a file:

1. From the toolbar in the **`display`** output, click the **Download** button.
1. Choose **Download as File**.
1. Choose the desired format.
1. Specify the number of records to download.
    <img style="margin:10px 0" src="https://ibm-watson-data-lab.github.io/pixiedust/_images/save_as.png">
1. Click **OK**.
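
For a sense of what a CSV or JSON download with a record limit contains, here is a plain-Python sketch that serializes the first few rows of a small data set. It is an illustration under stated assumptions, not PixieDust's export code: `to_csv` and `to_json` are hypothetical helpers, and the exact layout of the files PixieDust produces may differ.

```python
import csv
import io
import json

# A few rows from the year/category data used earlier in this notebook.
columns = ["year", "category", "unique_customers"]
rows = [
    (2010, "Camping Equipment", 3),
    (2010, "Golf Equipment", 1),
    (2010, "Mountaineering Equipment", 1),
]


def to_csv(columns, rows, limit):
    """Serialize the header plus the first `limit` rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows[:limit])
    return buffer.getvalue()


def to_json(columns, rows, limit):
    """Serialize the first `limit` rows as a JSON array of objects."""
    return json.dumps([dict(zip(columns, row)) for row in rows[:limit]])


print(to_csv(columns, rows, limit=2))
print(to_json(columns, rows, limit=2))
```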


End of chapter. [Return to table of contents](#toc)
<hr>

# <a id="contribute"></a>Contribute

By now, you've walked through PixieDust's intro notebooks and seen PixieDust in action. If you like what you saw, join [the project](https://github.com/ibm-watson-data-lab/pixiedust)! 

Anyone can get involved. Here are some ways you can [contribute](https://ibm-watson-data-lab.github.io/pixiedust/contribute.html):

 - [Write a visualization](#Write-a-visualization)
 - [Build a renderer](#Build-a-renderer)
 - [Enter an issue](#Enter-an-issue)
 - [Share PixieDust](#Share-PixieDust)
 - [Learn more](#Learn-more)


## Write a visualization

Contribute your own custom visualization. Here's a taste of how it works. 

Run the next 4 cells to do the following:

1. Import PixieDust. 
2. Generate a sample DataFrame. 
3. Create a custom table display option called **NewSample**. 
4. Display the DataFrame and see your new custom option under the **Table** dropdown menu.

This is just one small example you can quickly do within this notebook. [Read how to create a custom visualization](https://ibm-watson-data-lab.github.io/pixiedust/writeviz.html).


In [2]:
import pixiedust

Now, create a simple DataFrame:

In [3]:
from pyspark.sql import SQLContext
sqlContext=SQLContext(sc)
d1 = spark.createDataFrame(
[(2010, 'Camping Equipment', 3),
 (2010, 'Golf Equipment', 1),
 (2010, 'Mountaineering Equipment', 1),
 (2010, 'Outdoor Protection', 2),
 (2010, 'Personal Accessories', 2),
 (2011, 'Camping Equipment', 4),
 (2011, 'Golf Equipment', 5),
 (2011, 'Mountaineering Equipment',2),
 (2011, 'Outdoor Protection', 4),
 (2011, 'Personal Accessories', 2),
 (2012, 'Camping Equipment', 5),
 (2012, 'Golf Equipment', 5),
 (2012, 'Mountaineering Equipment', 3),
 (2012, 'Outdoor Protection', 5),
 (2012, 'Personal Accessories', 3),
 (2013, 'Camping Equipment', 8),
 (2013, 'Golf Equipment', 5),
 (2013, 'Mountaineering Equipment', 3),
 (2013, 'Outdoor Protection', 8),
 (2013, 'Personal Accessories', 4)],
["year","zone","unique_customers"])

The following cell creates a new custom table visualization plugin called **NewSample**:

In [4]:
from pixiedust.display.display import *

class TestDisplay(Display):
    def doRender(self, handlerId):
        self._addHTMLTemplateString(
"""
NewSample Plugin
<table class="table table-striped">
    <thead>                 
        {%for field in entity.schema.fields%}
        <th>{{field.name}}</th>
        {%endfor%}
    </thead>
    <tbody>
        {%for row in entity.take(100)%}
        <tr>
            {%for field in entity.schema.fields%}
            <td>{{row[field.name]}}</td>
            {%endfor%}
        </tr>
        {%endfor%}
    </tbody>
</table>
"""
        )

@PixiedustDisplay()
class TestPluginMeta(DisplayHandlerMeta):
    @addId
    def getMenuInfo(self,entity,dataHandler):
        if entity.__class__.__name__ == "DataFrame":
            return [
                {"categoryId": "Table", "title": "NewSample Table", "icon": "fa-table", "id": "newsampleTest"}
            ]
        else:
            return []
    def newDisplayHandler(self,options,entity):
        return TestDisplay(options,entity)

Next, run `display()` to show the data. Click the **Table** dropdown. You now see the **NewSample Table** option, the custom visualization you just created!

In [5]:
display(d1)

**Error?** If you changed the name yourself in cell 3, you might get an error when you try to display. You can fix this by updating the metadata in the `display()` cell. To do so, go to the Jupyter menu above the notebook and choose **View > Cell Toolbar > Edit Metadata**. Then scroll down to the `display(d1)` cell, click its **Edit Metadata** button, and change the `handlerId`.
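
The template inside the NewSample plugin loops over the schema fields to build the header and over up to 100 rows to build the body. The plain-Python sketch below mirrors that logic so you can see the shape of the HTML it produces. `render_table` is a hypothetical helper, rows are modeled as dictionaries rather than Spark rows, and escaping the cell values is a choice made here, not something the plugin is known to do.

```python
from html import escape


def render_table(field_names, rows, max_rows=100):
    """Build an HTML table: one header cell per field, one body row per record."""
    header = "".join("<th>{}</th>".format(escape(name)) for name in field_names)
    body = "".join(
        "<tr>{}</tr>".format(
            "".join("<td>{}</td>".format(escape(str(row[name]))) for name in field_names)
        )
        for row in rows[:max_rows]
    )
    return "<table><thead>{}</thead><tbody>{}</tbody></table>".format(header, body)


sample_rows = [
    {"year": 2010, "zone": "Camping Equipment", "unique_customers": 3},
    {"year": 2010, "zone": "Golf Equipment", "unique_customers": 1},
]
print(render_table(["year", "zone", "unique_customers"], sample_rows))
```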

## Build a renderer

PixieDust lets you switch between renderers for charts and maps. We'd love to add more to the list. It's easy to get started. Try the `generate` tool to create a boilerplate renderer using a quick CLI wizard. [Read how to build a renderer](https://ibm-watson-data-lab.github.io/pixiedust/renderer.html).

## Enter an issue

Found a bug? Thought of a great enhancement? [Enter an issue](https://github.com/ibm-watson-data-lab/pixiedust/issues) to let us know. Tell us what you think.

## Share PixieDust

If you think someone you know would be interested in PixieDust, spread the word:

 - <a href="https://twitter.com/home?status=Happy%20to%20find%20PixieDust.%20Data%20notebook%20visualizations%20for%20everyone%3A%20https%3A//github.com/ibm-watson-data-lab/pixiedust%0A">Tweet</a>
 - <a href="https://www.linkedin.com/shareArticle?mini=true&url=https%3A//github.com/ibm-watson-data-lab/pixiedust&title=PixieDust%3A%20Data%20notebook%20visualizations%20for%20everyone&summary=Happy%20to%20find%20PixieDust,%20a%20new%20helper%20library%20for%20python%20and%20scala%3A%20https%3A//github.com/ibm-watson-data-lab/pixiedust%0A&source=">Share on LinkedIn</a>
 - <a href="mailto:?&subject=PixieDust: Data notebook visualizations for everyone&body=I%20found%20a%20new%20helper%20library%20for%20notebooks%3A%20https%3A//github.com/ibm-watson-data-lab/pixiedust">Send email</a>

## Learn more

Ready to pitch in? We can't wait to see what you share. [More on how to contribute](https://ibm-watson-data-lab.github.io/pixiedust/contribute.html). 

End of chapter. [Return to table of contents](#toc)

## Authors
* Jose Barbosa
* Mike Broberg
* Inge Halilovic
* Jess Mantaro
* Brad Noble
* David Taieb
* Patrick Titzler

<hr>
Copyright &copy; IBM Corp. 2017, 2018. This notebook and its source code are released under the terms of the MIT License.