Python Helper library for Jupyter Notebooks

PixieDust

PixieDust is a productivity tool for Python or Scala notebooks that lets developers encapsulate business logic into something easy for their customers to consume.

New Book now available: Thoughtful Data Science

This book, published by Packt Publishing, is the user and developer reference for PixieDust.

PixieDust developer community

Wait! There is a developer community? Yes, there is! If you are already a member, log in. If you would like to contribute, please join us.

Why you need it

Notebooks are a powerful tool for fast and flexible data analysis. But the learning curve is steep.

Python data science notebooks were first popularized in academia, and there are some formalities to work through before you can get to your analysis. For example, in a Python interactive notebook, a mundane task like creating a simple chart or saving data into a persistence repository requires mastery of complex code like this matplotlib snippet:

[Image: All this for a chart?]

Once you do create a notebook that provides great data insights, it's hard to share with business users, who don't want to slog through all that dry, hard-to-read code, much less tweak it and collaborate.

PixieDust to the rescue.

What is PixieDust?

PixieDust is an open source helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data. It also fills a gap for users who have no access to configuration files when a notebook is hosted on the cloud.

Use in Python or Scala

PixieDust greatly simplifies working with Python display libraries like matplotlib, and it works just as effectively in Scala notebooks. You no longer have to compromise your love of Scala to generate great charts. PixieDust lets you bring robust Python visualization options to your Scala notebooks. Installer and instructions to use Scala with PixieDust are coming soon.

Features

PixieDust's current capabilities include:

  • packageManager lets you install Spark packages inside a Python notebook. You can't otherwise do this on hosted Jupyter notebooks, which prevents developers from using a large number of Spark package add-ons.

  • Visualizations. A single API called display() lets you visualize your Spark object in different ways: tables, charts, maps, and more. This module is designed to be extensible, providing an API that lets anyone easily contribute a new visualization plugin.

    This sample visualization plugin uses d3 to show the different flight routes for each airport:

    [Image: graph map]

  • Embedded apps. Let nonprogrammers actively use notebooks. Transform a hard-to-read notebook into a polished graphic app for business users. Check out these preliminary sample apps:

    • An app can feature embedded forms and responses, like flightpredict, which lets users enter flight details to see the likelihood of landing on time.
    • Or present a sophisticated workflow, like our twitter demo, which delivers a real-time feed of tweets, trending hashtags, and aggregated sentiment charts with Watson Tone Analyzer.
  • Extensibility. Create your own visualizations or apps using the PixieDust extensibility APIs. If you know HTML and CSS, you can write and deliver amazing graphics without forcing notebook users to type one line of code. Use the shape of the data to control when PixieDust shows your visualization in a menu.

  • Export. Notebook users can download data as .csv, HTML, JSON, and other formats to their own laptops, or save it into a variety of back-end data sources, like Cloudant, dashDB, and GraphDB.

    [Image: save as options]

  • Scala Bridge. Use Scala directly in your Python notebook. Variables are automatically transferred from Python to Scala and vice versa. Learn more.

    Or start in a Scala notebook. As mentioned, all these PixieDust features work not only in Python, but in Scala too. So if you prefer Scala, you'll soon be able to start there and use PixieDust to insert sophisticated Python graphic options within your Scala notebook. Instructions coming soon.

  • Spark progress monitor. Track the status of your Spark job. No more waiting in the dark. Notebook users can now see how a cell's code is running behind the scenes.
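
The Extensibility item above mentions using the shape of the data to decide when a visualization appears in a menu. The sketch below illustrates that idea in plain Python only. It is not PixieDust's extensibility API: the `Visualization` class, the `REGISTRY` list, the `menu_for` function, and the dictionary-based schema format are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical schema format: column name -> type name.
Schema = Dict[str, str]

NUMERIC_TYPES = {"int", "float", "double"}


@dataclass
class Visualization:
    """A menu entry plus a predicate that inspects the data's shape."""
    name: str
    accepts: Callable[[Schema], bool]


def has_numeric_column(schema: Schema) -> bool:
    # True when at least one column holds a numeric type.
    return any(type_name in NUMERIC_TYPES for type_name in schema.values())


def has_lat_lon(schema: Schema) -> bool:
    # True when both coordinate columns are present.
    return {"latitude", "longitude"} <= set(schema)


# Hypothetical registry; order here is the order shown in the menu.
REGISTRY = [
    Visualization("table", lambda schema: True),
    Visualization("bar chart", has_numeric_column),
    Visualization("map", has_lat_lon),
]


def menu_for(schema: Schema) -> List[str]:
    """Return the names of the visualizations that accept this schema."""
    return [viz.name for viz in REGISTRY if viz.accepts(schema)]


flights = {"origin": "string", "delay_minutes": "int"}
airports = {"code": "string", "latitude": "double", "longitude": "double"}

print(menu_for(flights))   # ['table', 'bar chart']
print(menu_for(airports))  # ['table', 'bar chart', 'map']
```

Each entry carries its own predicate, so a new visualization can be added without touching the menu logic. That is the property the extensibility description points at, though the real plugin mechanism may be organized differently.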

Watch this video to see PixieDust in action:

about PixieDust

Usage

You can use PixieDust locally or online within IBM's Watson Studio.

Use online

To use PixieDust online

Use locally

PixieDust supports:

  • Spark 1.6 or 2.0
  • Python 2.7 or 3.5

Sample notebooks

Wherever you prefer to work, try out the following sample notebooks:

Tutorials

Contribute

Note: PixieDust currently supports Spark DataFrames, Spark GraphFrames and Pandas DataFrames, with more to come. If you can't wait, write your own today and contribute it back.

Read how to contribute for details on our code of conduct and instructions for submitting pull requests to us.

Developer Guide

Dive into the PixieDust developer docs and learn how to build your own custom visualization or embedded app. You can also pitch in and contribute an enhancement to PixieDust's core features.

We can't wait to see what you build.

License

Apache License, Version 2.0.

For details and all the legalese, read LICENSE.