How to write a new PixieDust visualization

Displaying a DataFrame as a table

Code walk-through

Hooking into the display menu

class TableDisplayMeta

The first thing you'll want to do is get your tool presented in the menu. To see how the existing DataFrame-to-HTML-table tool does this, look at display/table/__init__.py:

pixiedust/display/table/__init__.py

from ..display import *
from .display import TableDisplay
import pixiedust.utils.dataFrameMisc as dataFrameMisc

@PixiedustDisplay(isDefault=True)
class TableDisplayMeta(DisplayHandlerMeta):
    @addId
    def getMenuInfo(self, entity, dataHandler):
        # pandas and PySpark DataFrames get a single table menu entry
        if dataFrameMisc.isPySparkDataFrame(entity) or dataFrameMisc.isPandasDataFrame(entity):
            return [
                {"categoryId": "Table", "title": "DataFrame Table", "icon": "fa-table", "id": "dataframe"}
            ]
        # GraphFrames get two entries: one for vertices, one for edges
        elif dataFrameMisc.fqName(entity) == "graphframes.graphframe.GraphFrame":
            return [
                {"categoryId": "Table", "title": "Graph Vertices", "icon": "fa-location-arrow", "id": "vertices"},
                {"categoryId": "Table", "title": "Graph Edges", "icon": "fa-link", "id": "edges"}
            ]
        # Any other entity: no table menu
        else:
            return []

    def newDisplayHandler(self, options, entity):
        # Create the object that renders the selected menu item
        return TableDisplay(options, entity)

Here you see that class TableDisplayMeta does 2 things. In the method getMenuInfo, it decides which menu entries to offer, depending on the type of entity passed. In the method newDisplayHandler, it creates the handler object that does the actual data processing.

def getMenuInfo

In the getMenuInfo method of class TableDisplayMeta, we check the type of the entity. This handler supports GraphFrames, PySpark DataFrames and pandas DataFrames. Only 2-dimensional pandas DataFrames are currently supported, so for the purposes of Pixiedust development you can treat them the same as PySpark DataFrames.

If it's a pandas or PySpark DataFrame, we return a list of menu definition objects (Python dicts). Each menu definition has 4 properties:

categoryId: a unique string that identifies the menu "category" or group
title: an arbitrary string that describes the menu
icon: the name of a Font Awesome icon, or a URL for an image
id: a unique string that identifies your tool

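Pixiedust only has one option for displaying a DataFrame as a table, so we return a single menu object in the list. For a GraphFrame, the method returns two entries, one for the vertices and one for the edges; the id of whichever entry the user picks ("vertices" or "edges") is what the display handler later receives as its handlerId. Any other entity gets an empty list, so no table menu appears for it.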
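Since getMenuInfo just returns plain dicts, a typo in a key fails quietly. The following is a hypothetical helper (validate_menu_entries is not part of PixieDust) that checks a getMenuInfo result for the four properties listed above; it is a sketch for catching mistakes early, not something Pixiedust runs itself.

```python
# Hypothetical helper, not a PixieDust API: check a getMenuInfo() result
# for the four documented menu properties.
REQUIRED_MENU_KEYS = ("categoryId", "title", "icon", "id")


def validate_menu_entries(entries):
    """Return a list of problems; an empty list means the entries look fine."""
    if not isinstance(entries, list):
        return ["getMenuInfo should return a list, got %s" % type(entries).__name__]
    problems = []
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_MENU_KEYS if key not in entry]
        if missing:
            problems.append("entry %d is missing %s" % (index, ", ".join(missing)))
        for key in REQUIRED_MENU_KEYS:
            # Present keys must hold non-empty strings (icon may be a name or a URL)
            if key in entry and not (isinstance(entry[key], str) and entry[key]):
                problems.append("entry %d: %r must be a non-empty string" % (index, key))
    return problems


# The two GraphFrame entries from TableDisplayMeta.getMenuInfo
graph_menus = [
    {"categoryId": "Table", "title": "Graph Vertices", "icon": "fa-location-arrow", "id": "vertices"},
    {"categoryId": "Table", "title": "Graph Edges", "icon": "fa-link", "id": "edges"},
]
print(validate_menu_entries(graph_menus))  # []
print(validate_menu_entries([{"categoryId": "Table", "title": "No id"}]))  # ['entry 0 is missing icon, id']
```

An empty list (the "no menu for this entity" case) also validates cleanly.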

def newDisplayHandler

The other method we must implement is newDisplayHandler, which is called when the menu item is selected. It returns the object that actually processes the DataFrame: in this case, a new TableDisplay object, which does all the heavy lifting that we'll talk about next.
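To make the division of labor between the two methods concrete, here is a self-contained mock of the flow: the meta class advertises menu entries, and when one is chosen, a new handler is created and rendered with the chosen id. FakeDisplay, RowCountDisplay, RowCountDisplayMeta and on_menu_selected are hypothetical stand-ins written for illustration (they borrow the getMenuInfo/newDisplayHandler/doRender method names); PixieDust's real dispatch code is not shown on this page and may differ.

```python
# Hypothetical stand-ins that mimic the meta/handler split described above.
# They are NOT PixieDust classes; they only illustrate the call order.


class FakeDisplay:
    """Plays the role of a Display subclass such as TableDisplay."""

    def __init__(self, options, entity):
        self.options = options
        self.entity = entity
        self.html = []

    def render(self, handler_id):
        self.doRender(handler_id)  # subclasses fill self.html
        return "".join(self.html)


class RowCountDisplay(FakeDisplay):
    def doRender(self, handlerId):
        self.html.append("<b>%d rows</b>" % len(self.entity))


class RowCountDisplayMeta:
    """Plays the role of a DisplayHandlerMeta subclass."""

    def getMenuInfo(self, entity, dataHandler=None):
        if isinstance(entity, list):
            return [{"categoryId": "Table", "title": "Row Count", "icon": "fa-table", "id": "rowCount"}]
        return []

    def newDisplayHandler(self, options, entity):
        return RowCountDisplay(options, entity)


def on_menu_selected(meta, entity, menu_id, options=None):
    """Create and run a handler only if the meta offered menu_id for this entity."""
    offered = [menu["id"] for menu in meta.getMenuInfo(entity)]
    if menu_id not in offered:
        raise ValueError("menu %r not offered for this entity" % menu_id)
    handler = meta.newDisplayHandler(options or {}, entity)
    return handler.render(menu_id)


print(on_menu_selected(RowCountDisplayMeta(), [{"a": 1}, {"a": 2}], "rowCount"))  # <b>2 rows</b>
```

Keeping menu discovery (cheap, runs for every entity) separate from rendering (potentially expensive) means no data is processed until the user actually picks a menu item.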

Data Processing

The TableDisplay class is defined in display/table/display.py. Here is the code:

pixiedust/display/table/display.py

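from ..display import *
from pixiedust.utils.dataFrameAdapter import *
import pixiedust.utils.dataFrameMisc as dataFrameMisc

class TableDisplay(Display):
    def doRender(self, handlerId):
        entity = self.entity
        # For a GraphFrame, handlerId is the selected menu id: edges or vertices
        if dataFrameMisc.fqName(entity) == "graphframes.graphframe.GraphFrame":
            if handlerId == "edges":
                entity = entity.edges
            else:
                entity = entity.vertices
        # Render DataFrames (including a GraphFrame's tables) with the Jinja2 template
        if dataFrameMisc.isPySparkDataFrame(entity) or dataFrameMisc.isPandasDataFrame(entity):
            self._addHTMLTemplate('dataframeTable.html', entity=PandasDataFrameAdapter(entity))
            return

        # Fallback for anything else
        self._addHTML("""
            <b>Unable to display object</b>
        """
        )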

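This class must implement one method, doRender, which is called with a reference to self and a handlerId (the id of the menu entry the user selected). The handlerId only matters for GraphFrames, where "edges" selects the edge table and any other id selects the vertices. For a plain DataFrame it is unused, so we only need to check that the entity is one of our DataFrame types. If it is, we display it using a Jinja2 HTML template; otherwise we fall back to an "Unable to display object" message.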
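Because handlerId is simply the id of the chosen menu entry, the GraphFrame branch can be reasoned about without Spark. The sketch below copies the selection logic from doRender into a plain function and feeds it a hypothetical stand-in object with edges and vertices attributes; FakeGraph and select_table are illustrative names, not PixieDust code, and the is_graph flag replaces the fqName check.

```python
from collections import namedtuple

# Hypothetical stand-in for graphframes.GraphFrame: only the two
# attributes that TableDisplay.doRender reads.
FakeGraph = namedtuple("FakeGraph", ["vertices", "edges"])


def select_table(entity, handlerId, is_graph):
    """Mirror doRender's choice of which table to render.

    is_graph stands in for the fqName(...) check so the sketch runs without Spark.
    """
    if is_graph:
        # "edges" selects the edge table; any other id falls back to vertices,
        # exactly like the if/else in doRender.
        return entity.edges if handlerId == "edges" else entity.vertices
    # For an ordinary DataFrame the handlerId is ignored.
    return entity


g = FakeGraph(vertices=["v1", "v2"], edges=[("v1", "v2")])
print(select_table(g, "edges", is_graph=True))            # [('v1', 'v2')]
print(select_table(g, "vertices", is_graph=True))         # ['v1', 'v2']
print(select_table(["row"], "dataframe", is_graph=False))  # ['row']
```

Note the fallback: an unexpected id on a GraphFrame shows the vertices rather than raising an error.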

HTML rendering with Jinja2

This line of code:

self._addHTMLTemplate('dataframeTable.html', entity=PandasDataFrameAdapter(entity))

is the key to rendering our data. The PandasDataFrameAdapter wrapper gives the template one set of calls (such as take and getFields) whether the entity is a pandas or a PySpark DataFrame. dataframeTable.html (found by default in the templates directory next to the calling file) is a Jinja2 template made up of CSS styles, HTML and Jinja2 template logic. The whole file is worth studying, but the key lines are:

{% set rows = entity.take(100) %}: get the first 100 rows of the DataFrame and assign them to the variable rows
{% for field in entity.getFields() %}: loop over the fields and display each as a <th>
{% for row in rows %}: loop over the rows and display each as a <tr>

Also note the <script> tag at the end of the file. This is where we do some nifty extras like scrolling while keeping the table header in a fixed position and client-side search.
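The template only relies on the adapter exposing take(n) and getFields(), with field objects that carry a name (the simpleTable.html template later on this page reads field.name). As a rough illustration, here is a hypothetical minimal adapter over a list of dicts that offers those two calls; ListOfDictsAdapter and Field are invented for experimenting and are not PixieDust's PandasDataFrameAdapter, whose internals may differ.

```python
from collections import namedtuple

# Hypothetical field type: the templates only read field.name.
Field = namedtuple("Field", ["name"])


class ListOfDictsAdapter:
    """Illustrative stand-in exposing the two calls the table templates use."""

    def __init__(self, rows):
        self.rows = list(rows)

    def getFields(self):
        # Column order follows the first row's keys (dicts keep insertion order).
        return [Field(name) for name in self.rows[0]] if self.rows else []

    def take(self, num):
        # Like {% set rows = entity.take(100) %}: at most `num` rows.
        return self.rows[:num]


data = ListOfDictsAdapter({"id": i, "square": i * i} for i in range(250))
print([f.name for f in data.getFields()])  # ['id', 'square']
print(len(data.take(100)))                 # 100
```

Capping the rows with take(100) keeps the generated HTML small even when the underlying DataFrame is large.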

Build your own table display tool

Now that you've seen how Pixiedust works, let's build a very simple second table display tool. You'll need to do 3 things:

Add a menu item
Hook the menu item to your code
Transform the DataFrame into something a web browser can display, using Jinja2 templating (HTML in our case, but the output could be SVG, a PDF or something more exotic)

Add a menu item and hook it to your code

Have Pixiedust recognize your new menu item code by adding this line to the imports in __init__.py (in the directory display/table):

from .SimpleDisplayMeta import SimpleDisplayMeta
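Why is an import enough? A common Python pattern, and presumably what the @PixiedustDisplay() decorator relies on, is a class decorator that records each class in a registry when the class statement runs, which only happens once its module is imported. Below is a generic sketch of that pattern with a hypothetical register_display decorator and REGISTRY list; it is not PixieDust's implementation.

```python
# Hypothetical decorator registry (not PixieDust code). The decorator body runs
# when the class statement executes, i.e. when the module is imported.
REGISTRY = []


def register_display():
    """Return a class decorator that records the class in REGISTRY."""
    def decorator(cls):
        REGISTRY.append(cls)  # side effect at class-definition time
        return cls            # the class itself is returned unchanged
    return decorator


@register_display()
class TableMeta:
    pass


@register_display()
class SimpleMeta:
    pass


print([cls.__name__ for cls in REGISTRY])  # ['TableMeta', 'SimpleMeta']
```

If the module defining SimpleMeta were never imported, its decorator would never run and the class would never appear in the registry.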

Then create the file SimpleDisplayMeta.py in the same directory and enter this code:

from .SimpleDisplay import SimpleDisplay
from ..display import *
import pixiedust.utils.dataFrameMisc as dataFrameMisc

@PixiedustDisplay()
class SimpleDisplayMeta(DisplayHandlerMeta):
    @addId
    def getMenuInfo(self, entity, dataHandler):
        # Only offer the menu for pandas and PySpark DataFrames
        if dataFrameMisc.isPySparkDataFrame(entity) or dataFrameMisc.isPandasDataFrame(entity):
            return [
                {"categoryId": "Table", "title": "Simple Table", "icon": "fa-table", "id": "simpleTest"}
            ]
        else:
            return []

    def newDisplayHandler(self, options, entity):
        # Hand the entity to SimpleDisplay for rendering
        return SimpleDisplay(options, entity)

As described earlier, the method getMenuInfo provides the hook to add a menu item to the user interface. You specify "Table" as the categoryId to add this tool to the existing Table menu. Give it any title and icon you want. Finally, give it a unique id, such as "simpleTest".
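Because your entry shares the "Table" category with the built-in ones, its id should not collide with theirs. Here is a small hypothetical check (duplicate_menu_ids is not a PixieDust API) that collects entries from several getMenuInfo-style results and reports any id offered more than once.

```python
from collections import Counter


def duplicate_menu_ids(*menu_lists):
    """Hypothetical helper: return, sorted, every menu id that appears more than once."""
    counts = Counter(entry["id"] for menus in menu_lists for entry in menus)
    return sorted(menu_id for menu_id, n in counts.items() if n > 1)


builtin = [{"categoryId": "Table", "title": "DataFrame Table", "icon": "fa-table", "id": "dataframe"}]
simple = [{"categoryId": "Table", "title": "Simple Table", "icon": "fa-table", "id": "simpleTest"}]
clash = [{"categoryId": "Table", "title": "Oops", "icon": "fa-table", "id": "dataframe"}]

print(duplicate_menu_ids(builtin, simple))         # []
print(duplicate_menu_ids(builtin, simple, clash))  # ['dataframe']
```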

The newDisplayHandler method specifies the code that will be used to do the data processing work. Let's write that code now.

DataFrame => HTML

Create the file SimpleDisplay.py in the directory display/table, and enter this code:

from ..display import *
from pixiedust.utils.dataFrameAdapter import *
import pixiedust.utils.dataFrameMisc as dataFrameMisc

class SimpleDisplay(Display):
    def doRender(self, handlerId):
        entity = self.entity
        # This tool has a single view, so handlerId is not checked
        if dataFrameMisc.isPySparkDataFrame(entity) or dataFrameMisc.isPandasDataFrame(entity):
            # Render templates/simpleTable.html with an adapter around the DataFrame
            self._addHTMLTemplate('simpleTable.html', entity=PandasDataFrameAdapter(entity))
            return

        # Fallback for anything else
        self._addHTML("""
            <b>Unable to display object</b>
        """
        )

All you're really doing here is defining a mechanism to call the right Jinja2 template (simpleTable.html, which you create in the templates directory under display/table) to process the data. Once you're working in the template, the sky's the limit for what you can do. But just to finish out this example, here's some extremely simple markup you can put there:

<table class="table table-striped">
    <thead>
        <tr>
            {% for field in entity.schema.fields %}
            <th>{{ field.name }}</th>
            {% endfor %}
        </tr>
    </thead>
    <tbody>
        {% for row in entity.take(100) %}
        <tr>
            {% for field in entity.schema.fields %}
            <td>{{ row[field.name] }}</td>
            {% endfor %}
        </tr>
        {% endfor %}
    </tbody>
</table>
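Before restarting Jupyter, you can render the template outside the notebook to catch Jinja2 syntax errors early. This sketch assumes the jinja2 package is installed and uses a hypothetical MockEntity exposing only schema.fields and take(), the two things the template reads; PixieDust's real adapter is not involved. For brevity, essentially the same markup as simpleTable.html is inlined here, with each cell loop on one line.

```python
from types import SimpleNamespace

from jinja2 import Template

# Essentially the simpleTable.html markup, inlined for a standalone check.
SIMPLE_TABLE = """<table class="table table-striped">
  <thead>
    <tr>
      {%for field in entity.schema.fields%}<th>{{field.name}}</th>{%endfor%}
    </tr>
  </thead>
  <tbody>
    {%for row in entity.take(100)%}
    <tr>
      {%for field in entity.schema.fields%}<td>{{row[field.name]}}</td>{%endfor%}
    </tr>
    {%endfor%}
  </tbody>
</table>"""


class MockEntity:
    """Hypothetical stand-in: provides only schema.fields and take()."""

    def __init__(self, rows):
        self.rows = rows
        names = list(rows[0]) if rows else []
        self.schema = SimpleNamespace(fields=[SimpleNamespace(name=n) for n in names])

    def take(self, num):
        return self.rows[:num]


entity = MockEntity([{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}])
html = Template(SIMPLE_TABLE).render(entity=entity)
print(html)
```

Rendering against a tiny mock like this isolates template mistakes from notebook and Spark setup issues.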

What you get

Now that the code is complete, let's update Pixiedust in our notebook and see the results. Shut down your Jupyter environment, run the following command from your terminal, and restart Jupyter to pick up the new code.

pip install --user -e <your directory path to pixiedust code>
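After restarting, it is worth confirming that Python imports PixieDust from your checkout rather than from an older installed copy. A minimal sketch using only the standard library's importlib.util.find_spec (module_location is a hypothetical helper name); run it in a notebook cell and compare the printed path with your source directory. It returns None if the module cannot be found.

```python
import importlib.util


def module_location(name="pixiedust"):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Should point inside your pixiedust checkout after the editable install
print(module_location())
```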

You should now see something resembling the screenshot below. The table menu is now a dropdown with two options, "DataFrame Table" and your new "Simple Table". Choosing "Simple Table" displays the data using the template you defined in simpleTable.html!

[Screenshot: new table menu added]
