# Create Traces

In [1]:
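import plotly.graph_objs as go
import numpy as np
import plotly.offline as offline
# plotly.plotly (online plotting) moved to the separate chart_studio package
# in plotly 4 and is not needed for the offline plots in this notebook.

# Load plotly.js from the CDN so figures render inline in the notebook
offline.init_notebook_mode(connected=True)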

Data is obtained from https://github.com/sandialabs/slycat-data/blob/master/cars.csv

In [2]:
import pandas as pd
cars_data = pd.read_csv('datasets/cars.csv', sep=';')
cars_data.head()

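|   | Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin |
|---|-----|-----|-----------|--------------|------------|--------|--------------|-------|--------|
| 0 | Chevrolet Chevelle Malibu | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | US |
| 1 | Buick Skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | US |
| 2 | Plymouth Satellite | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | US |
| 3 | AMC Rebel SST | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | US |
| 4 | Ford Torino | 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | US |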
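`sep=';'` tells `read_csv` that fields are separated by semicolons rather than commas. As a rough stdlib analogue (not what pandas does internally), Python's `csv` module takes the same hint through its `delimiter` argument. The `SAMPLE` string is illustrative only: two rows copied from the `head()` output above, cut down to a few columns.

```python
import csv
import io

# Illustrative excerpt in the semicolon-separated layout that
# read_csv(..., sep=';') expects; values copied from the head() output.
SAMPLE = (
    "Car;MPG;Cylinders;Horsepower\n"
    "Chevrolet Chevelle Malibu;18.0;8;130.0\n"
    "Buick Skylark 320;15.0;8;165.0\n"
)


def read_semicolon_rows(text):
    """Parse semicolon-delimited text into a list of dicts keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


rows = read_semicolon_rows(SAMPLE)
print(rows[0]["Car"], rows[0]["Horsepower"])  # Chevrolet Chevelle Malibu 130.0
```

Unlike `read_csv`, `csv.DictReader` leaves every value as a string; pandas additionally infers numeric dtypes for columns such as `MPG`.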


In [3]:
cars_data_cyl = cars_data.loc[cars_data['Cylinders'] == 8]
cars_data_cyl.head()

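|   | Car | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model | Origin |
|---|-----|-----|-----------|--------------|------------|--------|--------------|-------|--------|
| 0 | Chevrolet Chevelle Malibu | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | US |
| 1 | Buick Skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | US |
| 2 | Plymouth Satellite | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | US |
| 3 | AMC Rebel SST | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | US |
| 4 | Ford Torino | 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | US |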
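`cars_data['Cylinders'] == 8` builds a boolean Series with one True/False value per row, and `.loc[...]` keeps only the rows where that value is True. The original index labels are preserved, which is why the filtered `head()` still starts at 0 (the first five cars all happen to have 8 cylinders). A stdlib sketch of the same two steps, using a hypothetical dict of rows (`car_a`, `car_b`, `car_c` are made-up names) in place of the DataFrame:

```python
# Hypothetical stand-in for the DataFrame: index label -> row.
rows = {
    0: {"Car": "car_a", "Cylinders": 8},
    1: {"Car": "car_b", "Cylinders": 4},
    2: {"Car": "car_c", "Cylinders": 8},
}

# Step 1: the comparison yields one boolean per row (like a boolean Series).
mask = {label: row["Cylinders"] == 8 for label, row in rows.items()}

# Step 2: keep rows whose mask value is True, preserving their original labels.
filtered = {label: row for label, row in rows.items() if mask[label]}

print(list(filtered))  # [0, 2]
```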


**Examine the two fields we will use in our plots**

View the distribution of values in these columns; the box plots will help us visualize this data.

In [4]:
cars_data_cyl[['Displacement', 'Horsepower']].describe()

|       | Displacement | Horsepower |
|-------|--------------|------------|
| count | 108.0 | 108.0 |
| mean | 345.203704 | 158.453704 |
| std | 46.03468 | 27.942325 |
| min | 260.0 | 90.0 |
| 25% | 305.0 | 140.0 |
| 50% | 350.0 | 150.0 |
| 75% | 360.0 | 175.0 |
| max | 455.0 | 230.0 |
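By default, `describe()` reports the sample standard deviation (divisor n - 1) and percentiles computed by linear interpolation between sorted data points. The sketch below reproduces those rows for a plain list using the `statistics` module, whose `quantiles(..., method="inclusive")` uses the same linear interpolation as pandas' default. The input list is made up for illustration; `describe` here is a hypothetical helper, not the pandas method.

```python
import statistics


def describe(values):
    """Rough, hypothetical analogue of pandas Series.describe() for a list."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std, divisor n - 1
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


summary = describe([90, 140, 150, 150, 175, 230])
print(summary["25%"], summary["50%"], summary["75%"])  # 142.5 150.0 168.75
```

The quartiles come from positions p * (n - 1) in the sorted list, e.g. 0.25 * 5 = 1.25, i.e. a quarter of the way from 140 to 150.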


**The first boxplot shows the Displacement**

In [5]:
trace0 = go.Box(y = cars_data_cyl['Displacement'], 
                name = 'Displacement')

**The second boxplot shows the Horsepower**

In [6]:
trace1 = go.Box(y = cars_data_cyl['Horsepower'], 
                name = 'Horsepower')

**Plot the data**

In [7]:
data = [trace0, trace1]

offline.iplot(data)
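Plotly traces and figures are plain, JSON-serializable structures: a `go.Box(...)` object corresponds to a dict with `type="box"`, and `offline.iplot` also accepts a figure dict with `data` and `layout` keys. The sketch below builds such a dict without importing plotly so its structure can be inspected. `box_trace` is a hypothetical helper, the `y` values are the first three rows from the `head()` output, and the layout title is a placeholder.

```python
import json


def box_trace(values, name, **style):
    """Hypothetical helper: a box trace in plotly's plain-dict form."""
    trace = {"type": "box", "y": list(values), "name": name}
    trace.update(style)  # e.g. boxpoints="all", marker={"color": "blue"}
    return trace


figure = {
    "data": [
        box_trace([307.0, 350.0, 318.0], "Displacement"),
        box_trace([130.0, 165.0, 150.0], "Horsepower"),
    ],
    "layout": {"title": "8-cylinder cars"},  # placeholder title
}

# The figure round-trips through JSON; plotly.js ultimately receives JSON.
payload = json.dumps(figure)
```

With plotly installed, `offline.iplot(figure)` should draw two boxes in the same way as the cell above, just from three points each.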

# Styling Outliers

**Define the first boxplot**

The value of **boxpoints** decides which sample points are drawn individually as markers next to the box. The default, 'outliers', shows only the points lying beyond the whiskers; a value of 'all' plots every sample point in addition to the box.

Other possible values for boxpoints are 'suspectedoutliers', which shows the outliers and highlights those below 4*Q1 - 3*Q3 or above 4*Q3 - 3*Q1 (Q1 and Q3 being the first and third quartiles), and False, where only the box is drawn.
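The thresholds behind 'outliers' and 'suspectedoutliers' can be sketched as follows, assuming the usual 1.5 x IQR whisker reach and linear-interpolated quartiles. Plotly's own quartile computation (its `quartilemethod` attribute) may differ slightly, so treat this hypothetical `classify_points` helper as an approximation rather than plotly's implementation. The input list is made up.

```python
import statistics


def classify_points(values):
    """Split values into inside-fence points, outliers and suspected outliers.

    Assumes linear-interpolated quartiles and 1.5 * IQR fences; plotly's
    exact quartile method may differ slightly.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Suspected-outlier bounds: 4*Q1 - 3*Q3 and 4*Q3 - 3*Q1 (3 IQRs past the box).
    lower_suspect, upper_suspect = 4 * q1 - 3 * q3, 4 * q3 - 3 * q1

    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    suspected = [v for v in outliers if v < lower_suspect or v > upper_suspect]
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return inside, outliers, suspected


inside, outliers, suspected = classify_points([10, 11, 12, 13, 14, 15, 30, 60])
print(outliers, suspected)  # [30, 60] [60]
```

Here Q1 = 11.75 and Q3 = 18.75, so the upper fence is 29.25 and the suspected-outlier bound is 39.75: 30 is an ordinary outlier, while 60 would be highlighted.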

**boxmean** defines whether the mean of the distribution is drawn as a dashed line inside the box

In [8]:
trace0 = go.Box(y = cars_data_cyl['Displacement'],
                name = 'Displacement',
                # Draw every sample point next to the box
                boxpoints = 'all',
                # Mark the mean with a dashed line
                boxmean = True)

**Define the second boxplot**

A value of 'sd' for **boxmean** will also show the standard deviation for the distribution along with the mean
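The quantities behind `boxmean` are the mean and the standard deviation of the column. The sketch below computes the mean and the mean +/- one standard deviation for a list. Whether plotly divides by n - 1 (sample) or n (population) for its 'sd' marker is not confirmed here, so the hypothetical `mean_and_sd_band` helper exposes both via a `sample` flag; the input list is made up.

```python
import statistics


def mean_and_sd_band(values, sample=True):
    """Return (mean, mean - sd, mean + sd) for the values.

    Whether plotly uses the sample (n - 1) or population (n) standard
    deviation is an assumption left open; `sample` selects the divisor.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return mean, mean - sd, mean + sd


data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_band = mean_and_sd_band(data, sample=False)    # population sd = 2
sample_band = mean_and_sd_band(data, sample=True)  # sample sd = sqrt(32 / 7)
```

For large columns such as the 108 cars here, the two divisors give nearly identical bands.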

In [9]:
trace1 = go.Box(y = cars_data_cyl['Horsepower'],
                name = 'Horsepower',
                # Draw the standard deviation as well as the mean
                boxmean = 'sd')

In [10]:
data = [trace0, trace1]

offline.iplot(data)

**More styling**

**Style the first boxplot**
* **marker** styles the individually plotted points (by default, only the outliers)
* **line** configures the box outline and the whiskers; its **width** is in pixels and defaults to 2, so the value of 1 used here gives thinner lines

In [11]:
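trace0 = go.Box(y = cars_data_cyl['Displacement'],
                name = 'Displacement',
                # Color of the point markers (only outliers by default)
                marker = dict(color = 'blue'),
                # Color and width (px) of the box outline and whiskers
                line = dict(color = 'green',
                            width = 1))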

**Style the second boxplot**
* **fillcolor** sets the color with which to fill the box. The default is the line color with 50% transparency
* **symbol** sets the marker shape and has many options (https://plot.ly/python/reference/#box-marker-symbol)
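The default fill is a half-transparent version of the line color. Plotly accepts CSS-style `rgba(r, g, b, a)` color strings, so that default can be reproduced explicitly when needed. The `half_transparent` helper below is hypothetical (not part of plotly) and only handles `#rrggbb` hex input, not named colors such as 'bisque'.

```python
def half_transparent(hex_color, alpha=0.5):
    """Convert '#rrggbb' to an 'rgba(r, g, b, a)' string (hypothetical helper)."""
    hex_digits = hex_color.lstrip("#")
    if len(hex_digits) != 6:
        raise ValueError("expected a #rrggbb color")
    # Parse the red, green and blue byte pairs from hex
    r, g, b = (int(hex_digits[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"


# e.g. fillcolor=half_transparent('#ff0000') next to line=dict(color='#ff0000')
print(half_transparent("#ff0000"))  # rgba(255, 0, 0, 0.5)
```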

In [12]:
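trace1 = go.Box(y = cars_data_cyl['Horsepower'],
                name = 'Horsepower',
                # Explicit fill instead of the default half-transparent line color
                fillcolor = 'bisque',
                # Navy square markers for the plotted points
                marker = dict(color = 'navy',
                              symbol = 'square'),
                # Red box outline and whiskers
                line = dict(color = 'red'))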

In [13]:
data = [trace0, trace1]

offline.iplot(data)