# Plotly Package Demonstration 

[plotly.com/python](https://plotly.com/python/)

### Content
1. Importing Plotly and Supporting Packages
2. Load and Explore the Test Data
3. Overview of Plotly
4. Basic Charts
5. Statistical Charts
6. Scientific Charts
7. More Charting Capabilities

# 1. Importing Plotly and Supporting Packages

```python
import pandas as pd                # data analysis package
import numpy as np                 # numerical computing package
import plotly
import plotly.graph_objects as go  # lower-level "graph objects" API (older alias: plotly.graph_objs)
import plotly.express as px        # high-level Plotly Express API
```

### Plotly Express
[Plotly Express](https://plotly.com/python/plotly-express/) is a module of the Plotly library. It is a high-level API that builds a complete figure in a single function call. Its figures offer less fine-grained customization than figures built from graph objects, but they are much quicker and easier to create.
- Some figure types in the older Plotly API have been deprecated in favor of Plotly Express figures.
- Some of the plots in this demo are Plotly Express figures (the module is conventionally imported as `px`).

# 2. Load and Explore the Test Data

```python
data = pd.read_csv('energy_bill.csv')
data.head()  # view the first 5 rows of data
```

| | num_rooms | num_people | housearea | is_ac | is_tv | is_flat | ave_monthly_income | num_children | is_urban | amount_paid |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 3 | 742.57 | 1 | 1 | 1 | 9675.93 | 2 | 0 | 560.481447 |
| 1 | 1 | 5 | 952.99 | 0 | 1 | 0 | 35064.79 | 1 | 1 | 633.283679 |
| 2 | 3 | 1 | 761.44 | 1 | 1 | 1 | 22292.44 | 0 | 0 | 511.879157 |
| 3 | 0 | 5 | 861.32 | 1 | 1 | 0 | 12139.08 | 0 | 0 | 332.992035 |
| 4 | 1 | 8 | 731.61 | 0 | 1 | 0 | 17230.1 | 2 | 1 | 658.285625 |


```python
# what are the data types?
data.dtypes
```

```
num_rooms               int64
num_people              int64
housearea             float64
is_ac                   int64
is_tv                   int64
is_flat                 int64
ave_monthly_income    float64
num_children            int64
is_urban                int64
amount_paid           float64
dtype: object
```

```python
# size of data
data.shape
```

```
(1000, 10)
```

### Data Features

All columns contain numeric values (`int64` or `float64`).
- `is_ac`, `is_tv`, `is_flat`, and `is_urban` are Yes/No features that have been encoded as integers to make quantitative analysis easier:
  - `0` represents 'No'
  - `1` represents 'Yes'

This Pandas `DataFrame` contains 1000 rows and 10 columns.
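The integer encoding is convenient for arithmetic, but legends and hover labels read better with the original categories. Below is a minimal sketch of decoding one flag column with `Series.map`, using a few hypothetical rows. The `is_urban_label` column name is illustrative only.

```python
import pandas as pd

# Hypothetical rows using the same 0/1 encoding as is_ac, is_tv, is_flat, is_urban
toy = pd.DataFrame({"is_urban": [0, 1, 1, 0],
                    "amount_paid": [560.5, 633.3, 511.9, 333.0]})

# Map the integer codes back to readable labels (e.g. for chart legends)
toy["is_urban_label"] = toy["is_urban"].map({0: "No", 1: "Yes"})
print(toy["is_urban_label"].tolist())  # ['No', 'Yes', 'Yes', 'No']
```

If you pass a string column like this as `color=`, Plotly Express treats it as a discrete category. A numeric 0/1 column may instead be drawn with a continuous color scale by functions such as `px.scatter`.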

```python
# are there null values?
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   num_rooms           1000 non-null   int64  
 1   num_people          1000 non-null   int64  
 2   housearea           1000 non-null   float64
 3   is_ac               1000 non-null   int64  
 4   is_tv               1000 non-null   int64  
 5   is_flat             1000 non-null   int64  
 6   ave_monthly_income  1000 non-null   float64
 7   num_children        1000 non-null   int64  
 8   is_urban            1000 non-null   int64  
 9   amount_paid         1000 non-null   float64
dtypes: float64(3), int64(7)
memory usage: 78.2 KB
```


- There are no null values in this data.
- Every column has 1000 non-null values, matching the 1000 rows.
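`data.info()` reports non-null counts alongside the dtypes. For a direct programmatic check, you can sum `DataFrame.isna()` per column. The minimal sketch below uses a hypothetical frame with one deliberately missing value, so the check has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one deliberately missing value
toy = pd.DataFrame({"num_rooms": [3, 1, np.nan],
                    "amount_paid": [560.5, 633.3, 511.9]})

null_counts = toy.isna().sum()          # number of nulls in each column
print(null_counts.tolist())             # [1, 0]
print(bool(toy.isna().any().any()))     # True: at least one null somewhere
```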

# 3. Overview of Plotly

All charts, graphs, diagrams, and maps created with Plotly are referred to as figures. A figure is a `plotly.graph_objects.Figure`, whether it is built with Plotly Express or with graph objects. It is made up of data (its traces) and a layout.

**Types of Plotly Figures**
- Scatter plots
- Bar, pie, and bubble charts
- Histograms
- Box plots
- Scatter matrix plots
- 3D Scatter plots
- Subplots
- Maps (for Geo Data)
- More!

**Graph Objects**
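- Plotly's `graph_objects` module (older alias: `graph_objs`) provides classes such as `go.Scatter`, `go.Bar`, and `go.Figure`, so each part of a chart is stored as a Python object.
- Multiple graph objects, referred to as *traces*, can be passed as data to visualize multiple graphs at once.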

**Traces**
- Graph objects that each represent one series of data, such as the x-y relationship between two fields.
- Example: `num_people` (x-axis) vs. `amount_paid` (y-axis)
- Multiple traces can be passed as a list of trace objects to the `data` parameter of a `Figure` object. This lets you visualize one or more traces at a time.

### Graph Object Parameters
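Graph objects can be customized by specifying a wide range of parameters. Here is a brief look at some of them.

**Trace parameters** (e.g. `go.Scatter`)
- `x` - values for the x-axis
- `y` - values for the y-axis
- `mode` - drawing mode of a scatter trace (e.g. `'markers'`, `'lines'`, `'lines+markers'`)
- `name` - name of the trace, shown in the legend
- `marker` - dictionary of marker parameter arguments (`marker=dict(size=10, color='black', ...)`)
- `text` - hover text

**Figure parameters** (`go.Figure`)
- `data` - list of traces (graph objects)
- `layout` - dictionary of layout options (title, axes, etc.)

A figure, conventionally stored in a variable named `fig`, is the combination of data and layout.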

# 4. Basic Charts

### Scatter Plot

```python
# create a new DataFrame object for the scatter plot (a copy, so `data` is left unchanged)
scatter_df = data.copy()
scatter_df.head()  # view the first five rows
```

| | num_rooms | num_people | housearea | is_ac | is_tv | is_flat | ave_monthly_income | num_children | is_urban | amount_paid |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 3 | 742.57 | 1 | 1 | 1 | 9675.93 | 2 | 0 | 560.481447 |
| 1 | 1 | 5 | 952.99 | 0 | 1 | 0 | 35064.79 | 1 | 1 | 633.283679 |
| 2 | 3 | 1 | 761.44 | 1 | 1 | 1 | 22292.44 | 0 | 0 | 511.879157 |
| 3 | 0 | 5 | 861.32 | 1 | 1 | 0 | 12139.08 | 0 | 0 | 332.992035 |
| 4 | 1 | 8 | 731.61 | 0 | 1 | 0 | 17230.1 | 2 | 1 | 658.285625 |


```python
# trace1 for our scatter plot
trace1 = go.Scatter(x=scatter_df['housearea'],
                    y=scatter_df['amount_paid'],
                    mode='markers',
                    name='area',
                    marker=dict(color='red', size=10,
                                line=dict(width=2, color='black')),
                    text=scatter_df['housearea']
                    )
# list of traces to be used in the scatter plot
traces = [trace1]

# create scatter plot figure with no layout
fig = go.Figure(data=traces)
fig
```

```python
# add a layout (ticklen = length of the tick marks, in pixels)
layout = dict(title='House Area vs. Monthly Energy Cost',
              xaxis=dict(title='House Area', ticklen=100, zeroline=True),
              yaxis=dict(title='Monthly Energy Cost', ticklen=100, zeroline=True))

# build a new scatter plot figure from the same traces, this time with the layout
fig = go.Figure(data=traces, layout=layout)
fig
```

```python
# Update trace markers using `fig.update_traces()`
fig.update_traces(marker=dict(size=10,
                              color='black',
                              line=dict(width=1,
                                        color='red')),
                  selector=dict(mode='markers'))
```


```python
# update_* methods modify the figure in place (and also return it),
# so the marker changes made above are now part of `fig`
fig
```

### Bar Charts

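```python
# Make a new DataFrame from a list of Series. Each Series becomes one *row*
# (and the mixed int/float values become float64), which is why the result
# is transposed in the next cell.
bar_df = pd.DataFrame([data['num_rooms'], data['is_tv'], data['ave_monthly_income'], data['amount_paid']])
bar_df
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 990 | 991 | 992 | 993 | 994 | 995 | 996 | 997 | 998 | 999 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| num_rooms | 3.0 | 1.0 | 3.0 | 0.0 | 1.0 | 0.0 | 4.0 | 3.0 | 2.0 | 1.0 | ... | 1.0 | 2.0 | 1.0 | 3.0 | 2.0 | 3.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| is_tv | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| ave_monthly_income | 9675.93 | 35064.79 | 22292.44 | 12139.08 | 17230.1 | 24661.81 | 28184.43 | 16912.69 | 26058.28 | 22545.5 | ... | 39175.03 | 13412.74 | 43077.89 | 13887.45 | 39795.94 | 12687.26 | 39502.92 | 10145.17 | 15535.05 | 22204.0 |
| amount_paid | 560.481447 | 633.283679 | 511.879157 | 332.992035 | 658.285625 | 793.242346 | 570.382845 | 585.4052 | 653.200868 | 606.015138 | ... | 615.481956 | 332.559418 | 721.615641 | 495.844286 | 528.452107 | 655.870111 | 354.472693 | 568.66055 | 653.423314 | 537.801005 |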


```python
# Transpose the DataFrame so that each original Series becomes a column
bar_df = bar_df.transpose()
bar_df
```

| | num_rooms | is_tv | ave_monthly_income | amount_paid |
|---|---|---|---|---|
| 0 | 3.0 | 1.0 | 9675.93 | 560.481447 |
| 1 | 1.0 | 1.0 | 35064.79 | 633.283679 |
| 2 | 3.0 | 1.0 | 22292.44 | 511.879157 |
| 3 | 0.0 | 1.0 | 12139.08 | 332.992035 |
| 4 | 1.0 | 1.0 | 17230.10 | 658.285625 |
| ... | ... | ... | ... | ... |
| 995 | 3.0 | 1.0 | 12687.26 | 655.870111 |
| 996 | 2.0 | 1.0 | 39502.92 | 354.472693 |
| 997 | 2.0 | 0.0 | 10145.17 | 568.660550 |
| 998 | 1.0 | 1.0 | 15535.05 | 653.423314 |


```python
# sort the rows by number of rooms, then by amount paid
bar_df = bar_df.sort_values(['num_rooms', 'amount_paid'])

# Find the sum and average of all amounts paid for each number of rooms
amnt_by_rooms = bar_df.groupby('num_rooms')['amount_paid'].sum()
avg_amnt_by_rooms = bar_df.groupby('num_rooms')['amount_paid'].mean()

amnt_by_rooms, avg_amnt_by_rooms
```

```
(num_rooms
 -1.0      3225.945788
  0.0     39334.894601
  1.0    155532.336100
  2.0    223637.455470
  3.0    143136.069790
  4.0     33129.017748
  5.0      2400.639830
 Name: amount_paid, dtype: float64,
 num_rooms
 -1.0    645.189158
  0.0    605.152225
  1.0    605.184187
  2.0    599.564224
  3.0    596.400291
  4.0    591.589603
  5.0    600.159957
 Name: amount_paid, dtype: float64)
```

**Plotly Supports Pandas Series Objects**
- `amnt_by_rooms` and `avg_amnt_by_rooms` are Pandas `Series` objects.
- The index is the number of rooms.
- The values are the sum/average of all energy bills for each `num_rooms` value.
- `-1` is used to denote values greater than 5.

```python
# px.bar returns a complete Figure rather than a single trace
bar_fig = px.bar(amnt_by_rooms,
                 x=amnt_by_rooms.index,
                 y=amnt_by_rooms.values,
                 labels={
                     'x': 'Number of Rooms',
                     'y': 'Sum of Energy Bills'
                 },
                 hover_data=[amnt_by_rooms.index],
                 title='Number of Rooms vs. Total Amount of Energy Bills'
                 )
bar_fig
```

### Line Charts

```python
# px.line also returns a complete Figure rather than a single trace
line_fig = px.line(avg_amnt_by_rooms,
                   x=avg_amnt_by_rooms.index,
                   y=avg_amnt_by_rooms.values,
                   labels={
                       'x': 'Number of Rooms',
                       'y': 'Average Energy Bill'
                   },
                   hover_data=[avg_amnt_by_rooms.index],
                   title='Number of Rooms vs. Avg. Energy Bill Amount'
                   )
line_fig
```

### Bubble Charts

```python
# import plotly express package
import plotly.express as px

# select the columns used by the bubble chart
bubble_df = data[['num_people', 'housearea', 'ave_monthly_income', 'amount_paid', 'num_rooms']]

# preview data
bubble_df.head()
```

| | num_people | housearea | ave_monthly_income | amount_paid | num_rooms |
|---|---|---|---|---|---|
| 0 | 3 | 742.57 | 9675.93 | 560.481447 | 3 |
| 1 | 5 | 952.99 | 35064.79 | 633.283679 | 1 |
| 2 | 1 | 761.44 | 22292.44 | 511.879157 | 3 |
| 3 | 5 | 861.32 | 12139.08 | 332.992035 | 0 |
| 4 | 8 | 731.61 | 17230.1 | 658.285625 | 1 |


```python
# Make a new scatter figure for the bubble chart (marker size encodes amount_paid)
fig = px.scatter(bubble_df,
                 x='num_people',
                 y='amount_paid',
                 size='amount_paid',
                 color='num_rooms',
                 title='Bubble Plot'
                 )
fig
```

# 5. Statistical Charts
Statistical charts are used to display descriptive information about data. Examples of statistical charts:
- Box plot
- Histogram
- Distplot
- Error bars

### Box Plot

```python
import plotly.graph_objects as go

y0 = data['num_people']
y1 = data['num_rooms']

fig = go.Figure()
fig.add_trace(go.Box(y=y0, name='Number of People',
                     marker_color='darkred'))
fig.add_trace(go.Box(y=y1, name='Number of Rooms',
                     marker_color='lightgreen'))
fig.update_traces(boxpoints='all')  # draw every data point beside each box

fig.show()
```

### Histogram

```python
import plotly.express as px

df = data  # shorter alias for the energy bill DataFrame
fig = px.histogram(df,
                   x='amount_paid',
                   color='is_urban',
                   marginal='violin',  # can be 'rug', 'box', 'violin', or 'histogram'
                   nbins=50,
                   title='Histogram with Violin Plots',
                   hover_data=df.columns)
fig.show()
```

```python
fig = px.histogram(df,
                   x='amount_paid',
                   color='is_urban',
                   marginal='box',
                   nbins=50,
                   title='Histogram with Box Plot',
                   hover_data=df.columns)
fig.show()
```

```python
fig = px.histogram(df,
                   x='amount_paid',
                   color='is_urban',
                   marginal='rug',
                   nbins=50,
                   title='Histogram with Rug (One-Dimensional Density Plot)',
                   hover_data=df.columns)
fig.show()
```

# 6. Scientific Charts

### Density Heatmap

- Marginal plots can be added along the x- and y-axes to show the distribution of each variable on its own.

```python
heat_df = data.iloc[:, :]

fig = px.density_heatmap(heat_df,
                         x='amount_paid',
                         y='ave_monthly_income',
                         marginal_x='histogram',
                         marginal_y='box'
                         )

fig.update_layout(title="Density Heatmap",
                  xaxis_title="Amount Paid",
                  yaxis_title="Avg. Monthly Income")
fig.show()
```

### Bar Chart (Polar)

```python
polar_df = data

fig = px.bar_polar(polar_df,
                   r=(polar_df["amount_paid"] / 10),    # distance from the center
                   theta=(polar_df["num_rooms"] * 51),  # ~360 / len(set(data['num_rooms'])) = 360 / 7
                   color="num_rooms",
                   template="plotly_dark",
                   # num_rooms is numeric, so its colors come from a continuous scale
                   color_continuous_scale=px.colors.sequential.Plasma_r,
                   log_r=True,
                   range_theta=(0, 360)
                   )

fig.update_layout(title="Bar Polar")
fig.show()
```
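The `theta` expression above spaces the categories by multiplying each room count by roughly 360/7. The sketch below shows a hypothetical alternative that assigns angles by each category's rank. It keeps the `-1` code at 0° next to `0`, rather than at -51° (which wraps around to 309°), and adapts automatically if the number of categories changes. The room counts are made up.

```python
import pandas as pd

# Hypothetical room counts, including the -1 code used for "more than 5"
rooms = pd.Series([3, 1, -1, 0, 5, 2, 4, 1])

# Evenly spaced angles: 360 / number of distinct categories
categories = sorted(rooms.unique())  # the distinct values, in ascending order
step = 360 / len(categories)
angle_of = {cat: i * step for i, cat in enumerate(categories)}
theta = rooms.map(angle_of)          # angle for every row, usable as theta=
```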

# 7. More Charting Capabilities

### Artificial Intelligence and Machine Learning

Plotly offers a variety of charting capabilities to supplement AI and ML models.
- Regression Analysis
- K-Nearest-Neighbors Classification
- Receiver Operating Characteristic (ROC) Curves
- Principal Component Analysis Transformation Visualizations
- Additional Capabilities with Dash

### Financial Charts

Financial data can also be used to make charts specifically designed for financial analysis.
- Time Series
- Candlestick Charts
- Waterfall Charts
- Funnel Charts
- Open-High-Low-Close Charts

### Geographical Mapping

Data containing geographical information, such as states, countries, ZIP and area codes, or latitude and longitude coordinates, can be used to visualize geospatial data.

- Choropleth Maps
- Area Maps
- Bubble Maps
- Density Heat Maps

### 3D Charting

Some of the one- and two-dimensional charts also have 3D counterparts.
- 3D Axes
- 3D Scatter Plots
- 3D Surface Plots
- 3D Subplots
- 3D Camera Controls

### More

Plotly has a large collection of visualization capabilities that allow for a high degree of customization. Check them out on [Plotly's website](https://plotly.com/python/).