# Different types of plots in Plotly

This notebook gives an overview of some common plot types in Plotly. It is intended as a starting point from which you can explore further details and other plot types.

## Environment preparation

Let's import the necessary packages. 

In [1]:
import plotly.graph_objects as go
import pandas as pd

And load the data. We will use a dataset of hotel bookings from Kaggle. It is commonly used to train ML models that predict whether a booking will be canceled, given variables such as the type of customer, the number of days booked in advance, etc.

In [2]:
FILE = "data/hotel_bookings.csv"

In [3]:
df = pd.read_csv(FILE)
df.head()

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,342,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
1,Resort Hotel,0,737,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
2,Resort Hotel,0,7,2015,July,27,1,0,1,1,...,No Deposit,,,0,Transient,75.0,0,0,Check-Out,2015-07-02
3,Resort Hotel,0,13,2015,July,27,1,0,1,1,...,No Deposit,304.0,,0,Transient,75.0,0,0,Check-Out,2015-07-02
4,Resort Hotel,0,14,2015,July,27,1,0,2,2,...,No Deposit,240.0,,0,Transient,98.0,0,1,Check-Out,2015-07-03


In [4]:
df.columns

Index(['hotel', 'is_canceled', 'lead_time', 'arrival_date_year',
       'arrival_date_month', 'arrival_date_week_number',
       'arrival_date_day_of_month', 'stays_in_weekend_nights',
       'stays_in_week_nights', 'adults', 'children', 'babies', 'meal',
       'country', 'market_segment', 'distribution_channel',
       'is_repeated_guest', 'previous_cancellations',
       'previous_bookings_not_canceled', 'reserved_room_type',
       'assigned_room_type', 'booking_changes', 'deposit_type', 'agent',
       'company', 'days_in_waiting_list', 'customer_type', 'adr',
       'required_car_parking_spaces', 'total_of_special_requests',
       'reservation_status', 'reservation_status_date'],
      dtype='object')

## Box plot

A box plot summarizes the distribution of one variable (y), optionally split by the values of a second variable (x).

In [5]:
figure = go.Figure(
    go.Box(
        x=df.is_canceled,
        y=df.lead_time,
        marker=dict(
            line=dict(width=0),
            symbol="x",
            color="black",
            size=6
        )
    )
)

figure.update_layout(
    template="plotly_white",
    xaxis_title="Booking Canceled",
    yaxis_title="Reservation Lead Time",
    width=600,
    height=400

)

figure.show()

# Save a static SVG copy of the figure (static export requires the kaleido package)
figure.write_image("boxplot.svg")

In this case, we can see that canceled reservations tend to have a longer lead time than non-canceled ones.
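
The visual comparison can also be checked numerically. The following is a minimal sketch, assuming a toy DataFrame with made-up values in place of the real bookings data; it computes the quartiles that the box plot draws for each group. Plotly computes quartiles with its own method, so the values shown in the figure may differ slightly from the pandas ones.

```python
import pandas as pd

# Toy stand-in for the bookings data (values are made up for illustration)
toy = pd.DataFrame({
    "is_canceled": [0, 0, 0, 0, 1, 1, 1, 1],
    "lead_time": [5, 20, 40, 90, 30, 80, 150, 300],
})

# Quartiles of lead_time for each value of is_canceled
summary = (
    toy.groupby("is_canceled")["lead_time"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["q1", "median", "q3"]
print(summary)
```

With the real data, replacing `toy` by `df` should give the numbers behind the boxes above.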

More about boxplots [in the documentation](https://plotly.com/python/box-plots/). 

## Histograms

Another common type of plot is the histogram, which shows the distribution of a single variable.

In Plotly, you can create a histogram using the `go.Histogram` class. The `x` parameter is the variable you want to plot.

The `histnorm` parameter normalizes the histogram. If you set `histnorm` to `'percent'`, each bar shows the percentage of the total number of observations that falls in that bin.

In the case of categorical variables, plotly will create a bin for each category. 

In [6]:
figure2 = go.Figure(
    go.Histogram(
        x=df.customer_type,
        histnorm='percent',
    )
)

figure2.show()
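
For a categorical variable, the percentages shown with `histnorm='percent'` correspond to the relative frequency of each category. As a rough cross-check, the same numbers can be computed with pandas. This is a sketch on a made-up Series, not on the real `customer_type` column:

```python
import pandas as pd

# Made-up categories, for illustration only
customer_type = pd.Series(["Transient", "Transient", "Transient", "Group"])

# Share of each category, expressed as a percentage of all observations
percent = customer_type.value_counts(normalize=True) * 100
print(percent)
```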

For histograms of numeric variables, you can specify, among other things, the start, end, and step used to create the bins, and whether multiple traces are overlaid.

In [11]:
figure3 = go.Figure(
    go.Histogram(
        x=df.lead_time[df.is_canceled == 0],
        name="Not Canceled"
    )
)

figure3.add_trace(
    go.Histogram(
        x=df.lead_time[df.is_canceled == 1],
        marker_color='red',
        name="Canceled"
    )
)

# Settings shared by both traces: transparency and identical bins.
# Trace names are set per trace above, since update_traces applies to all traces.
figure3.update_traces(
    opacity=0.5,
    xbins=dict(
        start=0,
        end=400,
        size=20,
    )
)
# Draw the two histograms on top of each other instead of side by side
figure3.update_layout(barmode='overlay')

figure3.update_layout(
    xaxis_title="Lead Time",
    yaxis_title="Count",
    template="plotly_white",
    width=600,
    height=400,
)
figure3.show()
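
The `xbins` settings used above (`start=0`, `end=400`, `size=20`) imply 20 bins of width 20. The sketch below reproduces that arithmetic in plain Python. It assumes left-closed bins of the form `[edge, edge + size)` and uses illustrative lead times rather than the real column, so treat it as an approximation of how values are binned, not as Plotly's exact implementation.

```python
# xbins settings taken from the histogram above
start, end, size = 0, 400, 20

# Bin edges: 0, 20, 40, ..., 400
edges = list(range(start, end + size, size))
n_bins = len(edges) - 1

# Illustrative lead times (not the real column)
lead_times = [0, 7, 13, 14, 19, 20, 45, 399]

# Count values per bin, assuming left-closed bins [edge, edge + size)
counts = [0] * n_bins
for value in lead_times:
    if start <= value < end:
        counts[(value - start) // size] += 1

print(n_bins)      # number of bins
print(counts[:3])  # counts for [0, 20), [20, 40), [40, 60)
```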


## Violin Plots

A violin plot is another way to display a distribution: it draws a kernel density estimate of the data, mirrored around a central axis. See the Plotly documentation for more details.

In [13]:
figure4 = go.Figure(
    go.Violin(
        y=df.lead_time,
        x=df.is_canceled,
    )
)
figure4.update_layout(
    xaxis_title="Booking Canceled",
    yaxis_title="Reservation Lead Time",
    template="plotly_white",
    width=600,
    height=400
)
figure4.show()

# Save to image
#buffer = figure4.to_image(format="svg")
#with open('violin.svg', 'wb') as f:
#    f.write(buffer)


#import plotly.io as pio
#pio.write_image(figure4, 'violin.pdf', width=10000, height=10000)

More about violin plots in [the documentation](https://plotly.com/python/violin/), and more about histograms in [the documentation](https://plotly.com/python/histograms/).

## More plot types

These are just a few basic examples that illustrate the main pattern behind Plotly: create a figure from one or more traces, adjust its layout, and show it.
Plotly supports many other types of plots. See the [Figure Reference](https://plotly.com/python/reference/index/).