Plotly Intro

What is Plotly?
-Plotly is a free, open-source graphing library. It supports several languages, including Python, JavaScript, R, and MATLAB.
     
What is Plotly used for?
-It is used for data visualization, web-based graphics, and interactive dashboards.

Basic plotly charts and interactions (non-technical)
-Plotly supports several types of charts - basic, financial, statistical and more
-Charts available: 2D/3D scatter, line, bar, pie, heatmap, candlestick, funnel, and geospatial maps
-Interactions: hover tooltips, zooming, panning, click-events, and legend toggling for data exploration

Difference between Plotly graph objects (non-technical/just the basics) and Plotly Express
-Plotly Express is user-friendly and good for quick and simple charts. It is a built-in module of the plotly package (plotly.express) and exposes fewer customization options directly.
-Plotly Graph Objects is for visualizations that are customized or more complex. It is also built in (plotly.graph_objects) and gives finer control over each trace and the layout.

In [1]:
import plotly.express as px
import plotly as pl
import plotly.io as pio
import pandas as pd
import numpy as np

JULIA

In [2]:
food = pd.read_csv('daily_food_delivery_orders.csv')
food

Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
0,1,2024-11-05,62,Indian,497.51,11.07,79,UPI,3.9,Cancelled
1,2,2024-08-20,35,Bakery,232.32,5.83,69,Wallet,2.7,Cancelled
2,3,2024-02-28,34,Italian,540.82,3.61,70,Wallet,3.4,Cancelled
3,4,2024-05-26,65,Cafe,1197.99,3.66,18,Card,4.6,Cancelled
4,5,2024-09-21,40,Indian,947.03,12.08,57,UPI,4.9,Delayed
...,...,...,...,...,...,...,...,...,...,...
2595,2596,2024-05-20,46,Cafe,738.51,9.12,31,Wallet,4.5,Cancelled
2596,2597,2024-05-15,56,Indian,421.78,8.29,66,Card,2.8,Delayed
2597,2598,2024-10-18,32,Cafe,1009.93,12.80,73,UPI,4.4,Delivered
2598,2599,2024-04-24,55,Bakery,240.97,13.56,56,Cash,4.3,Delivered


In [3]:
status_counts = food['restaurant_type'].value_counts().reset_index()
status_counts.columns = ['restaurant_type', 'frequency']

# Create bar chart
fig1 = px.bar(
    status_counts,
    x='restaurant_type',
    y='frequency',
    title='Frequency of Restaurant Type',
    labels={'restaurant_type': 'Restaurant Type', 'frequency': 'Count'}
   
)

fig1.show()

In [4]:
data = food.groupby(["restaurant_type"]).size().rename("Count").reset_index()
data

Unnamed: 0,restaurant_type,Count
0,Bakery,454
1,Cafe,444
2,Chinese,432
3,Fast Food,425
4,Indian,421
5,Italian,424
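
The counts were computed two ways above: `value_counts()` in cell 3 and `groupby(...).size()` in cell 4. The sketch below, on a small hypothetical column, suggests why the two bar charts can show the same numbers in a different bar order.

```python
import pandas as pd

# Hypothetical toy column (an assumption for this sketch)
orders = pd.DataFrame(
    {"restaurant_type": ["Cafe", "Bakery", "Cafe", "Indian", "Cafe"]}
)

# value_counts() orders the result by count, largest first
by_value_counts = orders["restaurant_type"].value_counts()

# groupby(...).size() orders the result by the group label
by_groupby = orders.groupby("restaurant_type").size()

# Same counts once both are put in the same order
same_counts = (
    by_value_counts.sort_index().tolist() == by_groupby.sort_index().tolist()
)
```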


In [5]:
px.bar(data, x="restaurant_type", y="Count", text="Count",
       color="restaurant_type",
       color_discrete_sequence=["brown", "blue", "red", "yellow", "orange", "green"],
       title='Frequency of Restaurant Type',
       # the y column is already named "Count", so only the x label needs renaming
       labels={'restaurant_type': 'Restaurant Type'})
  

DARLENE

In [6]:
status_counts = food['order_status'].value_counts().reset_index()
status_counts.columns = ['order_status', 'count']

# Create pie chart
fig2= px.pie(
    status_counts,
    names='order_status',
    values='count',
)

# Update traces to show the count (frequency) in black text
fig2.update_traces(
    textinfo='value',           # show the count on each slice
    textfont_color='black',     # black text
    title='<b>FREQUENCY OF ORDER STATUS</b>',  # <b> makes the title bold
    # To italicize the title instead, use:
    # title='<i>FREQUENCY OF ORDER STATUS</i>',
)

fig2.update_traces(
    textinfo='label+value',
    textfont_color='black'
)
fig2.show()

KAITLIN

# Why are there two of the same blocks below?
### - David

In [7]:
# Instead of typing food.head(20) over and over, we assign it to a variable named food20
food20 = food.head(20)
food20

Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
0,1,2024-11-05,62,Indian,497.51,11.07,79,UPI,3.9,Cancelled
1,2,2024-08-20,35,Bakery,232.32,5.83,69,Wallet,2.7,Cancelled
2,3,2024-02-28,34,Italian,540.82,3.61,70,Wallet,3.4,Cancelled
3,4,2024-05-26,65,Cafe,1197.99,3.66,18,Card,4.6,Cancelled
4,5,2024-09-21,40,Indian,947.03,12.08,57,UPI,4.9,Delayed
5,6,2024-03-16,51,Italian,835.75,3.56,85,UPI,3.8,Delayed
6,7,2024-11-20,52,Cafe,771.83,14.37,80,Card,3.7,Delivered
7,8,2024-11-24,52,Chinese,926.2,12.81,19,UPI,4.4,Delayed
8,9,2024-07-23,38,Bakery,548.11,13.54,42,Cash,4.1,Delayed
9,10,2024-07-01,24,Fast Food,177.18,8.15,29,Cash,2.9,Delivered


In [8]:
# Instead of typing food.head(20) over and over, we assign it to a variable named food20
food20 = food.head(20)
food20

Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
0,1,2024-11-05,62,Indian,497.51,11.07,79,UPI,3.9,Cancelled
1,2,2024-08-20,35,Bakery,232.32,5.83,69,Wallet,2.7,Cancelled
2,3,2024-02-28,34,Italian,540.82,3.61,70,Wallet,3.4,Cancelled
3,4,2024-05-26,65,Cafe,1197.99,3.66,18,Card,4.6,Cancelled
4,5,2024-09-21,40,Indian,947.03,12.08,57,UPI,4.9,Delayed
5,6,2024-03-16,51,Italian,835.75,3.56,85,UPI,3.8,Delayed
6,7,2024-11-20,52,Cafe,771.83,14.37,80,Card,3.7,Delivered
7,8,2024-11-24,52,Chinese,926.2,12.81,19,UPI,4.4,Delayed
8,9,2024-07-23,38,Bakery,548.11,13.54,42,Cash,4.1,Delayed
9,10,2024-07-01,24,Fast Food,177.18,8.15,29,Cash,2.9,Delivered


In [9]:
# We want to know what the keys are in this dataset.
food20.keys() 

Index(['order_id', 'order_date', 'customer_age', 'restaurant_type',
       'order_value', 'delivery_distance_km', 'delivery_time_minutes',
       'payment_method', 'delivery_partner_rating', 'order_status'],
      dtype='object')

In [10]:
# Let's look at the order date compared to the order value. The order date
# will be on the x axis and the order value will be on the y axis.
# With plotly, one thing you can do is include markers, or points, on the graph.
fig = px.line(food20, x="order_date", y="order_value", markers=True)
fig.show()

In [11]:
# Let's look at the order date compared to the order value. The order date
# will be on the x axis and the order value will be on the y axis.
# With plotly, one thing you can do is include markers, or points, on the graph.
fig = px.line(food20, x="order_date", y="order_value", markers=True)
fig.show()

# The issue was here.
## The sort was being applied to a variable named `dfdo20` which was not created in this notebook. It should have been `food20`, so with the change to the correct variable name, the sort was applied to the data you were using to draw the next plot.
### - David 

In [12]:
# The reason this line graph looks insane is that plotly connects the points
# in row order (rows 0 through 19) and not from the earliest to the latest date.
# To change this, we must use pandas to order the dates from earliest to latest.
# The sort_values function from pandas allows us to sort based on values. We will sort by
# order date, and inplace=True sorts food20 itself instead of returning a sorted copy.
 
food20.sort_values(by='order_date', inplace=True)
food20





A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
12,13,2024-01-09,31,Chinese,758.14,1.28,57,Cash,2.9,Delayed
2,3,2024-02-28,34,Italian,540.82,3.61,70,Wallet,3.4,Cancelled
18,19,2024-03-01,54,Indian,193.87,4.4,63,UPI,2.8,Delivered
5,6,2024-03-16,51,Italian,835.75,3.56,85,UPI,3.8,Delayed
16,17,2024-04-09,54,Cafe,816.14,10.95,86,UPI,4.8,Delayed
17,18,2024-04-20,43,Cafe,640.6,4.49,50,Cash,4.1,Delayed
3,4,2024-05-26,65,Cafe,1197.99,3.66,18,Card,4.6,Cancelled
9,10,2024-07-01,24,Fast Food,177.18,8.15,29,Cash,2.9,Delivered
8,9,2024-07-23,38,Bakery,548.11,13.54,42,Cash,4.1,Delayed
15,16,2024-07-24,38,Bakery,773.12,9.07,50,Wallet,2.8,Cancelled


In [13]:
# Now, since the code is reading from top down, we should see a nice line graph.
fig = px.line(food20, x="order_date", y="order_value", markers=True)
fig.show()

This code works perfectly on Kaitlyn's computer (Mac), but it does not work when we transfer it to Darlene's computer (PC). Specifically, the dates are not ordered chronologically but by month, such that January, month 1, is followed by November, month 11. This is shown in the chart above.

Can either Dr. Silva or Kaitlyn, who first wrote this code, take a look at this code and please tell us what is wrong?

### For a more in-depth answer, the values of `order_date` are just strings so there is no inherent order when passed to the x axis. However, plotly has a built-in string to datetime parser, so plotly tries to show logical labels by converting to "Jan 2024" for example. Because there is no continuous order for strings, plotly draws based on the index, which appeared to be random before you applied the sort.
### So, ordering the rows by alphabetical `order_date` happens to work, but just because alphabetical happens to align with consecutive in this case. Better would be to use `pd.to_datetime()` to set the values of this variable to the correct data type. Then plotly makes use of the inherent order when drawing across the x axis. The lines however, are always by index. So while the data points are correct, an additional step of ordering may be necessary to draw line traces in logical ways.
### See below for an example where I create a new column for the datetime values and then re-sort based on customer age ("messing up" the order for the dates). However, because I have set the data type for datetime, plotly draws a sensible figure unlike before.
### The second figure further plays with the sort order to show how different lines can be traced across different groupings.
#### - David

In [14]:
food20['datetime'] = pd.to_datetime(food['order_date'])
food20.sort_values('datetime', inplace = True)
fig_fix1 = px.line(food20, x="datetime", y="order_value", markers=True)
fig_fix1.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [15]:
food20.sort_values(['order_date', 'order_status'], inplace = True)
fig_fix2 = px.line(food20, x="datetime", y="order_value", color = 'order_status', markers=True)
fig_fix2.show()



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
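
The note above about strings having no date order can be sketched on its own. The dates below are hypothetical and written in month/day/year form on purpose, unlike the year-month-day strings in this dataset. Sorted as text, January is followed by November. Sorted as datetimes, the months come out in calendar order. This only illustrates the general idea and is not a confirmed diagnosis of what happened on the PC.

```python
import pandas as pd

# Hypothetical dates in month/day/year form (an assumption for this sketch)
raw = pd.Series(["11/05/2024", "1/09/2024", "2/28/2024"])

# Text sorting compares character by character, so "1/..." < "11/..." < "2/..."
as_text = sorted(raw)

# Converting to datetime first gives true chronological order
as_dates = pd.to_datetime(raw, format="%m/%d/%Y").sort_values()
months_in_order = as_dates.dt.month.tolist()
```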



In [16]:
# Now that we have an x-axis label and a y-axis label, we can start changing the line itself.
# We will use the fig.update_traces() function to update how the traces look in the graph.
# First, we will start with the markers. marker=dict(...) passes a dictionary of marker properties to plotly.
# For the color, you can type in a named color like red or blue, or you can use a hex code. You can also set the opacity, and size sets the marker size.
# The plotly documentation lists all of the marker symbols you can use.
# Next, for the line, same thing. line=dict(...) passes a dictionary of line properties.
# For now, we can set the width and color of this line.
 
 
fig = px.line(food20, x="order_date", y="order_value", markers=True,
    labels=dict(order_value="Order Value", order_date = "Order Date"))
 
fig.update_traces(marker=dict(color="Gold", opacity=0.8, size=15, symbol="star-diamond" ),
                  line=dict(width=2, color="Teal"))
 
fig.show() 

BEN

In [17]:
fig1 = px.scatter(food, x = "delivery_distance_km", y = "delivery_time_minutes")
fig1.show()

#Plot the delivery distance vs the delivery time

In [18]:
fig2 = px.scatter(food, x = "delivery_distance_km", y = "delivery_time_minutes", color = "delivery_partner_rating")
fig2.show()

#Add color

In [19]:
fig3 = px.scatter(food, x = "delivery_distance_km", y = "delivery_time_minutes", color = "delivery_partner_rating", symbol = "order_status")
fig3.show()

#Add the order status

# Below is the solution. This is part of why we will use Graph Objects in addition to Plotly Express.
## Plotly Express sometimes makes dumb decisions and cuts corners by relying on templates and default values. In this case, the color scale for `delivery_partner_rating` and the associated legend are stored in a way that makes them harder to access and update.
## Thankfully, the `order_status` data and legend are more accessible, so it is a matter of setting the desired legend dictionary keys and values.
## First, I show the default `legend` key. Then I add a yanchor, y, xanchor, and x key with their values and show how this data is stored for drawing.
## I have applied the same solution to all future figures that should have two legends
### - David

In [20]:
fig3_dict = fig3.to_dict()
fig3_dict['layout']['legend'] # the default value from Plotly Express

{'title': {'text': 'order_status'}, 'tracegroupgap': 0}

In [21]:
fig3.update_layout(legend = {'yanchor': "top", 'y': .98, 'xanchor': 'left', 'x': 1.13})
fig3.show()

In [22]:
fig3_dict = fig3.to_dict()
fig3_dict['layout']['legend']

{'title': {'text': 'order_status'},
 'tracegroupgap': 0,
 'yanchor': 'top',
 'y': 0.98,
 'xanchor': 'left',
 'x': 1.13}

In [23]:
foodSample = food.sample(n = 50)
foodSample.head()

#Take a random sample of the data

Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
1707,1708,2024-11-13,65,Italian,628.58,3.79,50,Wallet,4.2,Cancelled
827,828,2024-12-13,23,Chinese,252.6,11.3,20,Cash,3.7,Delivered
1619,1620,2024-11-07,26,Fast Food,1199.78,10.95,67,Cash,3.7,Cancelled
1857,1858,2024-10-28,59,Indian,1173.22,11.81,27,UPI,5.0,Delayed
1298,1299,2024-08-10,21,Italian,910.25,10.69,47,UPI,5.0,Delivered
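
Because `sample()` draws different rows each time the cell runs, the figures that follow will change from run to run. A minimal sketch of making the draw repeatable is to pass `random_state`. The table and the seed value 42 below are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical table of 100 order ids (an assumption for this sketch)
orders = pd.DataFrame({"order_id": range(1, 101)})

# The same seed gives the same rows every time
sample_a = orders.sample(n=5, random_state=42)
sample_b = orders.sample(n=5, random_state=42)
```

In this notebook that would look like `food.sample(n = 50, random_state = 42)`.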


In [24]:
fig4 = px.scatter(foodSample, x = "delivery_distance_km", y = "delivery_time_minutes", color = "delivery_partner_rating", symbol = "order_status")
fig4.update_layout(legend = {'yanchor': "top", 'y': .98, 'xanchor': 'left', 'x': 1.13})
fig4.show()

In [25]:
fig5 = px.scatter(foodSample, x = "delivery_distance_km", y = "delivery_time_minutes", color = "delivery_partner_rating", symbol = "order_status")

fig5.update_traces(marker = dict (size = 20, opacity = 0.5, line = dict (width = 2, color = "Black")))
fig5.update_layout(legend = {'yanchor': "top", 'y': .98, 'xanchor': 'left', 'x': 1.13})

fig5.show()

#Increase the size and decrease the opacity of the points

In [26]:
fig6 = px.scatter(foodSample, x = "delivery_distance_km", y = "delivery_time_minutes",
                  color = "delivery_partner_rating", symbol = "order_status",
                  title = "Delivery Distance vs Delivery Time",
                  labels = {"delivery_distance_km" : "Delivery Distance (km)",
                            "delivery_time_minutes" : "Delivery Time (min)"})

fig6.update_traces(marker = dict (size = 20, opacity = 0.5))
fig6.update_layout(legend = {'yanchor': "top", 'y': .98, 'xanchor': 'left', 'x': 1.13})
fig6.show()

#Customize the title and axes.

Dr. Silva, what can I do to fix the legend above, if anything needs to be done at all? I can't figure out how to get the legend to look neater when it is displaying a color gradient and a series of symbols.

### See my answer in the Markdown cell above.
#### - David