# Introduction to Plotly
Before we start working with the dataset, I want to briefly introduce Plotly and why it’s useful for data visualization.

## What is Plotly?
Plotly is a data visualization library that allows users to create interactive charts in Python, R, and JavaScript. 

Unlike the static images produced by traditional plotting libraries, Plotly graphs are interactive. Users can hover over data points, zoom into specific regions, filter categories, and interact with the visualization in real time. This makes Plotly particularly useful for exploring data and presenting insights clearly.

## Basic Plotly Charts and Interactions

Plotly supports many common chart types such as bar charts, line charts, scatter plots, histograms, and pie charts.

What makes Plotly powerful is its interactivity. Users can hover to see detailed values, zoom into sections of the chart, pan across the data, and toggle categories on and off. These features help users better understand patterns and relationships within the data.

## Plotly Graph Objects vs Plotly Express

Plotly provides two main interfaces for creating visualizations.

**Plotly Graph Objects** is the lower-level interface that gives users full control over every element of a chart. It is useful when building highly customized or complex visualizations but usually requires more code.

**Plotly Express** is the high-level interface designed for speed and simplicity. It allows users to create charts quickly with minimal code while still producing interactive visuals.

For this demonstration, we will focus on Plotly Express because it makes exploring and visualizing data much faster and easier.

## Moving to the Dataset

Now that we understand the basics of Plotly, we will work with a daily food delivery orders dataset and demonstrate how the same variables can be used to create multiple visualizations using Plotly Express.

## Installing Plotly

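To set up Plotly, open a terminal and activate your conda environment. If Plotly itself is not yet installed in that environment, install it first:

conda install -c conda-forge plotly

Then install the Jupyter packages that Plotly uses to display figures in a notebook:

conda install "notebook>=7.0" "anywidget>=0.9.13"

Type "y" when prompted to proceed with the installation, then launch a new Jupyter notebook.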

In [1]:
# We only need Plotly Express, so we import plotly.express under its usual alias, px.
# We also need pandas to load and sort the data, so we import pandas as pd.
import plotly.express as px
import pandas as pd

In [2]:
# Download the dataset from this week's assignment and use pandas to read it.
# You can give the DataFrame any variable name. We chose dfdo, the first letter
# of each word in "daily food delivery orders".
dfdo = pd.read_csv("daily_food_delivery_orders.csv")
dfdo

Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
0,1,2024-11-05,62,Indian,497.51,11.07,79,UPI,3.9,Cancelled
1,2,2024-08-20,35,Bakery,232.32,5.83,69,Wallet,2.7,Cancelled
2,3,2024-02-28,34,Italian,540.82,3.61,70,Wallet,3.4,Cancelled
3,4,2024-05-26,65,Cafe,1197.99,3.66,18,Card,4.6,Cancelled
4,5,2024-09-21,40,Indian,947.03,12.08,57,UPI,4.9,Delayed
...,...,...,...,...,...,...,...,...,...,...
2595,2596,2024-05-20,46,Cafe,738.51,9.12,31,Wallet,4.5,Cancelled
2596,2597,2024-05-15,56,Indian,421.78,8.29,66,Card,2.8,Delayed
2597,2598,2024-10-18,32,Cafe,1009.93,12.80,73,UPI,4.4,Delivered
2598,2599,2024-04-24,55,Bakery,240.97,13.56,56,Cash,4.3,Delivered


In [3]:
# You can see that there are 2,600 rows and 10 columns in this dataset.
# We do not need all 2,600 rows for this demonstration, so we will look at only
# the first twenty, which is enough data for our graphs.
dfdo.head(20)

Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
0,1,2024-11-05,62,Indian,497.51,11.07,79,UPI,3.9,Cancelled
1,2,2024-08-20,35,Bakery,232.32,5.83,69,Wallet,2.7,Cancelled
2,3,2024-02-28,34,Italian,540.82,3.61,70,Wallet,3.4,Cancelled
3,4,2024-05-26,65,Cafe,1197.99,3.66,18,Card,4.6,Cancelled
4,5,2024-09-21,40,Indian,947.03,12.08,57,UPI,4.9,Delayed
5,6,2024-03-16,51,Italian,835.75,3.56,85,UPI,3.8,Delayed
6,7,2024-11-20,52,Cafe,771.83,14.37,80,Card,3.7,Delivered
7,8,2024-11-24,52,Chinese,926.2,12.81,19,UPI,4.4,Delayed
8,9,2024-07-23,38,Bakery,548.11,13.54,42,Cash,4.1,Delayed
9,10,2024-07-01,24,Fast Food,177.18,8.15,29,Cash,2.9,Delivered


In [4]:
# Instead of typing dfdo.head(20) over and over, we will store the result in a variable named dfdo20.
dfdo20 = dfdo.head(20)
dfdo20

Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
0,1,2024-11-05,62,Indian,497.51,11.07,79,UPI,3.9,Cancelled
1,2,2024-08-20,35,Bakery,232.32,5.83,69,Wallet,2.7,Cancelled
2,3,2024-02-28,34,Italian,540.82,3.61,70,Wallet,3.4,Cancelled
3,4,2024-05-26,65,Cafe,1197.99,3.66,18,Card,4.6,Cancelled
4,5,2024-09-21,40,Indian,947.03,12.08,57,UPI,4.9,Delayed
5,6,2024-03-16,51,Italian,835.75,3.56,85,UPI,3.8,Delayed
6,7,2024-11-20,52,Cafe,771.83,14.37,80,Card,3.7,Delivered
7,8,2024-11-24,52,Chinese,926.2,12.81,19,UPI,4.4,Delayed
8,9,2024-07-23,38,Bakery,548.11,13.54,42,Cash,4.1,Delayed
9,10,2024-07-01,24,Fast Food,177.18,8.15,29,Cash,2.9,Delivered


In [5]:
# We want to know what the keys (the column names) are in this dataset.
dfdo.keys()

Index(['order_id', 'order_date', 'customer_age', 'restaurant_type',
       'order_value', 'delivery_distance_km', 'delivery_time_minutes',
       'payment_method', 'delivery_partner_rating', 'order_status'],
      dtype='object')
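
Beyond `keys()`, a few other pandas attributes give a quick summary of a DataFrame. The sketch below uses a small hand-built frame (three rows copied from the table above, under the hypothetical name `toy`) rather than the full CSV, so it runs on its own.

```python
import pandas as pd

# Hypothetical three-row frame with a few of the dataset's columns.
toy = pd.DataFrame({
    "order_date": ["2024-11-05", "2024-08-20", "2024-02-28"],
    "restaurant_type": ["Indian", "Bakery", "Italian"],
    "order_value": [497.51, 232.32, 540.82],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = toy.shape

# columns holds the same labels that keys() returns.
column_names = list(toy.columns)

# dtypes shows how pandas stored each column (text columns appear as object).
column_types = toy.dtypes
print(n_rows, n_cols, column_names)
print(column_types)
```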

# Line Graph

In [6]:
# Let's compare the order date with the order value. The order date
# will be on the x axis and the order value will be on the y axis.
# With Plotly, you can also draw markers (points) on the line by passing markers=True.
fig = px.line(dfdo20, x="order_date", y="order_value", markers=True)
fig.show()

In [7]:
# This line graph looks erratic because Plotly connects the points in row order
# (rows 0 through 19), not from the earliest date to the latest.
# To fix this, we use pandas to sort the rows from earliest to latest date.
# The sort_values function sorts a DataFrame by the values in a column. By default it
# returns a new, sorted DataFrame, so we assign the result back to dfdo20.
# (inplace=True would instead modify dfdo20 directly and return None. Because dfdo20 is
# a slice of dfdo, that also triggers a pandas SettingWithCopyWarning.)
# The dates are stored as YYYY-MM-DD text, so sorting them also puts them in date order.

dfdo20 = dfdo20.sort_values(by='order_date')
dfdo20


Unnamed: 0,order_id,order_date,customer_age,restaurant_type,order_value,delivery_distance_km,delivery_time_minutes,payment_method,delivery_partner_rating,order_status
12,13,2024-01-09,31,Chinese,758.14,1.28,57,Cash,2.9,Delayed
2,3,2024-02-28,34,Italian,540.82,3.61,70,Wallet,3.4,Cancelled
18,19,2024-03-01,54,Indian,193.87,4.4,63,UPI,2.8,Delivered
5,6,2024-03-16,51,Italian,835.75,3.56,85,UPI,3.8,Delayed
16,17,2024-04-09,54,Cafe,816.14,10.95,86,UPI,4.8,Delayed
17,18,2024-04-20,43,Cafe,640.6,4.49,50,Cash,4.1,Delayed
3,4,2024-05-26,65,Cafe,1197.99,3.66,18,Card,4.6,Cancelled
9,10,2024-07-01,24,Fast Food,177.18,8.15,29,Cash,2.9,Delivered
8,9,2024-07-23,38,Bakery,548.11,13.54,42,Cash,4.1,Delayed
15,16,2024-07-24,38,Bakery,773.12,9.07,50,Wallet,2.8,Cancelled
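
Sorting can also be done on real datetime values instead of text. The following is a minimal sketch on a hypothetical four-row frame named `toy`: it takes an explicit copy of the slice, converts the date column with `pd.to_datetime`, and sorts without modifying anything in place.

```python
import pandas as pd

# Hypothetical frame with a few order dates and values from the table above.
toy = pd.DataFrame({
    "order_date": ["2024-11-05", "2024-08-20", "2024-02-28", "2024-05-26"],
    "order_value": [497.51, 232.32, 540.82, 1197.99],
})

# .copy() makes the subset an independent DataFrame rather than a slice of toy.
subset = toy.head(3).copy()

# Convert the text dates to datetime values so they are compared as dates.
subset["order_date"] = pd.to_datetime(subset["order_date"])

# sort_values returns a new DataFrame; subset itself is left unchanged.
sorted_subset = subset.sort_values(by="order_date")
print(sorted_subset)
```

The original row labels are kept after sorting, just as in the output above. Adding `.reset_index(drop=True)` would renumber the rows from 0 if that is preferred.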


In [8]:
# Now that the rows are in date order and Plotly reads them from the top down, we should see a clean line graph.
fig = px.line(dfdo20, x="order_date", y="order_value", markers=True)
fig.show()

Even though this is a simple line graph, Plotly lets you zoom in and out, pan around, select a region, reset the view, and hover over points to see the exact values from the dataset.

Let's take a look at how we can customize this line graph.

In [9]:
# First, we should set a chart title, an x-axis title, and a y-axis title to show exactly what we are looking at.
# The chart title is set with the title argument. The axis titles are set, inside the same parentheses,
# with labels=dict(order_value="Order Value", order_date="Order Date").
# labels is a dictionary that maps each column name to the text displayed in its place.
fig = px.line(dfdo20, x="order_date", y="order_value", markers=True,
    title="Order Dates and Values",
    labels=dict(order_value="Order Value", order_date="Order Date"))
fig.show()

In [10]:
# Now that we have an x-axis label and a y-axis label, we can customize the line itself.
# The fig.update_traces() function updates how the data is drawn on the graph.
# First, we will start with the markers. marker=dict(...) passes a dictionary of marker properties.
# For the color, you can type a named color like red or blue, or you can use a hex code.
# opacity sets the transparency, size sets the marker size, and symbol sets the marker shape.
# The Plotly documentation lists all of the marker symbols you can use.
# The line works the same way: line=dict(...) passes a dictionary of line properties.
# For now, we will set the width and color of the line.


fig = px.line(dfdo20, x="order_date", y="order_value", markers=True,
    labels=dict(order_value="Order Value", order_date="Order Date"))

fig.update_traces(marker=dict(color="Gold", opacity=0.8, size=15, symbol="star-diamond"),
                  line=dict(width=2, color="Teal"))

fig.show()

In [13]:
# Next, we are going to make a simple scatter plot with the delivery time in minutes on the x axis and
# the delivery distance on the y axis, using the first 20 rows of the dataset.
fig2 = px.scatter(dfdo20, x="delivery_time_minutes", y="delivery_distance_km")
fig2.show()

In [16]:
# Plotly allows users to add a trendline to a scatter plot. There are different types of trendlines, as outlined
# in the documentation, so you should choose a trendline that matches the data you are showing.
# Here we use "ols", an ordinary least squares (straight-line) fit.
# Note: Plotly Express trendlines require the statsmodels package to be installed.
fig2 = px.scatter(dfdo20, x="delivery_time_minutes", y="delivery_distance_km", trendline="ols")
fig2.show()

In [21]:
# We can also customize the points on the scatter plot by tying their symbol and color to columns in the dataset.
# For example, we can set symbol to order_status, so each point's shape depends on the order status.
# We can also set color to a category. Here we set it to restaurant_type, so each point's color depends on the restaurant type.
# Together, the color of each point shows the restaurant type and the shape shows the order status.
# By default, Plotly would draw one trendline per group. The trendline_scope="overall" argument
# draws a single trendline for all of the points instead.
fig2 = px.scatter(dfdo20, x="delivery_time_minutes", y="delivery_distance_km", symbol="order_status",
                  color="restaurant_type", trendline="lowess", trendline_scope="overall")
fig2.show()

In [22]:
# Using everything we learned above, we can customize the colors, size, shape, outline, and more in a graph.
# In the real world, you would choose these colors so that they are meaningful for your own dataset.
# Inside marker=dict(...), the nested line=dict(...) sets the outline drawn around each point.
# The separate line=dict(width=5) sets the width of the trendline.

fig2 = px.scatter(dfdo20, x="delivery_time_minutes", y="delivery_distance_km",
                  symbol="order_status", color="restaurant_type",
                  trendline="lowess", trendline_scope="overall",
                  labels=dict(delivery_time_minutes="Delivery Time in Minutes",
                              delivery_distance_km="Delivery Distance in Kilometers",
                              restaurant_type="Restaurant Type", order_status="Order Status"))

fig2.update_traces(marker=dict(size=15, opacity=0.8, line=dict(width=3, color='Black')),
                   line=dict(width=5))

fig2.show()

In [None]:
# We can make multiple charts in one figure. We start with what we have learned before, then add facet_col,
# which draws one chart (one column of the figure) for each value in order_status. In this case, the values
# are Delayed, Cancelled, and Delivered.
# Calling update_xaxes(showgrid=True) turns on the x-axis grid lines in the background.

fig5 = px.scatter(dfdo20, x="customer_age", y="order_value", color="payment_method",
                 facet_col="order_status", title="Comparing Customer Age and Order Value and the Order Status Based on Payment Method",
                 labels=dict(order_value = "Order Value", customer_age = "Customer Age", payment_method = "Payment Method"))

fig5.update_xaxes(showgrid=True)

fig5.show()