# Introduction to Plotly

Plotly is a popular data visualization library in Python for creating interactive and complex visualizations. It is particularly well-suited for web-based visualizations, with support for a multitude of chart types.

## Overview of Plotly Interfaces

Plotly offers two main interfaces for creating visualizations: `plotly.graph_objects` and `plotly.express`, each catering to different user needs from detailed customization to quick and simple plotting.

### plotly.graph_objects
This is Plotly's low-level interface, providing detailed control over the components of figures, such as traces and layouts. It is designed for creating complex, highly-customized visualizations.

**Key Features:**
- Detailed control over figure components.
- Suitable for layering multiple plots and custom interactions.
- Use for complex visualizations requiring precise configurations.

### plotly.express
Plotly Express is the high-level interface, ideal for quickly creating common chart types with minimal code. It automates many aspects of plotting, making it user-friendly and efficient.

**Key Features:**
- Simple, concise syntax for creating charts.
- Automates layout and common settings.
- Best for quick data exploration and standard visualizations.


We'll use the same datasets we worked with in the previous lab:
> #### Datasets:
>
>**[Stock Market Dataset](https://www.kaggle.com/datasets/borismarjanovic/price-volume-data-for-all-us-stocks-etfs)** 
>This dataset includes historical daily prices and volumes of all U.S. stocks and ETFs, containing CSV files for every stock, with values for Date, Open, High, Low, Close, Volume, etc. For this lab, we will use the historical data for the Amazon stock.
>
> **[International tourism, number of arrivals](https://data.worldbank.org/indicator/ST.INT.ARVL)** 
>
>This dataset contains the yearly number of inbound tourists for every country. The data on inbound tourists refer to the number of arrivals, not to the number of people traveling. Thus a person who makes several trips to a country during a given period is counted each time as a new arrival.




## Generate a Scatter Plot with `plotly.graph_objects`

A scatter plot visualizes the relationship between two variables on a 2D plane, with data points scattered across the plane to reveal patterns or correlations.

While `plotly.express` (PX), the high-level interface, is recommended for most users due to its simplicity and efficiency in creating visualizations quickly, this example focuses on the more detailed and flexible `plotly.graph_objects`. This low-level interface allows for precise customization of every aspect of a figure, providing the flexibility needed for complex customizations not directly achievable with PX.


### Let's use the stock dataset to plot the open price against the trading volume on a scatter plot

First we need to import the required libraries:

In [1]:
# Install required libraries using pip
!pip install nbformat plotly

# Import required libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go



In [2]:
# Load the data
df_stock = pd.read_csv('datasets/amzn.csv')
df_stock.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5153 entries, 0 to 5152
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Date     5153 non-null   object 
 1   Open     5153 non-null   float64
 2   High     5153 non-null   float64
 3   Low      5153 non-null   float64
 4   Close    5153 non-null   float64
 5   Volume   5153 non-null   int64  
 6   OpenInt  5153 non-null   int64  
dtypes: float64(4), int64(2), object(1)
memory usage: 281.9+ KB


In [3]:
# First, create an empty figure using go.Figure()
fig = go.Figure()
fig

In [4]:
# Next, add a scatter trace to the figure with add_trace() and go.Scatter()
# go.Scatter takes the x-axis data and the y-axis data; mode='markers' draws points only, and the marker color is set to blue
fig.add_trace(go.Scatter(x=df_stock['Open'], y=df_stock['Volume'], mode='markers', marker={'color': 'blue'}))

# Display the figure
fig.show()

**The plot above has no title and no x-axis or y-axis labels. Let us add them with the `update_layout` method.**


In [5]:
# Set the figure title and axis labels through the title, xaxis_title and yaxis_title keyword arguments
fig.update_layout(title='Open Price vs Volume', xaxis_title='Open Price', yaxis_title='Volume')
# Display the figure
fig.show()

### Customizing Hover Labels

To enhance the interactivity of our scatter plot, we will customize the hover labels to display more descriptive information about each data point. This can help users better understand the data as they interact with the visualization.


In [6]:
# Customize hover labels using the hover template
fig.update_traces(
    hovertemplate=
        "Open Price: %{x}<br>" +
        "Volume: %{y}<br>" +
        "<extra></extra>"  # Removes trace name from the hover labels
)

# Display the updated figure with customized hover labels
fig.show()


## Comparison with `plotly.express`

To illustrate the simplicity of `plotly.express`, the same visualization can be achieved with fewer lines of code.

In Plotly Express, the data source, the axis columns, and the title are all set in a single function call: `px.<plot_type>(<DataFrame>, x=<x-axis column>, y=<y-axis column>, title=<title as a string>)`.


In [7]:
# Create the same plot using plotly.express

fig_px = px.scatter(
    df_stock, 
    x='Open', 
    y='Volume', 
    title='Open Price vs Volume of Stock',
    labels={'Open': 'Open Price', 'Volume': 'Volume'}
)

# Display the figure created with Plotly Express
fig_px.show()


## Exercise 1: Line Chart of Daily Stock Prices

A line plot is effective for visualizing data that changes over time by connecting data points with straight lines.

**Objective:**
Use the stock dataset to create a line plot to visualize how the daily close prices of the stock have changed since 2015.

**Requirements:**
- Include only data from the year 2015 onwards.
- **Title**: "Daily Close Prices since 2015"
- **x-axis label**: "Date"
- **y-axis label**: "Close Price"

**Hint**: Utilize the `plotly.express.line` method for this plot. Filter the DataFrame to only include data from 2015 onwards with a condition on the 'Date' column. By default, Plotly uses DataFrame column names as labels for the axes. To use different labels, specify them using the `labels` parameter in `plotly.express.line`.

**Resources**:
- Check the [Plotly Express Line API reference](https://plotly.com/python-api-reference/generated/plotly.express.line.html) for syntax and options.
- For additional examples and customizations, visit [Plotly Line Charts Examples](https://plotly.com/python/line-charts/).


In [8]:
# Type your answer here



### Exercise 1 Solution

import plotly.express as px
import pandas as pd

# Convert 'Date' to datetime so that the .dt accessor can be used for filtering
df_stock['Date'] = pd.to_datetime(df_stock['Date'])

df_filtered = df_stock[df_stock['Date'].dt.year >= 2015]

# Create the line plot
fig = px.line(
    df_filtered,
    x='Date',
    y='Close',
    title='Daily Close Prices since 2015',
    labels={'Close': 'Close Price'}  # Custom labels for the y-axis
)

# Display the figure
fig.show()
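
The date filter can be written in more than one way. Assuming the dates are stored as ISO-formatted strings (`YYYY-MM-DD`), which is an assumption about the CSV and not something shown in the output above, a plain string comparison selects the same rows as the datetime-based filter. The sketch uses made-up rows:

```python
import pandas as pd

# Hypothetical rows; dates are ISO-formatted strings (an assumption about the file)
df_demo = pd.DataFrame({
    'Date': ['2014-12-30', '2014-12-31', '2015-01-02', '2016-06-15'],
    'Close': [310.30, 310.35, 308.52, 714.26],
})

# Option 1: compare the strings directly (ISO order matches chronological order)
mask_str = df_demo['Date'] >= '2015-01-01'

# Option 2: parse to datetime first, then use the .dt accessor
parsed_dates = pd.to_datetime(df_demo['Date'])
mask_dt = parsed_dates.dt.year >= 2015

# Both masks select the same rows
print(mask_str.equals(mask_dt))
print(df_demo[mask_dt])
```

Parsing to datetime is still the safer choice, since it does not depend on the string format and lets Plotly treat the x-axis as a true date axis.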




## Exercise 2: Bar Chart of Tourist Arrivals Across Countries

Bar charts are commonly used for visualizing comparisons across categories. They use rectangular bars, where the length of each bar is proportional to the represented value. 

**Objective:**
Use the international tourism dataset to examine the number of tourists visiting Greece, Spain, France, and Italy in 2020 on a bar chart.

**Requirements:**
- Focus on data from the year 2020 for countries: Greece, Spain, France, and Italy.
- Display the bars sorted in descending order of tourist numbers.
- **Title**: "Tourist Arrivals in 2020"
- **x-axis label**: "Country"
- **y-axis label**: "Number of Tourists"

**Hint**: First, access the data for 2020 by directly referencing the '2020' column in your DataFrame. Filter the DataFrame to include only Greece, Spain, France, and Italy. To display the bars in descending order based on the number of tourists, sort the DataFrame by the '2020' column in descending order before plotting.

**Resources**:
- [Plotly Express Bar Chart API Reference](https://plotly.com/python-api-reference/generated/plotly.express.bar.html)
- [Examples of Plotly Bar Charts](https://plotly.com/python/bar-charts/)


In [9]:
# First, let's import necessary libraries
import pandas as pd
import plotly.express as px

# Read the .csv file and store it as a pandas DataFrame
df_tourism = pd.read_csv("datasets/international_tourism.csv")

# Display the data to understand its structure
df_tourism.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,ABW,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,1481000.0,1667000.0,1739000.0,1832000.0,1758000.0,1863000.0,1897000.0,1951000.0,,
1,Afghanistan,AFG,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,,,,,,,,,,
2,Angola,AGO,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,528000.0,650000.0,595000.0,592000.0,397000.0,261000.0,218000.0,218000.0,,
3,Albania,ALB,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,3514000.0,3256000.0,3673000.0,4131000.0,4736000.0,5118000.0,5927000.0,6406000.0,2658000.0,
4,Andorra,AND,"International tourism, number of arrivals",ST.INT.ARVL,,,,,,,...,7900000.0,7676000.0,7797000.0,7850000.0,8025000.0,8152000.0,8328000.0,8235000.0,5207000.0,


In [10]:
# Type your answer here



### Exercise 2 Solution

# Filter the data to include only the specified countries
df_tourism_filtered = df_tourism[df_tourism['Country Name'].isin(['Greece', 'Spain', 'France', 'Italy'])]

# Sort the data in descending order by the number of tourists in 2020
df_tourism_filtered = df_tourism_filtered.sort_values('2020', ascending=False)

# Create the bar chart using Plotly Express
fig = px.bar(
    df_tourism_filtered,
    x='Country Name',
    y='2020',  # Directly use the '2020' column for tourist numbers
    title='Tourist Arrivals in 2020',
    labels={'Country Name': 'Country', '2020': 'Number of Tourists'},
    text='2020'  # Display the tourist number on each bar
)

# Make the bars narrower
fig.update_traces(width=0.3)

# Display the figure
fig.show()





