<h1 align="center"> An Introduction to Data Visualization with Python </h1>
<h3 align="center"> Kristy Streu - HPC Partnerships and Outreach Specialist - Argonne Leadership Computing Facility </h3>

<figure style="text-align: center">
  <img src="img/alcf_vis.jpeg" width="80%">
  <figcaption><em>Figure 1. Vis Lab at Argonne National Laboratory</em></figcaption>
</figure>

<figure style="text-align: center">
  <img src="img/covid.jpeg" width="80%">
  <figcaption><em>Figure 2. Viral genome sequencing helps us predict mutations and develop life-saving drugs and vaccines (Emory University)</em></figcaption>
</figure>

<figure style="text-align: center">
  <img src="img/ClimateModel.jpeg" width="80%">
  <figcaption><em>Figure 3. Model and predict climate change to help drive green energy policies (Los Alamos National Laboratory)</em></figcaption>
</figure>

<figure style="text-align: center">
  <img src="img/astro_viz.jpeg" width="80%">
  <figcaption><em>Figure 4. Help us understand the forces beyond our world and answer some of the universe's biggest questions (Simulation and Visualization: Ji-hoon Kim and Tom Abel)</em></figcaption>
</figure>

***

<h3 align="center">Import Core Python Libraries for Data Visualization</h3>

The main Python libraries we will be using for visualization are:
* `matplotlib`: https://matplotlib.org/stable/
* `pandas`: https://pandas.pydata.org
* `numpy`: https://numpy.org

Matplotlib is a powerful data visualization library in Python widely used for creating high-quality static, interactive, and animated visualizations. This tutorial will walk you through the essential concepts and functionalities of Matplotlib.

In [None]:
# Basic imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

<h3 align="center">Plotting with Matplotlib</h3>

Let's start with a basic line plot.

`matplotlib.pyplot.plot()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

In [None]:
# Create data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a line plot
plt.plot(x, y, label='sin(x)')

# Add labels and title
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Line Plot Example')
plt.legend()

# Show the plot
plt.show()

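The cell above uses the `pyplot` interface, where each `plt.*` call acts on the "current" Axes. Matplotlib also offers an explicit, object-oriented interface in which you keep references to the `Figure` and `Axes` objects; the 3D plots and Exercise 3 later in this notebook use that style. A minimal sketch of the same line plot written that way:

```python
import numpy as np
import matplotlib.pyplot as plt

# Same data as the line plot above
x = np.linspace(0, 10, 100)
y = np.sin(x)

# plt.subplots() returns a Figure and a single Axes object
fig, ax = plt.subplots()

# Axes methods mirror the pyplot functions; label setters use set_* names
ax.plot(x, y, label='sin(x)')
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Line Plot Example (object-oriented)')
ax.legend()

plt.show()
```

Both styles draw the same figure; the explicit style tends to be easier to follow once a figure contains more than one Axes.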

Exercise 1

Let's create a Line Plot together by completing the missing code in the cell below.

In [None]:
# Monthly temperature data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
temps = [32, 35, 45, 55, 65, 75]

plt.____(months, temps, color='____')
plt.title('____')
plt.____('Month')
plt.____('Temperature (°F)')
plt.show()

Now work with your neighbor to add some features to your Line Plot with temperature data.
* Choose marker style: `'o'`, `'s'`, `'^'`, or `'D'`
* Add one of these line styles: `'--'`, `'-.'`, `':'`
* Pick a color: `'red'`, `'green'`, `'purple'`, `'#FF6B6B'`

*Hint: Update `plt.plot()` with optional keyword arguments.*

In [None]:
# Hint:
# plt.plot(____, ____, color=____, linestyle=____, marker=____)

# Your code here - modify the basic plot in the cell above

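Besides keyword arguments, `plt.plot()` accepts an optional format string as its third positional argument that combines marker, line style, and color in one token, for example `'o--r'` for circle markers, a dashed line, and red. A small sketch comparing the shorthand with the keyword form on the Exercise 1 temperature data:

```python
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
temps = [32, 35, 45, 55, 65, 75]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Format string: 'o' = circle marker, '--' = dashed line, 'r' = red
line_fmt, = ax1.plot(months, temps, 'o--r')
ax1.set_title("Format string 'o--r'")

# The same style spelled out with keyword arguments
line_kw, = ax2.plot(months, temps, marker='o', linestyle='--', color='red')
ax2.set_title('Keyword arguments')

for ax in (ax1, ax2):
    ax.set_xlabel('Month')
ax1.set_ylabel('Temperature (°F)')

# Prints True: both lines resolve to the same RGBA color
print(mcolors.to_rgba(line_fmt.get_color()) == mcolors.to_rgba(line_kw.get_color()))
plt.show()
```

The format string is handy for quick plots; keyword arguments are more readable and allow hex colors such as `'#FF6B6B'`.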
Scatter Plot example

`matplotlib.pyplot.scatter` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

In [None]:
# Create data
x = np.random.rand(50)
y = np.random.rand(50)

# Create a scatter plot
plt.scatter(x, y, label='Random Data', color='blue', marker='o', s=50)

# Add labels and title
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Scatter Plot Example')
plt.legend()

# Show the plot
plt.show()

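Beyond a single `color`, `plt.scatter()` can encode extra variables per point: `c` maps an array of values through a colormap, and `s` sets each marker's area in points². A sketch using randomly generated values (the `values` and `sizes` arrays are illustrative only):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so the plot is reproducible
x = rng.random(50)
y = rng.random(50)
values = rng.random(50)             # mapped to color
sizes = 20 + 200 * rng.random(50)   # marker areas in points^2

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=values, s=sizes, cmap='viridis', alpha=0.7, edgecolors='black')

# The colorbar explains how colors map back to values
fig.colorbar(sc, ax=ax, label='value')

ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Scatter Plot with Color and Size Encoding')
plt.show()
```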

Bar Plot example

`matplotlib.pyplot.bar()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html

In [None]:
# Create data
categories = ['Category A', 'Category B', 'Category C']
values = [25, 50, 75]

# Create a bar plot
plt.bar(categories, values)

# Add labels and title
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Plot Example')

# Show the plot
plt.show()

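`plt.bar()` returns a `BarContainer` holding one rectangle per category. Passing it to `bar_label()` (available since Matplotlib 3.4) writes each bar's value above it, so readers do not have to estimate heights against the axis. A minimal sketch with the same data:

```python
import matplotlib.pyplot as plt

categories = ['Category A', 'Category B', 'Category C']
values = [25, 50, 75]

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color='steelblue')

# Write each bar's height 3 points above the top of the bar
ax.bar_label(bars, padding=3)

ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Bar Plot with Value Labels')
plt.show()
```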

`seaborn` is a library built on top of `matplotlib` that provides higher-level plotting functions and better default styling (color palettes, grid styles, etc.).

* `seaborn`: https://seaborn.pydata.org/

In [None]:
import seaborn as sns

# Create data
categories = ['Category A', 'Category B', 'Category C']
values = [25, 50, 75]

# Plot with Seaborn
sns.barplot(x=categories, y=values, palette="pastel")

# Add labels and title
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title("Bar Plot Using Seaborn")

# Show the plot
plt.show()

Histogram Example

`matplotlib.pyplot.hist()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html

In [None]:
# Create random data, random sample from standard normal distribution
data = np.random.randn(1000)

# Create a histogram
plt.hist(data, bins=30, edgecolor='black')

# Add labels and title
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram Example')

# Show the plot
plt.show()

What is the effect of changing the following in the histogram above?
* the data size (number of samples)
* the number of bins (`bins`)

In [None]:
# Your code here - modify the histogram code in the cell above


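`plt.hist()` returns the bin counts, the bin edges, and the drawn patches, which lets you check the questions above numerically: `bins=30` splits the data range into 30 equal-width bins (31 edges), and the counts always add up to the number of samples. With `density=True`, bar heights are rescaled so the total bar area is 1, which makes histograms of different sample sizes comparable on one y-axis. A sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.standard_normal(1000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Raw counts per bin
counts, edges, patches = ax1.hist(data, bins=30, edgecolor='black')
ax1.set_title('bins=30, counts')

# Normalized heights: sum(height * bin width) == 1
density, d_edges, _ = ax2.hist(data, bins=30, density=True, edgecolor='black')
ax2.set_title('bins=30, density=True')

print(counts.sum(), len(edges))
plt.show()
```

Try `bins=5` versus `bins=100`, or 100 versus 100,000 samples, and compare how smooth the shape looks.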

Pie Chart example

`matplotlib.pyplot.pie()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pie.html

In [None]:
# Create data
labels = ['Category A', 'Category B', 'Category C']
sizes = [15, 30, 55]
colors = ['red', 'green', 'blue']

# Create a pie chart
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=90)

# Add title
plt.title('Pie Chart Example')

# Show the plot
plt.show()

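`plt.pie()` divides each value by the sum of all values, so each wedge covers `size / sum(sizes)` of the circle; in the example above the sizes add up to 100, so they read directly as percentages. When `autopct` is set, `pie()` also returns the percentage labels as a third list, which can be restyled, and `explode` offsets selected wedges. A sketch:

```python
import matplotlib.pyplot as plt

labels = ['Category A', 'Category B', 'Category C']
sizes = [15, 30, 55]

fig, ax = plt.subplots()
# explode moves each wedge outward by the given fraction of the radius
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.1f%%',
                                  explode=(0, 0, 0.1), startangle=90)

# Make the percentage labels easier to read on the colored wedges
for t in autotexts:
    t.set_color('white')
    t.set_fontweight('bold')

ax.set_title('Pie Chart with an Exploded Wedge')
ax.axis('equal')  # keep the pie circular
plt.show()
```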

<h3 align="center">Plotting Customization</h3>

Titles, Labels, and Legends

In [None]:
# Example data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Create a line plot
plt.plot(x, y1, label='sin(x)', color='blue')
plt.plot(x, y2, label='cos(x)', color='green')

# Add labels and title
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Customized Line Plot')
plt.legend()

# Show the plot
plt.show()


In [None]:
# Example data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Create a line plot
plt.plot(x, y1, label='sin(x)', color='blue')
plt.plot(x, y2, label='cos(x)', color='green')

# Add labels and title
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Line Plot with Legend')
plt.legend(loc='upper right')

# Show the plot
plt.show()

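`loc` accepts strings such as `'upper right'`, `'lower left'`, or `'best'` (the default for Axes legends, which picks the position that overlaps the data least). To place the legend outside the plotting area, combine `loc` with `bbox_to_anchor`, given in axes coordinates where (1, 1) is the top-right corner of the Axes. A sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label='sin(x)', color='blue')
ax.plot(x, np.cos(x), label='cos(x)', color='green')

# Put the legend's upper-left corner just to the right of the Axes' top edge
leg = ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1))

ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Legend Outside the Axes')

# Adjust spacing so the outside legend fits inside the figure
fig.tight_layout()
plt.show()
```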

Colors, Markers, and Linestyles

In [None]:
# Example data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Create a line plot with customized colors, markers, and linestyles
plt.plot(x, y1, label='sin(x)', color='blue', linestyle='-', marker='o', markersize=5)
plt.plot(x, y2, label='cos(x)', color='green', linestyle='--', marker='s', markersize=5)

# Add labels and title
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Customized Line Plot')
plt.legend(loc='upper right')

# Show the plot
plt.show()

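With 100 points per curve, a marker on every point crowds the plot above. The `markevery` keyword of `plt.plot()` draws a marker only on every *n*-th point while still drawing the full line. A sketch with `markevery=10`:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
# markevery=10 places markers at data indices 0, 10, 20, ...
line_sin, = ax.plot(x, np.sin(x), label='sin(x)', color='blue',
                    linestyle='-', marker='o', markevery=10)
line_cos, = ax.plot(x, np.cos(x), label='cos(x)', color='green',
                    linestyle='--', marker='s', markevery=10)

ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Line Plot with markevery=10')
ax.legend(loc='upper right')
plt.show()
```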

Grids

`matplotlib.pyplot.grid()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html

In [None]:
# Example data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a line plot with gridlines
plt.plot(x, y, label='sin(x)', color='blue')
plt.grid(True)

# Add labels and title
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Line Plot with Gridlines')
plt.legend()

# Show the plot
plt.show()

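`plt.grid()` passes line properties such as `linestyle`, `linewidth`, and `alpha` on to the grid lines, and its `which` argument (`'major'`, `'minor'`, or `'both'`) selects which ticks get lines; minor grid lines only appear once minor ticks are turned on. A sketch that also fixes the axis limits instead of relying on autoscaling:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label='sin(x)', color='blue')

# Minor ticks must be enabled before a minor grid can be drawn
ax.minorticks_on()
ax.grid(which='major', linestyle='-', linewidth=0.8, alpha=0.7)
ax.grid(which='minor', linestyle=':', linewidth=0.5, alpha=0.4)

# Fix the visible range explicitly
ax.set_xlim(0, 10)
ax.set_ylim(-1.2, 1.2)

ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Major and Minor Gridlines')
ax.legend()
plt.show()
```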

Exercise 2: Let's create and modify a scatter plot of 100 random points.

In [None]:
x = ____
y = ____

plt.____(x, y, color='green')
plt.title("Random Scatter Plot")
plt.xlabel("Random X")
plt.ylabel("Random Y")
plt.show()

Now work with your neighbor to modify the plot:
* Change the color and marker style
* Add a legend
* Add a grid
* Add axis limits

In [None]:
# Your code here - modify the basic plot in the cell above

<h3 align="center">Subplots</h3>

Multiple graphs within the same figure

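`matplotlib.pyplot.subplot()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplot.html

`matplotlib.pyplot.subplots()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html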

In [None]:
# Example data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Create a 2x1 subplot grid, (nrows=2, ncols=1, index=1)
plt.subplot(2, 1, 1)
plt.plot(x, y1, label='sin(x)', color='blue')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Sin(x)')
plt.legend()

# Select the second panel of the same 2x1 grid (nrows=2, ncols=1, index=2)
plt.subplot(2, 1, 2)
plt.plot(x, y2, label='cos(x)', color='green')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Cos(x)')
plt.legend()

# Adjust layout
plt.tight_layout()

# Show the plot
plt.show()

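The cell above uses `plt.subplot()`, which selects one panel at a time. `plt.subplots()` instead creates the whole grid in one call and returns an array of Axes; `sharex=True` links the x-axes so both panels pan and zoom together and only the bottom panel needs an x-label. A sketch of the same figure:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# nrows=2, ncols=1 -> axes is a 1-D array holding two Axes objects
fig, axes = plt.subplots(2, 1, sharex=True, figsize=(6, 6))

axes[0].plot(x, np.sin(x), label='sin(x)', color='blue')
axes[0].set_title('Sin(x)')

axes[1].plot(x, np.cos(x), label='cos(x)', color='green')
axes[1].set_title('Cos(x)')
axes[1].set_xlabel('x-axis')  # shared x-axis: label only the bottom panel

# Apply the same y-label and legend to every panel
for ax in axes:
    ax.set_ylabel('y-axis')
    ax.legend()

fig.tight_layout()
plt.show()
```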

Error Bars

`matplotlib.pyplot.errorbar()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.errorbar.html

In [None]:
# Example data
x = np.linspace(0, 10, 10)
y = np.sin(x)
error = 0.1 + 0.2 * x

# Create a line plot with error bars
plt.errorbar(x, y, yerr=error, fmt='o-', label='sin(x) with Error Bars', color='blue')

# Add labels and title
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Line Plot with Error Bars')
plt.legend()

# Show the plot
plt.show()

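`yerr` can also be asymmetric: pass a 2×N array whose first row is the distance below each point and whose second row is the distance above it. `capsize` adds short caps to the ends of each bar. A sketch in which, purely as an illustrative assumption, the upper error is twice the lower one:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 10)
y = np.sin(x)

# Row 0: distance below each point; row 1: distance above it
lower = 0.1 + 0.1 * x
upper = 2 * lower
asym_err = np.vstack([lower, upper])

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=asym_err, fmt='o-', capsize=4,
                        color='blue', label='sin(x), asymmetric errors')

ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Asymmetric Error Bars')
ax.legend()
plt.show()
```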

Annotations and Text

`matplotlib.pyplot.annotate()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.annotate.html

`matplotlib.pyplot.text()` documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html

In [None]:
# Example data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a line plot
plt.plot(x, y, label='sin(x)', color='blue')

# Add labels and title
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Line Plot with Annotation')
plt.legend()

# add arrow annotation for max y value
plt.annotate('max value', xy=(np.pi/2, 1), xytext=(np.pi/2, 0.5),
                arrowprops=dict(facecolor='black', shrink=0.05))

# Add text
plt.text(4, 0.5, 'Some Text', fontsize=12, color='red')

# Show the plot
plt.show()

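The annotation above hard-codes the peak at `(np.pi/2, 1)`. A sketch of a more data-driven variant: `np.argmax` returns the index of the largest sampled value (the first one if several are tied), and `textcoords='offset points'` places the label a fixed distance from the point regardless of the axis scale:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
y = np.sin(x)

# Locate the largest sampled y value and its x position
i_max = np.argmax(y)
x_max, y_max = x[i_max], y[i_max]

fig, ax = plt.subplots()
ax.plot(x, y, label='sin(x)', color='blue')

# Label placed 30 points right of and 30 points below the peak
ax.annotate(f'max y = {y_max:.3f} at x = {x_max:.2f}',
            xy=(x_max, y_max), xytext=(30, -30), textcoords='offset points',
            arrowprops=dict(arrowstyle='->', color='black'))

ax.set_ylim(-1.2, 1.2)
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Annotation Computed from the Data')
ax.legend()
plt.show()
```

Note that sin(x) peaks twice on [0, 10] (near π/2 and 5π/2), so which peak gets labeled depends on which sample happens to be larger.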

<h3 align="center">3D Plotting</h3>

`mplot3d` toolkit provides the three dimensional axes objects necessary to make 3D plots and graphs with `matplotlib`.

`mplot3d` toolkit documentation: https://matplotlib.org/stable/tutorials/toolkits/mplot3d.html

`mplot3d.plot_surface()` documentation: https://matplotlib.org/stable/api/_as_gen/mpl_toolkits.mplot3d.axes3d.Axes3D.plot_surface.html

In [None]:
# Registers the '3d' projection; this import is only required on Matplotlib < 3.2, but is harmless on newer versions
from mpl_toolkits.mplot3d import Axes3D

# Example data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

# Create a 3D surface plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')

# Add labels and title
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_zlabel('Z-axis')
ax.set_title('3D Surface Plot')

# Show the plot
plt.show()

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Create random data points
np.random.seed(42)
num_points = 100
x = np.random.rand(num_points)
y = np.random.rand(num_points)
z = np.random.rand(num_points)

# Create a figure and a 3D axis
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Create a 3D scatter plot
scatter = ax.scatter(x, y, z, c=z, cmap='viridis', marker='o')

# Add colorbar
fig.colorbar(scatter)

# Set labels and title
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_title('3D Scatter Plot')

# Show the plot
plt.show()

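3D axes can be rotated programmatically with `ax.view_init(elev, azim)`, where `elev` is the elevation angle above the x-y plane and `azim` is the rotation around the z-axis, both in degrees ((30, -60) is Matplotlib's default view). Drawing the same random points from two viewpoints side by side can make the shape of a point cloud easier to judge:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x, y, z = rng.random((3, 100))

fig = plt.figure(figsize=(10, 4))
views = [(30, -60), (60, 30)]  # (elevation, azimuth) pairs in degrees

for i, (elev, azim) in enumerate(views, start=1):
    ax = fig.add_subplot(1, 2, i, projection='3d')
    ax.scatter(x, y, z, c=z, cmap='viridis', marker='o')
    ax.view_init(elev=elev, azim=azim)
    ax.set_title(f'elev={elev}, azim={azim}')
    ax.set_xlabel('X Label')
    ax.set_ylabel('Y Label')
    ax.set_zlabel('Z Label')

plt.show()
```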

`mplot3d.bar3d()` documentation: https://matplotlib.org/stable/api/_as_gen/mpl_toolkits.mplot3d.axes3d.Axes3D.bar3d.html

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Create sample data
x = np.arange(5)  # x coordinates of bars
y = np.arange(5)  # y coordinates of bars
X, Y = np.meshgrid(x, y)  # Create a grid of coordinates
Z = np.random.rand(5, 5)  # Heights of bars

# Create a figure and a 3D axis
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Create 3D bar plot: bar3d(x, y, z, dx, dy, dz) -> anchors (x, y) at base z=0, footprint dx=dy=1, heights dz from Z
ax.bar3d(X.flatten(), Y.flatten(), 0, 1, 1, Z.flatten())

# Set labels and title
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_title('3D Bar Plot')

# Show the plot
plt.show()

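`bar3d()` needs one anchor per bar, which is why the cell above flattens the grids. With `np.meshgrid`'s default `indexing='xy'`, `X[i, j] == x[j]` and `Y[i, j] == y[i]`, and `flatten()` reads row by row, so the *k*-th entries of `X.flatten()`, `Y.flatten()`, and `Z.flatten()` all describe the same bar. A small sketch that checks this on a 3×2 grid and colors bars by height (heights are random placeholders, as in the cell above):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(3)
y = np.arange(2)
X, Y = np.meshgrid(x, y)  # both have shape (len(y), len(x)) = (2, 3)

print(X.flatten())  # [0 1 2 0 1 2]
print(Y.flatten())  # [0 0 0 1 1 1]

rng = np.random.default_rng(0)
Z = rng.random(X.shape)  # one height per (x, y) pair
dz = Z.flatten()

# Map each height (in [0, 1)) to an RGBA color from the viridis colormap
colors = plt.get_cmap('viridis')(dz)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.bar3d(X.flatten(), Y.flatten(), np.zeros_like(dz), 0.8, 0.8, dz, color=colors)
ax.set_title('3D Bar Plot Colored by Height')
plt.show()
```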

Animated Plots

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Create a figure and axis
fig, ax = plt.subplots()
xdata, ydata = [], []
ln, = plt.plot([], [], 'r', animated=True)

# Define the initialization function
def init():
    ax.set_xlim(0, 2*np.pi)
    ax.set_ylim(-1, 1)
    return ln,

# Define the update function for the animation
def update(frame):
    xdata.append(frame)
    ydata.append(np.sin(frame))
    ln.set_data(xdata, ydata)
    return ln,

# Create the animation
ani = FuncAnimation(fig, update, frames=np.linspace(0, 2*np.pi, 128),
                    init_func=init, blit=True)

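# Display the animation in the notebook as interactive HTML
from IPython.display import HTML
plt.close(fig)  # avoid also rendering a blank static copy of the figure below the cell
HTML(ani.to_jshtml())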

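`to_jshtml()` embeds the frames in the notebook. To keep an animation outside the notebook, `Animation.save()` can write it to a file; the sketch below uses Matplotlib's `PillowWriter` to produce a GIF (Pillow is a Matplotlib dependency, whereas MP4 output would need FFmpeg installed). The filename `sine_wave.gif` is just an example:

```python
import os
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

fig, ax = plt.subplots()
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1, 1)
(line,) = ax.plot([], [], 'r')

frames = np.linspace(0, 2 * np.pi, 60)

def update(frame):
    # Redraw the sine curve from 0 up to the current frame value
    xs = frames[frames <= frame]
    line.set_data(xs, np.sin(xs))
    return (line,)

ani = FuncAnimation(fig, update, frames=frames)

# 20 frames per second -> the 60-frame GIF lasts about 3 seconds
ani.save('sine_wave.gif', writer=PillowWriter(fps=20))
plt.close(fig)
print(os.path.getsize('sine_wave.gif'), 'bytes written')
```

Unlike the cell above, this `update` recomputes the curve from `frames` instead of appending to global lists, so re-running or re-saving the animation does not accumulate stale points.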

Exercise 3: Let's create a 2x2 subplot with weather data together by completing the missing code in the cell below.

In [None]:
# Create 2x2 subplot with weather data
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# PROVIDED DATA:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
temperature = [30, 35, 50, 65, 75, 85]
rainfall = [2.5, 2.0, 3.5, 4.0, 3.0, 2.5]

# TASK: Complete each subplot
# Top-left: Temperature line plot
axes[0, 0].plot(months, temperature)
axes[0, 0].set_title('____')

# Top-right: Rainfall bar chart  
axes[0, 1].____(____, ____)
axes[0, 1].set_title('____')

# Bottom-left: Temperature vs Rainfall scatter
axes[1, 0].____(____, ____)
axes[1, 0].set_title('____')

# Bottom-right: Your choice of visualization
# Create any plot you want with the available data

plt.tight_layout()
plt.show()

Work with your neighbor to enhance your 2x2 subplot:
* Apply a consistent color scheme across all plots
* Add a grid to plots if appropriate
* Customize fonts and sizes
* Add annotations or callouts to interesting data points
* Include a global legend if applicable

In [None]:
# Your styled subplot here - modify the subplot code in the cell above



Some more challenging modifications to try if time permits:
* Use more complex layouts (mix of subplot sizes)
* Include calculated metrics (correlations, trends)
* Add statistical overlays (trend lines, confidence intervals)
* Create coordinated visualizations (linked insights)
* Include summary statistics as text annotations
* Use advanced styling (custom colormaps, professional typography)

In [None]:
# Your code here

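As one possible starting point for the "calculated metrics" and "statistical overlays" ideas above, here is a sketch using the Exercise 3 data: `np.corrcoef` gives the Pearson correlation between temperature and rainfall, and `np.polyfit(..., deg=1)` fits a least-squares straight line to overlay on a scatter plot. With only six months of data, treat the numbers as illustrative rather than as a meaningful trend:

```python
import numpy as np
import matplotlib.pyplot as plt

temperature = np.array([30, 35, 50, 65, 75, 85])
rainfall = np.array([2.5, 2.0, 3.5, 4.0, 3.0, 2.5])

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(temperature, rainfall)[0, 1]

# Least-squares line: rainfall ~ slope * temperature + intercept
slope, intercept = np.polyfit(temperature, rainfall, deg=1)
t_line = np.linspace(temperature.min(), temperature.max(), 50)

fig, ax = plt.subplots()
ax.scatter(temperature, rainfall, color='tab:blue', label='monthly data')
ax.plot(t_line, slope * t_line + intercept, color='tab:red',
        linestyle='--', label='linear fit')

# Summary statistics as text, positioned in axes coordinates (top-left)
ax.text(0.05, 0.95, f'r = {r:.2f}\nslope = {slope:.3f}',
        transform=ax.transAxes, va='top')

ax.set_xlabel('Temperature (°F)')
ax.set_ylabel('Rainfall')
ax.set_title('Temperature vs Rainfall with Trend Line')
ax.legend(loc='lower right')
plt.show()
```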

<h3 align="center">GeoPandas</h3>

`GeoPandas` is a Python library that makes it easy to work with geospatial data and to create both static and interactive maps.

https://geopandas.org/en/stable/

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt

# Load geospatial data
world = gpd.read_file('data/countries/ne_110m_admin_0_countries.shp')

# explore and plot data
world.plot()
plt.title("World Map")
plt.show()


In [None]:
# customize the map
fig, ax = plt.subplots(figsize=(10, 6))
world.plot(ax=ax, color='lightblue', edgecolor='black')
ax.set_title("Customized World Map")
plt.show()


A choropleth is a type of thematic map in which areas are shaded or patterned in proportion to the value of a variable. GeoPandas makes it easy to create choropleth maps from geospatial data.

Let's explore this by looking at the unemployment rate of each county in the US.

In [None]:
import pandas as pd

df1 = pd.read_csv('data/unemployment-2020.csv')
df2 = pd.read_csv('data/lat-lon-fips.csv')

# merge df1 and df2 on the column 'id'
df = pd.merge(df1, df2, on='id')

# show a graph of the data
df.plot(x='lng', y='lat', kind='scatter')
plt.show()

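`pd.merge()` performs an inner join by default, so any county whose `id` appears in only one of the two files is silently dropped. Before plotting, it can be worth checking how many rows matched: `how='left'` together with `indicator=True` adds a `_merge` column recording where each row was found. A sketch with two tiny hypothetical frames standing in for the CSV files (the ids and values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for unemployment-2020.csv and lat-lon-fips.csv
rates = pd.DataFrame({'id': [1001, 1003, 1005], 'rate': [5.1, 6.2, 4.0]})
coords = pd.DataFrame({'id': [1001, 1003, 1007],
                       'lat': [32.5, 30.7, 33.9],
                       'lng': [-86.6, -87.7, -85.8]})

# Default inner join keeps only ids present in both frames
inner = pd.merge(rates, coords, on='id')

# Left join plus indicator reveals which rate rows had no coordinates
check = pd.merge(rates, coords, on='id', how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'id']

print(len(inner))          # 2
print(unmatched.tolist())  # [1005]
```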
In [None]:
# Import the geopandas and pandas libraries
import geopandas as gpd
import pandas as pd

# Load the GeoJSON file with county boundaries
geoData = gpd.read_file('data/US-counties.geojson')

# Make sure the "id" column is an integer
geoData.id = geoData.id.astype(str).astype(int)

# Read file
data = pd.read_csv('data/unemployment-2020.csv')

fullData = geoData.merge(data, left_on=['id'], right_on=['id'])

fullData.explore(scheme='StdMean', cmap='YlOrRd', column='rate')