# Activity 2-2: Creating high-quality figures

## Figure generation in Python

`matplotlib` is the most popular library for generating figures in Python. Other visualization libraries such as `seaborn` often take advantage of `matplotlib` functions and data structures. 

We've made some figures in the past, but we haven't worried too much about their appearance. In this activity, we'll learn a bit more about how to control the way that figures are plotted in Python, and we'll try to create good visualizations that are easy to understand.

## 1. Generating example data

First, let's create some data that we want to visualize. The example that we'll use here is based on one from Jean-luc Doumont's [Trees, maps, and theorems](https://www.principiae.be/book/).

We imagine that we're taking data from an experiment where the power of some light source is expected to be Gaussian as a function of frequency. The data will roughly follow this trend, but the measurements are noisy.
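
For reference, a normalized Gaussian with mean $\mu$ and standard deviation $\sigma$ can be written as below, where $x$ is the frequency. The values $\mu = 17$ and $\sigma = 0.6$ are the ones used by the `gauss` function in this activity; reading them in GHz is an assumption based on the axis labels used in the plots.

$$
P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$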

### 1.a. Defining the Gaussian function

Read through and execute the code block below to define a Gaussian function.

In [None]:
import numpy as np
import numpy.random as rng
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


# Define the Gaussian function

def gauss(x):
    """ Gaussian function with mean 17 and standard deviation 0.6. """
    mu = 17
    sigma = 0.6
    return np.exp(-((x - mu)**2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))


# Plot the Gaussian distribution alone

x_gauss = np.arange(15, 20, 0.01)
y_gauss = gauss(x_gauss)
sns.lineplot(x=x_gauss, y=y_gauss)
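
As an optional sanity check, the sketch below estimates the area under the curve and its peak height. It repeats the Gaussian definition so that it runs on its own; the names `x_check`, `y_check`, `area`, and `peak` are introduced here only for illustration, and the simple rectangle-rule sum is an assumption chosen for brevity rather than accuracy.

```python
import numpy as np


def gauss(x, mu=17, sigma=0.6):
    """Normalized Gaussian; the defaults match the function defined above."""
    return np.exp(-((x - mu)**2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))


# Evaluate the function on the same grid that is used for plotting
dx = 0.01
x_check = np.arange(15, 20, dx)
y_check = gauss(x_check)

# Rectangle-rule estimate of the area under the curve (should be close to 1)
area = np.sum(y_check) * dx

# Largest value on the grid (the peak of the curve)
peak = y_check.max()

print(f"area under the curve: {area:.4f}")
print(f"peak height:          {peak:.4f}")
```

The peak of a normalized Gaussian is $1/(\sigma\sqrt{2\pi})$, which is about 0.665 for $\sigma = 0.6$. This is close to the upper y-axis bound of 0.65 used later in this activity.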

### 1.b. Generating random data

Simulated "measurements" will be taken from the Gaussian function, together with some additional random noise. Read through and execute the code block below to generate some random measurements, centered around the Gaussian distribution.

In [None]:
# Define the random number generator

r = rng.RandomState(3)


# Generate measurements that are randomly scaled relative to the "true" distribution

x_data = np.arange(15.5, 19, 0.2)
y_data = gauss(x_data)

for i in range(len(y_data)):
    y_data[i] = y_data[i] * (1 + ((r.rand()-0.5)/5))
    
    
# Plot the data

sns.scatterplot(x=x_data, y=y_data)
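
The loop above multiplies each "true" value by a random factor. As a minimal sketch of what those factors look like, the block below draws them both one at a time and all at once from generators with the same seed, then compares the two. The names `n_points`, `factors_loop`, and `factors_vec` are hypothetical and exist only for this illustration.

```python
import numpy as np
import numpy.random as rng

# Same number of points as the x_data grid used above
n_points = len(np.arange(15.5, 19, 0.2))

# Loop version: one random draw per measurement, as in the cell above
r_loop = rng.RandomState(3)
factors_loop = np.array([1 + (r_loop.rand() - 0.5) / 5 for _ in range(n_points)])

# Vectorized version: draw all of the random numbers in a single call
r_vec = rng.RandomState(3)
factors_vec = 1 + (r_vec.rand(n_points) - 0.5) / 5

# The two approaches should agree when started from the same seed
print(np.allclose(factors_loop, factors_vec))

# Smallest and largest scale factors that were drawn
print(factors_vec.min(), factors_vec.max())
```

Because `rand` returns values in the interval [0, 1), each factor lies between 0.9 and 1.1, so every simulated measurement is within 10% of the theoretical value.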

## 2. Visualizing the data

### 2.a. Making a "bad" plot

We'll begin by making a "bad" display -- one that contains many elements that might individually be useful in certain circumstances, but which produce an overwhelming visual effect when combined together.

To do this, we'll use different options in `matplotlib` to manipulate the display of the figure. Read through and execute the code block below to understand the different options that are being applied and how they affect the final figure. To see the appearance of the figure without one of these modifications, you can try commenting out one of the lines and executing the code block again.

In [None]:
# First, we'll plot the data
# We'll color the data points black using the option "c='k'"
# We'll also use + symbols to mark the data points with the option "marker='+'"
# To make a legend to distinguish between the data and the "theory", we'll label these points as data

plt.scatter(x_data, y_data, c='k', marker='+', label='data')


# Now, we'll plot the "theory" Gaussian function
# We'll also draw this line in black using the option "c='k'"
# For the legend, we'll add a label "label='theory'"

plt.plot(x_gauss, y_gauss, c='k', label='theory')


# Now let's add the legend 

plt.legend()


# Let's add more tick marks and a background grid

plt.minorticks_on()
plt.grid(True, which='both')


# And of course, we should label the axes

plt.xlabel('Frequency [GHz]')
plt.ylabel('Output power [W]');

### 2.b. Making minor improvements

Combined together, the elements of the plot above are overwhelming and they obscure the message of the data. We can do better.

Read through and execute the code block below to generate a new figure that better represents the data.

In [None]:
# Again, we start by plotting the data
# This time, we'll let the color choice be automatic, and we won't change the marker
# To make a legend to distinguish between the data and the "theory", we'll label these points as data

plt.scatter(x_data, y_data, label='data')


# Next we plot the "theory" Gaussian function
# We'll include a label for the legend, but no other styling

plt.plot(x_gauss, y_gauss, label='theory')


# Now we add the legend 

plt.legend()


# And we label the axes

plt.xlabel('Frequency [GHz]')
plt.ylabel('Output power [W]');

### 2.c. Making a better graph

Now, we'll take the base graph above and improve it by thoughtfully choosing the colors, labeling, and display ranges that we use. Our goal will be to make a clear distinction between the theory and data, allowing them to be compared easily. We will also try to reduce visual clutter by removing extraneous pieces of the graph.

To see how we do this, read through and execute the code block below.

In [None]:
# First, we'll choose a color scheme -- one color for the data, and one color for the theory
# The theory color is defined in hexadecimal
# There are many places to go to look for good color schemes, but one good option is
# http://colorbrewer2.org/, which can automatically generate colorblind-friendly palettes

c_theory = '#2b8cbe'
c_data = 'k'


# Now, we'll plot the data and the theory in their respective colors
# To make sure that the data shows over the theory line, we'll include
# another option "zorder" that sets the relative order of plot elements

plt.plot(x_gauss, y_gauss, c=c_theory, zorder=1)
plt.scatter(x_data, y_data, c=c_data, zorder=2)


# Rather than using an automatic legend, we'll label the data and theory parts manually
# We can do this by adding text directly to the figure
# Below, the x and y options set where the text is placed, in data coordinates
# We have also adjusted the alignment and color of the text

plt.text(x=17.45, y=0.58, s='data', horizontalalignment='left',
         verticalalignment='center', c=c_data)

plt.text(x=17.55, y=0.48, s='theory', horizontalalignment='left',
         verticalalignment='center', c=c_theory)


# In previous plots, the box around the graph is heavy and draws attention
# Let's hide the top and right sides completely
# First, we have to obtain the current "Axes" object for the plot, which we do using
# the function `plt.gca` ("get current Axes")
# Then, we hide the top and right sides

ax = plt.gca()
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)


# Now let's also restrict the range of the x axis so that it's centered

plt.xlim(15, 19)


# And let's control the extent of the x and y axes so that they follow the
# bounds of the theory/data

ax.spines['left'].set_bounds(0, 0.65)
ax.spines['bottom'].set_bounds(15.5, 19)


# And reduce the number of tick marks we plot

plt.xticks([15.5, 17, 19])
plt.yticks([0, 0.65])


# And we label the axes

plt.xlabel('Frequency [GHz]')
plt.ylabel('Output power [W]');
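
As an alternative to hiding the spines one at a time, `seaborn` provides a `despine` helper that removes the top and right spines by default. The block below is a minimal sketch of that approach, not a replacement for the cell above; the plotted points are made-up placeholder values, and `spine_visibility` is a name introduced here for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Create a figure and an Axes object explicitly
fig, ax = plt.subplots()

# Placeholder line (made-up values, only here so that the Axes is not empty)
ax.plot([15.5, 17, 19], [0.03, 0.66, 0.0])

# Hide the top and right spines in a single call
sns.despine(ax=ax)

# Check which sides of the frame are still drawn
spine_visibility = {name: spine.get_visible() for name, spine in ax.spines.items()}
print(spine_visibility)
```

Setting the visibility of each spine directly, as in the cell above, gives finer control and pairs naturally with `set_bounds`; `despine` is a convenient shortcut when only the default top and right sides need to be hidden.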

## 3. A figure of your own

In the examples above, we considered one way to visualize a simple set of data. Using default plot options is great for quick exploratory analysis, but we often need to adjust options by hand to make visualizations that best communicate the ideas that we have in mind.

For the rest of this activity, work on constructing your own figure. The source data could come from a class you're in or from your research. You could also generate a toy data set like we did above, or use one of the example datasets that can be loaded through a library like `seaborn` (for example, the "planets" data set that we referred to earlier). Try to create a single, polished figure that clearly communicates an idea without excessive or distracting detail.