### CS102 - Further Computing

Mark Howard<br>
School of Mathematical & Statistical Sciences<br>
NUI Galway<br>
mark.howard@nuigalway.ie

### 3. Aspects of Data Visualization

# Week 8: Plotting Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Brief comment on Time Series Data
* **Times** and **Dates**, although ubiquitous, can be fairly complex data to handle.<br>
  Example 1: Daylight Saving Time (not all days have 24 hours),<br>
  Example 2: Leap years (not every year has 365 days),<br>
  and other concerns, e.g., the question of how to determine the date of Easter.

* Date and time data comes in a few flavors:

  - **Time stamps** reference particular **moments in time** (e.g., March 17th, 2021 at 9:00am).
  - **Time intervals** and **periods** reference a length of time between a particular beginning and end point; for example, the year 2020.
  - Periods usually reference a special case of several non-overlapping time intervals of uniform length (e.g., 24 hour-long periods comprising days).
  - **Time deltas** or **durations** reference an **exact length of time** (e.g., a duration of 22.56 seconds).

* `Pandas` contains a fairly extensive set of tools for working with dates, times, and time-indexed data.
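
* As a minimal sketch of how these flavors might look in `Pandas` (the variable names and the `Europe/Dublin` time zone are illustrative choices, not part of the lecture data):

```python
import pandas as pd

# Time stamp: a particular moment in time
stamp = pd.Timestamp("2021-03-17 09:00")

# Period: a span of time of uniform length (here, the calendar year 2020)
year = pd.Period("2020", freq="Y")

# Time delta: an exact length of time
delta = pd.Timedelta(seconds=22.56)

print(stamp.day_name())
print(year.start_time, year.end_time)
print(stamp + delta)

# Leap years: 2020 has 366 days
print(pd.Timestamp("2020-01-01").is_leap_year)

# Daylight Saving Time: the clocks went forward on 28 March 2021,
# so the time between two consecutive local midnights is not 24 hours
day_length = (pd.Timestamp("2021-03-29", tz="Europe/Dublin")
              - pd.Timestamp("2021-03-28", tz="Europe/Dublin"))
print(day_length)
```
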

##  Example. Covid-19 Cases in Different Counties

* The Government publishes statistics relating to Covid-19 on its
[website](https://data.gov.ie/) and updates these regularly ...

In [None]:
covid = pd.read_csv("https://opendata-geohive.hub.arcgis.com/datasets/d9be85b30d7748b5b7c09450b8aede63_0.csv", index_col='TimeStamp', parse_dates=True)


In [None]:
covid.tail()

In [None]:
galway = covid[covid['CountyName'] == "Galway"]

In [None]:
galway.head()

In [None]:
galway['ConfirmedCovidCases']

In [None]:
plt.rcParams['figure.figsize'] = [10, 6] #This is just changing default size in Jupyter NB
plt.rcParams['figure.dpi'] = 100 # Ditto
galway['ConfirmedCovidCases'].plot();

In [None]:
donegal = covid[covid.CountyName == "Donegal"]
donegal['ConfirmedCovidCases'].plot();

In [None]:
total = pd.DataFrame({
    'Donegal': donegal['ConfirmedCovidCases'], 
    'Galway': galway['ConfirmedCovidCases']
})

In [None]:
total.plot();

* But what's the daily increase? Calculate the difference between values on consecutive days!

In [None]:
# DataFrame.diff(periods=1, axis=0) 
# Calculates the difference of a Dataframe element compared with another element in the Dataframe
# (default is element in previous row).
daily = total.diff()
daily.tail()

In [None]:
daily.plot();

In [None]:
daily.loc['2020-03']

In [None]:
plt.figure(figsize=(15,5))
plt.plot(daily.loc['2022']);
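
* The two steps used above, `diff()` and selecting rows with a partial date string, can be tried out on a tiny series. The numbers below are made up purely for illustration and are not taken from the Covid-19 data:

```python
import pandas as pd

# Hypothetical cumulative counts, made up for illustration only
dates = pd.date_range("2020-03-30", periods=5, freq="D")
cumulative = pd.Series([10, 12, 17, 17, 25], index=dates)

# Difference between each value and the one on the previous day;
# the first entry is NaN because there is no previous day
increase = cumulative.diff()
print(increase)

# Partial string indexing: every row whose date falls in April 2020
print(increase.loc["2020-04"])
```
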

## Simple Line Plots

* Perhaps the simplest of all plots is the visualization of a single function $y = f(x)$.
* Let's start with a simple sine curve $f(x) = \sin x$, for $0 \leq x \leq 10$, say:

In [None]:
x = np.linspace(0, 10, 1000)  # 1000 points between 0 and 10
plt.plot(x, np.sin(x));

## Adjusting the Plot: Line Colors and Styles

* The `plt.plot()` function takes additional arguments that can specify line colors and styles.
* To adjust the **color**, use the `color` keyword, which accepts a color argument in a variety of ways.

In [None]:
plt.plot(x, x + 0, color='blue')        # specify color by name
plt.plot(x, x + 1, color='g')           # short color code (rgbcmyk)
plt.plot(x, x + 2, color='0.75')        # Grayscale between 0 and 1
plt.plot(x, x + 3, color='#FFDD44')     # Hex code (RRGGBB from 00 to FF)
plt.plot(x, x + 4, color=(1.0,0.2,0.3)) # RGB tuple, values 0 to 1
plt.plot(x, x + 5, color='chartreuse'); # all HTML color names supported

* If no color is specified, `Matplotlib` will automatically cycle through a set of default colors for multiple lines.
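
* A small sketch of this default behaviour, assuming the standard `Matplotlib` style is in use (the variable `lines` is just an illustrative name):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)

# No color given: each call takes the next color from the default cycle
lines = [plt.plot(x, x + k)[0] for k in range(3)]
print([line.get_color() for line in lines])

# The cycle itself is stored in the rcParams
print(plt.rcParams['axes.prop_cycle'].by_key()['color'])
```
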

* The **line style** can be adjusted using the `linestyle` keyword:

In [None]:
plt.plot(x, x + 4, linestyle='-')  # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':');  # dotted

* These ``linestyle`` and ``color`` codes can be combined into a single non-keyword argument:

In [None]:
plt.plot(x, x + 0, '-g')  # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r');  # dotted red

* These single-character color codes reflect the standard abbreviations in the RGB (**R**ed/**G**reen/**B**lue) and CMYK (**C**yan/**M**agenta/**Y**ellow/blac**K**) color systems, commonly used for digital color graphics.

In [None]:
?plt.plot

## Adjusting the Plot: Axes Limits

* The most basic way to adjust axis limits is to use the ``plt.xlim()`` and ``plt.ylim()`` methods:

In [None]:
plt.plot(x, np.sin(x))

plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);

* If you'd like either axis to be displayed in reverse, simply reverse the order of the arguments.
* Here, we plot a parabola "upside-down" with $y$ ranging from $1.2$ down to $-1.2$:

In [None]:
plt.plot(x, 0.1*(x-5)**2-1)

plt.xlim(10, 0)
plt.ylim(1.2, -1.2);

* A useful related method is ``plt.axis()`` (**axis** with an **i**).
* The ``plt.axis()`` method allows you to set the ``x`` and ``y`` limits with a single call, by passing a list which specifies ``[xmin, xmax, ymin, ymax]``:

In [None]:
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5]);

* The ``plt.axis()`` method goes beyond this, allowing you to do things like automatically tighten the bounds around the current plot:

In [None]:
plt.plot(x, np.sin(x))
plt.axis('tight');

* It allows even higher-level specifications, such as ensuring an equal aspect ratio so that on your screen, one unit in ``x`` is equal to one unit in ``y``:

In [None]:
plt.plot(x, np.sin(x))
plt.axis('equal');

In [None]:
#plt.axis?

## Titles, Labels, Legends

* An axes can have a **title**, and its $x$- and $y$-axis can have **labels**.
* There are methods that can be used to quickly set the title and axes labels:

In [None]:
plt.plot(x, np.sin(x))
plt.title("y = sin(x)")
plt.xlabel("x")
plt.ylabel("y");

* The position, size, and style of these labels can be adjusted using optional arguments to the functions ...
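
* For instance, the following sketch uses a few of these optional arguments (`loc`, `fontsize`, `color`, `rotation`, `labelpad`); the particular values are arbitrary choices for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
plt.plot(x, np.sin(x))

# Left-aligned title in a larger font
plt.title("y = sin(x)", loc='left', fontsize=16)

# Gray x-label; horizontal y-label moved slightly away from the axis
plt.xlabel("x", fontsize=12, color='gray')
plt.ylabel("y", fontsize=12, rotation=0, labelpad=15);
```
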

* When multiple lines are being shown, it can be useful to create a plot **legend** that labels each line type.
* The  `plt.legend()` method creates a legend.
* The label of each line can be specified by using the `label` keyword of the `plot` function:

In [None]:
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.axis('equal')

plt.legend();

* Note how the ``plt.legend()`` function keeps track of the line style and color, and matches these with the correct label.

In [None]:
#plt.legend?

* The third argument to ``plt.plot`` can also be a character that selects the marker symbol drawn at each data point.
* The full list of available symbols can be seen in the documentation of ``plt.plot``.
* Most of the possibilities are fairly intuitive:

In [None]:
rng = np.random.RandomState()
plt.figure(figsize=(12, 6))
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))
plt.legend()
plt.xlim(0, 1.4);

* These character codes can be used together with line and color codes to plot points + a line connecting them:
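
* A minimal sketch of such a combined format string (the name `line` is only used here to inspect the result):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 30)

# '-ok' = solid line (-), circle markers (o), black (k)
line, = plt.plot(x, np.sin(x), '-ok')
```
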

* Additional keyword arguments to ``plt.plot`` specify a wide range of properties of the lines and markers:

In [None]:
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, '-p', color='gray',#p for pentagon
         markersize=16, linewidth=4,
         markerfacecolor='lightblue',
         markeredgecolor='g',
         markeredgewidth=1)
plt.ylim(-1.2, 1.2);

## Scatter Plots with ``plt.scatter``

In [None]:
plt.scatter(x, y, marker='o');

* The primary difference of ``plt.scatter`` from ``plt.plot`` is that it can be used to create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.
* In order to better see the overlapping results, we'll also use the ``alpha`` keyword to adjust the transparency level:

In [None]:
rng = np.random.RandomState()
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)

plt.figure(figsize=(12,6))
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5,
            cmap='viridis')
plt.colorbar();  # show color scale

* Notice that the color argument is automatically mapped to a color scale (shown here by the ``colorbar()`` command).
* Also note that the size argument ``s`` is given in points squared (it sets the marker area), not in pixels.
* In this way, the color and size of points can be used to convey information in the visualization, in order to visualize multidimensional data.

* For example, we might use the Iris data from Scikit-Learn, where each sample is one of three types of flowers that has had the size of its petals and sepals carefully measured:

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()
features = iris.data.T
plt.figure(figsize=(12, 6))
plt.scatter(features[0], features[1], alpha=0.5,
            s=100*features[3], c=iris.target, cmap='viridis')#size~petal width,colour~type of Iris
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1]);

* We see that this scatter plot has given us the ability to simultaneously explore four different dimensions of the data:
the $(x, y)$ location of each point corresponds to the sepal length and width, the size of the point is related to the petal width, and the color is related to the particular species of flower.

## Histograms and Binnings

* Bar charts are often used for histograms.
* In a histogram, data are first grouped into bins, then each bin is drawn as a bar whose height is the number of data points in that bin.
* A simple histogram can be a great first step in understanding a dataset.

In [None]:
data = np.random.randn(1000)
plt.hist(data);#default is 10 bins

The ``hist()`` function has many options to tune both the calculation and the display; 
here's an example of a more customized histogram:

In [None]:
plt.hist(data, bins=30, 
         density=True, 
         alpha=0.5,
         histtype='stepfilled', 
         color='steelblue',
         edgecolor='blue');

In [None]:
?plt.hist

* This combination of ``histtype='stepfilled'`` with some transparency ``alpha`` can be very useful when comparing histograms of several distributions:

In [None]:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)

kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);

* If you would like to simply compute the histogram (that is, count the number of points in a given bin) and not display it, the ``np.histogram()`` function is available:

In [None]:
counts, bin_edges = np.histogram(data, bins=10)
print(counts)
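
* As a quick check, the counts computed by ``np.histogram()`` can be compared with the values that ``plt.hist()`` returns. This is a sketch on freshly generated random data (the names `sample`, `n`, `edges` and `patches` are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

sample = np.random.randn(1000)

# Compute the histogram without drawing it
counts, bin_edges = np.histogram(sample, bins=10)

# plt.hist draws the bars and also returns the counts and bin edges
n, edges, patches = plt.hist(sample, bins=10)

print(len(counts), len(bin_edges))   # one more edge than there are bins
print(np.array_equal(counts, n))     # same counts either way
print(counts.sum())                  # every data point lands in some bin
```
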

In [None]:
plt.hist(data);

## References

* `datetime64`, `timedelta64`: [[doc]](https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html)


* the ["Time Series/Date" section](http://pandas.pydata.org/pandas-docs/stable/timeseries.html) of the Pandas online documentation.


### `matplotlib`

* Examples using `matplotlib.pyplot.plot`: [[doc]](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html#examples-using-matplotlib-pyplot-plot)


* `plt.plot`: [[doc]](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)


* `plt.xlim`, `plt.ylim`: [[doc]](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlim.html)


* `plt.legend`: [[doc]](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

## Exercises

1. Use suitable functions and ranges to plot a circle of radius $3$ around the centre $(1,1)$.

2. Plot the rational function 
$$
f(x) = \frac{x^2 + x - 2}{x^3 + 6}
$$
and its derivative $f'(x)$ so that all interesting points (zeros, extreme values, inflection points, singularities, ...)
are contained in the plot.

3. Plot $f(x) = x^2 \sin(\pi/x)$ for $x$ in the range $[-0.3, 0.3]$.

In [None]:
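# scipy.misc.derivative has been removed from recent SciPy releases,
# so the derivative is approximated here with a central difference.

# defining the function
def function(x):
    return (x**2 + x - 2) / (x**3 + 6)

# approximating its derivative: (f(x+h) - f(x-h)) / (2h) for a small step h
def deriv(x, h=1e-6):
    return (function(x + h) - function(x - h)) / (2 * h)

# defining x-axis intervals
x = np.linspace(-4, 3, 1000)

# plotting the function
plt.plot(x, function(x), color='purple', label='Function')

# plotting its derivative
plt.plot(x, deriv(x), color='green', label='Derivative')

# formatting: f has a singularity where x**3 = -6, so clip the y-range
plt.ylim(-10, 10)
plt.legend(loc='upper left')
plt.grid(True)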