![ContributION - An introduction to Python and Data Science](contribution.png)

# Charting introduction
In this chapter we're going to play around with various types of charts to see how they react when we change their parameters.

Start by doing the usual setup for working with **matplotlib**.

The following examples are all taken from https://matplotlib.org/gallery.html.  We'll go over them and learn how they work.  You can then go back and look at other examples yourself when the need arises.

In [None]:
# Set up matplotlib and use a nicer set of plot parameters
%config InlineBackend.rc = {}
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

## Bar charts

See https://matplotlib.org/examples/lines_bars_and_markers/barh_demo.html

Bar charts are a type of chart that allows us to compare different things next to each other.

In the example below, we compare different people by seeing how fast they want to go.

Let's start by getting some sample data.  This is random, but typically you'll have better data when doing real charting.  For 5 people, we get the performance (how fast they want to go today) and an error (the amount that it might change by).

In [None]:
people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
performance = 3 + 10 * np.random.rand(len(people))
error = np.random.rand(len(people))
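Because the data is random, every run gives different bars. A minimal sketch of how the same data could be made repeatable and checked for its range (the seed value 42 is an arbitrary choice for illustration and is not part of the original example):

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only so that reruns give the same numbers

people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')

# np.random.rand(n) returns n values drawn uniformly from [0, 1),
# so 3 + 10 * rand(...) lies in [3, 13)
performance = 3 + 10 * np.random.rand(len(people))
error = np.random.rand(len(people))

for name, perf, err in zip(people, performance, error):
    print(f'{name}: {perf:.2f} +/- {err:.2f}')
```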

We also need to decide where to place the bars.  These will be horizontal bars, so we need to know the position of each bar along the y-axis.  In our case we just want them evenly spaced, so a range of numbers will do.

In [None]:
y_pos = np.arange(len(people))
x_pos = np.arange(len(people))
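To see what these positions look like, the short sketch below prints the range and pairs each position with the person it belongs to:

```python
import numpy as np

people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
y_pos = np.arange(len(people))  # one integer position per person

print(y_pos)  # [0 1 2 3 4]

# Each bar is drawn at its position; the names become the tick labels later
for pos, name in zip(y_pos, people):
    print(pos, name)
```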

Next we create a single subplot (a chart area where we'll draw the plot).  The **ax** object is what we call methods on to say what to draw.

Once we have the **ax**is, we can tell it to draw a horizontal bar chart (using **barh**).

We tell it where to set the tick marks and what labels to put there.

We tell it to invert the y-axis (so it goes top down rather than bottom up).

We set the x-axis label (adding the word *Performance*).

We set the title of the whole chart area.

Lastly we show the whole plot.

#### Note that in Jupyter, the first line (plt.subplots()) and the last line (plt.show()) must be in the same cell.  Otherwise the chart is not drawn correctly, as the cell with the first line already tries to draw the chart.

In [None]:
fig, ax = plt.subplots()

ax.barh(y_pos, performance, xerr=error, align='center', color='green', ecolor='black')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.invert_yaxis()  # labels read top-to-bottom
ax.set_xlabel('Performance')
ax.set_title('How fast do you want to go today?')

plt.show()

#### Can you change the example above, to use the same data, but draw a vertical bar chart (columns)?

## Stacked bar chart

See https://matplotlib.org/examples/pylab_examples/bar_stacked.html

A stacked bar chart also lets us compare different things next to each other, but with each bar broken down into parts.

In the example below, we compare the scores of different groups, broken down into the scores for men and women.

Let's start by getting the men's and women's mean scores for each of the 5 groups, along with the standard deviations (used as error bars).

In [None]:
groups = ('G1', 'G2', 'G3', 'G4', 'G5')
menMeans = [20, 35, 30, 35, 27]
womenMeans = [25, 32, 34, 20, 25]
menStd = (2, 3, 4, 1, 2)
womenStd = (3, 5, 2, 3, 3)

Next, get some fixed data like the number of groups, the width of the bars and the position of the bars (on the x-axis).

In [None]:
N = len(groups)
ind = np.arange(N)    # the x locations for the groups
width = 0.35       # the width of the bars: can also be len(x) sequence

In the example below, we don't go through the effort of getting a subplot, but rather just use the higher-level pyplot functions instead.

As with subplots(), we again keep the creation of the chart in the same cell as the show() function call so that Jupyter doesn't draw the chart too soon.

We start by adding 2 bar charts, p1 and p2.  For p2, however, we pass a **bottom** parameter (here the menMeans list), telling it how high up each bar should start.  We basically put the women's bars on top of the men's bars.

Next we add labels, titles, ticks.

We also add a legend, telling it what to name the entries and which bar chart each entry refers to.

Lastly we show the chart.

In [None]:
p1 = plt.bar(ind, menMeans, width, color='#d62728', yerr=menStd)
p2 = plt.bar(ind, womenMeans, width, bottom=menMeans, yerr=womenStd)

plt.ylabel('Scores')
plt.title('Scores by group and gender')
plt.xticks(ind, ('G1', 'G2', 'G3', 'G4', 'G5'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('Men', 'Women'))

plt.show()
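To see what the **bottom** parameter does in numbers, here is a small sketch that computes where each women's bar starts and ends, using the same data as above (the names `bottoms` and `tops` are introduced only for this illustration):

```python
import numpy as np

menMeans = [20, 35, 30, 35, 27]
womenMeans = [25, 32, 34, 20, 25]

bottoms = np.array(menMeans)            # where each women's bar starts
tops = bottoms + np.array(womenMeans)   # where each women's bar ends

print(tops)  # [45 67 64 55 52]
```

The tallest stack ends at 67, which is why y-axis ticks from 0 to 80 leave enough room for the error bars.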

#### Can you change the example above to show the men's and women's figures next to each other, keeping the groups together?

#### Next try to change it to show the 5 groups of men on the left and the 5 groups of women on the right.

## Pie charts

See https://matplotlib.org/examples/pie_and_polar_charts/pie_demo_features.html

A pie chart is similar to a bar chart, but instead of showing values side by side, it shows them as slices of a pie.  This makes it easy to see how much of the total each value takes up.

In the example below, we can see how Frogs, Hogs, Dogs and Logs compare to each other.

As usual, we ensure we have some data to work with.  The **explode** tuple says how far each slice should be pulled out of the pie.

In [None]:
# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
sizes = [15, 30, 45, 110]
explode = (0, 0.1, 0, 0)  # only "explode" the 2nd slice (i.e. 'Hogs')

Next we again use subplots to create the axis to draw on.  Again it is in the same cell as the show() function call.

We use the pie() function to tell it what to draw, including where to start drawing (the start angle, in degrees counter-clockwise from the x-axis) and whether we should show shadows.

In [None]:
fig1, ax1 = plt.subplots()

ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=70)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()
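The percentages printed on the slices are each value's share of the total. A rough sketch of that calculation, using the same sizes and the same format string as **autopct** (the variable name `percentages` is only for illustration):

```python
import numpy as np

labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
sizes = [15, 30, 45, 110]

# Each slice is its value divided by the sum of all values
percentages = 100 * np.array(sizes) / np.sum(sizes)

for label, pct in zip(labels, percentages):
    print('%s: %1.1f%%' % (label, pct))
```

Note that the sizes don't need to add up to 100; the pie is scaled to the total.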

## Scatter chart

See https://matplotlib.org/examples/shapes_and_collections/scatter_demo.html

A scatter chart is used to show how different elements compare to each other on multiple dimensions.  You might simply have a dot somewhere on the two axes.  You can give different dots different sizes to indicate something else.  You can also use color to distinguish the dots even further.

The example below is random in all aspects.

As usual, let's get some data, let's say 100 dots.

In [None]:
N = 100
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # 0 to 15 point radii

Next we call pyplot's scatter() function passing in the different data elements we know about.

In [None]:
plt.scatter(x, y, s=area, c=colors, alpha=1, edgecolors='green')
plt.show()
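The **s** parameter of scatter() is given in points squared, so it grows with the square of a dot's width. A minimal sketch with made-up widths (the names `widths` and `s` are assumptions for this illustration only):

```python
import numpy as np

widths = np.array([5.0, 10.0, 20.0])  # hypothetical dot widths in points
s = widths ** 2                       # values that could be passed as s= to scatter()

print(s)

# Doubling the width of a dot needs four times the s value
print(s[1] / s[0], s[2] / s[1])
```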

## Histograms

See https://matplotlib.org/examples/statistics/histogram_demo_features.html (at least part of it)

A histogram shows how values are distributed on a scale (indicating how many of those values fall within each interval).

Let's get some random data, given a mean and standard deviation.  The **x** numpy array contains 437 values.

In [None]:
mu = 100  # mean of distribution
sigma = 5  # standard deviation of distribution
x = mu + sigma * np.random.randn(437)

We want to show how those 437 values are distributed.  This means we have to group them together.  Pyplot calls this group a **bin** and allows you to say how many such **bins** you want.  Let's say we want 20 bins.

In [None]:
num_bins = 20

Finally we create the actual chart.  Start by making a subplot.

Use the hist() function to create the actual histogram.  Note that this function also returns some values.

Lastly we adjust the layout and show the chart.

In [None]:
fig, ax = plt.subplots()

# density=True scales the bars so their total area is 1 (older versions used normed=1)
n, bins, patches = ax.hist(x, num_bins, density=True)

fig.tight_layout()
plt.show()

The values returned by the *hist* method represent the data of the histogram.

*bins* holds the x-axis values of the bin edges.  The first bin starts at bins[0] and ends at bins[1].  The second bin starts at bins[1] and ends at bins[2], and so on.

In [None]:
bins

Similarly, *n* indicates the y-axis value (height) of each bin.

In [None]:
n
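The same kind of binning can be tried without drawing anything, using numpy's histogram function. The sketch below is only an illustration: the seed is arbitrary, and the names `counts`, `edges` and `density` are not part of the example above.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for reproducibility
mu, sigma = 100, 5
x = mu + sigma * np.random.randn(437)

num_bins = 20
counts, edges = np.histogram(x, bins=num_bins)

print(len(counts))   # 20 bins
print(len(edges))    # 21 edges, one more than the number of bins
print(counts.sum())  # 437, every value lands in exactly one bin

# With density=True the heights are scaled so that the bar areas sum to 1
density, _ = np.histogram(x, bins=num_bins, density=True)
print(np.sum(density * np.diff(edges)))
```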

## Line charts

See https://matplotlib.org/examples/pylab_examples/load_converter.html (it does a bit more, but let's focus on the line drawing).

A line chart shows the evolution of something over some measurement, typically time.  It is comparable to a bar chart, but it is easier to compare several lines with each other.

The example below shows the closing prices of a stock over time.  The x-axis contains dates.  The y-axis contains the price of the stock (how much that stock is worth).

In this example we use cbook to locate a sample data file (that comes with matplotlib by default).  Let's get the data first.

In [None]:
import matplotlib.cbook as cbook
datafile = cbook.get_sample_data('msft.csv', asfileobj=False)
print('loading', datafile)

Next we use numpy's loadtxt function to read the file into arrays.

In [None]:
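import matplotlib.dates as mdates
from datetime import datetime

def parse_date(value):
    # bytespdate2num is gone from recent matplotlib; loadtxt may pass bytes or str
    text = value.decode() if isinstance(value, bytes) else value
    return mdates.date2num(datetime.strptime(text, '%d-%b-%y'))
dates, closes = np.loadtxt(datafile, delimiter=',',
                           converters={0: parse_date},
                           skiprows=1, usecols=(0, 4), unpack=True)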
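To get a feel for the loadtxt parameters used above, here is a small sketch that reads a made-up CSV from memory instead of a file. The column names and numbers are hypothetical and only there for illustration.

```python
import io
import numpy as np

# Hypothetical CSV content, made up for this illustration
csv_text = """day,open,high,low,close
1,10.0,12.0,9.5,11.0
2,11.0,13.0,10.5,12.5
3,12.5,14.0,12.0,13.5
"""

# skiprows=1 skips the header line
# usecols=(0, 4) keeps only the first and the fifth column
# unpack=True returns one array per column instead of one array per row
days, closes = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                          skiprows=1, usecols=(0, 4), unpack=True)

print(days)
print(closes)
```

Picking other column numbers in **usecols** gives you other columns, such as the high and low values.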

Next we create the figure itself (with the data we already obtained).

In this example we create a figure first and then add a subplot to it (different from how we did it before).

Next we tell it to plot the dates, giving the dates and the closing prices we read from the file.  The minus sign tells it to draw a line.  Other values are also possible, e.g. a plus sign, or the ^ sign, which draw markers instead.

Lastly we tell the figure to format the dates and then we draw the chart.

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot_date(dates, closes, '-')
fig.autofmt_xdate()
plt.show()
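To compare the different format strings, the sketch below draws the same kind of data three times with the plain plot() function, which accepts the same format strings. The numbers are made up for illustration and have nothing to do with the stock data.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(10)  # made-up data for illustration

fig, ax = plt.subplots()

# Same data shape, three different format strings
line, = ax.plot(x, x, '-', label="'-' draws a solid line")
plus, = ax.plot(x, x + 2, '+', label="'+' draws plus markers")
tri, = ax.plot(x, x + 4, '^', label="'^' draws triangle markers")
ax.legend()

plt.show()
```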

#### Change the chart above and add the high and low values as well (as a line chart).  

You might have to read up on numpy's loadtxt function.  

Also remember how we added a second bar chart for our stacked chart example.