# Data Visualization Principles

This notebook covers principles and best practices for effective data visualization.

Topics include:
1. Encoding data using visual cues
2. Know when to include zero
3. Do not distort quantities
4. Order by a meaningful value
5. Show the data
6. Use common axes
7. Consider transformations
8. Compared visual cues must be adjacent
9. Slope charts
10. Some more guidelines

Note that this notebook consists mainly of explanatory text.

## 1. Encoding Data using Visual Cues

Several visual cues can be used to encode data, including:
<li> position </li>
<li> aligned lengths </li>
<li> angles </li>
<li> area </li>
<li> brightness </li>
<li> color hue </li>

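Position and aligned lengths are preferred to angles, which in turn are preferred to area, because people judge position and length more accurately than angle or area.
Brightness and color hue are harder still to quantify, so they are typically used only when the data has more than two dimensions.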

## 2. Know when to Include Zero

### Include zero
Bar plots should always start at zero, because the length of each bar is read as proportional to the quantity it displays. Starting the axis at a nonzero value makes small differences look much larger than they really are.
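
A small arithmetic sketch of why a truncated axis misleads. The two values and the baseline of 90 are made up purely for illustration.

```r
# Hypothetical values for two groups (illustrative only, not real data)
values <- c(a = 95, b = 100)

# True ratio between the two quantities
true_ratio <- values[["b"]] / values[["a"]]

# Bar lengths as drawn when the axis starts at 90 instead of 0
baseline <- 90
drawn_lengths <- values - baseline
drawn_ratio <- drawn_lengths[["b"]] / drawn_lengths[["a"]]

true_ratio   # about 1.05: b is roughly 5% larger than a
drawn_ratio  # 2: the bar for b is drawn twice as long as the bar for a
```

With the axis starting at zero, the drawn ratio and the true ratio would be the same.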

### Do not include zero
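When the data is encoded by position rather than length (for example, in a scatter plot), it is not necessary to include zero. The axis can be restricted to the range of the plotted points, which makes differences between them easier to see.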

## 3. Do NOT Distort Quantities

When displaying quantities with circular / spherical shapes, do not make the radius proportional to quantity; instead make the area proportional to quantity.

By default, `ggplot` maps size to area rather than radius.

Also, avoid using areas when you can use lengths (_i.e.,_ bar plots).
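
The distortion can be checked with a little arithmetic. This is a minimal sketch with made-up quantities: when the radius is proportional to the quantity, the area grows with its square.

```r
# Hypothetical quantities (illustrative only): the second is 4 times the first
quantity <- c(1, 4)

# Misleading: radius proportional to quantity, so area grows with quantity^2
radius_wrong <- quantity
area_wrong <- pi * radius_wrong^2
ratio_wrong <- area_wrong[2] / area_wrong[1]   # 16: a 4-fold difference looks 16-fold

# Preferred: area proportional to quantity, so radius grows with sqrt(quantity)
radius_right <- sqrt(quantity / pi)
area_right <- pi * radius_right^2
ratio_right <- area_right[2] / area_right[1]   # 4 (up to rounding): matches the data
```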

## 4. Order by a Meaningful Value

By default, `ggplot` orders categories alphabetically when they are character strings (and by factor level when they are factors); however, this is rarely what we want.

Instead, we would like to order the categories by a meaningful value. For example, in a bar graph of murder rates across the states, we would order the states from the highest rate to the lowest instead of alphabetically.

The `reorder()` function (in the `stats` package) helps achieve this goal.
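
A minimal sketch of `reorder()` in base R. The data frame `murders` and its rates are made-up values for illustration, not real statistics.

```r
# Hypothetical murder rates (made-up numbers for illustration)
murders <- data.frame(
  state = c("Alabama", "Colorado", "Delaware", "Hawaii"),
  rate  = c(2.8, 1.3, 4.2, 0.5)
)

# As a plain factor, the levels are alphabetical
alphabetical <- levels(factor(murders$state))
alphabetical
# [1] "Alabama"  "Colorado" "Delaware" "Hawaii"

# reorder() sets the level order by rate; negating the rate puts the highest first
murders$state <- reorder(murders$state, -murders$rate)
levels(murders$state)
# [1] "Delaware" "Alabama"  "Colorado" "Hawaii"
```

Because plotting functions follow the factor levels, a bar graph of `murders` would now show the states from the highest rate to the lowest.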

## 5. Show the Data

For visualizations that require showing distributions:
<li> Use scatter plots instead of bar plots, so that every data point is shown </li>
<li> Add jitter, a small random horizontal shift to each point, when the exact abscissae do not matter and the focus is on the ordinates </li>
<li> Use alpha blending (semi-transparent points) so that points that fall on top of one another appear darker instead of hiding each other </li>

**Note.** When there are too many points to distinguish, showing the distribution is more helpful than showing the individual points.
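
Jitter and alpha blending can be combined in one layer. The following is a sketch assuming the `ggplot2` package is installed; the data frame `heights` is simulated and purely illustrative.

```r
library(ggplot2)

# Simulated data (illustrative only): one numeric measurement for two groups
set.seed(1)
heights <- data.frame(
  group  = rep(c("A", "B"), each = 200),
  height = c(rnorm(200, mean = 64, sd = 3), rnorm(200, mean = 69, sd = 3))
)

# width sets the horizontal jitter, height = 0 leaves the ordinates untouched,
# and alpha makes overlapping points appear darker
p <- ggplot(heights, aes(x = group, y = height)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.2)

p
```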

## 6. Use Common Axes to Ease Comparisons

Align plots horizontally to see vertical changes, <br>
align plots vertically to see horizontal changes, <br>
and keep the axes the same across the plots being compared.

When appropriate, show distributions (using histograms, for example) instead of individual points.

## 7. Consider Transformations

1. When changes in the data are multiplicative, use a log transformation (e.g., for population).
2. To better see fold changes in odds, use the logistic (logit) transformation.
3. The square-root transformation is useful for count data.
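
The three transformations can be sketched in base R as follows. All input values are hypothetical and chosen only to make the effect easy to see.

```r
# Log transformation: multiplicative steps become equal additive steps
population <- c(1e3, 1e4, 1e5)        # hypothetical values, each 10 times the last
log_population <- log10(population)   # 3 4 5

# Logistic (logit) transformation: the log of the odds p / (1 - p)
p <- c(0.25, 0.5, 0.75)
logit <- log(p / (1 - p))             # symmetric around 0: odds of 1:3 and 3:1
                                      # are equally far from even odds

# Square-root transformation for count data
counts <- c(0, 1, 4, 9, 100)          # hypothetical counts
sqrt_counts <- sqrt(counts)           # 0 1 2 3 10
```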

## 8. Compared Visual Cues Must Be Adjacent to Ease Comparisons

When comparing data across two different attributes (say, year and country), `ggplot` arranges the groups alphabetically by default (so all the countries in 1970 may appear before all the countries in 2010, for example).

What we really want is to place each country's values for the two years side by side, to ease comparisons.

Another thing we can do is use a color for all countries in 1970, and another color for all countries in 2010, to further ease comparisons when countries are placed adjacent to each other.
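
One way to get both effects is a dodged bar chart. This is a sketch assuming the `ggplot2` package is installed; the data frame `example_data` and its values are made up for illustration.

```r
library(ggplot2)

# Made-up values (illustrative only) for three countries in two years
example_data <- data.frame(
  country = rep(c("Country A", "Country B", "Country C"), times = 2),
  year    = factor(rep(c(1970, 2010), each = 3)),
  value   = c(62, 48, 52, 79, 66, 61)
)

# x = country keeps each country's two years next to each other;
# fill = year gives every 1970 bar one color and every 2010 bar another
p <- ggplot(example_data, aes(x = country, y = value, fill = year)) +
  geom_col(position = "dodge")

p
```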

## 9. Slope Charts

Instead of using scatter plots, use slope charts when you are comparing data
<li> of the same type, </li>
<li> having small differences, </li>
<li> for a relatively small number of comparisons. </li>

Slope charts rely on _angle_ as the visual cue (the slope of each line), but position is also available, since the endpoints of each line show the values being compared.
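
A minimal slope chart can be built from lines and points. This is a sketch assuming the `ggplot2` package is installed; the data frame `changes` and its values are made up for illustration.

```r
library(ggplot2)

# Made-up values (illustrative only) for three countries in two years
changes <- data.frame(
  country = rep(c("Country A", "Country B", "Country C"), times = 2),
  year    = factor(rep(c(1970, 2010), each = 3)),
  value   = c(62, 48, 52, 79, 66, 61)
)

# group = country draws one line per country between the two years;
# the slope of each line shows the size and direction of the change
p <- ggplot(changes, aes(x = year, y = value, group = country, color = country)) +
  geom_line() +
  geom_point()

p
```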

## 10. Some More Guidelines

Whenever needed, remember to:
1. Encode a third variable
2. Avoid pseudo-3D and gratuitous 3D plots
3. Avoid too many significant digits in tables

The notebook ends here