# Introduction

Charts and graphics describe data and allow us to reason about the relationships within them.

On many occasions, visualisations are the most effective way to explore or summarise data, especially for large datasets.

Let's look at a few examples that highlight the importance of data visualisation.

## Anscombe's quartet

https://en.wikipedia.org/wiki/Anscombe%27s_quartet

<img src="https://upload.wikimedia.org/wikipedia/commons/e/ec/Anscombe%27s_quartet_3.svg" width="650"/>
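
The point of the quartet can also be checked numerically. The sketch below uses the values from Anscombe's 1973 paper (typed in by hand here, so worth double-checking against the linked article) and only the standard library; the helper `pearson` and the names `quartet` and `summary` are ours, introduced for illustration. The four datasets share nearly identical means, variances and correlation, yet the plots above look nothing alike.

```python
from statistics import mean, variance

# Anscombe's quartet: four datasets of 11 (x, y) points each.
# Datasets I, II and III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# For each dataset: mean of x, variance of x, mean of y, variance of y, correlation
summary = {
    name: (mean(xs), variance(xs), mean(ys), variance(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, stats in summary.items():
    print(name, [round(s, 2) for s in stats])
```

The printed rows are almost indistinguishable, which is exactly why the summary statistics alone would hide the differences that the charts reveal at once.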

## Tide Tables


Predicting the rise and fall of the sea has been of enormous importance throughout history.

https://en.wikipedia.org/wiki/Theory_of_tides

Even in the 19th century, mechanical tide-predicting computers were designed and built for this purpose:

https://en.wikipedia.org/wiki/Tide-predicting_machine

https://www.youtube.com/watch?v=uFhjGlriDYE

The results of these computations are frequently presented in a table:

<img src="https://upload.wikimedia.org/wikipedia/commons/3/34/Tide_table_01.jpg" width="450"/>

Below you can see a typical table from a website (https://www.tideschart.com):

![](https://wild.ucm.es/img/tidetable.png)

However, it is not easy to spot the tide cycles, their evolution and their variations in a table. Let's see the same data in a chart:

![](https://wild.ucm.es/img/tidechart.png)

Now, with just a glance at the chart, you can see the periodic nature of the tide and how the maximum and minimum heights change over time.

The chart also includes some interesting details, such as the day/night periods and the exact moment (red line) at which we are consulting it.

## Large dataset

Finally, let's see how visualisations help with large datasets.

Let's download this dataset...

In [None]:
! gdown "1j53EVbJpCCwOSOOq7KP_ZU2bJaDEmA8x"

... and load the file into a dataframe.

In [None]:
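import pandas as pd

# Show at most 8 rows but every column when displaying a dataframe
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', None)

# The file encodes missing values as the string 'na'; read them as NaN
dfl = pd.read_csv('aps_failure_training_set.csv', na_values='na')
dfl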

This is a fairly large dataset, with 171 columns and 60000 rows, and it contains many undefined (`NaN`) values. Let's see how a little code can help us understand them.

With just a few lines of code we create an image of the first 500 rows that shows the undefined values in pale yellow, while the black areas correspond to proper values.

In [None]:
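import seaborn as sns

# Use a wide figure so that the 171 columns remain distinguishable
sns.set(rc={"figure.figsize":(20, 10)})
# One cell per value for the first 500 rows; missing values are highlighted
sns.heatmap(dfl[:500].isnull(), cbar=False)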
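
The heatmap shows where values are missing; a numeric summary per column can complement it. The following is a minimal sketch on a tiny made-up dataframe (`toy` and its values are hypothetical stand-ins, not the APS data), assuming only that pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataframe; the values are made up for illustration.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": [5.0, 6.0, 7.0, 8.0],
})

# isnull() marks each missing cell with True; the mean of a boolean column
# is the fraction of True values, i.e. the share of missing entries.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to `dfl` instead of `toy`, the same expression would rank the 171 columns by their share of undefined values, which tells you which vertical stripes in the heatmap deserve a closer look.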


Let's add a little more code, not much really, to produce an interactive widget that helps to explore the dataset.


In [None]:
from ipywidgets import interact_manual
import matplotlib.pyplot as plt

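def plot(min_value, max_value):
    # Draw the missing-value heatmap for the rows from min_value up to max_value
    if min_value < max_value:
        print('Preparing the chart, please wait... ', min_value, max_value)
        sns.heatmap(dfl[min_value:max_value].isnull(), cbar=False)
        plt.show()
    else:
        print('min_value should be smaller than max_value')

# Each (min, max, step) tuple becomes a slider; the chart is drawn when the button is pressed
p = interact_manual(plot, min_value=(0,59500,500), max_value=(500,60000,500))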

I hope these simple examples have shown you the importance of using visualisations to better understand data.

<hr>
Carlos Gregorio Rodríguez

Universidad Complutense de Madrid

<img src="https://static0.makeuseofimages.com/wordpress/wp-content/uploads/2019/11/CC-BY-NC-License.png" alt="cc by nc" width="200"/>

https://creativecommons.org/licenses/by-nc/4.0/