# Introduction to Data Visualisation with Python

### SWD 7

#### Maeve Murphy Quinlan

```shell
jupyter nbconvert notebook_name.ipynb --to slides --post serve --no-input --no-prompt
```

## Research Computing Team and Service

### Here to support research(ers)
- Provide training
- Support users of Grid and Cloud Computing platforms
- Provide consultancy
    - To develop project proposals
    - To help recruit people with specialist skills
    - To work directly on research projects

For details, please see our website: https://arc.leeds.ac.uk
Contact us via the IT Service Desk: https://bit.ly/arc-help

# Course objectives

#### This course should help you to:

- Become familiar with best practice in scientific data visualisation
- Build a toolkit of resources to help you create objective, informative plots
- Use Python and Python libraries such as Matplotlib and Seaborn to create aesthetically pleasing graphics
- Be aware of common pitfalls in data visualisation

# What is data visualisation?


- ### A way of graphically representing data


- ### Can include plots, graphs, charts, schematics, maps, infographics

- ### Commonly found in research papers, publications and policy documents, on conference posters, in talks, and in news reports...

# Data Visualisation

- ### Makes it easier to identify outliers, relationships, patterns and trends in data
- ### Reduces complex and large datasets to a more digestible and understandable format
- ### Saves time and allows for quicker scanning and understanding of data
- ### Can illustrate a story and present results in a logical, clear way
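A classic illustration of the first point is Anscombe's quartet: four small datasets whose means, variances and x/y correlations are almost identical, but whose scatter plots look completely different (a straight line, a curve, a line with one outlier, and a single high-leverage point). The sketch below hard-codes the published quartet values and uses NumPy and Matplotlib; the variable names are our own choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Anscombe's quartet (Anscombe, 1973): datasets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

stats = {}
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, (x, y)) in zip(axes.flat, quartet.items()):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Summary statistics: nearly identical for all four datasets
    stats[name] = {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        "var_y": y.var(ddof=1),
        "r": np.corrcoef(x, y)[0, 1],
    }
    # The scatter plots expose structure the numbers above hide
    ax.scatter(x, y)
    ax.set_title(f"Dataset {name}")

for name, s in stats.items():
    print(f"{name}: mean y = {s['mean_y']:.2f}, r = {s['r']:.3f}")

fig.tight_layout()
# In a notebook the figure is shown automatically; in a script, call plt.show()
```

The printed statistics are effectively the same for every dataset, so only the plots reveal the outlier in III and the single influential point in IV.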

## Good Data Visualisation

- ### Objective and unbiased representation of the data

- ### Should deliver meaningful results and illustrate something useful about the data

- ### All information required to understand it should be available in the legend and/or caption (but may require domain-specific knowledge)

- ### Can also be used to explore your data: figures and plots for you and your collaborators, not for publication!
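As a minimal sketch of a figure that carries its own context, the example below labels both axes (with units), adds a title and a titled legend, so that a reader needs little beyond the figure and its caption. The temperature data are synthetic, and the site names and units are hypothetical placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical, synthetic data: daily temperatures at two made-up sites
rng = np.random.default_rng(seed=42)
days = np.arange(1, 31)
site_a = 12 + 3 * np.sin(days / 5) + rng.normal(0, 0.5, days.size)
site_b = 9 + 2 * np.sin(days / 5) + rng.normal(0, 0.5, days.size)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(days, site_a, label="Site A (hypothetical)")
ax.plot(days, site_b, label="Site B (hypothetical)")

# Axis labels include units, so the plot can be read without the surrounding text
ax.set_xlabel("Day of month")
ax.set_ylabel("Mean daily temperature (°C)")
ax.set_title("Daily temperature at two sites")
ax.legend(title="Location")
fig.tight_layout()
```

For exploratory plots that only you or your collaborators will see, Matplotlib's defaults with minimal labelling are usually enough; the extra care is for figures that others must interpret on their own.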

![](figs/Nightingale.jpg)

[The Causes of Mortality in the Crimean War (Florence Nightingale, 1857)](https://longitude.ft.com/10-data-visualisations-that-changed-the-world-of-information-design/)

![](figs/Priestley.jpg)

[A New Chart of History (Joseph Priestley, 1769)](https://longitude.ft.com/10-data-visualisations-that-changed-the-world-of-information-design/)

<img src="figs/stanfords-geological-full.jpg" style="width: 50%;" alt="Stanford's 1915 Geological Map of Ireland in colour"/>

Stanford's 1915 Geological Map of Ireland in colour

<img src="figs/stanfords-geological-zoom.jpg" style="width: 50%;" alt="Zoom of the West Cork region of Stanford's 1915 Geological Map of Ireland"/>

Zoom of the West Cork region

<img src="figs/geo_map_key.jpg" style="width: 35%; float: left;" alt="Key to Stanford's 1915 Geological Map of Ireland"/>


- ### Legends provide context for the data
- ### Legend order can imply significance:
    - ### Geological ages
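In Matplotlib, the legend is built by default from the artists on the axes, broadly in the order they were added. A minimal sketch of choosing the order deliberately, here youngest to oldest as in a geological key, is to collect the entries with `ax.get_legend_handles_labels()` and pass them to `ax.legend()` in the order you want. The rock units are real geological periods, but the outcrop areas are hypothetical numbers for illustration only.

```python
import matplotlib.pyplot as plt

# Hypothetical outcrop areas (km²) for four rock units, plotted in an arbitrary order
area_km2 = {
    "Carboniferous": 30.0,
    "Quaternary": 10.0,
    "Devonian": 40.0,
    "Jurassic": 20.0,
}
# Desired legend order: youngest to oldest, as in a geological map key
age_order = ["Quaternary", "Jurassic", "Carboniferous", "Devonian"]

fig, ax = plt.subplots(figsize=(6, 4))
for unit, area in area_km2.items():
    ax.bar(unit, area, label=unit)  # each call takes the next colour in the cycle
ax.set_ylabel("Outcrop area (km²)")

# Rebuild the legend in age order rather than plotting order
handles, labels = ax.get_legend_handles_labels()
handle_for = dict(zip(labels, handles))
ax.legend([handle_for[unit] for unit in age_order], age_order,
          title="Unit (youngest first)")
```

Mapping labels to handles through a dictionary keeps each colour attached to the right unit, whatever order the data happened to be plotted in.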

## But we now have a better understanding of what makes data visualisation work...



## What's needed for **Good Data Visualisation**?


1. ### Evidence-based best practices for objective data representation

2. ### Technical skills in tools to build graphics

3. ### Domain-specific knowledge of the underlying dataset

4. ### Creativity and subjective taste

# Why did we make this course?

1. #### Evidence-based best practices for objective data representation
2. #### Technical skills in tools to build graphics
3. #### Domain-specific knowledge of the underlying dataset
4. #### Creativity and subjective taste

### Researchers frequently receive no guidance on points 1 and 2 above, even though data visualisation is a cornerstone of good research

#### We hope this course provides a jumping-off point for you to keep learning in these areas, and to start applying best practice to your data visualisation today!

# Why Python?

 - ### A flexible, human-readable programming language with a simple syntax
 - ### Commonly used for:
     - ### Data analytics
     - ### Scientific programming
     - ### Data visualisation
 - ### It has a huge collection of specialised libraries to solve specific research problems
 - ### It is open source

# Why Python?

 - ### Allows us to create reproducible graphics and share the source code
 - ### Enables us to build efficient workflows and research pipelines, with data cleaning, statistical analysis and plotting in the same language
 - ### Has many libraries for domain-specific plots, data types and visualisations
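A minimal sketch of such a pipeline in a single script: clean a small table with pandas, compute a summary statistic, plot it with Matplotlib and save the figure. Because the data generation, analysis and plotting code (including the random seed) live together, rerunning the script regenerates the same figure. The data are synthetic, and the column names, site labels and output filename are hypothetical.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 1. Data: synthetic measurements with a fixed seed, so every run is identical
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "site": np.repeat(["North", "South", "East"], 20),
    "value": rng.normal(loc=10, scale=2, size=60),
})
df.loc[[3, 25], "value"] = np.nan  # simulate two missing readings

# 2. Cleaning: drop rows with missing values
clean = df.dropna(subset=["value"])

# 3. Analysis: mean and standard deviation per site (groupby sorts sites alphabetically)
summary = clean.groupby("site")["value"].agg(["mean", "std"])

# 4. Plotting: bar chart of means with one-standard-deviation error bars
fig, ax = plt.subplots(figsize=(5, 4))
ax.bar(summary.index, summary["mean"], yerr=summary["std"], capsize=4)
ax.set_xlabel("Site")
ax.set_ylabel("Mean value (arbitrary units)")
fig.tight_layout()

# 5. Output: save the figure (here to a temporary directory)
out_path = Path(tempfile.gettempdir()) / "mean_by_site.png"
fig.savefig(out_path, dpi=150)
```

Whichever spread measure you choose for error bars (standard deviation here), state it in the caption so the figure remains self-explanatory.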
   
## When to *not* use Python (or not *just* Python)

- Not ideal for building infographics or more illustrative charts
- While geospatial libraries exist, specialist programs such as QGIS are better suited to large-scale mapmaking (and can be used alongside Python tools)

# This Tutorial

## 1. Theory: evidence-based good practice

## 2. Practical: Technical skills in tools to build graphics

For you to pursue outside this course:

### 3. Domain-specific knowledge

### 4. Personal taste

## <p style="text-align:center;">bit.ly/vis-poll</p>

<img src="figs/qr_code.png" style="width: 50%; text-align:center;" alt="QR code directing to the poll url"/>


# 1. Evidence-based good practice

![Paper titles](figs/headlines.jpg)

## Good practice

### Much of the scientific research has been distilled into "Top Ten"-style lists...

*Kelleher, Christa, and Thorsten Wagener. 2011. “Ten Guidelines for Effective Data Visualization in Scientific Publications.” Environmental Modelling & Software: With Environment Data News 26 (6): 822–27.*

*Midway, Stephen R. 2020. “Principles of Effective Data Visualization.” Patterns (New York, N.Y.) 1 (9): 100141.*

*Rougier, Nicolas P., Michael Droettboom, and Philip E. Bourne. 2014. “Ten Simple Rules for Better Figures.” PLoS Computational Biology 10 (9): e1003833.*

### We are going to pick and choose useful pieces from each of these


# 5 key ideas when building plots

1. ### Who is looking at your plot and why?

2. ### What's your central message?

3. ### Encode your data intentionally

4. ### Compose figures sensibly

5. ### Simplify, remove, clarify

# Who is looking at your plot and why?

### Who?

- ### General public, non-research audiences

- ### Domain specific research audiences

- ### Your collaborators

- ### You!

# Why?

<style>
.column {
  float: left;
  width: 50%;
}

/* Clear floats after the columns */
.row:after {
  content: "";
  display: table;
  clear: both;
}
</style>

<div class="row">
  <div class="column">Text here</div>
  <div class="column">Text here</div>
</div>

