<div style="background:#E9FFF6; color:#440404; padding:8px; border-radius: 4px; text-align: center; font-weight: 500;">IFN619 - Data Analytics for Strategic Decision Makers (2023_sem1)</div>

# IFN619 :: C2-Enhanced Visualisations

## How do we interpret visualisations?

From a human-data interaction perspective, a data visualisation carries no meaning until a user starts interpreting it. Interpretation is a subjective process shaped by the user's previous knowledge, culture, environment and context. Understanding the components of a visualisation provides a programmatic way to develop visualisations, but the human side still has to be considered.

Just as language can be interpreted in different ways depending on the context, so can visualisations. For instance, the word "cup" has different meanings depending on the context. In a cooking context a cup is a measurement, in a sports context a cup is a trophy, and in a meal context a cup is a liquid container. All the meanings are correct, but the context of use and the user's previous knowledge determine which meaning applies.

<img src="./graphics/information_processing.jpg">

## How can visualisations be enhanced for better interpretation?

The most important way to enhance a visualisation is to understand the effect of different aesthetics on human interpretation. The two most important aesthetics are colour and position, and each affects interpretation differently. Additionally, interactions can be used to provide further information. The most common interaction is hovering over data points to obtain information on demand.

---

## The role of colour in interpretation

Colour as an aesthetic can evoke different feelings and emotions when a visualisation is interpreted. For instance, the colour red has different meanings in different contexts: it can mean love, passion or danger. Thus, the choice of colour can have a great impact on the interpretation of a visualisation.

### Colour theory

Colour theory is the collection of rules and guidelines which designers use to communicate with users through appealing colour schemes in visual interfaces. To pick the best colours every time, designers use a colour wheel and refer to extensive collected knowledge about human optical ability, psychology, culture and more.

The colour wheel is a useful tool for designing visualisations.

<img src="./graphics/colour-wheel.jpg" style="width: 400px">

#### Colour components
Different shades or tints, saturations and hues are all possible while staying within the same region of the colour wheel.
- *Hue* - This is the position on the colour wheel, and represents the base colour itself
- *Saturation* - This represents how rich a colour is, determined by the amount of white light mixed with the hue. At 0% saturation the colour becomes a shade of gray, while 100% is the pure colour with no white light
- *Brightness/Lightness* - The amount of shade (black) mixed with the hue. At 0% brightness the colour is black
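
The three components can be explored numerically. The following is a minimal sketch using Python's standard `colorsys` module, assuming the HSV colour model (hue, saturation, value) as a stand-in for the hue, saturation and brightness described above. The helper name `hsv_to_hex` is introduced here for illustration only.

```python
import colorsys

def hsv_to_hex(hue_deg, saturation, brightness):
    """Convert hue (degrees) and saturation/brightness (0-1) to a hex colour."""
    r, g, b = colorsys.hsv_to_rgb(hue_deg / 360, saturation, brightness)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

pure_red = hsv_to_hex(0, 1.0, 1.0)     # full saturation, full brightness
gray = hsv_to_hex(0, 0.0, 0.5)         # 0% saturation: a shade of gray
black = hsv_to_hex(0, 1.0, 0.0)        # 0% brightness: black
pure_blue = hsv_to_hex(240, 1.0, 1.0)  # same settings, only the hue changed

print(pure_red, gray, black, pure_blue)
```

Changing only one argument at a time is a quick way to see which component is responsible for a visible change in the colour.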

<img src="./graphics/colour-components.png" style="width: 400px">

#### Colour schemes
The scheme is the way the colours are combined.
- *Sequential schemes* - Uses saturation to differentiate colours

<img src="./graphics/colour-sequential.jpg" style="width: 400px" >

- *Divergent schemes* - Uses two sequential schemes sharing a common colour (usually white) in the middle and diverging using brightness towards each end with different hues

<img src="./graphics/colour-divergent.jpg" style="width: 400px" >

- *Spectral schemes* - Uses a large segment of hues keeping the saturation and brightness equal

<img src="./graphics/colour-spectral.jpg" style="width: 400px" >
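
The three kinds of scheme can be approximated in a few lines. This is a rough sketch using the standard `colorsys` module and the HSV model; the function names are hypothetical, and here the divergent ramps move away from white by increasing saturation, which is only one possible way to build them.

```python
import colorsys

def to_hex(rgb):
    """Format an (r, g, b) tuple of 0-1 floats as a hex colour string."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)

def sequential(hue_deg, steps):
    """One hue; saturation rises from 0 (white) to 1 (pure colour)."""
    return [to_hex(colorsys.hsv_to_rgb(hue_deg / 360, i / (steps - 1), 1.0))
            for i in range(steps)]

def divergent(low_hue_deg, high_hue_deg, steps_per_side):
    """Two sequential ramps joined at a shared white midpoint."""
    low = sequential(low_hue_deg, steps_per_side + 1)[::-1]  # colour -> white
    high = sequential(high_hue_deg, steps_per_side + 1)[1:]  # skip the duplicate white
    return low + high

def spectral(steps, saturation=1.0, brightness=1.0):
    """Many hues; saturation and brightness are held constant."""
    return [to_hex(colorsys.hsv_to_rgb(i / steps, saturation, brightness))
            for i in range(steps)]

greens = sequential(120, 5)
red_white_green = divergent(0, 120, 2)
rainbow = spectral(6)

print(greens)
print(red_white_green)
print(rainbow)
```

Any of these lists can be passed to the `color_continuous_scale` argument of a Plotly Express chart, which accepts a list of colour strings.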

#### Colour temperature
Colours can convey emotive content as well as contribute to the look and feel of your visualisation. The aim here is to move people, evoking passions and feelings in our readers. Note that people’s culture, gender and experiences will also affect the way that colours resonate with them.
- *Warm colours* – These are colours located on the half of the colour wheel that includes yellow, orange, and red. These colours are said to reflect feelings such as passion, power, happiness, and energy
- *Cool colours* – These are colours located on the other side of the colour wheel, including green, blue, and purple. Cool colours are said to reflect calmness, meditation, and soothing impressions
- *Neutral colours* – These are not said to reflect any particular emotions. These colours include gray, brown, white, and black

#### Colour emotions
Colour can play a vital role in how people feel when they see an image since certain colours tend to be associated with certain emotions. There are many complex reasons why a colour creates a psychological reaction in a viewer, and it depends on context, societal influences and interactions with other colours rather than the colour’s inherent properties alone.
- *Red* - excitement, aggression, danger, romance
- *Yellow* - warmth, friendliness, warning
- *Green* - nature, sickliness, envy
- *Blue* - relaxation, coldness, grief
- *White* - cleanliness, innocence, emptiness
- *Black* - oppression, calm, power

#### Let's try to see the effect of the colour in different visualisations

We are going to use the Walmart dataset from previous weeks.

In [None]:
import pandas as pd
import plotly.express as px # Data visualisation library
import json
from urllib.request import urlopen

In [None]:
# Load the data from the CSV file
walmart = pd.read_csv(???)
walmart

In [None]:
# Check the dataframe data types
walmart.dtypes

In [None]:
# Transform the Order Date to a DateTime format
walmart["Order Date"] = pd.to_datetime(walmart["Order Date"], format=???)
walmart["Ship Date"] = pd.to_datetime(walmart["Ship Date"], format=???)
walmart.dtypes

In [None]:
# Plot a scatter plot using the Walmart dataframe
walmart_scatter = px.scatter(walmart, 
    x="Order Date", 
    y="Profit", 
    color="Profit", 
    color_continuous_midpoint=0, # Specify the value that is going to be the middle of the colour scale
    color_continuous_scale=["green", "white", "red"], # Specify the colours in the scale
    title="Walmart orders profit (2011-2015)")
walmart_scatter.update_layout(
    plot_bgcolor="black" # Specify the background colour of the chart
)
walmart_scatter.show()

The previous visualisation has the colours inverted: green for negative numbers and red for positive numbers. What do you think the effect of this change will be?

In [None]:
# Plot a scatter plot using the Walmart dataframe
walmart_scatter = px.scatter(walmart, 
    x="Order Date", 
    y="Profit", 
    color="Profit", 
    color_continuous_midpoint=0, 
    color_continuous_scale=["red", "white", "green"])
walmart_scatter.update_layout(
    plot_bgcolor="black"
)
walmart_scatter.show()

This visualisation is exactly the same, but now the colours represent what we are used to seeing: red for negative values and green for positive values.
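
To see why the midpoint matters for a divergent scale, the sketch below computes where a value lands on the colour scale. It assumes the colour range is widened until it is symmetric around the midpoint, which is my understanding of how Plotly treats a colour axis midpoint. The function name and the profit values are hypothetical.

```python
def position_on_scale(value, data_min, data_max, midpoint):
    """Return where `value` falls on a colour scale (0 = first colour, 1 = last).

    The colour range is widened so that it is symmetric around `midpoint`,
    which places the midpoint exactly halfway along the scale (0.5).
    """
    half_range = max(abs(data_max - midpoint), abs(data_min - midpoint))
    low, high = midpoint - half_range, midpoint + half_range
    return (value - low) / (high - low)

# Hypothetical profit values, for illustration only
profit_min, profit_max = -400, 1000

zero_profit = position_on_scale(0, profit_min, profit_max, midpoint=0)
biggest_loss = position_on_scale(-400, profit_min, profit_max, midpoint=0)
biggest_gain = position_on_scale(1000, profit_min, profit_max, midpoint=0)

print(zero_profit, biggest_loss, biggest_gain)
```

With a three-colour scale, a profit of zero sits on the middle colour (white), so losses and gains are always drawn in different hues, even when the data is not balanced around zero.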

What if we use a different colour scheme to represent the data, for example a sequential scheme?

In [None]:
# Plot a scatter plot using the Walmart dataframe
walmart_scatter = px.scatter(walmart, 
    x="Order Date", 
    y="Profit", 
    color="Profit", 
    color_continuous_midpoint=0, 
    color_continuous_scale=["white", "green"])
walmart_scatter.update_layout(
    plot_bgcolor="black"
)
walmart_scatter.show()

Does this convey a different meaning compared to the divergent colour scheme?

In [None]:
# Plot a scatter plot using the Walmart dataframe
walmart_scatter = px.scatter(walmart, 
    x="Order Date", 
    y="Profit", 
    color="Profit", 
    color_continuous_midpoint=0, 
    color_continuous_scale=["white", "red"])
walmart_scatter.update_layout(
    plot_bgcolor="black"
)
walmart_scatter.show()

Does changing the colour make a difference?

In [None]:
# Plot a scatter plot using the Walmart dataframe
walmart_scatter = px.scatter(walmart, 
    x="Order Date", 
    y="Profit", 
    color="Profit", 
    color_continuous_midpoint=0, 
    color_continuous_scale=["white", "blue"])
walmart_scatter.update_layout(
    plot_bgcolor="black"
)
walmart_scatter.show()

Which visualisation conveys the message of profit/loss better? Why?

---

## The role of position in interpretation

Position is one of the most powerful aesthetics to use when designing visualisations. How position is conveyed can have a big impact on the interpretation of a visualisation. What is shown first in the chart and the position of the axes can lead to cognitive biases that need to be considered.

### Cognitive bias

A cognitive bias is a subconscious error in thinking that leads you to misinterpret information from the world around you and affects the rationality and accuracy of decisions and judgments.

Biases are unconscious and automatic processes designed to make decision-making quicker and more efficient. Cognitive bias is often a result of your brain’s attempt to simplify information processing: we receive roughly 11 million bits of information per second, yet we can only process about 40 bits per second. Therefore, we often rely on mental shortcuts (called heuristics) to help make sense of the world with relative speed. As such, these errors tend to arise from problems related to thinking: memory, attention, and other mental mistakes.

There are more than 200 cognitive biases identified (see the full list in the [Cognitive Bias Codex](https://commons.wikimedia.org/wiki/File:Cognitive_bias_codex_en.svg)) but the following are the most important in terms of visualisations:

#### 1. Inattentional blindness

This occurs when a person fails to notice a stimulus that is in plain sight because their attention is directed elsewhere.

The following factors contribute to this cognitive bias:
- Certain sensory stimuli (such as bright colours) and cognitive stimuli (such as something familiar) are more likely to be processed, and so stimuli that don’t fit into one of these two categories might be missed.
- When we focus a lot of our brain’s mental energy on one stimulus, we are using up our cognitive resources and won’t be able to process another stimulus simultaneously.

To see this bias in action we are going to use the life expectancy at birth dataset from [Gapminder](https://www.gapminder.org/data/documentation/gd004/).

In [None]:
gap = px.data.gapminder()
gap

In [None]:
gap_new = gap[gap["year"] == 2007]
gap_new

In [None]:
gap_new_fig = px.bar(gap_new[(gap_new["country"] == "Japan") | (gap_new["country"] == "Korea, Rep.")], 
    x="country", 
    y="lifeExp", 
    title="Japan has a higher life expectancy than Korea",
    labels={
        "lifeExp": "Life expectancy (Years)",
        "country": "Countries"
    })
gap_new_fig.update_layout(
    title_font_size=25,
    title_x=0.5,
)
gap_new_fig.update_xaxes(
    title_font_size=15,
    tickfont_size=12
)
gap_new_fig.update_yaxes(
    title_font_size=15,
    tickfont_size=10,
    dtick=10,
    range = [75, 85]
)
gap_new_fig.show()

The previous visualisation draws the attention of the reader to the bars and gives the impression that Japan has double the life expectancy of Korea. In fact, the difference is only about 4 years (< 5%). However, given the size of the bars and the title, the reader's attention goes directly to the bars. Other settings also make the comparison unfair, such as the Y axis starting at 75 rather than 0, the small number of ticks on the Y axis and the difference in font sizes. Consequently, the reader's first impression is that Japan is far better than Korea.
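
The size of the distortion can be estimated with simple arithmetic. The sketch below compares the ratio of the drawn bar heights when the Y axis starts at 75 with the ratio when it starts at 0. The life expectancy values are rounded approximations used for illustration, and `bar_height_ratio` is a hypothetical helper.

```python
def bar_height_ratio(a, b, axis_start):
    """Ratio of the drawn bar heights when the Y axis starts at `axis_start`."""
    return (a - axis_start) / (b - axis_start)

# Approximate 2007 life expectancies (years), rounded for illustration
japan, korea = 82.6, 78.6

truncated_ratio = bar_height_ratio(japan, korea, axis_start=75)  # axis starts at 75
honest_ratio = bar_height_ratio(japan, korea, axis_start=0)      # axis starts at 0

print(f"Bars drawn from 75: Japan looks {truncated_ratio:.2f}x Korea")
print(f"Bars drawn from 0:  Japan looks {honest_ratio:.2f}x Korea")
```

The truncated axis makes one bar look roughly twice as tall as the other, while the underlying values differ by only a few percent.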

A fairer comparison would be the following:

In [None]:
gap_new_fig = px.bar(gap_new[(gap_new["country"] == "Japan") | (gap_new["country"] == "Korea, Rep.")], 
    x="country", 
    y="lifeExp", 
    title="Life expectancy in Japan and Korea",
    labels={
        "lifeExp": "Life expectancy (Years)",
        "country": "Countries"
    },
    text_auto=True) # Display the value of the bars
gap_new_fig.update_layout(
    title_font_size=25,
    title_x=0.5,
)
gap_new_fig.update_traces(
    textfont_size=15
)
gap_new_fig.update_xaxes(
    title_font_size=15,
    tickfont_size=10
)
gap_new_fig.update_yaxes(
    title_font_size=15,
    tickfont_size=10,
    dtick=10,
)
gap_new_fig.show()

Letting the Y axis start at 0 gives an accurate way to visually compare both countries. Additionally, adding the numbers to the bars gives the reader the exact value for each country. The bars in this example look very similar, supporting the argument that the difference is small (< 5%).

#### 2. Anchoring bias

Anchoring bias occurs when we rely too heavily on either pre-existing information or the first piece of information (the anchor) when making a decision.

The following factors contribute to this cognitive bias:
- Once an anchor is established, people insufficiently adjust away from it to arrive at their final answer, and so their final guess or decision is closer to the anchor than it otherwise would have been.
- An anchor changes someone’s attitudes to be more favorable to the anchor, which then biases future answers to have similar characteristics as the initial anchor.
- When people experience a greater cognitive load (the amount of information the working memory can hold at any given time), they are more susceptible to the effects of anchoring.

In [None]:
gap_americas = gap_new[gap_new["continent"] == "Americas"]
gap_americas

In [None]:
gap_ab_fig = px.bar(gap_americas[(gap_americas["country"] == "Canada") | (gap_americas["country"] == "Honduras")].sort_values(by="lifeExp", ascending=False), 
    x="country", 
    y="lifeExp",
    title="Life expectancy in Canada and Honduras",
    labels={
        "lifeExp": "Life expectancy (Years)",
        "country": "Countries"
    })
gap_ab_fig.update_layout(
    title_font_size=25,
    title_x=0.5,
)
gap_ab_fig.show()

The previous visualisation provides an anchor with Canada because it is the first bar (from left to right) and the biggest. Thus, it is likely to be the first element the reader views. This bar is then used for comparison, and the reader is likely to conclude that Honduras's life expectancy is low. However, Honduras's life expectancy is above the average of the dataset (about 67 years). Without the context needed for a proper comparison (how many years count as a low life expectancy), the reader anchors their interpretation to what is in the visualisation.

A better way to describe this dataset would be the following:

In [None]:
gap_ab_fig = px.bar(gap_americas.sort_values(by="lifeExp", ascending=False), 
    x="country", 
    y="lifeExp",
    title="Life expectancy in the Americas",
    labels={
        "lifeExp": "Life expectancy (Years)",
        "country": "Countries"
    })
gap_ab_fig.update_layout(
    title_font_size=25,
    title_x=0.5,
)
gap_ab_fig["data"][0]["marker"]["color"] = ["red" if (c == "Canada") | (c == "Honduras") else "blue" for c in gap_ab_fig["data"][0]["x"]] # Creates a condition for the bar's colour depending on the country
gap_ab_fig.show()

Or, even better, a comparison with the world:

In [None]:
gap_ab_fig = px.bar(gap_new.sort_values(by="lifeExp", ascending=False), 
    x="country", 
    y="lifeExp",
    title="Life expectancy in the World",
    labels={
        "lifeExp": "Life expectancy (Years)",
        "country": "Countries"
    })
gap_ab_fig.update_layout(
    title_font_size=25,
    title_x=0.5,
)
gap_ab_fig["data"][0]["marker"]["color"] = ["red" if (c == "Canada") | (c == "Honduras") else "blue" for c in gap_ab_fig["data"][0]["x"]]
gap_ab_fig.show()

#### 3. Confirmation bias

Confirmation bias refers to the tendency to interpret new information as confirmation of your preexisting beliefs and opinions.

The following factors contribute to this cognitive bias:
- Certain desired conclusions (ones that support our beliefs) are more likely to be processed by the brain and labeled as true.
- Our minds choose to reinforce our preexisting ideas because being right helps preserve our sense of self-esteem, which is important for feeling secure in the world and maintaining positive relationships.

In [None]:
gap_oc = gap[gap["continent"] == "Oceania"]
gap_oc

In [None]:
gap_oc_fig = px.line(gap_oc, 
    x="year", 
    y="lifeExp", 
    color="country",
    title="Australia has the highest Life expectancy in Oceania")
gap_oc_fig.show()

In [None]:
gap_aus = gap_new[(gap_new["continent"] != "Europe") & (gap_new["continent"] != "Asia")]
gap_aus

In [None]:
gap_aus_fig = px.bar(gap_aus.sort_values(by="lifeExp", ascending=False), 
    x="country", 
    y="lifeExp", 
    color="continent",
    title="Australia has the highest Life expectancy in Oceania, the Americas and Africa")
gap_aus_fig.show()

The previous visualisations highlight the good position that Australia has in terms of life expectancy. Most people who believe that Australia is one of the best countries in the world to live in would be happy to see this comparison. However, we selected only the continents where Australia has the highest life expectancy. With Europe and Asia left out, Australia becomes the country with the highest life expectancy.

A better comparison would be:

In [None]:
gap_ab_fig = px.bar(gap_new.sort_values(by="lifeExp", ascending=False), 
    x="country", 
    y="lifeExp",
    title="Life expectancy in the World",
    color="continent",
    labels={
        "lifeExp": "Life expectancy (Years)",
        "country": "Countries"
    })
gap_ab_fig.update_layout(
    title_font_size=25,
    title_x=0.5,
)
gap_ab_fig.show()

## Enhance visualisations with interactions

Several kinds of interaction have been added to visualisations in recent years to make data exploration easier. The most common interaction is called information on demand, or hover. Other types of interaction include zooming, dragging, drilldown, filter and search.

Maps are without doubt one of the visualisations that have benefited the most from interactions. Zooming, hovering, filtering and searching are generally expected in maps because of the widespread use of tools such as Google Maps.

### Working with maps

We are going to use a dataset about airport traffic in Australia. It contains the top 9 airports and the number of aircraft (domestic and international) that passed through those airports in 2022. Additionally, we have the latitude and longitude of the airports.

In [None]:
# Load the dataset
air = pd.read_csv(???)
air["Total"] = ??? + ???
air

This dataset provides the opportunity to plot a map given the latitude and longitude coordinates. 

In [None]:
air_map = px.scatter_mapbox(air, # Create bubbles on top of a map
    lat="Latitude", # Specify the latitude for the bubbles
    lon="Longitude") # Specify the longitude for the bubbles
air_map.update_layout(mapbox_style="stamen-terrain", # Specify the kind of background map
    margin={"r":0,"t":0,"l":0,"b":0}) # Sets the margins for the map
air_map.show()

Given the interactivity provided by maps, it is extremely important to guide the user to the correct place to start. The map must be rendered centred on Australia, as the dataset contains information only about Australia, and with an appropriate zoom. Otherwise, the information can be overlooked, causing inattentional blindness.
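
A reasonable starting centre can be derived from the data itself rather than guessed. The sketch below takes the midpoint of the bounding box around a set of points. The coordinates are rough approximations of four Australian cities, used only for illustration (they are not the airport dataset), and `bounding_box_centre` is a hypothetical helper. The zoom level still has to be chosen by trial and error.

```python
# Approximate coordinates of a few Australian cities, for illustration only
points = {
    "Sydney": (-33.9, 151.2),
    "Perth": (-31.9, 116.0),
    "Darwin": (-12.4, 130.9),
    "Hobart": (-42.8, 147.5),
}

def bounding_box_centre(coords):
    """Return the (latitude, longitude) midpoint of the box enclosing `coords`."""
    coords = list(coords)
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2

centre_lat, centre_lon = bounding_box_centre(points.values())
print(f"Centre the map near lat={centre_lat:.1f}, lon={centre_lon:.1f}")
```

The same idea applies to a dataframe: the minimum and maximum of the latitude and longitude columns give the box, and its midpoint gives the centre.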

In [None]:
air_map = px.scatter_mapbox(air, 
    lat="Latitude", 
    lon="Longitude"
)
air_map.update_layout(mapbox_style="stamen-terrain", 
    mapbox_center_lat = ???, # Specifies the starting latitude for first render 
    mapbox_center_lon = ???,  # Specifies the starting longitude for first render 
    mapbox_zoom = ???, # Specifies the starting zoom for first render 
    margin={"r":0,"t":0,"l":0,"b":0})
air_map.show()

Once the data has been correctly positioned and shown, it is time to use colour to depict the number of aircraft that passed through each airport on the map. It is important to keep the purpose of the visualisation in mind when selecting the appropriate colour.

In [None]:
air_map = px.scatter_mapbox(air, 
    lat="Latitude", 
    lon="Longitude", 
    color="Total", # Specifies the column to be mapped to colour (total aircraft)
    title="Total aircraft through Australian airports in 2022",
    color_continuous_scale = ??? # Sets the colour scale to be used
)
air_map.update_layout(mapbox_style="stamen-terrain", 
    mapbox_center_lat = ???, 
    mapbox_center_lon = ???, 
    mapbox_zoom = ???,
    margin={"r":0,"t":0,"l":0,"b":0})
air_map.show()

The points on the map are quite difficult to see. Thus, it is a good idea to make the dots bigger to provide a clearer visualisation.

In [None]:
air_map = px.scatter_mapbox(air, 
    lat="Latitude", 
    lon="Longitude", 
    color="Total", # Specifies the column to be mapped to colour (total aircraft)
    title="Total aircraft through Australian airports in 2022",
    color_continuous_scale = ??? # Sets the colour scale to be used
)
air_map.update_layout(mapbox_style="stamen-terrain", 
    mapbox_center_lat = ???, 
    mapbox_center_lon = ???, 
    mapbox_zoom = ???,
    margin={"r":0,"t":0,"l":0,"b":0})
air_map.update_traces(marker_size = ???) # Specifies the size of the marker in pixels
air_map.show()

The information on demand needs to include the name of the airport, as the coordinates alone are not really meaningful. Additionally, it could include more information, such as the number of domestic and international aircraft.

In [None]:
air_map = px.scatter_mapbox(air, 
    lat="Latitude", 
    lon="Longitude", 
    color="Total", # Specifies the column to be mapped to colour (total aircraft)
    title="Total aircraft through Australian airports in 2022",
    color_continuous_scale = ???, # Sets the colour scale to be used
    hover_name = ???, # Includes the text to be used as the title when hovered
    hover_data = ???, # Includes additional data to be displayed when hovered
)
air_map.update_layout(mapbox_style="stamen-terrain", 
    mapbox_center_lat = ???, 
    mapbox_center_lon = ???, 
    mapbox_zoom = ???,
    margin={"r":0,"t":0,"l":0,"b":0})
air_map.update_traces(marker_size = ???) # Specifies the size of the marker in pixels
air_map.show()

We can also use other aesthetics, such as size, to depict additional information such as domestic or international flights.

In [None]:
air_map_label = px.scatter_mapbox(air, 
    lat="Latitude", 
    lon="Longitude", 
    hover_name = "Airport", # Includes the name of the airport when hovered 
    hover_data = ["Domestic", "International"], # Includes additional data to be displayed when hovered
    size = "Domestic", # Specifies the size to be mapped to the domestic aircrafts
    color = "Total", # Specifies the column to be mapped to colour (total aircraft)
    title = "Total aircraft through Australian airports in 2022",
    color_continuous_scale = ["white", "green"], # Sets the colour scale to be used
)
air_map_label.update_layout(mapbox_style="stamen-terrain", 
    mapbox_center_lat = -29, 
    mapbox_center_lon = 140, 
    mapbox_zoom = 3,
    margin={"r":0,"t":0,"l":0,"b":0})
air_map_label.show()

The most important consideration when working with maps is that they are highly interactive. Hover and zoom are interactions that users usually expect from maps. Thus, in addition to everything we have learnt before, we need to curate interactions that are going to be meaningful to the users.