# Data Visualisation 

This week we are going to cover some tools for data visualisation, as well as some core concepts in the process of visualising data, and how that process affects what we can visualise and where these tools reach their limits.

More specifically we will focus on situating our data visualisation(s). We will look at the importance of Critical Data Visualisation and how to do it.

## The Dataset

In this notebook, we will be using the Plotly library in Python to visualize fire data from the Brazilian Amazon.

The version I am using is from this [Kaggle page](https://www.kaggle.com/datasets/mbogernetto/brazilian-amazon-rainforest-degradation) which also has other interesting data to explore (deforestation). This dataset is credited to INPE, [National Institute for Space Research](https://www.gov.br/inpe/pt-br). 

Essentially, what we have is data on the number of fire outbreaks in the Brazilian Amazon by *state* (region), *month and year*, *latitude* and *longitude* from 1999 to 2019.

The intention is to use this dataset and these features to visualise the data in a variety of ways, to explore the process of data visualisation and be critical of this process too.

First, we are going to visualise the data over time in a simple graph that can provide us with a clear and intuitive understanding of how the data is changing over time. Time series graphs, which are used to show how a variable changes over time, can help us identify patterns, trends, and relationships that may not be apparent when looking at the raw data.

For this, we are going to use a Plotly time series graph.


In [2]:
pip install plotly

Collecting plotly
  Using cached plotly-5.13.0-py2.py3-none-any.whl (15.2 MB)
Collecting tenacity>=6.2.0
  Using cached tenacity-8.2.1-py3-none-any.whl (24 kB)
Installing collected packages: tenacity, plotly
Successfully installed plotly-5.13.0 tenacity-8.2.1
You should consider upgrading via the '/Users/Yadira/.pyenv/versions/3.7.3/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install --upgrade nbformat


Collecting nbformat
  Downloading nbformat-5.7.3-py3-none-any.whl (78 kB)
Collecting fastjsonschema
  Downloading fastjsonschema-2.16.2-py3-none-any.whl (22 kB)
Collecting jsonschema>=2.6
  Downloading jsonschema-4.17.3-py3-none-any.whl (90 kB)
Collecting importlib-resources>=1.4.0
  Downloading importlib_resources-5.12.0-py3-none-any.whl (36 kB)
Collecting pkgutil-resolve-name>=1.3.10
  Using cached pkgutil_resolve_name-1.3.10-py3-none-any.whl (4.7 kB)
Collecting attrs>=17.4.0
  Downloading attrs-22.2.0-py3-none-any.whl (60 kB)
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0
  Downloading pyrs

In [1]:
import pandas as pd
import plotly.graph_objs as go

# Load the data
df = pd.read_csv('data/brazilian_amazon_fires_1999_2019.csv')

# Create the plot
fig = go.Figure()

fig.add_trace(go.Scatter(
                        x=df['year'], 
                        y=df['firespots'], 
                        mode='lines', 
                        name='firespots')
                        )

# Add labels and title
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Firespots',
    title='Fires over Time')

# Show the plot
fig.show()


![Fires Over Time with Plotly](data/plotly_scatter_plot.gif)
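
The CSV holds several rows per year (one per state, month and location), so passing the raw columns to a line trace draws every row rather than one value per year. Here is a minimal sketch, assuming the goal is a single yearly total. The helper name `yearly_totals` and the three sample rows are hypothetical and only stand in for the real file.

```python
import pandas as pd


def yearly_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum firespots per year so each year becomes a single point."""
    return (
        df.groupby("year", as_index=False)["firespots"]
        .sum()
        .sort_values("year")
    )


# Hypothetical sample rows with the same column names as the CSV
sample = pd.DataFrame(
    {
        "year": [1999, 1999, 2000],
        "state": ["PARA", "MATO GROSSO", "PARA"],
        "firespots": [10, 5, 7],
    }
)

totals = yearly_totals(sample)
print(totals)
```

With the real data, `totals['year']` and `totals['firespots']` could be passed as `x` and `y` to the `go.Scatter` trace in place of the raw columns.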


We can also try a more interactive visualisation using Bokeh. 
Bokeh is useful for creating interactive data visualizations in a web browser. 

In [3]:
pip install bokeh

Collecting bokeh
  Downloading bokeh-2.4.3-py3-none-any.whl (18.5 MB)
Collecting pillow>=7.1.0
  Downloading Pillow-9.4.0-2-cp37-cp37m-macosx_10_10_x86_64.whl (3.3 MB)
Installing collected packages: pillow, bokeh
Successfully installed bokeh-2.4.3 pillow-9.4.0
You should consider upgrading via the '/Users/Yadira/.pyenv/versions/3.7.3/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [1]:
import pandas as pd
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource

# Load the data
df = pd.read_csv('data/brazilian_amazon_fires_1999_2019.csv')

# Create a ColumnDataSource object
source = ColumnDataSource(data=dict(year=df['year'], firespots=df['firespots']))

# Create the plot
fig = figure(title='Fires over Time', x_axis_label='Year', y_axis_label='Firespots')
fig.line(x='year', y='firespots', source=source)

# Show the plot
show(fig)

**Note:** To see the output and interact with it, we have to run the code; the plot then opens in a web browser.

![Fires Over Time with Bokeh](data/bokeh_plot.gif)


By visualizing data in this way, we can quickly identify important insights and trends, and gain a deeper understanding of the data we are working with.

From both visuals we are able to see the number of fire outbreaks over the years, from 1999 to 2019.

But what if we want to see where they happened, that is, in which state (region)?

For that we could try a tool that gives us a more comprehensive view of more data points and lets us visualise the relationships between all three variables simultaneously (*year, firespots, state*).

A 3D scatter plot in Plotly can offer a more comprehensive view of a larger number of data points, as it allows for the visualization of three variables simultaneously.

In [2]:
import pandas as pd
import plotly.graph_objs as go

# Load the data
df = pd.read_csv('data/brazilian_amazon_fires_1999_2019.csv')

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(x=df['year'],
                                   y=df['state'],
                                   z=df['firespots'],
                                   mode='markers',
                                   marker=dict(
                                       size=3, 
                                       color=df['firespots'],                                  
                                       colorscale='Rainbow', 
                                       opacity=0.8))])

# Set the title and axis labels
fig.update_layout(title='Brazilian Amazon Rainforest Fires',
                  scene=dict(xaxis_title='Year',
                             yaxis_title='State',
                             zaxis_title='Firespots'),
                             margin=dict(l=0, r=0, t=40, b=0))

# Show the plot
fig.show()

![Fires Over Time with Plotly 3D scatter plot](data/plotly_3d.gif)


In a 3D scatter plot, the `x`, `y`, and `z` coordinates of each data point are represented by its position in 3D space. This can help us visualize relationships between multiple variables and identify patterns that may not be visible in a simple 2D plot.
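
The same three variables can also be laid out in two dimensions. As a sketch only, the hypothetical helper `fires_by_year_and_state` below pivots the data into a year-by-state table of totals. The sample rows are made up to keep the block self-contained.

```python
import pandas as pd


def fires_by_year_and_state(df: pd.DataFrame) -> pd.DataFrame:
    """Return a table with one row per year and one column per state."""
    return df.pivot_table(
        index="year",
        columns="state",
        values="firespots",
        aggfunc="sum",
        fill_value=0,  # a state with no rows in a year counts as 0
    )


# Made-up rows using the same column names as the CSV
sample = pd.DataFrame(
    {
        "year": [1999, 1999, 1999, 2000],
        "state": ["PARA", "PARA", "ACRE", "ACRE"],
        "firespots": [4, 6, 3, 8],
    }
)

table = fires_by_year_and_state(sample)
print(table)
```

A table like this could feed a heatmap or one line per state, which may be easier to read than a rotated 3D scene when the state axis is categorical.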

The interactivity provided by Plotly can make 3D scatter plots even more useful for data analysis. 

You can interact with the 3D plot to explore it: rotate it, zoom in, and hover over a data point to see the year, the number of firespots and the state (region) in which they happened.

Plotly has many built-in colorscales that you can use to customize the appearance of your data visualizations. Changing the colorscale in a data visualization can be an important step in the process of creating effective and informative visualizations. One of the most powerful aspects of images is colour, which in turn transforms information into meaning. 

Some reasons to change the colorscale are:

1. Emphasize different aspects of the data: Changing the colorscale can help emphasize different aspects of the data that might be of interest to viewers. For example, a diverging colorscale might be useful for highlighting the differences between positive and negative values, while a sequential colorscale might be better for emphasizing the magnitude of values.
2. Improve readability: By choosing the right colorscale, you can improve the readability of a visualization. For example, if the data is displayed on a white background, a dark colorscale might be easier to read, while a light colorscale might be more suitable for a dark background.
3. Enhance aesthetics: Finally, changing the colorscale can be a way to enhance the aesthetics of a visualization, making it more visually appealing and engaging for viewers. A well-chosen colorscale can help make the data stand out and make the visualization more memorable.

Here are some of the most commonly used colorscales in Plotly:
    
    Greys
    YlGnBu
    Greens
    YlOrRd
    Bluered
    RdBu
    Reds
    Blues
    Picnic
    Rainbow
    Portland
    Jet
    Hot
    Blackbody
    Earth
    Electric
    Viridis
    Cividis

**Task**

The colour of the visualisation could be more engaging, clearer and more accessible.

1. Read [this article](https://www.nature.com/articles/s41467-020-19160-7) to find out more about the misuse of colour in science communication.

2. Explore the colorscales and apply the one that makes the visual less complex and more accessible for people with colour-vision disabilities.

# Situating the Data: Critical Data Visualisation 

We could go on forever visualising this dataset in many other plots, but I want to put it into context and apply a feminist and decolonial framework.

How? 

A **decolonial and feminist approach** to data visualization of the Brazilian Amazon rainforest fires would involve engaging in **community-based research** and **participatory design methods**, where **local communities are involved** in the data collection and visualization process. But from where we are, there is no way of involving the communities directly and co-creating a visualisation of this data based on their experience.

Let's try an inbuilt Plotly map that will help us visualise firespots and the territories where they happened.

**Why are territory and geography important in critical data visualisation?**



In [3]:
import pandas as pd
import plotly.express as px

df = pd.read_csv('data/brazilian_amazon_fires_1999_2019.csv')

fig = px.density_mapbox(df, 
                        lat='latitude', 
                        lon='longitude', 
                        z='firespots', 
                        hover_name="state",
                        radius=5,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="stamen-terrain")
fig.show()


![Situating the fires in a map with a Plotly density mapbox](data/plotly_interactive_mapbox.gif)
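
With `center=dict(lat=0, lon=180)` and `zoom=0`, the map opens on a world view and the reader has to pan to Brazil. Here is a minimal sketch, assuming it is acceptable to centre the map on the mean of the coordinates. `map_center` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def map_center(df: pd.DataFrame) -> dict:
    """Return the mean latitude and longitude as a Plotly-style center dict."""
    return dict(
        lat=float(df["latitude"].mean()),
        lon=float(df["longitude"].mean()),
    )


# Made-up rows using the same column names as the CSV
sample = pd.DataFrame(
    {
        "latitude": [-7.0, -11.0],
        "longitude": [-51.0, -56.0],
        "firespots": [100, 50],
    }
)

center = map_center(sample)
print(center)
```

Passing `center=map_center(df)` together with a higher `zoom` (a value around 3 or 4 is a guess to tune by eye) should open the map over the Amazon. If the `stamen-terrain` tiles do not load in your environment, `mapbox_style="open-street-map"` is another built-in style that needs no access token.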

I chose this mapbox feature as a first instance of data visualisation for a few reasons:

1. Maps can be powerful data visuals in situated critical data visualization, as they can help to visualize the geographic context and spatial relationships of data.

2. Maps can provide geographic context to data, allowing viewers to see how data is distributed across different regions and locations. This can help to identify patterns and trends in the data that may not be immediately apparent from other types of visualizations.

**Territory and geography** are important in critical data visualisation because they tell us where this data comes from. It no longer comes from **nowhere**; it comes from **a place - a territory - a geographical location.** This also points us towards something more: if the data comes from a place, that place has culture, people, nature, and/or other physical beings.


## Applying materiality to situate the fires 

Now we can see where in Brazil the most fires happened. But inbuilt data visualisation tools like Plotly, Matplotlib and seaborn may not have the capacity to create visuals that go beyond maps, plots and bars, or to situate context and purpose through the lens I want to work with: a feminist and decolonial framework.

So how can I situate my visualisation in a feminist and decolonial framework when I have no access to working with the communities or to participatory design?

In this context, data visualization could serve as a tool to raise awareness of the destruction of the Amazon rainforest and the impacts on Indigenous communities and the environment, while also challenging dominant narratives and power structures that have contributed to this destruction.


A situated visualisation of the Brazilian Amazon rainforest fires from a decolonial and feminist perspective would recognize and address the **historical** and **ongoing colonial and patriarchal power structures** that have contributed to the **destruction** of the Amazon rainforest and the displacement and marginalization of Indigenous communities.

We can then centre Indigenous communities and the environment. But how, if we don't have that data at hand?

It's time to get creative. I will add photographs of the physical places, which appear when I click on a certain data point. This situates the visualisation by combining physical photography and virtual data visualisation to show a particular place made up of its surrounding environment, with people, nature and culture.

**What other tools are out there to help me achieve this?** 

Many. 

**Task:** Check the [Tactical Tech website](https://visualisingadvocacy.org/resources/visualisationtools.html) to find visualisation tools you can use depending on what you want to visualise.

For now, I want to both show things on a map and create a new map, and for that we are going to use [p5.js](https://p5js.org/).

But first, I need an idea of which places I want to materialise. This is a work in progress and for now I will start with 5 places.

* Let's find the 5 locations that have experienced the most fire outbreaks between 1999 and 2019. This involves some data analysis of the dataset.


In [9]:
import pandas as pd

# Load the CSV data into a Pandas dataframe
fires = pd.read_csv('data/brazilian_amazon_fires_1999_2019.csv')

# Group the data by latitude and longitude and sum the numeric columns for each location
# (numeric_only=True skips the text column 'state')
location_counts = fires.groupby(['latitude', 'longitude']).sum(numeric_only=True)

# Find the 5 locations with the most fire spots
top_locations = location_counts.nlargest(5, 'firespots')

# Print the results
print('Top 5 locations with the most fire spots:')
print(top_locations)


Top 5 locations with the most fire spots:
                       year  month  firespots
latitude   longitude                         
-7.272872  -51.737227  2002      8      37926
-11.093895 -56.204235  2004      9      24886
-11.018350 -56.162919  2007      9      24779
-10.646819 -55.591126  2002      8      23642
-6.767477  -52.125310  2005      8      23635
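
The table above sums `year` and `month` along with `firespots`, and it has no `state` column. As a sketch of an alternative, assuming we only want the firespot total and a state label per coordinate pair, the hypothetical helper `top_fire_locations` aggregates just those two columns. The sample rows are made up.

```python
import pandas as pd


def top_fire_locations(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Total firespots per coordinate pair, keeping the state name."""
    totals = (
        df.groupby(["latitude", "longitude"], as_index=False)
        .agg(state=("state", "first"), firespots=("firespots", "sum"))
    )
    return totals.nlargest(n, "firespots").reset_index(drop=True)


# Made-up rows using the same column names as the CSV
sample = pd.DataFrame(
    {
        "state": ["PARA", "PARA", "MATO GROSSO", "ACRE"],
        "latitude": [-7.0, -7.0, -11.0, -9.0],
        "longitude": [-51.0, -51.0, -56.0, -70.0],
        "firespots": [30, 20, 40, 5],
    }
)

top = top_fire_locations(sample, n=2)
print(top)
```

Keeping the state next to the coordinates gives a first hint of where each place is before looking it up on a map.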


We now have the exact coordinates, which help us find out more about those places. Let's do a Google search and see what it tells us about each one.

The first one, `-7.272872 -51.737227`: Aldea Indigena Aukra, Ourilândia do Norte, State of Pará, Brazil

![aldea](data/aldeaindigenaaukra.jpg)

The second one, `-11.093895 -56.204235`: Itaúba, State of Mato Grosso, Brazil

![Itauba](data/itauba.jpg)

The third one, `-11.018350 -56.162919`: Nova Canaã do Norte, State of Mato Grosso, 78515-000, Brazil

![Nova](data/novacanaadenorte.jpg)

The fourth one, `-10.646819 -55.591126`: Colíder, State of Mato Grosso, Brazil

![Colider](data/colider.jpg)

The fifth one, `-6.767477 -52.125310`: São Félix do Xingu, State of Pará, 68380-000, Brazil, with 23635 fire outbreaks.

![saofelix](data/saofelix.jpg)

Now that we have the images, we can start populating our map.
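
One possible way to hand these five places to the map is a small JSON file that a sketch can load. This is an assumption about the workflow, not a description of the linked project. The field names and the `build_places` helper are hypothetical, while the coordinates, totals, place names and image paths are the ones listed above.

```python
import json

# (latitude, longitude, firespots, place, image path), copied from the results above
ROWS = [
    (-7.272872, -51.737227, 37926, "Ourilândia do Norte, Pará", "data/aldeaindigenaaukra.jpg"),
    (-11.093895, -56.204235, 24886, "Itaúba, Mato Grosso", "data/itauba.jpg"),
    (-11.018350, -56.162919, 24779, "Nova Canaã do Norte, Mato Grosso", "data/novacanaadenorte.jpg"),
    (-10.646819, -55.591126, 23642, "Colíder, Mato Grosso", "data/colider.jpg"),
    (-6.767477, -52.125310, 23635, "São Félix do Xingu, Pará", "data/saofelix.jpg"),
]


def build_places(rows):
    """Turn the tuples into dictionaries with hypothetical field names."""
    keys = ("lat", "lon", "firespots", "place", "image")
    return [dict(zip(keys, row)) for row in rows]


places = build_places(ROWS)
# ensure_ascii=False keeps accented names such as "Colíder" readable in the file
payload = json.dumps(places, ensure_ascii=False, indent=2)
```

Writing `payload` to a file such as `places.json` (a hypothetical name) would let a p5.js sketch read it with `loadJSON` and place one photograph per coordinate pair.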

The project will be called **Amazonia: environmental ecologies of justice** and you can find it [here](https://editor.p5js.org/yadlra/sketches/mhpYfXjyB). 