# Introduction to Data Visualisation
## <font color='grey'>Part 2</font>
# Mapping data types to visual objects
### VMSG 2024
### Prof. Jamie Farquharson, Niigata University, Japan.

Email: jfarquharson@gs.niigata-u.ac.jp
Website:  https://jifarquharson.github.io/

***

# Outline
> ### $\S$1. Data and data types
> ### $\S$2. Plotting data
> ### $\S$3. Beyond the default
> ### $\S$4. Mapping data to chart types
> ### $\S$5. Figure fixer

### This is a Jupyter Notebook
A `Notebook` is a web application for creating and sharing *computational documents*. It is a very useful environment (often embedded within a `JupyterLab` environment) for data science, scientific computing, and other applications. We will be using it to read and display data in a variety of formats. The `Notebook` is composed of two main *cell* types: `Markdown` and `Code`. `Markdown` cells contain formatted text. `Code` cells contain executable code, in this case written in `Python`.

`Code` cells can be executed individually by clicking &rarr;`Run`, or by pressing the `Shift` + `Return` keys. 

We can use `=` to assign a `value` to a `variable`. Let's try:

In [None]:
name = "" ### Enter your name between the " ", and run the cell.

Nothing obvious happens, but we can now use that `variable` in subsequent `cell` operations. Try running the next `cell`.

In [None]:
print('Hello {}! Welcome to VMSG 2024 '.format(name)+u'\U0001F30B')

***
### A very quick introduction to functions
Often, we want to perform a similar operation many times, in which case it is sensible to write a `function`. A function usually takes some `inputs`, performs an operation, and gives some `outputs`. Here is a simple example:

In [None]:
def say_hello():
    response = input("What is your name? ")   # input() returns whatever was typed, as a string
    print('\nHello {}! Welcome to VMSG 2024 '.format(response) + u'\U0001F30B')

This won't do anything until we call the function, like this:

In [None]:
say_hello()
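
`say_hello()` takes no inputs and returns no output (it only prints). As a sketch of the inputs → operation → outputs pattern described above, here is a hypothetical function, `total_alkalis()`, which is not one of the course functions. It takes two inputs and returns their sum:

```python
def total_alkalis(na2o, k2o):
    """Return total alkalis (Na2O + K2O); both inputs are assumed to be in wt%."""
    return na2o + k2o   # the output is handed back to whoever called the function

ta_example = total_alkalis(3.5, 1.5)   # inputs go in, the output is stored in a variable
print("Total alkalis: {} wt%".format(ta_example))
```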

### Here's another function:

In [None]:
def yourNumber():
    """Ask until the user enters an integer from 0 to 2**32 - 1, then return it."""
    while True:   # loop instead of recursing, so the valid number is always returned
        seed = input("Please input an integer value from 0 to 4294967295: ")
        try:
            val = int(seed)
        except ValueError:
            print("That's not an integer")
            continue
        if 0 <= val <= 2**32 - 1:
            print("\nYour number is {}".format(val))
            return val
        print("Number out of range")

### Now, please enter a number when prompted by `yourNumber()` ###

In [None]:
seed = yourNumber()

### Double-check your number, which is saved as a variable named `seed`:

In [None]:
seed

We can (and will) use this number later, at any time, until it is overwritten (i.e. if you were to assign another value to a variable with the name `seed`). Notice that `name` and `seed` are different types of data: a `string` and an `integer`, respectively.
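
You can check a variable's type with the built-in `type()` function, for example `type(name)` or `type(seed)`. The stand-in values below are hypothetical (named so that they do not overwrite your own `name` and `seed`), and they also show why `yourNumber()` converts what you type: `input()` always returns a `string`.

```python
example_name = "Your Name"           # text -> str
example_input = "42"                 # what input() gives back, even if you type digits
example_seed = int(example_input)    # convert the string to an integer (int)

print(type(example_name).__name__)   # str
print(type(example_input).__name__)  # str
print(type(example_seed).__name__)   # int
```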

### This is not a `Python` class, so we'll leave it there. When asked, please &rarr;`Run` the relevant `code` cells.
So that you don't have to write them yourself, a series of `functions` has been written and stored in another `Notebook`. Run the following code so that we can access those functions later.

In [None]:
# Functions.ipynb is where some relevant functions are stored
%run Functions.ipynb


***
# $\S$1. Data and data types

We can import data from an external file. For example, we have a `tsv` (tab-separated values) file called `DatasaurusDozen-wide.tsv`, which we can read into our `Notebook`. We'll read it into a format called a `DataFrame`, and check the first few lines of the result.

In [None]:
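import pandas as pd   # probably already imported by Functions.ipynb; re-importing is harmless
# The header row holds dataset names (dino, dino, ...); pandas renames duplicates to dino, dino.1, ...
data = pd.read_csv("The Datasaurus Dozen/DatasaurusDozen-wide.tsv", sep="\t", header=0)
data = data.drop([0])                            # drop the first data row, which holds non-numeric labels
data = data.apply(pd.to_numeric).reset_index()   # text -> numbers; old row numbers become an "index" column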

In [None]:
data.head()
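
To see what the cleaning steps above do, here is a self-contained sketch on a tiny, invented excerpt in the same "wide" layout (the numbers are made up, not the real Datasaurus values). It assumes, as the `drop([0])` step suggests, that the first row under the header labels each column as `x` or `y`. Passing `drop=True` to `reset_index()` renumbers the rows without keeping the old row numbers as an extra `index` column.

```python
import io
import pandas as pd

# Invented excerpt: header row = dataset names, next row = x/y labels, then numbers
demo_tsv = (
    "dino\tdino\taway\taway\n"
    "x\ty\tx\ty\n"
    "1.0\t2.0\t3.0\t4.0\n"
    "5.0\t6.0\t7.0\t8.0\n"
)

demo = pd.read_csv(io.StringIO(demo_tsv), sep="\t", header=0)
print(list(demo.columns))            # duplicate names are suffixed: dino, dino.1, away, away.1

demo = demo.drop([0])                # remove the row of "x"/"y" labels
demo = demo.apply(pd.to_numeric)     # convert each text column to numbers
demo = demo.reset_index(drop=True)   # renumber rows from 0, discarding the old index
print(demo.dtypes)
```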

Now we can plot these data, if we want. Perhaps as **histograms**:

In [None]:
plt.hist(data['dino'],bins = np.linspace(0,101,20), alpha = 0.5)
plt.hist(data['dino.1'],bins = np.linspace(0,101,20),  alpha = 0.5)
plt.show()

Perhaps as a **line chart**:

In [None]:
plt.plot(data['dino'])
plt.plot(data['dino.1'])
plt.show()

Or perhaps as a **scatterplot**:

In [None]:
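for i in range(20):                                   # redraw the same scatterplot 20 times
    fig = plt.figure(1, figsize=(4, 4), frameon=True, dpi=100)
    ax = fig.add_subplot(111)
    ax.set_facecolor(np.random.rand(3))               # random RGB background colour
    clear_output(wait=True)                           # replace the previous frame rather than stacking plots
    plt.scatter(data['dino'], data['dino.1'], marker=".",
                c=np.random.rand(3).reshape(1, -1))   # one random RGB colour, applied to every point
    time.sleep(.5)                                    # pause for half a second between frames
    plt.show()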
display(Markdown('### Nice.'))

### Common types of data
There are many different kinds of data, and the kind of data visualisation we employ depends, partly, on the data type.

Here are some of the most common data types you may have to work with.

> `Categorical data`:

This type of data represents **categories** (labels) that cannot be measured numerically.

For example, types of phenocryst: quartz, feldspar, biotite.

> `Numerical data`:

This type of data represents _measurable quantities_ in **numerical** form. `Numerical data` can be **discrete** or **continuous**.

For example, 
- the number of volcanic eruptions in a specific region or over a certain time period (`discrete data`), or 
- the concentration of Cl in rhyolite (`continuous data`).

> `Nominal data`:

This type of data represents **categories** (or labels) _without an inherent order_.

For example, eruption types: Surtseyan, Vulcanian, Strombolian, Hawaiian

> `Ordinal data`:

This type of data represents categories _with an inherent order_, but the intervals between categories are not necessarily uniform.

For example, Volcanic Explosivity Index: VEI 0 (a non-explosive lava flow) $\rightarrow$ VEI 8 (a mega-colossal eruption).

> `Interval data`:

This type of data has a consistent **interval** (the difference between values), but _no true zero point_.

For example, fumarole temperature measurements in °C: there is the same interval between -1 °C and 0 °C as between 0 °C and 1 °C. However, 0 °C does not mean an _absence_ of temperature.

> `Ratio data`:

This type of data is similar to `Interval data`, but _with a true zero point_, such that zero means that the quantity being measured is absent.

For example, melt water content: there is the same interval between 2 wt.% and 1 wt.% as between 1 wt.% and 0 wt.%. However, 0 wt.% H$_2$O means there is no water, and we cannot observe negative values.
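
The practical difference between interval and ratio data shows up when you divide. A short worked example (the temperatures and water contents are arbitrary illustrative values): differences are meaningful on both scales, but ratios are only meaningful when zero means "none", as it does for Kelvin or wt.% H$_2$O.

```python
# Interval scale: degrees Celsius (0 °C is not "no temperature")
t1_C, t2_C = 10.0, 20.0
ratio_C = t2_C / t1_C                      # 2.0, but 20 °C is NOT "twice as hot" as 10 °C

# Ratio scale: the same temperatures in Kelvin (0 K is a true zero)
t1_K, t2_K = t1_C + 273.15, t2_C + 273.15
ratio_K = t2_K / t1_K                      # about 1.035: the physically meaningful ratio

# Differences agree on both scales
diff_C = t2_C - t1_C
diff_K = t2_K - t1_K

# Melt water content is also ratio data: 2 wt.% really is twice 1 wt.%
h2o_ratio = 2.0 / 1.0

print(ratio_C, round(ratio_K, 3), diff_C, round(diff_K, 6), h2o_ratio)
```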

> `Discrete data`:

This type of data represents distinct (separate) values, where no intermediate values are possible.

For example, the number of historical eruptions from a given volcano. It could be 0, or 18, or 198, but it cannot be 17.9 or 18.1 (you cannot have 0.1 of an eruption).

> `Continuous data`:

This type of data has an infinite number of possible values (within a given range). 

For example, ground displacement due to magma chamber inflation. The value could be 10 mm, or 10.1 mm, or 10.01 mm, or 10.9823745610 mm...

> `Binary data`:

This type of data consists of only two possible values.

For example, 
- Is there an evacuation order in place? [Yes | No]
- This volcano is currently erupting [True | False]

> `Text data`:

This type of data represents **textual information** and is often **unstructured**.

For example, survey response data.
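
In `pandas`, several of these data types can be stored explicitly, which helps plotting tools treat them sensibly. A minimal sketch with a small table of invented values (not real eruption records), one column per data type; the column names are hypothetical:

```python
import pandas as pd

volc_demo = pd.DataFrame({
    "eruption_type": ["Strombolian", "Vulcanian", "Hawaiian"],   # nominal
    "VEI": [2, 3, 0],                                           # ordinal
    "n_eruptions": [18, 4, 198],                                 # discrete (counts)
    "Cl_ppm": [1520.5, 980.0, 2210.3],                           # continuous
    "erupting": [True, False, True],                             # binary
})

# Nominal: categories with no order
volc_demo["eruption_type"] = volc_demo["eruption_type"].astype("category")

# Ordinal: categories with an order (VEI 0 to 8), so sorting and max() respect that order
vei_dtype = pd.CategoricalDtype(categories=list(range(9)), ordered=True)
volc_demo["VEI"] = volc_demo["VEI"].astype(vei_dtype)

print(volc_demo.dtypes)
print(volc_demo["VEI"].max())   # only valid because the categories are ordered
```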

### Other relevant data

While the above are, generally speaking, the most common data types, there are others that you are likely to come across in volcano research.

These include:
> `Geospatial data`:

Data representing the **geographic location** and/or characteristics of spatial features on, above, or below the Earth's surface.

For example, an ash isopach map, sampling or drilling locations, SO$_2$ dispersal maps.

> `Temporal data`:

Also known as time-series data, this data type represents information about time and temporal relationships.

For example, ground deformation in one location _over time_, precipitation in one location _over time_, volcano summit temperature _over time_.

> `Network data`:

This type of data represents **connections**, **communication**, or **relationships** between discrete entities.

For example, retweets on Twitter*. The users are discrete entities, or _nodes_ in a network. Interactions such as retweets are connections, or _edges_ between those nodes.
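
One simple way to store network data is as an *edge list*. A minimal sketch with invented users (purely hypothetical): the nodes are recovered from the edges, and each node's *degree* is the number of edges touching it.

```python
from collections import Counter

# Hypothetical retweet network: each (a, b) pair is one edge, "a retweeted b"
demo_edges = [
    ("user_a", "user_b"),
    ("user_c", "user_b"),
    ("user_a", "user_c"),
    ("user_d", "user_b"),
]

demo_nodes = sorted({user for edge in demo_edges for user in edge})   # the discrete entities
demo_degree = Counter(user for edge in demo_edges for user in edge)   # edges per node

print(demo_nodes)
print(demo_degree.most_common())
```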

> `Sensor data`:

This type of data represents information collected by **sensors** that measure physical or environmental conditions.

For example, output from seismometers, gas sensors, or thermal sensors.

Other data types that you might encounter are `Image data` (including satellite or drone imagery) and `Audio data`.

*$\mathbb{X}$, whatever


***
# $\S$2. Plotting data

### Dataset 1: Tilt

A _tiltmeter_ measures tiny changes in the angle of a slope (i.e. the _tilt_ of the ground), in order to monitor deformation caused by magma migrating in Earth's interior. Tiltmeters typically rely on a liquid level or a pendulum. We are going to generate some *synthetic* tilt data using a `function` called `generate_tilt_data()`. The only input is your number from earlier.

In [None]:
t, tilt =  generate_tilt_data(seed)

The output here is two datasets, one called `t` and one called `tilt`. Take a look at some of the data: what **types** of data are they? What does `t` represent?

In [None]:
t[0:21] ## We will just look at the first 21 data points. To see them all, you can use the command `print(t)`

In [None]:
tilt[0:21] ## We will just look at the first 21 data points. To see them all, you can use the command `print(tilt)`
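
If `t` and `tilt` are NumPy arrays (an assumption: we have not seen the inside of `generate_tilt_data()`), you can inspect how they are stored with `type()`, `.dtype` and `.shape`. The sketch below uses hypothetical stand-in arrays so that it runs on its own. Note that the storage type is only a clue: an integer array could hold counts or time steps, and nothing in the `dtype` says whether a quantity is interval or ratio data.

```python
import numpy as np

# Hypothetical stand-ins; the real t and tilt come from generate_tilt_data(seed)
t_demo = np.arange(0, 21)                 # whole-number time steps
tilt_demo = np.linspace(0.0, 1.5, 21)     # decimal tilt readings

for label, arr in [("t_demo", t_demo), ("tilt_demo", tilt_demo)]:
    # type: the container; dtype: what each element is stored as; shape: how many values
    print(label, type(arr).__name__, arr.dtype, arr.shape)
```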

### Dataset 2: TAS

Volcanic rocks are commonly classified using the total alkali-silica (TAS) diagram, which plots total alkalis (Na$_2$O + K$_2$O) against silica (SiO$_2$), both in wt%. We are going to generate some *synthetic* TAS data using a `function` called `generate_TAS_data()`. The only input is your number from earlier.

In [None]:
SiO2_values, TA_values = generate_TAS_data(seed)

The output here is two datasets, one called `SiO2_values` and one called `TA_values`. Take a look at some of the data: what **types** of data are they?

In [None]:
SiO2_values

In [None]:
TA_values

### Dataset 3: Precipitation

It is often important to know about the environmental conditions of a volcano you are studying. Precipitation (e.g. rainfall) is useful to know about, as it can affect satellite data completeness, instrument function, water table position, plume dispersal, site accessibility, and so on. We are going to generate some *synthetic* precipitation data using a `function` called `generate_precipitation_data()`. The only input is your number from earlier.

In [None]:
mean_precip, months, years = generate_precipitation_data(seed)

The output here is three datasets, one called `mean_precip`, one called `months`, and one called `years`. Take a look at some of the data: what **types** of data are they? Is there anything you can spot from the data?

In [None]:
mean_precip

In [None]:
months

In [None]:
years

### Bar, Scatter, and Line charts
Bar charts, scatter plots, and line charts are three of the most commonly used types of graph. However, they are not equally appropriate for every data type.

In [None]:
plot_three_charts(seed)

Which of the plots best showcases the different datatypes? Why are some appropriate but not others?
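
For comparison, here is a minimal sketch of one common pairing between data type and chart type, using invented data rather than the course datasets: a bar chart to compare a value across categories, a line chart for values with a meaningful order (such as time), and a scatterplot for the relationship between two numerical variables.

```python
import numpy as np
import matplotlib.pyplot as plt

demo_rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible

# Invented data, for illustration only
demo_minerals = ["quartz", "feldspar", "biotite"]            # categorical (nominal)
demo_counts = [12, 30, 7]                                    # one count per category
demo_days = np.arange(30)                                    # ordered time axis
demo_signal = np.cumsum(demo_rng.normal(0.0, 1.0, 30))       # continuous value through time
demo_x = demo_rng.uniform(45.0, 75.0, 50)                    # paired numerical measurements...
demo_y = 0.1 * demo_x + demo_rng.normal(0.0, 0.5, 50)        # ...with a loose linear relationship

fig_demo, axes_demo = plt.subplots(1, 3, figsize=(10, 3))
axes_demo[0].bar(demo_minerals, demo_counts)      # bar: compare categories
axes_demo[1].plot(demo_days, demo_signal)         # line: join values in a meaningful order
axes_demo[2].scatter(demo_x, demo_y, s=10)        # scatter: two numerical variables
for ax_i, title in zip(axes_demo, ["Bar", "Line", "Scatter"]):
    ax_i.set_title(title)
fig_demo.tight_layout()
plt.show()
```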

### Essential items

#### The default plots give us some information, but there is a lot missing! Some basic things we must include:

> `Data`

Sounds obvious, but you should include the _actual data points_, _lines_, _bars_, etc.

> `Labels`
- X-axis: what is the variable being represented on the **horizontal** axis?
- Y-axis: what is the variable being represented on the **vertical** axis?
- Z-axis?

> `Units`

Must be provided, where appropriate, for each axis.

> `Tick marks`

Tick marks on each axis help readers interpret data values. They also indicate whether axes are logarithmic, linear, or something else.

> `Legend`

If there are multiple data series plotted, a legend should be included to distinguish between them.

> `Data source(s)`

Especially if some of the data were not collected by you (e.g. literature data).

> `Caption`

In a scientific context, you will need to provide a caption that describes the data, offering context or insights. This can be the place for title information, data sources, and information that would otherwise be annotations. 

### Optional extras

> `Gridlines`

Can be useful to guide the reader's eye.

> `Title`

A title describes the content, or purpose, of a graph. In a scientific context, this is often part of the `Caption`.


> `Annotations`

Additional **text** or **shapes** that provide explanations, highlight key points, or add context.


> `Colours` and `Styles`:

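Colour and style (e.g. dashed or dotted lines) can be used to differentiate between data series, or to represent data directly (e.g. a colourbar to represent magnitude). Always be mindful of the use of colour and style in the context of accessibility: for example, avoid distinguishing series by a red/green contrast alone, and consider pairing colour with marker shape or line style.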


### Specific data visualisation types might require specific elements, such as a North arrow, a scale, error bars, and so on.
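
A minimal sketch that puts the essential items (data, axis labels with units, tick marks, legend, data source) and a few optional extras into one `matplotlib` figure. The two tilt-like series and the µrad unit are invented for illustration. Each series is distinguished by colour *and* line style, so the plot still reads in greyscale.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented data, for illustration only
demo_time = np.linspace(0, 10, 50)
demo_a = 2.0 * demo_time
demo_b = 1.5 * demo_time + 3

fig_ess, ax_ess = plt.subplots(figsize=(6, 4))
ax_ess.plot(demo_time, demo_a, color="tab:blue", linestyle="-", label="Station A")     # data
ax_ess.plot(demo_time, demo_b, color="tab:orange", linestyle="--", marker="o",
            markevery=5, label="Station B")                                           # data, second style
ax_ess.set_xlabel("Time (days)")                  # label and units
ax_ess.set_ylabel(r"Tilt ($\mu$rad)")             # label and units
ax_ess.tick_params(direction="out", length=4)     # visible tick marks
ax_ess.legend(loc="upper left")                   # legend for multiple series
ax_ess.grid(alpha=0.3)                            # optional, light gridlines
ax_ess.annotate("Data: synthetic", xy=(0.99, 0.01), xycoords="axes fraction",
                ha="right", va="bottom", fontsize=8)   # data source
plt.show()
```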


***
# $\S$3. Beyond the default

We will use our synthetic TAS data as an illustration.

In [None]:
fig=plt.figure(1, figsize=(6,4), frameon=True, dpi = 100) ### We will plot a figure 6 x 4 inches
ax1 = fig.add_subplot(111) ### Initialise the axes


### Add the data

In [None]:
fig=plt.figure(1, figsize=(6,4), frameon=True, dpi = 100) ### We will plot a figure 6 x 4 inches
ax1 = fig.add_subplot(111) ### Initialise the axes

ax1.scatter(SiO2_values, TA_values)
plt.show()

### Add axes labels and units

In [None]:
fig=plt.figure(1, figsize=(6,4), frameon=True, dpi = 100) ### We will plot a figure 6 x 4 inches
ax1 = fig.add_subplot(111) ### Initialise the axes

ax1.scatter(SiO2_values, TA_values) ### Plot the data

ax1.set_xlabel(r'SiO$_2$ (wt%)')   ### Set the x-axis label
ax1.set_ylabel(r'Na$_2$O + K$_2$O (wt%)')  ### Set the y-axis label
plt.show()

### Adjust the limits of the x- and y-axes, and add some context (annotations). 

The fields are from
> Le Maitre RW (2002) Igneous rocks : IUGS classification and glossary of
        terms : recommendations of the International Union of Geological 
        Sciences Subcommission on the Systematics of igneous rocks, 2nd ed. 
        Cambridge University Press, Cambridge
        
The function to plot them was written by John Stevenson and Joaquin Cortés.

In [None]:
fig=plt.figure(1, figsize=(6,4), frameon=True, dpi = 100) ### We will plot a figure 6 x 4 inches
ax1 = fig.add_subplot(111) ### Initialise the axes

ax1.scatter(SiO2_values, TA_values) ### Plot the data

ax1.set_xlabel(r'SiO$_2$ (wt%)')  ### Set the x-axis label
ax1.set_ylabel(r'Na$_2$O + K$_2$O (wt%)')  ### Set the y-axis label

add_LeMaitre_fields(ax1)  # add TAS fields to the figure
 
ax1.set_ylim(0,14)   ### Set the y-axis limits appropriately
ax1.set_xlim(40, 79)   ### Set the x-axis limits appropriately
plt.show()

### Modify markers, add a legend, and include the caption

In [None]:
fig=plt.figure(1, figsize=(6,4), frameon=True, dpi = 100) ### We will plot a figure 6 x 4 inches
ax1 = fig.add_subplot(111) ### Initialise the axes

ax1.scatter(SiO2_values, TA_values, label = "{} (2024)".format(name), s = 50, marker = "o",
            ec="k",color = "slategrey", alpha = 0.85) ### Plot the data

ax1.set_xlabel(r'SiO$_2$ (wt%)')  ### Set the x-axis label
ax1.set_ylabel(r'Na$_2$O + K$_2$O (wt%)')  ### Set the y-axis label

add_LeMaitre_fields(ax1)  # add TAS fields to the figure
 
ax1.set_ylim(0,14)   ### Set the y-axis limits appropriately
ax1.set_xlim(40, 79)   ### Set the x-axis limits appropriately
ax1.legend(loc='upper right', scatterpoints=1, edgecolor="k", fancybox=False)   # scatterpoints: one marker per legend entry
plt.show()
display(Markdown('''**Figure 1**: Na$_2$O + K$_2$O versus SiO$_2$ (wt%) 
                 for new data collected in this study ({} 2024). 
                 TAS fields from Le Maitre et al. (2002) are presented for context.'''.format(name)))


***
# $\S$4. Mapping data to chart types

If we have a dataset and try to plot it indiscriminately, we will run into trouble, as not all data visualisations are appropriate for all kinds of data. The function `chart_picker()` is a tool that can give an idea of which kind of chart to use, depending on what we want to show. Try a few options below (you will need to reset the function each time it gives an output).

In [None]:
chart_picker(seed)

Based on the following research scenarios, think about what *kind* of data visualisation might be the best option.

***
### Scenario 1:

Author 1 collected density measurements of pyroclastic ejecta (**numerical**, **ratio** data). They want to show the *distribution* of these data over a finite range. They have relatively few data (~200 data points). What kind of data visualisation should Author 1 use? Try using the `chart_picker`, then click to reveal the answer.


In [None]:
chart_picker(seed)

In [None]:
display(scenario_1_button)

***
### Scenario 2:

Author 2 has a breakdown of contributions of different groups&mdash;such as volcanologists, economists, and politicians&mdash;to quotes given in newspaper articles about the Eyjafjallajökull ash cloud. How could Author 2 show the composition of quotes given to each news source as a share of the total contribution?  Try using the `chart_picker`, then click to reveal the answer.


In [None]:
chart_picker(seed)

In [None]:
display(scenario_2_button)

***
### Scenario 3:

Author 3 has major-element compositions of matrix glass from pyroclast samples. For each sample, Author 3 has data pairs: the wt% of a given major oxide and the wt% of SiO<sub>2</sub>. How would Author 3 best visualise the distribution of, and/or relationship between, these two sets of variables? Try using the `chart_picker`, then click to reveal the answer.


In [None]:
chart_picker(seed)

In [None]:
display(scenario_3_button)

***
### Scenario 4

Author 4 has data that break down whether volcanological publications (a) are led by local researchers, (b) include (but aren't led by) local researchers, or (c) do not include local researchers. The composition of these data changes over time, but annual data are only available for a 30-year timeframe. How would Author 4 best show the relative *and* absolute differences between these data? Try using the `chart_picker`, then click to reveal the answer.

In [None]:
chart_picker(seed)

In [None]:
display(scenario_4_button)

***
### Scenario 5

Author 5 has a time series of the total mass output of SO<sub>2</sub> during the 2021 eruption of La Soufrière, St Vincent. Author 5 wants to show the contribution of tropospheric and stratospheric SO<sub>2</sub> to the total mass loading (i.e. the composition of the summed data). They have relatively high temporal resolution, so there are many time periods. They are interested in the absolute differences between the different contributors. How could the author visualise these data?  Try using the `chart_picker`, then click to reveal the answer.

In [None]:
chart_picker(seed)

In [None]:
display(scenario_5_button)

***
### Scenario 6

Author 6 has a continuous timeseries of average δ<sup>18</sup>O values from the North Greenland Ice Core Project (NGRIP). The author wants to show if and how this variable changes over a very long time. How best could they visualise these data? Try using the `chart_picker`, then click to reveal the answer.

In [None]:
chart_picker(seed)

In [None]:
display(scenario_6_button)

***
# $\S$5. Figure Fixer

In this final section, you will generate and plot some data. You will then have a go at stylising the graph by tweaking lots of parameters.

A few extra `Python` things to know about before moving forward:

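- __`**kwargs`__ collects additional `key`word `arg`ument`s` into a `dict`ionary. They are commonly used for customising plots: some examples of things you can change include the __marker sizes__, the __axes thickness__, or the __spacing__ between subplots.

- __`*args`__ collects additional positional `arg`ument`s` into a `tuple`. In this notebook, the variable `args` is simply a `list` of the `dict`ionaries of `kwargs` defined below.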

- there are many different elements to a `matplotlib` figure. Running the next function will help familiarise you with some of them:



In [None]:
plot_anatomy_figure(seed)
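
A stand-alone illustration of how `*args` and `**kwargs` behave. The function `show_args()` is hypothetical, written only for this demonstration: extra positional arguments arrive as a `tuple`, extra keyword arguments arrive as a `dict`, and `**` unpacks a `dict` back into keyword arguments.

```python
def show_args(*args, **kwargs):
    """Print and return whatever extra arguments were passed in."""
    print("args:", type(args).__name__, args)        # positional extras -> tuple
    print("kwargs:", type(kwargs).__name__, kwargs)  # keyword extras -> dict
    return args, kwargs

pos_out, kw_out = show_args({"s": 50}, {"s": 80}, alpha=0.85, marker="o")

demo_style_dict = {"alpha": 0.85, "marker": "o"}
_, kw_again = show_args(**demo_style_dict)   # ** unpacks the dict into keyword arguments
```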

This generates the data:

In [None]:
DATA = generate_data(seed=seed)
DATA.head()

There are many named colours available in `matplotlib` (the package we are using to visualise our data), as well as the capacity to define or import colours and colourmaps. The following code will print out all the available named colours, which might be useful.


In [None]:
named_colours = list(
    mcolors.TABLEAU_COLORS.keys()) + list(
    mcolors.BASE_COLORS.keys()) + list(
    mcolors.CSS4_COLORS.keys())
print(named_colours)
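
Before editing the dictionaries below, you can check whether a colour string is one that `matplotlib` accepts with `matplotlib.colors.is_color_like()`, and convert a named colour to its hex code with `matplotlib.colors.to_hex()`. The candidate strings here are just examples (`"not_a_colour"` is deliberately invalid).

```python
import matplotlib.colors as mcolors

candidates = ["hotpink", "oldlace", "tab:blue", "#708090", "not_a_colour"]
colour_checks = {c: mcolors.is_color_like(c) for c in candidates}   # True if matplotlib accepts it
print(colour_checks)

print(mcolors.to_hex("slategrey"))   # named colour -> hex string
```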

The cell below contains a series of `dict`ionaries, which are just a way in which `Python` can store data, as `{key:value}` pairs. They are called:
- `S1_kwargs`, which stores general parameter values for the first subplot (subplot 1);
- `S2_kwargs`, which stores general parameter values for the second subplot (subplot 2);
- `S1_D1`, which stores parameter values for the 1st dataset on subplot 1;
- `S1_D2`, which stores parameter values for the 2nd dataset on subplot 1;
- `S2_D1`, which stores parameter values for the 1st dataset on subplot 2;
- `S2_D2`, which stores parameter values for the 2nd dataset on subplot 2;
- `S2_D3`, which stores parameter values for the 3rd dataset on subplot 2; and
- `S2_D4`, which stores parameter values for the 4th dataset on subplot 2.

You can modify parameters within each `dict`, or toggle them on and off (by typing a `#` at the start of the relevant line). If the parameter is a `string`, then type between the `"` `"`. Otherwise, if the parameter is a `float` or `int`eger (i.e. a numerical value), just change the value.
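
We have not seen the inside of `plot_dataframe()`, so the following is only a sketch of the general mechanism, not a description of that function: a `dict` of style parameters can be unpacked into `ax.scatter()` with `**`, and a key that is commented out is simply not passed, so `matplotlib` falls back to its default. The dictionary `demo_scatter_style` is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

demo_scatter_style = {
    "s": 50,               # marker size
    "color": "slategrey",  # marker fill colour
    "edgecolors": "k",     # marker edge colour
    "marker": "o",         # marker shape
    "alpha": 0.85,         # transparency
    # "linewidths": 2,     # commented out: scatter() uses its default edge width instead
    "label": "Demo data",  # legend entry
}

demo_x_kw = np.linspace(0.0, 1.0, 10)
fig_kw, ax_kw = plt.subplots(figsize=(4, 3))
demo_points = ax_kw.scatter(demo_x_kw, demo_x_kw**2, **demo_scatter_style)  # each key becomes a keyword argument
ax_kw.legend()
plt.show()
```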

You can also set `plt.rcParams['font.family']` to one of the five generic font families listed in the comment next to it.
There are also a handful of other options you can change, either as a binary `True|False` option, or by setting a value (such as `horizontal_spacing`). I've been very helpful and set some of the values already. Your task is to produce an effective piece of data visualisation by fine-tuning the plot parameters. 

Alright, now go ahead and run the next cell.

In [None]:

'''
############################################
##  You can adjust the font family here.  ##
############################################
'''

            # This controls the typeface used on the graph.

plt.rcParams['font.family'] = "monospace" ## "fantasy", "monospace", "cursive", "serif", "sans-serif"

'''
##########################################################################
##  Subplot 1 kwargs. These control parameters for the first subplot.   ##
##########################################################################
'''
S1_kwargs = {
            # This controls the label on the x axis.
    "xlabel": ["Y data"],
            # This controls the label on the y axis.
    "ylabel": ["X data"],
            # This controls whether there is a grid.
    "grid": ["on"],
            # This controls whether the x axis is logged ["log"] or ["linear"].
    "xscale": ["linear"],
            # This controls whether the y axis is logged ["log"] or ["linear"].
    "yscale": ["linear"],
            # This controls the upper and lower limits of the x axis.
    "xlim": [(-11.9, 80.9)],
            # This controls the upper and lower limits of the y axis.
    "ylim": [(1, 4000)], 
            # This controls whether there is a title, and what it says
    "title" : ["This is an AWESOME example of data visualisation!!!"],
            # This controls the colour of the subplot background.
    "facecolor" : ["hotpink"],
}

'''
##########################################################################
##  Subplot 2 kwargs. These control parameters for the second subplot.  ##
##########################################################################
'''
S2_kwargs = {
            # This controls the label on the x axis.
    "xlabel": ["X data"],
            # This controls the label on the y axis.
    "ylabel": ["Y data"],
            # This controls whether there is a grid.
    "grid": ["on"],
            # This controls whether the x axis is logged ["log"] or ["linear"].
    "xscale": ["linear"],
            # This controls whether the y axis is logged ["log"] or ["linear"].
    "yscale": ["linear"],
            # This controls the upper and lower limits of the x axis.
    "xlim": [(10,120)],
            # This controls the upper and lower limits of the y axis.
    "ylim": [(-3000, 5001)], 
            # This controls whether there is a title, and what it says.
#     "title" : ["Title"],
            # This controls the colour of the subplot background.
    "facecolor" : ["oldlace"],
}


'''
###############################################################################################
##  Dataset 1 kwargs. These control parameters for the first set of (x,y) data (subplot 1).  ##
###############################################################################################
'''
S1_D1 = {
            # This controls the marker size (scalar value or an array of the same length as the dataset).
    "s": 50,
            # This controls the marker colour (see options above).
    "color": "red",
            # This controls the marker line colour (see options above).
    "edgecolors": "k",
            # This controls the marker shape (options include: 's', 'o', '.', ',', '*', '^', 'v', 'p').
    "marker": "+",
            # This controls the layer order of items on the plot.
    "zorder": 10,
            # This controls the transparency of markers on the plot (values between 0.0 and 1.0).
    "alpha": 0.85,
            # This controls the thickness of lines, including the marker edges.
    "linewidths": 0.25,
            # This controls the style of lines, including the marker edges ("-", ":", "--", "-.").
    "linestyle": "-", 
            # This controls whether there is a legend entry, and what it says.
    "label": "",
        }

'''
###############################################################################################
##  Dataset 2 kwargs. These control parameters for the second set of (x,y) data (subplot 1). ##
###############################################################################################
'''
S1_D2 = {
            # This controls the marker size (scalar value or an array of the same length as the dataset).
    "s": 80,
            # This controls the marker colour (see options above).
    "color": "lemonchiffon",
            # This controls the marker line colour (see options above).
    "edgecolors": "r",
            # This controls the marker shape (options include: 's', 'o', '.', ',', '*', '^', 'v', 'p').
    "marker": "_",
            # This controls the layer order of items on the plot.
    "zorder": 2,
            # This controls the transparency of markers on the plot (values between 0.0 and 1.0).
    "alpha": 0.95,
            # This controls the thickness of lines, including the marker edges.
    "linewidths": 0.95,
            # This controls the style of lines, including the marker edges ("-", ":", "--", "-.").
    "linestyle": "-", 
            # This controls whether there is a legend entry, and what it says.
    "label": "",
        }

'''
###############################################################################################
##  Dataset 1 kwargs. These control parameters for the first set of (x,y) data (subplot 2).  ##
###############################################################################################
'''
S2_D1 = {
            # This controls the marker size (scalar value or an array of the same length as the dataset).
    "s": 100,
            # This controls the marker colour (see options above).
    "color": "limegreen",
            # This controls the marker line colour (see options above).
    "edgecolors": "blue",
            # This controls the marker shape (options include: 's', 'o', '.', ',', '*', '^', 'v', 'p').
    "marker": "D",
            # This controls the layer order of items on the plot.
    "zorder": 10,
            # This controls the transparency of markers on the plot (values between 0.0 and 1.0).
    "alpha": 0.8,
            # This controls the thickness of lines, including the marker edges.
    "linewidths": 2,
            # This controls the style of lines, including the marker edges ("-", ":", "--", "-.").
    "linestyle": "--", 
            # This controls whether there is a legend entry, and what it says.
    "label": "",
        }

'''
################################################################################################
##  Dataset 2 kwargs. These control parameters for the second set of (x,y) data (subplot 2).  ##
################################################################################################
'''
S2_D2 = {
            # This controls the marker size (scalar value or an array of the same length as the dataset).
    "s": 50,
            # This controls the marker colour (see options above).
    "color": "pink",
            # This controls the marker line colour (see options above).
    "edgecolors": "grey",
            # This controls the marker shape (options include: 's', 'o', '.', ',', '*', '^', 'v', 'p').
    "marker": "^",
            # This controls the layer order of items on the plot.
    "zorder": 10,
            # This controls the transparency of markers on the plot (values between 0.0 and 1.0).
    "alpha": 0.85,
            # This controls the thickness of lines, including the marker edges.
    "linewidths": 0.95,
            # This controls the style of lines, including the marker edges ("-", ":", "--", "-.").
    "linestyle": "-.", 
            # This controls whether there is a legend entry, and what it says.
    "label": "",
        }

'''
###############################################################################################
##  Dataset 3 kwargs. These control parameters for the third set of (x,y) data (subplot 2).  ##
###############################################################################################
'''
S2_D3 = {
            # This controls the marker size (scalar value or an array of the same length as the dataset).
    "s": 100,
            # This controls the marker colour (see options above).
    "color": "firebrick",
            # This controls the marker line colour (see options above).
    "edgecolors": "k",
            # This controls the marker shape (options include: 's', 'o', '.', ',', '*', '^', 'v', 'p').
    "marker": "^",
            # This controls the layer order of items on the plot.
    "zorder": 10,
            # This controls the transparency of markers on the plot (values between 0.0 and 1.0).
    "alpha": 0.5,
            # This controls the thickness of lines, including the marker edges.
    "linewidths": 1.0,
            # This controls the style of lines, including the marker edges ("-", ":", "--", "-.").
    "linestyle": "-", 
            # This controls whether there is a legend entry, and what it says.
    "label": "",
        }

'''
################################################################################################
##  Dataset 4 kwargs. These control parameters for the fourth set of (x,y) data (subplot 2).  ##
################################################################################################
'''
S2_D4 = {
            # This controls the marker size (scalar value or an array of the same length as the dataset).
    "s": 100,
            # This controls the marker colour (see options above).
    "color": "k",
            # This controls the marker line colour (see options above).
    "edgecolors": "k",
            # This controls the marker shape (options include: 's', 'o', '.', ',', '*', '^', 'v', 'p').
    "marker": "*",
            # This controls the layer order of items on the plot.
    "zorder": 10,
            # This controls the transparency of markers on the plot (values between 0.0 and 1.0).
    "alpha": 1.0,
            # This controls the thickness of lines, including the marker edges.
    "linewidths": 0.5,
            # This controls the style of lines, including the marker edges ("-", ":", "--", "-.").
    "linestyle": "-", 
            # This controls whether there is a legend entry, and what it says.
    "label": "???",
        }

'''
#####################################################################################
##  The arg[ument]s are passed through to the plot. Please don't modify this line. ##
#####################################################################################
'''
S2_kwargs["facecolor"] = ["palevioletred"]
args = [S1_kwargs, S2_kwargs, S1_D1,S1_D2,S2_D1, S2_D2,  S2_D3, S2_D4]    #! Do not modify

'''
#######################################################################################
##   This function plots the data using the  args and kwargs you've defined above.   ##
##   You can modify the inputs "models", "legend", "vertical", "cat", "box",         ##
##  "vertical_spacing", and "horizontal_spacing", but please don't touch the others. ##
#######################################################################################
'''

plot_dataframe(DATA,     #! Do not modify
               args,     #! Do not modify
               models = False,
               legend = False,
               vertical = True,
               cat=True,
               box=True,
               vertical_spacing=0.15,
               horizontal_spacing=2.0
               )

display(Markdown('''<div class="alert alert-block alert-success">This might take quite a few iterations to reach "publication quality". Remember that there is no single perfect look and layout for a figure (but there are plenty of wrong ones)! If you are happy with what you've come up with, feel free to email your version to jfarquharson@gs.niigata-u.ac.jp, with the subject line: [VMSG 2024 data visualisation] [Your Name] [Awesome]. You can right-click on the image to copy or save.</div>'''))

### End of Part 2