| [**Overview**](./00_overview.ipynb) | [Getting Started](./01_jupyter_python.ipynb) | **Examples:** | [Access](./02_accessing_indexing.ipynb) | [Transform](./03_transform.ipynb) | [Plotting](./04_simple_vis.ipynb) | [Norm-Spiders](./05_norm_spiders.ipynb) | [Minerals](./06_minerals.ipynb) | [lambdas](./07_lambdas.ipynb) | [CIPW](./08_CIPW_Norm.ipynb) | [Lattice Strain](./09_lattice_strain.ipynb) | **Extensions:** | [ML](./11_geochem_ML.ipynb) | [Spatial Data](./12_spatial_geochem.ipynb) |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |

# Simple Visualisation:<br> Bivariate, Ternary and Density Plots

`pyrolite` contains an array of visualisation methods, a few of which we'll quickly run through here. For more, check out the [examples gallery](https://pyrolite.readthedocs.io/en/develop/examples/index.html#plotting-examples)!

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pyrolite.util.synthetic import normal_frame

In [None]:
df = normal_frame(
    columns=[
        "SiO2",
        "CaO",
        "MgO",
        "FeO",
        "TiO2",
    ],  # columns we want for our dataframe - they're all treated the same currently
    cov=np.eye(4)
    * np.array(
        [0.35, 0.85, 0.3, 1.1]
    ),  # here we specify a covariance matrix - this simply tells it 'how spread out' we want the data
    size=1000,  # how many 'samples' we want
    seed=13,  # specify a random seed - so we have random data, but the same random data each time
)
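
As a quick aside on the `cov` argument above: multiplying `np.eye(4)` by a vector of variances relies on `numpy` broadcasting, and gives a diagonal covariance matrix (no correlation between components). The check below uses only `numpy`. My understanding is that `pyrolite` defines this covariance in log-ratio space, which would explain why it has one fewer dimension than the number of columns, but treat that as an assumption rather than a statement of the API.

```python
import numpy as np

variances = np.array([0.35, 0.85, 0.3, 1.1])
cov = np.eye(4) * variances  # broadcasting scales each column of the identity matrix

# the result is diagonal: variances on the diagonal, zero covariance elsewhere
print(cov)
print(np.allclose(cov, np.diag(variances)))
```

Replacing some of the off-diagonal zeros with non-zero values (keeping the matrix symmetric) is how you would introduce correlation between components.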

---
## Bivariate Plotting

While there are many ways to create simple bivariate plots, `pyrolite` provides a few options which offer a simpler interface and easier access to basic styling configuration.

In [None]:
df[["MgO", "SiO2"]].pyroplot.scatter(color="k", marker="o", alpha=0.5)

When we get to larger datasets, overplotting becomes an issue, and we may want to consider methods for visualising the distribution of data as a whole rather than individual points. `pyrolite` has a few options for this, including 'density' plots and 'heatscatter' plots (based on kernel density estimates).

In [None]:
df[["MgO", "SiO2"]].pyroplot.density(bins=100)

While this can look quite nice, and it solves the issue of overplotting we were nearing above, sometimes we want to be able to plot over this and clearly see where new data sits. In this case, we can instead use percentile contours of the kernel density estimate:

In [None]:
df[["MgO", "SiO2"]].pyroplot.density(
    bins=100, contours=[0.5, 0.95], colors=["k", "0.5"]
)
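
To give a feel for what a 'percentile contour' means, here is a minimal sketch of one common way to turn a gridded density into contour levels: sort the grid cells from densest to sparsest, and find the density threshold at which the enclosed cells hold the requested fraction of the total. This is an illustration using `numpy` only, and is not necessarily how `pyrolite` computes its contours; the `contour_levels` helper and the toy Gaussian grid are assumptions for the example.

```python
import numpy as np

def contour_levels(density, fractions):
    """Density thresholds whose enclosed regions hold the given fractions of the total mass."""
    flat = np.sort(density.ravel())[::-1]  # highest densities first
    cumulative = np.cumsum(flat) / flat.sum()  # mass enclosed as the threshold is lowered
    # index of the first cell at which the enclosed mass reaches each fraction
    idx = np.searchsorted(cumulative, fractions)
    return flat[np.clip(idx, 0, flat.size - 1)]

# toy density grid: an isotropic 2D Gaussian evaluated on a regular grid
x = np.linspace(-4, 4, 201)
xx, yy = np.meshgrid(x, x)
density = np.exp(-0.5 * (xx**2 + yy**2))

fractions = [0.5, 0.95]
levels = contour_levels(density, fractions)
for frac, level in zip(fractions, levels):
    enclosed = density[density >= level].sum() / density.sum()
    print(f"{frac:.2f} contour: level={level:.4f}, enclosed mass={enclosed:.3f}")
```

Note that the 0.95 contour sits at a lower density than the 0.5 contour, as it has to reach further into the tails to enclose more of the data.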

While these do look nice, both the data density diagram and the contours show kernel density which crosses the axes - which shouldn't occur when we consider that abundances are positive-only! One way to get around this is to use a log-scaled kernel density grid. We can see that this improves the situation for both figures:

Note also that we can change colormaps - matplotlib has [a decent range to choose from](https://matplotlib.org/stable/tutorials/colors/colormaps.html), noting that you should lean towards linear unidirectional colormaps in most geochemical data cases!

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
df[["MgO", "SiO2"]].pyroplot.density(
    ax=ax[0], bins=100, logx=True, logy=True, cmap="Purples"
)
df[["MgO", "SiO2"]].pyroplot.density(
    ax=ax[1], bins=100, contours=[0.5, 0.95], colors=["k", "0.5"], logx=True, logy=True
)
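
The reason a log-scaled grid helps can be sketched in one dimension. Below, a simple hand-written Gaussian kernel density estimate (an assumption for illustration, not the estimator `pyrolite` uses) is applied to strictly positive synthetic data. In linear space some of the estimated density leaks below zero; estimating the density of `log(x)` instead means every grid node maps back to a positive value.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a simple fixed-bandwidth Gaussian KDE on a grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = samples.size * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

rng = np.random.default_rng(13)
x = rng.lognormal(mean=-1.0, sigma=1.0, size=500)  # strictly positive, skewed 'abundances'

# linear-space KDE: kernels centred near zero spill over to negative values
linear_grid = np.linspace(-1, 3, 401)
linear_density = gaussian_kde_1d(x, linear_grid, bandwidth=0.2)
step = linear_grid[1] - linear_grid[0]
mass_below_zero = linear_density[linear_grid < 0].sum() * step

# log-space KDE: estimate the density of log(x), on a grid which maps back to x > 0
log_grid = np.linspace(np.log(x.min()) - 1, np.log(x.max()) + 1, 401)
log_density = gaussian_kde_1d(np.log(x), log_grid, bandwidth=0.3)
positive_grid = np.exp(log_grid)  # every grid node is > 0 by construction

print(f"linear KDE mass below zero: {mass_below_zero:.3f}")
print(f"smallest node of the log-scaled grid: {positive_grid.min():.2e}")
```

The bandwidths here are arbitrary choices for the sketch; in practice they would be chosen by the density estimator.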

Another way to visualise your data is to combine summary information provided by the kernel density estimate with the scatter plot to produce a heatscatter plot which retains the best of both worlds. Here we can see the position of each individual sample beyond the core of the distribution, and also identify where the greatest density of samples is:

In [None]:
df[["MgO", "SiO2"]].pyroplot.heatscatter(alpha=0.5, logx=True, logy=True)

---
## Ternary Plots

Ternary plots are common in geochemistry, mineralogy and petrology but don't necessarily pop up elsewhere. `pyrolite` provides an interface to create ternary plots wherever you pass three columns, making it as simple as creating our bivariate plots above!

In [None]:
df[["CaO", "MgO", "FeO"]].pyroplot.scatter(color="k", marker="o", alpha=0.5)

In contrast to most ternary plots, however, we can also create data density visualisations (based on distributions in logratio space):

In [None]:
df[["CaO", "MgO", "FeO"]].pyroplot.density(contours=[0.5, 0.95], colors=["k", "0.5"])

In [None]:
df[["CaO", "MgO", "FeO"]].pyroplot.heatscatter(alpha=0.5, cmap="cividis", s=3)
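
If 'logratio space' is unfamiliar, the sketch below shows one widely used log-ratio transform, the centred log-ratio (CLR), and its inverse, using `numpy` only. The helper names (`close`, `clr`, `inverse_clr`) and the example compositions are assumptions for illustration; `pyrolite` may well use a different log-ratio transform internally for its ternary density estimates.

```python
import numpy as np

def close(x):
    """Rescale each row so that its components sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio transform of row-wise compositions."""
    logx = np.log(close(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def inverse_clr(y):
    """Map centred log-ratios back onto the simplex."""
    return close(np.exp(y))

# three-component compositions (e.g. CaO, MgO, FeO proportions); values are illustrative only
comp = np.array([[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]])
y = clr(comp)
print(y)  # unbounded coordinates, each row sums to zero
print(inverse_clr(y))  # recovers the original compositions
```

Working in these unbounded coordinates is what allows a density estimate to respect the edges of the ternary diagram once it is transformed back.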

---
## Plot Templates

`pyrolite` provides a few built-in plot templates, with the aim of expanding the collection in the near future.

In [None]:
from pyrolite.plot.templates import pearceThNbYb, TAS

In [None]:
ax = pearceThNbYb()

In [None]:
ax = TAS()

This TAS diagram is a little more than it seems - it's actually built upon a classifier, which will allow you to automatically classify your samples if you have the relevant data:

In [None]:
from pyrolite.util.classification import TAS
from pyrolite.util.synthetic import normal_frame

df = (
    normal_frame(
        columns=["SiO2", "Na2O", "K2O", "Al2O3"],
        mean=[0.5, 0.04, 0.05, 0.4],
        size=100,
        seed=49,
    )
    * 100  # scaled by 100% for the standard TAS diagram
)
df["Na2O + K2O"] = df["Na2O"] + df["K2O"]

cm = TAS()  # TAS classifier model
df["TAS"] = cm.predict(df)  # predict what TAS class the samples occupy

In [None]:
ax = cm.add_to_axes(
    alpha=0.5, linewidth=0.5, zorder=-1, labels="ID"
)  # add the TAS diagram with the labels to an axis
df[["SiO2", "Na2O + K2O"]].pyroplot.scatter(
    ax=ax, c=df["TAS"], alpha=0.5
)  # plot the data colored by their TAS class
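
Conceptually, a diagram-based classifier like this boils down to asking which polygon (field) each sample falls in. The sketch below shows that idea with a plain-Python ray-casting test. The two rectangular fields are hypothetical and chosen only to keep the example short: they are not the real TAS field boundaries, and this is not how `pyrolite` implements its classifier.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count the polygon edges crossed by a horizontal ray from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the height of the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the crossing is to the right of the point
                inside = not inside
    return inside

# HYPOTHETICAL fields for illustration only - not the real TAS boundaries
fields = {
    "field_A": [(40, 0), (52, 0), (52, 5), (40, 5)],
    "field_B": [(52, 0), (63, 0), (63, 5), (52, 5)],
}

def classify(x, y, fields):
    """Return the name of the first field containing the point, or 'none'."""
    for name, polygon in fields.items():
        if point_in_polygon(x, y, polygon):
            return name
    return "none"

samples = [(45.0, 2.0), (60.0, 3.5), (70.0, 2.0)]  # (SiO2, Na2O + K2O) style pairs
print([classify(x, y, fields) for x, y in samples])
```

Samples which fall outside every field get a fallback label, which is worth checking for whenever you classify real data.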

## A Quick Aside: Exporting Figures

We quickly mentioned exporting tables from `pandas` earlier (see the [range of import and export options](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html)), and here we can have a look at the options for exporting `matplotlib` figures. This is largely centered around the [`savefig` function](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html), and `pyrolite` provides a bit of a shortcut to this (`save_figure`) if you'd like to use it.

A range of export formats are available - in standard raster (JPEG, PNG, TIFF etc) and vector formats (PDF, SVG, EPS etc).

In [None]:
from pyrolite.util.plot import save_figure

save_figure(ax.figure, name="TAS_Diagram")  # default PNG output

The `save_figure` function also allows you to export in multiple formats at the same time:

In [None]:
save_figure(ax.figure, name="TAS_Diagram", save_fmts=["png", "pdf"], dpi=300)  # one file per format in save_fmts
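
If you'd rather stay with plain `matplotlib`, a rough equivalent of saving in several formats is a short loop over `fig.savefig`. This is a sketch of the general pattern rather than a copy of what `save_figure` does internally; the file name and the temporary output folder are placeholders for the example.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [0, 1, 4], color="k", label="example")
ax.legend()

out_dir = tempfile.mkdtemp()  # placeholder output folder for this sketch
saved = []
for fmt in ["png", "pdf"]:
    path = os.path.join(out_dir, f"example_figure.{fmt}")
    # bbox_inches="tight" fits the saved area to the artists, so labels and legends aren't cut off
    fig.savefig(path, dpi=300, bbox_inches="tight")
    saved.append(path)

print(saved)
```

The `dpi` setting mainly matters for raster formats such as PNG; vector formats such as PDF and SVG stay sharp at any zoom level.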

---

Check out some of the other templates in `pyrolite.plot.templates` (e.g. the `USDASoilTexture` Triangle).

----
<div class='alert alert-warning'> <font size="+1" color="black"><b> Checkpoint & Time Check</b><br>How are things going?</font></div>

----

## Pulling a Few Things Together for a Short Workflow

Below we'll look at how we can use some of the compositional data functionality of `pyrolite.comp` in a data analysis and visualisation workflow. We'll be looking at some spinel geochemistry data from Norilsk, and examining the geochemical features of spinels found as inclusions within different phases (we quickly loaded this a few notebooks back). 

The data we use below is available as supplementary material in Schoneveld, L., Barnes, S. J., Williams, M., Le Vaillant, M., and Paterson, D., 2020, Silicate and Oxide Mineral Chemistry and Textures of the Norilsk-Talnakh Ni-Cu-Platinum Group Element Ore-Bearing Intrusions: Economic Geology doi: http://doi.org/10.5382/econgeo.4747. We can see that each major-element mineral analysis below includes relevant context as to the data source, analysis, thin section location and the enclosing phase within which the spinel sits. The analyses include major oxides in weight percent, and calculated atoms per formula unit (apfu) for each of these cations:

In [None]:
# import the data
df = pd.read_csv('../data/spinel/Schoneveld2020.csv')
numerical_columns = df.select_dtypes('number').columns
df[numerical_columns] = df[numerical_columns].where(lambda x: x > 0.) # mask zero and negative values (they become NaN)
df.head(2)
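
It's worth being clear about what the `.where()` call does: values which fail the condition aren't dropped, they're replaced with `NaN`, and zeros are masked along with negative values. A small sketch on a made-up table (the column names and numbers below are invented for illustration):

```python
import numpy as np
import pandas as pd

# toy table standing in for the spinel data; names and values are made up
toy = pd.DataFrame(
    {
        "Sample": ["a", "b", "c"],
        "Cr2O3": [45.1, 0.0, -0.2],
        "Al2O3": [12.3, 9.8, 0.0],
    }
)
numeric = toy.select_dtypes("number").columns  # leaves the text column untouched
toy[numeric] = toy[numeric].where(lambda x: x > 0.0)  # keep values > 0, the rest become NaN
print(toy)
```

Masking zeros matters here because the log-ratio methods used later on need strictly positive values.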

What we'd like to do here is look at how the chemistry varies between spinels found in different enclosing phases - and from that point see what we can link back to the magmatic history of some of these rocks. We can see that there are five potential mineral hosts for spinel (olivine, pyroxene, plagioclase, sulfide and 'silicate'), the potential for spinels to be in bubbles and an empty data value - 'nan':

In [None]:
# list the enclosing phases of the spinel
df['Enclosing Phase'].unique()

Before we get into some visualisation, we can standardise some colors so that everything looks the same throughout:

In [None]:
#give each category a colour
colors = {'bubble':'grey', 'olivine':'green', 'plagioclase':'pink', 'pyroxene':'teal', 'silicate':'lightgrey', 'sulphide':'orange'}

First - we can have a look at how each group of spinels is distributed with respect to the major spinel cations $Cr$, $Al$, and $Fe^{3+}$:

In [None]:
apfu = ['Cr_apfu','Al_apfu','Fe3_apfu'] # the columns we'd like to look at

In [None]:
# for each group plot a spinel ternary diagram of the atoms per formula unit (apfu) of the trivalent cations
fig, ax = plt.subplots(1, figsize=(8, 8))

labels = []
for host, gdf in df.groupby('Enclosing Phase'):
    ax = gdf[apfu].pyroplot.scatter(c=colors[host], ax=ax)
    labels.append(host)

ax.legend(labels, fontsize=12, markerscale=1.8)

Now we'd like to plot the averages of each group on our ternary diagram, and also write these out into a separate file which we can keep handy for reference:

In [None]:
fig, ax = plt.subplots(1, figsize=(8, 8))

labels = []
for host, gdf in df.groupby('Enclosing Phase'):
    ax = gdf[apfu].pyroplot.scatter(c=colors[host], ax=ax) # scatter points
    labels.append(host)

ax.legend(labels, fontsize=12, markerscale=1.8)

means = {}
for host, gdf in df.groupby('Enclosing Phase'):
    # the points were already plotted in the loop above, so here we only add the group means
    mean = gdf[apfu].dropna(how='any').pyrocomp.logratiomean()
    
    mean.pyroplot.scatter(ax=ax, 
                          facecolors=colors[host], 
                          marker='D', 
                          s=100, 
                          edgecolors="k", 
                          linewidths=1, 
                          zorder=3)
    means[host] = mean # store the mean so we can export it shortly
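
Why a log-ratio mean rather than a plain average? For compositions, averaging in log-ratio space and transforming back is equivalent to taking the geometric mean of each component and re-closing to a constant sum, which keeps the result consistent with the relative (ratio-based) nature of the data. The sketch below shows this with `numpy` only; the function names and example values are assumptions for illustration, and the details of `pyrolite`'s own implementation may differ.

```python
import numpy as np

def close(x):
    """Rescale so that the components sum to 1 along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def logratio_mean(compositions):
    """Mean composition computed in log space: the closed geometric mean of the rows."""
    logx = np.log(close(compositions))
    return close(np.exp(logx.mean(axis=0)))

# illustrative three-part compositions (rows sum to 1); not values from the spinel dataset
comps = np.array(
    [
        [0.80, 0.15, 0.05],
        [0.60, 0.30, 0.10],
        [0.10, 0.30, 0.60],
    ]
)
print("logratio mean:  ", logratio_mean(comps))
print("arithmetic mean:", comps.mean(axis=0))
```

The two estimates differ most where a component spans a wide range of values between samples, as the third component does here.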

We now have a dictionary of means for each phase, which we can turn into a dataframe:

In [None]:
meandf = pd.DataFrame.from_records(means).T
meandf

We can easily export this to a csv file:

In [None]:
meandf.to_csv('spinel_mean_{}.csv'.format('-'.join(apfu))) # create a csv with the mean compositions which we can use later

Looks like we're most of the way there in terms of exploring the geochemical features of spinels in each of our mineral hosts - but perhaps we don't need to know the composition of every spot (some we can't see in this diagram anyway!), and we'd just like a summary contour of the main section of the distribution. We can use `df.pyroplot.density()` to create a 68th percentile contour:

In [None]:
fig, ax = plt.subplots(1, figsize=(8, 8), subplot_kw={"projection": "ternary"})

means = {}
labels = {}
for host, gdf in df.groupby("Enclosing Phase"):
    if gdf.index.size > 1:
        labels[host] = []
        # plot a contour of approximately +/- 1 sigma from the mean composition, assuming a gaussian distribution
        ax = (
            gdf[apfu]
            .dropna(how="any")
            .pyroplot.density(
                ax=ax,
                bins=50,
                contours=[0.68],
                colors=colors[host],
                # label_contours=False
            )
        )
        labels[host].append(
            ax.collections[-1].legend_elements()[0][0]
        )  # use proxy line for legend
        mean = gdf[apfu].dropna(how="any").pyrocomp.logratiomean()
        means[host] = mean  # store the mean so we can export it shortly
        mean.pyroplot.scatter(
            ax=ax,
            facecolors=colors[host],
            marker="D",
            s=100,
            edgecolors="k",
            linewidths=1,
            zorder=3,
        )
        labels[host].append(ax.collections[-1])
        labels[host] = tuple(labels[host])
        # labels[host] = tuple(ax.collections[-2:])  # last two collections added


ax.legend(
    labels.values(),
    list(labels.keys()),
    fontsize=12,
    markerscale=0.8,
    bbox_to_anchor=(0.8, 1),
)  # our legend, pulling it inside the figure a little for better use of space

We can now save this figure for future reference - or to put into a manuscript. While you can use the default `matplotlib` method `fig.savefig`, `pyrolite` has a helper function which will make sure legends etc. aren't cut off, and which can save in multiple formats at once:

In [None]:
from pyrolite.util.plot import save_figure
save_figure(fig, './Norilsk_spinel_chemistry_by_host', save_fmts=['png'])

Now that we have a workflow going, you could wrap this up into a function or two so we can use it for other mineral inclusion datasets!
