| [**Overview**](./00_overview.ipynb) | [Getting Started](./01_jupyter_python.ipynb) | **Examples:** | [Access](./02_accessing_indexing.ipynb) | [Transform](./03_transform.ipynb) | [Plotting](./04_simple_vis.ipynb) | [Norm-Spiders](./05_norm_spiders.ipynb) | [Minerals](./06_minerals.ipynb) | **Workflows:** | [lambdas](./07_lambdas.ipynb) | [CIPW](./08_CIPW_Norm.ipynb)  | [ML](./11_geochem_ML.ipynb) | [Spatial Data](./12_spatial_geochem.ipynb) |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |

## Getting Started with Geopandas


In this notebook we'll look at working with geochemistry in a spatial context, mainly looking at [`geopandas`](https://geopandas.org/en/stable/). We'll also look at how to bring some *simple* interactivity to your `matplotlib` figures, which could also be applied to any non-spatial case.

For this example, we'll use a legacy laterite geochemistry dataset with citation below, which is available via the CSIRO Data Access Portal (DAP):

> Smith, Ray (1987): Laterite geochemistry in the CSIRO-AGE Database - Legacy data. v1. CSIRO. Data Collection. 
https://doi.org/10.25919/9dsm-wr21

See [the README](../data/laterites/README.md) for more on this dataset.

In [None]:
import contextily as cx
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import pyrolite

df = pd.read_excel("../data/laterites/laterites.xlsx").set_index("SampNo")
df.head()

I've cleaned up this dataframe slightly from its original form, but there are still some things to tidy (a good example of what that involves!). Below we go through the column names and pull out the element and unit components for columns of the form `Zr_ppm`, so that we can rescale everything to the same units (not always required, but often useful). We also make sure the `DATE` column contains numerical values, in this case a geological age:

In [None]:
from pyrolite.util.units import scale

mapping = {
    # e.g. "Zr_ppm" -> element "Zr", unit "ppm"
    c: {"element": c.split("_")[0], "unit": c.split("_")[1].lower()}
    for c in df.columns
    if "_" in c
}
scales = {c: scale(d["unit"], "wt%") for c, d in mapping.items()}

# rescale the values
for c, v in scales.items():
    df[c] *= v

# rename the columns
df = df.rename(columns={c: d["element"] for c, d in mapping.items()})

# convert DATE to numbers; anything that can't be parsed becomes NaN
df["DATE"] = pd.to_numeric(df["DATE"], errors="coerce")
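
As a minimal sketch of what this cleanup does, the cell below applies the same steps to a small made-up table. The conversion factors in `TO_WTPCT` are written by hand here as a stand-in for `pyrolite.util.units.scale`, and the column names and values are hypothetical:

```python
import pandas as pd

# hypothetical toy table with unit suffixes and a messy DATE column
toy = pd.DataFrame(
    {
        "Zr_ppm": [150.0, 320.0],
        "Au_ppb": [5.0, 12.0],
        "DATE": ["2500", "unknown"],
    }
)

# hand-written multipliers to wt% (assumption: stands in for pyrolite's scale())
TO_WTPCT = {"ppm": 1e-4, "ppb": 1e-7}

# split "Zr_ppm" into ("Zr", "ppm") for every column containing an underscore
parts = {c: c.split("_") for c in toy.columns if "_" in c}

# rescale each column to wt%, then drop the unit suffix from its name
for c, (element, unit) in parts.items():
    toy[c] = toy[c] * TO_WTPCT[unit.lower()]
toy = toy.rename(columns={c: element for c, (element, unit) in parts.items()})

# non-numeric entries become NaN rather than raising an error
toy["DATE"] = pd.to_numeric(toy["DATE"], errors="coerce")
print(toy)
```

Keeping the unit lookup in a plain dictionary makes it easy to see (and check) which conversions are being applied before trusting them on the full dataset.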

Now that we've cleaned up the dataframe, we can look at putting it into a `geopandas.GeoDataFrame`, which adds some spatial-specific capability and lets us define the geometry of what's in the table:

In [None]:
gdf = gpd.GeoDataFrame(
    df.drop(columns=["LATITUDE", "LONGITUDE"]),
    geometry=gpd.points_from_xy(df["LONGITUDE"], df["LATITUDE"]),
    crs="WGS84",  # lat-long
)

We can observe that the Coordinate Reference System (CRS) is stored on the dataframe:

In [None]:
gdf.crs
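
Latitude and longitude are in degrees, so distances and areas are easier to work with after reprojecting to a metric CRS. The sketch below uses made-up points and assumes UTM zone 50S (`EPSG:32750`) is an appropriate projection, which would need checking against where your samples actually are:

```python
import geopandas as gpd
import pandas as pd

# hypothetical sample locations (not taken from the laterite dataset)
toy = pd.DataFrame(
    {
        "LONGITUDE": [117.0, 118.2],
        "LATITUDE": [-31.0, -29.5],
        "Fe2O3": [35.2, 48.1],
    }
)

toy_gdf = gpd.GeoDataFrame(
    toy.drop(columns=["LATITUDE", "LONGITUDE"]),
    geometry=gpd.points_from_xy(toy["LONGITUDE"], toy["LATITUDE"]),
    crs="EPSG:4326",  # WGS84 lat-long
)

# reproject to a metric CRS (assumed here: WGS 84 / UTM zone 50S)
toy_utm = toy_gdf.to_crs("EPSG:32750")
print(toy_utm.geometry)
```

Here `to_crs` returns a new `GeoDataFrame` and leaves the original untouched, so you can keep both versions around.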

When looking at the columns, we can also see that there's an extra one - `geometry` (it's *special*):

In [None]:
gdf.columns

We can see that this contains our lat-long coordinates:

In [None]:
gdf.geometry

In [None]:
gdf.geometry.iloc[0].x  # the x coordinate from the first point

All of this enables some handy API shortcuts; for example, `.plot()` defaults to a spatial plot:

In [None]:
gdf.plot()

We can pass optional parameters to get more out of this default plot:

In [None]:
colour_by = "Fe2O3"
# plot the data from our dataset, coloured by the column selected
ax = gdf.plot(c=gdf[colour_by])
plt.colorbar(ax.collections[0], label=colour_by)  # add a colourbar for the variable
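
`GeoDataFrame.plot` can also take a column name directly and build the colourbar itself. A minimal sketch on a made-up set of points (the coordinates and values here are hypothetical):

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# hypothetical points, used only to illustrate the plotting call
toy_gdf = gpd.GeoDataFrame(
    {"Fe2O3": [35.2, 48.1, 22.7]},
    geometry=gpd.points_from_xy([117.0, 118.2, 116.4], [-31.0, -29.5, -30.2]),
    crs="EPSG:4326",
)

# colour by a column and let geopandas add a labelled colourbar
ax = toy_gdf.plot(
    column="Fe2O3",
    cmap="viridis",
    legend=True,
    legend_kwds={"label": "Fe2O3"},
)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```

Either approach should give a similar figure; passing `column=` just saves the separate `plt.colorbar` call.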


## Quick Look at the Chemistry

We can have a look at what this chemistry looks like, here normalizing to an upper-continental crustal reference composition (Rudnick and Gao, 2014) and colouring by the age class:

In [None]:
gdf[~pd.isna(gdf.AGE3)].pyrochem.elements.pyrochem.normalize_to(
    "UCC_RG2014", units="ppm"
).pyroplot.spider(
    figsize=(15, 8),
    c=gdf.AGE3[~pd.isna(gdf.AGE3)],
    index_order="incompatibility",
    alpha=0.5,
    unity_line=True,
)

## Looking at Geochemical PCA in Spatial Context

In [None]:
from sklearn.decomposition import PCA

n_components = 5
pca = PCA(n_components=n_components)

In [None]:
input_df = (
    gdf.pyrochem.elements.apply(
        lambda x: np.where(x > 0, x, np.nanmin(x[x > 0] / 3))
    )  # replace non-positive values with a third of the column's smallest positive value (a rough proxy for a third of the detection limit)
    .pyrochem.normalize_to("UCC_RG2014", units="ppm")
    .dropna(how="all", axis=1)
    .apply(np.log)
)
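
The log transform needs strictly positive inputs, which is why zeros and negative placeholders are filled in first. A minimal sketch of that replacement on a made-up table (the values are hypothetical); note that missing values also fail the `x > 0` test, so they are filled too:

```python
import numpy as np
import pandas as pd

# hypothetical concentrations with a zero, a negative flag and a missing value
toy = pd.DataFrame(
    {
        "Cu": [12.0, 0.0, 3.0, 30.0],
        "Zn": [-1.0, 45.0, np.nan, 9.0],
    }
)


def replace_nonpositive(x):
    """Fill values that are not > 0 with a third of the column's smallest positive value."""
    return np.where(x > 0, x, np.nanmin(x[x > 0]) / 3)


filled = toy.apply(replace_nonpositive)
logged = filled.apply(np.log)  # now safe: every value is strictly positive
print(filled)
```

If you would rather keep missing values as `NaN`, you could restrict the replacement to entries that are present but not positive.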

In [None]:
pca_scores = gpd.GeoDataFrame(
    pca.fit_transform(input_df),
    columns=["PCA{}".format(ix) for ix in range(n_components)],
    geometry=gdf.geometry.values,
    dtype="float",
)
pca_scores
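
It's often worth checking how much of the variance each component captures before interpreting the maps. The sketch below uses synthetic random data in place of `input_df`, so the numbers themselves are meaningless; with the real data you could inspect `pca.explained_variance_ratio_` in the same way:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# synthetic stand-in for log-transformed compositions: 100 samples, 8 correlated variables
X = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))

toy_pca = PCA(n_components=5)
scores = toy_pca.fit_transform(X)

# fraction of the total variance captured by each component, largest first
ratios = toy_pca.explained_variance_ratio_
for ix, r in enumerate(ratios):
    print(f"PCA{ix}: {r:.1%} of variance")
```

Components that capture only a small share of the variance are more likely to reflect noise, so maps of those scores deserve some caution.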

In [None]:
pd.DataFrame(
    pca.components_,
    columns=input_df.columns,
    index=["PCA{}".format(ix) for ix in range(n_components)],
    dtype="float",
).pyroplot.spider(
    figsize=(12, 4),
    c=["PCA{}".format(ix) for ix in range(n_components)],
    logy=False,
    index_order="incompatibility",
)

In [None]:
from pyrolite.plot.color import process_color  # bug in geopandas colour processing?

cmap = plt.get_cmap("cividis").copy()

fig, ax = plt.subplots(1, n_components, sharex=True, sharey=True, figsize=(15, 3))
ax = list(ax.flat)
# zip stops once the axes run out, so the trailing geometry column is skipped
for a, c in zip(ax, pca_scores.columns.tolist()):
    a.set_title(c)
    a = pca_scores.plot(
        ax=a,
        c=process_color(pca_scores[c].values, cmap="cividis")["c"],
    )
plt.tight_layout()

## Adding Basemaps with Contextily

In [None]:
ax = gdf.plot(c=gdf[colour_by])
plt.colorbar(ax.collections[0], label=colour_by)
# add a basemap under our dataset
cx.add_basemap(ax, crs=gdf.crs.to_string())

In [None]:
# switch to interactive (pan/zoom) figures; this backend requires the ipympl package
%matplotlib widget

ax = gdf.plot(c=gdf[colour_by], figsize=(6, 12))
plt.colorbar(ax.collections[0], label=colour_by, shrink=0.5)
# add a basemap under our dataset, with the ESRI satellite imagery
cx.add_basemap(ax, crs=gdf.crs.to_string(), source=cx.providers.Esri.WorldImagery)
plt.show()

## Exporting for External Use

You can easily export the data to a `shapefile`, or instead to something more open and less platform-dependent like `geopackage` (a single file with spatial information, instead of the multiple files that accompany a `.shp`):

In [None]:
gdf.to_file("../data/laterites/processed_laterites.shp")

In [None]:
gdf.to_file("../data/laterites/processed_laterites.gpkg")

You could download these and open them in e.g. QGIS.
