Clustering

The clustering filter provides a convenient way to separate compositionally distinct materials within your ablations, using multi-dimensional clustering algorithms.

Two algorithms are currently available in latools:

  • K-Means will divide the data up into N groups of equal variance, where N is a known number of groups.
  • Mean Shift will divide the data up into an arbitrary number of clusters, based on the characteristics of the data.

For an in-depth explanation of these algorithms and how they work, take a look at the Scikit-Learn clustering pages.

For most cases, we recommend the K-Means algorithm, as it is relatively intuitive and produces more predictable results.

2D Clustering Example

For illustrative purposes, consider some 2D synthetic data:

Two 'clusters' in composition are evident in the data, which can be separated by clustering algorithms.

The main difference between the two algorithms here is that MeanShift has identified the transition points (orange) as a separate cluster, whereas K-Means assigns every point to one of the N requested clusters.
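A comparison of this kind can be sketched directly with scikit-learn. The synthetic signals, noise level and random seed below are illustrative assumptions, not the data used in this guide, and the snippet calls scikit-learn directly rather than going through latools.

```python
import numpy as np
from sklearn.cluster import KMeans, MeanShift

rng = np.random.default_rng(0)

# Hypothetical synthetic ablation: two materials joined by a short transition.
n = 200
a = np.concatenate([np.full(n, 1.0), np.linspace(1.0, 5.0, 20), np.full(n, 5.0)])
b = np.concatenate([np.full(n, 2.0), np.linspace(2.0, 8.0, 20), np.full(n, 8.0)])
data = np.column_stack([a, b]) + rng.normal(0, 0.1, size=(a.size, 2))

# K-Means needs the number of clusters up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)

# Mean Shift chooses the number of clusters itself, using a bandwidth
# estimated from the data (left at the scikit-learn default here).
meanshift_labels = MeanShift().fit_predict(data)

print("K-Means clusters:  ", np.unique(kmeans_labels).size)
print("Mean Shift clusters:", np.unique(meanshift_labels).size)
```

K-Means always returns exactly the number of clusters requested. Mean Shift may place the transition points in one or more additional clusters, depending on the estimated bandwidth.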

Once the clusters are identified, they can be translated back into the time-domain to separate the signals in the original data:
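Because a clustering algorithm returns one label per data point, in the same order as the input, mapping clusters back to the time domain amounts to building one boolean mask per cluster. The sketch below assumes a hypothetical time axis and label array; the names are illustrative and are not latools attributes.

```python
import numpy as np

# Hypothetical inputs: a time axis and one cluster label per time point,
# in the form a clustering algorithm would return them.
time = np.arange(10) * 0.5  # seconds
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

# One boolean mask per cluster: True where the point belongs to that cluster.
masks = {int(k): labels == k for k in np.unique(labels)}

# Each mask indexes directly into the time axis, or into any signal
# of the same length.
for k, mask in masks.items():
    print(f"cluster {k}: t = {time[mask]}")
```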

For simplicity, the example above considers the relationship between two signals (i.e. 2-D). When creating a clustering filter on real data, multiple analytes may be included (i.e. N-D). The only limits on the number of analytes you can include are the number of analytes you've measured, and how much RAM your computer has.

If, for example, your ablation contains three distinct materials with variations in five analytes, you might create a K-Means clustering filter that takes all five analytes, and separates them into three clusters.
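In array terms, this five-analyte case is a table with one row per time point and one column per analyte, clustered into three groups. The sketch below uses scikit-learn directly with made-up material compositions; rescaling the analytes with StandardScaler is an assumption of this example, added so that no single analyte dominates the distances.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical mean signal of five analytes (columns) in three materials (rows).
material_means = np.array([
    [1.0, 10.0, 0.5, 3.0, 7.0],
    [4.0,  2.0, 0.5, 9.0, 1.0],
    [8.0,  6.0, 2.5, 1.0, 4.0],
])
n_per_material = 100
true_material = np.repeat(np.arange(3), n_per_material)
data = material_means[true_material] + rng.normal(0, 0.2, size=(300, 5))

# Put all analytes on a comparable scale before measuring distances.
scaled = StandardScaler().fit_transform(data)

# One row per time point, one column per analyte, three clusters requested.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
```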

When to use a Clustering Filter

Clustering filters should be used to discriminate between clearly different materials in an analysis. Results will be best when they are based on signals with clear, sharp changes and a high signal-to-noise ratio (as in the above example).

Results will be poor when data are noisy, or when the transition between materials is very gradual. In these cases, clustering filters may still be useful after you have used other filters to remove the transition regions - for example gradient-threshold or correlation filters.
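The idea of removing transition regions before clustering can be illustrated with a simple gradient threshold in NumPy. This is a noise-free sketch of the principle only, not the latools gradient-threshold filter, and the threshold value is an arbitrary choice for this synthetic signal.

```python
import numpy as np

# Hypothetical signal: two plateaus joined by a gradual ramp.
signal = np.concatenate([
    np.full(50, 1.0),
    np.linspace(1.0, 5.0, 40),
    np.full(50, 5.0),
])

# Absolute rate of change between neighbouring points.
gradient = np.abs(np.gradient(signal))

# Keep only points where the signal is nearly flat. The threshold is an
# illustrative choice, not a recommended value.
threshold = 0.02
stable = gradient < threshold

# Only the plateau values remain, ready to be passed to a clustering step.
stable_signal = signal[stable]
```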

Clustering Filter Design

A good place to start when creating a clustering filter is by looking at a cross-plot of your analytes:

A cross-plot provides an overview of your data, and allows you to easily identify relationships between analytes. In this example, three levels of Sr88 concentration are evident, which we might want to separate, so we will create a K-Means filter with three clusters:

The clustering filter has used the population-level data to identify three clusters in Sr88 concentration, and created a filter based on these concentration levels.
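Clustering at the population level means the clusters are defined once, from all samples pooled together, and the labels are then handed back to each sample. The sketch below shows that pooling step with scikit-learn on made-up Sr88 values; the sample names and concentration levels are hypothetical, and this is not the latools implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Hypothetical Sr88 signals for two samples, covering three levels in total.
samples = {
    "sample_1": np.concatenate([rng.normal(1, 0.05, 50), rng.normal(3, 0.05, 50)]),
    "sample_2": np.concatenate([rng.normal(3, 0.05, 40), rng.normal(6, 0.05, 60)]),
}

# Pool all samples so the clusters are defined once for the whole population.
pooled = np.concatenate(list(samples.values())).reshape(-1, 1)
pooled_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pooled)

# Split the pooled labels back into one label array per sample.
split_points = np.cumsum([v.size for v in samples.values()])[:-1]
labels = dict(zip(samples, np.split(pooled_labels, split_points)))
```

Because the clusters come from the pooled data, the same concentration level receives the same label in every sample, which would not be guaranteed if each sample were clustered separately.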

We can directly see the influence of this filter:

Here, we can see that the filter has picked out three Sr concentrations well, but that these clusters don't seem to have any systematic relationship with other analytes. This suggests that Sr might not be that useful in separating different materials in these data. (In reality, the Sr variance in these data comes from an incorrectly-tuned mass spec, and tells us nothing about the sample!)

  • ~latools.latools.analyse.crossplot creates a cross-plot of specified analytes, showing relationships within the data at the population level (all samples). This can be useful when choosing a threshold value.
  • ~latools.latools.analyse.crossplot_filters creates a cross-plot of specified analytes with the effect of a particular filter highlighted (see above).
  • ~latools.latools.analyse.trace_plots with option filt=True creates plots of all data, showing which regions are selected/rejected by the active filters.
  • ~latools.latools.analyse.filter_on and ~latools.latools.analyse.filter_off turn filters on or off.
  • ~latools.latools.analyse.filter_clear deletes all filters.