Magnetite Oxides: Partially vs Heavily Serpentinized Rock
=========================================================



## Introduction



Are the magnetite compositions in partially and heavily serpentinized rock statistically similar to each other? Comparing two multivariate compositional datasets is not a trivial problem. Both datasets have the same number of components (8), which is smaller than the number of data points (i.e. this is a low-dimensional statistics problem). However, the data are also highly skewed (i.e. the number of components dominating the composition is much smaller than the total number of components).

One way of approaching this issue is using PERMANOVA/PERMDISP. PERMANOVA takes a distance matrix and tests whether the groups differ in centroid and/or dispersion, and PERMDISP then tests whether they differ in dispersion specifically \citep{Anderson2006,Bruckner2017}. PERMDISP is only necessary if PERMANOVA indicates a difference. For compositional data, the appropriate distances are not Euclidean distances of the raw data, but rather Euclidean distances of the centered log-ratio (clr) transformed data, i.e. the Aitchison distance \citep{Quinn2019}. This analysis is performed on the raw elemental compositional data (i.e. not converted into oxide wt%). Since the calibration samples return the expected sums, the difference between the magnetite sums and 100 wt% is treated as real data (some missing component X), i.e. closure to 100 wt% involves adding a component rather than renormalizing the existing components. To permit PERMANOVA and PERMDISP analysis, negative values are also set to zero (relevant to Si, whose values are very low anyway).
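As a minimal sketch of the distance described above, the Aitchison distance can be written in base R as the Euclidean distance between clr-transformed compositions. The function names `clr` and `aitchison_dist` and the two 3-part compositions are hypothetical, chosen only for illustration. This plain form assumes strictly positive parts (the logarithm of zero is undefined), which is presumably why the analysis cells use the `robust.aitchison` method instead.

```r
# Centered log-ratio (clr) transform of one strictly positive composition.
clr <- function(x) log(x) - mean(log(x))

# Aitchison distance: Euclidean distance between clr-transformed compositions.
aitchison_dist <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

# Two hypothetical 3-part compositions (illustrative values only).
a <- c(60, 30, 10)
b <- c(50, 25, 25)

d_ab <- aitchison_dist(a, b)
# Rescaling a composition (e.g. wt% to fractions) leaves the distance unchanged.
d_scaled <- aitchison_dist(a / 100, b)
print(c(d_ab, d_scaled))
```

The two printed values agree because the clr transform removes any constant scale factor, which is the property that makes this distance suitable for closed (sum-constrained) data.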



## All Magnetites Grouped



In the first instance, all magnetite associations are grouped together for each sample.



In [1]:
library(robCompositions)
library(vegan)

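# Classify a sample comment as partially ("P") or heavily ("H") serpentinized.
classify_group <- function(sample) {
  if (grepl("M04|M08",sample)) {
    return ("P")
  } else if (grepl("06C|M01",sample)) {
    return ("H")
  }
  # Unmatched samples get NA so the result stays aligned with the data rows.
  return (NA_character_)
}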

read_data <- read.csv("magnetites.csv",stringsAsFactors=TRUE)
# Isolate only relevant rows
data <- read_data[read_data$Mineral %in% c("srp","clc","early-clc"),]
# Find serpentinization groupings.
groups <- factor(unlist(lapply(data$Comment,classify_group)))
# Remove irrelevant columns.
data$Mineral <- NULL
data$Comment <- NULL
# Clip lower bound to zero.
data[data<0] <- 0
# Normalize data using an additional component.
data$X <- 100 - rowSums(data)

With the cleaned data, the first step is to visualize variation in the data to permit a qualitative analysis of difference. This requires dimensionality reduction (ordination) of the multivariate data to bivariate data that can be displayed on a 2D plot. This can be approached using Non-metric MultiDimensional Scaling (NMDS) on a matrix of distances suitable for compositional data (i.e. Aitchison distances), as implemented by the R function `metaMDS`. However, it should be noted that NMDS results are non-unique and that the axis orientations are arbitrary.
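The arbitrariness of the axes can be illustrated with a small base R sketch. Note that it uses classical (metric) MDS via `cmdscale` as a stand-in for NMDS, and randomly generated points rather than the magnetite data. Rotating or reflecting an ordination leaves every inter-point distance unchanged, so only the relative positions of points carry meaning.

```r
set.seed(1)
# Hypothetical data: 10 points in 4 dimensions (illustrative only).
x <- matrix(rnorm(40), nrow = 10)
d <- dist(x)

# Classical (metric) MDS to 2D; a stand-in for NMDS in this sketch.
coords <- cmdscale(d, k = 2)

# Rotate by 30 degrees and flip the first axis.
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
flip <- diag(c(-1, 1))
coords_alt <- coords %*% rot %*% flip

# Inter-point distances in the two configurations are identical.
same <- isTRUE(all.equal(as.vector(dist(coords)), as.vector(dist(coords_alt))))
print(same)
```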



In [1]:
# Compute Aitchison distance matrix.
dist <- vegdist(data,method="robust.aitchison")
# Perform NMDS analysis.
mds_result <- metaMDS(dist,try=50)
# Assign serp group colors per point (green for partially serp, purple for heavily serp).
color_vec <- ifelse(groups=="P","forestgreen","darkorchid")
# Plot points.
plot(mds_result,xlab="NMDS1",ylab="NMDS2")
points(mds_result,pch=21,bg=color_vec)
# Add legend.
legend("topright",c("Partial","Heavy"),pt.bg=c("forestgreen","darkorchid"),pch=21)

This plot reveals a slight difference in the position of the densest cluster of the two groups. This can be checked quantitatively using PERMANOVA/PERMDISP. Since PERMANOVA p values are estimated from random permutations and so vary slightly between runs, the test is repeated a number of times and the p values collected. If p<0.05 (i.e. an alpha level of 0.05), then the two samples are considered statistically different.

For PERMANOVA:

-   H<sub>0</sub>: samples have the same centroid and dispersion
-   H<sub>1</sub>: H<sub>0</sub> is false
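To make the test concrete, here is a hedged base R sketch of the idea behind PERMANOVA: a pseudo-F statistic computed directly from a distance matrix, compared against its distribution under random relabelling of the groups. It is not the `adonis2` implementation. The helper names `pseudo_f` and `permanova_p` and the simulated data are hypothetical.

```r
# Pseudo-F statistic from a distance matrix and a grouping factor.
pseudo_f <- function(d, g) {
  d2 <- as.matrix(d)^2
  n <- nrow(d2)
  a <- nlevels(g)
  # Total sum of squares from all pairwise squared distances.
  ss_total <- sum(d2[lower.tri(d2)]) / n
  # Within-group sum of squares, one term per group.
  ss_within <- sum(sapply(levels(g), function(l) {
    idx <- which(g == l)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[lower.tri(sub)]) / length(idx)
  }))
  ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))
}

# Permutation p value: how often a shuffled labelling gives F >= observed F.
permanova_p <- function(d, g, n_perm = 999) {
  f_obs <- pseudo_f(d, g)
  f_perm <- replicate(n_perm, pseudo_f(d, sample(g)))
  (sum(f_perm >= f_obs) + 1) / (n_perm + 1)
}

set.seed(42)
# Hypothetical data: two groups of 15 points with slightly shifted means.
x <- rbind(matrix(rnorm(45), ncol = 3), matrix(rnorm(45, mean = 0.5), ncol = 3))
g <- factor(rep(c("P", "H"), each = 15))
d <- dist(x)

# Repeating the test gives slightly different p values each time.
p_runs <- replicate(5, permanova_p(d, g))
print(p_runs)
```

Because each run draws a fresh set of random permutations, repeated runs scatter around the same underlying p value. With 999 permutations the smallest attainable value is 0.001.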



In [1]:
# Declare number of PERMANOVA repeats.
n_runs <- 10
# Preallocate vector to store p values in.
p_vals <- rep(0,n_runs)
# Iterate through requested number of repeats.
for (i in 1:n_runs){
  # Perform PERMANOVA with the correct distance metric for compositional data (Aitchison).
  result <- adonis2(data~group,data=data.frame(group=groups),method="robust.aitchison")
  # Extract p value from result.
  p <- result[["Pr(>F)"]][1]
  # Store p value.
  p_vals[i] <- p
}

print(p_vals)

[1] 0.014 0.008 0.011 0.020 0.007 0.010 0.013 0.015 0.013 0.009

P values are all <0.05, indicating the samples differ in centroid and/or dispersion. To determine whether the rejection of H<sub>0</sub> is an artefact of different sample dispersions, a PERMDISP analysis can be performed. Two approaches are applied here: an ANOVA of the dispersions and a permutation test of the dispersions (`permutest` can produce very slightly different results each run). Note: these tests are applicable since the dispersions (which are computed from Aitchison distances between compositions) no longer represent compositional data.

For PERMDISP:

-   H<sub>0</sub>: samples have same dispersion
-   H<sub>1</sub>: H<sub>0</sub> is false
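The logic of PERMDISP can be sketched in base R as follows. The sketch assumes Euclidean coordinates (standing in for clr-transformed compositions) and plain group centroids. `betadisper` works from the distance matrix and, as far as I am aware, defaults to spatial medians, so this is an approximation of the idea rather than a reproduction of that function. The simulated data are hypothetical.

```r
set.seed(7)
# Hypothetical coordinates: group "H" is more spread out than group "P".
x <- rbind(matrix(rnorm(60, sd = 1), ncol = 3),
           matrix(rnorm(60, sd = 2), ncol = 3))
g <- factor(rep(c("P", "H"), each = 20))

# Group centroid for every row, computed column by column.
centroids <- apply(x, 2, function(col) ave(col, g))
# Distance of every point to its own group centroid.
dist_to_centroid <- sqrt(rowSums((x - centroids)^2))

# Compare mean distances per group, then test with a one-way ANOVA.
mean_dist <- tapply(dist_to_centroid, g, mean)
print(mean_dist)
fit <- anova(lm(dist_to_centroid ~ g))
p_disp <- fit[["Pr(>F)"]][1]
print(p_disp)
```

The distances to centroid are ordinary univariate values, which is why a standard ANOVA (or a permutation test on the same F statistic) can be applied to them.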



In [1]:
# Perform PERMDISP.
dispersion_result <- betadisper(dist,groups)
# Plot PERMDISP results, which show the centroids and dispersion of the two samples.
plot(dispersion_result)
# Test results of PERMDISP.
anova(dispersion_result)
permutest(dispersion_result)

#+begin_example
Analysis of Variance Table

Response: Distances
          Df Sum Sq Mean Sq F value  Pr(>F)
Groups     1  32.47  32.470   6.585 0.01202 *
Residuals 86 424.06   4.931
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Permutation test for homogeneity of multivariate dispersions
Permutation: free
Number of permutations: 999

Response: Distances
          Df Sum Sq Mean Sq     F N.Perm Pr(>F)
Groups     1  32.47  32.470 6.585    999   0.01 **
Residuals 86 424.06   4.931
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#+end_example

P values for the PERMDISP tests are also <0.05, which suggests a difference in the dispersion of the two samples. This is consistent with what can be seen in the plot. As such, it's not possible to say that the centroids are different based on the combined PERMANOVA-PERMDISP results. However, this difference in dispersion could well be an artefact of combining multiple populations of magnetite into one large group for each of the two serpentinization levels, as well as there being different amounts of magnetite from each association in the samples.



## Serpentine-Magnetite Association Only



As such, only the serpentine magnetite association (the most common in the dataset) will be tested.



In [1]:
data <- read_data[read_data$Mineral=="srp",]
# Find serpentinization groupings.
groups <- factor(unlist(lapply(data$Comment,classify_group)))
# Remove irrelevant columns.
data$Mineral <- NULL
data$Comment <- NULL
# Clip lower bound to zero.
data[data<0] <- 0
# Normalize data using an additional component.
data$X <- 100 - rowSums(data)
# Compute Aitchison distance matrix.
dist <- vegdist(data,method="robust.aitchison")
# Perform NMDS analysis.
mds_result <- metaMDS(dist,try=50)
# Assign serp group colors per point (green for partially serp, purple for heavily serp).
color_vec <- ifelse(groups=="P","forestgreen","darkorchid")
# Plot points.
plot(mds_result,xlab="NMDS1",ylab="NMDS2")
points(mds_result,pch=21,bg=color_vec)
# Add legend.
legend("topright",c("Partial","Heavy"),pt.bg=c("forestgreen","darkorchid"),pch=21)

The ordination plot reveals a slight difference in the centroids and less so in the dispersion of the two groups, which can be checked using PERMANOVA/PERMDISP.



In [1]:
# Declare number of PERMANOVA repeats.
n_runs <- 10
# Preallocate vector to store p values in.
p_vals <- rep(0,n_runs)
# Iterate through requested number of repeats.
for (i in 1:n_runs){
  # Perform PERMANOVA with the correct distance metric for compositional data (Aitchison).
  result <- adonis2(data~group,data=data.frame(group=groups),method="robust.aitchison")
  # Extract p value from result.
  p <- result[["Pr(>F)"]][1]
  # Store p value.
  p_vals[i] <- p
}

print(p_vals)

[1] 0.018 0.024 0.026 0.013 0.014 0.027 0.028 0.025 0.016 0.022

The p values are all <0.05, suggesting a difference in centroid and/or dispersion.



In [1]:
# Perform PERMDISP.
dispersion_result <- betadisper(dist,groups)
# Plot PERMDISP results, which show the centroids and dispersion of the two samples.
plot(dispersion_result)
# Test results of PERMDISP.
anova(dispersion_result)
permutest(dispersion_result)

#+begin_example
Analysis of Variance Table

Response: Distances
          Df  Sum Sq Mean Sq F value  Pr(>F)
Groups     1  12.616 12.6163  2.8835 0.09393 .
Residuals 70 306.273  4.3753
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Permutation test for homogeneity of multivariate dispersions
Permutation: free
Number of permutations: 999

Response: Distances
          Df  Sum Sq Mean Sq      F N.Perm Pr(>F)
Groups     1  12.616 12.6163 2.8835    999  0.083 .
Residuals 70 306.273  4.3753
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#+end_example

P values >0.05 suggest the dispersions are similar at alpha=0.05, i.e. there is no significant difference in dispersion between the two samples. Therefore, if just the serpentine-associated magnetites are analyzed, there's a statistically significant (at alpha=0.05) difference in the centroids of the magnetite compositions in partially vs heavily serpentinized rock.



## Outliers?



This section checks whether there are any obvious outliers in the input data to the serpentine-magnetite run, and whether they should be kept.



In [1]:
display_data <- read_data[read_data$Mineral=="srp",]
# Find serpentinization groupings.
display_data$group <- factor(unlist(lapply(display_data$Comment,classify_group)))
# Remove irrelevant columns.
display_data$Mineral <- NULL
display_data$Comment <- NULL
# Split dataframe by serpentinization group.
split_data <- split(display_data,display_data$group)
print(dim(split_data$P))
split_data$P

#+begin_example
[1] 43  9
        Si     Ti     Mn     Cr     Ni       O     Mg      Fe group
1   0.0200 0.4273 0.1568 8.4314 0.5432 24.5191 0.8825 61.1706     P
3   0.0175 0.4908 0.1615 8.5605 0.5807 24.9371 0.9406 60.0201     P
4   0.0125 0.4276 0.1673 8.0824 0.5583 24.8685 0.9084 60.7955     P
5   0.0143 0.5173 0.1810 8.1994 0.5666 25.1454 1.0875 60.1956     P
6   0.0116 0.4586 0.1743 8.2774 0.5857 24.6812 1.1042 60.8217     P
7   0.0075 0.4761 0.1884 8.3825 0.5430 24.8611 1.1361 60.4540     P
8   0.0090 0.5260 0.1747 8.4998 0.5447 25.1308 1.1084 60.6327     P
9   0.0110 0.4681 0.1807 8.4334 0.5681 25.2634 1.1230 59.6869     P
10  0.1120 0.4606 0.1812 8.3611 0.5848 25.3086 1.2903 59.4608     P
11  0.0124 0.4559 0.1902 8.4589 0.6046 25.0890 1.1482 59.9646     P
12  0.0040 0.4983 0.1855 9.2917 0.6061 24.7787 0.8965 59.1426     P
13  0.0233 0.5014 0.2091 8.1722 0.6135 24.8663 1.0832 60.2864     P
14  0.0018 0.4520 0.1745 8.8266 0.5346 24.9670 0.9702 60.0638     P
15 -0.0147 0.4768 0.22
#+end_example

Rows 86, 87, 88, 89 appear to be unusually enriched in Ti and depleted in Cr in the partially serpentinized samples (n=43). If equivalent examples can be found in the heavily serpentinized dataframe, then they can remain, otherwise this would require more investigation.



In [1]:
print(dim(split_data$H))
split_data$H

#+begin_example
[1] 29  9
        Si      Ti     Mn      Cr     Ni       O     Mg      Fe group
22 -0.0106  0.1377 0.2739  7.1392 0.7218 24.9776 0.7029 62.0503     H
23 -0.0084  0.1451 0.2647  7.5954 0.7338 24.4823 0.8811 62.1617     H
24  0.0081  0.1302 0.2359  7.2319 0.7984 25.4284 0.8306 61.8185     H
25  0.0031  0.1434 0.2715  8.4725 0.8384 25.2233 1.0200 60.8537     H
26  0.0006  0.1452 0.2961  8.5098 0.7619 24.8880 0.8809 60.9997     H
34  0.0162  0.1341 0.4078  9.6471 0.8181 24.9720 0.7133 59.0917     H
35  0.0007  0.1300 0.2611  9.2874 0.8725 24.8885 0.9242 60.2426     H
36 -0.0038  0.1508 0.2589  9.9727 0.8597 25.0330 1.2975 58.7509     H
37  0.0455  0.1562 0.2705 10.3972 0.8341 24.8324 1.3017 57.6059     H
38 -0.0034  0.1491 0.2537  9.5891 0.8302 24.8046 1.7039 58.4670     H
39  0.0143  0.1400 0.2615 10.1305 0.8608 24.9899 1.2686 58.5401     H
40  0.0015  0.1580 0.2487  9.7127 0.8314 24.7430 1.5958 58.6845     H
41  0.0175  0.1340 0.2381  9.2087 0.6374 25.2137 1.0260 59.9283 
#+end_example

No high-Ti/low-Cr equivalent is found in the heavily serpentinized sample (n=29). The first step is to look at what was imaged for rows 86-89, which correspond to analysis sites 23C-M08-ox23 to -ox26.



In [1]:
read_data[c(86:89),]

#+begin_example
   Mineral     Si     Ti     Mn     Cr     Ni       O     Mg      Fe
86     srp 0.1403 4.9178 0.0727 2.5007 0.0434 25.1739 0.4325 62.2677
87     srp 0.0260 4.8287 0.0136 2.6530 0.0677 26.8372 0.2676 61.1487
88     srp 0.4895 5.3379 0.1831 2.6688 0.0565 26.4843 0.8803 59.4180
89     srp 0.0263 5.2223 0.0410 2.4292 0.0600 25.2439 0.2677 61.2823
        Comment
86 23C-M08-ox23
87 23C-M08-ox24
88 23C-M08-ox25
89 23C-M08-ox26
#+end_example

There is nothing notable about these serpentine-associated grains in the image, which is surprising given their compositional contrast with the other grains. They have similar morphologies to other grains (e.g. ox21, ox22, ox27), and are within 1 mm of them. As such, this might just be natural variation in grain composition (e.g. due to different fluid flow paths, or growth closer to a Ti-rich parent crystal).

![img](./imgs/M08-ox.png "RL scan of M08. Width of photo ~1 mm.")

Regardless of the mechanism giving rise to their different composition, this compositionally distinct magnetite population appears not to have been sampled elsewhere, so it's worth repeating the previous analysis after excluding these "outliers".
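As a quick, hedged check on how such rows might be flagged, the sketch below uses the Ti values printed above for rows 1 to 14 of the partially serpentinized group and for rows 86-89. It compares a fixed 1 wt% cutoff with a robust score based on the median and MAD. The 3.5 cutoff on the robust score is an assumption chosen for illustration, not something taken from the analysis.

```r
# Ti values (wt%) copied from the printed rows; names are the row labels.
ti <- setNames(
  c(0.4273, 0.4908, 0.4276, 0.5173, 0.4586, 0.4761, 0.5260, 0.4681,
    0.4606, 0.4559, 0.4983, 0.5014, 0.4520,
    4.9178, 4.8287, 5.3379, 5.2223),
  c(1, 3:14, 86:89)
)

# Option 1: fixed threshold at 1 wt% Ti.
flagged_threshold <- names(ti)[ti >= 1]

# Option 2: robust z-score using the median and MAD (cutoff 3.5 is an assumption).
robust_z <- abs(ti - median(ti)) / mad(ti)
flagged_mad <- names(ti)[robust_z > 3.5]

print(flagged_threshold)
print(flagged_mad)
```

Both approaches flag the same four rows for this subset, which suggests the choice of a 1 wt% cutoff is not critical here.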



In [1]:
data <- read_data[read_data$Mineral=="srp",]
# Isolate only rows without unusually high Ti (here anything above 1 wt% being considered unusually high, which catches just rows 86-89).
data <- data[data$Ti<1,]
# Find serpentinization groupings.
groups <- factor(unlist(lapply(data$Comment,classify_group)))
# Remove irrelevant columns.
data$Mineral <- NULL
data$Comment <- NULL
# Clip lower bound to zero.
data[data<0] <- 0
# Normalize data using an additional component.
data$X <- 100 - rowSums(data)
# Compute Aitchison distance matrix.
dist <- vegdist(data,method="robust.aitchison")
# Perform NMDS analysis.
mds_result <- metaMDS(dist,try=50)
# Assign serp group colors per point (green for partially serp, purple for heavily serp).
color_vec <- ifelse(groups=="P","forestgreen","darkorchid")
# Plot points.
plot(mds_result,xlab="NMDS1",ylab="NMDS2")
points(mds_result,pch=21,bg=color_vec)
# Add legend.
legend("topright",c("Partial","Heavy"),pt.bg=c("forestgreen","darkorchid"),pch=21)

A slight difference is visible in the ordination plot.



In [1]:
# Declare number of PERMANOVA repeats.
n_runs <- 10
# Preallocate vector to store p values in.
p_vals <- rep(0,n_runs)
# Iterate through requested number of repeats.
for (i in 1:n_runs){
  # Perform PERMANOVA with the correct distance metric for compositional data (Aitchison).
  result <- adonis2(data~group,data=data.frame(group=groups),method="robust.aitchison")
  # Extract p value from result.
  p <- result[["Pr(>F)"]][1]
  # Store p value.
  p_vals[i] <- p
}

print(p_vals)

[1] 0.056 0.057 0.051 0.048 0.050 0.034 0.046 0.050 0.045 0.037

The p values are not all <0.05 (they scatter around 0.05), suggesting there may not be a difference in centroid and/or dispersion. Therefore, the visual difference in the plot may not be sufficiently significant to declare a difference in compositions after the four high-Ti/low-Cr outliers are removed.
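A short sketch can put the run-to-run scatter of these p values in context. It summarizes the ten values printed above and estimates the Monte Carlo standard error of a permutation p value near alpha. Treating the 999 permutations as independent Bernoulli draws is an assumption made for illustration.

```r
# P values reported from the ten PERMANOVA repeats above.
p_vals <- c(0.056, 0.057, 0.051, 0.048, 0.050, 0.034, 0.046, 0.050, 0.045, 0.037)
alpha <- 0.05

# Number of repeats strictly below alpha, and the median p value.
n_below <- sum(p_vals < alpha)
p_median <- median(p_vals)

# Approximate Monte Carlo standard error of a permutation p value near alpha,
# treating the 999 permutations as independent Bernoulli draws (an assumption).
n_perm <- 999
mc_se <- sqrt(alpha * (1 - alpha) / n_perm)

print(c(n_below = n_below, median = p_median, mc_se = round(mc_se, 4)))
```

Five of the ten repeats fall strictly below 0.05, and the spread of the values is comparable to the estimated standard error of roughly 0.007. Under this assumption the result is best read as borderline, and a larger number of permutations per run would be needed to pin it down more tightly.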



In [1]:
# Perform PERMDISP.
dispersion_result <- betadisper(dist,groups)
# Plot PERMDISP results, which show the centroids and dispersion of the two samples.
plot(dispersion_result)
# Test results of PERMDISP.
anova(dispersion_result)
permutest(dispersion_result)

#+begin_example
Analysis of Variance Table

Response: Distances
          Df  Sum Sq Mean Sq F value  Pr(>F)
Groups     1  21.114 21.1141  5.0528 0.02793 *
Residuals 66 275.795  4.1787
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Permutation test for homogeneity of multivariate dispersions
Permutation: free
Number of permutations: 999

Response: Distances
          Df  Sum Sq Mean Sq      F N.Perm Pr(>F)
Groups     1  21.114 21.1141 5.0528    999  0.023 *
Residuals 66 275.795  4.1787
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#+end_example

P values <0.05 suggest the dispersions are different at alpha=0.05. This is somewhat unexpected given the PERMANOVA result, but could be interpreted as "if PERMANOVA did return a significant result leading to rejection of the PERMANOVA H<sub>0</sub>, it may be an artefact of dispersion rather than a difference in centroids".



## Summary Table




| Data|PERMANOVA|PERMDISP|Difference Interpretation|
|---|---|---|---|
| Grouped|&lt;0.05|&lt;0.05|Dispersion|
| Srp Assoc only|&lt;0.05|&gt;0.05|Centroid|
| Srp Assoc minus high Ti outliers|&asymp;0.05|&lt;0.05|Dispersion*|

\* Dispersion is only the interpretation if PERMANOVA suggests a difference between the two samples.
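The interpretation column follows a simple decision rule, sketched here as a hypothetical helper, `interpret_tests`, which is not part of the analysis code. The p values passed in are representative values from the runs reported above (the first PERMANOVA repeat and the `permutest` result in each case, plus one sub-0.05 repeat for the filtered data).

```r
# Map a PERMANOVA p value and a PERMDISP p value to an interpretation.
interpret_tests <- function(p_permanova, p_permdisp, alpha = 0.05) {
  if (p_permanova >= alpha) {
    return("no significant difference detected")
  }
  if (p_permdisp < alpha) {
    return("difference, possibly driven by dispersion")
  }
  "difference in centroid"
}

res_grouped <- interpret_tests(0.014, 0.010)   # all magnetites grouped
res_srp <- interpret_tests(0.018, 0.083)       # serpentine association only
res_filtered_a <- interpret_tests(0.056, 0.023) # outliers removed, repeat 1
res_filtered_b <- interpret_tests(0.048, 0.023) # outliers removed, repeat 4
print(c(res_grouped, res_srp, res_filtered_a, res_filtered_b))
```

The two calls for the filtered data return different interpretations, which mirrors the footnote: the dispersion reading only applies on repeats where PERMANOVA falls below alpha.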

The P value of the PERMANOVA test is low in all cases, but not always <0.05, which suggests there's likely a difference between the two samples. Whether this difference is an artefact of different dispersion is more difficult to determine because the PERMDISP results are inconsistent across increasingly narrow data filtering criteria. This may suggest that the full range of heterogeneity in partially vs heavily serpentinized samples has not been sampled (or that the partially and heavily serpentinized groups contain multiple compositionally distinct subgroups in different proportions), and so it's difficult to draw any conclusions regarding compositional differences between the two groups.

