In [None]:
# %% 

""" Created on November 13, 2023 // @author: Sarah Shi """

import os
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report

import mineralML as mm

%matplotlib inline
%config InlineBackend.figure_format = 'retina'


We have loaded the mineralML Python package, which provides trained machine learning models for classifying minerals. Example workflows using these models can be found on the [ReadTheDocs](https://mineralML.readthedocs.io/en/latest/). 

The Google Colab implementation here aims to get your electron microprobe compositions classified and processed. We fix most of the adjustable settings to keep the workflow simple. The igneous minerals considered for this study include: amphibole, apatite, biotite, clinopyroxene, garnet, ilmenite, K-feldspar, magnetite, muscovite, olivine, orthopyroxene, plagioclase, quartz, rutile, spinel, tourmaline, and zircon. 

You will need a CSV file containing your electron microprobe analyses in oxide weight percentages. Find an example [here](https://github.com/sarahshi/mineralML/blob/main/Validation_Data/lepr_allphases_lim.csv). The necessary oxides are $SiO_2$, $TiO_2$, $Al_2O_3$, $FeO_t$, $MnO$, $MgO$, $CaO$, $Na_2O$, $K_2O$, and $Cr_2O_3$. Where an oxide was not analyzed for a specific mineral, the preprocessing fills the resulting NaN values with 0. 
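Before running the models, it can help to check that your file has the expected oxide columns. The sketch below is only an illustration: the column names (for example `FeOt`) and the helper `check_oxides` are assumptions rather than part of mineralML, so compare them against the example CSV. It imitates the zero-filling described above on a small made-up table and does not reproduce the package's own preprocessing.

```python
import numpy as np
import pandas as pd

# Assumed column names for the ten required oxides; verify against the example CSV.
REQUIRED_OXIDES = ['SiO2', 'TiO2', 'Al2O3', 'FeOt', 'MnO',
                   'MgO', 'CaO', 'Na2O', 'K2O', 'Cr2O3']

def check_oxides(df, required=REQUIRED_OXIDES):
    """Hypothetical helper: list absent oxide columns and zero-fill NaNs in those present."""
    missing = [ox for ox in required if ox not in df.columns]
    present = [ox for ox in required if ox in df.columns]
    out = df.copy()
    out[present] = out[present].fillna(0.0)
    return out, missing

# Made-up analyses (wt.%), with unanalyzed oxides left as NaN and no Cr2O3 column.
toy = pd.DataFrame({
    'SiO2': [40.1, 52.3], 'TiO2': [np.nan, 0.4], 'Al2O3': [0.1, 2.5],
    'FeOt': [12.0, 6.1], 'MnO': [0.2, np.nan], 'MgO': [47.0, 16.2],
    'CaO': [0.3, 21.0], 'Na2O': [np.nan, 0.3], 'K2O': [np.nan, np.nan],
})

filled, missing = check_oxides(toy)
print("Missing columns:", missing)
print(filled)
```

If the list of missing columns is not empty, check the column headers in your CSV before proceeding.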

We will apply both supervised and unsupervised machine learning models to the dataset. 


# I. Supervised Machine Learning (Bayesian Neural Networks with Variational Inference)


## Load and prepare data for analysis

In [None]:

# Read in your dataframe of mineral data. Replace 'lepr_valid_lim.csv' with the path to your own CSV file. 
# Prepare the dataframe by removing rows with too many NaNs, filling the remaining NaNs with zeros, and filtering to the minerals described by mineralML. 

df_load = mm.load_df('lepr_valid_lim.csv')
df_nn = mm.prep_df_nn(df_load)


## Apply the trained neural network

In [None]:

df_pred_nn, probability_matrix = mm.predict_class_prob_nn(df_nn)
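To illustrate how a most likely class and its probability can be read off a matrix of class probabilities, here is a small sketch on made-up numbers. It assumes one row per analysis and one column per mineral class; the layout of the `probability_matrix` returned above is not documented in this notebook, so inspect it (for example with `probability_matrix.shape`) before applying the same idea. The names `toy_probs` and `classes` are hypothetical.

```python
import numpy as np

# Made-up class labels and probabilities: rows are analyses, columns are classes.
classes = np.array(['Olivine', 'Plagioclase', 'Clinopyroxene'])
toy_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.30, 0.60],
    [0.20, 0.70, 0.10],
])

# Index of the largest probability in each row, then the matching label and value.
top_idx = toy_probs.argmax(axis=1)
top_class = classes[top_idx]
top_prob = toy_probs.max(axis=1)

for label, prob in zip(top_class, top_prob):
    print(f"{label}: {prob:.2f}")
```

A low maximum probability flags an analysis whose classification deserves a closer look.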


## Examine the predicted mineral classifications

In [None]:

df_pred_nn


In [None]:

# Create a classification report with per-mineral precision, recall, and F1 score, plus the overall accuracy. 

bayes_valid_report = classification_report(
    df_nn['Mineral'], df_pred_nn['Predict_Mineral'], zero_division=0
)
print("LEPR Validation Report:\n", bayes_valid_report)
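If you would rather sort or filter the report than read it as text, scikit-learn can return the same numbers as a dictionary via `output_dict=True`. The sketch below uses made-up labels (`y_true`, `y_pred`) rather than the LEPR data, and keeps only the per-class and averaged rows when building the table.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Made-up true and predicted labels for illustration only.
y_true = ['Olivine', 'Olivine', 'Plagioclase', 'Plagioclase']
y_pred = ['Olivine', 'Plagioclase', 'Plagioclase', 'Plagioclase']

report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)

# 'accuracy' is a single float; every other entry is a dict of metrics.
rows = {name: vals for name, vals in report.items() if isinstance(vals, dict)}
report_df = pd.DataFrame(rows).T

print(report_df.sort_values('f1-score'))
print("Accuracy:", report['accuracy'])
```

Sorting by `f1-score` puts the classes that are hardest to separate at the top of the table.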


In [None]:

# Create and plot a confusion matrix 

cm = mm.confusion_matrix_df(df_nn['Mineral'], df_pred_nn['Predict_Mineral'])
print("LEPR Confusion Matrix:\n", cm)
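# Zero out cells with fewer than 0.05% of all analyses so the plot is not cluttered by rare misclassifications. 
cm[cm < len(df_pred_nn['Predict_Mineral'])*0.0005] = 0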
mm.pp_matrix(cm, savefig = 'none') 
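The thresholding step above can be seen on a small made-up example. This sketch assumes the confusion matrix holds raw counts in a pandas DataFrame; the table `counts` and its values are invented for illustration, and `DataFrame.mask` is used here in place of in-place boolean assignment so the original table is left unchanged.

```python
import pandas as pd

# Made-up confusion matrix of counts: rows are true classes, columns are predictions.
counts = pd.DataFrame(
    [[1990, 10],
     [1, 1999]],
    index=['Olivine', 'Plagioclase'],
    columns=['Olivine', 'Plagioclase'],
)

n_total = int(counts.to_numpy().sum())   # total number of analyses (4000 here)
threshold = n_total * 0.0005             # 0.05% of all analyses (2.0 here)

# Replace cells below the threshold with 0, keeping everything else.
masked = counts.mask(counts < threshold, 0)
print(masked)
```

Here the single misclassified plagioclase falls below the threshold and is hidden, while the ten misclassified olivines remain visible.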


These classifications now give the most likely mineral for each analysis, along with the associated probabilities. Next, we turn to unsupervised learning to visualize these minerals in latent space. 

# II. Unsupervised Machine Learning (Autoencoders and Clustering)

## Prepare data for analysis

In [None]:

df_ae, _ = mm.prep_df_ae(df_load)


## Apply the trained autoencoder

In [None]:

df_pred_ae = mm.predict_class_prob_ae(df_ae)


## Examine the latent variables and predicted mineral classifications

In [None]:

df_pred_ae


## Plot latent space 

In [None]:

mm.plot_latent_space(df_pred_ae)
