Showing 4 changed files with 1,406 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
.DS_Store | ||
__pycache__ | ||
*.ipynb | ||
/Data_Clean/TrainingData_Cleanup.ipynb | ||
*.icloud | ||
/GEOROC_minerals/ | ||
/backup/ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,257 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# %% \n", | ||
"\n", | ||
"\"\"\" Created on November 13, 2023 // @author: Sarah Shi \"\"\"\n", | ||
"\n", | ||
"import os\n", | ||
"import numpy as np\n", | ||
"import pandas as pd\n", | ||
"from sklearn.metrics import classification_report\n", | ||
"\n", | ||
"!pip install torch torchaudio torchvision torchtext\n", | ||
"!pip install -i https://test.pypi.org/simple/ mineralML==0.0.0.0\n", | ||
"import mineralML as mm\n", | ||
"\n", | ||
"from google.colab import files\n", | ||
"\n", | ||
"%matplotlib inline\n", | ||
"%config InlineBackend.figure_format = 'retina'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"We have loaded in the mineralML Python package with trained machine learning models for classifying minerals. Examples workflows working with these spectra can be found on the [ReadTheDocs](https://mineralML.readthedocs.io/en/latest/). \n", | ||
"\n", | ||
"The Google Colab implementation here aims to get your electron microprobe compositions classified and processes. We remove degrees of freedom to simplify the process. The igneous minerals considered for this study include: amphibole, apatite, biotite, clinopyroxene, garnet, ilmenite, K-feldspar, magnetite, muscovite, olivine, orthopyroxene, plagioclase, quartz, rutile, spinel, tourmaline, and zircon. \n", | ||
"\n", | ||
"The files necessary include a CSV file containing your electron microprobe analyses in oxide weight percentages. Find an example [here](https://github.com/sarahshi/mineralML/blob/main/Validation_Data/lepr_allphases_lim.csv). The necessary oxides are $SiO_2$, $TiO_2$, $Al_2O_3$, $FeO_t$, $MnO$, $MgO$, $CaO$, $Na_2O$, $K_2O$, $Cr_2O_3$. For the oxides not analyzed for specific minerals, the preprocessing will fill in the nan values as 0. \n", | ||
"\n", | ||
"We will apply both supervised and unsupervised machine learning models to the dataset. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# I. Supervised Machine Learning with Bayesian Neural Networks with Variational Inference\n", | ||
"\n", | ||
"# Load your CSV file here: " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"data_directory = \"/content/YOUR_DATA/\"\n", | ||
"\n", | ||
"# Create the directory if it doesn't exist\n", | ||
"if not os.path.exists(data_directory):\n", | ||
" os.makedirs(data_directory)\n", | ||
"\n", | ||
"# Change the current working directory\n", | ||
"os.chdir(data_directory)\n", | ||
"\n", | ||
"# Upload the files\n", | ||
"uploaded_files = files.upload()\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Read in the CSV file and prepare for analysis (fill in nans, limit to trained igneous minerals): " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"# Read in your dataframe of mineral data, called DF.csv. \n", | ||
"# Prepare the dataframe by removing rows with too many NaNs, filling some with zeros, and filtering to the minerals described by mineralML. \n", | ||
"\n", | ||
"df_load = mm.load_df('DF.csv')\n", | ||
"df_nn, _ = mm.prep_df_nn(df_load)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Apply the Bayesian neural network with variational inference to your data:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"df_pred_nn, probability_matrix = mm.predict_class_prob_nn(df_nn)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Examine the classifications report for your microanalyses, plot performance: " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"# Look at the predicted mineral dataframe. \n", | ||
"\n", | ||
"df_pred_nn\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"# Create a classification report to determine the accuracy, precision, f1, etc. \n", | ||
"\n", | ||
"bayes_valid_report = classification_report(\n", | ||
" df_nn['Mineral'], df_pred_nn['Predict_Mineral'], zero_division=0\n", | ||
")\n", | ||
"print(\"LEPR Validation Report:\\n\", bayes_valid_report)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"# Create and plot a confusion matrix \n", | ||
"\n", | ||
"cm = mm.confusion_matrix_df(df_nn['Mineral'], df_pred_nn['Predict_Mineral'])\n", | ||
"print(\"LEPR Confusion Matrix:\\n\", cm)\n", | ||
"cm[cm < len(df_pred_nn['Predict_Mineral'])*0.0005] = 0\n", | ||
"mm.pp_matrix(cm, savefig = 'none') \n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Excellent, these classifications now provide the most likely minerals, along with associated probabilities. Let's turn to unsupervised learning, to visualize these minerals in latent space. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# I. Unsupervised Machine Learning with Autoencoders and Clustering (HDBSCAN)\n", | ||
"\n", | ||
"# Prepare the same CSV as above for analysis: " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"df_ae, _ = mm.prep_df_ae(df_load)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Apply the Bayesian neural network with variational inference to your data:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"df_pred_ae = mm.predict_class_prob_ae(df_ae)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Examine the output: " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"df_pred_ae\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Plot your data in latent space: " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"mm.plot_latent_space(df_pred_ae)\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "science", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.18" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
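Review note on the notebook above: its markdown says the input CSV needs ten oxide columns and that preprocessing fills unanalyzed oxides with 0. A minimal sketch of that check, useful for inspecting a CSV before uploading it, might look like the following. It uses only pandas and NumPy and does not call mineralML. The column spellings (for example `FeOt`), the helper name `check_oxides`, and the demo values are assumptions made for illustration and are not taken from the package.

```python
import numpy as np
import pandas as pd

# Oxides the notebook lists as required. The exact column spellings
# (e.g. "FeOt") are an assumption; match them to your own CSV header.
REQUIRED_OXIDES = [
    "SiO2", "TiO2", "Al2O3", "FeOt", "MnO",
    "MgO", "CaO", "Na2O", "K2O", "Cr2O3",
]


def check_oxides(df, required=REQUIRED_OXIDES):
    """Hypothetical helper (not part of mineralML).

    Adds any missing oxide column as NaN, fills NaN oxide values with 0,
    and returns the cleaned copy plus the list of columns that were missing.
    """
    out = df.copy()
    missing = [col for col in required if col not in out.columns]
    for col in missing:
        out[col] = np.nan
    out[required] = out[required].fillna(0.0)
    return out, missing


# Made-up demo rows, only to show the behavior of the helper.
demo = pd.DataFrame(
    {
        "Mineral": ["Olivine", "Quartz"],
        "SiO2": [40.1, 99.5],
        "MgO": [45.0, np.nan],
        "FeOt": [14.2, np.nan],
    }
)
clean, missing = check_oxides(demo)
print("Columns added as zeros:", missing)
print(clean[REQUIRED_OXIDES])
```

Running a check like this first makes it obvious which oxides were absent from the file, rather than leaving that to be inferred from zeros in the prepared dataframe.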
891 changes: 891 additions & 0 deletions
docs/examples/traindata_clean/TrainingData_Cleanup.ipynb
Large diffs are not rendered by default.