Remove .ipynb from .gitignore
sarahshi committed Dec 7, 2023
1 parent 3e16fcc commit eb86a58
Showing 4 changed files with 1,406 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,6 +1,6 @@
.DS_Store
__pycache__
-*.ipynb
+/Data_Clean/TrainingData_Cleanup.ipynb
*.icloud
/GEOROC_minerals/
/backup/
257 changes: 257 additions & 0 deletions docs/examples/ml_models/mineralML_colab.ipynb
@@ -0,0 +1,257 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %% \n",
"\n",
"\"\"\" Created on November 13, 2023 // @author: Sarah Shi \"\"\"\n",
"\n",
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.metrics import classification_report\n",
"\n",
"!pip install torch torchaudio torchvision torchtext\n",
"!pip install -i https://test.pypi.org/simple/ mineralML==0.0.0.0\n",
"import mineralML as mm\n",
"\n",
"from google.colab import files\n",
"\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"We have loaded in the mineralML Python package with trained machine learning models for classifying minerals. Example workflows for working with these mineral compositions can be found on the [ReadTheDocs](https://mineralML.readthedocs.io/en/latest/). \n",
"\n",
"The Google Colab implementation here aims to get your electron microprobe compositions classified and processed. We remove degrees of freedom to simplify the process. The igneous minerals considered for this study include: amphibole, apatite, biotite, clinopyroxene, garnet, ilmenite, K-feldspar, magnetite, muscovite, olivine, orthopyroxene, plagioclase, quartz, rutile, spinel, tourmaline, and zircon. \n",
"\n",
"The necessary input is a CSV file containing your electron microprobe analyses in oxide weight percentages. Find an example [here](https://github.com/sarahshi/mineralML/blob/main/Validation_Data/lepr_allphases_lim.csv). The necessary oxides are $SiO_2$, $TiO_2$, $Al_2O_3$, $FeO_t$, $MnO$, $MgO$, $CaO$, $Na_2O$, $K_2O$, $Cr_2O_3$. For oxides not analyzed in a given mineral, the preprocessing fills the NaN values with 0. \n",
"\n",
"We will apply both supervised and unsupervised machine learning models to the dataset. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# I. Supervised Machine Learning with Bayesian Neural Networks with Variational Inference\n",
"\n",
"# Load your CSV file here: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"data_directory = \"/content/YOUR_DATA/\"\n",
"\n",
"# Create the directory if it doesn't already exist\n",
"os.makedirs(data_directory, exist_ok=True)\n",
"\n",
"# Change the current working directory\n",
"os.chdir(data_directory)\n",
"\n",
"# Upload the files\n",
"uploaded_files = files.upload()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Read in the CSV file and prepare for analysis (fill in nans, limit to trained igneous minerals): "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Read in your dataframe of mineral data, called DF.csv. \n",
"# Prepare the dataframe by removing rows with too many NaNs, filling some with zeros, and filtering to the minerals described by mineralML. \n",
"\n",
"df_load = mm.load_df('DF.csv')\n",
"df_nn, _ = mm.prep_df_nn(df_load)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Apply the Bayesian neural network with variational inference to your data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"df_pred_nn, probability_matrix = mm.predict_class_prob_nn(df_nn)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Examine the classification report for your microanalyses and plot performance: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Look at the predicted mineral dataframe. \n",
"\n",
"df_pred_nn\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Create a classification report with per-mineral precision, recall, and F1 score, plus overall accuracy. \n",
"\n",
"bayes_valid_report = classification_report(\n",
"    df_nn['Mineral'], df_pred_nn['Predict_Mineral'], zero_division=0\n",
")\n",
"print(\"LEPR Validation Report:\\n\", bayes_valid_report)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Create and plot a confusion matrix \n",
"\n",
"cm = mm.confusion_matrix_df(df_nn['Mineral'], df_pred_nn['Predict_Mineral'])\n",
"print(\"LEPR Confusion Matrix:\\n\", cm)\n",
"# Before plotting, zero out cells with counts below 0.05% of the number of predictions\n",
"cm[cm < len(df_pred_nn['Predict_Mineral'])*0.0005] = 0\n",
"mm.pp_matrix(cm, savefig = 'none') \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Excellent, these classifications now provide the most likely minerals, along with associated probabilities. Let's turn to unsupervised learning, to visualize these minerals in latent space. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# II. Unsupervised Machine Learning with Autoencoders and Clustering (HDBSCAN)\n",
"\n",
"# Prepare the same CSV as above for analysis: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"df_ae, _ = mm.prep_df_ae(df_load)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Apply the autoencoder and clustering to your data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"df_pred_ae = mm.predict_class_prob_ae(df_ae)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Examine the output: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"df_pred_ae\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Plot your data in latent space: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"mm.plot_latent_space(df_pred_ae)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "science",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
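
The notebook above expects a CSV of oxide weight percentages containing the ten oxides listed in its introduction, with unanalyzed oxides filled in as 0. As a minimal sketch of a pre-flight check you could run before uploading, the following uses only the Python standard library. The column spellings (for example `SiO2` and `FeOt`) are an assumption, since the notebook names the oxides only in LaTeX form; confirm them against the linked example CSV. The helpers `missing_oxides` and `fill_blanks` are hypothetical and are not part of mineralML, whose own preprocessing may behave differently.

```python
import csv
import io

# Assumed column spellings; the notebook lists the oxides only as LaTeX
# (e.g. $SiO_2$, $FeO_t$), so confirm these against the example CSV.
REQUIRED_OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOt", "MnO", "MgO",
                   "CaO", "Na2O", "K2O", "Cr2O3"]


def missing_oxides(header):
    """Return the required oxide columns that are absent from a CSV header."""
    present = set(header)
    return [ox for ox in REQUIRED_OXIDES if ox not in present]


def fill_blanks(rows):
    """Return rows as dicts of floats, with blank, 'nan', or absent oxides set to 0.0."""
    filled = []
    for row in rows:
        clean = {}
        for ox in REQUIRED_OXIDES:
            value = (row.get(ox) or "").strip()
            is_blank = value == "" or value.lower() == "nan"
            clean[ox] = 0.0 if is_blank else float(value)
        filled.append(clean)
    return filled


# Tiny in-memory CSV standing in for an uploaded file (illustrative values only).
sample = io.StringIO(
    "Mineral,SiO2,TiO2,Al2O3,FeOt,MnO,MgO,CaO,Na2O,K2O\n"
    "Olivine,40.1,,0.02,12.5,0.2,47.0,0.1,nan,\n"
)
reader = csv.DictReader(sample)
absent = missing_oxides(reader.fieldnames)
records = fill_blanks(reader)

print("Missing columns:", absent)
print("First row after filling:", records[0])
```

To check a real file, replace the `io.StringIO` sample with `open("DF.csv", newline="")`, using the file name from the notebook's loading cell.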
891 changes: 891 additions & 0 deletions docs/examples/traindata_clean/TrainingData_Cleanup.ipynb

Large diffs are not rendered by default.
