Remove .ipynb from .gitignore

sarahshi · Dec 7, 2023 · eb86a58 · eb86a58
1 parent 3e16fcc
commit eb86a58
Show file tree

Hide file tree

Showing 4 changed files with 1,406 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,6 @@
 .DS_Store
 __pycache__
-*.ipynb
+/Data_Clean/TrainingData_Cleanup.ipynb
 *.icloud
 /GEOROC_minerals/
 /backup/

diff --git a/docs/examples/ml_models/mineralML_colab.ipynb b/docs/examples/ml_models/mineralML_colab.ipynb
@@ -0,0 +1,257 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# %% \n",
+    "\n",
+    "\"\"\" Created on November 13, 2023 // @author: Sarah Shi \"\"\"\n",
+    "\n",
+    "import os\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from sklearn.metrics import classification_report\n",
+    "\n",
+    "!pip install torch torchaudio torchvision torchtext\n",
+    "!pip install -i https://test.pypi.org/simple/ mineralML==0.0.0.0\n",
+    "import mineralML as mm\n",
+    "\n",
+    "from google.colab import files\n",
+    "\n",
+    "%matplotlib inline\n",
+    "%config InlineBackend.figure_format = 'retina'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "We have loaded in the mineralML Python package with trained machine learning models for classifying minerals. Examples workflows working with these spectra can be found on the [ReadTheDocs](https://mineralML.readthedocs.io/en/latest/). \n",
+    "\n",
+    "The Google Colab implementation here aims to get your electron microprobe compositions classified and processes. We remove degrees of freedom to simplify the process. The igneous minerals considered for this study include: amphibole, apatite, biotite, clinopyroxene, garnet, ilmenite, K-feldspar, magnetite, muscovite, olivine, orthopyroxene, plagioclase, quartz, rutile, spinel, tourmaline, and zircon. \n",
+    "\n",
+    "The files necessary include a CSV file containing your electron microprobe analyses in oxide weight percentages. Find an example [here](https://github.com/sarahshi/mineralML/blob/main/Validation_Data/lepr_allphases_lim.csv). The necessary oxides are $SiO_2$, $TiO_2$, $Al_2O_3$, $FeO_t$, $MnO$, $MgO$, $CaO$, $Na_2O$, $K_2O$, $Cr_2O_3$. For the oxides not analyzed for specific minerals, the preprocessing will fill in the nan values as 0. \n",
+    "\n",
+    "We will apply both supervised and unsupervised machine learning models to the dataset. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# I. Supervised Machine Learning with Bayesian Neural Networks with Variational Inference\n",
+    "\n",
+    "# Load your CSV file here: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "data_directory = \"/content/YOUR_DATA/\"\n",
+    "\n",
+    "# Create the directory if it doesn't exist\n",
+    "if not os.path.exists(data_directory):\n",
+    "    os.makedirs(data_directory)\n",
+    "\n",
+    "# Change the current working directory\n",
+    "os.chdir(data_directory)\n",
+    "\n",
+    "# Upload the files\n",
+    "uploaded_files = files.upload()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Read in the CSV file and prepare for analysis (fill in nans, limit to trained igneous minerals): "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Read in your dataframe of mineral data, called DF.csv. \n",
+    "# Prepare the dataframe by removing rows with too many NaNs, filling some with zeros, and filtering to the minerals described by mineralML. \n",
+    "\n",
+    "df_load = mm.load_df('DF.csv')\n",
+    "df_nn, _ = mm.prep_df_nn(df_load)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Apply the Bayesian neural network with variational inference to your data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "df_pred_nn, probability_matrix = mm.predict_class_prob_nn(df_nn)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Examine the classifications report for your microanalyses, plot performance: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Look at the predicted mineral dataframe. \n",
+    "\n",
+    "df_pred_nn\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Create a classification report to determine the accuracy, precision, f1, etc. \n",
+    "\n",
+    "bayes_valid_report = classification_report(\n",
+    "    df_nn['Mineral'], df_pred_nn['Predict_Mineral'], zero_division=0\n",
+    ")\n",
+    "print(\"LEPR Validation Report:\\n\", bayes_valid_report)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Create and plot a confusion matrix \n",
+    "\n",
+    "cm = mm.confusion_matrix_df(df_nn['Mineral'], df_pred_nn['Predict_Mineral'])\n",
+    "print(\"LEPR Confusion Matrix:\\n\", cm)\n",
+    "cm[cm < len(df_pred_nn['Predict_Mineral'])*0.0005] = 0\n",
+    "mm.pp_matrix(cm, savefig = 'none') \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Excellent, these classifications now provide the most likely minerals, along with associated probabilities. Let's turn to unsupervised learning, to visualize these minerals in latent space. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# I. Unsupervised Machine Learning with Autoencoders and Clustering (HDBSCAN)\n",
+    "\n",
+    "# Prepare the same CSV as above for analysis: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "df_ae, _ = mm.prep_df_ae(df_load)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Apply the Bayesian neural network with variational inference to your data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "df_pred_ae = mm.predict_class_prob_ae(df_ae)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Examine the output: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "df_pred_ae\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Plot your data in latent space: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "mm.plot_latent_space(df_pred_ae)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "science",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.18"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/examples/traindata_clean/TrainingData_Cleanup.ipynb b/docs/examples/traindata_clean/TrainingData_Cleanup.ipynb