Merge pull request #313 from adriangzlz97/main

Add an option to the cryoDRGN_filtering Jupyter notebook to filter particles by UMAP/PC value ranges
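
For reference, the range-based selection that the new notebook cell performs can be sketched outside the notebook roughly as follows. This is a minimal sketch, not the code in the PR: `select_box` and `union_selections` are hypothetical helpers, and the toy dataframe only stands in for the one returned by `analysis.load_dataframe`. The column names `UMAP1`, `UMAP2`, and `PC1` follow the notebook.

```python
import numpy as np
import pandas as pd


def select_box(df, bounds):
    """Return the row indices of df that lie inside every range in bounds.

    bounds maps a column name (e.g. 'UMAP1' or 'PC1') to an inclusive
    (low, high) pair. This helper is hypothetical, not part of cryoDRGN.
    """
    mask = np.ones(len(df), dtype=bool)
    for column, (low, high) in bounds.items():
        # Series.between is inclusive on both ends by default
        mask &= df[column].between(low, high).to_numpy()
    return df.index[mask].to_numpy()


def union_selections(*selections):
    """Combine several index arrays into one sorted array without duplicates."""
    return np.unique(np.concatenate(selections))


# Toy data standing in for the notebook's dataframe (assumed column names)
df = pd.DataFrame({
    "UMAP1": [-6.0, -1.0, 0.5, 4.0, 7.0],
    "UMAP2": [0.0, 2.0, -3.0, 6.0, 1.0],
    "PC1": [0.1, -0.2, 0.3, 0.9, -0.8],
})

sel_a = select_box(df, {"UMAP1": (-5, 5), "UMAP2": (-5, 5)})
sel_b = select_box(df, {"PC1": (0.5, 1.0)})
ind_selected = union_selections(sel_a, sel_b)
print(ind_selected)
```

Passing the bounds as a dictionary keeps each selection on one line and makes switching from UMAP to PC fields a matter of changing a key, which may be easier to edit than the chained boolean expressions in the cell.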
ml-struct-bio · Nov 13, 2023 · 9e77638
2 parents bf27a0b + f8f9733
commit 9e77638
Showing 1 changed file with 96 additions and 2 deletions.
diff --git a/cryodrgn/templates/cryoDRGN_filtering_template.ipynb b/cryodrgn/templates/cryoDRGN_filtering_template.ipynb
@@ -12,6 +12,7 @@
     "* clustering of the latent space (k-means or Gaussian mixture model)\n",
     "* outlier detection (Z-score)\n",
     "* interactive selection with a lasso tool\n",
+    "* selection by UMAP or PC values\n",
     "\n",
     "For each method, the selected particles are tracked in the variable, `ind_selected`.\n",
     "\n",
@@ -845,6 +846,99 @@
     "plt.ylabel('UMAP2')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# UMAP/PC selection"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load data into a pandas dataframe\n",
+    "df = analysis.load_dataframe(z=z, \n",
+    "                             pc=pc, \n",
+    "                             euler=euler, \n",
+    "                             trans=trans, \n",
+    "                             labels=kmeans_labels, \n",
+    "                             umap=umap,\n",
+    "                             df1=ctf_params[:,2],\n",
+    "                             df2=ctf_params[:,3],\n",
+    "                             dfang=ctf_params[:,4],\n",
+    "                             phase=ctf_params[:,8])\n",
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Selection by UMAP/PC values\n",
+    "\n",
+    "In the next cell, you can select particle indices by UMAP or PC value ranges. Change the bounds in each selection, and uncomment additional selections if necessary. The default fields are UMAP1 and UMAP2; to filter on a different field, replace 'UMAP1' with the desired field (e.g. PC1).\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Selection 1: keep points inside the UMAP1/UMAP2 bounds below\n",
+    "ind_selected1 = df.index[(df['UMAP1'] >= -5) & (df['UMAP1'] <= 5) & (df['UMAP2'] >= -5) & (df['UMAP2'] <= 5)]\n",
+    "ind_selected1 = np.array(ind_selected1)\n",
+    "ind_selected = ind_selected1\n",
+    "# Selection 2 (optional): uncomment, edit the bounds, and take the union with selection 1\n",
+    "#ind_selected2 = df.index[(df['UMAP1'] >= -5) & (df['UMAP1'] <= 5) & (df['UMAP2'] >= -5) & (df['UMAP2'] <= 5)]\n",
+    "#ind_selected2 = np.array(ind_selected2)\n",
+    "#ind_selected = np.append(ind_selected1, ind_selected2)\n",
+    "#ind_selected = np.unique(ind_selected)\n",
+    "# Selection 3 (optional): uncomment, edit the bounds, and add it to the union\n",
+    "#ind_selected3 = df.index[(df['UMAP1'] >= -5) & (df['UMAP1'] <= 5) & (df['UMAP2'] >= -5) & (df['UMAP2'] <= 5)]\n",
+    "#ind_selected3 = np.array(ind_selected3)\n",
+    "#ind_selected = np.append(ind_selected, ind_selected3)\n",
+    "#ind_selected = np.unique(ind_selected)\n",
+    "\n",
+    "ind_selected_not = invert_selection(ind_selected)\n",
+    "\n",
+    "print('Selected indices:')\n",
+    "print(ind_selected)\n",
+    "print('Number of selected points:')\n",
+    "print(len(ind_selected))\n",
+    "print('Number of unselected points:')\n",
+    "print(len(ind_selected_not))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# View PCA\n",
+    "plt.scatter(pc[:,0], pc[:,1], alpha=.1, s=1)\n",
+    "plt.scatter(pc[ind_selected,0], pc[ind_selected,1], alpha=.1, s=1)\n",
+    "plt.xlabel('PC1 ({:.2f})'.format(pca.explained_variance_ratio_[0]))\n",
+    "plt.ylabel('PC2 ({:.2f})'.format(pca.explained_variance_ratio_[1]))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# View UMAP\n",
+    "plt.scatter(umap[:,0], umap[:,1], alpha=.1, s=1)\n",
+    "plt.scatter(umap[ind_selected,0], umap[ind_selected,1], alpha=.1, s=1)\n",
+    "plt.xlabel('UMAP1')\n",
+    "plt.ylabel('UMAP2')"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -980,7 +1074,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
@@ -994,7 +1088,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.3"
+   "version": "3.9.13"
   }
  },
  "nbformat": 4,