Commit

Merge pull request #313 from adriangzlz97/main
Add option in cryoDRGN_filtering jupyter notebook to filter by choosing UMAP/PC values
michal-g committed Nov 13, 2023
2 parents bf27a0b + f8f9733 commit 9e77638
Showing 1 changed file with 96 additions and 2 deletions.
98 changes: 96 additions & 2 deletions cryodrgn/templates/cryoDRGN_filtering_template.ipynb
Expand Up @@ -12,6 +12,7 @@
"* clustering of the latent space (k-means or Gaussian mixture model)\n",
"* outlier detection (Z-score)\n",
"* interactive selection with a lasso tool\n",
"* selection by UMAP or PC values\n",
"\n",
"For each method, the selected particles are tracked in the variable, `ind_selected`.\n",
"\n",
Expand Down Expand Up @@ -845,6 +846,99 @@
"plt.ylabel('UMAP2')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# UMAP/PC selection"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load data into a pandas dataframe\n",
"df = analysis.load_dataframe(z=z, \n",
" pc=pc, \n",
" euler=euler, \n",
" trans=trans, \n",
" labels=kmeans_labels, \n",
" umap=umap,\n",
" df1=ctf_params[:,2],\n",
" df2=ctf_params[:,3],\n",
" dfang=ctf_params[:,4],\n",
" phase=ctf_params[:,8])\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selection by UMAP/PC values\n",
"\n",
"In the next cell, you can select particles by their UMAP or PC values. Edit the bounds in each selection, and uncomment the additional selections if you need more than one region. The defaults filter on UMAP1 and UMAP2; to filter on a different field, replace 'UMAP1' with the name of that column (e.g. 'PC1').\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 1 selection\n",
"ind_selected1 = df.index[(df['UMAP1'] >= -5) & (df['UMAP1'] <= 5) & (df['UMAP2'] >= -5) & (df['UMAP2'] <= 5)]\n",
"ind_selected1 = np.array(ind_selected1)\n",
"ind_selected = ind_selected1\n",
"# 2 selections\n",
"#ind_selected2 = df.index[(df['UMAP1'] >= -5) & (df['UMAP1'] <= 5) & (df['UMAP2'] >= -5) & (df['UMAP2'] <= 5)]\n",
"#ind_selected2 = np.array(ind_selected2)\n",
"#ind_selected = np.append(ind_selected1, ind_selected2)\n",
"#ind_selected = np.unique(ind_selected)\n",
"# 3 selections\n",
"#ind_selected3 = df.index[(df['UMAP1'] >= -5) & (df['UMAP1'] <= 5) & (df['UMAP2'] >= -5) & (df['UMAP2'] <= 5)]\n",
"#ind_selected3 = np.array(ind_selected3)\n",
"#ind_selected = np.append(ind_selected, ind_selected3)\n",
"#ind_selected = np.unique(ind_selected)\n",
"\n",
"ind_selected_not = invert_selection(ind_selected)\n",
"\n",
"print('Selected indices:')\n",
"print(ind_selected)\n",
"print('Number of selected points:')\n",
"print(len(ind_selected))\n",
"print('Number of unselected points:')\n",
"print(len(ind_selected_not))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# View PCA\n",
"plt.scatter(pc[:,0], pc[:,1], alpha=.1, s=1)\n",
"plt.scatter(pc[ind_selected,0], pc[ind_selected,1], alpha=.1, s=1)\n",
"plt.xlabel('PC1 ({:.2f})'.format(pca.explained_variance_ratio_[0]))\n",
"plt.ylabel('PC2 ({:.2f})'.format(pca.explained_variance_ratio_[1]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# View umap\n",
"plt.scatter(umap[:,0], umap[:,1], alpha=.1, s=1)\n",
"plt.scatter(umap[ind_selected,0], umap[ind_selected,1], alpha=.1, s=1)\n",
"plt.xlabel('UMAP1')\n",
"plt.ylabel('UMAP2')"
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down Expand Up @@ -980,7 +1074,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -994,7 +1088,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.13"
}
},
"nbformat": 4,
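
The selection cell added in this commit repeats the same boolean mask for each region and merges the results with `np.append` followed by `np.unique`. As a rough sketch of the same idea in a more compact form, the snippet below wraps the range filter in a helper and merges selections with `np.union1d`. The helper name `select_by_range`, the toy dataframe, and the use of `np.setdiff1d` in place of the notebook's `invert_selection` are assumptions made for illustration; none of them are part of the notebook or of cryoDRGN.

```python
import numpy as np
import pandas as pd


def select_by_range(df, bounds):
    """Return the row indices that satisfy every inclusive (low, high) bound.

    Hypothetical helper: `bounds` maps a column name such as 'UMAP1' or 'PC1'
    to an inclusive (low, high) pair.
    """
    mask = np.ones(len(df), dtype=bool)
    for column, (low, high) in bounds.items():
        # Series.between is inclusive on both ends by default,
        # matching the >= / <= comparisons in the notebook cell.
        mask &= df[column].between(low, high).to_numpy()
    return df.index[mask].to_numpy()


# Toy stand-in for the dataframe built by analysis.load_dataframe (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(-10, 10, size=(1000, 3)),
    columns=["UMAP1", "UMAP2", "PC1"],
)

# First selection: a box in UMAP space.
sel1 = select_by_range(df, {"UMAP1": (-5, 5), "UMAP2": (-5, 5)})
# Second selection: a band along PC1.
sel2 = select_by_range(df, {"PC1": (0, 2)})

# Sorted union without duplicates, equivalent to np.unique(np.append(...)).
ind_selected = np.union1d(sel1, sel2)
# Complement of the selection; stands in for the notebook's invert_selection.
ind_selected_not = np.setdiff1d(np.arange(len(df)), ind_selected)

print("Number of selected points:", len(ind_selected))
print("Number of unselected points:", len(ind_selected_not))
```

Passing the bounds as a dictionary keeps each region on one line, so switching from UMAP to PC coordinates only means changing the keys.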