## 6. Exploring the Embedding & Grouping in MEGMA

For the **Embedding** step, we can use various manifold learning algorithms to embed the microbes in 2D. For the **Grouping** step, we can either use a clustering method or use predefined group information to group the microbes. Now, let's try different methods for the embedding and grouping operations in `MEGMA`.

### 6.1 Manifold embedding vs. random embedding

In this section, we change the embedding method on the loaded `megma` object. The `megma` object can be refit to update itself, so you don't need to initialize a new `megma`.

In [1]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns
import pandas as pd
import numpy as np

from aggmap import loadmap, AggMap

In [4]:
embed_methods = ['umap', 'tsne', 'mds', 'isomap', 'random', 'lle', 'se']

In [9]:
megma = megma.copy()

In [5]:
megma = loadmap('./megma.all')
for emb_method in embed_methods:
    megma.fit(emb_method=emb_method, verbose=0)
    print('Fitting megma using %s method.\n' % emb_method)
    megma.plot_scatter()

2022-08-11 20:55:05,004 - INFO - [bidd-aggmap] - applying hierarchical clustering to obtain group information ...


2022-08-11 20:55:07,588 - INFO - [bidd-aggmap] - Applying grid assignment of feature points, this may take several minutes(1~30 min)
2022-08-11 20:55:08,607 - INFO - [bidd-aggmap] - Finished
Fitting megma using umap method.

2022-08-11 20:55:08,613 - INFO - [bidd-aggmap] - generate file: ./feature points_849_correlation_umap_scatter
2022-08-11 20:55:08,621 - INFO - [bidd-aggmap] - save html file to ./feature points_849_correlation_umap_scatter
2022-08-11 20:55:08,626 - INFO - [bidd-aggmap] - applying hierarchical clustering to obtain group information ...
2022-08-11 20:55:09,685 - INFO - [bidd-aggmap] - Applying grid assignment of feature points, this may take several minutes(1~30 min)
2022-08-11 20:55:10,055 - INFO - [bidd-aggmap] - Finished
Fitting megma using tsne method.

2022-08-11 20:55:10,064 - INFO - [bidd-aggmap] - generate file: ./feature points_
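
When comparing several embedding methods in one loop, it can help to record which refits fail instead of stopping at the first error. The following is a minimal sketch, assuming only that the object exposes `fit(emb_method=..., verbose=...)` as used in the loop above. The names `try_embeddings` and `_StubMap` are hypothetical and exist only for illustration; in practice you would pass the loaded `megma` object instead of the stub.

```python
def try_embeddings(mapper, methods, verbose=0):
    """Refit `mapper` once per embedding method.

    Returns a dict mapping each method name to None on success,
    or to the repr of the exception that was raised.
    """
    results = {}
    for method in methods:
        try:
            mapper.fit(emb_method=method, verbose=verbose)
            results[method] = None
        except Exception as exc:  # record the failure and keep going
            results[method] = repr(exc)
    return results


class _StubMap:
    """Hypothetical stand-in for a fitted map object (demo only)."""

    supported = {'umap', 'tsne', 'mds', 'isomap', 'random', 'lle', 'se'}

    def __init__(self):
        self.emb_method = None

    def fit(self, emb_method, verbose=0):
        if emb_method not in self.supported:
            raise ValueError('unsupported embedding method: %s' % emb_method)
        self.emb_method = emb_method
        return self


report = try_embeddings(_StubMap(), ['umap', 'random', 'not-a-method'])
failed = [m for m, err in report.items() if err is not None]
print(failed)
```

The returned report can then be used to decide which scatter plots are worth comparing.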

In [9]:
import aggmap

['cluster_02',
 'cluster_05',
 'cluster_05',
 'cluster_05',
 'cluster_05',
 'cluster_05',
 'cluster_05',
 'cluster_05',
 'cluster_05',
 'cluster_04',
 'cluster_05',
 'cluster_05',
 'cluster_04',
 'cluster_04',
 'cluster_04',
 'cluster_02',
 'cluster_01',
 'cluster_04',
 'cluster_05',
 'cluster_05',
 'cluster_03',
 'cluster_02',
 'cluster_05',
 'cluster_02',
 'cluster_02',
 'cluster_03',
 'cluster_03',
 'cluster_01',
 'cluster_02',
 'cluster_02',
 'cluster_02',
 'cluster_01',
 'cluster_02',
 'cluster_02',
 'cluster_02',
 'cluster_02',
 'cluster_02',
 'cluster_01',
 'cluster_02',
 'cluster_02',
 'cluster_05',
 'cluster_02',
 'cluster_02',
 'cluster_01',
 'cluster_02',
 'cluster_02',
 'cluster_04',
 'cluster_02',
 'cluster_01',
 'cluster_03',
 'cluster_02',
 'cluster_01',
 'cluster_04',
 'cluster_02',
 'cluster_02',
 'cluster_02',
 'cluster_02',
 'cluster_01',
 'cluster_03',
 'cluster_02',
 'cluster_03',
 'cluster_01',
 'cluster_02',
 'cluster_02',
 'cluster_02',
 'cluster_01',
 'cluster_
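
A long label list like the one above is easier to read as per-cluster counts. Here is a small sketch using `collections.Counter`; the `labels` list holds only the first ten entries copied from the output above, so the counts are illustrative and not the sizes of the full 849-feature grouping.

```python
from collections import Counter

# First ten labels copied from the printed list (one label per feature).
labels = [
    'cluster_02', 'cluster_05', 'cluster_05', 'cluster_05', 'cluster_05',
    'cluster_05', 'cluster_05', 'cluster_05', 'cluster_05', 'cluster_04',
]

sizes = Counter(labels)
for name, count in sorted(sizes.items()):
    print(name, count)
```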

In [17]:
len(megma.alist)

849

In [6]:
megma.isfit

True

### 6.2 Customized Grouping

a, taxonomic grouping: the microbes are grouped by truncating the phylogenetic tree at a given taxonomic level (the 1, 3, and 10 clusters are generated by truncating at the Control, Kingdom, and Phylum levels, respectively; c is the number of channels, e.g., c=10 means that the 2D-microbiomeprint has 10 channels).
b, metagenomic grouping: the microbes are grouped by specifying the number of clusters in the hierarchical clustering tree. To make a fair comparison with the taxonomic grouping, the same numbers of clusters are specified in the metagenomic grouping (i.e., c=1, 3, 10, respectively).
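
To illustrate what "specifying the number of clusters in the hierarchical clustering tree" means, here is a toy sketch in plain Python. It is not aggmap's implementation: the function name `agglomerate`, the choice of average linkage, and the six 1D points are all assumptions made for illustration. The idea is simply that the closest clusters are merged until exactly `n_clusters` remain.

```python
def agglomerate(dist, n_clusters):
    """Average-linkage agglomerative clustering on a square distance matrix.

    Merges the two closest clusters until `n_clusters` remain and
    returns one integer label per item.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (ties broken by index order).
        pairs = [(linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        _, p, q = min(pairs)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data: six points on a line, with absolute differences as distances.
points = [0, 1, 2, 50, 51, 90]
dist = [[abs(a - b) for b in points] for a in points]

for c in (1, 2, 3):
    print(c, agglomerate(dist, c))
```

With c=1 every item falls into a single group, which corresponds to a single-channel map; larger c values split the same tree into more groups.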


In [2]:
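url = 'https://raw.githubusercontent.com/shenwanxiang/bidd-aggmap/master/docs/source/_example_MEGMA/dataset/'
dfm = pd.read_csv(url + 'mOTUs_new_taxonomic_profile.txt', sep='\t')
# Assumption: megma_all is a fitted MEGMA object loaded beforehand,
# e.g. megma_all = loadmap('./megma.all')
dfk = megma_all.df_scatter.copy()
# Extract the mOTU ID from between the square brackets of each feature name
dfk['mOTU'] = dfk.IDs.apply(lambda x: x.split('[')[1].split(']')[0])
# Keep only the consensus taxonomy, indexed by mOTU ID
dfm = dfm.set_index('#mOTU').consensus_taxonomy
dfk.set_index('mOTU').join(dfm).sort_values('consensus_taxonomy')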

mOTU,x,y,IDs,Subtypes,colors,consensus_taxonomy
ref_mOTU_v2_1385,1.909458,-0.398272,Methanobrevibacter smithii [ref_mOTU_v2_1385],cluster_02,#08ff00,k__Archaea|p__Euryarchaeota|c__Methanobacteria...
ref_mOTU_v2_1384,0.852218,-2.511427,Methanobrevibacter smithii [ref_mOTU_v2_1384],cluster_01,#fcf500,k__Archaea|p__Euryarchaeota|c__Methanobacteria...
meta_mOTU_v2_6693,2.546017,-2.364811,unknown Methanomicrobia [meta_mOTU_v2_6693],cluster_01,#fcf500,k__Archaea|p__Euryarchaeota|c__Methanomicrobia...
ref_mOTU_v2_1156,-2.113837,-1.425572,Bifidobacterium adolescentis [ref_mOTU_v2_1156],cluster_03,#00fff6,k__Bacteria|p__Actinobacteria|c__Actinobacteri...
ref_mOTU_v2_1285,1.495760,1.216689,Bifidobacterium angulatum [ref_mOTU_v2_1285],cluster_01,#fcf500,k__Bacteria|p__Actinobacteria|c__Actinobacteri...
...,...,...,...,...,...,...
meta_mOTU_v2_6090,2.300636,-2.516803,Verrucomicrobia bacterium CAG:312_58_20 [meta_...,cluster_01,#fcf500,k__Bacteria|p__Verrucomicrobia|c__unknown Verr...
meta_mOTU_v2_6061,2.138557,-2.543592,unknown Verrucomicrobia [meta_mOTU_v2_6061],cluster_01,#fcf500,k__Bacteria|p__Verrucomicrobia|c__unknown Verr...
meta_mOTU_v2_5651,1.040720,-3.478097,unknown Bacteria [meta_mOTU_v2_5651],cluster_01,#fcf500,k__Bacteria|p__unknown Bacteria|c__unknown Bac...
meta_mOTU_v2_6079,0.631804,0.393673,unknown Bacteria [meta_mOTU_v2_6079],cluster_03,#00fff6,k__Bacteria|p__unknown Bacteria|c__unknown Bac...
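
The taxonomic grouping described in (a) can be sketched from the `IDs` and `consensus_taxonomy` columns shown above. This is a minimal illustration, not the library's own procedure: the helper names `motu_id` and `taxon_at` and the `LEVEL_PREFIX` mapping are hypothetical, and the lineage strings in `rows` are shortened to the first two levels that are visible in the table.

```python
import re

# Prefixes used in the '|'-separated lineage strings (as seen in the table).
LEVEL_PREFIX = {'kingdom': 'k__', 'phylum': 'p__', 'class': 'c__'}


def motu_id(feature_name):
    """Return the mOTU ID inside the trailing square brackets, or None."""
    match = re.search(r'\[([^\[\]]+)\]\s*$', feature_name)
    return match.group(1) if match else None


def taxon_at(taxonomy, level):
    """Return the taxon name at `level`, or None if the level is absent."""
    prefix = LEVEL_PREFIX[level]
    for part in taxonomy.split('|'):
        if part.startswith(prefix):
            return part[len(prefix):]
    return None


# (feature name, lineage truncated to kingdom and phylum) from the table.
rows = [
    ('Methanobrevibacter smithii [ref_mOTU_v2_1385]', 'k__Archaea|p__Euryarchaeota'),
    ('Bifidobacterium adolescentis [ref_mOTU_v2_1156]', 'k__Bacteria|p__Actinobacteria'),
    ('unknown Verrucomicrobia [meta_mOTU_v2_6061]', 'k__Bacteria|p__Verrucomicrobia'),
]

# Truncating at the phylum level gives one group label per mOTU.
phylum_groups = {motu_id(name): taxon_at(tax, 'phylum') for name, tax in rows}
kingdom_groups = {motu_id(name): taxon_at(tax, 'kingdom') for name, tax in rows}
print(phylum_groups)
print(kingdom_groups)
```

Truncating at a higher level (kingdom) yields fewer, larger groups than truncating at a lower level (phylum), which is why the number of channels grows as the truncation level moves down the tree.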
