> [came PMC9977153](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9977153/)
>> For distant species, the markers of homologous cell types can be quite diverse. For example, for the major cell types in the retina defined by scRNA-seq data, there are only a few shared DEGs for human, mouse, chicken, and zebrafish (Supplemental Fig. S16A), which limits the performance of many marker-based methods for cross-species cell-type assignment. Here we took the retinal scRNA-seq data of **adult zebrafish (Hoang et al. 2020) as the reference**, and applied CAME to assign the retinal cells for two distant species, **human (Menon et al. 2019)** and **mouse (Macosko et al. 2015)**, and a nonmodel species, **chick (Hoang et al. 2020)**

>> The cell count for zebrafish_Adult does not look right.

|Dataset|Organism|Organ|Platform|Number of cells|
|:-|:-|:-|:-|:-|
|Menon_microfluidics|Homo sapiens|Retina|microfluidics|20,091|
|Menon_seqwell|Homo sapiens|Retina|Seq-Well|3,014|
|Macosko|Mus musculus|Retina|Drop-seq|44,808|
|zebrafish_LD_Adult|Danio rerio|Retina|10x|19,485|
|zebrafish_NMDA_Adult|Danio rerio|Retina|10x|19,485|
|zebrafish_NMDA_Adult|Danio rerio|Retina|10x|19,485|
|chick_P10|Gallus gallus domesticus|Retina|10x|13,819|

+ [zebrafish and chick | Hoang et al. 2020](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7899183/)
+ [mouse | Macosko et al. 2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481139/)
+ [human | Menon et al. 2019](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6814749/)
    > [healthy] Retinas collected for this study had no known retinal disease and no abnormalities indicative of disease pathology

|Dataset|Organism|Organ|Platform|Number of cells|
|:-:|:-:|:-:|:-:|:-:|
|Menon_microfluidics|Homo sapiens|Retina|microfluidics|20,091|

[GSE137846](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE137846)
```
GSM4089762	Macula retina 6hr Seq-Well
GSM4089763	Peripheral retina 6hr Seq-Well
GSM4089764	Macula retina 8hr Seq-Well
GSM4089765	Peripheral retina 8hr Seq-Well
GSM4089766	Macula retina 24hr Seq-Well read 1
```

In [1]:
from pathlib import Path
import numpy as np
import pandas as pd
import scanpy as sc

from IPython.display import display



In [2]:
import sys
p_link = Path('/public/workspace/licanchengup/link')
p_test = p_link.joinpath('test')
p_root = Path('/public/workspace/licanchengup/link/db/came/dataset/Retina')
# Make the local helper module importable without adding the path twice.
if str(p_test) not in sys.path:
    sys.path.append(str(p_test))

In [3]:
from func import h5ad_to_mtx, group_agg


[func]----------------------------------
group_agg
rm_rf
get_path_varmap
get_type_counts_info		aligned
aligned_type			aligned
test_came_and_csMAHN
h5ad_to_mtx
load_adata

[result tools]--------------------------
get_test_result_df
show_umap
get_res_obs
get_publi_head

[result tools][re plot umap]-----------
get_df_color			Convenient
> [dssp umap]
export_legend
dssp_plot_umap_legend_sp	Convenient
dssp_plot_umap_legend_cell_type	Convenient
dssp_plot_umap
dssp_get_umap_name
dssp_plot_save_umap		Convenient
> [umap]
get_default_df_color
plot_umap
get_umap_name
plot_save_umap			Convenient
----------------------------------------
    


In [4]:
df_path = pd.DataFrame({
    'mtx': [Path('Single-cell-retinal-regeneration-master/Zebrafish_{}_count_matrix.mtx'.format(_))
            for _ in 'LD,NMDA,TNFa'.split(',')],
    'obs': [Path('Single-cell-retinal-regeneration-master/Zebrafish_{}_cell_features.tsv'.format(_))
            for _ in 'LD,NMDA,TNFa'.split(',')]

}, index='LD,NMDA,TNFa'.split(','))
df_path

Unnamed: 0,mtx,obs
LD,Single-cell-retinal-regeneration-master/Zebraf...,Single-cell-retinal-regeneration-master/Zebraf...
NMDA,Single-cell-retinal-regeneration-master/Zebraf...,Single-cell-retinal-regeneration-master/Zebraf...
TNFa,Single-cell-retinal-regeneration-master/Zebraf...,Single-cell-retinal-regeneration-master/Zebraf...


In [5]:
df_var = pd.read_csv(
    Path('Single-cell-retinal-regeneration-master/Zebrafish_gene_features.tsv'),
    sep='\t',
    index_col=1)
display(df_var.head(2), df_var.shape, df_var.index.is_unique)

Symbol,#Ensembl.ID
rerg,ENSDARG00000104632
si:ch73-252i11.1,ENSDARG00000100660


(31498, 1)

False
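
`df_var.index.is_unique` is `False` above, so at least one gene symbol maps to more than one Ensembl ID. A minimal sketch for listing such symbols before they get renamed; the toy table, the symbol `dupA` and the IDs `ID_1`/`ID_2` are made up for illustration and are not taken from `Zebrafish_gene_features.tsv`.

```python
import pandas as pd

def duplicated_symbols(df_var):
    """Return every row whose index label (gene symbol) occurs more than once."""
    dup_mask = df_var.index.duplicated(keep=False)  # mark all copies, not only the later ones
    return df_var[dup_mask].sort_index()

# Toy stand-in for df_var; 'dupA' and its IDs are hypothetical.
toy_var = pd.DataFrame(
    {'#Ensembl.ID': ['ENSDARG00000104632', 'ID_1', 'ID_2']},
    index=pd.Index(['rerg', 'dupA', 'dupA'], name='Symbol'),
)
dups = duplicated_symbols(toy_var)
print(dups)
print(dups.index.nunique(), 'symbol(s) shared by', len(dups), 'rows')
```

On the real table, `duplicated_symbols(df_var)` would show which symbols later receive suffixes from `var_names_make_unique()`.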

In [6]:
for _i, _row in df_path.iterrows():
    print(
        ','.join(
            pd.read_csv(
                _row['obs'],
                sep='\t',
                index_col=0).columns))

_row

Sample,Cell.type,tSNE.1,tSNE.2,nGene,nUMI,Percentage.of.mitochondrial.genes,Percentage.of.ribosomal.protein.genes
Sample,Cell.type,tSNE.1,tSNE.2,nGene,nUMI,Percentage.of.mitochondrial.genes,Percentage.of.ribosomal.protein.genes
Sample,Cell.type,tSNE.1,tSNE.2,nGene,nUMI,Percentage.of.mitochondrial.genes,Percentage.of.ribosomal.protein.genes


mtx    Single-cell-retinal-regeneration-master/Zebraf...
obs    Single-cell-retinal-regeneration-master/Zebraf...
Name: TNFa, dtype: object
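
The three `cell_features.tsv` headers printed above are identical. A small sketch of the same check done programmatically by reading only the header row; it runs on in-memory toy files, and `header_columns` is a hypothetical helper, not part of `func`.

```python
import io
import pandas as pd

def header_columns(handle):
    """Read only the header row of a TSV and return the non-index column names."""
    header = pd.read_csv(handle, sep='\t', nrows=0)
    return list(header.columns[1:])  # the first column holds the barcode index

# Toy files standing in for the per-treatment cell feature tables.
toy_files = {
    name: io.StringIO('#Barcode\tSample\tCell.type\nbc1\tAdult R1\tHC\n')
    for name in ['LD', 'NMDA', 'TNFa']
}
columns = {name: header_columns(buf) for name, buf in toy_files.items()}
reference = columns['LD']
mismatched = [name for name, cols in columns.items() if cols != reference]
print('files whose header differs from LD:', mismatched)
```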

In [7]:
# Build one AnnData per treatment (LD, NMDA, TNFa), keyed by treatment name.
res = {}
for _i, _row in df_path.iterrows():
    print(_row.name.ljust(50, '-'))
    adata = sc.read_mtx(_row['mtx']).T

    df_obs = pd.read_csv(_row['obs'],
                         sep='\t',
                         index_col=0).loc[:,
                                          'Sample,Cell.type,tSNE.1,tSNE.2'.split(',')]
    df_obs = df_obs['Sample'].str.extract(
        '^(?P<age>[^ ]+) (?P<tissue>[^ ]+)').join(df_obs)
    df_obs = df_obs.rename(columns={'Sample': '_batch'})
    display(
        df_obs.head(2),
        df_obs.shape,
        df_obs['_batch'].value_counts(),
        df_obs['Cell.type'].value_counts())
    adata.var = df_var
    adata.obs = df_obs
    display(adata.obs.index.is_unique, adata.var.index.is_unique)
    adata.var_names_make_unique()
    adata.obs_names_make_unique()
    display(
        adata,
        adata.obs.head(2),
        adata.obs.shape,
        adata.var.head(2),
        adata.var.shape)
    res.update({_row.name: adata})
    del adata

LD------------------------------------------------


#Barcode,age,tissue,_batch,Cell.type,tSNE.1,tSNE.2
AAACCTGCAAGAAAGG-zfAd00R1,Adult,R1,Adult R1,HC,22.214539,-23.589493
ACTGCTCAGTGAACAT-zfAd00R1,Adult,R1,Adult R1,HC,22.67085,-24.557162


(45153, 6)

_batch
36hr LD R1    8426
20hr LD       6784
Adult R3      6499
Adult R4      5511
10hr LD       5226
Adult R5      4988
4hr LD        3475
36hr LD R2    1757
Adult R1      1617
Adult R2       870
Name: count, dtype: int64

Cell.type
Resting MG          7738
Rods                7294
Activated MG        6905
HC                  5083
Cones               4197
GABAergic AC        3078
RGC                 2789
Microglia           2520
BC                  1532
Glycinergic AC      1422
Pericytes            916
Progenitors          446
V/E cells            427
RPE                  422
Oligodendrocytes     384
Name: count, dtype: int64

True

False

AnnData object with n_obs × n_vars = 45153 × 31498
    obs: 'age', 'tissue', '_batch', 'Cell.type', 'tSNE.1', 'tSNE.2'
    var: '#Ensembl.ID'

#Barcode,age,tissue,_batch,Cell.type,tSNE.1,tSNE.2
AAACCTGCAAGAAAGG-zfAd00R1,Adult,R1,Adult R1,HC,22.214539,-23.589493
ACTGCTCAGTGAACAT-zfAd00R1,Adult,R1,Adult R1,HC,22.67085,-24.557162


(45153, 6)

Symbol,#Ensembl.ID
rerg,ENSDARG00000104632
si:ch73-252i11.1,ENSDARG00000100660


(31498, 1)

NMDA----------------------------------------------


#Barcode,age,tissue,_batch,Cell.type,tSNE.1,tSNE.2
AAACCTGCAAGAAAGG-zfAd00R1,Adult,R1,Adult R1,HC,-35.29616,-14.833732
ACTGCTCAGTGAACAT-zfAd00R1,Adult,R1,Adult R1,HC,-33.768618,-12.901787


(40236, 6)

_batch
4hr NMDA04M    7387
Adult R3       6499
Adult R4       5511
Adult R5       4988
10hr NMDA      4727
20hr NMDA      4603
36hr NMDA      4034
Adult R1       1617
Adult R2        870
Name: count, dtype: int64

Cell.type
Resting MG        6393
Activated MG      5485
Rods              5481
Cones             3678
Cone BC           3175
RGC               3086
GABAergic AC      2870
HC                2716
Microglia         2146
BC                1906
Pericytes         1365
Glycinergic AC     669
V/E cells          461
RPE                434
Progenitors        371
Name: count, dtype: int64

True

True

AnnData object with n_obs × n_vars = 40236 × 31498
    obs: 'age', 'tissue', '_batch', 'Cell.type', 'tSNE.1', 'tSNE.2'
    var: '#Ensembl.ID'

#Barcode,age,tissue,_batch,Cell.type,tSNE.1,tSNE.2
AAACCTGCAAGAAAGG-zfAd00R1,Adult,R1,Adult R1,HC,-35.29616,-14.833732
ACTGCTCAGTGAACAT-zfAd00R1,Adult,R1,Adult R1,HC,-33.768618,-12.901787


(40236, 6)

Symbol,#Ensembl.ID
rerg,ENSDARG00000104632
si:ch73-252i11.1,ENSDARG00000100660


(31498, 1)
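
`adata.var.index.is_unique` printed `False` for LD but `True` for NMDA, although both loop iterations assign the same `df_var`. One plausible explanation, stated here as an assumption rather than verified against the anndata source, is that `adata.var = df_var` stores a reference and `var_names_make_unique()` then renames the index of that shared frame in place. The pandas-only sketch below reproduces that aliasing effect; `make_unique` is a simplified stand-in, not the anndata implementation.

```python
import pandas as pd

def make_unique(names):
    """Append -1, -2, ... to repeated names (simplified; ignores suffix collisions)."""
    seen = {}
    out = []
    for name in names:
        if name in seen:
            seen[name] += 1
            out.append(f'{name}-{seen[name]}')
        else:
            seen[name] = 0
            out.append(name)
    return out

# Case 1: no copy, so renaming through the alias also changes the original.
shared = pd.DataFrame({'#Ensembl.ID': ['E1', 'E2', 'E3']}, index=['a', 'b', 'b'])
alias = shared
alias.index = make_unique(alias.index)
print('shared unique after aliasing:', shared.index.is_unique)

# Case 2: an explicit copy leaves the original table untouched.
fresh = pd.DataFrame({'#Ensembl.ID': ['E1', 'E2', 'E3']}, index=['a', 'b', 'b'])
safe = fresh.copy()
safe.index = make_unique(safe.index)
print('fresh unique after copy:', fresh.index.is_unique)
```

If the shared mutation is unwanted, assigning `df_var.copy()` inside the loop would keep the original table untouched.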

TNFa----------------------------------------------


#Barcode,age,tissue,_batch,Cell.type,tSNE.1,tSNE.2
AAACCTGCAAGAAAGG-zfAd00R1,Adult,R1,Adult R1,Rods,18.679689,-20.307914
ACTGCTCAGTGAACAT-zfAd00R1,Adult,R1,Adult R1,Rods,17.279543,-21.054359


(48540, 6)

_batch
72hr T+R       7931
Adult R3       6499
36hr T+R       6269
Adult R4       5511
10hr T+R R1    5347
20hr T+R       5086
Adult R5       4988
10hr T+R R2    4422
Adult R1       1617
Adult R2        870
Name: count, dtype: int64

Cell.type
Resting MG          10469
BC                   6692
GABAergic AC         5076
Cones                4703
Rods                 4390
Microglia            3872
RGC                  3854
Activated MG         3158
HC                   2205
Pericytes            1616
Glycinergic AC        772
V/E cells             529
Progenitors           502
Oligodendrocytes      418
RPE                   284
Name: count, dtype: int64

True

True

AnnData object with n_obs × n_vars = 48540 × 31498
    obs: 'age', 'tissue', '_batch', 'Cell.type', 'tSNE.1', 'tSNE.2'
    var: '#Ensembl.ID'

#Barcode,age,tissue,_batch,Cell.type,tSNE.1,tSNE.2
AAACCTGCAAGAAAGG-zfAd00R1,Adult,R1,Adult R1,Rods,18.679689,-20.307914
ACTGCTCAGTGAACAT-zfAd00R1,Adult,R1,Adult R1,Rods,17.279543,-21.054359


(48540, 6)

Symbol,#Ensembl.ID
rerg,ENSDARG00000104632
si:ch73-252i11.1,ENSDARG00000100660


(31498, 1)
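
The pattern `^(?P<age>[^ ]+) (?P<tissue>[^ ]+)` keeps the first two space-separated tokens of `Sample`, so `tissue` holds a treatment for injured samples (`36hr LD R1` gives `LD`) but a replicate ID for uninjured ones (`Adult R1` gives `R1`). The sketch below shows one alternative that separates the two; `split_sample` and its `condition`/`replicate` fields are hypothetical names, and it assumes replicate IDs always look like `R` followed by digits.

```python
import re

REPLICATE_RE = re.compile(r'R\d+')

def split_sample(sample):
    """Split a Sample string into (age, condition, replicate); missing parts become None."""
    tokens = sample.split()
    age, rest = tokens[0], tokens[1:]
    replicate = next((t for t in rest if REPLICATE_RE.fullmatch(t)), None)
    condition = next((t for t in rest if not REPLICATE_RE.fullmatch(t)), None)
    return age, condition, replicate

# Sample strings copied from the _batch value counts above.
for sample in ['36hr LD R1', 'Adult R3', '10hr T+R R2', '4hr NMDA04M']:
    print(sample, '->', split_sample(sample))
```

With this split, adult samples have no condition and a replicate, while treated samples always have a condition and only sometimes a replicate.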

In [8]:
adata = sc.concat(res)
display(adata.obs.index.is_unique, adata.var.index.is_unique)
adata.var_names_make_unique()
adata.obs_names_make_unique()
display(adata.obs.index.is_unique, adata.var.index.is_unique)
adata.obs['pre_ct'] = adata.obs['Cell.type']
# Collapse fine-grained labels into broader classes; labels not listed pass through unchanged.
ct_map = {
    'Glycinergic AC': 'AC',
    'GABAergic AC': 'AC',
    'V/E cells': 'V/E cell',

    'Resting MG': 'Macroglial cell',
    'Activated MG': 'Macroglial cell',
    'Oligodendrocytes': 'Macroglial cell',
    'Microglia': 'Microglia cell',

    'Pericytes': 'Pericyte cell',
    'Progenitors': 'Progenitor cell',

    'HC': 'HC',

    'BC': 'BC',
    'Cone BC': 'BC',

    'Cones': 'Cones',
    'RGC': 'RGC',
    'RPE': 'RPE',
    'Rods': 'Rods',
}
adata.obs['cell_type'] = adata.obs['Cell.type'].map(lambda k: ct_map.get(k, k))
display(
    adata,
    adata.obs.head(2),
    adata.obs.shape,
    adata.var.head(2),
    adata.var.shape)

  utils.warn_names_duplicates("obs")


False

True

True

True

AnnData object with n_obs × n_vars = 133929 × 31498
    obs: 'age', 'tissue', '_batch', 'Cell.type', 'tSNE.1', 'tSNE.2', 'pre_ct', 'cell_type'

#Barcode,age,tissue,_batch,Cell.type,tSNE.1,tSNE.2,pre_ct,cell_type
AAACCTGCAAGAAAGG-zfAd00R1,Adult,R1,Adult R1,HC,22.214539,-23.589493,HC,HC
ACTGCTCAGTGAACAT-zfAd00R1,Adult,R1,Adult R1,HC,22.67085,-24.557162,HC,HC


(133929, 8)

rerg
si:ch73-252i11.1


(31498, 0)

In [9]:
for _ in 'age,tissue,_batch,Cell.type,cell_type'.split(','):
    print('\n', _.ljust(50, '-'))
    display(adata.obs[_].value_counts())


 age-----------------------------------------------


age
Adult    58455
36hr     20486
10hr     19722
20hr     16473
4hr      10862
72hr      7931
Name: count, dtype: int64


 tissue--------------------------------------------


tissue
T+R        29055
LD         25668
R3         19497
R4         16533
R5         14964
NMDA       13364
NMDA04M     7387
R1          4851
R2          2610
Name: count, dtype: int64


 _batch--------------------------------------------


_batch
Adult R3       19497
Adult R4       16533
Adult R5       14964
36hr LD R1      8426
72hr T+R        7931
4hr NMDA04M     7387
20hr LD         6784
36hr T+R        6269
10hr T+R R1     5347
10hr LD         5226
20hr T+R        5086
Adult R1        4851
10hr NMDA       4727
20hr NMDA       4603
10hr T+R R2     4422
36hr NMDA       4034
4hr LD          3475
Adult R2        2610
36hr LD R2      1757
Name: count, dtype: int64


 Cell.type-----------------------------------------


Cell.type
Resting MG          24600
Rods                17165
Activated MG        15548
Cones               12578
GABAergic AC        11024
BC                  10130
HC                  10004
RGC                  9729
Microglia            8538
Pericytes            3897
Cone BC              3175
Glycinergic AC       2863
V/E cells            1417
Progenitors          1319
RPE                  1140
Oligodendrocytes      802
Name: count, dtype: int64


 cell_type-----------------------------------------


cell_type
Macroglial cell    40950
Rods               17165
AC                 13887
BC                 13305
Cones              12578
HC                 10004
RGC                 9729
Microglia cell      8538
Pericyte cell       3897
V/E cell            1417
Progenitor cell     1319
RPE                 1140
Name: count, dtype: int64
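
The `cell_type` counts above can be cross-checked against the `Cell.type` counts by applying the same label mapping to the count table. A short sketch using the numbers printed above; `CELL_TYPE_MAP` lists only the labels that actually change, and everything else passes through unchanged.

```python
import pandas as pd

# Labels that the notebook renames or merges; all other labels are kept as-is.
CELL_TYPE_MAP = {
    'Glycinergic AC': 'AC', 'GABAergic AC': 'AC',
    'V/E cells': 'V/E cell',
    'Resting MG': 'Macroglial cell', 'Activated MG': 'Macroglial cell',
    'Oligodendrocytes': 'Macroglial cell',
    'Microglia': 'Microglia cell',
    'Pericytes': 'Pericyte cell', 'Progenitors': 'Progenitor cell',
    'Cone BC': 'BC',
}

# Cell.type counts copied from the output above.
raw_counts = pd.Series({
    'Resting MG': 24600, 'Rods': 17165, 'Activated MG': 15548, 'Cones': 12578,
    'GABAergic AC': 11024, 'BC': 10130, 'HC': 10004, 'RGC': 9729,
    'Microglia': 8538, 'Pericytes': 3897, 'Cone BC': 3175, 'Glycinergic AC': 2863,
    'V/E cells': 1417, 'Progenitors': 1319, 'RPE': 1140, 'Oligodendrocytes': 802,
})

# Relabel the index, then sum counts that now share a label.
merged = (raw_counts
          .rename(lambda k: CELL_TYPE_MAP.get(k, k))
          .groupby(level=0)
          .sum()
          .sort_values(ascending=False))
print(merged)
```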

In [10]:
_ = adata[adata.obs['age'] == 'Adult', :]
_ = _.copy()
display(_.obs['age'].value_counts())
display(_.obs.head(2),_.obs['age'].value_counts(),
        _.obs['tissue'].value_counts(),
        _.obs['_batch'].value_counts())
display(_.obs.shape,_.obs.index.is_unique,_.var.index.is_unique)
h5ad_to_mtx(_, p_root.joinpath('zebrafish_Adult'))

age
Adult    58455
Name: count, dtype: int64

#Barcode,age,tissue,_batch,Cell.type,tSNE.1,tSNE.2,pre_ct,cell_type
AAACCTGCAAGAAAGG-zfAd00R1,Adult,R1,Adult R1,HC,22.214539,-23.589493,HC,HC
ACTGCTCAGTGAACAT-zfAd00R1,Adult,R1,Adult R1,HC,22.67085,-24.557162,HC,HC


age
Adult    58455
Name: count, dtype: int64

tissue
R3    19497
R4    16533
R5    14964
R1     4851
R2     2610
Name: count, dtype: int64

_batch
Adult R3    19497
Adult R4    16533
Adult R5    14964
Adult R1     4851
Adult R2     2610
Name: count, dtype: int64

(58455, 8)

True

True

frist 10 data.X nonzero elements:
 [[1 7 1 1 1 1 1 1 1 1]]
[out] /public/workspace/licanchengup/link/db/came/dataset/Retina/zebrafish_Adult
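
The export above contains 58,455 cells, which is exactly 3 × 19,485: the adult samples R1 to R5 (1,617 + 870 + 6,499 + 5,511 + 4,988 = 19,485) appear in each of the LD, NMDA and TNFa matrices, and `sc.concat` stacks all three copies. The same barcode can also carry different labels per matrix (the first barcode is `HC` in LD and NMDA but `Rods` in TNFa), so deduplicating means choosing one annotation. A minimal sketch on toy `obs` tables, assuming barcodes identify cells across the three files; `summarize_adult_overlap` is a hypothetical helper and the barcodes `bc1` to `bc3` are made up.

```python
import pandas as pd

def summarize_adult_overlap(obs_by_dataset):
    """Stack per-dataset obs tables, count how often each adult barcode recurs,
    and keep only the first occurrence of each adult barcode."""
    stacked = (pd.concat(obs_by_dataset)
               .rename_axis(['dataset', 'barcode'])
               .reset_index())
    adult = stacked[stacked['age'] == 'Adult']
    per_barcode = adult.groupby('barcode').agg(
        n_datasets=('dataset', 'nunique'),
        n_labels=('Cell.type', 'nunique'),
    )
    deduplicated = adult.drop_duplicates(subset='barcode', keep='first')
    return per_barcode, deduplicated

# Toy obs tables: one adult cell shared by all three datasets, with conflicting labels.
toy = {
    'LD': pd.DataFrame({'age': ['Adult', '36hr'], 'Cell.type': ['HC', 'Rods']},
                       index=['bc1', 'bc2']),
    'NMDA': pd.DataFrame({'age': ['Adult', '10hr'], 'Cell.type': ['HC', 'BC']},
                         index=['bc1', 'bc3']),
    'TNFa': pd.DataFrame({'age': ['Adult'], 'Cell.type': ['Rods']},
                         index=['bc1']),
}
per_barcode, dedup = summarize_adult_overlap(toy)
print(per_barcode)
print(dedup)
```

On the real data, a comparable filter could be applied before `obs_names_make_unique()` is called, for example with `adata.obs_names.duplicated()`; which matrix's annotation to keep is a separate decision.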


|Label|Full name|
|:-|:-|
|AC|amacrine cells|
|V/E cell|vascular/endothelial cells|
|Macroglial cell|macroglial cells|
|Microglia cell|microglial cells|
|Pericyte cell|pericytes|
|Progenitor cell|progenitor cells|
|HC|horizontal cells|
|BC|bipolar cells|
|Cones|cone cells|
|RGC|retinal ganglion cells|
|RPE|retinal pigment epithelium|
|Rods|rod cells|
|NIRG|nonastrocytic inner retinal glial cells|

In [11]:
for x in _.obs['Cell.type'].unique():
    print("'{}':'',".format(x))

'HC':'',
'BC':'',
'Cones':'',
'Rods':'',
'Resting MG':'',
'Activated MG':'',
'GABAergic AC':'',
'V/E cells':'',
'RGC':'',
'Oligodendrocytes':'',
'Glycinergic AC':'',
'Microglia':'',
'RPE':'',
'Pericytes':'',
'Progenitors':'',
'Cone BC':'',
