# Marker Repo - search and combine lists

This notebook shows how to search the marker repository and prepare a selection of lists for further use. The workflow has four steps: selecting, combining, formatting, and exporting the selection. Wrapper functions perform all four steps in sequence with a single function call.

## 1. Loading packages

In [1]:
import markerrepo.marker_repo as mr
import markerrepo.wrappers as wrap

%load_ext autoreload
%autoreload 2



## Settings

Specify the path of the cloned repository.

In [6]:
repo_path = "/mnt/workspace_stud/allstud/wp2/annotate_by_marker_and_features-sort"
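
Every later call receives `repo_path`, so it can help to confirm that the directory exists before starting the search. The following is a minimal sketch using only the standard library; the helper name `check_repo_path` is hypothetical and not part of `markerrepo`.

```python
from pathlib import Path


def check_repo_path(path: str) -> Path:
    """Return the resolved path, or raise if it is not an existing directory."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"Repository path not found: {resolved}")
    return resolved
```

It could be called as `repo_path = str(check_repo_path(repo_path))` right after the assignment above.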

## 2. Search database and combine selected lists

The <b>Guided Search</b> compiles a selection of marker lists to suit individual needs. First, choose a column of the available metadata to search in, then enter the search terms. Repeat this as often as needed until the selection contains exactly the lists you need. Finally, the selection is returned either as metadata or as a combined marker list in the form of a DataFrame.

Return the search results as a metadata DataFrame.

In [7]:
results = mr.guided_search(repo_path=repo_path, out="metadata")
display(results)

Available columns for search:
1: ID
2: List name
3: Organism name
4: Taxonomy ID
5: Marker type
6: Submitter name
7: List type
8: Date
9: Source
10: tags_transferred
n: Next page
Enter identifier of column to search in (leave blank to search in all columns): 6
Do you want to see all unique entries in this column? (yes/no) yes
Unique entries:
Kessler, Micha Frederick
Quintanilla, Marta
Bentsen, Mette
Enter search terms (separated by commas, '-' for negative search): Quintanilla, Marta
Perform an exact search? (yes/no): no
Consider case sensitivity? (yes/no): no
Number of results: 14
Do you want to see the results? (yes/no): yes


ID,index,List name,Organism name,Taxonomy ID,Marker type,Submitter name,List type,Date,Source,tags_transferred,Email,Tissue,Marker,Info
170879528140527,13,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CD83, ENSDARG00000079553, SH3BP5B, ENSDARG000...","[Oocyte_ca15b high, Macrophage, Oocyte_h2af1al..."
1708791034778127,119,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CRIP1, ENSDARG00000053858, COX5B, ENSDARG0000...","[Filament ionocyte, NCC ionocyte, Proliferatin..."
1708794021236062,120,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CRIP1, ENSDARG00000053858, COX5B, ENSDARG0000...","[Filament ionocyte, NCC ionocyte, Proliferatin..."
1708794224393050,121,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[PSMB9A, ENSDARG00000000656, TMCO1, ENSDARG000...","[Macrophage, B cell, Thrombocyte, Macrophage_g..."
1708794292444744,122,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CISD2, ENSDARG00000052703, GCDHA, ENSDARG0000...","[Macrophage_grn1 high, Macrophage, Radial glia..."
1708794386648230,123,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[SLC38A4, ENSDARG00000018149, SI:CH211-117L17....","[Macrophage, Retinal cone cell, Stromal cell, ..."
1708795042806779,124,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CD83, ENSDARG00000079553, AMY2A, ENSDARG00000...","[Pancreas endocrine cells , Macrophage, Intest..."
1708795114842373,125,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[ARPC3, ENSDARG00000057882, ALDH9A1A.1, ENSDAR...","[Proliferating cell, Macrophage, Nephron epith..."
1708795181273162,126,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[PSMB9A, ENSDARG00000000656, CD83, ENSDARG0000...","[Macrophage, Hepatocyte, Biliary epithelial ce..."
1708795232256198,127,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[GCDHA, ENSDARG00000037057, SPICE1, ENSDARG000...","[Epithelial cell_aqp3a high, Skeletal muscle c..."


Do you want to continue searching? (yes/no): no


ID,index,List name,Organism name,Taxonomy ID,Marker type,Submitter name,List type,Date,Source,tags_transferred,Email,Tissue,Marker,Info
170879528140527,13,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CD83, ENSDARG00000079553, SH3BP5B, ENSDARG000...","[Oocyte_ca15b high, Macrophage, Oocyte_h2af1al..."
1708791034778127,119,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CRIP1, ENSDARG00000053858, COX5B, ENSDARG0000...","[Filament ionocyte, NCC ionocyte, Proliferatin..."
1708794021236062,120,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CRIP1, ENSDARG00000053858, COX5B, ENSDARG0000...","[Filament ionocyte, NCC ionocyte, Proliferatin..."
1708794224393050,121,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[PSMB9A, ENSDARG00000000656, TMCO1, ENSDARG000...","[Macrophage, B cell, Thrombocyte, Macrophage_g..."
1708794292444744,122,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CISD2, ENSDARG00000052703, GCDHA, ENSDARG0000...","[Macrophage_grn1 high, Macrophage, Radial glia..."
1708794386648230,123,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[SLC38A4, ENSDARG00000018149, SI:CH211-117L17....","[Macrophage, Retinal cone cell, Stromal cell, ..."
1708795042806779,124,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CD83, ENSDARG00000079553, AMY2A, ENSDARG00000...","[Pancreas endocrine cells , Macrophage, Intest..."
1708795114842373,125,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[ARPC3, ENSDARG00000057882, ALDH9A1A.1, ENSDAR...","[Proliferating cell, Macrophage, Nephron epith..."
1708795181273162,126,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[PSMB9A, ENSDARG00000000656, CD83, ENSDARG0000...","[Macrophage, Hepatocyte, Biliary epithelial ce..."
1708795232256198,127,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[GCDHA, ENSDARG00000037057, SPICE1, ENSDARG000...","[Epithelial cell_aqp3a high, Skeletal muscle c..."
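
The prompts above (search terms, exact search, case sensitivity, `-` for negative search) correspond roughly to a filter on one metadata column. The sketch below shows such a filter in pandas. It is not the implementation used by `mr.guided_search`: the function name `search_column` is hypothetical, and combining several positive terms with OR is an assumption. The submitter names come from the output above; the list names `other_list` and `third_list` are toy values.

```python
import pandas as pd


def search_column(df, column, terms, exact=False, case=False):
    """Keep rows matching any positive term and no negative ('-') term."""
    values = df[column].astype(str)
    if not case:
        values = values.str.lower()

    def matches(term):
        term = term if case else term.lower()
        if exact:
            return values == term
        # Plain substring match, no regular expressions.
        return values.str.contains(term, regex=False)

    positive = [t for t in terms if not t.startswith("-")]
    negative = [t[1:] for t in terms if t.startswith("-")]

    # Without positive terms, start from all rows and only exclude.
    keep = pd.Series(not positive, index=df.index)
    for term in positive:
        keep |= matches(term)  # assumption: positive terms are OR-combined
    for term in negative:
        keep &= ~matches(term)
    return df[keep]


meta = pd.DataFrame(
    {
        "List name": ["zebrafish_gene_cellType", "other_list", "third_list"],
        "Submitter name": [
            "Quintanilla, Marta",
            "Bentsen, Mette",
            "Kessler, Micha Frederick",
        ],
    }
)
hits = search_column(meta, "Submitter name", ["Quintanilla", "Marta"])
```

With `exact=False` and `case=False`, as chosen in the session above, only the row submitted by "Quintanilla, Marta" remains in `hits`.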


Return the search results as a marker list DataFrame.

In [8]:
results = mr.guided_search(repo_path=repo_path, out="marker_list")
display(results)

Available columns for search:
1: ID
2: List name
3: Organism name
4: Taxonomy ID
5: Marker type
6: Submitter name
7: List type
8: Date
9: Source
10: tags_transferred
n: Next page
Enter identifier of column to search in (leave blank to search in all columns): 6
Do you want to see all unique entries in this column? (yes/no) yes
Unique entries:
Kessler, Micha Frederick
Quintanilla, Marta
Bentsen, Mette
Enter search terms (separated by commas, '-' for negative search): Quintanilla, Marta
Perform an exact search? (yes/no): no
Consider case sensitivity? (yes/no): no
Number of results: 14
Do you want to see the results? (yes/no): yes


ID,index,List name,Organism name,Taxonomy ID,Marker type,Submitter name,List type,Date,Source,tags_transferred,Email,Tissue,Marker,Info
170879528140527,13,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CD83, ENSDARG00000079553, SH3BP5B, ENSDARG000...","[Oocyte_ca15b high, Macrophage, Oocyte_h2af1al..."
1708791034778127,119,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CRIP1, ENSDARG00000053858, COX5B, ENSDARG0000...","[Filament ionocyte, NCC ionocyte, Proliferatin..."
1708794021236062,120,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CRIP1, ENSDARG00000053858, COX5B, ENSDARG0000...","[Filament ionocyte, NCC ionocyte, Proliferatin..."
1708794224393050,121,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[PSMB9A, ENSDARG00000000656, TMCO1, ENSDARG000...","[Macrophage, B cell, Thrombocyte, Macrophage_g..."
1708794292444744,122,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CISD2, ENSDARG00000052703, GCDHA, ENSDARG0000...","[Macrophage_grn1 high, Macrophage, Radial glia..."
1708794386648230,123,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[SLC38A4, ENSDARG00000018149, SI:CH211-117L17....","[Macrophage, Retinal cone cell, Stromal cell, ..."
1708795042806779,124,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[CD83, ENSDARG00000079553, AMY2A, ENSDARG00000...","[Pancreas endocrine cells , Macrophage, Intest..."
1708795114842373,125,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[ARPC3, ENSDARG00000057882, ALDH9A1A.1, ENSDAR...","[Proliferating cell, Macrophage, Nephron epith..."
1708795181273162,126,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[PSMB9A, ENSDARG00000000656, CD83, ENSDARG0000...","[Macrophage, Hepatocyte, Biliary epithelial ce..."
1708795232256198,127,zebrafish_gene_cellType,zebrafish,7955,Genes,"Quintanilla, Marta",,24.02.2024,,,,,"[GCDHA, ENSDARG00000037057, SPICE1, ENSDARG000...","[Epithelial cell_aqp3a high, Skeletal muscle c..."


Do you want to continue searching? (yes/no): no


,Marker,Info
0,MUC5.2 ENSDARG00000058556,Mucous cell
1,HBBA1 ENSDARG00000097238,Mucous cell
2,PVALB8 ENSDARG00000037790,Mucous cell
3,SI:CH211-5K11.8 ENSDARG00000079078,Mucous cell
4,SI:CH211-125E6.11 ENSDARG00000025783,Mucous cell
...,...,...
20997,SAGA ENSDARG00000012610,Retinal cone cell
20998,PDE6A ENSDARG00000000380,Retinal cone cell
20999,MTBL ENSDARG00000102051,Retinal cone cell
21000,SI:CH73-281N10.2 ENSDARG00000097102,Retinal cone cell
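
Each `Marker` entry in the combined list above pairs a gene symbol with an Ensembl ID, separated by a space. A minimal sketch of splitting the two identifiers with pandas follows, assuming this layout holds for every row. The column names `symbol` and `ensembl` are chosen for illustration and are not taken from `markerrepo`.

```python
import pandas as pd

# Rows copied from the combined marker list shown above.
markers = pd.DataFrame(
    {
        "Marker": [
            "MUC5.2 ENSDARG00000058556",
            "HBBA1 ENSDARG00000097238",
            "PDE6A ENSDARG00000000380",
        ],
        "Info": ["Mucous cell", "Mucous cell", "Retinal cone cell"],
    }
)

# Split on the last space: symbol on the left, Ensembl ID on the right.
parts = markers["Marker"].str.rsplit(" ", n=1, expand=True)
markers["symbol"] = parts[0]
markers["ensembl"] = parts[1]
```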


Export the combined marker list using Gene Symbols as identifiers.

In [9]:
mr.export_marker_list(results, file_name="Combined", marker_id="symbol")

Folder /mnt/workspace_stud/stud3/annotate_by_marker_and_features-sorting/annotate_by_marker_and_features/notebooks/exported_lists created.
Marker list saved: /mnt/workspace_stud/stud3/annotate_by_marker_and_features-sorting/annotate_by_marker_and_features/notebooks/exported_lists/Combined


'/mnt/workspace_stud/stud3/annotate_by_marker_and_features-sorting/annotate_by_marker_and_features/notebooks/exported_lists/Combined'

Export the combined marker list using Ensembl IDs as identifiers.

In [10]:
mr.export_marker_list(results, file_name="Combined", marker_id="ensembl")

Marker list saved: /mnt/workspace_stud/stud3/annotate_by_marker_and_features-sorting/annotate_by_marker_and_features/notebooks/exported_lists/Combined_20240224190350


'/mnt/workspace_stud/stud3/annotate_by_marker_and_features-sorting/annotate_by_marker_and_features/notebooks/exported_lists/Combined_20240224190350'
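
In the output above, the second export with the same `file_name` was saved as `Combined_20240224190350`, which suggests that a timestamp is appended when the target already exists. That reading is inferred from the output only, not from the package source. A minimal sketch of such a naming scheme, with the hypothetical helper `non_clobbering_path`:

```python
from datetime import datetime
from pathlib import Path


def non_clobbering_path(path, now=None):
    """Return `path` unchanged if it is free, else append a timestamp suffix."""
    path = Path(path)
    if not path.exists():
        return path
    # Format YYYYmmddHHMMSS, matching the suffix seen in the output above.
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    return path.with_name(f"{path.name}_{stamp}")
```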

## 3. Wrapper functions

<b>Use the guided search to create a new marker list in "two column" style.</b><br>
<br>This format can be used, for example, for custom cell type annotation with SCSA. The function combines, formats, and exports the new marker list, and returns the path of the created file. Use the "file_name" parameter to set the file name.

Get Gene Symbols

In [None]:
wrap.convert_markers(repo_path=repo_path, gs=True, style="two_column", file_name="Gene_symbols")

Get Ensembl IDs

In [None]:
wrap.convert_markers(repo_path=repo_path, gs=True, style="two_column", file_name="Ensembl_IDs", ensembl=True)
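
To illustrate what a "two column" file might contain, the sketch below writes a marker list as two tab-separated columns and returns the file path. The exact layout written by `wrap.convert_markers` is not documented here, so the column order, the tab separator, and the missing header row are all assumptions, and `write_two_column` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd


def write_two_column(markers: pd.DataFrame, path: str) -> str:
    """Write marker and cell type as two tab-separated columns; return the path."""
    out = Path(path)
    # Assumed layout: marker first, cell type second, no header, no duplicates.
    markers[["Marker", "Info"]].drop_duplicates().to_csv(
        out, sep="\t", index=False, header=False
    )
    return str(out)
```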

<b>Use the guided search to create a new marker list in "score" style.</b><br>
<br>
In this style, the output has three columns: the marker name, the cell type it is associated with, and the weight or score assigned to the marker.

In [None]:
wrap.convert_markers(repo_path=repo_path, gs=True, style="score", path=".", file_name="mouse_panglao", organism="Mm")
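
The three-column layout can be pictured with a small pandas table. This is only an illustration of the layout: how `wrap.convert_markers` derives the scores is not stated here, so the uniform weight of 1.0, the column names, and the tab separator are assumptions.

```python
import pandas as pd

# Gene symbols and cell types taken from the combined marker list shown earlier.
markers = pd.DataFrame(
    {
        "Marker": ["MUC5.2", "HBBA1", "PDE6A"],
        "Cell type": ["Mucous cell", "Mucous cell", "Retinal cone cell"],
    }
)

# Placeholder weight (assumption): every marker gets the same score of 1.0.
scored = markers.assign(Score=1.0)

# Tab-separated text without header or index, one marker per line.
text = scored.to_csv(sep="\t", index=False, header=False)
```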