In [8]:
!cat README.md

Sleep models pipeline

Requirements:


`sleep-models==1.1.1`

Instructions

This pipeline has 5 steps:

* 1: make_dataset: Creates a cached `anndata.AnnData` (.h5ad) containing only cells from the desired backgrounds. The backgrounds are read from the config, and the annotation used is defined in the `backgrounds` folder inside `config["data_dir"]`.

* 2: get_marker_genes: Iteratively removes marker genes with a fold change above a progressively lower threshold, so that the cell types in the background become as homogeneous as possible. A dimensionality reduction (DR) plot is generated at each threshold.

* 3: remove_marker_genes: Sets to 0 the genes whose fold change is higher than the selected threshold.

* 4: train_models: Trains one or more of the supported models on the transcriptomic data to predict sleep and wake.

* 5: predict: Predicts the sleep / wake status of a cell from the same cell type or a different one.
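Steps 2 and 3 can be illustrated with a minimal sketch. It is not the `sleep-models` implementation: the function name `zero_marker_genes`, the toy expression values and the fold changes are all hypothetical, and plain Python lists stand in for the `anndata.AnnData` matrix. It assumes a gene counts as a marker when its absolute log2FC exceeds the threshold.

```python
def zero_marker_genes(matrix, log2fc, threshold):
    """Return a copy of `matrix` with the columns of marker genes set to 0.

    matrix: list of rows (cells), each a list of expression values (genes).
    log2fc: one log2 fold change per gene (column).
    Assumption: a gene is a marker when abs(log2FC) > threshold.
    """
    is_marker = [abs(fc) > threshold for fc in log2fc]
    return [
        [0.0 if marker else value for value, marker in zip(row, is_marker)]
        for row in matrix
    ]


# Toy data: 2 cells x 4 genes, with hypothetical fold changes.
expression = [[5.0, 1.0, 3.0, 2.0], [4.0, 2.0, 0.5, 1.0]]
fold_changes = [6.2, -0.4, 2.7, -3.1]

# Step 2 (sketch): probe thresholds from permissive to strict and
# record how many genes survive at each one.
remaining = {}
for threshold in [8.0, 4.0, 2.0, 0.5]:
    remaining[threshold] = sum(abs(fc) <= threshold for fc in fold_changes)

# Step 3 (sketch): apply the threshold the user finally selected.
filtered = zero_marker_genes(expression, fold_changes, threshold=2.0)
```

Lowering the threshold removes more genes, which is why the DR plot is regenerated at each value: it shows the point at which the cell types stop separating.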



You can adjust the settings of the pipeline by editing the `config.yaml` file in the `SleepML` folder.


To change the log2FC threshold that defines marker genes, update

* `user_defined_log2FC_threshold.KC` for KC
* `user_defined_log2FC_threshold.glia` for glia

To change the log2FC thresholds probed, update

* `log2FC_thresholds.KC` for KC
* `log2FC_thresholds.glia` for glia

To change the model architectures to be trained, update

* `arch`

To change the DR algorithms run during the marker gene probing, update

* `DR_algorithm`

To change the number of *in-silico* replicates, update

* the list of seeds in `seeds`
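
As a rough illustration of why the number of seeds sets the number of replicates, the sketch below derives one reproducible train/test split per seed. How the pipeline actually consumes each seed is not shown in this README, so the `split_cells` helper, the seed values and the 25% test fraction are assumptions made for the example.

```python
import random


def split_cells(cell_ids, seed, test_fraction=0.25):
    """Return (train, test) lists; the same seed always gives the same split."""
    rng = random.Random(seed)  # isolated generator, no global state touched
    shuffled = list(cell_ids)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


seeds = [1000, 2000, 3000]  # hypothetical values for the `seeds` entry
cells = [f"cell_{i}" for i in range(8)]

# One in-silico replicate per seed.
replicates = {seed: split_cells(cells, seed) for seed in seeds}
```

Adding or removing an entry in the list changes the number of replicates, and rerunning with the same list reproduces the same splits.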

To use the marker gene database downloaded from scope, set

* `markers`

In [None]:
# Uncomment the one you wanna run
# !ln -sf config_wake_vs_sleep.yaml config.yaml 
# !ln -sf config_conditions.yaml config.yaml
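
If you prefer to switch configs from Python instead of the shell, the following sketch mimics `ln -sf` on a POSIX system. The helper name `select_config` is hypothetical, and the demo runs in a temporary directory with placeholder files so that it does not touch the real configs.

```python
import os
import tempfile
from pathlib import Path


def select_config(target, link="config.yaml"):
    """Point `link` at `target`, replacing any existing link (like `ln -sf`)."""
    link = Path(link)
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)  # a relative target resolves next to the link
    os.replace(tmp, link)    # rename over the old link in one step


# Demo in a scratch directory with placeholder config files.
with tempfile.TemporaryDirectory() as workdir:
    workdir = Path(workdir)
    (workdir / "config_wake_vs_sleep.yaml").write_text("background:\n")
    (workdir / "config_conditions.yaml").write_text("background:\n")
    select_config("config_wake_vs_sleep.yaml", workdir / "config.yaml")
    select_config("config_conditions.yaml", workdir / "config.yaml")
    active = os.readlink(workdir / "config.yaml")
```

Creating the link under a temporary name and renaming it avoids a moment where `config.yaml` does not exist.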

**Please update the `results_dir` and `temp_data_dir` fields in the config so that they carry today's timestamp**
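
A dated directory name can be generated rather than typed by hand. The naming scheme the pipeline expects is not shown here, so the `YYYY-MM-DD_name` layout, the `stamped_dir` helper and the base folder names below are assumptions; adapt them to the values already present in your config.

```python
from datetime import date
from pathlib import Path


def stamped_dir(base, name, day=None):
    """Build `<base>/<YYYY-MM-DD>_<name>`, defaulting to today's date."""
    day = day or date.today()
    return Path(base) / f"{day.isoformat()}_{name}"


# A fixed date keeps the example reproducible; omit `day` to use today.
results_dir = stamped_dir("results", "sleep_models", date(2022, 1, 31))
temp_data_dir = stamped_dir("temp", "sleep_models", date(2022, 1, 31))
```

The resulting paths can then be copied into the `results_dir` and `temp_data_dir` fields.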

In [1]:
!cat config.yaml

# Which background of cell types to work with
background:
  KC:
  glia:

# Marker gene filtering settings
user_defined_log2FC_threshold:
  # genes above this threshold will not be considered for the analysis
  KC:
    1.0
  glia:
    1.0
  peptides:
    3.0
log2FC_thresholds:
  # A separate UMAP will be computed on the result of removing genes
  # with an abs log2FC higher than this
  KC:
    [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]
  glia:
    [9.99, 9.0, 8.0, 7.0, 6.5, 6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]
max_clusters:
  # if a marker gene is shared by this or more cell types
  # of the background, it is still considered for the analysis
  # because it is not a marker anymore within the background
  KC:
    5
  glia:
    10

# Dataset generation settings
raw: True
exclude_genes_file: None
batch_genes_file: "batch_effects.xlsx"
highly_variable_genes: False
h5ad_input: "h5ad/Preloom/All_Combined_No_ZT2_Wake.h5ad"

# Model settings
arch:
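
The interaction between `user_defined_log2FC_threshold` and `max_clusters` can be sketched as follows. This is an interpretation of the config comments, not the pipeline's code: the `genes_to_remove` function, the shape of `marker_table` and the toy values are hypothetical, and "shared by N cell types" is assumed to mean that the gene exceeds the threshold in N cell types of the background.

```python
def genes_to_remove(marker_table, threshold, max_clusters):
    """List genes that act as markers within the background.

    marker_table: {gene: {cell_type: log2FC}}.
    A gene is removed when abs(log2FC) > threshold in at least one cell
    type but in fewer than `max_clusters` of them. Genes shared by
    `max_clusters` or more cell types are kept, since they no longer
    distinguish cell types within the background.
    """
    removed = []
    for gene, per_type in marker_table.items():
        hits = [ct for ct, fc in per_type.items() if abs(fc) > threshold]
        if hits and len(hits) < max_clusters:
            removed.append(gene)
    return removed


# Toy table with hypothetical genes and cell types.
marker_table = {
    "geneA": {"type1": 2.0},
    "geneB": {"type1": 1.5, "type2": -1.2, "type3": 1.1},
    "geneC": {"type1": 0.3},
}
removed = genes_to_remove(marker_table, threshold=1.0, max_clusters=3)
```

Here `geneA` is removed, `geneB` is kept because it is shared by three cell types, and `geneC` is kept because it never exceeds the threshold.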

In [2]:
! /home/vibflysleep/anaconda3/envs/SleepML/bin/python 01-pipeline/01_make_dataset.py


This is where adjacency matrices should go now.
  res = method(*args, **kwargs)
Trying to set attribute `.obs` of view, copying.
... storing 'CellType' as categorical
... storing 'Age' as categorical
... storing 'Condition' as categorical
... storing 'Genotype' as categorical
... storing 'Run' as categorical
... storing 'Sleep_Stage' as categorical
... storing 'Treatment' as categorical
... storing 'louvain' as categorical
... storing 'louvain_res0.4' as categorical
... storing 'louvain_res0.8' as categorical
... storing 'louvain_res1.0' as categorical
... storing 'louvain_res1.2' as categorical
... storing 'louvain_res1.6' as categorical
... storing 'louvain_res2.0' as categorical
... storing 'louvain_res3.0' as categorical
... storing 'louvain_res4.0' as categorical
... storing 'louvain_res8.0' as categorical
... storing 'Run_Set' as categorical
Trying to set attribute `.obs` of view, copying.
... storing 'CellType' as categorical
...

In [None]:
! /home/vibflysleep/anaconda3/envs/SleepML/bin/python 01-pipeline/02_get_marker_genes.py

Done in 0.65 seconds
  size=22,
  size=22,
Computing DR at threshold = 8.0:   0%|                   | 0/12 [00:00<?, ?it/s]Done in 0.09 seconds
  size=22,
  size=22,
Computing DR at threshold = 7.0:   8%|▉          | 1/12 [00:03<00:34,  3.13s/it]Done in 0.09 seconds
  size=22,
  size=22,
Computing DR at threshold = 6.0:  17%|█▊         | 2/12 [00:06<00:29,  2.99s/it]Done in 0.09 seconds
  size=22,
  size=22,
Computing DR at threshold = 5.0:  25%|██▊        | 3/12 [00:08<00:26,  2.94s/it]Done in 0.09 seconds
  size=22,
  size=22,
Computing DR at threshold = 4.0:  33%|███▋       | 4/12 [00:11<00:23,  2.91s/it]Done in 0.09 seconds
  size=22,
  size=22,
Computing DR at threshold = 3.0:  42%|████▌      | 5/12 [00:14<00:20,  2.91s/it]Done in 0.08 seconds
  size=22,
  size=22,
Computing DR at threshold = 2.5:  50%|█████▌     | 6/12 [00:17<00:17,  2.90s/it]Done in 0.09 seconds
  size=22,
  size=22,
Computing DR at threshold = 2.0:  58%|██████▍    | 7/12 [00:20<00:14,  2.94s/it]Done in 0.08 sec

In [None]:
! /home/vibflysleep/anaconda3/envs/SleepML/bin/python 01-pipeline/03_remove_marker_genes.py

In [None]:
! /home/vibflysleep/anaconda3/envs/SleepML/bin/python 01-pipeline/04_train_models.py

In [None]:
! /home/vibflysleep/anaconda3/envs/SleepML/bin/python 01-pipeline/05_predict.py
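
The five cells above can also be chained from a single Python entry point that stops at the first failing step. This is a convenience sketch, not part of the pipeline: `build_commands` and `run_pipeline` are hypothetical helpers, and the interpreter defaults to the one running the notebook, so pass the path of the `SleepML` environment's interpreter if it differs.

```python
import subprocess
import sys
from pathlib import Path

# Script names as they appear in the cells above.
STEPS = [
    "01_make_dataset.py",
    "02_get_marker_genes.py",
    "03_remove_marker_genes.py",
    "04_train_models.py",
    "05_predict.py",
]


def build_commands(python=sys.executable, pipeline_dir="01-pipeline"):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, str(Path(pipeline_dir) / step)] for step in STEPS]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True raises on a non-zero exit."""
    for command in commands:
        runner(command, check=True)


commands = build_commands()
# run_pipeline(commands)  # uncomment to launch all five steps
```

Because `check=True` raises `subprocess.CalledProcessError` on failure, a broken dataset step will not silently feed the training step.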