## Deconvolution with Scaden

Gemma Bel Bordes - 12/01/2024 - Python version I used: 3.10.12

Original source code from https://scaden.readthedocs.io/en/latest/usage.html --> move the `scaden/` source folder into this directory.

Please note that I have modified 2 scripts to ensure reproducibility (the simulation step is random, so it produced different outputs on every run):
- `scaden/simulate.py`
- `scaden/simulation/bulk_simulator.py` 

You can find the changes by searching for `#GBB`, which I added as a comment next to each piece of added code.
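The principle behind these changes can be illustrated with a small, hypothetical sketch (it is not the actual `#GBB` code): random draws become repeatable once they come from a generator created with a fixed seed. `random_fractions` is an illustrative name for a function that draws random cell-type proportions summing to 1, the kind of quantity a bulk simulator samples. The sketch uses Python's `random` module; NumPy offers the same behaviour through `np.random.default_rng(seed)`.

```python
import random


def random_fractions(celltypes, seed=None):
    """Draw random proportions (summing to 1) for each cell type.

    Hypothetical helper for illustration only. A dedicated random.Random
    instance is created from `seed`, so the same seed always yields the
    same proportions, independently of any other random calls.
    """
    rng = random.Random(seed)
    weights = [rng.random() for _ in celltypes]
    total = sum(weights)
    return {ct: w / total for ct, w in zip(celltypes, weights)}


celltypes = ["T cells", "Macrophages", "Endothelial cells"]
# Same seed -> identical draws on every run
print(random_fractions(celltypes, seed=112) == random_fractions(celltypes, seed=112))  # True
```

Without the `seed` argument (`seed=None`), the generator is seeded from system entropy and every call gives different proportions, which is the behaviour the original scaden simulation showed.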

==> This notebook and these two modified files can be found here: https://github.com/gemmabb/DeconvolutionAtheroscleroticPlaques/tree/main/scadenModification

Note also that you must create these folders before running this notebook:
- `generatedData/` 
- `generatedModels/`
- `generatedPredictions/`
- `input/`, containing all your single-cell (sc) and bulk data: the linear bulk data and the sc data split by patient (see https://github.com/gemmabb/DeconvolutionAtheroscleroticPlaques/blob/main/DataProcessing.R)
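The folders can also be created from Python. A minimal sketch, to be run once from the notebook's directory; `create_workdir` and `REQUIRED_DIRS` are hypothetical names, and `input/sc_byPatient/` is included because the directory tree for this notebook places the per-patient sc files there.

```python
from pathlib import Path

# Folders the pipeline writes to or reads from (relative to the notebook)
REQUIRED_DIRS = [
    "generatedData",
    "generatedModels",
    "generatedPredictions",
    "input/sc_byPatient",  # parents=True below also creates input/
]


def create_workdir(root="."):
    """Create the expected folders under `root`; existing folders are left untouched."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# create_workdir()  # uncomment and run once from the notebook's directory
```

`exist_ok=True` makes the call safe to repeat, so it can sit at the top of the notebook without failing on a second run.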

Your working directory should therefore have this structure:

```
├── 'Deconvolution with scaden'.ipynb #this notebook
├── input
│   ├── sc_byPatient 
│   │   ├── ae*_counts.txt
│   │   ├── ae*_celltypes.txt
│   ├── linearBulk_forScaden.txt
├── generatedData
├── generatedModels
├── generatedPredictions
├── scaden #source code from scaden
    ├── example.py   
    ├── __main__.py  
    ├── model      
    │   ├── architectures.py  
    │   ├── functions.py
    │   ├── __init__.py  
    │   ├── scaden.py
    ├── process.py   
    ├── simulate.py #not the original, seed modification
    ├── train.py
    ├── __init__.py  
    ├── merge.py
    ├── predict.py  
    ├── simulation
    │   ├── __init__.py  
    │   ├── bulk_simulator.py #not the original, seed modification
```
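Before simulating, it may help to check that every per-patient counts file has its cell-type file. A minimal sketch, assuming the naming shown in the tree above (each `ae*_counts.txt` paired with an `ae*_celltypes.txt` sharing the same prefix); `find_unpaired_samples` is a hypothetical helper, and `str.removesuffix` requires Python 3.9+ (this notebook used 3.10.12).

```python
from pathlib import Path


def find_unpaired_samples(sc_dir="input/sc_byPatient"):
    """Return sample prefixes that have a counts file but no celltypes file, or vice versa."""
    sc_dir = Path(sc_dir)
    counts = {p.name.removesuffix("_counts.txt") for p in sc_dir.glob("*_counts.txt")}
    labels = {p.name.removesuffix("_celltypes.txt") for p in sc_dir.glob("*_celltypes.txt")}
    # Symmetric difference: prefixes present in only one of the two sets
    return sorted(counts ^ labels)


print(find_unpaired_samples())  # an empty list means every file is paired
```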

In [3]:
import scaden
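Because the seed changes only exist in the local `scaden/` folder, it is worth confirming that `import scaden` picked up that copy and not a pip-installed scaden elsewhere on the path. A minimal sketch using the standard-library `importlib.util.find_spec`; `is_local_package` is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path


def is_local_package(name, workdir="."):
    """Return True if module `name` is (or would be) loaded from a file inside `workdir`."""
    spec = importlib.util.find_spec(name)
    # Missing modules, built-ins and namespace packages have no file location
    if spec is None or not spec.has_location or spec.origin is None:
        return False
    return Path(spec.origin).resolve().is_relative_to(Path(workdir).resolve())
```

Run from this directory, `is_local_package("scaden")` should return `True`; `False` would mean the unmodified, installed scaden is being used and the seeds would have no effect.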

In [4]:
from scaden.train import training
from scaden.predict import prediction
from scaden.process import processing
from scaden.simulate import simulation
from scaden.merge import merge_datasets

2024-01-12 15:08:46.626972: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-01-12 15:08:46.629854: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-01-12 15:08:46.629862: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
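The first log line names an environment variable that controls TensorFlow's oneDNN optimisations, which it warns can cause slightly different numerical results. If training should also be as repeatable as possible, one option, following that message, is to switch them off. These variables are read when TensorFlow loads, so the snippet would have to run at the very top of the notebook, before `import scaden` (after restarting the kernel). `TF_CPP_MIN_LOG_LEVEL` is an optional extra that hides TensorFlow's INFO and WARNING messages, such as the `libcudart` warning above, which the log itself says can be ignored on a machine without a GPU.

```python
import os

# Must be set before TensorFlow is imported (i.e. before `import scaden`)
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"  # disable oneDNN custom ops, as the log suggests
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"   # hide INFO and WARNING messages from TensorFlow's C++ backend
```

This removes only one source of run-to-run variation; it does not by itself guarantee identical training results.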


In [5]:
seeds = [112, 113, 122, 123, 132, 133, 213, 231, 312, 321]
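Every step of the loop below derives its file names from the current seed, so one path typed differently in a single step breaks the chain between simulation, processing, training and prediction. A hypothetical helper such as `seed_paths` keeps those names in one place; the dictionary keys are illustrative, and the paths copy the ones used in the loop.

```python
def seed_paths(seed):
    """Return every per-seed file path used by the pipeline (hypothetical helper)."""
    return {
        "merged_prefix": f"generatedData/data{seed}",        # prefix passed to merge_datasets()
        "merged": f"generatedData/data{seed}.h5ad",           # merged dataset, input of processing()
        "processed": f"generatedData/processed{seed}.h5ad",   # output of processing(), input of training()
        "predictions": f"generatedPredictions/bothSex{seed}_scaden_predictions.txt",
    }


print(seed_paths(112)["processed"])  # generatedData/processed112.h5ad
```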

In [None]:
# Run the pipeline once per seed (10 runs): each simulation produces different training data
# and therefore different predictions, so the final result is their average.

for run, seed in enumerate(seeds, start=1):
    print("------ RUN:", run, "-------")
    # simulate pseudo-bulk samples per patient:
    simulation(simulate_dir="generatedData/", data_dir="./input/sc_byPatient/",
               sample_size=100, num_samples=1000, pattern="*_counts.txt", unknown_celltypes="unknown",
               out_prefix="data", fmt="txt", seed=seed)  # seed: argument I added in the modified simulate.py (#GBB)

    # merge the per-patient datasets:
    merge_datasets("generatedData/", f"generatedData/data{seed}")
    ! rm generatedData/ae*  # keep only the merged data (ae* are the per-patient datasets)

    # process the merged dataset before training:
    processing("input/linearBulk_forScaden.txt", f"generatedData/data{seed}.h5ad",
               f"generatedData/processed{seed}.h5ad", var_cutoff=0.1)
    ! rm generatedData/data*  # keep only the processed data (data* is the merged dataset)

    # train the model on the processed data written to generatedData/ above:
    training(data_path=f"generatedData/processed{seed}.h5ad", train_datasets="",
             model_dir="generatedModels/", batch_size=128, learning_rate=0.0001, num_steps=5000, seed=0)
    ! rm generatedData/*  # remove the processed data to save storage

    # make predictions:
    out_name = f"generatedPredictions/bothSex{seed}_scaden_predictions.txt"
    prediction(model_dir="generatedModels/", data_path="input/linearBulk_forScaden.txt", out_name=out_name)
    ! rm -r generatedModels/*  # remove the trained models

    print(f"------ predictions stored as {out_name}\n")

------ RUN: 1 -------
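The loop leaves one prediction file per seed, and the averaging step mentioned in its first comment is not shown above. A minimal sketch with the standard library, assuming each file is a tab-separated table whose header row lists the cell types after an empty first cell and whose rows start with the bulk sample name (the layout pandas `to_csv` produces for a DataFrame indexed by sample); `read_predictions` and `average_runs` are hypothetical names.

```python
import csv
import glob
from collections import defaultdict


def read_predictions(path):
    """Read one prediction file into {sample: {celltype: fraction}} (assumed layout, see above)."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        celltypes = next(reader)[1:]  # skip the empty index header cell
        return {row[0]: {ct: float(v) for ct, v in zip(celltypes, row[1:])}
                for row in reader if row}


def average_runs(runs):
    """Average a list of {sample: {celltype: fraction}} dicts, value by value."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for run in runs:
        for sample, fractions in run.items():
            for ct, value in fractions.items():
                totals[sample][ct] += value
                counts[sample][ct] += 1
    return {s: {ct: totals[s][ct] / counts[s][ct] for ct in totals[s]} for s in totals}


files = sorted(glob.glob("generatedPredictions/bothSex*_scaden_predictions.txt"))
mean_predictions = average_runs([read_predictions(f) for f in files])
```

Counting values per sample and cell type, rather than dividing by the number of files, keeps the mean correct even if a run is missing a sample.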