In [1]:
import sctoolbox
from sctoolbox.utils import bgcolor

# Assembling or loading anndata object
<hr style="border:2px solid black"> </hr>

This notebook loads or creates an anndata object. The anndata object is prepared for the subsequent analysis notebooks and finally stored as an `.h5ad` file. Depending on the available data files, there are multiple options to create the anndata object.

### 1. `.h5ad` file
Choose this option if you have a `.h5ad` file. The file could be provided by a preprocessing pipeline, a public dataset, or a preceding analysis.

### 2. STARsolo quant folder
This option assembles an anndata object from the standard [STARsolo](https://github.com/alexdobin/STAR/tree/master) output folder (`quant/`). The folder structure is scanned for the `*_matrix.mtx`, `*_barcodes.tsv` and `*_genes.tsv` files, which are used to create one anndata object per sample. The per-sample anndata objects are then combined into a single object.
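
The scanning step described above can be sketched with the standard library alone. This is a minimal illustration, not the sctoolbox implementation: the helper name `find_sample_files`, the use of the filename prefix as sample identifier, and the recursive search are all assumptions. Only the three file suffixes are taken from the text.

```python
from collections import defaultdict
from pathlib import Path

# File suffixes named in the text; the short keys are illustrative labels.
SUFFIXES = {
    "_matrix.mtx": "matrix",
    "_barcodes.tsv": "barcodes",
    "_genes.tsv": "genes",
}


def find_sample_files(quant_dir):
    """Group matrix, barcode and gene files that share a folder and filename prefix.

    Hypothetical helper: returns {sample_id: {"matrix": Path, "barcodes": Path,
    "genes": Path}} and silently skips samples with an incomplete file triplet.
    """
    root = Path(quant_dir)
    samples = defaultdict(dict)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        for suffix, kind in SUFFIXES.items():
            if path.name.endswith(suffix):
                prefix = path.name[: -len(suffix)]
                # Identify a sample by its folder (relative to root) plus prefix.
                sample_id = (path.parent / prefix).relative_to(root).as_posix()
                samples[sample_id][kind] = path
    # Keep only samples for which all three files were found.
    return {s: files for s, files in samples.items() if len(files) == len(SUFFIXES)}
```

Each complete triplet found this way could then be read into its own anndata object before the objects are combined.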

### 3. .mtx, barcodes.tsv, genes.tsv
Choose this option if you have the expression matrix in `.mtx` format, a file containing the barcodes (`*_barcodes.tsv`) and a file containing the genes (`*_genes.tsv`), e.g. from a public dataset.

### 4. Convert from R object
Use this option if the data was processed in R. The input can be either an `.rds` or an `.robj` file.

<h1><center>⬐ Fill in input data here ⬎</center></h1>

In [40]:
%bgcolor PowderBlue

# For option 1: The path to an existing .h5ad file
path_h5ad = "test_data/adata_rna.h5ad"

# For option 2: Path to a star solo quant directory
path_quant = ""

# For option 3: Directory containing .mtx, barcodes.tsv and genes.tsv
path_mtx = ""

# For option 4: This is the path to the Seurat (.rds, .robj) file
path_rds = ""

In [42]:
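# Exactly one of the four path variables must be non-empty.
if sum(map(lambda x: x != "", [path_h5ad, path_quant, path_mtx, path_rds])) != 1:
    # Delete the variables so that later cells fail until the cell above is corrected and re-run.
    del path_h5ad, path_quant, path_mtx, path_rds
    raise ValueError("Please set only one of the above variables. Adjust the cell above and re-run.")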

ValueError: Please set only one of the above variables. Adjust the cell above and re-run.
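
The "exactly one input" rule enforced by the cell above can also be written as a small function that reports which option was chosen. This is only a sketch: `select_input` is a hypothetical helper, not part of sctoolbox, and unlike the cell above it does not delete any variables.

```python
def select_input(**paths):
    """Return (name, path) of the single non-empty path; raise ValueError otherwise.

    Hypothetical helper for illustration: empty strings count as "not set".
    """
    chosen = {name: path for name, path in paths.items() if path}
    if len(chosen) != 1:
        raise ValueError(
            f"Expected exactly one input path, got {len(chosen)}: {sorted(chosen)}"
        )
    return next(iter(chosen.items()))
```

Called as `select_input(path_h5ad="test_data/adata_rna.h5ad", path_quant="", path_mtx="", path_rds="")`, it returns the name and value of the one path that is set, and it raises if none or several are set.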

<hr style="border:2px solid black"> </hr>

## Setup

In [None]:
import sctoolbox.utilities as utils
import sctoolbox.assemblers as assembler
import sctoolbox.file_converter as converter

utils.settings_from_config("config.yaml", key="01")

---------

## Read in data

### Option 1: Read from h5ad

In [45]:
if path_h5ad:
    adata = utils.load_h5ad(path_h5ad)

NameError: name 'path_h5ad' is not defined

### Option 2: Assemble from preprocessing pipeline 'quant' folder

In [49]:
%bgcolor PowderBlue

# Set up additional sample information below.
# It follows the scheme:
# <sample name>:<type>:<value>
# E.g.:
# sample1:condition:room_air
# sample1:timepoint:early
# sample2:timepoint:late
the_10X_yml = []
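
To make the `<sample name>:<type>:<value>` scheme concrete, the sketch below parses such entries into a nested dictionary. It assumes the entries are given as plain strings; `parse_sample_info` is a hypothetical helper for illustration and says nothing about how `assembler.from_quant` processes the list internally.

```python
def parse_sample_info(entries):
    """Turn '<sample name>:<type>:<value>' strings into {sample: {type: value}}."""
    info = {}
    for entry in entries:
        # Split on the first two colons only, so the value itself may contain ':'.
        parts = [part.strip() for part in entry.split(":", 2)]
        if len(parts) != 3 or not all(parts):
            raise ValueError(
                f"Expected '<sample name>:<type>:<value>', got {entry!r}"
            )
        sample, key, value = parts
        info.setdefault(sample, {})[key] = value
    return info


# The example entries from the cell above
example = [
    "sample1:condition:room_air",
    "sample1:timepoint:early",
    "sample2:timepoint:late",
]
sample_info = parse_sample_info(example)
```

Here `sample_info["sample1"]` holds both the condition and the timepoint, while `sample_info["sample2"]` holds only the timepoint.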

In [50]:
if path_quant:
    adata = assembler.from_quant(path_quant, the_10X_yml)

NameError: name 'path_quant' is not defined

### Option 3: Create an anndata object from .mtx, barcodes.tsv and genes.tsv

In [None]:
if path_mtx:
    adata = assembler.from_mtx(path_mtx)

### Option 4: Convert from Seurat to anndata object

In [None]:
# Converting from Seurat to anndata object
if path_rds:
    adata = converter.convertToAdata(file=path_rds)

------------

## Saving the loaded anndata object

In [None]:
# Overview of loaded adata
display(adata)

In [None]:
# Saving the data
adata_output = "anndata_1.h5ad"
utils.save_h5ad(adata, adata_output)

In [None]:
sctoolbox.settings.close_logfile()