# Dataset preparation

**All datasets should be stored in the ./data/ folder**

This tutorial offers a step-by-step guide on how to prepare your single-cell datasets for training with the UNAGI model and for subsequent analysis.

The first step in using UNAGI is to add a time-point attribute to the AnnData object (stored in `adata.obs['stage']` in the example below). Time points should be numbered consecutively as [0, 1, 2, ..., n]. Then split the data by stage and write each stage to its own `.h5ad` file.

In [None]:
import scanpy as sc
import os
PROJECT_NAME = 'your_project_name'
PATH_TO_YOUR_DATA = 'your_data.h5ad'
adata = sc.read(PATH_TO_YOUR_DATA)
# Placeholder column; filled in per batch below
adata.obs['stage'] = None

# Assuming the dataset has 3 batches, each collected at a different time point
adata.obs.loc[adata.obs['batch'] == 'batch1', 'stage'] = '0'
adata.obs.loc[adata.obs['batch'] == 'batch2', 'stage'] = '1'
adata.obs.loc[adata.obs['batch'] == 'batch3', 'stage'] = '2'
#....

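# Create the output folder once, before the loop (no error if it already exists)
out_dir = f'../data/{PROJECT_NAME}'
os.makedirs(out_dir, exist_ok=True)

# Write one .h5ad file per stage, named after the stage label; cells without a stage are skipped
for each in adata.obs['stage'].dropna().unique():
    stage_adata = adata[adata.obs['stage'] == each]
    stage_adata.write(f'{out_dir}/{each}.h5ad', compression='gzip', compression_opts=9)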
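
Since the time points are expected to run consecutively from 0 to n, it may be worth checking the stage labels before writing the per-stage files. The following is a minimal sketch, not part of UNAGI: `check_stage_labels` is a hypothetical helper, and it assumes the labels are strings or integers as in the cell above.

```python
def check_stage_labels(labels):
    """Return the sorted stage indices if they form 0..n with no gaps.

    Hypothetical helper (not a UNAGI function). Raises ValueError if any
    label is missing, is not an integer, or if the sequence has gaps.
    """
    values = list(labels)
    if not values:
        raise ValueError("no stage labels found")

    # Cells left at the placeholder value were never assigned a stage
    unassigned = sum(1 for v in values if v is None)
    if unassigned:
        raise ValueError(f"{unassigned} cells have no stage assigned")

    try:
        stages = sorted({int(v) for v in values})
    except (TypeError, ValueError) as exc:
        raise ValueError(f"non-integer stage label: {exc}") from exc

    # Consecutive labels starting at 0 must equal [0, 1, ..., n]
    expected = list(range(len(stages)))
    if stages != expected:
        raise ValueError(f"stages {stages} are not consecutive from 0")
    return stages


print(check_stage_labels(['0', '1', '2', '1', '0']))  # [0, 1, 2]
```

With the AnnData object from the cell above, the call would be `check_stage_labels(adata.obs['stage'])`, run before the loop that writes the files.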