## DE

### **edgeR**: Pseudo-bulk

If the data you are working with do not have replicates, it can be beneficial to create multiple (e.g. 2-3) pseudobulks per patient to account for patient variability. Here we create 3 pseudobulks per patient because we don't have replicates.

We strongly recommend reading this guide on design matrices: https://f1000research.com/articles/9-1444

Regardless of whether we want to run the analysis only on a few cell subpopulations and fit a model for each one of them separately or fit one model for all of them, we first need to prepare the data, define a function to create pseudobulks and run the edgeR pipeline. First, let’s prepare the data.

Since we need to create pseudobulks for each patient-condition combination, we first need to create such a column by concatenating replicate and label.

We need to clean up the cell type names, i.e. replace spaces with underscores and remove + symbols, to avoid Python to R conversion issues.

In [1]:
# We create a backup so that the names are consistent with our tutorial
adata_backup = adata.copy()


In [None]:
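# one-hot encode the clusters but keep the original 'leiden' column, which is still needed below
adata.obs = pd.concat([adata.obs, pd.get_dummies(adata.obs["leiden"], prefix="leiden")], axis=1)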

We need 4 keys in `.obs`:
- 'sample' is the combination of the replicate and the label
- 'label' is the condition to test differential expression against
- 'replicate' is the experiment (there could be more than one per subject; we have only one, so we will simulate them)
- 'cell_type' is the cell type, over which we could iterate to run several DE analyses
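
As a quick sanity check of the keys listed above, the following sketch tests a metadata table for the required columns and then builds `sample` from `replicate` and `label`. The helper `missing_obs_keys` and the toy table are hypothetical and only stand in for `adata.obs`.

```python
import pandas as pd

REQUIRED_OBS_KEYS = ["sample", "label", "replicate", "cell_type"]


def missing_obs_keys(obs, required=REQUIRED_OBS_KEYS):
    """Return the required column names that are absent from `obs` (hypothetical helper)."""
    return [key for key in required if key not in obs.columns]


# toy metadata table standing in for adata.obs
toy_obs = pd.DataFrame(
    {
        "replicate": ["patient_1", "patient_1", "patient_2"],
        "label": ["tumor", "tumor", "normal"],
        "cell_type": ["T_cell", "B_cell", "T_cell"],
    }
)
missing = missing_obs_keys(toy_obs)
print(missing)

# build the 'sample' column as replicate + "_" + label
toy_obs["sample"] = toy_obs["replicate"] + "_" + toy_obs["label"]
print(missing_obs_keys(toy_obs))
```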

In [28]:
adata.obs["sample"] = [
    f"{rep}_{l}" for rep, l in zip(adata.obs["sample"], adata.obs["final_pred"])
]
adata_raw.obs.groupby('sample').count()

5328     patient_41_normal_adjacent_Tumor
6046     patient_41_normal_adjacent_Tumor
13121    patient_45_normal_adjacent_Tumor
21763      patient_32_tumor_primary_Tumor
21858      patient_32_tumor_primary_Tumor
                       ...               
42936      patient_43_tumor_primary_Tumor
43031      patient_43_tumor_primary_Tumor
50547      patient_46_tumor_primary_Tumor
52728      patient_46_tumor_primary_Tumor
53066      patient_46_tumor_primary_Tumor
Name: sample, Length: 1844, dtype: object

In [70]:
type_label = "leiden"
adata.obs[type_label] = [ct.replace(" ", "_") for ct in adata.obs[type_label]]
adata.obs[type_label] = [ct.replace("+", "") for ct in adata.obs[type_label]]

The metadata columns used for grouping must have a categorical dtype before we create pseudobulks.

In [71]:
adata.obs["sample"] = adata.obs["sample"].astype("category")
adata.obs[type_label] = adata.obs[type_label].astype("category")

Now, let’s define the function we need to aggregate single cells into pseudo-replicates:

- `aggregate_and_filter` creates an AnnData object with pseudo-replicates for each donor for a specified subpopulation of the original single-cell AnnData object. It also filters out donors that have `NUM_OF_CELL_PER_DONOR` (here 20) or fewer cells in the specified population.
  - Since we want the genes that are differentially expressed in every cluster proposed by the Leiden algorithm, we change the design of the function: the condition is whether or not a cell belongs to a given cluster, and there is only one cell type, which is constant across all cells.

- By changing the `replicates_per_patient` parameter, several (n) pseudo-replicates can be created for each sample; cells are then split into n subsets of roughly equal size.
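
The splitting step described in the last bullet can be illustrated in isolation. This is a minimal sketch: the helper name `split_into_replicates` and the fixed seed are assumptions for illustration and are not part of the pipeline.

```python
import random

import numpy as np


def split_into_replicates(cell_ids, n_replicates, seed=0):
    """Shuffle cell identifiers and split them into n roughly equal subsets."""
    ids = list(cell_ids)
    random.Random(seed).shuffle(ids)  # local RNG so the split is reproducible
    return [list(chunk) for chunk in np.array_split(np.array(ids), n_replicates)]


cells = [f"cell_{i}" for i in range(10)]
replicates = split_into_replicates(cells, n_replicates=3)
print([len(r) for r in replicates])  # subset sizes: [4, 3, 3]
```

`np.array_split` tolerates sizes that do not divide evenly, which is why the first subset receives the extra cell.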


In [80]:
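import random

import numpy as np
import pandas as pd
import scanpy as sc

# donors with this many cells or fewer are dropped
NUM_OF_CELL_PER_DONOR = 20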

def aggregate_and_filter(
    adata,
    cell_identity,
    donor_key="sample",
    condition_key="leiden_0",
    cell_identity_key="constant",
    obs_to_keep=[],  # which additional metadata to keep, e.g. gender, age, etc.
    replicates_per_patient=2,
):
    obs_to_keep = list(set(obs_to_keep) | set([donor_key, condition_key]))
    # subset adata to the given cell identity
    adata_cell_pop = adata[adata.obs[cell_identity_key] == cell_identity].copy()
    # check which donors to keep according to the number of cells specified with NUM_OF_CELL_PER_DONOR
    size_by_donor = adata_cell_pop.obs.groupby([donor_key]).size()
    donors_to_drop = [
        donor
        for donor in size_by_donor.index
        if size_by_donor[donor] <= NUM_OF_CELL_PER_DONOR
    ]
    if len(donors_to_drop) > 0:
        print("Dropping the following samples:")
        print(donors_to_drop)
    df = pd.DataFrame(columns=[*adata_cell_pop.var_names, *obs_to_keep])

    adata_cell_pop.obs[donor_key] = adata_cell_pop.obs[donor_key].astype("category")
    for i, donor in enumerate(donors := adata_cell_pop.obs[donor_key].cat.categories):
        print(f"\tProcessing donor {i+1} out of {len(donors)}...", end="\r")
        if donor not in donors_to_drop:
            adata_donor = adata_cell_pop[adata_cell_pop.obs[donor_key] == donor]
            # create replicates for each donor
            indices = list(adata_donor.obs_names)
            random.shuffle(indices)
            indices = np.array_split(np.array(indices), replicates_per_patient)
            for rep, rep_idx in enumerate(indices):
                adata_replicate = adata_donor[rep_idx]
                # specify how to aggregate: sum gene expression for each gene for each donor and also keep the condition information
                agg_dict = {gene: "sum" for gene in adata_replicate.var_names}
                for obs in obs_to_keep:
                    agg_dict[obs] = "first"

                # create a df with all genes, donor and condition info
                # (assumes .X is a SciPy sparse matrix; .toarray() converts it to a dense array)
                df_donor = pd.DataFrame(adata_replicate.X.toarray())
                df_donor.index = adata_replicate.obs_names
                df_donor.columns = adata_replicate.var_names
                df_donor = df_donor.join(adata_replicate.obs[obs_to_keep])
                # aggregate
                df_donor = df_donor.groupby(donor_key).agg(agg_dict)
                df_donor[donor_key] = donor
                df.loc[f"{donor}_{rep}"] = df_donor.loc[donor]
                # debug output: metadata of the pseudo-replicates collected so far
                print(df.drop(columns=adata_cell_pop.var_names))
    print("\n")
    # create AnnData object from the df
    adata_cell_pop = sc.AnnData(
        df[adata_cell_pop.var_names], obs=df.drop(columns=adata_cell_pop.var_names)
    )
    return adata_cell_pop

In [None]:
aggregate_and_filter(adata, 'constant')
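
To make the aggregation itself concrete, here is a toy sketch of the summing step on a four-cell count table. The gene names, cell names and replicate labels are made up for illustration; the real function works on the AnnData object instead.

```python
import pandas as pd

# toy counts: rows are cells, columns are genes
counts = pd.DataFrame(
    {"geneA": [1, 0, 3, 2], "geneB": [0, 5, 1, 1]},
    index=["c1", "c2", "c3", "c4"],
)
# pseudo-replicate assignment for each cell
replicate_of_cell = pd.Series(
    ["donor1_0", "donor1_0", "donor1_1", "donor1_1"], index=counts.index
)

# sum raw counts over all cells assigned to the same pseudo-replicate
pseudobulk = counts.groupby(replicate_of_cell).sum()
print(pseudobulk)
```

Each row of `pseudobulk` then plays the role of one bulk sample in the downstream model.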

We also need to define a separate function to fit an edgeR GLM:

- `fit_model` takes a `SingleCellExperiment` object as input, creates the design matrix and outputs the fitted GLM. We also output the edgeR object of class DGEList to do some exploratory data analysis (EDA).


In [75]:
%%R
fit_model <- function(adata_){
    # create an edgeR object with counts and grouping factor
    y <- DGEList(assay(adata_, "X"), group = colData(adata_)$label)
    # filter out genes with low counts
    print("Dimensions before subsetting:")
    print(dim(y))
    print("")
    keep <- filterByExpr(y)
    y <- y[keep, , keep.lib.sizes=FALSE]
    print("Dimensions after subsetting:")
    print(dim(y))
    print("")
    # normalize
    y <- calcNormFactors(y)
    # create a vector that is the concatenation of condition and cell type, which we will later use with contrasts
    group <- paste0(colData(adata_)$label, ".", colData(adata_)$cell_type)
    # replicate <- colData(adata_)$replicate
    # create a design matrix: even though we have multiple donors, we do not have both conditions from the same donor,
    # so we do not include the donor in the design: it would make the coefficients collinear and reduce the rank
    # of the matrix
    design <- model.matrix(~ 0 + group)
    # estimate dispersion
    y <- estimateDisp(y, design = design)
    # fit the model
    fit <- glmQLFit(y, design)
    return(list("fit"=fit, "design"=design, "y"=y))
}
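
The comment about matrix rank in `fit_model` can be checked with a small Python sketch. It mimics an intercept-free design with one indicator column per group and then adds a donor indicator. The metadata values are hypothetical and chosen so that each donor appears in only one condition; this is only an analogy to what `model.matrix(~ 0 + group)` produces, not a call into edgeR.

```python
import numpy as np
import pandas as pd

# hypothetical sample-level metadata for six pseudobulks
meta = pd.DataFrame(
    {
        "label": ["True", "True", "True", "False", "False", "False"],
        "cell_type": ["tumor"] * 6,
        "donor": ["d1", "d1", "d1", "d2", "d2", "d2"],
    }
)
group = meta["label"] + "." + meta["cell_type"]

# one indicator column per group, no intercept
design = pd.get_dummies(group, prefix="group", prefix_sep="").astype(float)
rank = np.linalg.matrix_rank(design.to_numpy())
print(design.columns.tolist(), rank)

# adding the donor when every donor sits in exactly one group duplicates a column
donor_cols = pd.get_dummies(meta["donor"], prefix="donor", drop_first=True).astype(float)
with_donor = pd.concat([design, donor_cols], axis=1)
rank_with_donor = np.linalg.matrix_rank(with_donor.to_numpy())
print(with_donor.shape[1], rank_with_donor)
```

With the donor column included, the matrix has three columns but still only rank 2, so not all coefficients are estimable.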

Now that we have defined all the functions we need, we can proceed with creating pseudobulks. We might want to look at the available metadata later, so we keep it in the AnnData object.

In [106]:
obs_to_keep = ["tissue", 'condition', "sample"]

We need to pass the raw counts to edgeR. Hence, we set .X to the counts layer to ensure the pseudo-replicates are created for raw counts.

In [79]:
adata.X = adata.layers["counts"].copy()
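
Before handing the matrix to edgeR it can be worth confirming that it really holds raw counts. The heuristic below is a sketch under the assumption that raw counts are non-negative and integer-valued; `looks_like_raw_counts` is a hypothetical helper, not part of scanpy or anndata.

```python
import numpy as np


def looks_like_raw_counts(matrix):
    """Heuristic: raw counts are non-negative and integer-valued."""
    # densify if the object offers .toarray() (e.g. a sparse matrix)
    values = np.asarray(matrix.toarray() if hasattr(matrix, "toarray") else matrix)
    return bool(np.all(values >= 0) and np.allclose(values, np.round(values)))


raw = np.array([[0, 3], [1, 0]])
normalized = np.log1p(raw)  # log-transformed values are no longer integers

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(normalized))
```

On the real object, the same check could be applied to `adata.X` after the assignment above.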

Before creating the pseudobulks, we need to one-hot encode the Leiden clusters so that each one is compared against the background. We also add the `constant` column, a leftover of the original cell-type-based design.

In [92]:
# adata.obs = pd.get_dummies(adata.obs, columns=['leiden'])
adata.obs['constant'] = 'tumor'
adata.obs['constant'] = adata.obs['constant'].astype('category')
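
The one-hot encoding mentioned above can be sketched on a toy table. The version shown here keeps the original `leiden` column next to the indicator columns, which is one possible way to do it; the toy `obs` table is an assumption standing in for `adata.obs`.

```python
import pandas as pd

# toy stand-in for adata.obs with three clusters
obs = pd.DataFrame({"leiden": pd.Categorical(["0", "1", "0", "2"])})

# indicator columns leiden_0, leiden_1, ...; the original column is kept
indicators = pd.get_dummies(obs["leiden"], prefix="leiden")
obs = pd.concat([obs, indicators], axis=1)

# single-valued column used to select all cells at once
obs["constant"] = pd.Categorical(["tumor"] * len(obs))
print(obs.columns.tolist())
```

Each `leiden_<k>` column can then serve as the condition for one cluster-versus-rest comparison.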

In [127]:
adata

AnnData object with n_obs × n_vars = 2401 × 28863
    obs: 'dataset', 'sample', 'accession', 'sex', 'condition', 'origin', 'patient', 'tissue', 'n_counts', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mito', 'pct_counts_mito', 'Level_2_transfered_label', 'final_pred', 'leiden_0', 'leiden_1', 'leiden_2', 'leiden_3', 'leiden_4', 'leiden_5', 'leiden_6', 'constant', 'leiden'
    var: 'n_counts', 'mito', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
    uns: 'pca', 'neighbors', 'leiden', 'umap', 'Level_2_transfered_label_colors', 'leiden_colors', 'log1p', 'sample_colors'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'distances', 'connectivities'

Next, we create the AnnData object with pseudobulks. We do this concurrently because the aggregated sum takes a while.

Here we run it with a different condition for every cluster.

In [107]:
import concurrent.futures
# Process the remaining cell types concurrently
conditions = [f'leiden_{i}' for i in range(0,7)]

def process_cell_type(adata, condition, obs_to_keep, replicates_per_patient):
    # print(f'Processing {cell_type}...')
    return aggregate_and_filter(adata, 'tumor', condition_key=condition,
                                obs_to_keep=obs_to_keep,
                                replicates_per_patient=replicates_per_patient)


adata_pb = None
with concurrent.futures.ProcessPoolExecutor() as executor:
    # map each future to its condition: as_completed yields futures in
    # completion order, which can differ from submission order
    futures = {}
    for condition in conditions:
        print(f'Processing {condition}...')
        future = executor.submit(process_cell_type, adata, condition,
                               obs_to_keep, 3)
        futures[future] = condition

    for future in concurrent.futures.as_completed(futures):
        print(f'Finished {futures[future]}...')
        adata_leiden_cluster = future.result()

        if adata_pb is None:
            adata_pb = adata_leiden_cluster
        else:
            adata_pb = adata_pb.concatenate(adata_leiden_cluster)

adata_pb

Processing leiden_0...
Processing leiden_1...
Processing leiden_2...
Processing leiden_3...
Processing leiden_4...
Processing leiden_5...
Processing leiden_6...


	Processing donor 1 out of 2...
	Processing donor 2 out of 2...

           leiden_6 condition    sample tissue
NSCLC-9_0     False     NSCLC   NSCLC-9   lung
NSCLC-9_1     False     NSCLC   NSCLC-9   lung
NSCLC-9_2     False     NSCLC   NSCLC-9   lung
NSCLC-10_0    False     NSCLC  NSCLC-10   lung
NSCLC-10_1    False     NSCLC  NSCLC-10   lung
NSCLC-10_2    False     NSCLC  NSCLC-10   lung

           condition    sample tissue leiden_1
NSCLC-9_0      NSCLC   NSCLC-9   lung     True
NSCLC-9_1      NSCLC   NSCLC-9   lung     True
NSCLC-9_2      NSCLC   NSCLC-9   lung     True
NSCLC-10_0     NSCLC  NSCLC-10   lung    False
NSCLC-10_1     NSCLC  NSCLC-10   lung    False
NSCLC-10_2     NSCLC  NSCLC-10   lung    False

Finished leiden_0...
Finished leiden_1...
Finished leiden_3...
Finished leiden_4...
Finished leiden_5...
Finished leiden_6...


AnnData object with n_obs × n_vars = 42 × 28863
    obs: 'leiden_6', 'condition', 'sample', 'tissue', 'leiden_4', 'batch', 'leiden_2', 'leiden_0', 'leiden_1', 'leiden_3', 'leiden_5'
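
One detail of the concurrent run above is that `concurrent.futures.as_completed` yields futures in the order they finish, not the order they were submitted. The sketch below shows one way to keep track of which result belongs to which job. It uses a `ThreadPoolExecutor` and a made-up `slow_square` task purely to keep the example light; the notebook itself uses a `ProcessPoolExecutor`.

```python
import concurrent.futures
import time


def slow_square(x, delay):
    """Stand-in for an expensive aggregation: sleeps, then returns x squared."""
    time.sleep(delay)
    return x * x


jobs = {"a": (2, 0.2), "b": (3, 0.0)}  # name -> (input, delay in seconds)
results = {}
with concurrent.futures.ThreadPoolExecutor() as executor:
    # remember which job each future belongs to
    future_to_name = {
        executor.submit(slow_square, x, delay): name
        for name, (x, delay) in jobs.items()
    }
    for future in concurrent.futures.as_completed(future_to_name):
        results[future_to_name[future]] = future.result()

print(results)
```

Looking results up through the future-to-name mapping avoids labelling a result by its position in the completion order.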