# Differential Expression Analysis among the final labels
In this notebook, I analyzed the manual annotation stored in the column `ThirdManualAnnotations` of `noAdolescence_nocc_noclusters_ThirdManualAnnotations_Interneurons.h5ad`, retrieving the top 25 marker genes for each label.

In [1]:
import numpy as np
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
import anndata as ad

In [2]:
adata = sc.read_h5ad('/hpc/hers_basak/rnaseq_data/Silettilab/samples/final_useful_datasets/noAdolescence_nocc_noclusters_ThirdManualAnnotations_Interneurons.h5ad')

In [3]:
# Remove the cell-cycle genes listed in ccGenesHuman.txt
ccGenesHuman = np.loadtxt('/hpc/hers_basak/rnaseq_data/Silettilab/samples/final_useful_datasets/ccGenesHuman.txt', dtype=str)
mask = ~adata.var_names.isin(ccGenesHuman)
adata = adata[:, mask]

In [4]:
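# Remove mitochondrial (MT-) genes and genes starting with 'RP' (ribosomal proteins;
# note that this prefix also matches some non-ribosomal genes, e.g. RPTOR).
# .copy() turns the view into an actual AnnData object before normalization.
adata = adata[:, ~adata.var_names.str.startswith(('MT-', 'RP'))].copy()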
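The two gene filters above can be illustrated on a toy `pd.Index` standing in for `adata.var_names`. This is a minimal sketch: the gene list and the stand-in for `ccGenesHuman.txt` are made up for illustration. It also shows that a bare `'RP'` prefix removes more than ribosomal-protein genes, while the stricter `'RPL'`/`'RPS'` prefixes keep a gene such as `RPTOR`.

```python
import numpy as np
import pandas as pd

# Toy gene index standing in for adata.var_names (illustrative symbols only)
var_names = pd.Index(["AQP4", "MKI67", "TOP2A", "MT-CO1", "RPL13", "RPS6", "RPTOR", "DLX2"])

# Hypothetical stand-in for the contents of ccGenesHuman.txt
cc_genes = np.array(["MKI67", "TOP2A"])

# 1) Drop cell-cycle genes with a boolean membership mask
keep = ~var_names.isin(cc_genes)
# 2) Drop genes whose symbol starts with any of the given prefixes
keep &= ~var_names.str.startswith(("MT-", "RP"))
print(list(var_names[keep]))  # ['AQP4', 'DLX2']

# Stricter ribosomal-protein prefixes keep the non-ribosomal RPTOR
strict = ~var_names.isin(cc_genes) & ~var_names.str.startswith(("MT-", "RPL", "RPS"))
print(list(var_names[strict]))  # ['AQP4', 'RPTOR', 'DLX2']
```

Which prefix list is appropriate depends on whether genes such as `RPTOR` should stay in the analysis.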

#### I normalized and log-transformed the data.

In [5]:
sc.pp.normalize_total(adata, target_sum=1e4)  # scale each cell to 10,000 total counts
sc.pp.log1p(adata)  # natural log(1 + x)
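For a dense count matrix and default settings, `sc.pp.normalize_total(..., target_sum=1e4)` followed by `sc.pp.log1p` amounts to scaling every cell to 10,000 total counts and then taking `log(1 + x)`. The NumPy sketch below reproduces that arithmetic on a made-up 3×4 matrix; scanpy's own functions additionally handle sparse matrices and options such as `exclude_highly_expressed`, which this sketch ignores.

```python
import numpy as np

# Toy count matrix: 3 cells (rows) x 4 genes (columns)
counts = np.array([[10, 0, 5, 5],
                   [1, 1, 2, 0],
                   [0, 50, 25, 25]], dtype=float)

target_sum = 1e4
# Divide each cell by its total count and rescale to target_sum (counts per 10k)
normalized = counts / counts.sum(axis=1, keepdims=True) * target_sum
# Natural-log transform with a pseudocount of 1
logged = np.log1p(normalized)

print(normalized.sum(axis=1))  # [10000. 10000. 10000.]
```

After this step every cell has the same library size, so differences between cells reflect relative rather than absolute expression.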



#### I performed Differential Expression Analysis and printed the top 25 markers for each label.

In [6]:
# Rank genes per label with a logistic-regression classifier
sc.tl.rank_genes_groups(adata, "ThirdManualAnnotations", method="logreg")
# Print the 25 top-ranked genes for every label
for group in adata.uns['rank_genes_groups']['names'].dtype.names:
    genes = adata.uns['rank_genes_groups']['names'][group][:25]
    print(f"Group {group}:")
    print(", ".join(genes))
    print("\n")

Group Astrocytes:
CLU, APOE, LINC00609, NTRK2, GPC5, DAAM2, PLCG2, AQP4-AS1, LRIG1, PPP2R2B, AQP4, BCAN, FAM107A, GLIS3, LRRC4C, NEAT1, RYR3, GRM3, ATP1A2, NAV2, LUZP2, SPARCL1, ADCY2, ACSBG1, COL5A3


Group OPCs:
PCDH15, LHFPL3, CA10, OPCML, FGF14, MMP16, NXPH1, DCC, TNR, ANKS1B, CNTN1, SGCD, LRRC4C, SNTG1, KIF13A, RNF144A, NRXN3, ASTN2, PPP2R2B, CNTNAP5, SEMA5A, DISC1, GRIA2, SCN1A, SORCS3


Group Subcortical nIPCs:
HBA2, CACNA2D1, HBA1, DLX2, KCNB2, KALRN, MAML3, HES6, AUTS2, PDE4D, ZIC1, ELAVL2, ROBO1, NNAT, CCSER1, PRKX, RUNX1T1, NFIB, MYT1L, CHRDL1, SPIRE1, KCNH7, KCNQ3, RBFOX1, USP9Y


Group early Radial Glia:
PLCG2, MALAT1, LINC00486, TUBA1A, EEF1G, HEY1, SFRP1, ID4, HBG2, EEF1A1, CDH12, HES4, VIM, HDAC9, LIX1, EIF1AY, LEF1, UNC13C, FAM182B, BAIAP2L1, B3GAT2, TUBA1B, MAP1B, DPP10, OOEP



STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
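With `method="logreg"`, scanpy fits a multiclass logistic regression on all genes and uses the fitted coefficients as per-label gene scores. The scikit-learn sketch below reproduces that idea on a toy matrix in which gene `k` is up-regulated in group `k`; the data, the group names and `max_iter=1000` are assumptions for illustration. According to the scanpy documentation, extra keyword arguments of `sc.tl.rank_genes_groups` are forwarded to `LogisticRegression`, so passing a larger `max_iter` there is one way to address the convergence warning above; scaling the data, as the warning suggests, is another.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
groups = np.array(["A", "B", "C"])
n_per_group, n_genes = 30, 6
labels = np.repeat(np.arange(3), n_per_group)  # integer group codes 0, 1, 2

# Poisson background counts; gene k gets +5 counts in every cell of group k
counts = rng.poisson(1.0, size=(labels.size, n_genes)).astype(float)
counts[np.arange(labels.size), labels] += 5
X = np.log1p(counts)

# Multiclass logistic regression on all genes at once; a larger max_iter gives
# the default lbfgs solver more iterations to converge than the default of 100
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)

# coef_ has one row per class when there are more than two classes;
# rank genes within each class by descending coefficient
top_genes = {groups[k]: np.argsort(clf.coef_[k])[::-1][:2].tolist() for k in range(3)}
print(top_genes)
```

Because all labels are fitted jointly, a gene's score for one label also reflects how well it separates that label from the others, which is why strongly label-specific genes rise to the top.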


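#### I also filtered the previous DEA results to obtain more specific markers, keeping only genes detected in at most 50% of the cells outside each label (together with the function's default in-group fraction and fold-change thresholds).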

In [7]:
# Mask genes detected in more than 50% of the cells outside each label
sc.tl.filter_rank_genes_groups(adata, max_out_group_fraction=0.5)
filtered_genes = {}
for group in adata.uns['rank_genes_groups_filtered']['names'].dtype.names:
    genes = adata.uns['rank_genes_groups_filtered']['names'][group]
    # Filtered-out genes are stored as NaN; drop them and keep the top 25 survivors
    filtered_gene_list = [gene for gene in genes if pd.notnull(gene)]
    filtered_genes[group] = filtered_gene_list[:25]
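Overall, the top markers for each label appear consistent with the expected cell identities: for example, the astrocyte list includes AQP4, ATP1A2 and SPARCL1, the OPC list includes PCDH15 and LHFPL3, the subcortical nIPC list includes DLX2, and the early radial glia list includes VIM.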