# Exon-Level Differential Gene Analysis (EDEG)

In this section, we perform exon-level differential gene analysis using the feature matrix.
The goal is to identify genes whose exon-level expression differs significantly
between conditions or cell types.

### Step 1: Identify Exon Markers Using MAST

In this step, **DOLPHIN** uses the [MAST](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5) model through the [Seurat](https://satijalab.org/seurat/) package to compute p-values for each exon. This helps identify exons that are differentially expressed across cell clusters or experimental conditions.

> **Note:** A separate conda environment is required to run Seurat. You can create it using the following commands:

```bash
conda env create -f environment_linux_R.yaml
pip install .
```

Then install MAST from within an R session in that environment:

```r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MAST")
```

In [1]:
### Step 1-1: Convert .h5ad file to .rds format using Python
# This step uses the Python kernel to call an R script that converts
# the input AnnData (.h5ad) file into a Seurat-compatible .rds object.
from DOLPHIN.EDEG.call_convert import run_h5ad_rds

run_h5ad_rds(
    input_anndata = "./Feature_PDAC.h5ad",
    output_rds = "./Feature_PDAC.rds"
)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

package ‘dplyr’ was built under R version 4.2.3 
Attaching SeuratObject
Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and
validation routines, and ensuring assays work in strict v3/v4
compatibility mode
package ‘Seurat’ was built under R version 4.2.1 
package ‘patchwork’ was built under R version 4.2.3 
package ‘reticulate’ was built under R version 4.2.3 
In asMethod(object) :
  sparse->dense coercion: allocating vector of size 6.5 GiB
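
The last warning above reports that a sparse matrix was coerced to a dense one during conversion. As a rough sketch (an estimate, not part of DOLPHIN), a dense matrix of double-precision values needs about rows × columns × 8 bytes, so the memory requirement can be checked before running the conversion. The function name `dense_matrix_gib` and the dimensions below are illustrative assumptions, not values taken from this dataset.

```python
def dense_matrix_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Approximate size in GiB of a dense n_rows x n_cols matrix of doubles."""
    return n_rows * n_cols * bytes_per_value / 2**30


# Hypothetical dimensions, chosen only to illustrate the arithmetic.
print(f"{dense_matrix_gib(1024, 131072):.1f} GiB")
```

If the estimate exceeds the available RAM, the conversion step is likely to fail or swap heavily.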


In [2]:
### Step 1-2: Run MAST to identify exon-level markers (this step uses the R kernel)
library(Seurat)

“package ‘Seurat’ was built under R version 4.2.1”
Attaching SeuratObject

Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and
validation routines, and ensuring assays work in strict v3/v4
compatibility mode



In [None]:
seurat_obj <- readRDS(file = "./Feature_PDAC.rds")
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)

In [None]:
seurat_obj@meta.data$Condition <- ifelse(grepl("N", seurat_obj@meta.data$source), "normal", "cancer")
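
The `ifelse(grepl("N", ...))` call above labels a cell as `normal` whenever its `source` value contains an uppercase "N" anywhere in the string, and as `cancer` otherwise. A minimal Python sketch of the same rule can help check how your own sample names would be labeled; the sample names below are hypothetical.

```python
def label_condition(source: str) -> str:
    """Mirror of the R rule: any source containing an uppercase 'N' is normal."""
    return "normal" if "N" in source else "cancer"


# Hypothetical sample names, for illustration only.
for name in ["N1", "T1", "PDAC_N3", "TUMOR_NEW"]:
    print(name, "->", label_condition(name))
```

Note that a name such as `TUMOR_NEW` would be labeled `normal` because it contains an "N"; if your naming scheme allows that, a stricter pattern is needed.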

In [5]:
unique(seurat_obj@meta.data$cluster)

In [None]:
Idents(seurat_obj) <- "Condition"

In [9]:
### Within-cluster comparison between conditions.
### Modify the code below to match the design of your project.
### This example compares normal and cancer cells within the ductal cell population.

sub_seurat <- subset(seurat_obj, subset = cluster %in% c("Ductal cell type 1", "Ductal cell type 2"))

DE_MAST <- FindMarkers(sub_seurat, ident.1="cancer", ident.2="normal", test.use="MAST", logfc.threshold=0.5)
write.csv(DE_MAST, file = "./PDAC_MAST_ductal.csv", row.names = TRUE)


Done!

Combining coefficients and standard errors

Calculating log-fold changes

Calculating likelihood ratio tests

Refitting on reduced model...


Done!
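
Once `PDAC_MAST_ductal.csv` is written, the exon markers can be filtered downstream. The sketch below is one possible way to do this in Python using only the standard library. It assumes the column names `p_val_adj` and `avg_log2FC` that Seurat v4's `FindMarkers` produces, and that `write.csv(..., row.names = TRUE)` stores the exon names under an empty header. The cutoffs and the helper name `significant_exons` are illustrative choices, not DOLPHIN defaults.

```python
import csv


def significant_exons(csv_path, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Return (exon, avg_log2FC, p_val_adj) tuples that pass both cutoffs."""
    hits = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # write.csv(row.names = TRUE) stores the exon names under an empty header.
        name_column = reader.fieldnames[0]
        for row in reader:
            try:
                padj = float(row["p_val_adj"])
                log2fc = float(row["avg_log2FC"])
            except ValueError:
                # Skip rows with non-numeric entries such as "NA".
                continue
            if padj < padj_cutoff and abs(log2fc) >= min_abs_log2fc:
                hits.append((row[name_column], log2fc, padj))
    # Largest absolute fold changes first.
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


# Example call (assumes the CSV written in the step above exists):
# hits = significant_exons("./PDAC_MAST_ductal.csv")
```

Adjust `padj_cutoff` and `min_abs_log2fc` to suit your study; the values here are placeholders.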

