# Alternative Splicing Analysis

### Step 1: Convert Outrigger PSI output to .h5ad format

This step converts the Outrigger PSI matrix into an `.h5ad` file for downstream analysis. 
Missing (NaN) values are preserved to reflect unquantified splicing events.
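
Preserving NaN matters because an unquantified event is not the same as an event with PSI = 0. The following minimal sketch uses hypothetical values and plain Python (not DOLPHIN's internals) to show how replacing missing entries with zeros would distort a per-event summary:

```python
import math

def mean_observed(values):
    """Mean of the non-NaN entries; NaN if nothing was observed."""
    observed = [v for v in values if not math.isnan(v)]
    return sum(observed) / len(observed) if observed else math.nan

# Hypothetical PSI values for one event across four cells; NaN = not quantified.
psi = [1.0, math.nan, 1.0, math.nan]

# Uses only the two cells in which the event was actually quantified.
nan_aware = mean_observed(psi)

# Treats "not quantified" as PSI = 0, which pulls the average down.
zero_filled = sum(0.0 if math.isnan(v) else v for v in psi) / len(psi)

print(f"NaN-aware mean: {nan_aware}, zero-filled mean: {zero_filled}")
```

Keeping NaN in the stored matrix leaves the choice of imputation strategy to each downstream step, as Steps 2 and 3 do.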


In [1]:
from DOLPHIN.AS.convert_psi_to_h5ad import run_convert_psi

In [2]:
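adata_psi = run_convert_psi(
    # Per-cell metadata table (CSV)
    metadata_path="/mnt/data/kailu9/DOLPHIN_run_input_output/DOLPHIN_tutorial/fsla_meta.csv",
    # Directory containing the Outrigger output
    outrigger_path="/mnt/data/kailu/00_scExon/01_flash_seq_original/06_v2/02_exon/N10/outrigger_output",
    # Prefix of the output file name (Step 2 reads fsla_PSI.h5ad)
    out_name="fsla",
    # Base output directory (Step 2 reads the result from its alternative_splicing/ subfolder)
    out_directory="/mnt/data/kailu9/DOLPHIN_run_input_output/DOLPHIN_tutorial"
)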

100%|██████████| 795/795 [05:34<00:00,  2.38it/s]


In [3]:
adata_psi 

AnnData object with n_obs × n_vars = 795 × 9487
    obs: 'celltype1', 'celltype2'
    var: 'gene_name'

In [7]:
adata_psi.to_df().head()

Unnamed: 0,isoform1=junction:10:100246936-100253420:-|isoform2=junction:10:100250333-100253420:-@exon:10:100250248-100250332:-@junction:10:100246936-100250247:-,isoform1=junction:10:100256477-100260965:-|isoform2=junction:10:100260320-100260965:-@exon:10:100260218-100260319:-@junction:10:100256477-100260217:-,isoform1=junction:10:100489762-100490705:-|isoform2=junction:10:100490323-100490705:-@exon:10:100490008-100490322:-@junction:10:100489762-100490007:-,isoform1=junction:10:100496432-100497666:-|isoform2=junction:10:100497281-100497666:-@exon:10:100497135-100497280:-@junction:10:100496432-100497134:-,isoform1=junction:10:100498208-100499159:-|isoform2=junction:10:100498805-100499159:-@exon:10:100498705-100498804:-@junction:10:100498208-100498704:-,isoform1=junction:10:100516961-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100516961-100526398:-,isoform1=junction:10:100523930-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100523930-100526398:-,isoform1=junction:10:100983818-100986748:-|isoform2=junction:10:100984075-100986748:-@exon:10:100983948-100984074:-@junction:10:100983818-100983947:-,isoform1=junction:10:101611770-101624744:-|isoform2=junction:10:101612478-101624744:-@exon:10:101612337-101612477:-@junction:10:101611770-101612336:-,isoform1=junction:10:101624811-101672914:-|isoform2=junction:10:101667981-101672914:-@exon:10:101667886-101667980:-@junction:10:101624811-101667885:-,...,isoform1=junction:X:78945496-78960507:+|isoform2=junction:X:78945496-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+,isoform1=junction:X:78947864-78960507:+|isoform2=junction:X:78947864-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+,isoform1=junction:X:79361480-79362941:-|isoform2=junction:X:79362692-79362941:-@exon:X:79362581-79362691:-@junction:X:79361480-79362580:-,isoform1=junction:X:81202246-81276983:+|isoform2=junction:X:81202246-81202436:+@exon:X:81202437-81202576:+@junction:X:81202577-81276983:+,isoform1=junction:Y:12909408-12912726:+|isoform2=junction:Y:12909408-12911838:+@exon:Y:12911839-12911968:+@junction:Y:12911969-12912726:+,isoform1=junction:Y:13359987-13366266:-|isoform2=junction:Y:13360529-13366266:-@exon:Y:13360430-13360528:-@junction:Y:13359987-13360429:-,isoform1=junction:Y:19587508-19590082:+|isoform2=junction:Y:19587508-19589520:+@exon:Y:19589521-19589612:+@junction:Y:19589613-19590082:+,isoform1=junction:Y:19735751-19741317:-|isoform2=junction:Y:19739663-19741317:-@exon:Y:19739528-19739662:-@junction:Y:19735751-19739527:-,isoform1=junction:Y:20582694-20588023:+|isoform2=junction:Y:20582694-20584473:+@exon:Y:20584474-20584524:+@junction:Y:20584525-20588023:+,isoform1=junction:Y:2854772-2866792:+|isoform2=junction:Y:2854772-2865087:+@exon:Y:2865088-2865245:+@junction:Y:2865246-2866792:+
SRR18388386,,,,,,,1.0,,,,...,,,1.0,0.0,,,,,,1.0
SRR18387779,,,,,,,1.0,,,,...,,,,0.054945,,,,,,1.0
SRR18387770,,,,,,,,,,,...,,,1.0,0.0,,,,,,1.0
SRR18388394,,,,,,,1.0,,,,...,,,,,,,,,,1.0
SRR18387788,,,,,,,1.0,,,,...,,,1.0,,,,,,,1.0
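
Each column name in the output above encodes one splicing event as two isoforms separated by `|`, where every isoform is a chain of `type:chromosome:start-end:strand` features joined by `@`. The sketch below parses that string into structured fields. It is based only on the IDs as they appear in this output; `Feature` and `parse_event_id` are hypothetical helper names and are not part of DOLPHIN or Outrigger.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    kind: str    # "junction" or "exon", as seen in the IDs above
    chrom: str
    start: int
    end: int
    strand: str

def parse_event_id(event_id):
    """Split an event ID into {isoform name: [Feature, ...]}."""
    isoforms = {}
    for part in event_id.split("|"):
        name, features = part.split("=", 1)
        parsed = []
        for feature in features.split("@"):
            kind, rest = feature.split(":", 1)
            # Split from the right so the strand and coordinate span are always last.
            chrom, span, strand = rest.rsplit(":", 2)
            start, end = span.split("-")
            parsed.append(Feature(kind, chrom, int(start), int(end), strand))
        isoforms[name] = parsed
    return isoforms

# First event ID from the table above.
event_id = (
    "isoform1=junction:10:100246936-100253420:-"
    "|isoform2=junction:10:100250333-100253420:-"
    "@exon:10:100250248-100250332:-"
    "@junction:10:100246936-100250247:-"
)
event = parse_event_id(event_id)
exon = event["isoform2"][1]
print(exon.kind, exon.chrom, exon.start, exon.end, exon.strand)
```

In this example, isoform1 is a single junction while isoform2 is a junction, an exon, and a second junction, so parsed coordinates can be used to look up the exon that distinguishes the two isoforms.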


### Step 2: Cell Clustering Using PSI Values

This step processes the `<sample_name>_PSI.h5ad` file produced in Step 1 (here, `fsla_PSI.h5ad`) to facilitate cell clustering. 
PCA and downstream analyses cannot handle missing values, so missing PSI (NaN) values are imputed with random values between 0 and 1, while observed PSI values are left unchanged. 
The resulting matrix is saved as a new `.h5ad` file containing the imputed PSI values.
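
The imputation described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation of `run_psi_random`: it assumes uniform draws and a fixed seed, and `fill_nan_with_random` is a hypothetical helper name.

```python
import math
import random

def fill_nan_with_random(matrix, seed=0):
    """Return a copy of `matrix` with each NaN replaced by a uniform draw from [0, 1)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the result is reproducible
    return [[rng.random() if math.isnan(v) else v for v in row] for row in matrix]

# Hypothetical cells x events PSI matrix; NaN marks unquantified events.
psi = [
    [math.nan, 1.0, math.nan],
    [0.054945, math.nan, 1.0],
]
filled = fill_nan_with_random(psi)

for row in filled:
    print([round(v, 3) for v in row])
```

Because the filled-in values carry no biological signal, a matrix imputed this way is suited to enabling dimensionality reduction and clustering rather than to interpreting individual PSI values.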


In [1]:
from DOLPHIN.AS.convert_random_psi import run_psi_random

In [3]:
adata_psi_random = run_psi_random(
    # PSI matrix with NaNs preserved, produced in Step 1
    outrigger_psi_data="/mnt/data/kailu9/DOLPHIN_run_input_output/DOLPHIN_tutorial/alternative_splicing/fsla_PSI.h5ad",
    # Prefix of the output file name
    out_name="fsla",
    # Base output directory
    out_directory="/mnt/data/kailu9/DOLPHIN_run_input_output/DOLPHIN_tutorial"
)

In [4]:
adata_psi_random.to_df().head()

Unnamed: 0,isoform1=junction:10:100246936-100253420:-|isoform2=junction:10:100250333-100253420:-@exon:10:100250248-100250332:-@junction:10:100246936-100250247:-,isoform1=junction:10:100256477-100260965:-|isoform2=junction:10:100260320-100260965:-@exon:10:100260218-100260319:-@junction:10:100256477-100260217:-,isoform1=junction:10:100489762-100490705:-|isoform2=junction:10:100490323-100490705:-@exon:10:100490008-100490322:-@junction:10:100489762-100490007:-,isoform1=junction:10:100496432-100497666:-|isoform2=junction:10:100497281-100497666:-@exon:10:100497135-100497280:-@junction:10:100496432-100497134:-,isoform1=junction:10:100498208-100499159:-|isoform2=junction:10:100498805-100499159:-@exon:10:100498705-100498804:-@junction:10:100498208-100498704:-,isoform1=junction:10:100516961-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100516961-100526398:-,isoform1=junction:10:100523930-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100523930-100526398:-,isoform1=junction:10:100983818-100986748:-|isoform2=junction:10:100984075-100986748:-@exon:10:100983948-100984074:-@junction:10:100983818-100983947:-,isoform1=junction:10:101611770-101624744:-|isoform2=junction:10:101612478-101624744:-@exon:10:101612337-101612477:-@junction:10:101611770-101612336:-,isoform1=junction:10:101624811-101672914:-|isoform2=junction:10:101667981-101672914:-@exon:10:101667886-101667980:-@junction:10:101624811-101667885:-,...,isoform1=junction:X:78945496-78960507:+|isoform2=junction:X:78945496-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+,isoform1=junction:X:78947864-78960507:+|isoform2=junction:X:78947864-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+,isoform1=junction:X:79361480-79362941:-|isoform2=junction:X:79362692-79362941:-@exon:X:79362581-79362691:-@junction:X:79361480-79362580:-,isoform1=junction:X:81202246-81276983:+|isoform2=junction:X:81202246-81202436:+@exon:X:81202437-81202576:+@junction:X:81202577-81276983:+,isoform1=junction:Y:12909408-12912726:+|isoform2=junction:Y:12909408-12911838:+@exon:Y:12911839-12911968:+@junction:Y:12911969-12912726:+,isoform1=junction:Y:13359987-13366266:-|isoform2=junction:Y:13360529-13366266:-@exon:Y:13360430-13360528:-@junction:Y:13359987-13360429:-,isoform1=junction:Y:19587508-19590082:+|isoform2=junction:Y:19587508-19589520:+@exon:Y:19589521-19589612:+@junction:Y:19589613-19590082:+,isoform1=junction:Y:19735751-19741317:-|isoform2=junction:Y:19739663-19741317:-@exon:Y:19739528-19739662:-@junction:Y:19735751-19739527:-,isoform1=junction:Y:20582694-20588023:+|isoform2=junction:Y:20582694-20584473:+@exon:Y:20584474-20584524:+@junction:Y:20584525-20588023:+,isoform1=junction:Y:2854772-2866792:+|isoform2=junction:Y:2854772-2865087:+@exon:Y:2865088-2865245:+@junction:Y:2865246-2866792:+
SRR18388386,0.548814,0.715189,0.602763,0.544883,0.423655,0.645894,1.0,0.891773,0.963663,0.383442,...,0.1922,0.916999,1.0,0.0,0.224325,0.646099,0.377303,0.239175,0.843921,1.0
SRR18387779,0.471649,0.285935,0.872293,0.419384,0.465397,0.191993,1.0,0.549905,0.656898,0.418817,...,0.690972,0.570053,0.554895,0.054945,0.350364,0.765937,0.074863,0.808629,0.241341,1.0
SRR18387770,0.467157,0.207176,0.91384,0.688435,0.001312,0.802888,0.192368,0.41085,0.828048,0.916628,...,0.655658,0.15054,1.0,0.0,0.511076,0.095635,0.83567,0.217615,0.790823,1.0
SRR18388394,0.92163,0.576743,0.486409,0.64668,0.844161,0.30135,1.0,0.214558,0.589372,0.956229,...,0.258662,0.366379,0.270519,0.613285,0.829367,0.948184,0.816119,0.677352,0.157222,1.0
SRR18387788,0.978044,0.774697,0.780661,0.542142,0.946817,0.996528,1.0,0.841271,0.634649,0.234987,...,0.359507,0.236903,1.0,0.950538,0.37551,0.247062,0.320256,0.04728,0.274863,1.0


### Step 3: Differential Alternative Splicing Analysis

In this step, we perform differential alternative splicing analysis using the Wilcoxon test. 
To enable this, missing PSI (NaN) values are imputed per cell cluster: for each cluster, we 
calculate the mean PSI across all available events and use this value to fill the NaNs in 
that cluster's cells. Events with sparse coverage therefore still receive imputed values 
based on the overall splicing profile of their cluster. The run also reports how many 
splicing events remain after filtering to those with a valid PSI in at least 10 cells.
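
The two operations described above, cluster-level imputation followed by a rank-based comparison between two groups of cells, can be sketched in plain Python. This is an illustration only, not the code behind `run_differential_as`. It assumes a literal reading of the text (one mean per cluster, pooled over all events) and uses the two-sample Wilcoxon rank-sum test with a tie-corrected normal approximation and no continuity correction; the variant DOLPHIN actually applies is not stated here. All function names and data are hypothetical.

```python
import math
from collections import defaultdict

def impute_cluster_mean(matrix, clusters):
    """Fill NaNs with one value per cluster: the mean of all observed PSI values
    (pooled over every event) in that cluster's cells."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row, cluster in zip(matrix, clusters):
        for v in row:
            if not math.isnan(v):
                totals[cluster] += v
                counts[cluster] += 1
    means = {c: totals[c] / counts[c] for c in counts}
    # A cluster with no observed value has no mean, so its NaNs stay NaN.
    return [[means.get(c, math.nan) if math.isnan(v) else v for v in row]
            for row, c in zip(matrix, clusters)]

def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test. Returns (U statistic of x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    u = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    tie_sizes = defaultdict(int)
    for v in pooled:
        tie_sizes[v] += 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: six cells (rows) x two events (columns), two clusters.
psi = [[1.0, math.nan], [0.8, 0.6], [math.nan, 0.4],
       [0.1, math.nan], [0.3, 0.2], [math.nan, 0.0]]
clusters = ["A", "A", "A", "B", "B", "B"]
imputed = impute_cluster_mean(psi, clusters)

# Compare the first event between cluster A and cluster B.
group_a = [row[0] for row, c in zip(imputed, clusters) if c == "A"]
group_b = [row[0] for row, c in zip(imputed, clusters) if c == "B"]
u_stat, p_value = rank_sum_test(group_a, group_b)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Note that filling many cells with the same cluster mean creates tied values, which is why the sketch includes a tie correction in the variance term.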


In [1]:
from DOLPHIN.AS.generate_differential_as import run_differential_as

In [2]:
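adata_psi_DAS = run_differential_as(
    # PSI matrix with NaNs preserved, produced in Step 1
    outrigger_psi_data="/mnt/data/kailu9/DOLPHIN_run_input_output/DOLPHIN_tutorial/alternative_splicing/fsla_PSI.h5ad",
    # Prefix of the output file name
    out_name="fsla",
    # Column in adata.obs that holds the cluster labels
    cluster_name="celltype1",
    # Base output directory
    out_directory="/mnt/data/kailu9/DOLPHIN_run_input_output/DOLPHIN_tutorial"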
)

Total number of splicing events before filtering: 9487
Number of splicing events after filtering (>= 10 cells with valid PSI): 4978
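
The filter reported in the log keeps events with a valid PSI in at least 10 cells. A minimal plain-Python sketch of that criterion follows. It assumes "valid" means non-NaN in the matrix before imputation; `filter_events` and the toy data are hypothetical and not taken from DOLPHIN.

```python
import math

def filter_events(matrix, event_names, min_cells=10):
    """Keep events (columns) with a non-NaN PSI in at least `min_cells` cells."""
    n_events = len(event_names)
    valid = [sum(not math.isnan(row[j]) for row in matrix) for j in range(n_events)]
    keep = [j for j in range(n_events) if valid[j] >= min_cells]
    kept_matrix = [[row[j] for j in keep] for row in matrix]
    return kept_matrix, [event_names[j] for j in keep]

# Hypothetical 12-cell matrix: event_a is observed in every cell, event_b in only 3.
psi = [[1.0, 0.5 if i < 3 else math.nan] for i in range(12)]
event_names = ["event_a", "event_b"]

kept, names = filter_events(psi, event_names)
print(f"Total number of events before filtering: {len(event_names)}")
print(f"Number of events after filtering (>= 10 cells with valid PSI): {len(names)}")
```

Filtering on observed values before testing avoids running the test on events whose values are almost entirely imputed.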
