# Project T21 RNA-seq data

Project the T21 RNA-seq data onto the MultiPLIER model and test which latent variables (LVs) differ between affected (T21) and control (D21) samples.

# Load libraries/modules

In [199]:
library(biomaRt)
library(EDASeq)
library(here)
library(DESeq2)
library(tidyverse)

# load plier utils
source(here::here('scripts/plier_util.R'))


# Load data

In [79]:
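# Output directory for this notebook (created with parent folders if missing)
output_nb_path = here('output/nbs/project_T21_rnaseq_data')
dir.create(output_nb_path, showWarnings = FALSE, recursive = TRUE)

# Raw gene count matrix for GSE151282
counts_matrix = here::here('data/GSE151282/GSE151282_Raw_gene_counts_matrix.txt')

# MultiPLIER model: loadings (Z), LV-pathway summary, LV values (B), and metadata (includes L2)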
multiplier_z = readRDS(here('data/multiplier/multiplier_model_z.rds'))
multiplier_summary = readRDS(here('data/multiplier/multiplier_model_summary.rds'))
multiplier_b = readRDS(here('data/multiplier/multiplier_model_b.rds'))
multiplier_metadata = readRDS(here('data/multiplier/multiplier_model_metadata.rds'))

# TPM Normalization Process

TPM (transcripts per million) is a normalization method for RNA sequencing data. It corrects raw read counts for gene length and sequencing depth so that expression levels can be compared across genes and across samples. TPM is computed as follows:

1. **Load the Data**: Import your gene count data into R.
2. **Get Gene Lengths**: Obtain the length of each gene and express it in kilobases. These lengths are required for the normalization.
3. **Compute Reads per Kilobase (RPK)**: Divide each gene's read count by its length in kilobases. This corrects for longer genes accumulating more reads.
4. **Sum RPK Values**: For each sample, sum the RPK values of all genes. This per-sample total reflects sequencing depth and is the scaling factor used in the next step.
5. **Calculate TPM**: Divide each gene's RPK by the RPK total of its sample and multiply by 1,000,000. The TPM values of every sample therefore sum to one million, which makes relative gene abundance comparable across samples.
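A minimal base-R sketch of steps 3 to 5, assuming a raw count matrix with genes in rows and samples in columns, plus a vector of gene lengths in base pairs in the same row order. `tpm_from_counts` is a hypothetical name: it illustrates the arithmetic and is not necessarily how `tpm_normalization()` used below is implemented (in particular, how that helper obtains gene lengths is not shown here).

```r
# Hypothetical helper (not from plier_util.R): TPM from raw counts.
# counts:         genes x samples matrix of raw read counts
# gene_length_bp: gene lengths in base pairs, same order as the rows of counts
tpm_from_counts <- function(counts, gene_length_bp) {
  counts <- as.matrix(counts)
  stopifnot(nrow(counts) == length(gene_length_bp), all(gene_length_bp > 0))
  # Step 3: reads per kilobase (each row divided by its gene length in kb)
  rpk <- counts / (gene_length_bp / 1000)
  # Step 4: per-sample total of the RPK values
  rpk_totals <- colSums(rpk)
  # Step 5: scale each sample (column) so that its values sum to one million
  sweep(rpk, 2, rpk_totals, "/") * 1e6
}

# Toy example: 3 genes, 2 samples
counts <- matrix(c(10, 20, 30,
                   5,  5, 40),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
tpm <- tpm_from_counts(counts, gene_length_bp = c(1000, 2000, 500))
colSums(tpm)  # both samples sum to one million
```

In sample `s1`, the short gene `g3` has the most reads per kilobase (60 of 80), so it receives 750,000 TPM even though its raw count is only 1.5 times that of `g2`. A sample with zero total counts would produce `NaN` in this sketch.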


# GSE151282 RNA-seq analysis

- The transcriptome profile of human trisomy 21 blood cells
- Antonaros, F., Zenatelli, R., Guerri, G. et al. The transcriptome profile of human trisomy 21 blood cells. Hum Genomics 15, 25 (2021). https://doi.org/10.1186/s40246-021-00325-4
- Human blood cell RNA-seq
- 4 T21 samples (named `A*_T21`)
- 4 D21 controls (named `B*_N`)

In [76]:
gene_counts_GSE151282 <- read.table(counts_matrix, header = TRUE, sep = "\t", check.names = FALSE)
head(gene_counts_GSE151282)

| | Geneid | GeneSymbol | A2_T21 | A1_T21 | B1_N | B2_N | A3_T21 | A4_T21 | B4_N | B3_N |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ENSG00000223972 | DDX11L1 | 0 | 1 | 3 | 0 | 0 | 1 | 3 | 1 |
| 2 | ENSG00000227232 | WASH7P | 12 | 56 | 21 | 6 | 1 | 1 | 11 | 3 |
| 3 | ENSG00000278267 | MIR6859-1 | 0 | 2 | 3 | 1 | 0 | 0 | 2 | 1 |
| 4 | ENSG00000243485 | MIR1302-2HG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | ENSG00000284332 | MIR1302-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | ENSG00000237613 | FAM138A | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |


In [77]:
tpm_gene_counts_GSE151282=tpm_normalization(gene_counts_GSE151282)
head(tpm_gene_counts_GSE151282)

[1m[22mJoining with `by = join_by(Geneid)`


| | A2_T21 | A1_T21 | B1_N | B2_N | A3_T21 | A4_T21 | B4_N | B3_N |
|---|---|---|---|---|---|---|---|---|
| DDX11L1 | 0.0 | 0.1562536 | 0.4171173 | 0.0 | 0.0 | 0.08917092 | 0.2520165 | 0.09060845 |
| WASH7P | 0.3952496 | 1.4254501 | 0.4756529 | 0.1417166 | 0.0147832 | 0.01452637 | 0.1505339 | 0.04428165 |
| MIR6859-1 | 0.0 | 7.7427165 | 10.3345477 | 3.5922695 | 0.0 | 0.0 | 4.1626608 | 2.24492569 |
| MIR1302-2HG | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| MIR1302-2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| FAM138A | 0.0 | 0.0 | 0.0 | 0.1576176 | 0.0 | 0.0 | 0.0 | 0.0 |


# MultiPLIER projection

In [176]:
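# Model pieces needed for projection: loadings Z (genes x LVs), the L2 penalty,
# and the training LV matrix B (LVs x samples)
multiplier_model = list('Z' = multiplier_z, 'L2' = multiplier_metadata$L2, 'B' = multiplier_b)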

In [184]:
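# Match genes between the TPM matrix and the model Z, row-normalize the expression
# (z-score each gene across samples), and put both in the same gene order
result_GetOrderedRowNormEM <- GetOrderedRowNormEM(tpm_gene_counts_GSE151282, multiplier_model)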
ordered_tpm_gene_counts_GSE151282 = result_GetOrderedRowNormEM$exprs.norm.filtered
ordered_multiplier_model = result_GetOrderedRowNormEM$plier.model

head(ordered_tpm_gene_counts_GSE151282)
head(ordered_multiplier_model$Z)

| | A2_T21 | A1_T21 | B1_N | B2_N | A3_T21 | A4_T21 | B4_N | B3_N |
|---|---|---|---|---|---|---|---|---|
| NOC2L | 0.6082366 | 1.6261065 | 0.8168904987 | 0.4634723 | -0.96307643 | -0.8575348 | -0.7929693 | -0.9011253 |
| HES4 | 0.6841713 | 2.1204888 | -0.8568928828 | -0.8568929 | -0.51105783 | -0.517066 | 0.103532 | -0.1662825 |
| ISG15 | 2.408746 | -0.1315984 | -0.514090565 | -0.1248644 | -0.01391688 | -0.6118065 | -0.4234802 | -0.588989 |
| AGRN | 1.1354949 | 1.855902 | 0.0738083565 | -0.1157431 | -0.51973662 | -0.781465 | -0.759943 | -0.8883175 |
| TNFRSF18 | 0.5922369 | 2.2501706 | -0.2445089295 | -0.6279588 | -0.83225993 | -0.3296646 | -0.3587785 | -0.4492367 |
| TNFRSF4 | 0.6826815 | 1.7832696 | 0.0001312308 | 0.8123221 | -0.83873942 | -0.7476056 | -0.8530254 | -0.839034 |


| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ⋯ | 978 | 979 | 980 | 981 | 982 | 983 | 984 | 985 | 986 | 987 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NOC2L | 2.23797023 | 0.0 | 0.112685524 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.42079364 | 0.0 | ⋯ | 0.0 | 0.08177377 | 0.07962461 | 0.0 | 0 | 0.0 | 0.43284712 | 0.0 | 0.0 | 0.0 |
| HES4 | 0.26999934 | 0.08490505 | 0.0 | 0.05744514 | 0.013894115 | 0.7679087 | 0 | 0 | 1.70535493 | 0.0 | ⋯ | 0.0 | 0.0 | 0.0526065 | 0.014124645 | 0 | 0.0 | 0.57737279 | 0.08543191 | 0.0 | 0.0 |
| ISG15 | 0.03686669 | 0.0 | 0.0 | 0.04568015 | 0.004998977 | 0.0 | 0 | 0 | 0.23015827 | 0.1165996 | ⋯ | 0.0 | 0.0 | 0.0 | 0.018421383 | 0 | 0.003058988 | 0.0 | 0.0 | 0.0 | 0.0 |
| AGRN | 0.43266367 | 0.0 | 0.010172089 | 0.0 | 0.025733268 | 0.0 | 0 | 0 | 0.09382693 | 0.0930792 | ⋯ | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.38517356 | 0.06212099 | 0.0 | 0.1202636 |
| TNFRSF18 | 0.01226595 | 0.0 | 0.003937571 | 0.0 | 0.01739017 | 0.08748055 | 0 | 0 | 0.07813778 | 0.0 | ⋯ | 0.02974547 | 0.02527198 | 0.0 | 0.008847392 | 0 | 0.008247321 | 0.03988636 | 0.0 | 0.0 | 0.0 |
| TNFRSF4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.41229068 | 0.0 | ⋯ | 0.0 | 0.09573129 | 0.0 | 0.025296539 | 0 | 0.0 | 0.0 | 0.0 | 0.01924974 | 0.0 |
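Judging from the output above, the expression matrix returned by `GetOrderedRowNormEM()` is restricted to genes present in the model and z-scored per gene (each row has a mean of about 0 across the eight samples), and `Z` is put in the same gene order. A minimal sketch of that idea under those assumptions; `align_and_row_normalize` is a hypothetical name, and the helper in `plier_util.R` may differ in details such as how it treats model genes missing from the data.

```r
# Hypothetical sketch: align an expression matrix to the model's genes and
# z-score every gene (row) across samples.
# expr: genes x samples matrix with gene symbols as row names
# Z:    genes x LVs loading matrix with gene symbols as row names
align_and_row_normalize <- function(expr, Z) {
  shared <- intersect(rownames(Z), rownames(expr))  # keeps Z's gene order
  expr <- as.matrix(expr[shared, , drop = FALSE])
  gene_mean <- rowMeans(expr)
  gene_sd <- apply(expr, 1, sd)
  # (x - mean) / sd, recycled row-wise; genes with sd == 0 would become NaN
  expr_norm <- (expr - gene_mean) / gene_sd
  list(expr = expr_norm, Z = Z[shared, , drop = FALSE])
}

expr <- matrix(c(1, 2, 3,
                 4, 6, 8,
                 10, 10, 10),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("a", "b", "c"), c("s1", "s2", "s3")))
Z <- matrix(c(0.5, 0, 0, 1.2), nrow = 2,
            dimnames = list(c("b", "a"), c("LV1", "LV2")))
res <- align_and_row_normalize(expr, Z)
res$expr  # rows "b" then "a"; gene "c" is dropped because it is not in Z
```

Keeping the gene order of `Z` matters because the projection step multiplies the two matrices row by row, so a mismatched order would silently pair the wrong genes.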


In [188]:
# Project the samples onto the model's LVs (result: LVs x samples)
projection_GSE151282 <- GetNewDataB(ordered_tpm_gene_counts_GSE151282, ordered_multiplier_model)
head(projection_GSE151282)

| | A2_T21 | A1_T21 | B1_N | B2_N | A3_T21 | A4_T21 | B4_N | B3_N |
|---|---|---|---|---|---|---|---|---|
| 1,REACTOME_MRNA_SPLICING | -0.025046111 | 0.06213857 | 0.01185781 | 0.18622678 | -0.02140594 | -0.07213162 | -0.055081798 | -0.086557688 |
| 2,SVM Monocytes | 0.03638389 | 0.18310789 | -0.03228209 | -0.079091497 | -0.02368477 | -0.019642693 | -0.034447336 | -0.0303434 |
| 3,REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES | 0.013326022 | 0.08472124 | 0.037300109 | 0.010356093 | -0.03901721 | -0.04336303 | -0.020010945 | -0.043312279 |
| 4,REACTOME_NEURONAL_SYSTEM | 0.001694782 | 0.03182559 | 0.041157425 | -0.031931286 | -0.03206873 | 0.017130031 | -0.008051524 | -0.019756288 |
| LV 5 | -0.016508079 | 0.04292706 | 0.074817856 | -0.021723231 | -0.04699848 | -0.01277037 | 0.005936056 | -0.025680815 |
| LV 6 | 0.003098557 | 0.01915066 | 0.002973902 | -0.008038568 | -0.01661441 | 0.007721955 | -0.006224847 | -0.002067249 |
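Our understanding is that `GetNewDataB()` from the MultiPLIER utilities estimates LV values for new samples by penalized least squares with the loadings held fixed, i.e. `B = solve(t(Z) %*% Z + L2 * I) %*% t(Z) %*% Y`, where `Y` is the row-normalized genes x samples matrix, `Z` the aligned genes x LVs loadings, and `L2` the model's penalty. A minimal base-R sketch under that assumption; `project_onto_lvs` is a hypothetical name, so check `scripts/plier_util.R` for the exact code.

```r
# Hypothetical sketch of projecting new samples onto fixed PLIER loadings.
# Y:  genes x samples (row-normalized), Z: genes x LVs, same gene order
# L2: ridge penalty taken from the trained model
project_onto_lvs <- function(Y, Z, L2) {
  stopifnot(identical(rownames(Y), rownames(Z)))
  k <- ncol(Z)
  # Solve (Z'Z + L2 * I) B = Z'Y for B rather than forming the inverse explicitly
  solve(crossprod(Z) + L2 * diag(k), crossprod(Z, Y))  # LVs x samples
}

# Toy check: with orthonormal loadings the penalty simply shrinks Y
Z <- diag(2)
rownames(Z) <- c("g1", "g2")
Y <- matrix(c(2, 4, 6, 8), nrow = 2,
            dimnames = list(c("g1", "g2"), c("s1", "s2")))
B_no_penalty <- project_onto_lvs(Y, Z, L2 = 0)  # equal to Y
B_penalized  <- project_onto_lvs(Y, Z, L2 = 1)  # equal to Y / 2
```

With the notebook's objects this would correspond to something like `project_onto_lvs(as.matrix(ordered_tpm_gene_counts_GSE151282), as.matrix(ordered_multiplier_model$Z), ordered_multiplier_model$L2)`, followed by copying `rownames(ordered_multiplier_model$B)` onto the result so that rows carry the LV names shown above.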


In [210]:
# Matrix -> data frame with the LV names as the first column
df_projection_GSE151282 <- projection_GSE151282 %>%
  as.data.frame() %>%
  tibble::rownames_to_column(var = "LV")

head(df_projection_GSE151282)

| | LV | A2_T21 | A1_T21 | B1_N | B2_N | A3_T21 | A4_T21 | B4_N | B3_N |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1,REACTOME_MRNA_SPLICING | -0.025046111 | 0.06213857 | 0.01185781 | 0.18622678 | -0.02140594 | -0.07213162 | -0.055081798 | -0.086557688 |
| 2 | 2,SVM Monocytes | 0.03638389 | 0.18310789 | -0.03228209 | -0.079091497 | -0.02368477 | -0.019642693 | -0.034447336 | -0.0303434 |
| 3 | 3,REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES | 0.013326022 | 0.08472124 | 0.037300109 | 0.010356093 | -0.03901721 | -0.04336303 | -0.020010945 | -0.043312279 |
| 4 | 4,REACTOME_NEURONAL_SYSTEM | 0.001694782 | 0.03182559 | 0.041157425 | -0.031931286 | -0.03206873 | 0.017130031 | -0.008051524 | -0.019756288 |
| 5 | LV 5 | -0.016508079 | 0.04292706 | 0.074817856 | -0.021723231 | -0.04699848 | -0.01277037 | 0.005936056 | -0.025680815 |
| 6 | LV 6 | 0.003098557 | 0.01915066 | 0.002973902 | -0.008038568 | -0.01661441 | 0.007721955 | -0.006224847 | -0.002067249 |


In [220]:
# Reshape to long format (one row per LV and sample) and label the groups:
# T21 samples are named A*_T21, D21 controls are named B*_N
lv_data_long <- df_projection_GSE151282 %>% 
  pivot_longer(cols = -LV, names_to = "Sample", values_to = "Value") %>% 
  mutate(Group = ifelse(str_detect(Sample, "A[0-9]_T21"), "T21", "N"))

# Welch two-sample t-test (T21 vs N) for each LV
t_test_results <- lv_data_long %>%
  group_by(LV) %>%
  summarise(p_value = t.test(Value ~ Group)$p.value)

# Benjamini-Hochberg FDR correction across all LVs
t_test_results <- t_test_results %>%
  mutate(p_adjusted = p.adjust(p_value, method = "fdr"))

t_test_results %>% 
    arrange(p_value) %>% 
    head()

| LV | p_value | p_adjusted |
|---|---|---|
| LV 481 | 0.01101154 | 0.9986664 |
| LV 219 | 0.02924647 | 0.9986664 |
| LV 737 | 0.03122931 | 0.9986664 |
| 26,SVM Macrophages M2 | 0.05164269 | 0.9986664 |
| 707,REACTOME_PEPTIDE_CHAIN_ELONGATION | 0.0617742 | 0.9986664 |
| LV 281 | 0.06467135 | 0.9986664 |
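The same per-LV comparison can be run directly on the LV x sample matrix without reshaping to long format. A minimal base-R sketch, assuming group labels derived from the sample-name suffixes (`_T21` vs `_N`) as in the cell above; `test_lvs` is a hypothetical name. Like the tidyverse version it uses R's default Welch t-test, and `method = "BH"` is equivalent to the `"fdr"` alias used above.

```r
# Hypothetical helper: Welch t-test per LV (row) plus BH-adjusted p-values.
# B:     LVs x samples matrix (row names are LV names)
# group: one label per column of B, with exactly two distinct values
test_lvs <- function(B, group) {
  group <- factor(group)
  stopifnot(ncol(B) == length(group), nlevels(group) == 2)
  p <- apply(B, 1, function(v) t.test(v ~ group)$p.value)
  out <- data.frame(LV = rownames(B), p_value = unname(p),
                    p_adjusted = p.adjust(unname(p), method = "BH"))
  out[order(out$p_value), ]  # smallest unadjusted p-value first
}

# Toy data: one LV separates the groups, the other does not
B <- rbind(lv_diff = c(1.0, 1.1, 0.9, 1.2, -1.0, -1.1, -0.9, -1.2),
           lv_same = c(0.1, -0.2, 0.3, -0.1, 0.2, -0.1, 0.1, -0.3))
colnames(B) <- c(paste0("A", 1:4, "_T21"), paste0("B", 1:4, "_N"))
grp <- ifelse(grepl("_T21$", colnames(B)), "T21", "N")
res <- test_lvs(B, grp)
res
```

In the notebook's results, every adjusted p-value is about 0.999, so the six LVs listed are only nominally significant; with four samples per group and close to a thousand LVs tested, the correction leaves little room for any single LV.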


In [223]:
# Z has one column per LV; name them LV1, LV2, ... by position
df_ordered_multiplier_modelZ <- as.data.frame(as.matrix(ordered_multiplier_model$Z)) %>%
  `colnames<-`(paste0("LV", seq_len(ncol(.))))
head(df_ordered_multiplier_modelZ)

| | LV1 | LV2 | LV3 | LV4 | LV5 | LV6 | LV7 | LV8 | LV9 | LV10 | ⋯ | LV978 | LV979 | LV980 | LV981 | LV982 | LV983 | LV984 | LV985 | LV986 | LV987 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NOC2L | 2.23797023 | 0.0 | 0.112685524 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.42079364 | 0.0 | ⋯ | 0.0 | 0.08177377 | 0.07962461 | 0.0 | 0 | 0.0 | 0.43284712 | 0.0 | 0.0 | 0.0 |
| HES4 | 0.26999934 | 0.08490505 | 0.0 | 0.05744514 | 0.013894115 | 0.7679087 | 0 | 0 | 1.70535493 | 0.0 | ⋯ | 0.0 | 0.0 | 0.0526065 | 0.014124645 | 0 | 0.0 | 0.57737279 | 0.08543191 | 0.0 | 0.0 |
| ISG15 | 0.03686669 | 0.0 | 0.0 | 0.04568015 | 0.004998977 | 0.0 | 0 | 0 | 0.23015827 | 0.1165996 | ⋯ | 0.0 | 0.0 | 0.0 | 0.018421383 | 0 | 0.003058988 | 0.0 | 0.0 | 0.0 | 0.0 |
| AGRN | 0.43266367 | 0.0 | 0.010172089 | 0.0 | 0.025733268 | 0.0 | 0 | 0 | 0.09382693 | 0.0930792 | ⋯ | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.38517356 | 0.06212099 | 0.0 | 0.1202636 |
| TNFRSF18 | 0.01226595 | 0.0 | 0.003937571 | 0.0 | 0.01739017 | 0.08748055 | 0 | 0 | 0.07813778 | 0.0 | ⋯ | 0.02974547 | 0.02527198 | 0.0 | 0.008847392 | 0 | 0.008247321 | 0.03988636 | 0.0 | 0.0 | 0.0 |
| TNFRSF4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0.41229068 | 0.0 | ⋯ | 0.0 | 0.09573129 | 0.0 | 0.025296539 | 0 | 0.0 | 0.0 | 0.0 | 0.01924974 | 0.0 |
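LV names in `B` (and hence in the t-test results) come in two forms, `LV k` and `k,PATHWAY`, and the number k is also the position of the `LVk` column in the renamed `Z` and the value in the `LV index` column of `multiplier_summary`. A minimal sketch for extracting k from those names, for example to build `int_lvs` from `t_test_results` instead of typing the indices by hand; `lv_index_from_name` is a hypothetical helper that assumes every name starts with one of those two patterns.

```r
# Hypothetical helper: pull the integer LV index out of names such as
# "LV 481" or "26,SVM Macrophages M2". Names matching neither form give NA.
lv_index_from_name <- function(lv_names) {
  idx <- sub("^(?:LV )?([0-9]+)(?:,.*)?$", "\\1", lv_names, perl = TRUE)
  # Non-matching names are left unchanged by sub(), so as.integer() yields NA
  suppressWarnings(as.integer(idx))
}

lv_index_from_name(c("LV 481", "26,SVM Macrophages M2",
                     "707,REACTOME_PEPTIDE_CHAIN_ELONGATION"))
```

With the notebook's objects, `t_test_results %>% arrange(p_value) %>% head() %>% pull(LV) %>% lv_index_from_name()` would give the six indices of the most nominally significant LVs.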


In [227]:
df_ordered_multiplier_modelZ[c("LV26")] %>% 
    dplyr::arrange(desc(LV26)) %>% 
    head(10)

| | LV26 |
|---|---|
| CCL18 | 6.342055 |
| CCL23 | 6.144151 |
| CD209 | 5.832267 |
| CLEC10A | 5.338751 |
| ALOX15 | 5.02282 |
| CLEC4A | 4.229976 |
| DNASE1L3 | 4.100035 |
| C1QC | 3.023242 |
| CD1A | 2.797564 |
| PDCD1LG2 | 2.501536 |
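To repeat this inspection for other LVs, a small helper that returns the highest-loading genes for any LV index can help. A minimal base-R sketch, assuming `Z` is a dense genes x LVs matrix with gene symbols as row names (a sparse matrix would first go through `as.matrix()`, as in the cell above); `top_genes_for_lv` is a hypothetical name.

```r
# Hypothetical helper: names of the n genes with the largest loadings on one LV.
# Z:  genes x LVs matrix with gene symbols as row names
# lv: LV index (column position in Z)
top_genes_for_lv <- function(Z, lv, n = 10) {
  loadings <- Z[, lv]                            # named numeric vector
  ranked <- sort(loadings, decreasing = TRUE)    # sort() keeps the names
  names(ranked)[seq_len(min(n, length(ranked)))]
}

Z <- matrix(c(0.1, 2.0, 0.5,
              0.0, 3.0, 1.0),
            nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), NULL))
top_genes_for_lv(Z, lv = 1, n = 2)
```

With the notebook's objects, `top_genes_for_lv(as.matrix(ordered_multiplier_model$Z), 26)` should list the same genes as the table above.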


In [228]:
head(ordered_multiplier_model$B)

| | SRP000599.SRR013549 | SRP000599.SRR013550 | SRP000599.SRR013551 | SRP000599.SRR013552 | SRP000599.SRR013553 | SRP000599.SRR013554 | SRP000599.SRR013555 | SRP000599.SRR013556 | SRP000599.SRR013557 | SRP000599.SRR013558 | ⋯ | SRP035599.SRR1139372 | SRP035599.SRR1139393 | SRP035599.SRR1139388 | SRP035599.SRR1139378 | SRP035599.SRR1139399 | SRP035599.SRR1139386 | SRP035599.SRR1139375 | SRP035599.SRR1139382 | SRP035599.SRR1139356 | SRP035599.SRR1139370 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1,REACTOME_MRNA_SPLICING | -0.059296689 | -0.047909034 | -0.049366085 | -0.065078034 | -0.036394186 | -0.046432986 | -0.0409805 | -0.040068202 | -0.046137392 | -0.048547681 | ⋯ | 0.02821953 | 0.035137107 | 0.06507733 | 0.07814365 | 0.092361864 | 0.069042346 | 0.090913845 | 0.096341467 | 0.13111465 | 0.171751422 |
| 2,SVM Monocytes | 0.006212678 | 0.003625471 | 0.006604582 | 0.009258006 | 0.005061427 | 0.004132735 | 0.008950264 | 0.007226716 | 0.007240987 | 0.005709697 | ⋯ | -0.050455152 | -0.03450197 | -0.03364029 | -0.049702173 | -0.037425739 | -0.050069528 | -0.022575052 | -0.055091302 | -0.05686929 | -0.01807257 |
| 3,REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES | -0.026105335 | -0.03223206 | -0.020621382 | -0.027598555 | -0.035248076 | -0.038700769 | -0.032527087 | -0.030592727 | -0.028937277 | -0.02740566 | ⋯ | -0.028609689 | -0.033449754 | -0.030583001 | -0.032399106 | -0.029365381 | -0.025405876 | -0.033657228 | -0.03131768 | -0.03092424 | -0.027868614 |
| 4,REACTOME_NEURONAL_SYSTEM | -0.022079745 | -0.00897091 | -0.020341711 | -0.016260213 | -0.003022898 | 0.002442659 | -0.020457842 | -0.023735309 | -0.021581483 | -0.022477572 | ⋯ | -0.037122216 | -0.029658154 | -0.036349546 | -0.039253549 | -0.035204624 | -0.036345061 | -0.03451388 | -0.035925708 | -0.04035837 | -0.031131153 |
| LV 5 | 0.007663157 | 0.007036176 | 0.006608393 | 0.003446311 | 0.006340665 | 0.007106127 | 0.007930485 | 0.009164026 | 0.008023601 | 0.007937586 | ⋯ | -0.003055909 | -0.004783739 | -0.004352417 | -0.004159541 | -0.001084991 | -0.001884109 | -0.003561052 | -0.003546184 | -0.01210732 | -0.001192709 |
| LV 6 | 0.003014322 | 0.002005692 | 0.007768482 | -0.004943417 | 0.019649274 | 0.003509484 | 0.008170202 | 0.00943419 | 0.007881926 | 0.007861974 | ⋯ | -0.042169397 | -0.051798302 | -0.045170871 | -0.048687711 | -0.038200131 | -0.046747069 | -0.042864145 | -0.032012738 | -0.02439828 | -0.062237591 |


In [235]:
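# Six LVs with the smallest unadjusted p-values in the t-test above
# (none is significant after FDR correction)
int_lvs = c(481, 219, 737, 26, 707, 281)

# Keep pathway associations for these LVs with FDR < 0.05 and AUC > 0.7;
# `LV index` is stored as character, so compare against character values
multiplier_summary %>% 
  dplyr::filter(`LV index` %in% as.character(int_lvs)) %>%
  dplyr::filter(FDR < 0.05 & AUC > 0.7)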

| pathway | LV index | AUC | p-value | FDR |
|---|---|---|---|---|
| IRIS_DendriticCell-Control | 26 | 0.7741243 | 8.082947e-14 | 6.226756e-12 |
| SVM Macrophages M2 | 26 | 0.7896932 | 0.0005503019 | 0.00460078 |
| SVM Dendritic cells resting | 26 | 0.9803279 | 3.743382e-09 | 1.062431e-07 |
| REACTOME_PEPTIDE_CHAIN_ELONGATION | 707 | 0.9997656 | 1.032655e-15 | 1.012472e-13 |
| KEGG_RIBOSOME | 707 | 0.9566363 | 2.047765e-13 | 1.472343e-11 |
| MIPS_40S_RIBOSOMAL_SUBUNIT_CYTOPLASMIC | 707 | 0.9977679 | 6.831719e-07 | 1.351928e-05 |
| MIPS_RIBOSOME_CYTOPLASMIC | 707 | 0.9995303 | 7.948872e-15 | 7.144049e-13 |
| REACTOME_ACTIVATION_OF_THE_MRNA_UPON_BINDING_OF_THE_CAP_BINDING_COMPLEX_AND_EIFS_AND_SUBSEQUENT_BINDING_TO_43S | 707 | 0.8645801 | 8.299737e-07 | 1.612841e-05 |
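In the output above, only LVs 26 and 707 have pathway associations that pass FDR < 0.05 and AUC > 0.7; the other four LVs of interest have none. A minimal base-R sketch that makes this split explicit, assuming a summary table with the `LV index`, `AUC` and `FDR` columns shown above; `split_by_annotation` is a hypothetical name.

```r
# Hypothetical helper: split LV indices into those with at least one pathway
# passing the FDR/AUC thresholds and those with none.
split_by_annotation <- function(summary_df, lvs, fdr_max = 0.05, auc_min = 0.7) {
  passing <- summary_df$FDR < fdr_max & summary_df$AUC > auc_min
  hit_lvs <- unique(as.character(summary_df[["LV index"]][passing]))
  lvs <- as.character(lvs)
  list(annotated = intersect(lvs, hit_lvs),
       unannotated = setdiff(lvs, hit_lvs))
}

# Toy summary table (check.names = FALSE keeps the space in "LV index")
toy_summary <- data.frame(pathway = c("P1", "P2", "P3"),
                          `LV index` = c("26", "707", "481"),
                          AUC = c(0.90, 0.80, 0.60),
                          FDR = c(0.01, 0.02, 0.01),
                          check.names = FALSE)
split_by_annotation(toy_summary, c(481, 26, 707, 219))
```

Applied as `split_by_annotation(multiplier_summary, int_lvs)`, the `unannotated` element would collect the LVs whose biology has to be inferred from their top-loading genes rather than from a matching pathway.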
