Learning objective: Understand how the EM algorithm can be used to estimate transcript expression and how it can be influenced by missing transcript annotations.
You are given the following three transcripts with corresponding exon-level read counts. You can assume that all exons have an equal length of 100 nucleotides. With the help of the EM algorithm tutorial, answer the following questions:
- Write down the transcript compatibility matrix `M` and the corresponding read count vector `k`. See the example in the slides for reference.
- Using the EM algorithm implemented in the tutorial, estimate the expression of all three transcripts. In your report, provide the final estimates after 1000 iterations as well as a visualisation of how these estimates evolved.
- Now, remove Transcript 2 from the annotations, construct a new transcript compatibility matrix `M` and count vector `k`, and re-estimate the expression of the remaining two transcripts. How did the expression estimates change compared to using the original 'correct' annotations?
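To make the role of `M` and `k` concrete, here is a minimal Python sketch of an EM iteration for exon-level counts. It is not the tutorial's implementation: the function name `em_transcript_abundance`, the layout of `M` (rows are exons, columns are transcripts) and the toy data are assumptions made for illustration, and the toy transcripts are deliberately different from the ones in this exercise.

```python
def em_transcript_abundance(M, k, n_iter=1000):
    """Estimate relative transcript abundances from exon-level read counts.

    M: list of rows, M[e][t] = 1 if exon e is part of transcript t, else 0
       (assumed layout; the tutorial may orient the matrix differently).
    k: list of read counts, one per exon.
    All exons are assumed to have equal length, so a transcript's length
    is simply its number of exons.
    """
    n_exons = len(M)
    n_tx = len(M[0])
    lengths = [sum(M[e][t] for e in range(n_exons)) for t in range(n_tx)]

    rho = [1.0 / n_tx] * n_tx      # start from equal abundances
    history = [rho[:]]             # keep every iterate for plotting

    for _ in range(n_iter):
        # E-step: split each exon's reads among compatible transcripts
        # in proportion to the current abundance estimates.
        expected = [0.0] * n_tx
        for e in range(n_exons):
            weights = [M[e][t] * rho[t] for t in range(n_tx)]
            total = sum(weights)
            if total == 0:
                continue           # no compatible transcript: reads are ignored
            for t in range(n_tx):
                expected[t] += k[e] * weights[t] / total

        # M-step: abundance is proportional to assigned reads per unit length.
        per_length = [expected[t] / lengths[t] for t in range(n_tx)]
        norm = sum(per_length)
        rho = [x / norm for x in per_length]
        history.append(rho[:])

    return rho, history


# Hypothetical toy data: transcript 0 has exons 1-3, transcript 1 skips exon 2.
M = [[1, 1],
     [1, 0],
     [1, 1]]
k = [30, 10, 30]

rho, history = em_transcript_abundance(M, k)
print([round(x, 3) for x in rho])  # [0.333, 0.667]
```

In this sketch, removing a transcript from the annotation corresponds to deleting its column from `M`. Reads on exons that are then compatible with no remaining transcript are simply skipped here, which may differ from how the tutorial code treats them.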
Learning objective: Learn how differentially used transcripts can be detected using DRIMSeq and how these changes can be visualized using IGV.
Dataset: SummarizedExperiment object (alternative link) containing transcript expression levels estimated using Salmon. BigWig files for visualisation.
Software: DRIMSeq R package, IGV
Following the DRIMSeq tutorial and the example code provided here, perform differential transcript usage (DTU) analysis comparing the naive and Salmonella conditions. To limit the computational time required, only include genes from chromosome 6 in your analysis. Answer the following questions.
- How many genes undergo differential transcript usage (DTU) between these two conditions (FDR < 0.01)? What fraction of all tested genes is this?
- What are the three genes with the smallest DTU p-values? Report both the Ensembl gene IDs and their friendly names.
- Using the `plotProportions` function, visualise the transcript proportions before and after Salmonella infection for each of the top 3 genes. What do you see? Report the names of the transcripts whose proportions changed the most. Is it only one transcript that changes, or do many transcripts change simultaneously?
- Repeat the same analysis for the naive vs IFNg conditions and answer the same questions. Are the top 3 genes with the smallest DTU p-values the same or different?
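As a reminder of what a transcript proportion is, the following Python sketch computes proportions within one gene for two conditions and picks the transcript with the largest change. It only illustrates the arithmetic; it does not reproduce DRIMSeq's statistical model, and the transcript names, counts and function names are hypothetical.

```python
def transcript_proportions(counts):
    """Convert per-transcript counts for one gene into proportions."""
    total = sum(counts.values())
    return {tx: c / total for tx, c in counts.items()}


def largest_change(before, after):
    """Return the transcript whose proportion changed most, plus all deltas."""
    p_before = transcript_proportions(before)
    p_after = transcript_proportions(after)
    deltas = {tx: p_after[tx] - p_before[tx] for tx in p_before}
    top = max(deltas, key=lambda tx: abs(deltas[tx]))
    return top, deltas


# Hypothetical counts for a single gene with three transcripts.
naive = {"tx_A": 80, "tx_B": 15, "tx_C": 5}
salmonella = {"tx_A": 30, "tx_B": 60, "tx_C": 10}

top, deltas = largest_change(naive, salmonella)
print(top, round(deltas[top], 2))  # tx_A -0.5
```

Because the proportions within a gene sum to one, a decrease in one transcript must be balanced by increases in others, so at least two transcripts always change together.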
Learning objective: Understand how differential transcript usage manifests at the level of RNA-seq read coverage and how these changes can be detected by visual inspection.
- First, download the BigWig files containing the RNA-seq read coverage from four of the samples found in the original dataset. The `aipt_A` and `auim_A` samples are from the naive condition and the `aipt_C` and `auim_C` samples are from the Salmonella condition.
- Open these four BigWig files in IGV and make sure that the reference genome version is set to GRCh38.
- Use the search box in IGV to locate the top 3 genes with the smallest differential transcript usage p-values from the naive vs Salmonella comparison that you identified in Task 2.
- Take screenshots of the RNA-seq read coverage for these three genes and include them in your report. Also, highlight (e.g. with a red rectangle) the specific exons or parts of transcripts whose usage changes between the naive and Salmonella conditions.