Understanding the concept behind the metrics used for FRASER calculations #9

Open
jjyotikataria opened this issue Jun 29, 2020 · 6 comments

Comments

@jjyotikataria

@c-mertes @ischeller It would be great if you could explain more about the metrics psi3, psi5, theta3 and theta5 used in the FRASER calculations. It would be really helpful.

@jjyotikataria jjyotikataria changed the title Understanding the concept behing the metrics used behind FRASER calculations Understanding the concept behind the metrics used for FRASER calculations Jun 29, 2020
@ischeller
Collaborator

Hi @jjyotikataria,

Have you already had a look at the explanations of these metrics in our vignette and in our preprint on bioRxiv?

@jjyotikataria
Author

Yes, @c-mertes @ischeller. I went through the vignette and the paper, but I am not confident about what the splicing metrics psi5, psi3, theta3 and theta5 actually denote. Also, could you give me a basic explanation of the p values and z scores?

@jjyotikataria
Author

@c-mertes @ischeller I went through almost all the papers and references, but I still do not understand it completely. Could you please tell me what the p value and z score represent in the context of the genes of the samples we are working with? Also, what does a psi value in the range 0 to 1 represent in practice?

@ischeller
Collaborator

ischeller commented Jul 7, 2020

I am still not really sure which part you did not understand regarding the p values and z scores, so I will give a brief, basic explanation and hope that it helps:

Basically, psi5 quantifies acceptor site usage for a given donor site, and psi3 quantifies donor site usage for a given acceptor site. The interpretation of them is therefore on the intron-level, and not at the gene-level.
For a given intron with donor site D and acceptor site A, the metric psi5 indicates the ratio of the number of reads that map to this intron over the number of all reads that map to this intron or any other intron that also uses donor site D (see Figure 1 in our preprint for an illustration). For example, a psi5=0.8 for this intron means that 80% of split reads that contain the donor site D use the acceptor site A, while 20% of those split reads use another acceptor site. As the psi5 values are ratios, they can take values between 0 and 1 and the psi5 values of all introns that share a donor site sum up to 1.
Therefore, an aberrant psi5 value in one sample indicates aberrant acceptor site usage (and analogously aberrant donor site usage for aberrant psi3 values).
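The psi5 definition above can be sketched in a few lines of Python. This is only an illustration of the ratio, not FRASER's implementation (FRASER is an R package); the function name `psi5`, the `(donor, acceptor)` keys and the read counts are made up for the example.

```python
from collections import defaultdict

def psi5(counts):
    """Compute psi5 per intron.

    counts: dict mapping (donor, acceptor) -> number of split reads
            supporting that intron (hypothetical input format).
    """
    # Total split reads per donor site, summed over all acceptors it pairs with.
    donor_totals = defaultdict(int)
    for (donor, _acceptor), n in counts.items():
        donor_totals[donor] += n

    # psi5 = reads of this intron / reads of all introns sharing its donor.
    # Introns whose donor has no reads at all are skipped (ratio undefined).
    return {
        intron: n / donor_totals[intron[0]]
        for intron, n in counts.items()
        if donor_totals[intron[0]] > 0
    }

# Made-up counts: donor D is spliced to three different acceptors.
counts = {("D", "A"): 80, ("D", "A2"): 15, ("D", "A3"): 5}
values = psi5(counts)
print(values[("D", "A")])  # 0.8: 80% of split reads at D use acceptor A
```

Swapping the roles of donor and acceptor in the grouping step gives psi3 in the same way.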

In FRASER, we model the expected value of psi3/psi5/theta for every intron. We then compute p values based on the beta-binomial distribution: the observed read count of the intron is compared with its expected psi3/psi5/theta value to assess how statistically significant the observed count is. We also provide delta psi3/psi5/theta values, i.e. the difference between the observed and expected values. We don't recommend using the z scores to find aberrant events; we mainly include them for comparison with other methods. Instead, use the delta psi3/psi5/theta values (which are easy to interpret) together with the p value to find the aberrant splicing events that are of interest to you.
For convenience, we also provide aggregated p values per gene. A significant gene-level p value indicates that the gene contains at least one intron with an aberrant psi3/psi5/theta value. The psi3/psi5/theta values themselves, however, can only be interpreted at the intron level.
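To make the p value and delta psi more concrete, here is a minimal Python sketch of a beta-binomial test for a single intron. It is not FRASER's code: the expected value `mu` and the dispersion `rho` are simply given here (FRASER fits them from the data), the mean/correlation parametrization and the two-sided construction are assumptions chosen for illustration, and the cutoffs in `is_aberrant` are hypothetical.

```python
from math import exp, lgamma

def log_beta(a, b):
    # log of the Beta function, via log-gamma for numerical stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, n, alpha, beta):
    # P(X = k) for X ~ BetaBinomial(n, alpha, beta)
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

def two_sided_p(k, n, mu, rho):
    """Two-sided p value for observing k of n reads on this intron.

    mu:  expected psi value (assumed given)
    rho: dispersion in (0, 1) (assumed given)
    """
    # Assumed parametrization: mean mu, intra-class correlation rho.
    alpha = mu * (1 - rho) / rho
    beta = (1 - mu) * (1 - rho) / rho
    lower = sum(betabinom_pmf(i, n, alpha, beta) for i in range(0, k + 1))
    upper = sum(betabinom_pmf(i, n, alpha, beta) for i in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))

def is_aberrant(p, delta_psi, p_cutoff=0.05, delta_cutoff=0.3):
    # Hypothetical cutoffs; pick values that suit your own analysis.
    return p <= p_cutoff and abs(delta_psi) >= delta_cutoff

# Made-up example: 20 of 100 reads at the donor use this intron,
# although about 80% were expected.
k, n = 20, 100
mu, rho = 0.8, 0.05
delta_psi = k / n - mu          # observed psi minus expected psi
p = two_sided_p(k, n, mu, rho)
print(delta_psi, p, is_aberrant(p, delta_psi))
```

In a real analysis the p values would also be corrected for multiple testing across introns before any cutoff is applied; that step is left out of this sketch.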

I hope this helps to answer your questions.

@jjyotikataria
Author

@ischeller Very thoroughly explained, thank you. I now clearly understand what the psi5 and psi3 ratios indicate. I have another question. You said, "Therefore, an aberrant psi5 value in one sample indicates aberrant acceptor site usage (and analogously aberrant donor site usage for aberrant psi3 values)." How does one determine that a particular psi value, be it psi5 or psi3, is aberrant and tag it as such?

Also, the observed (experimental) psi values are the ones we get from the real data by providing the required BAM files. What exactly are the expected psi values, and what are the criteria behind this expectation?

Knowing this would make clear what delta psi is, i.e. the difference between the observed and the expected values.

@drewjbeh
Collaborator

@jjyotikataria Sorry for the lack of response. Are these questions still unanswered, or do you still need help here? Thanks!
