efficient way to sort through GOterms? #660

Closed
yaaminiv opened this Issue Jul 18, 2017 · 8 comments

@yaaminiv
Contributor

yaaminiv commented Jul 18, 2017

I have a list of proteins, p-values and GOterms. The proteins are the ones I used for my SRM targets, and the p-values relate to the level of differential expression between site and eelgrass conditions. I want to make a REVIGO plot for the biological processes these proteins are involved in.

Each protein has many GOterms associated with it. Is there an easy way to sort through these GOterms and isolate those related to biological processes? I tried just inputting the first GOterm listed for each protein in REVIGO and found that only two were related to biological processes and the rest were for molecular function.

[Image: REVIGO output, biological processes]

[Image: REVIGO output, molecular function]

My method for getting GOterms and making these plots can be found at the bottom of this notebook. Thanks!
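One possible way to isolate the biological-process terms is to look up each GO ID's namespace in a Gene Ontology OBO file (for example `go-basic.obo`) and keep only the IDs whose namespace is `biological_process`. The sketch below is not the method from the notebook: it assumes each protein's GO terms sit in one semicolon-separated string, the protein names are hypothetical, and a tiny inline OBO excerpt stands in for the real file.

```python
def parse_obo_namespaces(lines):
    """Map each GO ID to its namespace using OBO-formatted [Term] stanzas."""
    namespaces = {}
    current_id = None
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe GO terms.
            in_term = (line == "[Term]")
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("namespace: ") and current_id:
            namespaces[current_id] = line[len("namespace: "):]
    return namespaces


def biological_process_terms(go_field, namespaces):
    """Keep only the GO IDs in a semicolon-separated field that are biological processes."""
    terms = [t.strip() for t in go_field.split(";") if t.strip()]
    return [t for t in terms if namespaces.get(t) == "biological_process"]


# Small excerpt in OBO format; in practice, read the lines of go-basic.obo instead.
OBO_SNIPPET = """\
[Term]
id: GO:0006950
name: response to stress
namespace: biological_process

[Term]
id: GO:0005524
name: ATP binding
namespace: molecular_function
""".splitlines()

namespaces = parse_obo_namespaces(OBO_SNIPPET)

# Hypothetical proteins and GO annotations, for illustration only.
protein_go = {
    "protein_A": "GO:0005524; GO:0006950",
    "protein_B": "GO:0005524",
}

bp_by_protein = {
    protein: biological_process_terms(go_field, namespaces)
    for protein, go_field in protein_go.items()
}
print(bp_by_protein)
```

With the real file, `parse_obo_namespaces(open("go-basic.obo"))` would build the full lookup. This simple parser ignores `alt_id` lines, so GO IDs that are not a term's primary ID would be dropped and would need separate handling.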

@yaaminiv yaaminiv added the question label Jul 18, 2017

@sr320

Owner

sr320 commented Jul 18, 2017

Note your p-values are not associated with GO terms but rather proteins...

To best address this: what is the overall goal?

Maybe to show which GO terms are represented in your target protein list?

If so, I would have one column with the GO # and a second column with the number of occurrences in your list.
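A minimal sketch of that two-column table, assuming each protein's GO terms are stored as one semicolon-separated string (the protein names and annotations here are hypothetical). Each GO # is counted at most once per protein, so the second column is the number of proteins in the list that carry that term.

```python
from collections import Counter


def count_go_occurrences(protein_go):
    """Count how many proteins in the list are annotated with each GO ID."""
    counts = Counter()
    for go_field in protein_go.values():
        # Use a set so a GO ID repeated within one protein is counted once.
        terms = {t.strip() for t in go_field.split(";") if t.strip()}
        counts.update(terms)
    return counts


# Hypothetical proteins and GO annotations, for illustration only.
protein_go = {
    "protein_A": "GO:0005524; GO:0006950",
    "protein_B": "GO:0005524",
    "protein_C": "GO:0006950; GO:0005524; GO:0005524",
}

go_counts = count_go_occurrences(protein_go)

# Print tab-separated GO # and count, most frequent first.
for go_id, n in sorted(go_counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{go_id}\t{n}")
```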

@yaaminiv

Contributor

yaaminiv commented Jul 18, 2017

My overall goal is to have some form of REVIGO visualization that relates the biological processes of the proteins I used for my SRM assay to their differential expression. Having one column with GOterms and one column with the number of occurrences could also be a good visualization, but maybe not in REVIGO, as it splits GOterms between biological processes and molecular functions. Ideally, I just want one visualization with all of the information.

@sr320

Owner

sr320 commented Jul 18, 2017

So there will not be a single differential expression value for each GO term? Then REVIGO is probably not the best approach.

The closest thing would be to have an average differential expression value per GO term.

@sr320

Owner

sr320 commented Jul 18, 2017

Also note, it would not be appropriate to just pick the first GO # for each protein (this would introduce bias).

@yaaminiv

Contributor

yaaminiv commented Jul 18, 2017

I just picked the first GO term to test if all of the GOterms I had were for biological processes or for something else.

If GOterms are repeated between proteins (which I'm sure they are), then there won't be a single p-value for each GOterm. I could average the p-values for each occurrence and do something with that inside or outside REVIGO?

@sr320

Owner

sr320 commented Jul 18, 2017

The p-value has no direct relation to the GOterm, so you should not average.

What is the bigger reason you are trying to generate this particular figure?

@yaaminiv

Contributor

yaaminiv commented Jul 18, 2017

I just wanted to visualize the proteins I selected for SRM. I have a heatmap but thought I could try something else.

@sr320

Owner

sr320 commented Jul 18, 2017

spawning plan :)

@sr320 sr320 closed this Jul 20, 2017
