Add shorter GO descriptions? #19

Open
TylerSagendorf opened this issue May 13, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@TylerSagendorf

The entries in the gs_description column for GO terms are rather long and not ideal for use as human-readable identifiers when plotting ORA or GSEA results. Would it be possible to add a gs_brief_description column that uses the names from the appropriate GO database release? I have been getting the data using the code below and then left-joining it to ORA and GSEA results tables made with fgsea. For other databases, I just use the entries in gs_description.

# install.packages(c("ontologyIndex", "dplyr"))
library(ontologyIndex)
library(dplyr)

# Brief GO term descriptions (use the same GO release listed in the MSigDB release notes)
file <- "http://release.geneontology.org/2021-12-15/ontology/go-basic.obo"
go_basic_list <- get_OBO(file,
                         propagate_relationships = "is_a",
                         extract_tags = "minimal")

# Convert to data.frame with fewer columns
go_basic_df <- as.data.frame(go_basic_list) %>%
  filter(!obsolete) %>%
  select(pathway = id, name)
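
The left-join step mentioned above could look roughly like the sketch below. Both tables are small hypothetical stand-ins, not real output: `go_names` plays the role of `go_basic_df`, `res` plays the role of an fgsea results table, and the sketch assumes the `pathway` column holds GO IDs. The ID `GO:9999999` and the `padj` values are made up to show what happens to an unmatched row.

```r
library(dplyr)

# Hypothetical stand-in for go_basic_df (GO ID -> short name)
go_names <- data.frame(
  pathway = c("GO:0006412", "GO:0006915"),
  name = c("translation", "apoptotic process"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for an fgsea results table (values are made up)
res <- data.frame(
  pathway = c("GO:0006915", "GO:0006412", "GO:9999999"),
  padj = c(0.01, 0.03, 0.20),
  stringsAsFactors = FALSE
)

# Left join keeps every result row; IDs without a match get NA in `name`
res_named <- left_join(res, go_names, by = "pathway")
```

Rows that come back with `NA` in `name` are a quick way to spot terms missing from the chosen GO release.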
@igordot igordot added the enhancement New feature or request label May 13, 2022
@igordot
Owner

igordot commented May 13, 2022

Thank you for the suggestion. Currently, the package just reformats the original MSigDB data for easier access. This might be outside the scope, but it is certainly worth considering.

To clarify, this is really an aesthetic change to make the name easier to read, right? For example, GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS becomes 5-phosphoribose 1-diphosphate metabolic process and GOBP_ACTIVATION_OF_CYSTEINE_TYPE_ENDOPEPTIDASE_ACTIVITY_INVOLVED_IN_APOPTOTIC_PROCESS_BY_CYTOCHROME_C becomes activation of cysteine-type endopeptidase activity involved in apoptotic process by cytochrome c.

@TylerSagendorf
Author

> To clarify, this is really an aesthetic change to make the name easier to read, right? For example, GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS becomes 5-phosphoribose 1-diphosphate metabolic process and GOBP_ACTIVATION_OF_CYSTEINE_TYPE_ENDOPEPTIDASE_ACTIVITY_INVOLVED_IN_APOPTOTIC_PROCESS_BY_CYTOCHROME_C becomes activation of cysteine-type endopeptidase activity involved in apoptotic process by cytochrome c.

Yeah, that's really all it is. Another solution would be to replace the underscores with spaces and change all text to lowercase, but that would remove intentional capitalization (such as with "mRNA") and lose the characters that were replaced by underscores (like the dashes in your examples).
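
To make that limitation concrete, here is a minimal base R sketch of the underscore-and-lowercase approach. The helper `naive_name()` is hypothetical, and it assumes the GO gene set names start with a `GOBP_`, `GOCC_`, or `GOMF_` prefix; `GOBP_MRNA_PROCESSING` is used only to illustrate the capitalization problem.

```r
# Naive cleanup: drop the collection prefix, swap underscores for spaces,
# and lowercase everything (hypothetical helper, for illustration only)
naive_name <- function(gs_name) {
  stripped <- sub("^GO(BP|CC|MF)_", "", gs_name)
  tolower(gsub("_", " ", stripped))
}

naive_name("GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS")
# "5 phosphoribose 1 diphosphate metabolic process"
# GO name: "5-phosphoribose 1-diphosphate metabolic process" (hyphens lost)

naive_name("GOBP_MRNA_PROCESSING")
# "mrna processing" (the capitalization in "mRNA" is lost)
```

Neither the hyphens nor the mixed case can be recovered from the gene set name alone, which is why a lookup against the GO release is needed.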

@igordot
Owner

igordot commented May 13, 2022

Yes, the original non-alphanumeric characters and capitalization are probably the most valuable aspect, and that can't be automatically fixed.
