Problem with models containing RNAcentral identifiers #760

Closed
sylvainpoux opened this issue Mar 24, 2022 · 18 comments

Comments

@sylvainpoux

Hi Noctua developers,

My models describing binding to lncRNAs (using has_input with RNAcentral identifiers) do not pass the reasoner.

http://noctua.geneontology.org/editor/graph/gomodel:62183af000000874

http://noctua.geneontology.org/editor/graph/gomodel:6205c24300000050

After discussing with @pgaudet, it looks like proteins with has_input are automatically considered as protein-binding. As a consequence, the RNA-binding terms (lncRNA-binding and pre-miRNA-binding) appear as protein-binding, which is incorrect.

Would it be possible to correct this?

Thanks

Sylvain

@pgaudet

pgaudet commented Mar 24, 2022

I think this is a problem with the reasoner asserting that the 'RNA binding' is a protein binding because there is a protein that participates in the reaction (the "enabler").

A similar problem has been reported before; I think it involved a transcription factor activity for which a protein catabolic process was being predicted (not sure where that ticket is).

Thanks, Pascale

@vanaukenk

@balhoff
I'm not sure how the 'protein binding' inference is being made here, given the logical definition (binding AND 'has input' some protein) and the way the RNAcentral IDs appear to be typed in GO-LEGO:
http://noctua-amigo.berkeleybop.org/amigo/term/RNAcentral:URS00005EB5B7_9606

@balhoff
Member

balhoff commented Mar 25, 2022

@vanaukenk it looks like there is a SWRL rule in RO:

molecular_function(?x), molecular_function(?y), 'directly regulates'(?x, ?y), enables(?z, ?y) -> 'has input'(?x, ?z)

If one MF directly regulates another MF, the enabling gene product for the regulated MF is inferred to be an input to the regulator MF.
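
To make the chain of inference concrete, here is a minimal Python sketch of how this rule, combined with the 'binding AND has input some protein' definition, could turn an RNA-binding activity into a protein-binding one. It is only a toy forward-chaining loop over triples, not the OWL/SWRL reasoner Noctua actually uses. The individual names (`trub1_rna_binding`, `dgcr8_activity`) and the `kinds` lookup are hypothetical.

```python
# Toy sketch of the inference chain; all individual names are hypothetical
# and this is not how the real reasoner is implemented.

def apply_direct_regulation_rule(kinds, triples):
    """MF(x), MF(y), 'directly regulates'(x, y), enables(z, y) -> 'has input'(x, z)"""
    inferred = set()
    for x, rel, y in triples:
        if rel != "directly regulates":
            continue
        if kinds.get(x) != "molecular_function" or kinds.get(y) != "molecular_function":
            continue
        # Every gene product enabling the regulated MF becomes an input of the regulator MF.
        for z, rel2, y2 in triples:
            if rel2 == "enables" and y2 == y:
                inferred.add((x, "has input", z))
    return inferred


def is_protein_binding(activity, binding_activities, kinds, triples):
    """Classify per the definition: binding AND 'has input' some protein."""
    return activity in binding_activities and any(
        s == activity and p == "has input" and kinds.get(o) == "protein"
        for s, p, o in triples
    )


kinds = {
    "trub1_rna_binding": "molecular_function",
    "dgcr8_activity": "molecular_function",
    "TRUB1": "protein",
    "DGCR8": "protein",
    "pri-let-7": "rna",
}
binding_activities = {"trub1_rna_binding"}
asserted = {
    ("TRUB1", "enables", "trub1_rna_binding"),
    ("trub1_rna_binding", "has input", "pri-let-7"),
    ("trub1_rna_binding", "directly regulates", "dgcr8_activity"),
    ("DGCR8", "enables", "dgcr8_activity"),
}

inferred = apply_direct_regulation_rule(kinds, asserted)
print(inferred)
print(is_protein_binding("trub1_rna_binding", binding_activities, kinds, asserted))
print(is_protein_binding("trub1_rna_binding", binding_activities, kinds, asserted | inferred))
```

In this sketch the only asserted input is the RNA, so the activity is not protein binding until the rule adds DGCR8 as a second input. Replacing 'directly regulates' with another relation leaves the rule unmatched, so no extra 'has input' is produced.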

Does this make sense to you? Maybe @dosumis could explain the origin.

There is no logical violation in the model, but there appears to be a ShEx violation which I haven't dug into yet (not sure if it's connected to the inference).

@cmungall
Member

Let's talk about the biology for a second first, then the technical parts.

Is TRUB1 really directly regulating DGCR8? I skimmed the paper very quickly, but I would have thought TRUB1 -> let-7 -> DGCR8. What if we later want to say SF3A3 -> let-7? I realize ncRNA functions are often trivial, but shouldn't we treat them like protein functions and not hop over them with direct regulation (hopping over with indirect regulation is fine)?

@sylvainpoux
Author

Hi @cmungall,

In my opinion, some RNAs should be treated like proteins. After all, a number of RNAs act as enzymes (ribozymes). Moreover, lncRNAs are gene products and a number of them have HGNC gene accessions. From that point of view, these RNAs should be represented in GO (Gene Ontology).

It is clear that we mainly concentrate on proteins, but in some cases, there are cross-reactions between different types of gene products (lncRNAs, tRNAs and proteins), and it would be important to represent this.

The two examples mentioned here show the following: some proteins can bind to a specific non-coding RNA present in RNAcentral, and this binding directly affects the activity of protein(s).

In the first example, TRUB1 binds to the stem-loop structure on pri-let-7, preventing binding between LIN28 and pri-let-7, thereby enhancing the interaction between pri-let-7 and the microprocessor DGCR8, which mediates pri-let-7 miRNA maturation (PubMed:32926445).

The second example shows that SIRT6 histone deacetylase activity is specifically repressed by long non-coding RNA lncPRESS1, which binds to SIRT6 and prevents chromatin-binding, thereby promoting stem cell pluripotency (PubMed:27912097).

In other words, I think we need a way to represent the binding of a protein to a specific RNA or the binding of a specific RNA to a protein.

Thanks

Sylvain

@pgaudet

pgaudet commented Mar 31, 2022

@cmungall

We can discuss the model /paper separately, but we should be able to use 'has input' some RNA for a GO term like 'pri-miRNA binding', without inferring some protein binding, shouldn't we?

Thanks, Pascale

@thomaspd

thomaspd commented May 5, 2022

I've looked at this and I think there are two separate issues.

  1. The SWRL rule @balhoff noted could explain the erroneous "protein binding" inference. I changed the model to use positively/negatively regulates instead of directly regulates, and the protein binding inference went away.
  2. I don't think the RNA Central IDs are in NEO, and that's why they aren't being recognized as chemical entities and are failing the ShEx. So we can solve this by adding selected RNA Central IDs to NEO. I don't think we want to allow use of arbitrary IDs anymore, right? In that case, curators would request any additional gene products they want to include in models.

@sylvainpoux I'd suggest that you make models in the Visual Pathway Editor. The relationships would then not have been 'directly regulates'. As an aside, in this specific case I don't think you need a regulates relationship between TRUB1 activity and DGCR8 activity. You've already captured the fact that it negatively regulates a negatively regulating activity, which results in a positive effect. I've taken the liberty of modifying your model to illustrate it, but I haven't saved it yet.
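
The point about two negative regulations giving a net positive effect can be illustrated with a small Python sketch that multiplies signs along a chain of regulates relations. The chain below (TRUB1 activity, then the LIN28 activity, then the DGCR8 activity) is an illustrative reading of the model, and `net_effect` is a hypothetical helper, not part of Noctua.

```python
from functools import reduce

# +1 for "positively regulates", -1 for "negatively regulates".
SIGN = {"positively regulates": 1, "negatively regulates": -1}


def net_effect(relations):
    """Multiply the signs along a chain of regulates relations."""
    return reduce(lambda acc, rel: acc * SIGN[rel], relations, 1)


# Hypothetical chain: TRUB1 activity -| LIN28 activity -| DGCR8 activity
chain = ["negatively regulates", "negatively regulates"]
print(net_effect(chain))  # 1, i.e. a net positive effect
```

Because the overall sign follows from the chain, an extra direct edge between the first and last activity adds no information.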

@sylvainpoux
Author

Hi @thomaspd, many thanks for your help and suggestions! I also modified the other model in the Visual Pathway Editor.
Thanks
Sylvain

@pgaudet

pgaudet commented May 6, 2022

AFAIK RNA central IDs are in NEO - @kltm @balhoff is this right?

@pgaudet

pgaudet commented May 6, 2022

Or, they used to be there: see Ruth's model http://noctua.geneontology.org/editor/graph/gomodel:5df932e000003004

@thomaspd

thomaspd commented May 6, 2022

Thanks, I must have been wrong: the RNAcentral IDs are there. Ruth's model also fails the ShEx check. It's a mystery to me why it fails, since the type in GO-LEGO seems to be information biomacromolecule, yet according to the error report (or at least my interpretation of it!) that is the check it is failing.

@pgaudet

pgaudet commented May 6, 2022 via email

@cmungall
Member

cmungall commented May 6, 2022

There is a lot going on in this ticket - perhaps we can make new tickets for new issues?

  • the original problem was a direct link between TRUB1 and DGCR8, as I pointed out on Mar 30 - this has been fixed so the undesired (but correct) inference goes away. I agree with @thomaspd about doing these in the VPE
  • there seem to be some additional structural issues reported by the ShEx check, but I can't replicate this
  • there is an issue with autocomplete, but I think this is unrelated

> Or, they used to be there: see Ruth's model http://noctua.geneontology.org/editor/graph/gomodel:5df932e000003004

When I look at that model I see an ID and not a label, this suggests it was injected rather than autocompleted

@cmungall
Member

cmungall commented May 6, 2022

Put any questions regarding the presence/absence of RNAs here: geneontology/neo#99

Let's keep this one for questions about validation of Sylvain's models.

Note that I may have misspoken here:

> When I look at that model I see an ID and not a label, this suggests it was injected rather than autocompleted

I forgot that the labels for many things in RNAcentral are the IDs, e.g. URS000229B2C9_9606.

@sylvainpoux
Author

Hi @cmungall, I'm not sure I understand what you mean by "When I look at that model I see an ID and not a label, this suggests it was injected rather than autocompleted"

Do you mean that the identifier I used (URS000229B2C9_9606) is incorrect? If it is incorrect, which identifier should I use?

I checked my models and the reasoner still complains.

Sorry to ask such basic questions, but I'm not sure I understand the technical part of the discussion.

Thanks

Sylvain

@cmungall
Member

@sylvainpoux - sorry for causing confusion.

It turns out that RNAcentral provides names but not symbols for the RNAs; this is why they show up as IDs in the Noctua display rather than as something human-readable, as happens for proteins.

I think we can easily use the full name in place of the symbol for RNAs - would this be helpful to you?

The technical approach is here:
geneontology/neo#99 (comment)
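
The actual approach is described in the linked NEO comment. As a rough illustration of the idea only, the Python sketch below falls back from symbol to full name to ID when choosing a display label. The record fields (`id`, `symbol`, `name`), the example records, and the `display_label` helper are all hypothetical and are not taken from NEO or Noctua.

```python
def display_label(entity):
    """Pick the most readable label available: symbol, then full name, then the ID."""
    for key in ("symbol", "name"):
        value = entity.get(key)
        if value:
            return value
    return entity["id"]


# Hypothetical records; field names and values are for illustration only.
protein = {"id": "EXAMPLE:P1", "symbol": "TRUB1", "name": "example protein full name"}
rna_with_name = {"id": "RNAcentral:URS000229B2C9_9606", "symbol": None, "name": "example RNA full name"}
rna_without_name = {"id": "RNAcentral:URS000229B2C9_9606"}

print(display_label(protein))
print(display_label(rna_with_name))
print(display_label(rna_without_name))
```

With this kind of fallback, an RNA that has a full name but no symbol would display the name, and only entries with neither would still show the bare ID.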

@pgaudet

pgaudet commented May 18, 2022

@sylvainpoux

To go back to your original comment - the two models you have there now pass the reasoner. Should we close this issue?

@sylvainpoux
Author

Hi Chris and Pascale, that's great. Thank you so much!
Yes, you can close the ticket
Sylvain

@pgaudet pgaudet closed this as completed May 18, 2022