
Transcription versus regulation of transcription #14935

Closed
pgaudet opened this issue Jan 26, 2018 · 22 comments

Comments

@pgaudet
Contributor

pgaudet commented Jan 26, 2018

Hello,

During our discussion with Ruth and Marcio yesterday we came up with the following proposal:

Transcription by the different polymerases always takes place via the same process, regardless of the transcript being produced.
The types of proteins mediating those processes are: polymerase, GTFs and Mediator.

For example:

GO:0042796 snRNA transcription from RNA polymerase III promoter
GO:0042797 tRNA transcription from RNA polymerase III promoter
are mediated by the same proteins that mediate
GO:0006383 transcription from RNA polymerase III promoter

Hence these should be merged.

Regulation of transcription of these different transcripts is what (potentially) changes. The types of proteins mediating those processes are specific transcription factors.

For example, for snRNA, the existing terms are:
GO:1905380 regulation of snRNA transcription from RNA polymerase II promoter
GO:1905381 negative regulation of snRNA transcription from RNA polymerase II promoter
GO:1905382 positive regulation of snRNA transcription from RNA polymerase II promoter
(well they should match in terms of the polymerase... another thing to worry about)

Logical definitions:

GO:1905380 regulation of snRNA transcription from RNA polymerase II promoter

CURRENT:
biological_process
and (regulates some 'snRNA transcription by RNA polymerase II')

PROPOSED:
'biological regulation'
and (regulates some 'transcription by RNA polymerase II')
and ('has output' some snRNA)


@RLovering @krchristie @thomaspd @ValWood @ukemi @vanaukenk

Are you OK with this proposal?
Thanks, Pascale

@pgaudet pgaudet self-assigned this Jan 26, 2018
@ValWood
Contributor

ValWood commented Jan 26, 2018

OK for me. I'm not aware of any examples where the regulation is different. Do they exist?

@ValWood
Contributor

ValWood commented Jan 26, 2018

There seems to be one:
http://www.uniprot.org/citations/20212087

@pgaudet
Contributor Author

pgaudet commented Jan 26, 2018

@ValWood
you mean the regulation of non-protein coding genes?

@ValWood
Contributor

ValWood commented Jan 26, 2018

I didn't know that transcription of tRNAs and snRNAs by RNA polymerase III was differently regulated...
but I guess they probably are, because the promoters are different. Is that correct?

@RLovering

Hi
I realise this is going to be radical but why not just have the process terms describing the product, rather than the enzyme involved? How many other BP GO terms (other than signaling pathways) mention the enzyme?

eg
mRNA transcription

regulation of mRNA transcription (with AE field to capture the mRNAs regulated if available)
snRNA transcription
regulation of snRNA transcription (with AE data if available)

There may be leaf nodes where we want to distinguish the type of polymerase involved in the transcription. As Astrid pointed out a long time ago, very few experiments that investigate transcription of a specific RNA type that is transcribed by more than one type of RNA pol distinguish which RNA pol is being used. In addition, for those RNAs transcribed by only one type of RNA pol, e.g. mRNAs, we make the assumption that if the amount of mRNA changes then it must be due to regulation of RNA pol II. As we are discussing how important it is to have data rather than author intent and background knowledge, we could be encouraging curators to avoid using the RNA pol II term in annotations when the paper has no evidence that it is RNA pol II.

Plus not all our users will know that RNA pol II is the only pol to transcribe mRNAs.

However, this would mean the term
GO:0045944 positive regulation of transcription by RNA polymerase II
might have to be deleted! As there are 4000 annotations made directly to this term, there may be a few errors if we simply assume that all annotations made directly to this term reflect regulation of mRNA transcription and then revise it to
GO:0045944 positive regulation of mRNA transcription
Definition: Any process that activates or increases the frequency, rate or extent of mRNA transcription.

Well it's the weekend!

@krchristie
Contributor

I am thinking through Pascale's proposal. I was never keen on having all of the different terms that differ by the "type" of RNA, e.g. snRNA, tRNA, mRNA, etc., and David H and I considered trying to get rid of these types of terms in the previous transcription overhaul, but they were quite popular. I also asked some RNAP III researchers whether there is something distinct going on in transcription of snRNAs by RNAP III, and they were hesitant to remove these terms, so we decided that was farther than we were ready to go.

For the most part, I agree that the basic process of transcription by a given RNAP, e.g. RNAP II or RNAP III, is largely the same regardless of which "type" of RNA is produced, although with RNAP II, I believe that there are differences in the type of 5' cap that is applied to snRNA transcripts versus mRNA transcripts, which may be coupled to the transcription process itself, as capping is coupled to transcription.

@krchristie
Contributor

I am against Ruth's proposal of having process terms only for the "type" of RNA product because I think this has the potential to conflate things that are not the same, for example, some snRNAs in eukaryotes are transcribed by RNAP II while others are transcribed by RNAP III.

It is true that researchers interested in the expression of their favorite gene(s), but not so interested in the mechanism of transcription, rarely mention which polymerase is doing the transcription, and in some cases I would agree with Ruth that there is no direct evidence in that paper that it is RNAP II. However, in other cases, I think that there actually is evidence, e.g. details in the methods about what was used, although recognizing it might require a background in transcription that it is unrealistic to expect of most curators. Before the 2010 transcription overhaul that David H and I did, we did NOT encourage curators to assume RNAP II for "all" mRNA transcription in eukaryotes (there is a known exception in trypanosomes, where RNAP I transcribes a select subset of protein-coding genes). In that work in 2010, David H and I realized that there was a huge number of annotations made to the most general transcription terms, even though I think we are pretty certain that the bulk of this is RNAP II.

[maybe more later, but have to go now]

@ValWood
Contributor

ValWood commented Jan 26, 2018

> I realise this is going to be radical

I think I prefer to have the transcription described by mechanism (the name of the enzyme is really a proxy for the different enzyme machinery that is assayed, rather than an individual enzyme).

I don't mind losing mRNA, tRNA, snRNA at all... we don't use these terms; we capture this with an extension if it is specified in the experiment, or with a specific gene product ID if it is very relevant to the experiment.

So far, I haven't come across any subparts of, or additions to, polymerase I, II or III that are specific for a particular RNA type in fission yeast (which is why I asked the question above). Then I looked at the annotations and found the fly paper for snRNA (although I still wasn't sure whether this was also used in other contexts).

@pgaudet
Contributor Author

pgaudet commented Jan 31, 2018

Hi @krchristie @ValWood @RLovering

It turns out the terms transcription of transcript x by RNA polymerase (I,II,III) are not widely used:

| GO ID | Label | EXP |
| --- | --- | --- |
| GO:0042792 | rRNA transcription from mitochondrial promoter | 0 |
| GO:0042794 | rRNA transcription from plastid promoter | 1 |
| GO:0042790 | transcription of nuclear large rRNA transcript from RNA polymerase I promoter | 44 |
| GO:0042789 | mRNA transcription from RNA polymerase II promoter | 23 |
| GO:0061614 | pri-miRNA transcription from RNA polymerase II promoter | 10 |
| GO:0001015 | snoRNA transcription from an RNA polymerase II promoter | 6 |
| GO:0042795 | snRNA transcription from RNA polymerase II promoter | 22 |
| GO:0097394 | telomeric repeat-containing RNA transcription from RNA pol II promoter | 2 |
| GO:0042796 | snRNA transcription from RNA polymerase III promoter | 14 |
| GO:0042797 | tRNA transcription from RNA polymerase III promoter | 31 |
| GO:1901837 | negative regulation of transcription of nuclear large rRNA transcript from RNA polymerase I promoter | 7 |
| GO:1901836 | regulation of transcription of nuclear large rRNA transcript from RNA polymerase I promoter | 2 |
| GO:1901838 | positive regulation of transcription of nuclear large rRNA transcript from RNA polymerase I promoter | 15 |
| GO:1905380 | regulation of snRNA transcription from RNA polymerase II promoter | 0 |
| GO:1905381 | negative regulation of snRNA transcription from RNA polymerase II promoter | 0 |
| GO:1905382 | positive regulation of snRNA transcription from RNA polymerase II promoter | 0 |
| GO:0060962 | regulation of ribosomal protein gene transcription from RNA polymerase II promoter | 6 |
| GO:0010688 | negative regulation of ribosomal protein gene transcription from RNA polymerase II promoter | 4 |
| GO:0060963 | positive regulation of ribosomal protein gene transcription from RNA polymerase II promoter | 16 |
| GO:1902893 | regulation of pri-miRNA transcription from RNA polymerase II promoter | 3 |
| GO:1902894 | negative regulation of pri-miRNA transcription from RNA polymerase II promoter | 18 |
| GO:1902895 | positive regulation of pri-miRNA transcription from RNA polymerase II promoter | 32 |
| GO:0010689 | negative regulation of ribosomal protein gene transcription from RNA polymerase II promoter in response to nutrient levels | 2 |
| GO:0010691 | negative regulation of ribosomal protein gene transcription from RNA polymerase II promoter in response to chemical stimulus | 1 |
| GO:0010690 | negative regulation of ribosomal protein gene transcription from RNA polymerase II promoter in response to stress | 0 |
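The EXP column above can be tallied to see how little these terms are used. A minimal Python sketch, with the counts copied by hand from the table (labels omitted); it sums the counts for each group and lists the terms that have no EXP annotations at all:

```python
# EXP annotation counts copied from the table above (GO ID -> EXP count).
TRANSCRIPTION_EXP = {
    "GO:0042792": 0,
    "GO:0042794": 1,
    "GO:0042790": 44,
    "GO:0042789": 23,
    "GO:0061614": 10,
    "GO:0001015": 6,
    "GO:0042795": 22,
    "GO:0097394": 2,
    "GO:0042796": 14,
    "GO:0042797": 31,
}

REGULATION_EXP = {
    "GO:1901837": 7,
    "GO:1901836": 2,
    "GO:1901838": 15,
    "GO:1905380": 0,
    "GO:1905381": 0,
    "GO:1905382": 0,
    "GO:0060962": 6,
    "GO:0010688": 4,
    "GO:0060963": 16,
    "GO:1902893": 3,
    "GO:1902894": 18,
    "GO:1902895": 32,
    "GO:0010689": 2,
    "GO:0010691": 1,
    "GO:0010690": 0,
}


def summarize(counts):
    """Return (total EXP annotations, sorted list of terms with zero EXP)."""
    total = sum(counts.values())
    unused = sorted(go_id for go_id, n in counts.items() if n == 0)
    return total, unused


for name, counts in [("transcription", TRANSCRIPTION_EXP), ("regulation", REGULATION_EXP)]:
    total, unused = summarize(counts)
    print(f"{name}: {len(counts)} terms, {total} EXP annotations, unused: {unused}")
```

On these numbers, the 10 transcription terms carry 153 EXP annotations between them and the 15 regulation terms carry 106, with four regulation terms having none.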

I propose to merge these with their respective parents, from the following list:

| GO ID | Label |
| --- | --- |
| GO:0001121 | transcription from bacterial-type RNA polymerase promoter |
| GO:0006390 | transcription from mitochondrial promoter |
| GO:0042793 | transcription from plastid promoter |
| GO:0006360 | transcription from RNA polymerase I promoter |
| GO:0006366 | transcription from RNA polymerase II promoter |
| GO:0006383 | transcription from RNA polymerase III promoter |
| GO:0001059 | transcription from RNA polymerase IV promoter |
| GO:0001060 | transcription from RNA polymerase V promoter |

**REGULATION**

| GO ID | Label |
| --- | --- |
| GO:1903108 | regulation of transcription from mitochondrial promoter |
| GO:1903109 | positive regulation of transcription from mitochondrial promoter |
| GO:0006356 | regulation of transcription from RNA polymerase I promoter |
| GO:0016479 | negative regulation of transcription from RNA polymerase I promoter |
| GO:2001208 | negative regulation of transcription elongation from RNA polymerase I promoter |
| GO:0045943 | positive regulation of transcription from RNA polymerase I promoter |
| GO:0006357 | regulation of transcription from RNA polymerase II promoter |
| GO:0000122 | negative regulation of transcription from RNA polymerase II promoter |
| GO:0045944 | positive regulation of transcription from RNA polymerase II promoter |
| GO:0006359 | regulation of transcription from RNA polymerase III promoter |
| GO:0016480 | negative regulation of transcription from RNA polymerase III promoter |
| GO:0045945 | positive regulation of transcription from RNA polymerase III promoter |
| GO:1904279 | regulation of transcription from RNA polymerase V promoter |
| GO:1904280 | negative regulation of transcription from RNA polymerase V promoter |
| GO:1904281 | positive regulation of transcription from RNA polymerase V promoter |
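One way to read "merge these with their respective parent" is as a child-to-parent mapping that re-points existing annotations. The sketch below is an illustrative Python example, not the actual GO merge procedure: the `Annotation` record and the gene names are hypothetical, and the parent chosen for each term (in particular for the three "in response to ..." terms) is an assumption inferred from the term labels.

```python
from dataclasses import dataclass, replace

# Terms kept in the proposal (the two parent lists above).
SURVIVING_TERMS = {
    # transcription
    "GO:0001121", "GO:0006390", "GO:0042793", "GO:0006360",
    "GO:0006366", "GO:0006383", "GO:0001059", "GO:0001060",
    # regulation
    "GO:1903108", "GO:1903109", "GO:0006356", "GO:0016479", "GO:2001208",
    "GO:0045943", "GO:0006357", "GO:0000122", "GO:0045944", "GO:0006359",
    "GO:0016480", "GO:0045945", "GO:1904279", "GO:1904280", "GO:1904281",
}

# ASSUMED child -> parent mapping, inferred from the term labels.
MERGE_INTO = {
    "GO:0042792": "GO:0006390",  # rRNA, mitochondrial promoter
    "GO:0042794": "GO:0042793",  # rRNA, plastid promoter
    "GO:0042790": "GO:0006360",  # nuclear large rRNA transcript, Pol I
    "GO:0042789": "GO:0006366",  # mRNA, Pol II
    "GO:0061614": "GO:0006366",  # pri-miRNA, Pol II
    "GO:0001015": "GO:0006366",  # snoRNA, Pol II
    "GO:0042795": "GO:0006366",  # snRNA, Pol II
    "GO:0097394": "GO:0006366",  # telomeric repeat-containing RNA, Pol II
    "GO:0042796": "GO:0006383",  # snRNA, Pol III
    "GO:0042797": "GO:0006383",  # tRNA, Pol III
    "GO:1901836": "GO:0006356",
    "GO:1901837": "GO:0016479",
    "GO:1901838": "GO:0045943",
    "GO:1905380": "GO:0006357",
    "GO:1905381": "GO:0000122",
    "GO:1905382": "GO:0045944",
    "GO:0060962": "GO:0006357",
    "GO:0010688": "GO:0000122",
    "GO:0060963": "GO:0045944",
    "GO:1902893": "GO:0006357",
    "GO:1902894": "GO:0000122",
    "GO:1902895": "GO:0045944",
    "GO:0010689": "GO:0000122",  # "in response to" terms: assumption
    "GO:0010691": "GO:0000122",
    "GO:0010690": "GO:0000122",
}

# Sanity checks: every target survives, and no surviving term is merged away.
assert set(MERGE_INTO.values()) <= SURVIVING_TERMS
assert not SURVIVING_TERMS & set(MERGE_INTO)


@dataclass(frozen=True)
class Annotation:
    gene: str   # hypothetical gene identifier
    go_id: str  # annotated GO term


def merge_annotations(annotations, merge_into):
    """Return new annotations with merged term IDs replaced by their parent."""
    return [replace(a, go_id=merge_into.get(a.go_id, a.go_id)) for a in annotations]


# Usage with hypothetical genes.
example = [
    Annotation("geneA", "GO:0042797"),  # tRNA transcription by Pol III
    Annotation("geneB", "GO:1902895"),  # positive regulation of pri-miRNA transcription
    Annotation("geneC", "GO:0006366"),  # already a surviving term, left unchanged
]
merged = merge_annotations(example, MERGE_INTO)
print([a.go_id for a in merged])
```

Note that a plain ID swap like this discards the RNA type carried by the old term name, which is the information the later comments propose to keep in annotation extensions.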

OK?

Thanks, Pascale

@ValWood
Contributor

ValWood commented Jan 31, 2018

I have the info I require; happy for these to be merged.
pombase/curation#1855

@RLovering

I am not happy with this. I think for the GTFs and RNA polymerases it is appropriate to annotate to 'transcription from RNA polymerase 'subtype' promoter'. But I don't get a sense of the bigger picture here. What BP terms will be associated with dbTFs? Are you suggesting that the only BP term for a dbTF will be positive regulation of transcription from RNA polymerase II promoter and then the inclusion in the AE field of the mRNAs whose level of expression is regulated , along with the cell/tissue type information?

I find it amazing that we have agreed to terms such as 'regulation of tubulin deacetylation', but we are going to group together the proteins involved in regulating transcription of many different classes of RNAs based on the type of RNA polymerase, when 99% of the time the experiments we are annotating specifically measure the mRNA/tRNA/pri-miRNA and do not even mention which RNA polymerase is doing the transcription.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC111039/ describes the many similarities among RNA polymerase I/II/III transcription, but does note differences, e.g. "eukaryotic nuclear RNA polymerases are complex enzymes, made up of 12 or more subunits (7). Five of these are gene products shared by all three enzymes (ABC10α, ABC10β, ABC14.5, ABC23 and ABC27 in S. cerevisiae). In addition, pol I and pol III share two subunits (AC19 and AC40 in S. cerevisiae) (8) that are not found in pol II, although the B12.5 and B44 pol II subunits are functionally equivalent, respectively (9–11)."

There are many processes in GO which are categorised by the end product rather than the participating enzymes, e.g. glucose metabolism (and most metabolic pathways), hepatocyte differentiation and most developmental pathways.

I think people will want to know which proteins regulate rRNA transcription and which regulate miRNA expression. Research in this area is very new but, for example http://europepmc.org/abstract/MED/25569094 suggests that TERT may play a role in regulating transcription of microRNAs but not in regulating mRNAs. I think it is probably too early to make that call, but we are not going to have transcription terms specific for each transcribed gene product and so to end up with no separation of the transcripts at all just seems very odd.

I think I am probably missing a ticket somewhere that shows how the type of RNA transcribed is captured, if so ignore my rant and point me to it and then I will try to work out what impact this decision is going to make on annotation.

But if this is the plan, and the reason for merging all these terms is a lack of annotation, then I would like to point out that if there had been a better ontology structure enabling curators to find mRNA-transcription-specific terms, these would have been used. Before making this decision, I would be very grateful for an opportunity to ask users and people generating transcription-relevant data what terms they would expect to see here and how they would like to see their data annotated.

I am not sure how to create a questionnaire to ask this because I am not sure what the polymerase specific terms are going to be, maybe there are some ideas about what the general plan for the names of terms is?

Ruth

@pgaudet
Contributor Author

pgaudet commented Jan 31, 2018

Hi @RLovering
If you look at the annotation stats, these terms are not very much used at all (despite the fact that they may be interesting biologically!) Perhaps we can talk about this tomorrow?

@RLovering

HI Pascale
I have looked at the stats, but I don't think this is a good enough reason to effectively delete these terms.
Currently we have
GO:0009299 mRNA transcription

is_a GO:0042789 | mRNA transcription from RNA polymerase II promoter

GO:0042789 | mRNA transcription from RNA polymerase II promoter is not currently a child of
GO:0006366 | transcription from RNA polymerase II promoter.
Perhaps if it had been, it would have been used more, and it would have been a better additional parent for the majority of terms under GO:0006366 | transcription from RNA polymerase II promoter.

Transcribing some of the different types of RNAs may require different processes, due to their length, repetitiveness, high expression etc.

Also for some of the RNAs there may be complications with the IDs (due to duplicate sequences) or the IDs may not yet be present.

If annotations are going to be transferred to other orthologs how is the AE data listing the regulated RNA going to be transferred to these other genes?

Ruth

@ValWood
Contributor

ValWood commented Jan 31, 2018

You can use SO terms for RNA type or promoter in extensions if required?

here is an example
https://www.pombase.org/gene/SPBC1778.02

negative regulation of telomeric RNA transcription from RNA pol II promoter
  | regulates TERRA, ARRET, ARIA, anti_ARRET

This would be just fine without "telomeric" in the term name.

and there is much more information here, we know exactly what we were referring to
(we don't need to name individuals we can put miRNA, snRNA etc in the extension)

Would that work?

@RLovering

Just seen ticket
#9617
which proposes: GO:0042790 | transcription of nucleolar large rRNA by RNA polymerase I rename 'nucleolar large rRNA transcription by RNA polymerase I'

and yet in the discussion above it is proposed:

GO:0042790 | transcription of nuclear large rRNA transcript from RNA polymerase I promoter (note the typo: "nuclear" should be "nucleolar")
will be merged with
GO:0006360 | transcription from RNA polymerase I promoter

So I agree with ticket 9617 but don't agree with the merge proposed here.

How can we work like this? I just don't know what you are planning to keep or delete or merge

Ruth

@ValWood
Contributor

ValWood commented Jan 31, 2018

To me the term GO:0042790 does not make any sense biologically. The rRNA large subunit transcript is not transcribed independently of other rDNA; it is all transcribed tandemly and then processed.

https://en.wikipedia.org/wiki/Ribosomal_DNA

There is no specific process for the large subunit different from
"transcription from RNA polymerase I promoter"

all the things annotated appear to be involved in transcription from pol I promoter generally....

@RLovering

I'm still confused. Looking at the first ticket, I think Pascale was proposing the following, and I would be happy with this:
GO:0006366 transcription by RNA polymerase II
NO is_a transcription child terms
GO:0042795 snRNA transcription by RNA polymerase II (merged with GO:0006366)
GO:0042789 mRNA transcription by RNA polymerase II (merged with GO:0006366)
GO:0001015 snoRNA transcription by RNA polymerase II (merged with GO:0006366)
GO:0061614 pri-miRNA transcription by RNA polymerase II (merged with GO:0006366)

Keep some of the part_of terms such as
GO:0001111 promoter clearance from RNA polymerase II promoter

Keep regulation terms
GO:0006357 regulation of transcription by RNA polymerase II

GO:1905380 regulation of snRNA transcription by RNA polymerase II
GO:0060962 regulation of ribosomal protein gene transcription by RNA polymerase II (I think if you define this term well 'gene' could go, maybe use hyphen ribosomal-protein)
GO:1902893 regulation of pri-miRNA transcription by RNA polymerase II
new term regulation of mRNA transcription by RNA polymerase II

However GO:0006357 regulation of transcription by RNA polymerase II is not a great name, the regulation is not 'BY' RNA pol II. But how to rename to be correct and to work for child terms?: regulation of snRNA transcription, RNA polymerase II mediated? regulation of RNA polymerase II snRNA transcription, regulation of RNA polymerase II transcription of snRNA, regulation of RNA polymerase II-transcription of snRNA.

But I may be just being optimistic

Ruth

@RLovering

Hi Val

Regarding your suggestion (we don't need to name individuals, we can put miRNA, snRNA etc. in the extension): this would mean that every time we make an annotation, we would need to add not only the ID for the transcript regulated (which will be difficult for some) but also the type of RNA, so that the information can be transferred to orthologs; i.e. each time we would have to add miRNA, snRNA etc. in the extension. There are a limited number of RNA types, and I would have thought it more efficient to just have terms for those types rather than adding this information in every AE field.

Plus, not every group is using AE or Noctua.

However, this also makes me wonder whether the 'transcription' terms should also have specific RNA types to reduce curator time.

@krchristie
Contributor

@ValWood

> To me the term GO:0042790 does not make any sense biologically. The rRNA large subunit transcript is not transcribed independently of other rDNA; it is all transcribed tandemly and then processed.

This term "transcription of nuclear large rRNA transcript from RNA polymerase I promoter" (GO:0042790) is/was not about transcribing the large SUBUNIT rRNA; it is about transcribing the large rRNA TRANSCRIPT, which contains multiple rRNAs. The exact composition varies somewhat between species, but in mammals and S. cerevisiae (with slight size differences) it contains 3 final rRNAs, one of which is in the small subunit (18S) and two of which are in the large subunit (28S in mammals, 25S in S. cerevisiae, plus 5.8S). In contrast, the 5S rRNA is interspersed between copies of the large rRNA transcript and is transcribed by RNAP III, in the opposite direction if I remember correctly.

@pgaudet - I find the new term name "transcription of nucleolar large rRNA by RNA polymerase I" kind of bizarre. I don't think I've ever heard this rRNA called the "nucleolar large rRNA". Is this name used in the literature? Honestly, I think this name is rather confusing, and not an improvement over the original name.

@krchristie
Contributor

Regarding Ruth's comment:

> I think people will want to know which proteins regulate rRNA transcription and which regulate miRNA expression. Research in this area is very new but, for example http://europepmc.org/abstract/MED/25569094 suggests that TERT may play a role in regulating transcription of microRNAs but not in regulating mRNAs. I think it is probably too early to make that call, but we are not going to have transcription terms specific for each transcribed gene product and so to end up with no separation of the transcripts at all just seems very odd.

I see your point @RLovering that the paper you cite is making a case for "regulation of miRNA expression" (though it's not clear to me that this paper goes far enough to show that this effect occurs via regulation of transcription versus regulation of processing). However, I don't know that a similar case can be made for a generic "regulation of mRNA transcription". It just does not seem like there is a generic process for "mRNA transcription", but rather lots and lots of different sets of TFs which allow specific regulation of genes, from individual genes to small sets of genes to large sets of genes which are co-regulated together. Do people really want everything involved in generic "mRNA transcription"? It seems to me that people are generally more interested in which things regulate transcription of specific mRNAs of interest. As Val already pointed out, indicating the specific gene targets can already be indicated in the extensions.

It seems to me that it would be helpful to curators if we had fewer GO terms. It seems we could help curators find the appropriate RNA polymerase 'n' terms if we added synonyms using the RNA "type", e.g. mRNA, miRNA, etc, to help guide them.

@pgaudet
Contributor Author

pgaudet commented Feb 1, 2018

I agree that this proposal was conflating multiple things incorrectly. See new proposal: #14998 

Not implemented

@pgaudet
Contributor Author

pgaudet commented Feb 1, 2018

new proposal coming soon.
