parsing from sRNAbench data (II) #60
Comments
Thanks a lot for this. When I was implementing this, I realized some of the variants cannot be parsed to GFF. I can take a look into that since it is very important. Can you paste the information mirtop prints while running the conversion from sRNAbench to GFF?
Hi there Lorena, We think these variants may be difficult to fit into any of the existing categories, but it would be appropriate to include them in the GFF anyway, even under a generic name such as "non classified". That way, summing the counts at the variant, canonical, or isomiR level would always give the same total. These variants could also be reassigned to a better category later on. Here is the information mirtop printed while creating the GFF; it is stored in run.log.
I can read in the log that there are MV variants (1829) that could not be included, but this number does not match the 2535 that I counted from the sRNAbench microRNAannotation.txt file either. I guess something else is missing here. Thank you very much in advance.
Hi there, I have just realized that I made a mistake in the previous comment. The 1829 MV variants reported in run.log when generating the GFF must be single entries, not reads: each entry can carry multiple reads mapping to a given miRNA with a given variant type. This number comes from parsing the sRNAbench results for all the miRNAs identified in this sample and condition, so it has nothing to do with the 2535 reads that are missing for the one example miRNA. I have checked the total number of entries (= lines) in the microRNAannotation file for this sample that contain mv in the variant annotation, alone or combined with others (e.g. mv, mv$lv3p, ...), and it comes to 1821. (That is not exactly the number reported, but it is very close.) Previously I only attached to this issue, as an example, a couple of miRNA annotations with a clear imbalance in total counts between sRNAbench and miRTop. If you think it is necessary, I can send you the whole files generated by sRNAbench, or at least the complete microRNAannotation and reads.annotation files. Thanks
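To make the distinction between entries and reads concrete, here is a minimal sketch. It assumes whitespace-separated lines with the read count in the sixth field (as in the `awk` commands in the issue description) and a variant label appearing somewhere on the line. The function name `entries_and_reads` is hypothetical, and the sample rows are invented for illustration; they are not real sRNAbench output.

```python
def entries_and_reads(lines, tag, count_field=6):
    """Return (number of lines containing `tag`, sum of their read counts).

    Assumption: whitespace-separated fields, read count in field 6 (1-based).
    """
    entries = 0
    reads = 0
    for line in lines:
        fields = line.split()
        if tag not in line or len(fields) < count_field:
            continue
        entries += 1
        reads += int(fields[count_field - 1])
    return entries, reads


# Invented rows: two entries annotated with mv, carrying 40 reads in total.
sample = [
    "seqA hsa-miR-10b-5p - - mv 30",
    "seqB hsa-miR-10b-5p - - mv$lv3p 10",
    "seqC hsa-miR-10b-5p - - lv3p 500",
]
print(entries_and_reads(sample, "mv"))  # (2, 40)
```

A count such as 1829 or 1821 corresponds to the first element of the tuple, while 2535 corresponds to the second element restricted to one miRNA.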
Hi there,
We (@lsumoy and I) have found an unexpected result from miRTop when parsing sRNAbench data. It is related to the previous issue #53 but not identical, which is why we opened a new one.
We came across this because we are working on turning miRTop results into a matrix that could be useful for DE, containing all the information on canonical, mature, variant and license plate annotations. As stated before in #53 (comment), we might be interested in contributing to the code.
We think the issue shown here should be fixed by someone familiar with the miRTop code, so we are reporting it.
Expected behavior and actual behavior.
We have found a discrepancy in the summed expression counts per microRNA isomiR when sRNAbench output is parsed to a miRTop GFF.
I have run some tests, and they all indicate that the parser skips the variant type "mv" (multiple variants), among others, when reading microRNAannotation.txt and reads.annotation from sRNAbench. This produces an imbalance in the total counts:
e.g. hsa-miR-10b-5p
sRNAbench: 61136
miRTop: 58429
Steps to reproduce the problem.
I have attached a couple of examples in these files, including the one shown below:
microRNAannotation.txt
reads.annotation.txt
jsanchez@cacau:test$ grep 'hsa-miR-10b-5p' microRNAannotation.txt | awk '{sum += $6} END {print sum}'
61136
jsanchez@cacau:test$ grep 'hsa-miR-10b-5p' microRNAannotation.txt | grep -v 'mv' | awk '{sum += $6} END {print sum}'
58601
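For anyone who prefers to run the same check from Python, the two shell pipelines above can be sketched as follows. Like the `awk` call, this assumes whitespace-separated fields with the read count in the sixth field. `sum_counts` is a hypothetical helper name and is not part of mirtop or sRNAbench.

```python
def sum_counts(lines, mirna, exclude=None, count_field=6):
    """Sum the read-count field over lines that mention `mirna`.

    Mirrors `grep mirna | grep -v exclude | awk '{sum += $6}'`.
    Assumption: whitespace-separated fields, count in field 6 (1-based).
    """
    total = 0
    for line in lines:
        if mirna not in line:
            continue
        if exclude is not None and exclude in line:
            continue
        fields = line.split()
        if len(fields) < count_field:
            continue  # too few fields to hold a count
        try:
            total += int(fields[count_field - 1])
        except ValueError:
            continue  # non-numeric field, e.g. a header line
    return total
```

Passing the lines of microRNAannotation.txt with `mirna="hsa-miR-10b-5p"` corresponds to the first command, and adding `exclude="mv"` corresponds to the second.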
I repeated the same command for the different variant types identified for this microRNA and sample:
lv3p: 53452
nta: 3233
mv: 2535
exact: 1391
lv5p: 439
exactNucVar: 84
mlv3p: 2
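The figures reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this issue; the variable names are chosen here for illustration.

```python
# Read counts for hsa-miR-10b-5p as quoted in this issue.
srnabench_total = 61136
srnabench_without_mv = 58601
mirtop_total = 58429

per_variant = {
    "lv3p": 53452, "nta": 3233, "mv": 2535, "exact": 1391,
    "lv5p": 439, "exactNucVar": 84, "mlv3p": 2,
}

variant_sum = sum(per_variant.values())
mv_gap = srnabench_total - srnabench_without_mv
remaining_gap = srnabench_without_mv - mirtop_total

print(variant_sum)    # 61136
print(mv_gap)         # 2535
print(remaining_gap)  # 172
```

The per-variant counts add up to the sRNAbench total, and `mv` accounts for exactly the difference between the two `awk` sums. A further 172 reads separate the non-`mv` sum from the miRTop total, so `mv` alone does not explain the whole imbalance.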
I cannot clearly see what is going on or what is missing here. I cannot reproduce the total count, so I guess that, apart from the mv variants, some other variants are also not being included.
I also include here the gff file generated by miRTop.
miRTop.gff.txt
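The miRTop side of the comparison can be computed from the GFF in a similar way. The following is only a sketch under assumptions that should be checked against the attached file: nine tab-separated columns, attributes in the ninth column separated by `;` and written either as `Key value` or `Key=value`, a `Name` attribute holding the miRNA name, and an `Expression` attribute holding comma-separated counts. `gff_total` is a hypothetical helper, not a mirtop function.

```python
import re


def gff_total(lines, mirna):
    """Sum Expression values for records whose Name attribute equals `mirna`.

    Assumed layout (verify on your own file): 9 tab-separated columns,
    ';'-separated attributes in column 9, 'Key value' or 'Key=value' pairs,
    Expression as comma-separated counts (all samples are added together).
    """
    total = 0
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = {}
        for item in cols[8].split(";"):
            item = item.strip()
            if not item:
                continue
            # Split on the first space or '=' only.
            parts = re.split(r"[ =]", item, maxsplit=1)
            if len(parts) == 2:
                attrs[parts[0]] = parts[1].strip()
        if attrs.get("Name") != mirna:
            continue
        values = attrs.get("Expression", "").split(",")
        total += sum(int(v) for v in values if v.strip())
    return total
```

If these assumptions hold for the attached file, calling it with `mirna="hsa-miR-10b-5p"` should give the miRTop figure to compare against the sRNAbench sum.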
Specifications like the version of the project, operating system, or hardware.
We are running this on:
debian8.10
linux
python2.7
mirtop (0.3.17)
miRBase v22
genome-build-id: GRCh38
genome-build-accession: NCBI_Assembly:GCA_000001405.15