New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Different minimap2 alignment results of different FunAnnotate version #444
Comments
Thanks for reporting -- likely a bug in the master code. I added the |
The BAM file is not empty, but I notice the file size of this file increase by 1byte compared to my previous run of v1.7.2 |
okay, so |
And if you wouldn't mind checking the header for both of them? |
BAM from v1.8.0 has 36947 lines; from v1.7.2 has 37492 lines |
Okay -- that means its not there, its in the processing of the bam to gff3 (alignments). Its likely then a python2/3 issue -- which version of python? |
Python 2.7 |
Hmm, okay. Well when I test it locally I'm not seeing a problem. You can test the bam2gff3 conversion in the utility menu, ie.
I guess I didn't ask if the |
bam2gff3 could generate non-empty gff3 results |
So if the |
Hi, I have the same issue. After checking the codes in I'm not sure if I fixed it correctly , but it works for me. |
@lyl8086 thanks -- I think you are right, those look like leftover errors from using pybam which is now just samtools for compatibility. I appreciate the help. |
Please reopen if after installing master branch and it is still a problem. |
* python3 port * Delete .DS_Store * a few updates * handle signalP 5.0 parsing -- not tested yet * more updates to support signalp 5.0 as well as input results at command line * bump version and fix typo * string format error in signalp * ensure no empty sequences converting from bam to fasta * more py2/3 fixes -- remove pybam in favor of samtools parsing * DBxref to Dbxref in GFF3 * push some @photocyte PR changes to py3 dev branch, untested * small doc fix in argparse usage * try to fix unicode error in setupDB * fix download in test * update download print stmnt * fix typo in predict * fix genemark path call * fix typo in exonerate_pident * try to fix typos again * fix p2g cmd * fix genemark logic if not in path * more messing with genemark stupid * fix iprscan download for docker * remove sam2bam.sh from library map * fix typo in bam2gff3 * add busco_seed_species local directory check before running busco * fix another type in bam parser * remove .lib from library dummy * add back augustus generic to local folder * remove sam2bam.sh calls with subprocess.Popen * add lowercase to snap; start to move downloads to json * load download links from file * add status msg to download json file * fix a few typos in the bam alignment * fix minimap2_cmd typo * global FNULL * fix indent error * troubleshooting compare * troubleshooting compare * troubleshooting compare again * troubleshooting compare yet again * troubleshooting compare yet yet again * troubleshooting compare yet yet again; try loc * pandas bug fix, roll back debug print stmnts * code linting * update menu with p2g options * typo - identidy to identity * support explicit number of threads for tRNAscan-SE runs * support explicit number of threads for tRNAscan-SE runs #411 * force to string for cpu arg * fix trnascan cpus * update dbCAN to v8 * filter cazy subdomains out if exist * update EVM script to try to partition in between putative gene loci * EVM update; PASA updated options; predict simplified busco * fix typos in setup; add interpro fix * incorporate py2 backwards compatibility; fix to interpo mapping terms * updates to resources and typo fix * update pip install instructions, no longer py2 only * update setup menu with --local option * fix --rename option in annotate * add io.open to setup for py2 compatibility * updateTBL for error catching * updateTBL for error catching * updateTBL for error catching 2 * add ncRNA parsing in tbl2dict * test fix for antiSMASH v5 parsing * support for ncRNA in gff output * dynamic output of annotation table for custom headers * unify gb and gff3 dictionary * fix resources COGS bug * dont let smcogs be their own column * write synonyms/alias to tbl file * preference to custom annotations over automated * make sure synonyms are not name and do not repeat * make sure synonyms func update * make sure synonyms func update not working, print to debug * clean func diff logic * clean func diff logic * clean func working, remove debug * fix embarassing typo for cazymes * troubleshoot cazy subdomain reduction * fix cazy parsing * exit on parse error for custom * exit on parse error for custom * fix tbl parser for rRNA features * fix minimap2 bam2hints function, hopefully fix #444 * add alias= parsing for gff2dict * just some formatting changes to code * make sure synonyms are not duplcated in gff3 parsing * do not enumerate gene names if from alt transcripts * do not enumerate gene names if from alt transcripts * add debug print stmnt to naming method * hopfully fixed, forgot a len() * annotation table error to terminal * annotation table error to terminal * annotation table fix for note field * fix gff3 parsing skip all features it cant parse * updates to support strange genbank GFF3 files * fix for gbk parsing to capture misc_RNA as ncRNA * fix typo on pasa DB name * fix pasa function in update for string * same str change in train * try to fix gene number count if atypical locus_tag, #453 * minor update to latest fix * minor update to latest fix * update logging for GO enrichment in compare * better error handling for subprocess calls * fix py3 incompatibility in annotationtable * if --organism=other then make codingquarry default off unless valid --weights passed * specify the pasa conf file in env variable * replace . from names as well in dbname * require distro for linux_distribution info for python 3.8 and beyond * remove merge comments for commandline options in update Co-authored-by: Jon Palmer <nextgenusfs@gmail.com> Co-authored-by: Jason Stajich <jasonstajich.phd@gmail.com> Co-authored-by: Jason Stajich <jason.stajich@ucr.edu>
When I used FunAnnotate v1.7.2, I was able to align transcripts to genome with minimap2, log as
`[06/22/20 12:34:13]: Aligning transcript evidence to genome with minimap2
[06/22/20 12:34:13]: /home/qliang/0.soft/miniconda2/envs/funannotate/lib/python2.7/site-packages/funannotate/aux_scripts/sam2bam.sh minimap2 -ax splice -t 36 --cs -u b -G 3000 /home/qliang/results_minimap/predict_misc/genome.softmasked.fa results_minimap/predict_misc/transcripts.combined.fa 4 results_minimap/predict_misc/transcripts.minimap2.bam
[06/22/20 12:34:18]: [M::mm_idx_gen::0.4261.00] collected minimizers
[M::mm_idx_gen::0.4962.30] sorted minimizers
[M::main::0.4962.30] loaded/built the index for 21 target sequence(s)
[M::mm_mapopt_update::0.5402.19] mid_occ = 160
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 21
[M::mm_idx_stat::0.5722.12] distinct minimizers: 2337210 (92.46% are singletons); average occurrences: 1.407; average spacing: 2.929
[M::worker_pipeline::4.27019.98] mapped 30940 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax splice -t 36 --cs -u b -G 3000 /home/qliang/results_minimap/predict_misc/genome.softmasked.fa results_minimap/predict_misc/transcripts.combined.fa
[M::main] Real time: 4.290 sec; CPU: 85.338 sec; Peak RSS: 0.511 GB
[bam_sort_core] merging from 0 files and 4 in-memory blocks...
[06/22/20 12:34:25]: Found 20,273 alignments, wrote GFF3 and Augustus hints to file`
I updated FunAnnotate to v1.8.0 and repeated with same files/settings, but transcripts alignments were different with 0 alignments. Log as
[06/29/20 12:21:03]: Aligning transcript evidence to genome with minimap2 [06/29/20 12:21:03]: minimap2 -ax splice -t 36 --cs -u b -G 3000 /home/qliang/results_minimap/predict_misc/genome.softmasked.fa results_minimap/predict_misc/transcripts.combined.fa | samtools sort --reference /home/qliang/results_minimap/predict_misc/genome.softmasked.fa -@ 4 -o results_minimap/predict_misc/transcripts.minimap2.bam - [06/29/20 12:22:00]: Found 0 alignments, wrote GFF3 and Augustus hints to file
I ran the cmd in log
minimap2 -ax splice -t 36 --cs -u b -G 3000 /home/qliang/results_minimap/predict_misc/genome.softmasked.fa results_minimap/predict_misc/transcripts.combined.fa | samtools sort --reference /home/qliang/results_minimap/predict_misc/genome.softmasked.fa -@ 4 -o results_minimap/predict_misc/transcripts.minimap2.bam -
The output was
[M::mm_idx_gen::0.414*1.00] collected minimizers [M::mm_idx_gen::0.480*2.30] sorted minimizers [M::main::0.480*2.30] loaded/built the index for 21 target sequence(s) [M::mm_mapopt_update::0.520*2.20] mid_occ = 160 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 21 [M::mm_idx_stat::0.548*2.14] distinct minimizers: 2337210 (92.46% are singletons); average occurrences: 1.407; average spacing: 2.929 [M::worker_pipeline::3.805*22.79] mapped 30940 sequences [M::main] Version: 2.17-r941 [M::main] CMD: minimap2 -ax splice -t 36 --cs -u b -G 3000 /home/qliang/results_minimap/predict_misc/genome.softmasked.fa results_minimap/predict_misc/transcripts.combined.fa [M::main] Real time: 3.828 sec; CPU: 86.715 sec; Peak RSS: 0.573 GB [bam_sort_core] merging from 0 files and 4 in-memory blocks...
It seems like there is no explicit error of running minimap2, but it is weird that there is no alignment at all for transcripts2genome. Could you help troubleshooting this? Many thanks!
The text was updated successfully, but these errors were encountered: