V1.8 prerelease (#463)

* python3 port
* Delete .DS_Store
* a few updates
* handle signalP 5.0 parsing -- not tested yet
* more updates to support signalp 5.0 as well as input results at command line
* bump version and fix typo
* string format error in signalp
* ensure no empty sequences converting from bam to fasta
* more py2/3 fixes -- remove pybam in favor of samtools parsing
* DBxref to Dbxref in GFF3
* push some @photocyte PR changes to py3 dev branch, untested
* small doc fix in argparse usage
* try to fix unicode error in setupDB
* fix download in test
* update download print stmnt
* fix typo in predict
* fix genemark path call
* fix typo in exonerate_pident
* try to fix typos again
* fix p2g cmd
* fix genemark logic if not in path
* more messing with genemark stupid
* fix iprscan download for docker
* remove sam2bam.sh from library map
* fix typo in bam2gff3
* add busco_seed_species local directory check before running busco
* fix another type in bam parser
* remove .lib from library dummy
* add back augustus generic to local folder
* remove sam2bam.sh calls with subprocess.Popen
* add lowercase to snap; start to move downloads to json
* load download links from file
* add status msg to download json file
* fix a few typos in the bam alignment
* fix minimap2_cmd typo
* global FNULL
* fix indent error
* troubleshooting compare
* troubleshooting compare
* troubleshooting compare again
* troubleshooting compare yet again
* troubleshooting compare yet yet again
* troubleshooting compare yet yet again; try loc
* pandas bug fix, roll back debug print stmnts
* code linting
* update menu with p2g options
* typo - identidy to identity
* support explicit number of threads for tRNAscan-SE runs
* support explicit number of threads for tRNAscan-SE runs #411
* force to string for cpu arg
* fix trnascan cpus
* update dbCAN to v8
* filter cazy subdomains out if exist
* update EVM script to try to partition in between putative gene loci
* EVM update; PASA updated options; predict simplified busco
* fix typos in setup; add interpro fix
* incorporate py2 backwards compatibility; fix to interpo mapping terms
* updates to resources and typo fix
* update pip install instructions, no longer py2 only
* update setup menu with --local option
* fix --rename option in annotate
* add io.open to setup for py2 compatibility
* updateTBL for error catching
* updateTBL for error catching
* updateTBL for error catching 2
* add ncRNA parsing in tbl2dict
* test fix for antiSMASH v5 parsing
* support for ncRNA in gff output
* dynamic output of annotation table for custom headers
* unify gb and gff3 dictionary
* fix resources COGS bug
* dont let smcogs be their own column
* write synonyms/alias to tbl file
* preference to custom annotations over automated
* make sure synonyms are not name and do not repeat
* make sure synonyms func update
* make sure synonyms func update not working, print to debug
* clean func diff logic
* clean func diff logic
* clean func working, remove debug
* fix embarassing typo for cazymes
* troubleshoot cazy subdomain reduction
* fix cazy parsing
* exit on parse error for custom
* exit on parse error for custom
* fix tbl parser for rRNA features
* fix minimap2 bam2hints function, hopefully fix #444
* add alias= parsing for gff2dict
* just some formatting changes to code
* make sure synonyms are not duplcated in gff3 parsing
* do not enumerate gene names if from alt transcripts
* do not enumerate gene names if from alt transcripts
* add debug print stmnt to naming method
* hopfully fixed, forgot a len()
* annotation table error to terminal
* annotation table error to terminal
* annotation table fix for note field
* fix gff3 parsing skip all features it cant parse
* updates to support strange genbank GFF3 files
* fix for gbk parsing to capture misc_RNA as ncRNA
* fix typo on pasa DB name
* fix pasa function in update for string
* same str change in train
* try to fix gene number count if atypical locus_tag, #453
* minor update to latest fix
* minor update to latest fix
* update logging for GO enrichment in compare
* better error handling for subprocess calls
* fix py3 incompatibility in annotationtable
* if --organism=other then make codingquarry default off unless valid --weights passed
* specify the pasa conf file in env variable
* replace . from names as well in dbname
* require distro for linux_distribution info for python 3.8 and beyond
* remove merge comments for commandline options in update

Co-authored-by: Jon Palmer <nextgenusfs@gmail.com>
Co-authored-by: Jason Stajich <jasonstajich.phd@gmail.com>
Co-authored-by: Jason Stajich <jason.stajich@ucr.edu>
nextgenusfs · Aug 9, 2020 · d5d21b3
1 parent 9e8483b
commit d5d21b3
Showing 47 changed files with 6,488 additions and 2,576 deletions.
diff --git a/.gitignore b/.gitignore
@@ -4,3 +4,4 @@
 */*.pyc
 dockerbuild/
 sample_data/
+.DS_Store
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -7,4 +7,5 @@ include funannotate/utilities/*
 include funannotate/html_template/*
 include funannotate/html_template/css/*
 include funannotate/html_template/js/*
-include scripts/*
+include scripts/*
+include funannotate/downloads.json
diff --git a/README.md b/README.md
@@ -19,15 +19,15 @@ conda create -n funannotate funannotate
 If you want to use GeneMark-ES/ET you will need to install that manually following developers instructions:
 http://topaz.gatech.edu/GeneMark/license_download.cgi
 
-Note that you will need to change the shebang line for all perl scripts in GeneMark to use `/usr/bin/env perl`. 
+Note that you will need to change the shebang line for all perl scripts in GeneMark to use `/usr/bin/env perl`.
 You will then also need to add `gmes_petap.pl` to the $PATH or set the environmental variable $GENEMARK_PATH to the gmes_petap directory.
 
 To install just the python funannotate package, you can do this with pip:
 ```
-pip install funannotate
+python -m pip install funannotate
 ```
 
 To install the most updated code in master you can run:
 ```
-python2 -m pip install git+https://github.com/nextgenusfs/funannotate.git
+python -m pip install git+https://github.com/nextgenusfs/funannotate.git
 ```
diff --git a/docs/conda.rst b/docs/conda.rst
@@ -54,7 +54,7 @@ I'd really like to build a bioconda installation package, but would need some he
     conda create -y -n funannotate python=2.7 numpy pandas scipy matplotlib seaborn \
         natsort scikit-learn psutil biopython requests blast rmblast goatools fisher \
         bedtools hmmer exonerate diamond>=0.9 tbl2asn ucsc-pslcdnafilter \
-        samtools raxml trimal mafft>=7 iqtree kallisto bowtie2 infernal mummer \
+        samtools raxml trimal mafft>=7 iqtree kallisto>=0.46.0 bowtie2 infernal mummer \
         evidencemodeler  gmap=2017.11.15 hisat2 blat minimap2 snap glimmerhmm  \
         ete3 salmon>=0.9 jellyfish>=2.2 htslib trnascan-se codingquarry \
         trf perl-threaded perl-db-file perl-bioperl perl-dbd-mysql perl-dbd-sqlite \

diff --git a/funannotate/__version__.py b/funannotate/__version__.py
@@ -1,3 +1,3 @@
-VERSION = (1, 7, 4)
+VERSION = (1, 8, 0)
 
-__version__ = '.'.join(map(str, VERSION))
+__version__ = '.'.join(map(str, VERSION))
diff --git a/funannotate/annotate.py b/funannotate/annotate.py
diff --git a/funannotate/aux_scripts/augustus_parallel.py b/funannotate/aux_scripts/augustus_parallel.py
@@ -16,7 +16,8 @@ def __init__(self, prog):
         super(MyFormatter, self).__init__(prog, max_help_position=48)
 
 
-parser = argparse.ArgumentParser(prog='augustus_parallel.py', usage="%(prog)s [options] -i genome.fasta -s botrytis_cinera -o prediction_output_base",
+parser = argparse.ArgumentParser(prog='augustus_parallel.py',
+                                 usage="%(prog)s [options] -i genome.fasta -s botrytis_cinera -o prediction_output_base",
                                  description='''Script runs augustus in parallel to use multiple processors''',
                                  epilog="""Written by Jon Palmer (2016) nextgenusfs@gmail.com""",
                                  formatter_class=MyFormatter)
@@ -31,8 +32,8 @@ def __init__(self, prog):
                     help='Number of CPUs to run')
 parser.add_argument('-v', '--debug', action='store_true',
                     help='Keep intermediate files')
-parser.add_argument('--logfile', default='augustus-parallel.log', 
-					help='logfile')
+parser.add_argument('--logfile', default='augustus-parallel.log',
+                    help='logfile')
 parser.add_argument('--local_augustus')
 parser.add_argument('--AUGUSTUS_CONFIG_PATH')
 parser.add_argument('-e', '--extrinsic', help='augustus extrinsic file')
@@ -67,7 +68,7 @@ def __init__(self, prog):
 
 def countGFFgenes(input):
     count = 0
-    with open(input, 'rU') as f:
+    with open(input, 'r') as f:
         for line in f:
             if "\tgene\t" in line:
                 count += 1
@@ -116,13 +117,13 @@ def runAugustus(Input):
 scaffolds = []
 global ranges
 ranges = {}
-with open(args.input, 'rU') as InputFasta:
+with open(args.input, 'r') as InputFasta:
     for record in SeqIO.parse(InputFasta, 'fasta'):
         contiglength = len(record.seq)
         if contiglength > 500000:  # split large contigs
             num_parts = contiglength / 500000 + 1
             chunks = contiglength / num_parts
-            for i in range(0, num_parts):
+            for i in range(0, int(num_parts)):
                 name = str(record.id)+'_part'+str(i+1)
                 scaffolds.append(name)
                 outputfile = os.path.join(tmpdir, str(record.id)+'.fa')
@@ -174,7 +175,7 @@ def runAugustus(Input):
 lib.log.debug(cmd)
 
 with open(args.out, 'w') as finalout:
-    with open(os.path.join(tmpdir, 'augustus_all.gff3'), 'rU') as infile:
+    with open(os.path.join(tmpdir, 'augustus_all.gff3'), 'r') as infile:
         subprocess.call([join_script], stdin=infile, stdout=finalout)
 
 if not args.debug:
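
Review note on the `num_parts` hunk above: under Python 3, `/` is true division, so `num_parts` and `chunks` are floats there, and only the `range()` argument is cast back to `int`. A minimal sketch of an alternative using floor division is shown below. `split_ranges` and `max_chunk` are hypothetical names used for illustration only, and the start/end arithmetic is an assumption, since this hunk does not show how funannotate derives each part's coordinates.

```python
def split_ranges(contig_length, max_chunk=500000):
    """Return (start, end) pairs covering a contig in roughly equal parts.

    Hypothetical helper, not part of funannotate. Floor division (//)
    keeps every value an int on both Python 2 and Python 3.
    """
    if contig_length <= max_chunk:
        # Short contigs are left as a single part.
        return [(0, contig_length)]
    num_parts = contig_length // max_chunk + 1
    chunk = contig_length // num_parts
    ranges = []
    for i in range(num_parts):
        start = i * chunk
        # The last part absorbs any remainder left by floor division.
        end = contig_length if i == num_parts - 1 else start + chunk
        ranges.append((start, end))
    return ranges
```

With this approach `num_parts` can be passed to `range()` directly, and the part count matches what Python 2 integer division would have produced for the same contig length.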

diff --git a/funannotate/aux_scripts/enrichment_parallel.py b/funannotate/aux_scripts/enrichment_parallel.py
@@ -10,19 +10,24 @@
 def runGOenrichment(input):
     basename = os.path.basename(input).replace('.txt', '')
     goa_out = os.path.join(args.out, basename+'.go.enrichment.txt')
+    go_log = os.path.join(args.out, basename+'.go.enrichment.log')
     if not lib.checkannotations(goa_out):
         cmd = ['find_enrichment.py', '--obo', os.path.join(FUNDB, 'go.obo'),
-               '--pval', '0.001', '--alpha', '0.001', '--method', 'fdr', '--outfile', goa_out,
-               input, os.path.join(args.input, 'population.txt'), os.path.join(args.input, 'associations.txt')]
-        subprocess.call(cmd, stdout=FNULL, stderr=FNULL)
+               '--pval', '0.001', '--alpha', '0.001', '--method', 'fdr',
+               '--outfile', goa_out, input, os.path.join(args.input, 'population.txt'),
+               os.path.join(args.input, 'associations.txt')]
+        with open(go_log, 'w') as outfile:
+            outfile.write('{}\n'.format(' '.join(cmd)))
+        with open(go_log, 'a') as outfile:
+            subprocess.call(cmd, stdout=outfile, stderr=outfile)
 
 
 def GO_safe_run(*args, **kwargs):
     """Call run(), catch exceptions."""
     try:
         runGOenrichment(*args, **kwargs)
     except Exception as e:
-        print("error: %s run(*%r, **%r)" % (e, args, kwargs))
+        print(("error: %s run(*%r, **%r)" % (e, args, kwargs)))
 
 # setup menu with argparse
 
@@ -57,7 +62,10 @@ def __init__(self, prog):
     if f.startswith('population'):
         continue
     file = os.path.join(args.input, f)
-    file_list.append(file)
+    if lib.checkannotations(file):
+        file_list.append(file)
+    else:
+        print('  WARNING: skipping {} as no GO terms'.format(f))
 
 # run over multiple CPUs
 if len(file_list) > args.cpus: