V1.8 prerelease (#463)
* python3 port

* Delete .DS_Store

* a few updates

* handle signalP 5.0 parsing -- not tested yet

* more updates to support signalp 5.0 as well as input results at command line

* bump version and fix typo

* string format error in signalp

* ensure no empty sequences converting from bam to fasta

* more py2/3 fixes -- remove pybam in favor of samtools parsing

* DBxref to Dbxref in GFF3

* push some @photocyte PR changes to py3 dev branch, untested

* small doc fix in argparse usage

* try to fix unicode error in setupDB

* fix download in test

* update download print stmnt

* fix typo in predict

* fix genemark path call

* fix typo in exonerate_pident

* try to fix typos again

* fix p2g cmd

* fix genemark logic if not in path

* more messing with genemark stupid

* fix iprscan download for docker

* remove sam2bam.sh from library map

* fix typo in bam2gff3

* add busco_seed_species local directory check before running busco

* fix another typo in bam parser

* remove .lib from library dummy

* add back augustus generic to local folder

* remove sam2bam.sh calls with subprocess.Popen

* add lowercase to snap; start to move downloads to json

* load download links from file

* add status msg to download json file

* fix a few typos in the bam alignment

* fix minimap2_cmd typo

* global FNULL

* fix indent error

* troubleshooting compare

* troubleshooting compare

* troubleshooting compare again

* troubleshooting compare yet again

* troubleshooting compare yet yet again

* troubleshooting compare yet yet again; try loc

* pandas bug fix, roll back debug print stmnts

* code linting

* update menu with p2g options

* typo - identidy to identity

* support explicit number of threads for tRNAscan-SE runs

* support explicit number of threads for tRNAscan-SE runs #411

* force to string for cpu arg

* fix trnascan cpus

* update dbCAN to v8

* filter cazy subdomains out if exist

* update EVM script to try to partition in between putative gene loci

* EVM update; PASA updated options; predict simplified busco

* fix typos in setup; add interpro fix

* incorporate py2 backwards compatibility; fix to interpro mapping terms

* updates to resources and typo fix

* update pip install instructions, no longer py2 only

* update setup menu with --local option

* fix --rename option in annotate

* add io.open to setup for py2 compatibility

* updateTBL for error catching

* updateTBL for error catching

* updateTBL for error catching 2

* add ncRNA parsing in tbl2dict

* test fix for antiSMASH v5 parsing

* support for ncRNA in gff output

* dynamic output of annotation table for custom headers

* unify gb and gff3 dictionary

* fix resources COGS bug

* don't let smcogs be their own column

* write synonyms/alias to tbl file

* preference to custom annotations over automated

* make sure synonyms are not name and do not repeat

* make sure synonyms func update

* make sure synonyms func update not working, print to debug

* clean func diff logic

* clean func diff logic

* clean func working, remove debug

* fix embarrassing typo for cazymes

* troubleshoot cazy subdomain reduction

* fix cazy parsing

* exit on parse error for custom

* exit on parse error for custom

* fix tbl parser for rRNA features

* fix minimap2 bam2hints function, hopefully fix #444

* add alias= parsing for gff2dict

* just some formatting changes to code

* make sure synonyms are not duplicated in gff3 parsing

* do not enumerate gene names if from alt transcripts

* do not enumerate gene names if from alt transcripts

* add debug print stmnt to naming method

* hopefully fixed, forgot a len()

* annotation table error to terminal

* annotation table error to terminal

* annotation table fix for note field

* fix gff3 parsing skip all features it can't parse

* updates to support strange genbank GFF3 files

* fix for gbk parsing to capture misc_RNA as ncRNA

* fix typo on pasa DB name

* fix pasa function in update for string

* same str change in train

* try to fix gene number count if atypical locus_tag, #453

* minor update to latest fix

* minor update to latest fix

* update logging for GO enrichment in compare

* better error handling for subprocess calls

* fix py3 incompatibility in annotationtable

* if --organism=other then make codingquarry default off unless valid --weights passed

* specify the pasa conf file in env variable

* replace . from names as well in dbname

* require distro for linux_distribution info for python 3.8 and beyond

* remove merge comments for commandline options in update

Co-authored-by: Jon Palmer <nextgenusfs@gmail.com>
Co-authored-by: Jason Stajich <jasonstajich.phd@gmail.com>
Co-authored-by: Jason Stajich <jason.stajich@ucr.edu>
4 people committed Aug 9, 2020
1 parent 9e8483b commit d5d21b3
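One of the commit messages notes that `distro` is now required for `linux_distribution` information on Python 3.8 and later; `platform.linux_distribution()` was removed from the standard library in 3.8. A minimal sketch of that kind of fallback, assuming the third-party `distro` package may or may not be installed. The helper name `linux_distribution` is illustrative, not necessarily how funannotate wires it up:

```python
import platform


def linux_distribution():
    """Return (name, version, codename) of the host distribution as strings.

    Sketch: prefer the third-party `distro` package and fall back to
    platform.linux_distribution(), which only exists before Python 3.8.
    """
    try:
        import distro  # optional dependency (pip install distro)
        return (distro.name(), distro.version(), distro.codename())
    except ImportError:
        pass
    legacy = getattr(platform, 'linux_distribution', None)
    if legacy is not None:
        return tuple(legacy())
    # neither source is available: report unknown values
    return ('', '', '')


info = linux_distribution()
print(info)
```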
Showing 47 changed files with 6,488 additions and 2,576 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@
 */*.pyc
 dockerbuild/
 sample_data/
+.DS_Store
3 changes: 2 additions & 1 deletion MANIFEST.in
@@ -7,4 +7,5 @@ include funannotate/utilities/*
 include funannotate/html_template/*
 include funannotate/html_template/css/*
 include funannotate/html_template/js/*
-include scripts/*
+include scripts/*
+include funannotate/downloads.json
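The `MANIFEST.in` change ships `funannotate/downloads.json` with the source distribution, matching the commit messages about loading download links from a JSON file. The schema of that file is not shown in this diff, so the sketch below assumes a flat JSON object mapping a database name to a URL; the helper `load_download_links`, the key `go.obo`, and the `example.org` URL are hypothetical placeholders:

```python
import json
import os
import tempfile


def load_download_links(path):
    """Return a dict of database name -> URL read from a JSON file.

    Assumes the file holds a single flat JSON object; the real schema of
    funannotate/downloads.json is not shown in this diff.
    """
    with open(path, 'r') as fh:
        links = json.load(fh)
    if not isinstance(links, dict):
        raise ValueError('{} should contain a JSON object'.format(path))
    return links


# Demo with a throwaway file; the key and URL are placeholders, not real entries.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'downloads.json')
    with open(path, 'w') as fh:
        json.dump({'go.obo': 'https://example.org/go.obo'}, fh)
    links = load_download_links(path)

print(links['go.obo'])  # https://example.org/go.obo
```

Inside an installed package the path would usually be built relative to the module, e.g. `os.path.join(os.path.dirname(os.path.abspath(__file__)), 'downloads.json')`. `MANIFEST.in` only controls the source distribution; whether the file is also installed depends on the package-data settings in `setup.py`, which this diff does not show.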
6 changes: 3 additions & 3 deletions README.md
@@ -19,15 +19,15 @@ conda create -n funannotate funannotate
 If you want to use GeneMark-ES/ET you will need to install that manually following developers instructions:
 http://topaz.gatech.edu/GeneMark/license_download.cgi
 
-Note that you will need to change the shebang line for all perl scripts in GeneMark to use `/usr/bin/env perl`.
+Note that you will need to change the shebang line for all perl scripts in GeneMark to use `/usr/bin/env perl`.
 You will then also need to add `gmes_petap.pl` to the $PATH or set the environmental variable $GENEMARK_PATH to the gmes_petap directory.
 
 To install just the python funannotate package, you can do this with pip:
 ```
-pip install funannotate
+python -m pip install funannotate
 ```
 
 To install the most updated code in master you can run:
 ```
-python2 -m pip install git+https://github.com/nextgenusfs/funannotate.git
+python -m pip install git+https://github.com/nextgenusfs/funannotate.git
 ```
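The README asks users to either put `gmes_petap.pl` on `$PATH` or set `$GENEMARK_PATH` to the directory containing it. A sketch of a lookup that honors both, checking `$GENEMARK_PATH` first. The function `find_gmes_petap` is hypothetical and is not funannotate's own GeneMark detection code:

```python
import os
import shutil
import tempfile


def find_gmes_petap(env=None):
    """Return the path to gmes_petap.pl, or None if it cannot be located.

    $GENEMARK_PATH (a directory) is checked first, then the executables on $PATH.
    """
    env = os.environ if env is None else env
    gm_dir = env.get('GENEMARK_PATH')
    if gm_dir:
        candidate = os.path.join(gm_dir, 'gmes_petap.pl')
        if os.path.isfile(candidate):
            return candidate
    # shutil.which only returns files that are executable; an empty PATH yields None
    return shutil.which('gmes_petap.pl', path=env.get('PATH', ''))


with tempfile.TemporaryDirectory() as tmp:
    dummy = os.path.join(tmp, 'gmes_petap.pl')
    open(dummy, 'w').close()  # empty placeholder standing in for GeneMark
    found = find_gmes_petap({'GENEMARK_PATH': tmp})

missing = find_gmes_petap({})  # neither GENEMARK_PATH nor PATH set

print(found == dummy)  # True
print(missing)         # None
```

Passing `env` explicitly keeps the function testable without touching the real environment; calling it with no argument reads `os.environ`.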
2 changes: 1 addition & 1 deletion docs/conda.rst
@@ -54,7 +54,7 @@ I'd really like to build a bioconda installation package, but would need some he
 conda create -y -n funannotate python=2.7 numpy pandas scipy matplotlib seaborn \
 natsort scikit-learn psutil biopython requests blast rmblast goatools fisher \
 bedtools hmmer exonerate diamond>=0.9 tbl2asn ucsc-pslcdnafilter \
-samtools raxml trimal mafft>=7 iqtree kallisto bowtie2 infernal mummer \
+samtools raxml trimal mafft>=7 iqtree kallisto>=0.46.0 bowtie2 infernal mummer \
 evidencemodeler gmap=2017.11.15 hisat2 blat minimap2 snap glimmerhmm \
 ete3 salmon>=0.9 jellyfish>=2.2 htslib trnascan-se codingquarry \
 trf perl-threaded perl-db-file perl-bioperl perl-dbd-mysql perl-dbd-sqlite \
4 changes: 2 additions & 2 deletions funannotate/__version__.py
@@ -1,3 +1,3 @@
-VERSION = (1, 7, 4)
+VERSION = (1, 8, 0)
 
-__version__ = '.'.join(map(str, VERSION))
+__version__ = '.'.join(map(str, VERSION))
198 changes: 131 additions & 67 deletions funannotate/annotate.py

Large diffs are not rendered by default.

15 changes: 8 additions & 7 deletions funannotate/aux_scripts/augustus_parallel.py
@@ -16,7 +16,8 @@ def __init__(self, prog):
 super(MyFormatter, self).__init__(prog, max_help_position=48)
 
 
-parser = argparse.ArgumentParser(prog='augustus_parallel.py', usage="%(prog)s [options] -i genome.fasta -s botrytis_cinera -o prediction_output_base",
+parser = argparse.ArgumentParser(prog='augustus_parallel.py',
+usage="%(prog)s [options] -i genome.fasta -s botrytis_cinera -o prediction_output_base",
 description='''Script runs augustus in parallel to use multiple processors''',
 epilog="""Written by Jon Palmer (2016) nextgenusfs@gmail.com""",
 formatter_class=MyFormatter)
@@ -31,8 +32,8 @@ def __init__(self, prog):
 help='Number of CPUs to run')
 parser.add_argument('-v', '--debug', action='store_true',
 help='Keep intermediate files')
-parser.add_argument('--logfile', default='augustus-parallel.log',
-help='logfile')
+parser.add_argument('--logfile', default='augustus-parallel.log',
+help='logfile')
 parser.add_argument('--local_augustus')
 parser.add_argument('--AUGUSTUS_CONFIG_PATH')
 parser.add_argument('-e', '--extrinsic', help='augustus extrinsic file')
@@ -67,7 +68,7 @@ def __init__(self, prog):
 
 def countGFFgenes(input):
 count = 0
-with open(input, 'rU') as f:
+with open(input, 'r') as f:
 for line in f:
 if "\tgene\t" in line:
 count += 1
@@ -116,13 +117,13 @@ def runAugustus(Input):
 scaffolds = []
 global ranges
 ranges = {}
-with open(args.input, 'rU') as InputFasta:
+with open(args.input, 'r') as InputFasta:
 for record in SeqIO.parse(InputFasta, 'fasta'):
 contiglength = len(record.seq)
 if contiglength > 500000: # split large contigs
 num_parts = contiglength / 500000 + 1
 chunks = contiglength / num_parts
-for i in range(0, num_parts):
+for i in range(0, int(num_parts)):
 name = str(record.id)+'_part'+str(i+1)
 scaffolds.append(name)
 outputfile = os.path.join(tmpdir, str(record.id)+'.fa')
@@ -174,7 +175,7 @@ def runAugustus(Input):
 lib.log.debug(cmd)
 
 with open(args.out, 'w') as finalout:
-with open(os.path.join(tmpdir, 'augustus_all.gff3'), 'rU') as infile:
+with open(os.path.join(tmpdir, 'augustus_all.gff3'), 'r') as infile:
 subprocess.call([join_script], stdin=infile, stdout=finalout)
 
 if not args.debug:
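Two Python 3 issues drive the changes in this file. First, the `'U'` mode flag is deprecated in Python 3 and removed in 3.11, and text mode already translates `\r\n` and `\r` line endings by default, so `'r'` is a drop-in replacement for `'rU'`. Second, `/` is true division in Python 3, so `contiglength / 500000 + 1` becomes a float that `range()` rejects; the diff wraps it in `int()`. The sketch below shows the same contig-splitting idea using floor division `//` so every value stays an integer. The helper `split_ranges` and its 1-based, non-overlapping `(start, end)` coordinates are assumptions for illustration; the diff does not show how the script builds its actual ranges.

```python
def split_ranges(length, max_chunk=500000):
    """Split a contig of `length` bp into near-equal 1-based (start, end) ranges.

    Contigs no longer than max_chunk are returned whole. Floor division (//)
    keeps num_parts and chunk as ints, so range() works on Python 2 and 3.
    """
    if length <= max_chunk:
        return [(1, length)]
    num_parts = length // max_chunk + 1
    chunk = length // num_parts
    ranges = []
    for i in range(num_parts):
        start = i * chunk + 1
        # the last part absorbs any remainder left by floor division
        end = length if i == num_parts - 1 else (i + 1) * chunk
        ranges.append((start, end))
    return ranges


print(1200000 / 500000 + 1)   # 3.4, a float under Python 3
print(split_ranges(1200000))  # [(1, 400000), (400001, 800000), (800001, 1200000)]
print(split_ranges(1000))     # [(1, 1000)]
```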
18 changes: 13 additions & 5 deletions funannotate/aux_scripts/enrichment_parallel.py
@@ -10,19 +10,24 @@
 def runGOenrichment(input):
 basename = os.path.basename(input).replace('.txt', '')
 goa_out = os.path.join(args.out, basename+'.go.enrichment.txt')
+go_log = os.path.join(args.out, basename+'.go.enrichment.log')
 if not lib.checkannotations(goa_out):
 cmd = ['find_enrichment.py', '--obo', os.path.join(FUNDB, 'go.obo'),
-'--pval', '0.001', '--alpha', '0.001', '--method', 'fdr', '--outfile', goa_out,
-input, os.path.join(args.input, 'population.txt'), os.path.join(args.input, 'associations.txt')]
-subprocess.call(cmd, stdout=FNULL, stderr=FNULL)
+'--pval', '0.001', '--alpha', '0.001', '--method', 'fdr',
+'--outfile', goa_out, input, os.path.join(args.input, 'population.txt'),
+os.path.join(args.input, 'associations.txt')]
+with open(go_log, 'w') as outfile:
+outfile.write('{}\n'.format(' '.join(cmd)))
+with open(go_log, 'a') as outfile:
+subprocess.call(cmd, stdout=outfile, stderr=outfile)
 
 
 def GO_safe_run(*args, **kwargs):
 """Call run(), catch exceptions."""
 try:
 runGOenrichment(*args, **kwargs)
 except Exception as e:
-print("error: %s run(*%r, **%r)" % (e, args, kwargs))
+print(("error: %s run(*%r, **%r)" % (e, args, kwargs)))
 
 # setup menu with argparse
 
@@ -57,7 +62,10 @@ def __init__(self, prog):
 if f.startswith('population'):
 continue
 file = os.path.join(args.input, f)
-file_list.append(file)
+if lib.checkannotations(file):
+file_list.append(file)
+else:
+print(' WARNING: skipping {} as no GO terms'.format(f))
 
 # run over multiple CPUs
 if len(file_list) > args.cpus:
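The enrichment changes stop discarding output (`FNULL`) in favor of a per-file log that starts with the exact command line, and they skip input files that carry no GO terms. A self-contained sketch of both ideas, with hypothetical names `run_logged` and `has_content`. `has_content` assumes that `lib.checkannotations` amounts to "the file exists and is non-empty", which this diff does not confirm. Rather than closing the log and reopening it in append mode as the diff does, the sketch flushes the header before starting the child process, which gives the same ordering.

```python
import os
import subprocess
import sys
import tempfile


def run_logged(cmd, log_path):
    """Run cmd, writing the command line and then all stdout/stderr to log_path."""
    with open(log_path, 'w') as log:
        log.write('{}\n'.format(' '.join(cmd)))
        # flush the header so the child's output lands after it, not before
        log.flush()
        return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)


def has_content(path):
    """Assumed stand-in for lib.checkannotations: True for a non-empty file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


with tempfile.TemporaryDirectory() as tmp:
    # the command's output is captured in the log instead of being discarded
    log_path = os.path.join(tmp, 'demo.go.enrichment.log')
    rc = run_logged([sys.executable, '-c', 'print("hello")'], log_path)
    with open(log_path) as fh:
        log_lines = fh.read().splitlines()

    # one input with (dummy) contents and one empty input that should be skipped
    for name, text in [('with_terms.txt', 'gene1\n'), ('empty.txt', '')]:
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write(text)
    kept = sorted(f for f in os.listdir(tmp)
                  if f.endswith('.txt') and has_content(os.path.join(tmp, f)))

print(rc, log_lines[-1])  # 0 hello
print(kept)               # ['with_terms.txt']
```

`stderr=subprocess.STDOUT` merges both streams into the log, which has the same effect as passing the same file object for `stdout` and `stderr` as the diff does.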
