All notable changes to this project will be documented here. This file is divided into groups. The unreleased section contains updates that are currently being worked on in the lab. The future enhancement section contains updates that have been requested by users and/or discussed in the lab but are not currently being developed.
Here we write upgrading notes for `finder`. It's an effort to make them as straightforward as possible.
- Option to provide alignments by `exonerate` to `finder`
- `finder` will now be able to process gzipped `fastq` files. But this option will work only if the files are available locally
- `finder` performs enhanced memory checks to ensure
- Utility program added to convert `gtf` files to `gff3` including options to generate UTR annotation. `README.md` file has been updated accordingly
- Included an option to use local RNA-Seq files in the testing data
- Coding Sequence annotation was improved by incorporating valid ORFs
- `finder` now verifies the maximum length of command and issues a warning
- `psiclass` developers were requested to modify the program to incorporate options to adjust the length of the end-exons based on RNA-Seq coverage data. The latest version of `psiclass` is available from conda
- Updated `environment.yml` file. `psiclass` will now be installed directly from `conda`
- Updated `setup.py` file. Removed the part where `psiclass` was being installed
- Option added to skip `braker` run completely
- Option added to incorporate reads from PacBio or any other long read technology
- Functionality added to merge genes that are close to one another (separated by only a few nucleotides). Some gene models of `psiclass` have been split despite a continuous coverage
- Add options to perform repeat masking of unmasked genome
- Total time for downloading, aligning and assembling all samples is now provided in the `progress.log` file
- Issues with reading the `metadata.csv` file, especially when some fields are blank, have been resolved
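
The blank-field handling mentioned for `metadata.csv` can be illustrated with a minimal sketch. This is not the code used by `finder`; the column names (`run`, `tissue`, `location`), the `read_metadata` helper and the default values are hypothetical and chosen only for illustration.

```python
import csv
import io


def read_metadata(handle, defaults=None):
    """Read CSV rows, replacing blank or missing fields with defaults.

    `defaults` maps a column name to the value used when a cell is empty;
    columns without a default fall back to an empty string.
    """
    defaults = defaults or {}
    rows = []
    for record in csv.DictReader(handle):
        cleaned = {}
        for key, value in record.items():
            if key is None:
                continue  # cells beyond the header row are ignored
            value = (value or "").strip()  # short rows yield None values
            cleaned[key] = value if value else defaults.get(key, "")
        rows.append(cleaned)
    return rows


# Hypothetical metadata with blank cells (assumed columns, not finder's schema)
sample = io.StringIO(
    "run,tissue,location\n"
    "SRR0000001,leaf,\n"
    "SRR0000002,,/data/reads\n"
)
rows = read_metadata(sample, defaults={"tissue": "unknown"})
```

Normalising blanks at read time means downstream steps never have to distinguish between an empty string, whitespace and a missing cell.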
- Add option to predict transcription factor binding sites from motif data by incorporating software from the MEME suite
- Add option to process different kinds of NGS data like CAGE-Seq, RAMPAGE-Seq, Ribo-Seq etc
- Modify the CPD package. Also recode parts of it in the C programming language to make things run faster
- Try gene predictors other than `GeneMark` to circumvent license issue
- Also change CDS prediction technique
- Simpler alignment strategy - do away with 4 rounds of mapping. This will increase speed considerably.
- Replace OLego with some de-novo strategy of assembling. OLego is currently taking way too much time.
- First release of `finder`
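
The entry on merging genes that are separated by only a few nucleotides can be sketched as a simple interval merge. This is a minimal illustration, not `finder`'s implementation: the `Gene` record, the `merge_close_genes` function and the `max_gap` threshold are assumptions, and the sketch ignores the RNA-Seq coverage evidence mentioned in that entry.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    chrom: str
    strand: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def merge_close_genes(genes: List[Gene], max_gap: int = 10) -> List[Gene]:
    """Merge genes on the same chromosome and strand whose gap is <= max_gap."""
    merged: List[Gene] = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.strand, g.start, g.end))
    for gene in ordered:
        if merged:
            last = merged[-1]
            same_locus = (last.chrom, last.strand) == (gene.chrom, gene.strand)
            # Number of nucleotides between the two models (negative if overlapping)
            gap = gene.start - last.end - 1
            if same_locus and gap <= max_gap:
                merged[-1] = last._replace(end=max(last.end, gene.end))
                continue
        merged.append(gene)
    return merged


# Hypothetical gene models: the first two are 3 nucleotides apart
genes = [
    Gene("chr1", "+", 100, 200),
    Gene("chr1", "+", 204, 300),
    Gene("chr1", "+", 500, 600),
    Gene("chr1", "-", 202, 250),
]
result = merge_close_genes(genes, max_gap=10)
```

Genes on opposite strands are kept apart even when their coordinates are adjacent, since a split gene model is expected to stay on one strand.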