- Rail-RNA no longer exits with an error if the output bucket is not owned by the user.
- Access to a bucket's lifecycle is no longer required to execute a job flow on EMR.
- Rail-RNA now automatically unpacks sample FASTQs in TAR archives to accommodate TCGA raw data.
- Rail-RNA no longer crashes for paired-end samples if it encounters a non-ATCG character in an input sequence. All non-ATCGs are interpreted as Ns.
- Some issues with writing to user-specified scratch directories have been resolved.
- On EMR, if connections to S3 had a high failure rate, a job flow could complete with some output files missing. Tasks now fail when this occurs.
- Rail-RNA no longer uses LZO indexes in some steps on Elastic MapReduce. Opening all the LZO indexes does not scale and can take long enough to slow down an alignment job flow.
- Intermediate LZOs are now always indexed on Elastic MapReduce to improve load balance when a step is ingesting data from S3.
- Hanging tasks during the first alignment step on Elastic MapReduce now use more processing cores so they finish faster.
- Rail-RNA now runs on Bowtie v1.1.2 and Bowtie2 v2.2.7.
- Reattempting preprocess tasks works again.
- The task timeout on Elastic MapReduce has been increased from 10 minutes to 30 minutes so that fewer task attempts that are slow to reduce when writing bigWigs for large samples time out and fail.
- Speculative execution on Elastic MapReduce is now off by default. This eliminates the very unlikely scenario where a killed task attempt's incomplete output is preferred to that of a succeeded task attempt.
- Rail-RNA now correctly identifies reads with long poly-A tails and leaves them unmapped; this can greatly speed up the first alignment step when running on many samples. Previously, reads with long poly-A tails were not being recognized.
- A new command line option `--ignore-missing-sra-samples` does not fail preprocess job flows if an SRA run accession number is not found by `fastq-dump`; instead, the sample is treated as missing.
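
The non-ATCG handling noted above can be illustrated with a minimal Python sketch. This is not Rail-RNA's actual code: the function name `normalize_sequence` is hypothetical, and uppercasing the input before substitution is an assumption made for the example.

```python
import re

# Matches any character other than A, T, C, or G.
_NON_ATCG = re.compile(r"[^ATCG]")


def normalize_sequence(seq):
    """Return seq with every non-ATCG character replaced by N.

    Hypothetical helper for illustration only. Uppercasing first is an
    assumption, so lowercase bases are kept as bases instead of becoming N.
    """
    return _NON_ATCG.sub("N", seq.upper())


if __name__ == "__main__":
    # IUPAC ambiguity codes (R, Y) and stray characters all become N.
    print(normalize_sequence("ACGTRYacgt.N"))  # prints ACGTNNACGTNN
```

Mapping every unexpected character to N keeps the read the same length, so a paired-end sample with an odd character in one mate can still be processed instead of causing a crash.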