Python scripts and modules for automated next-generation sequencing analysis. These provide a fully automated pipeline that takes sequencing results from an Illumina sequencer, converts them to standard Fastq format, aligns them to a reference genome, calls SNPs, and produces a summary PDF of results.

The scripts are tightly integrated with the Galaxy web-based analysis tool. Samples are entered and tracked through a LIMS system, and processed results are uploaded into Galaxy Data Libraries for researcher access and additional analysis. Our clone of Galaxy tracks the main development work, adding an intuitive interface for sample management on top of the existing functionality.

Code structure

Two main scripts drive the automation of the process:

  • scripts/illumina_finished_msg.py -- Sits on a machine where sequencing runs are dumped. It checks for new results, reporting to a RabbitMQ messaging queue whenever a new run is finished.
  • scripts/analyze_finished_sqn.py -- Continuously running server script on the Galaxy analysis machine. When new results are reported in the messaging queue, this copies over the relevant files and kicks off an automated analysis.
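The hand-off between these two scripts can be illustrated with a minimal sketch. It is not the code in either script: it uses Python's in-process `queue.Queue` as a stand-in for RabbitMQ, and the marker file name `finished.txt`, the message layout, and all function names are assumptions made for illustration.

```python
import os
import queue


def find_finished_runs(dump_dir, reported, marker="finished.txt"):
    """Return unreported run directory names that contain the marker file.

    The marker file name is a hypothetical convention, not the real check.
    """
    new_runs = []
    for name in sorted(os.listdir(dump_dir)):
        run_dir = os.path.join(dump_dir, name)
        if name in reported or not os.path.isdir(run_dir):
            continue
        if os.path.exists(os.path.join(run_dir, marker)):
            new_runs.append(name)
    return new_runs


def report_finished(dump_dir, reported, messages):
    """Sequencer side: post one message per newly finished run."""
    for name in find_finished_runs(dump_dir, reported):
        # Hypothetical message layout: just the run directory.
        messages.put({"directory": os.path.join(dump_dir, name)})
        reported.add(name)


def analyze_next(messages, analyze):
    """Analysis side: handle one pending message; return False if none."""
    try:
        msg = messages.get_nowait()
    except queue.Empty:
        return False
    analyze(msg["directory"])
    return True
```

In this sketch the `reported` set keeps a run from being announced twice, and `analyze` is whatever callable copies the files and starts the analysis. A real deployment would replace `queue.Queue` with a RabbitMQ client so the two sides can run on different machines.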

The scripts involved in the actual processing:

  • scripts/automated_initial_analysis.py -- Drives the high-level analysis of sequencing lanes based on information specified through the Galaxy LIMS system.
  • scripts/upload_to_galaxy.py -- Handles storing and uploading Fastq, alignment, analysis and summary files to Galaxy.
  • scripts/align_summary_report.py -- Produces a PDF summary file with statistics on alignments, duplicates, GC distribution, quality scores, and other metrics of interest.

System specific information is specified in YAML configuration files:

  • config/post_process.yaml -- The main configuration file containing Galaxy details, program command lines and customizations for processing algorithms.
  • config/transfer_info.yaml -- Configuration on the sequencing machine, specifying where to check for new results.
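Once a YAML file is parsed (for example with PyYAML's `yaml.safe_load`), its contents are available as nested dictionaries. The sketch below shows one way a script might look up settings from such a structure and fail with a clear message when an entry is missing. The keys and values in `example_config` are hypothetical and do not reflect the actual layout of post_process.yaml or transfer_info.yaml; `get_setting` is likewise an illustrative helper, not part of these scripts.

```python
def get_setting(config, *keys):
    """Walk nested dictionaries and return the value at the given key path.

    Raises KeyError naming the first missing entry along the path.
    """
    current = config
    for i, key in enumerate(keys):
        if not isinstance(current, dict) or key not in current:
            missing = ".".join(keys[: i + 1])
            raise KeyError("missing configuration entry: " + missing)
        current = current[key]
    return current


# Hypothetical structure, standing in for a parsed YAML file.
example_config = {
    "galaxy": {"url": "http://localhost:8080"},
    "program": {"aligner": "bwa"},
}

aligner = get_setting(example_config, "program", "aligner")
```

Naming the full key path in the error makes a misconfigured machine easier to diagnose than a bare `KeyError` raised deep inside the pipeline.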


Next gen analysis

Processing infrastructure

  • RabbitMQ
  • LaTeX -- pdflatex
  • R with ggplot2 and sqldf
  • ps2pdf

Python modules