Small Long-read Assembler for Educational Purposes.

jlalisan/SLAEP

An Inside Look into a Genome Assembler

The Small Long-read Assembler for Educational Purposes (SLAEP) assembles small long-read data sets into contigs and can visualize the underlying FASTQ data. It targets long reads such as those produced by PacBio or Oxford Nanopore MinION sequencers. It is written in Python 3.8 and documented, which makes it a useful tool for teaching students about genome assembly and the inner workings of an assembler.

Repository Overview

Scripts

  • Assembly: Contains the scripts for genome assembly. The main.py script calls upon these for the assembly process.
  • Visualization: A script that visualizes the FASTQ data set to help users understand the data.
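
As an illustration of the kind of summary a visualization step might start from, here is a minimal sketch that reads a FASTQ file and reports read count, mean read length, and mean quality. It is not taken from the SLAEP scripts: the function names (`read_fastq`, `mean_phred`, `summarize`) are hypothetical, and it assumes four-line FASTQ records with Phred+33 quality encoding.

```python
from io import StringIO
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality) from a FASTQ handle.

    Assumes exactly four lines per record and no blank lines between records.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def mean_phred(quality):
    """Mean Phred score of one read, assuming Phred+33 ASCII encoding."""
    return mean(ord(ch) - 33 for ch in quality)


def summarize(handle):
    """Collect simple per-data-set statistics that a plot could be built on."""
    lengths, qualities = [], []
    for _, sequence, quality in read_fastq(handle):
        lengths.append(len(sequence))
        qualities.append(mean_phred(quality))
    return {
        "reads": len(lengths),
        "mean_length": mean(lengths),
        "mean_quality": mean(qualities),
    }


# Two tiny made-up reads: 'I' encodes Phred 40, '5' encodes Phred 20.
example = StringIO("@read1\nACGTAC\n+\nIIIIII\n@read2\nACGT\n+\n5555\n")
print(summarize(example))
```

The per-read lengths and qualities collected in `summarize` are the values one would typically pass on to a histogram or scatter plot.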

Separate Attempts

  • de_bruijn_graph.py: An attempt at implementing a De Bruijn graph for genome assembly.
  • OLC.py: An attempt at the Overlap-Layout-Consensus (OLC) approach.
  • directed_graph.py: An attempt at building and analyzing directed graphs for assembly.
  • repeat_graph.py: An attempt to handle repeat graphs during assembly.
  • Not functional examples: A directory with other attempted code that is not functional at the moment.
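
For readers new to the two approaches named above, the following sketch shows the core idea of each on toy strings. It is a generic textbook illustration, not the code in `de_bruijn_graph.py` or `OLC.py`; the function names and the `min_length` parameter are assumptions made for this example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph: each k-mer becomes an edge between
    its (k-1)-mer prefix and its (k-1)-mer suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def suffix_prefix_overlap(a, b, min_length=3):
    """Overlap step of OLC: length of the longest suffix of `a` that is
    also a prefix of `b`, or 0 if it is shorter than `min_length`."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


print(de_bruijn_graph(["ACGTC"], 3))
# → {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TC']}
print(suffix_prefix_overlap("ACGTAC", "TACGGA"))
# → 3
```

Exact matching like this is only suitable for error-free toy data. Real long reads contain errors, which is why overlaps are usually computed by an aligner such as Minimap2.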

Test_data

  • artificial_data_generator.py: A script for generating test data for validation.
  • foo-reads.fq: A sample FASTQ file containing read data.
  • foo.paf: A sample PAF file containing read alignments.
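
To make the PAF file easier to inspect, here is a small sketch that parses the twelve mandatory PAF columns into a named record. The column order follows the PAF format as written by Minimap2; the class and function names are hypothetical and not part of SLAEP, and the example line is made up rather than taken from `foo.paf`.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int        # number of matching residues
    block_length: int   # alignment block length
    mapq: int           # mapping quality


# One converter per mandatory column, in PAF column order.
_CONVERTERS = (str, int, int, int, str, str, int, int, int, int, int, int)


def parse_paf_line(line):
    """Parse the 12 mandatory columns; optional tags after them are ignored."""
    fields = line.rstrip("\n").split("\t")[:12]
    return PafRecord(*(conv(f) for conv, f in zip(_CONVERTERS, fields)))


line = "read1\t1000\t600\t1000\t+\tread2\t1200\t0\t400\t380\t400\t60\ttp:A:S"
record = parse_paf_line(line)
print(record.query_name, record.target_name, record.matches / record.block_length)
# → read1 read2 0.95
```

The ratio of `matches` to `block_length` gives a rough identity estimate for an overlap, which is a common criterion for filtering weak overlaps before layout.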

Introduction

Genome assembly is a fundamental process in bioinformatics: the reconstruction of complete genomes from fragmented DNA sequences. The task becomes considerably harder with long-read data from technologies such as PacBio and Oxford Nanopore MinION, because the reads are long and error-prone. The SLAEP assembler aims to simplify this process and make it accessible to students and learners who want to understand the algorithms behind genome assembly.

Key Features

  • Educational Purpose: SLAEP is primarily designed for educational use, allowing students and newcomers to gain hands-on experience with genome assembly concepts.
  • Small Data Set Assembly: SLAEP is not optimized for large-scale assembly projects, but it handles small data sets well, which makes it suitable for educational exercises.
  • Visualization Support: The included visualization script gives users insight into the FASTQ data set, which helps in understanding the assembly process and its results.
  • Python 3.8 Implementation: SLAEP is implemented entirely in Python 3.8, making it easy to understand and modify for educational purposes.

Installation

To set up the SLAEP assembler on your system, follow these steps:

  1. Clone the SLAEP repository:
git clone https://github.com/jlalisan/slaep.git
cd slaep
  2. Install the requirements:
pip install -r requirements.txt
  3. Run the assembler:
python main.py file.fastq -p file.paf -o output.fasta -v

Replace file.fastq with the path to your FASTQ file and file.paf with the path to your PAF file.

Usage

To assemble your genome data using SLAEP, use the following command:

python main.py file.fastq -p file.paf -o output.fasta -v

Only the FASTQ file is required. The PAF file (-p) is optional as long as Minimap2 is installed on the machine running the script. The output file (-o) and the visualization flag (-v) are optional as well.
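
The command-line interface described above could be modelled with `argparse` roughly as follows. This is a sketch, not the parser in `main.py`: the long option names (`--paf`, `--output`, `--visualize`) and the default output name (input name with a `.fasta` extension) are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Assemble a small long-read data set into contigs.",
    )
    parser.add_argument("fastq", help="input FASTQ file (required)")
    # Long option names below are hypothetical; only -p, -o and -v are documented.
    parser.add_argument("-p", "--paf", default=None,
                        help="existing PAF file; if omitted, overlaps must be computed")
    parser.add_argument("-o", "--output", default=None,
                        help="output FASTA file")
    parser.add_argument("-v", "--visualize", action="store_true",
                        help="also visualize the FASTQ data")
    return parser


def resolve_output(args):
    """Fall back to a name derived from the input file (assumed convention)."""
    if args.output:
        return Path(args.output)
    return Path(args.fastq).with_suffix(".fasta")


args = build_parser().parse_args(["file.fastq", "-v"])
print(args.paf, args.visualize, resolve_output(args))
```

Keeping `-p` and `-o` as optional arguments with a `None` default lets the program decide at run time whether it needs to call Minimap2 and which output name to use.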

Example usage

Example 1: Usage with only a FASTQ file. This requires Minimap2, which is used to create a PAF file from the FASTQ file.

python main.py file.fastq
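
How SLAEP calls Minimap2 internally is not shown here. As a rough sketch, an all-vs-all overlap run that writes a PAF file could look like the code below. The helper names are hypothetical, and the `ava-ont` preset (Nanopore read overlap) is an assumption; `ava-pb` is the corresponding Minimap2 preset for PacBio reads.

```python
import shutil
import subprocess


def minimap2_overlap_command(fastq_path, preset="ava-ont"):
    """Build an all-vs-all overlap command; Minimap2 prints PAF to stdout."""
    return ["minimap2", "-x", preset, fastq_path, fastq_path]


def write_overlaps(fastq_path, paf_path, preset="ava-ont"):
    """Run Minimap2 and store its PAF output in `paf_path`."""
    if shutil.which("minimap2") is None:
        raise FileNotFoundError("minimap2 was not found on PATH")
    with open(paf_path, "w") as paf:
        subprocess.run(
            minimap2_overlap_command(fastq_path, preset),
            stdout=paf,
            check=True,  # raise if Minimap2 exits with an error
        )


print(" ".join(minimap2_overlap_command("file.fastq")))
# → minimap2 -x ava-ont file.fastq file.fastq
```

Checking for the executable first gives a clear error message when Minimap2 is missing, which is the situation Example 1 depends on.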

Example 2: Usage with a FASTQ file and a premade PAF file. Visualization is skipped. The output file name is derived from the input file name.

python main.py file.fastq -p file.paf

Example 3: Usage with a FASTQ file and visualization. Minimap2 is required.

python main.py file.fastq -v

Contact

If you have any questions, feedback, or suggestions regarding the SLAEP assembler, feel free to contact me:
