Index Based Genome Mapper

Genome mappers can be used to map short next-generation sequencing reads to reference genomes, as well as to locate any subsequence of interest in a genome. Mapping with an index-based mapper is a two-step process. First, all n-mers in the full genome are indexed. Second, this index is used to locate the queries/reads listed in a separate file. Because creating an index can be very time-consuming for larger genomes, separating the two steps lets one index be reused for several query files.
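Indexing starts by turning every n-mer of the genome into a lookup key. The key format used by indexer.c is not documented here, so the following is a minimal sketch assuming a 2-bit encoding of A, C, G and T; the function name encode_nmer is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Map one nucleotide to 2 bits: A=0, C=1, G=2, T=3.
   Returns -1 for anything else (e.g. N). */
static int base_code(char b)
{
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/* Encode the n-mer starting at seq as an integer key in [0, 4^n).
   Returns -1 if the n-mer contains a non-ACGT character, so the caller
   can skip it. Requires n <= 31 so the key fits in int64_t. */
int64_t encode_nmer(const char *seq, size_t n)
{
    int64_t key = 0;
    for (size_t i = 0; i < n; i++) {
        int c = base_code(seq[i]);
        if (c < 0)
            return -1;
        key = (key << 2) | c;
    }
    return key;
}
```

With a word length of 10, as in the example below, there are 4^10 = 1,048,576 possible keys, so a table indexed directly by key stays small; each extra base of word length multiplies the table size by four.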

Build instructions

Required package: gcc 4.7

In the root source directory:

$ make all

Usage instructions

Example files

example.tgz contains the following files that can be used for testing the application:

pseudomonas_full_genome.fna - input for Indexer in FastA format
pseudomonas_10.index - output of Indexer, input for Mapper
pseudomonas.names - output of Indexer, input for Mapper
queries - input for Mapper

Indexer

Indexer takes one or more genome files as a required parameter (-i); the output name (-o) and the word length (-n) are optional parameters. An example command line:

$ ./indexer -i example/pseudomonas_full_genome.fna -o pseudomonas -n 10

This creates the two Indexer output files that are also provided in example.tgz: pseudomonas_10.index and pseudomonas.names.
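Once every n-mer has a key, the index can be a table of occurrence lists. A minimal sketch, assuming the whole genome fits in memory as a single ACGT string and using a two-pass counting sort; the nmer_index layout and the build_index and free_index functions are hypothetical, and the actual format of the .index file written by indexer.c may differ:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory n-mer index: the genome positions of the n-mer
   with key k are pos[start[k]] .. pos[start[k + 1] - 1]. */
typedef struct {
    size_t n;      /* word length */
    size_t *start; /* 4^n + 1 bucket offsets into pos */
    size_t *pos;   /* genome positions, grouped by key */
} nmer_index;

static int base_code(char b)
{
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/* 2-bit key of the n-mer at s, or -1 if it contains a non-ACGT base. */
static int64_t nmer_key(const char *s, size_t n)
{
    int64_t key = 0;
    for (size_t i = 0; i < n; i++) {
        int c = base_code(s[i]);
        if (c < 0)
            return -1;
        key = (key << 2) | c;
    }
    return key;
}

void free_index(nmer_index *idx)
{
    free(idx->start);
    free(idx->pos);
    idx->start = NULL;
    idx->pos = NULL;
}

/* Build the index with a two-pass counting sort over all n-mer keys.
   Keep n small (e.g. 10): the offset table has 4^n + 1 entries.
   Returns 0 on success, -1 on allocation failure. */
int build_index(const char *genome, size_t n, nmer_index *idx)
{
    size_t len = strlen(genome);
    size_t nkeys = (size_t)1 << (2 * n);

    idx->n = n;
    idx->start = calloc(nkeys + 1, sizeof *idx->start);
    idx->pos = malloc((len + 1) * sizeof *idx->pos); /* +1 avoids malloc(0) */
    if (!idx->start || !idx->pos) {
        free_index(idx);
        return -1;
    }

    /* Pass 1: count each key into start[key + 1]. */
    for (size_t i = 0; i + n <= len; i++) {
        int64_t k = nmer_key(genome + i, n);
        if (k >= 0)
            idx->start[k + 1]++;
    }
    /* Prefix sums turn the counts into bucket offsets. */
    for (size_t k = 0; k < nkeys; k++)
        idx->start[k + 1] += idx->start[k];

    /* Pass 2: drop each position into its bucket, in genome order. */
    size_t *fill = malloc(nkeys * sizeof *fill);
    if (!fill) {
        free_index(idx);
        return -1;
    }
    memcpy(fill, idx->start, nkeys * sizeof *fill);
    for (size_t i = 0; i + n <= len; i++) {
        int64_t k = nmer_key(genome + i, n);
        if (k >= 0)
            idx->pos[fill[k]++] = i;
    }
    free(fill);
    return 0;
}
```

A lookup for a query n-mer is then one key computation followed by reading pos[start[k]] to pos[start[k + 1] - 1]. Keeping all positions in one flat array, rather than per-key lists, keeps the structure compact and simple to write to disk, which suits an index that is built once and reused.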

Additional help:

$ ./indexer --help

Mapper

Mapper takes both of the Indexer's output files and a query file as input. The number of mismatches (-mm) and the step length are optional parameters. An example command line:
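Allowing mismatches means an exact n-mer lookup alone is not enough. One common strategy, shown here as a hedged sketch rather than a description of mapper.c, relies on the pigeonhole principle: if a query has at most k mismatches, splitting it into k + 1 non-overlapping n-mers guarantees that at least one of them matches the genome exactly. Each exact seed hit gives a candidate position, which is then verified over the full query length. The helper name count_mismatches is hypothetical:

```c
#include <stddef.h>

/* Count mismatches between query and the genome window starting at ref,
   both of length len. Stops as soon as max_mm is exceeded and returns
   max_mm + 1 in that case, so bad candidates are rejected cheaply. */
int count_mismatches(const char *query, const char *ref, size_t len,
                     int max_mm)
{
    int mm = 0;
    for (size_t i = 0; i < len; i++) {
        if (query[i] != ref[i] && ++mm > max_mm)
            break;
    }
    return mm;
}
```

For example, with -mm 2 a candidate would be kept only if count_mismatches returns at most 2 for that window, after first checking that the window lies entirely inside the chromosome.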

$ ./mapper -i example/pseudomonas_10.index -g example/pseudomonas.names -q example/queries -mm 2

This produces output like the following:

0	gi|15595198|ref|NC_002516.1|	1765971	2	F
1	gi|15595198|ref|NC_002516.1|	1764846	0	F
2	gi|15595198|ref|NC_002516.1|	3975100	0	F
3	gi|15595198|ref|NC_002516.1|	319766	1	F
4	gi|15595198|ref|NC_002516.1|	320675	0	F

The first column is the query number, the second is the chromosome name, the third is the position on that chromosome where the query mapped, the fourth is the number of mismatches, and the last column is the strand: forward (F) or reverse (R).
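Because the output is plain tab-separated text, downstream tools can read it line by line. A minimal parsing sketch, assuming exactly the five tab-separated columns described above and chromosome names of at most 255 characters; the map_hit struct and parse_hit function are hypothetical:

```c
#include <stdio.h>

/* One line of Mapper output, following the column order described above. */
typedef struct {
    long query;       /* query number */
    char chrom[256];  /* chromosome name */
    long position;    /* mapping position on that chromosome */
    int mismatches;   /* number of mismatches */
    char strand;      /* 'F' forward or 'R' reverse */
} map_hit;

/* Parse one tab-separated output line. %255[^\t] keeps everything up to
   the next tab, so names containing '|' are read whole.
   Returns 1 on success, 0 otherwise. */
int parse_hit(const char *line, map_hit *hit)
{
    int n = sscanf(line, "%ld\t%255[^\t]\t%ld\t%d\t%c",
                   &hit->query, hit->chrom, &hit->position,
                   &hit->mismatches, &hit->strand);
    return n == 5 && (hit->strand == 'F' || hit->strand == 'R');
}
```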

Additional help:

$ ./mapper --help

Results

results.tgz contains the results of the validation process. We generated 25,000 random reads with up to 2 mismatches from the human reference genome (build 37) and mapped them back to the genome using our Mapper. The reads are in the file "reads" and the results in "human37.results".
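The procedure used to generate the validation reads is not described beyond the summary above. As a hedged illustration of one way to simulate such reads, the sketch below assumes an uppercase genome string, uniformly random start positions and substitution-only errors; simulate_read is a hypothetical helper, not code from this repository:

```c
#include <stdlib.h>
#include <string.h>

/* Copy a read of length len from a random position of genome into read
   (which must hold len + 1 chars) and substitute between 0 and max_mm
   distinct bases. Stores the start position in *start. Returns the
   number of substitutions made, or -1 if the genome is shorter than len. */
int simulate_read(const char *genome, size_t len, int max_mm,
                  char *read, size_t *start)
{
    static const char bases[] = "ACGT";
    size_t glen = strlen(genome);
    if (len == 0 || glen < len)
        return -1;

    /* Uniform start position; rand() is fine for a sketch, not for statistics. */
    *start = (size_t)rand() % (glen - len + 1);
    memcpy(read, genome + *start, len);
    read[len] = '\0';

    /* Number of mismatches to introduce: 0..max_mm, capped at len. */
    int nmm = max_mm > 0 ? rand() % (max_mm + 1) : 0;
    if ((size_t)nmm > len)
        nmm = (int)len;

    int made = 0;
    while (made < nmm) {
        size_t i = (size_t)rand() % len;
        if (read[i] != genome[*start + i])
            continue; /* already substituted; pick another position */
        char b;
        do {
            b = bases[rand() % 4];
        } while (b == read[i]); /* force a real substitution */
        read[i] = b;
        made++;
    }
    return made;
}
```

Recording *start alongside each read is what makes validation possible: a read is mapped correctly if the Mapper reports that position with no more mismatches than were introduced.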
