Bachelor thesis on "Effective analysis of genotype datasets"

egonelbre/gmap

Effective algorithms on genotype dataset
========================================

Abstract

Genotype datasets are usually very large and are expected to grow rapidly. As they grow, data size and format increasingly affect program speed, so a fast framework and an efficient data representation are necessary to get the best performance.

This paper discusses how simplifying the datasets and algorithms can improve program execution speed. We developed a data format (GMAP), a framework (GMap) and programs for analysing genotype datasets.

We compared the speed of our programs with Plink using different file formats. Testing showed that a binary file format (such as our GMAP and Plink's BED) gives a large performance improvement over a text-based format (such as PED and TPED). We also show that our programs run faster than Plink, but we could not determine definitively whether this is due to our data format or our algorithm implementation.
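To illustrate why a binary representation can be much smaller than a text one, here is a minimal Python sketch that packs four genotypes into one byte. The 2-bit codes, the genotype labels and the function names `pack` and `unpack` are assumptions made for this example; they are not the actual GMAP or BED layout.

```python
# Hypothetical 2-bit encoding for illustration only;
# this is NOT the actual GMAP or BED on-disk layout.
CODES = {"AA": 0, "AB": 1, "BB": 2, "--": 3}  # "--" marks a missing call
NAMES = {code: label for label, code in CODES.items()}


def pack(genotypes):
    """Pack genotype labels into bytes, four genotypes per byte."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        out[i // 4] |= CODES[g] << (2 * (i % 4))
    return bytes(out)


def unpack(data, count):
    """Recover `count` genotype labels from packed bytes."""
    return [NAMES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]


sample = ["AA", "AB", "BB", "--", "AB"]
packed = pack(sample)
text_size = len(" ".join(sample))  # size of a space-separated text encoding

print(len(packed), "bytes packed vs", text_size, "bytes as text")
```

In this sketch a text encoding needs at least two characters plus a separator per genotype, while the packed form needs a quarter of a byte, and decoding is a shift and a mask rather than string parsing.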

Compiling

Install gdc or dmd. Install dsss and rebuild (http://www.dsource.org/projects/dsss), then run:

make

Folder Structure

./src                  -- source
./src/gmap/            -- gmap libraries

./test                 -- testing scripts
    clean.sh           -- deletes all automatically generated data
    generate_data.sh   -- generates data for testing into data folder
                          change it if you want more data
    test_gmap.sh       -- runs gmap programs, timing data is in test_gmap.log
    test_plink.sh      -- runs plink program, timing data is in test_plink.log
    _test_/            -- this folder holds all results and data generated by tests

./bin                  -- binary files
    gmapassoc          -- performs an association study
    gmapfreq           -- outputs genotype frequencies
    gmaphardyweinberg  -- tests for Hardy-Weinberg equilibrium
    gmaprandpheno      -- generates random phenotype data
    gmapconvert        -- converts ped to gmap
    gmapgenerate       -- generates a random ped file
    gmappack           -- packs gmap file

./obj                  -- object files
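
As a rough illustration of the kind of computation behind gmapfreq and gmaphardyweinberg, the following Python sketch derives genotype frequencies and a chi-square statistic for Hardy-Weinberg equilibrium from genotype counts at one marker. It is a textbook formulation under assumed names (`genotype_frequencies`, `hardy_weinberg_chi2`), not the implementation used in this repository, which may use a different test.

```python
def genotype_frequencies(n_aa, n_ab, n_bb):
    """Return the relative frequencies of the three genotypes."""
    total = n_aa + n_ab + n_bb
    return n_aa / total, n_ab / total, n_bb / total


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    the counts expected under Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


print(genotype_frequencies(25, 50, 25))
print(hardy_weinberg_chi2(25, 50, 25))
```

A statistic near zero means the observed counts match the Hardy-Weinberg expectation; larger values indicate a departure, which is usually judged against a chi-square distribution with one degree of freedom.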
