========================================
Effective algorithms on genotype dataset
========================================


Genotype datasets are usually very large and are expected to grow rapidly. As they grow, data size and format increasingly affect program speed, so a fast framework and an efficient data representation are necessary to get the best performance.

This paper discusses how simplifying the datasets and algorithms can improve program execution speed. We developed a data format (GMAP), a framework (GMap) and programs for analysing genotype datasets.

We compared the speed of our programs with Plink using different file formats. Testing showed a large performance improvement when using a binary file format (such as our GMAP or Plink's BED) instead of a text-based format (such as PED or TPED). Our programs also ran faster than Plink, but we cannot say definitively whether this is due to our data format or our algorithm implementation.
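
To illustrate why a binary representation can be smaller and quicker to scan than text, here is a minimal sketch that packs four genotypes into each byte. The 2-bit codes and the function names are assumptions made for this example; this is not the actual GMAP or BED layout.

```python
# Hypothetical 2-bit genotype codes (illustration only, NOT the real
# GMAP or BED encoding). "NN" stands for a missing genotype.
CODES = {"AA": 0b00, "AB": 0b01, "BB": 0b10, "NN": 0b11}
NAMES = {code: name for name, code in CODES.items()}


def pack(genotypes):
    """Pack a list of genotype strings into bytes, four genotypes per byte."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, genotype in enumerate(genotypes):
        # Genotype i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        out[i // 4] |= CODES[genotype] << (2 * (i % 4))
    return bytes(out)


def unpack(data, count):
    """Recover `count` genotype strings from packed bytes."""
    return [NAMES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]


genotypes = ["AA", "AB", "BB", "NN", "AB"]
packed = pack(genotypes)
```

Here five genotypes fit in two bytes, and a reader can pull each one out with a shift and a mask instead of parsing text.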


Install a D compiler (gdc or dmd), then install dsss and rebuild.


Folder Structure

./src                  -- source
./src/gmap/            -- gmap libraries

./test                 -- testing scripts
                          -- deletes all automatically generated data
                          -- generates data for testing into the data folder;
                             change it if you want more data
                          -- runs gmap programs; timing data is in test_gmap.log
                          -- runs plink program; timing data is in test_plink.log
    _test_/            -- this folder holds all results and data generated by tests

./bin                  -- binary files
    gmapassoc          -- runs an association study
    gmapfreq           -- outputs genotype frequencies
    gmaphardyweinberg  -- tests for Hardy-Weinberg equilibrium
    gmaprandpheno      -- generates random phenotype data
    gmapconvert        -- converts ped to gmap
    gmapgenerate       -- generates a random ped file
    gmappack           -- packs gmap file

./obj                  -- object files
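
As a rough idea of the statistics behind tools such as gmapfreq and gmaphardyweinberg, the sketch below computes genotype frequencies and a chi-square statistic for Hardy-Weinberg equilibrium from three genotype counts. It is a textbook formulation written for this example, with hypothetical function names, and is not necessarily how the gmap programs compute their results.

```python
def genotype_frequencies(n_aa, n_ab, n_bb):
    """Return the relative frequencies of the AA, AB and BB genotypes."""
    total = n_aa + n_ab + n_bb
    return n_aa / total, n_ab / total, n_bb / total


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed counts with HWE expectations."""
    n = n_aa + n_ab + n_bb
    # Frequency of allele A: each AA carries two copies, each AB carries one.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (only one allele present).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


freqs = genotype_frequencies(25, 50, 25)
chi2_balanced = hardy_weinberg_chi2(25, 50, 25)
chi2_no_hets = hardy_weinberg_chi2(50, 0, 50)
```

The statistic is compared against a chi-square distribution with one degree of freedom, so a value above roughly 3.84 indicates a departure from equilibrium at the 0.05 level.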


Bachelor thesis on "Effective analysis of genotype datasets"





