Warehousing dbSNP's JSON Data into PostgreSQL


NCBI hosts a large, open dataset of human SNPs (single-nucleotide polymorphisms), along with a good deal of auxiliary data related to each SNP. The data is hosted on an FTP server and is split across 25 gzipped JSON files (chromosomes 1-22, X, Y, and mitochondrial DNA), for a total compressed size of ~100 GB (~2 TB uncompressed!).
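At roughly 2 TB uncompressed, the files are too large to decompress to disk or load into memory in one go, so they are best read as a stream. The following is a minimal sketch of that idea, not this repository's actual code. It assumes each file holds one JSON object per line (an assumption not stated above), and the function name `iter_snp_records` is hypothetical.

```python
import gzip
import json
from typing import Iterator


def iter_snp_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzipped file.

    Assumes a line-delimited layout (one JSON object per line); this is an
    assumption for illustration, not something the README confirms.
    """
    # gzip.open in text mode decompresses lazily, so only the current line
    # is held in memory rather than the whole uncompressed file.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because the function is a generator, a caller can loop over `iter_snp_records(path)` and process or discard each record before the next one is read.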

Further Reading

More details can be found in a three-part series of blog posts that walks through the development of this application:

  1. Downloading JSON SNP Data & Initializing the Database
  2. Extracting ClinVar Disease & Frequency Study Data
  3. Efficiently Writing Data to PostgreSQL Database