Warehousing dbSNP's JSON Data into PostgreSQL

Intro

NCBI hosts dbSNP, a large, openly available dataset of human SNPs (single-nucleotide polymorphisms), along with a good deal of auxiliary data related to each SNP. The data is hosted on an FTP server here:

ftp://ftp.ncbi.nlm.nih.gov/snp/.redesign/latest_release/JSON

and is split across 25 gzipped JSON files (chromosomes 1-22, X, Y, and mitochondrial DNA), totaling ~100GB compressed (~2TB uncompressed!).
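Because the uncompressed data runs to terabytes, decompressing a file to disk before parsing it is impractical. Below is a minimal Python sketch of streaming one file instead. It assumes each file holds one JSON object per line, and the function name `iter_snp_records` is hypothetical rather than taken from this project:

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_snp_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per line of a gzipped JSON-lines file.

    Assumes each non-blank line holds a single, complete JSON object.
    The file is decompressed incrementally, so memory use is bounded by
    the largest single record rather than by the whole file.
    """
    # "rt" opens the gzip stream in text mode, decoding bytes as UTF-8.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `sum(1 for _ in iter_snp_records(path))` counts the records in one chromosome file without ever holding the decompressed file in memory.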

Further Reading

More details can be found in a series of blog posts that breaks the development of this application down into three steps:

  1. Downloading JSON SNP Data & Initializing the Database
  2. Extracting ClinVar Disease & Frequency Study Data
  3. Efficiently Writing Data to PostgreSQL Database
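Steps 2 and 3 can be sketched together in Python. This is an illustration, not this project's code. The JSON field names (`refsnp_id`, `primary_snapshot_data`, `allele_annotations`, `clinical`, `frequency`, and their sub-fields) are assumptions about the dbSNP layout, and the helper names and row shapes are hypothetical. Rows are bulk-loaded with PostgreSQL's `COPY ... FROM STDIN`, which is typically much faster than issuing one `INSERT` per row. The cursor is assumed to provide psycopg2's `copy_expert(sql, file)` method.

```python
import io
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[str, ...]


def extract_rows(record: Dict[str, Any]) -> Tuple[List[Row], List[Row]]:
    """Split one dbSNP record into (clinvar_rows, frequency_rows).

    Field names are assumptions about the JSON layout; adjust as needed.
    The allele index is simply the annotation's position in the list.
    """
    rsid = str(record["refsnp_id"])
    annotations = record.get("primary_snapshot_data", {}).get("allele_annotations", [])
    clinvar: List[Row] = []
    freqs: List[Row] = []
    for allele_idx, ann in enumerate(annotations):
        for clin in ann.get("clinical", []):
            significance = ";".join(clin.get("clinical_significances", []))
            for disease in clin.get("disease_names", []):
                clinvar.append((rsid, str(allele_idx), disease, significance))
        for freq in ann.get("frequency", []):
            freqs.append((rsid, str(allele_idx), freq.get("study_name", ""),
                          str(freq.get("allele_count", 0)),
                          str(freq.get("total_count", 0))))
    return clinvar, freqs


def _escape(value: str) -> str:
    # COPY's text format treats backslash, tab, newline and CR specially.
    return (value.replace("\\", "\\\\").replace("\t", "\\t")
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_copy_buffer(rows: Iterable[Row]) -> io.StringIO:
    """Serialize rows as tab-separated COPY text into an in-memory file."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(_escape(v) for v in row) + "\n")
    buf.seek(0)  # rewind so the reader starts at the first row
    return buf


def copy_rows(cursor: Any, table: str, columns: List[str], rows: Iterable[Row]) -> None:
    """Bulk-load rows with a single COPY ... FROM STDIN statement.

    `table` and `columns` are interpolated directly, so they must be trusted.
    """
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN"
    cursor.copy_expert(sql, rows_to_copy_buffer(rows))
```

With psycopg2, this would be called as `copy_rows(conn.cursor(), "clinvar", [...], rows)` followed by `conn.commit()`. Accumulating rows from many records before each `COPY` amortizes round trips to the server.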