# Building a SNP Namespace

**Author:** [Charles Tapley Hoyt](https://github.com/cthoyt/)

This notebook outlines the process to programmatically download a curated SNP listing and build a namespace with the PyBEL namespace builder.

In [1]:
dir=$(mktemp -d)
cd $dir
pwd

/var/folders/n1/h1c266qj2jq8kfdzfxj53pzw0000gn/T/tmp.qvkq84G2


## Download

The data comes from Will Rayner, who has curated the Illumina genotype chip data across many platforms and builds. These SNP IDs aren't necessarily the latest from dbSNP, but dbSNP itself is not easy to access or query.

In [2]:
url="http://www.well.ox.ac.uk/~wrayner/strand/HumanOmni5-4v1-1_A-b35-strand.zip"
output=~/Downloads/illumina_snps.belns



In [3]:
curl $url -o snps.zip


## Filter

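The data contains putative, unnamed, and unmatched SNPs. These are removed with two `grep` statements: the first keeps only identifiers containing `rs`, and the second drops any entry marked `No Match`.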

In [4]:
unzip -p snps.zip | cut -d $'\t' -f 1 | grep "rs" | grep -v "No Match" | sort > illumina_snps.txt
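To see what the two `grep` stages do, here is a minimal sketch that runs the same pipeline on a few made-up rows. The identifiers and the three-column layout are hypothetical and are not taken from the real strand file. Because `grep "rs"` matches the substring anywhere in the identifier, the sketch also shows a stricter anchored pattern as one possible alternative; whether it is appropriate depends on which identifiers the namespace should include.

```shell
# Hypothetical sample rows (tab-separated): identifier, chromosome, position
sample=$(mktemp)
tab=$(printf '\t')
printf 'rs200\t1\t100\n'      >  "$sample"
printf 'kgp123\t1\t200\n'     >> "$sample"
printf 'exm-rs1000\t2\t300\n' >> "$sample"
printf 'rs100\t2\t400\n'      >> "$sample"
printf 'No Match\t3\t500\n'   >> "$sample"

# Same stages as the cell above: first column, keep "rs", drop "No Match", sort
filtered=$(cut -d "$tab" -f 1 "$sample" | grep "rs" | grep -v "No Match" | sort)

# Stricter variant (an assumption, not the original filter): whole field must be rs<digits>
strict=$(cut -d "$tab" -f 1 "$sample" | grep -E '^rs[0-9]+$' | sort)

echo "substring match:"; echo "$filtered"
echo "anchored match:";  echo "$strict"
rm -f "$sample"
```

The substring match keeps `exm-rs1000` alongside `rs100` and `rs200`, while the anchored pattern keeps only the last two.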



In [5]:
wc illumina_snps.txt

 1114146 1114146 11363002 illumina_snps.txt
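The filter uses `sort` without `-u`, so any identifier that appears more than once in the source would also appear more than once in the list. The following sketch shows one way to check for that. It uses a small made-up file rather than the real `illumina_snps.txt`, and makes no claim about whether the real data contains duplicates.

```shell
# Hypothetical identifier list with two repeated entries
ids=$(mktemp)
printf 'rs100\nrs100\nrs200\nrs300\nrs300\n' > "$ids"

# Compare the total line count with the number of distinct lines
total=$(wc -l < "$ids")
unique=$(sort -u "$ids" | wc -l)

# List each identifier that occurs more than once
dupes=$(sort "$ids" | uniq -d)

# Arithmetic expansion strips the padding some wc implementations add
echo "total=$((total)) unique=$((unique))"
echo "$dupes"
rm -f "$ids"
```

If the two counts differ on the real file, replacing `sort` with `sort -u` in the filter step would deduplicate the list.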


## Output

The `buildns` command in `pybel_tools` takes a list of items on stdin and the appropriate annotations as arguments, and writes a `*.belns` file conforming to the [specification](https://openbel-framework.readthedocs.io/en/latest/tutorials/building_custom_namespaces.html) from the OpenBEL Framework.

In [6]:
cat illumina_snps.txt | python3 -m pybel_tools buildns --functions "G" \
    --title "Illumina SNPS" \
    --url "http://www.well.ox.ac.uk/~wrayner/strand/HumanOmni5-4v1-1_A-b35-strand.zip" \
    --description "SNP List acquired from Illumina HumanOmni5Exome-4 v1.1" \
    --email "charles.hoyt@scai.fraunhofer.de" \
    --creator "Charles Tapley Hoyt" \
    --subject "dbSNP" > $output
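A quick sanity check on the result is to count the entries in the written file and compare that number with the line count of the input list. The sketch below assumes the output follows the OpenBEL layout, in which entries appear as `value|encoding` lines under a `[Values]` header. The miniature file is hypothetical and is not actual `pybel_tools` output.

```shell
# Hypothetical miniature namespace file in the assumed OpenBEL layout
ns=$(mktemp)
cat > "$ns" <<'EOF'
[Namespace]
Keyword=EXAMPLE

[Values]
rs100|G
rs200|G
rs300|G
EOF

# Print from the [Values] header to the end, skip the header, count non-empty lines
value_count=$(sed -n '/^\[Values\]/,$p' "$ns" | tail -n +2 | grep -c .)
echo "$value_count"
rm -f "$ns"
```

Pointing the same `sed` pipeline at `$output` should give a count equal to the number of lines in `illumina_snps.txt` if every identifier was written.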



In [7]:
# clean up, clean up, everybody do your share!!!
cd
rm -rf "$dir"
unset dir
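The manual cleanup only happens if this last cell is actually run. When the same steps are collected into a script, one alternative is to register the cleanup with `trap` so the temporary directory is removed even if an earlier command fails. This is a minimal sketch; the variable name `workdir` and the scratch file are illustrative only.

```shell
# Create a scratch directory, as at the start of the notebook
workdir=$(mktemp -d)

(
  # Remove the directory whenever this subshell exits, on success or failure
  trap 'rm -rf "$workdir"' EXIT
  cd "$workdir"
  echo "rs100" > scratch.txt
  # ... download, filter, and build steps would go here ...
)

# The subshell has exited, so the directory should be gone
if [ -d "$workdir" ]; then echo "still present"; else echo "cleaned up"; fi
```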



## Conclusions

This namespace is a better solution than manually curated namespaces, but it is also very large and not necessarily up to date with dbSNP. It's still worth looking for other sources within dbSNP.