# gnomAD

## Description

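Per-chromosome gnomAD v3.1.2 genome site VCFs (reduced to AF-only copies in `prod`), plus the hg38 high-confidence / gold-standard variant sets from the Broad public bucket.

Source: https://gnomad.broadinstitute.org/downloads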

# Register dataShop [env:bashtools]

## Verifications

In [1]:
echo "dataShop location"
pwd
echo "repository status"
df -h .
echo "dataShop size"
du -sh .
echo "Groups you belong to"
groups

dataShop location
/hyperion/databank/booth/jmcarter/sub.gnomAD
repository status
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdl2        82T  4.2T   78T   6% /hyperion
dataShop size
33K	.
Groups you belong to
jmcarter datascigroup


## Input metadata

In the next cell:
1. Enter a **Memorable Description**. No spaces or special characters are allowed; use underscores instead.
2. Enter a **Version Number** (0-100) if you are making changes to an existing dataShop.
3. Enter the **Permission Group** that should be granted access; check with your administrator if uncertain.
4. Enter the appropriate **Access Level** for your data (1, 2, 3 or 4).

In [2]:
#dataShop permissions
shdesc="gnomAD" #no spaces, only underscores allowed.
versid=0 #0-100
accgrp=datascigroup #input Permission Group with access privileges
acclvl=1 #1:OPEN/2:RESTRICTED/3:CONFIDENTIAL/4:CLASSIFIED

5. **Run** the next cell to register and set appropriate dataShop permissions

In [3]:
#configure dataShop
lockfile="shop.data"
foldnm="$(basename -- "$PWD")" #folder name
if [[ -f "$lockfile" ]]; then
    echo "Previous dataShop configuration detected"
    cat "$lockfile"
    echo ""
    read -r shopid iversid iauthor ishdesc iaccgrp iacclvl < "$lockfile"
    if [[ -z "$versid" ]]; then versid=$iversid ; fi
    if [[ -z "$shdesc" ]]; then shdesc=$ishdesc ; fi
    if [[ -z "$accgrp" ]]; then accgrp=$iaccgrp ; fi
    if [[ -z "$acclvl" ]]; then acclvl=$iacclvl ; fi
else
    epoch=$(date "+%s")
    b36arr=(0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z)
    shopid=$(for i in $(echo "obase=36; $epoch"| bc); do echo -n ${b36arr[${i#0}]}; done)
fi
#sanitize variables
acclvl=${acclvl//[!1-4]/}
accgrp=${accgrp//[^[:alnum:]_]/}
versid=${versid//[!0-9]/}
shdesc=${shdesc//[^[:alnum:]_]/}
#setting permissions and lock file
if [[ -z "$accgrp" || -z "$acclvl" || -z "$shopid" || -z "$versid" || -z "$shdesc" ]]
then
    echo "Metadata missing or failed checks, please set appropriately"
else
    if id -nGz "$USER" | grep -qzxF "$accgrp" 
    then
        echo User \`$USER\' belongs to group \`$accgrp\'
        case $acclvl in
        1)
            echo "Classification Level 1: OPEN"
            echo "Enabling access for Others on ${foldnm}"
            echo "Enabling access for ${accgrp} on ${foldnm}"
            chmod go=rx "$PWD"
            chgrp "$accgrp" "$PWD"
            ;;
        2 | 3 | 4)
            echo "Classification Level ${acclvl}: RESTRICTED"
            echo "Disabling access for Others on ${foldnm}"
            echo "Enabling access for ${accgrp} on ${foldnm}"
            chmod g=rx,o= "$PWD"
            chgrp "$accgrp" "$PWD"
            ;;
        *)
            echo "Classification Level ${acclvl}: INVALID"
            echo "Please specify a correct classification level"
            return
            ;;
        esac
        echo "Building dataShop ${shopid} version ${versid} at ${foldnm}"
        mkdir -p depo #new files received that cannot be imported
        mkdir -p reco #record datashop versions here
        mkdir -p "reco/v${versid}/reqs" #requirements to run (new environment)
        mkdir -p "reco/v${versid}/impo" #importable local files
        mkdir -p "reco/v${versid}/expo" #deposited/generated files shared externally
        mkdir -p "reco/v${versid}/temp" #temporary storage
        mkdir -p "reco/v${versid}/prod" #data output folder
        mkdir -p "reco/v${versid}/logs" #output logs folder
        mkdir -p "reco/v${versid}/plot" #figure output folder
        ln -frsn depo "reco/v${versid}/depo"
        ln -frsn "reco/v${versid}/reqs" reqs
        ln -frsn "reco/v${versid}/impo" impo
        ln -frsn "reco/v${versid}/expo" expo
        ln -frsn "reco/v${versid}/temp" temp
        ln -frsn "reco/v${versid}/prod" prod
        ln -frsn "reco/v${versid}/logs" logs
        ln -frsn "reco/v${versid}/plot" plot
        ls -l
        echo "Registering dataShop"
        printf '%s %s %s %s %s %s' "$shopid" "$versid" "$USER" "$shdesc" "$accgrp" "$acclvl" > "$lockfile"
        echo "This dataShop (v${versid}) can be accessed at:"
        echo "/data/shop/acc${shopid}.${USER}/v${versid}"
        datamall="/data/mall/${USER}/acc${shopid}"
        datashop="/data/shop/acc${shopid}.${USER}"
        ln -fsn "${PWD}/reco" "$datamall"
        ln -fsn "$datamall" "$datashop"
    else
        echo User \`$USER\' does not belong to group \`$accgrp\'
        echo Ensure correct spelling and consult your administrator.
    fi
fi

User `jmcarter' belongs to group `datascigroup'
Classification Level 1: OPEN
Enabling access for Others on sub.gnomAD
Enabling access for datascigroup on sub.gnomAD
Building dataShop RK1S79 version 0 at sub.gnomAD
total 36
drwxrwxr-x 1 jmcarter jmcarter     0 Oct 20 18:38 depo
lrwxrwxrwx 1 jmcarter jmcarter    12 Oct 20 18:38 expo -> reco/v0/expo
lrwxrwxrwx 1 jmcarter jmcarter    12 Oct 20 18:38 impo -> reco/v0/impo
lrwxrwxrwx 1 jmcarter jmcarter    12 Oct 20 18:38 logs -> reco/v0/logs
lrwxrwxrwx 1 jmcarter jmcarter    12 Oct 20 18:38 plot -> reco/v0/plot
lrwxrwxrwx 1 jmcarter jmcarter    12 Oct 20 18:38 prod -> reco/v0/prod
drwxrwxr-x 1 jmcarter jmcarter   136 Oct 20 18:38 reco
lrwxrwxrwx 1 jmcarter jmcarter    12 Oct 20 18:38 reqs -> reco/v0/reqs
lrwxrwxrwx 1 jmcarter jmcarter    12 Oct 20 18:38 temp ->
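
As a rough illustration of the two `chmod` modes used by the registration cell, the sketch below applies them to a throwaway directory and records the resulting octal modes. The scratch directory and the `show_mode` helper are illustrative assumptions, not part of the dataShop tooling, and `stat -c` assumes GNU coreutils.

```shell
# Hypothetical demo: apply the level-1 and level-2..4 modes to a scratch directory.
demo_dir="$(mktemp -d)"
chmod u=rwx "$demo_dir"              # known starting point for the owner bits

show_mode() { stat -c '%a' "$1"; }   # GNU stat: print the octal permission bits

chmod go=rx "$demo_dir"              # level 1 (OPEN): group and others may read/enter
open_mode="$(show_mode "$demo_dir")"

chmod g=rx,o= "$demo_dir"            # levels 2-4: group may read/enter, others get nothing
restricted_mode="$(show_mode "$demo_dir")"

echo "OPEN: ${open_mode}  RESTRICTED: ${restricted_mode}"
rmdir "$demo_dir"
```

Only the dataShop folder itself is changed by these commands; files placed inside it keep whatever mode they were created with.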

In [1]:
ln -rs RK1S79.gnomAD.ipynb reco/v0/
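
The `RK1S79` prefix in the notebook name is the shop ID printed by the registration cell, which encodes the Unix epoch at registration time in base 36. The sketch below shows the same encoding without `bc`, and the reverse direction using bash's `base#number` arithmetic. The `to_base36` helper is a hypothetical name introduced here for illustration.

```shell
# Pure-bash base-36 encoder (hypothetical helper; the registration cell uses bc).
to_base36() {
    local n=$1 out="" d
    local digits=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
    if (( n == 0 )); then echo 0; return; fi
    while (( n > 0 )); do
        d=$(( n % 36 ))
        out="${digits:d:1}${out}"    # prepend the next least-significant digit
        n=$(( n / 36 ))
    done
    echo "$out"
}

# Bash arithmetic reads base-36 literals directly, which gives the inverse mapping.
epoch=$(( 36#RK1S79 ))
echo "RK1S79 -> epoch ${epoch}"
echo "epoch ${epoch} -> $(to_base36 "$epoch")"
```

Decoding an ID this way gives a quick check of when a dataShop was registered.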

## Requirements

Use `conda env export > reqs/custom_env.yml` or `conda list --explicit > reqs/custom_env.txt` to record the tools needed to re-run this dataShop.

In [None]:
conda env export > reqs/custom_env.yml

In [80]:
ls -l reqs

ls: cannot access 'reqs': No such file or directory


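
The `ls` error above indicates that `reqs` did not resolve from the kernel's working directory when that cell ran. A check along the following lines could confirm that each convenience link created at registration still resolves before exporting the environment. This is a minimal sketch: `check_links` is a hypothetical helper, and the scratch directory exists only to make the example self-contained.

```shell
# Hypothetical helper: report which of the dataShop convenience links resolve.
check_links() {
    local root=$1 name
    for name in reqs impo expo temp prod logs plot; do
        if [[ -e "${root}/${name}" ]]; then
            echo "ok       ${name}"
        else
            echo "missing  ${name}"
        fi
    done
}

# Demo in a scratch directory that mimics part of the registered layout.
scratch="$(mktemp -d)"
mkdir -p "${scratch}/reco/v0/reqs" "${scratch}/reco/v0/impo"
ln -s reco/v0/reqs "${scratch}/reqs"
ln -s reco/v0/impo "${scratch}/impo"
report="$(check_links "$scratch")"
missing_count="$(grep -c '^missing' <<< "$report")"
echo "$report"
echo "${missing_count} link(s) missing"
rm -rf "$scratch"
```

In the dataShop folder itself the call would simply be `check_links .`.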

## Deposits
Deposit any new data in the `depo` folder. Use `rsync` or Grsync (GUI version).

In [4]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz"

--2022-10-31 14:24:55--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.10.128, 142.251.12.128, 74.125.200.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.10.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 163327599361 (152G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz’


2022-10-31 15:38:14 (35.4 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz’ saved [163327599361/163327599361]



In [3]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz"

--2022-10-31 13:43:10--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.4.128, 74.125.24.128, 142.251.10.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.4.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 110733327638 (103G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz’


2022-10-31 14:24:55 (42.2 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz’ saved [110733327638/110733327638]



In [5]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz"

--2022-10-31 15:38:14--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.24.128, 142.251.10.128, 142.251.12.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.24.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 117914739898 (110G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz’


2022-10-31 16:23:50 (41.1 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz’ saved [117914739898/117914739898]



In [6]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz"

--2022-10-31 16:23:50--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.24.128, 142.251.10.128, 142.251.12.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.24.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 116325219774 (108G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz’


2022-10-31 17:09:58 (40.1 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz’ saved [116325219774/116325219774]



In [7]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz"

--2022-10-31 17:09:58--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.24.128, 142.251.10.128, 142.251.12.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.24.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 114042301500 (106G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz’


2022-10-31 17:55:12 (40.1 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz’ saved [114042301500/114042301500]



In [4]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz"

--2022-10-20 18:39:42--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.200.128, 74.125.68.128, 142.250.4.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.200.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 78504248952 (73G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz’


2022-10-20 19:08:15 (43.7 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz’ saved [78504248952/78504248952]



In [8]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz"

--2022-10-31 17:55:13--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.12.128, 74.125.68.128, 142.250.4.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.12.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 82406068224 (77G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz’


2022-10-31 18:30:47 (36.8 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz’ saved [82406068224/82406068224]



In [9]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz"

--2022-10-31 18:30:47--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.4.128, 74.125.24.128, 142.251.10.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.4.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73832366577 (69G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz’


2022-10-31 19:00:55 (39.0 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz’ saved [73832366577/73832366577]



In [10]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz"

--2022-10-31 19:00:55--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.10.128, 142.251.12.128, 74.125.200.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.10.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 53285996839 (50G) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz’


2022-10-31 19:26:12 (33.5 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz’ saved [53285996839/53285996839]



In [11]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz.tbi"

--2022-11-01 10:41:50--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.68.128, 142.250.4.128, 74.125.24.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.68.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 204030 (199K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz.tbi’


2022-11-01 10:41:50 (11.7 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz.tbi’ saved [204030/204030]



In [12]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz.tbi"

--2022-11-01 10:42:20--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.130.128, 74.125.200.128, 142.251.10.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.130.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 130995 (128K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz.tbi’


2022-11-01 10:42:21 (11.4 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz.tbi’ saved [130995/130995]



In [13]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz.tbi"

--2022-11-01 10:42:33--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.130.128, 74.125.200.128, 142.251.10.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.130.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 143033 (140K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz.tbi’


2022-11-01 10:42:34 (11.7 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz.tbi’ saved [143033/143033]



In [14]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz.tbi"

--2022-11-01 10:42:48--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.130.128, 74.125.200.128, 142.251.10.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.130.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 142713 (139K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz.tbi’


2022-11-01 10:42:49 (11.7 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz.tbi’ saved [142713/142713]



In [15]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz.tbi"

--2022-11-01 10:43:01--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.130.128, 74.125.200.128, 142.251.10.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.130.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 142118 (139K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz.tbi’


2022-11-01 10:43:02 (11.1 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz.tbi’ saved [142118/142118]



In [6]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz.tbi"

--2022-10-26 11:43:29--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.10.128, 142.251.12.128, 74.125.200.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.10.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 96071 (94K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz.tbi’


2022-10-26 11:43:30 (13.4 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz.tbi’ saved [96071/96071]



In [16]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz.tbi"

--2022-11-01 10:43:33--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.68.128, 142.250.4.128, 74.125.24.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.68.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 87761 (86K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz.tbi’


2022-11-01 10:43:33 (12.1 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz.tbi’ saved [87761/87761]



In [17]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz.tbi"

--2022-11-01 10:43:48--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.130.128, 74.125.200.128, 142.251.10.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.130.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 86804 (85K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz.tbi’


2022-11-01 10:43:49 (11.9 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz.tbi’ saved [86804/86804]



In [18]:
wget -O depo/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz.tbi \
"https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz.tbi"

--2022-11-01 10:44:02--  https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.130.128, 74.125.200.128, 142.251.10.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.130.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 65533 (64K) [application/octet-stream]
Saving to: ‘depo/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz.tbi’


2022-11-01 10:44:04 (12.1 MB/s) - ‘depo/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz.tbi’ saved [65533/65533]
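
The per-chromosome downloads above all follow one URL pattern, so the commands could also be generated in a loop. The sketch below only prints the `wget` commands for review; the `gnomad_cmds` function and the `chroms` array are illustrative names, and the chromosome list is simply the set downloaded in this notebook.

```shell
# Sketch: generate the per-chromosome wget commands instead of typing each one.
base="https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes"
chroms=(4 9 10 11 12 14 16 17 20)      # the chromosomes downloaded above

gnomad_cmds() {
    local c ext file
    for c in "${chroms[@]}"; do
        for ext in "" ".tbi"; do       # the VCF itself, then its tabix index
            file="gnomad.genomes.v3.1.2.sites.chr${c}.vcf.bgz${ext}"
            printf 'wget -O depo/%s "%s/%s"\n' "$file" "$base" "$file"
        done
    done
}

gnomad_cmds            # dry run: print the commands for review
# gnomad_cmds | bash   # uncomment to actually download (hundreds of GB in total)
```

Printing first keeps the sketch safe to run anywhere; piping the output to `bash` performs the same downloads as the individual cells.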



## High Confidence / Gold Standard Variants

https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0

In [24]:
wget -O depo/1000G_phase1.snps.high_confidence.hg38.vcf.gz \
"https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz"

--2022-11-25 12:43:14--  https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 172.253.118.128, 74.125.200.128, 142.250.4.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.253.118.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1888262073 (1.8G) [text/x-vcard]
Saving to: ‘depo/1000G_phase1.snps.high_confidence.hg38.vcf.gz’


2022-11-25 12:44:41 (20.9 MB/s) - ‘depo/1000G_phase1.snps.high_confidence.hg38.vcf.gz’ saved [1888262073/1888262073]



In [25]:
wget -O depo/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi \
"https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi"

--2022-11-25 12:44:41--  https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.12.128, 172.253.118.128, 74.125.200.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.12.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2128536 (2.0M) [application/octet-stream]
Saving to: ‘depo/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi’


2022-11-25 12:44:43 (1.28 MB/s) - ‘depo/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi’ saved [2128536/2128536]



In [26]:
wget -O depo/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
"https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz"

--2022-11-25 12:44:43--  https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.10.128, 172.217.194.128, 74.125.24.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.10.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20685880 (20M) [text/x-vcard]
Saving to: ‘depo/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz’


2022-11-25 12:44:47 (6.54 MB/s) - ‘depo/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz’ saved [20685880/20685880]



In [27]:
wget -O depo/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi \
"https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi"

--2022-11-25 12:44:47--  https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.10.128, 172.217.194.128, 74.125.24.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.10.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1500013 (1.4M) [application/octet-stream]
Saving to: ‘depo/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi’


2022-11-25 12:44:49 (859 KB/s) - ‘depo/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi’ saved [1500013/1500013]



In [29]:
wget -O depo/Homo_sapiens_assembly38.dbsnp138.vcf \
"https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf"

--2022-11-28 12:24:58--  https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.10.128, 142.251.12.128, 142.250.4.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.10.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10950827213 (10G) [text/x-vcard]
Saving to: ‘depo/Homo_sapiens_assembly38.dbsnp138.vcf’


2022-11-28 12:36:04 (15.7 MB/s) - ‘depo/Homo_sapiens_assembly38.dbsnp138.vcf’ saved [10950827213/10950827213]



In [28]:
wget -O depo/Homo_sapiens_assembly38.dbsnp138.vcf.idx \
"https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx"

--2022-11-28 12:24:37--  https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.200.128, 142.250.4.128, 74.125.24.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.200.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12480412 (12M) [application/octet-stream]
Saving to: ‘depo/Homo_sapiens_assembly38.dbsnp138.vcf.idx’


2022-11-28 12:24:39 (7.91 MB/s) - ‘depo/Homo_sapiens_assembly38.dbsnp138.vcf.idx’ saved [12480412/12480412]



In [30]:
wget -O depo/Homo_sapiens_assembly38.known_indels.vcf.gz \
"https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz"

--2022-11-28 13:35:31--  https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.12.128, 74.125.200.128, 74.125.68.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.12.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61692306 (59M) [text/x-vcard]
Saving to: ‘depo/Homo_sapiens_assembly38.known_indels.vcf.gz’


2022-11-28 13:35:37 (11.1 MB/s) - ‘depo/Homo_sapiens_assembly38.known_indels.vcf.gz’ saved [61692306/61692306]



In [31]:
wget -O depo/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi \
"https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"

--2022-11-28 13:35:37--  https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.10.128, 172.217.194.128, 74.125.24.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.10.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1567886 (1.5M) [application/octet-stream]
Saving to: ‘depo/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi’


2022-11-28 13:35:40 (891 KB/s) - ‘depo/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi’ saved [1567886/1567886]



In [32]:
ls -l depo

total 901672288
-rw-rw-r-- 1 jmcarter jmcarter   1888262073 Jul 22  2016 1000G_phase1.snps.high_confidence.hg38.vcf.gz
-rw-rw-r-- 1 jmcarter jmcarter      2128536 Jul 22  2016 1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
-rw-rw-r-- 1 jmcarter jmcarter 117914739898 Oct 30  2021 gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz
-rw-rw-r-- 1 jmcarter jmcarter       143033 Oct 30  2021 gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz.tbi
-rw-rw-r-- 1 jmcarter jmcarter 116325219774 Oct 30  2021 gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz
-rw-rw-r-- 1 jmcarter jmcarter       142713 Oct 30  2021 gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz.tbi
-rw-rw-r-- 1 jmcarter jmcarter 114042301500 Oct 30  2021 gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz
-rw-rw-r-- 1 jmcarter jmcarter       142118 Oct 30  2021 gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz.tbi
-rw-rw-r-- 1 jmcarter jmcarter  78504248952 Oct 30  2021 gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz
-rw-rw-r-- 1 jmcarter jmcarter        96071 Oct 30  2
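
Each `wget` log above reports a `Length` in bytes, and the listing shows the size on disk, so the two can be compared to spot a truncated download. The following is a minimal sketch of such a check; `check_size` is a hypothetical helper, and the demo uses a scratch file rather than the real deposits.

```shell
# Hypothetical helper: compare a file's size in bytes with the Length wget reported.
check_size() {
    local path=$1 expected=$2 actual
    actual=$(( $(wc -c < "$path") ))
    if (( actual == expected )); then
        echo "OK        ${path} (${actual} bytes)"
    else
        echo "MISMATCH  ${path}: expected ${expected}, found ${actual}"
        return 1
    fi
}

# Demo on a scratch file sized like the chr14 index (96071 bytes in the log above).
scratch="$(mktemp)"
head -c 96071 /dev/zero > "$scratch"
check_size "$scratch" 96071
check_size "$scratch" 96072 || echo "size check failed as expected"
rm -f "$scratch"
```

Against the real data the call would look like `check_size depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz 78504248952`. A matching size does not prove the content is intact; it only rules out an incomplete transfer.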

## Imports
Symlink any existing data to the `impo` folder.  
Use `ln -s /path/to/source impo/destination`.

In [79]:
ls -l impo

total 0


## Exports

Data generated in the dataShop to be shared externally can be symlinked here.  
Prefix with dataShop UID.  
e.g. `ln -rs prod/results.tsv expo/RAL2CG.0.JMC.results.tsv`
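
The example name above appears to follow the pattern shop ID, version, a short tag, then the original file name. A sketch of building such a name from the `shop.data` lock file is shown below. `export_name` is a hypothetical helper, and treating the third component as a caller-supplied tag (for instance author initials) is an assumption based only on the example.

```shell
# Hypothetical helper: build an export name of the form <shopid>.<version>.<tag>.<file>.
export_name() {
    local lockfile=$1 tag=$2 src=$3
    local shopid versid rest
    # The lock file has no trailing newline, so read reports EOF; the fields are still set.
    read -r shopid versid rest < "$lockfile" || true
    echo "${shopid}.${versid}.${tag}.$(basename -- "$src")"
}

# Demo with a scratch lock file in the format written by the registration cell.
scratch="$(mktemp -d)"
printf 'RK1S79 0 jmcarter gnomAD datascigroup 1' > "${scratch}/shop.data"
name="$(export_name "${scratch}/shop.data" JMC prod/results.tsv)"
echo "$name"
rm -rf "$scratch"
```

Inside a dataShop this could be combined with the link command, e.g. `ln -rs prod/results.tsv "expo/$(export_name shop.data JMC prod/results.tsv)"`.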

In [77]:
ls -l expo

total 0


---

WARNING: Upon launching custom bash kernels you may need to run this:

In [None]:
bind 'set enable-bracketed-paste off'

---

# dataShop Code [env:my_env]

In [4]:
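#awk is much faster than sed for this task
#header lines pass through; for records, blank ID and QUAL and reduce INFO to the overall AF
vcfstrip() {
    awk 'BEGIN{OFS=FS="\t"} !/^#/ {
        $3="."; $6=".";
        match($8, /AF=[0-9]*\.[e0-9+-]*/);
        #adding 0.00000 converts e.g. 1.00000e-02 to a plain number
        $8="AF=" (0.00000+substr($8, RSTART+3, RLENGTH-3))
    }
    {print $0}'
}
export -f vcfstrip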
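
To make the effect of `vcfstrip` concrete, the sketch below runs the same logic on a made-up header line and record. The record values are illustrative only and are not taken from gnomAD; the function is repeated so that the example runs on its own.

```shell
# Same logic as the vcfstrip cell above, repeated so the sketch is self-contained.
vcfstrip() {
    awk 'BEGIN{OFS=FS="\t"} !/^#/ {
        $3="."; $6=".";
        match($8, /AF=[0-9]*\.[e0-9+-]*/);
        $8="AF=" (0.00000+substr($8, RSTART+3, RLENGTH-3))
    }
    {print $0}'
}

# Made-up input: one header line and one record with an overall AF and a subgroup AF.
demo_out="$(
    {
        printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
        printf 'chr14\t100\trs0000\tA\tG\t50\tPASS\tAC=1;AN=100;AF=1.00000e-02;AF_afr=2.00000e-02\n'
    } | vcfstrip
)"
echo "$demo_out"
```

The header line passes through unchanged, while the record keeps only the first `AF=` value, with ID and QUAL set to `.`.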

In [22]:
##awk is much faster than sed for this task
#vcfstrip() {
#awk 'BEGIN{OFS=FS="\t"} !/^#/ {
#     $3="."; $6="."; match($8, /AF[_|a-z]*=[0-9]*\.[e0-9+-]*/, arr);
#     for (i in arr) {start=arr[i, "start"]+3; length=arr[i, "length"]-3;
#                     AF=substr($8, start, length); if (AF > best){best = AF}};
#     $8="AF="best}; {print $0}'
#}
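
The commented-out variant above appears to aim for the largest allele frequency across all `AF...=` keys rather than the overall `AF` alone. gawk's three-argument `match()` fills its array with the first match and its parenthesised groups, not with every match, so that approach would not loop over all keys. A portable alternative is sketched below under that reading of the intent: it splits INFO on `;` and keeps the maximum. `vcfmaxaf` is a hypothetical name, and the sketch takes every key beginning with `AF_`, which may include keys you would rather exclude.

```shell
# Hypothetical variant: keep the largest value among the AF and AF_* keys in INFO.
vcfmaxaf() {
    awk 'BEGIN{OFS=FS="\t"} !/^#/ {
        n=split($8, kv, ";"); best=0
        for (i=1; i<=n; i++) {
            if (kv[i] ~ /^AF(_[A-Za-z0-9_]+)?=/) {
                val=substr(kv[i], index(kv[i], "=")+1)+0   # numeric value after "="
                if (val > best) best=val
            }
        }
        $3="."; $6="."; $8="AF=" best
    }
    {print $0}'
}

# Made-up record: the subgroup value 2.5e-02 is the largest of the three.
demo_max="$(printf 'chr14\t100\trs0000\tA\tG\t50\tPASS\tAC=1;AN=100;AF=1.00000e-02;AF_afr=2.50000e-02;AF_nfe=5.00000e-03\n' | vcfmaxaf)"
echo "$demo_max"
```

Splitting on `;` anchors the test at the start of each key, which avoids matching `AF` inside a longer key name.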

In [14]:
#time zcat depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz \
#    | vcfstrip \
#    | bgzip -@ 4 \
#    > prod/AFonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz


real	35m54.343s
user	37m50.482s
sys	6m37.191s


In [19]:
ls depo/*.bgz

depo/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz
depo/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz
depo/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz
depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz
depo/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz
depo/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz
depo/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz
depo/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz
depo/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz


In [15]:
#dry run: preview the output name for each input
for chr in depo/*.bgz; do
    name="$(basename -- "$chr")"
    echo "prod/AFonly.${name%.bgz}.gz"
done

prod/AFonly.gnomad.genomes.v3.1.2.sites.chr10.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr11.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr12.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr16.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr17.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr20.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr4.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr9.vcf.gz


In [5]:
for chr in depo/*.bgz; do
    name="$(basename -- "$chr")"
    #run at most 2 pipelines at a time; each bgzip uses 4 threads
    sem -j 2 "zcat ${chr} \
        | vcfstrip \
        | bgzip -@ 4 \
        > prod/AFonly.${name%.bgz}.gz ; echo ${name} done"
    echo "${name} submitted"
done
sem --wait

gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz submitted
gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz submitted
gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz submitted
gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz submitted
gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz submitted
gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz submitted
gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz submitted
gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz submitted
gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz submitted
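
Once `sem --wait` returns, it may be worth confirming that every input produced a non-empty output before indexing. The sketch below lists inputs whose expected `AFonly` file is missing or empty; `missing_outputs` is a hypothetical helper, and the demo runs on scratch directories rather than the real `depo` and `prod`.

```shell
# Hypothetical helper: list inputs whose AFonly output is missing or empty.
missing_outputs() {
    local in_dir=$1 out_dir=$2 chr name out
    for chr in "${in_dir}"/*.bgz; do
        [[ -e "$chr" ]] || continue            # glob matched nothing
        name="$(basename -- "$chr")"
        out="${out_dir}/AFonly.${name%.bgz}.gz"
        [[ -s "$out" ]] || echo "$name"
    done
}

# Demo on scratch directories: two inputs, only one non-empty output.
scratch="$(mktemp -d)"
mkdir -p "${scratch}/depo" "${scratch}/prod"
touch "${scratch}/depo/chrA.vcf.bgz" "${scratch}/depo/chrB.vcf.bgz"
echo "data" > "${scratch}/prod/AFonly.chrA.vcf.gz"
todo="$(missing_outputs "${scratch}/depo" "${scratch}/prod")"
echo "still to process: ${todo}"
rm -rf "$scratch"
```

In the dataShop the call would be `missing_outputs depo prod`; an empty result means every input has a corresponding output. A non-empty file can still be truncated, so this only catches jobs that never wrote anything.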


In [10]:
#time zcat depo/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz \
#    | parallel -l 1000000 -j 10 -k --spreadstdin vcfstrip \
#    | bgzip -@ 10 \
#    > prod/chr14.vcf.bgz



real	58m21.785s
user	58m13.608s
sys	0m4.872s


In [16]:
ls prod/AFonly.gnomad.genomes.v3.1.2.sites.*.vcf.gz

[0m[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.9X.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr10.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr11.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr12.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr16.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr17.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr20.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr4.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr9.vcf.gz[0m


In [17]:
# Create a .tbi index next to each AFonly VCF.
for chr in prod/AFonly.gnomad.genomes.v3.1.2.sites.*.vcf.gz; do
    gatk IndexFeatureFile \
        -I "$chr" \
        --verbosity WARNING
done

Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
19:31:32.449 WARN  IntelInflater - Zero Bytes Written : 0
[1 November, 2022 7:31:32 PM SGT] org.broadinstitute.hellbender.tools.IndexFeatureFile done. Elapsed time: 0.62 minutes.
Runtime.totalMemory()=2562719744
Tool returned:
/hyperion/databank/booth/jmcarter/sub.gnomAD/prod/AFonly.gnomad.genomes.v3.1.2.sites.chr10.vcf.gz.tbi
Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
19:32:10.303 WARN  IntelInflater - Zero Bytes Written : 0
[1 November, 2022 7:32:10 PM SGT] org.broadinstitute.hellbender.tools.IndexFeatureFile done. Elapsed time: 0.60 minutes.
Runtime.totalMemory()=2585788416
Tool returned:
/hyperion/databank/booth/jmcarter/sub.gnomAD/prod/AFonly.gnomad.genomes.v3.1.2.sites.chr11.vcf.gz.tbi
Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
19:32:47.401 WARN  IntelInflater - Zero Bytes Written : 0
[1 November, 2022 7:32:47 

In [18]:
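# Rebuild the list from scratch so re-running this cell does not duplicate entries;
# match chr* only so the merged 9X file is never listed as its own input.
: > prod/AFonly.list
for chr in prod/AFonly.gnomad.genomes.v3.1.2.sites.chr*.vcf.gz; do
    echo "$chr" >> prod/AFonly.list
done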

In [19]:
head prod/AFonly.list

prod/AFonly.gnomad.genomes.v3.1.2.sites.chr10.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr11.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr12.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr16.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr17.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr20.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr4.vcf.gz
prod/AFonly.gnomad.genomes.v3.1.2.sites.chr9.vcf.gz


In [20]:
gatk MergeVcfs \
    -I prod/AFonly.list \
    -O prod/AFonly.gnomad.genomes.v3.1.2.sites.9X.vcf.gz \
    --VERBOSITY WARNING

Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
19:40:46.699 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Nov 01 19:40:46 SGT 2022] Executing as jmcarter@odin on Linux 5.8.0-53-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_112-b16; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.3.0.0
19:42:43.374 WARN  IntelInflater - Zero Bytes Written : 0
19:44:00.619 WARN  IntelInflater - Zero Bytes Written : 0
19:45:19.352 WARN  IntelInflater - Zero Bytes Written : 0
19:46:35.244 WARN  IntelInflater - Zero Bytes Written : 0
19:47:51.676 WARN  IntelInflater - Zero Bytes Written : 0
19:48:42.953 WARN  IntelInflater - Zero Bytes Written : 0
19:49:36.486 WARN  IntelInflater - Zero Bytes Written : 0
19:50:23.765 WARN  IntelInflater - Zero Bytes Written : 0
19:50:58.318 WARN  IntelInflater - 

In [21]:
ls prod/AFonly.gnomad.genomes.v3.1.2.sites.*.vcf.gz

[0m[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.9X.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr10.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr11.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr12.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr16.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr17.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr20.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr4.vcf.gz[0m
[01;31mprod/AFonly.gnomad.genomes.v3.1.2.sites.chr9.vcf.gz[0m


In [22]:
# Keep biallelic SNPs with AF > 0.01; each output is the input name prefixed with "BA".
for chr in prod/AFonly.gnomad.genomes.v3.1.2.sites.*.vcf.gz; do
    name="$(basename -- "$chr")"
    gatk SelectVariants \
        -V "$chr" \
        -select-type SNP -restrict-alleles-to BIALLELIC \
        -select "AF > 0.01" \
        -O "prod/BA${name}" \
        --lenient --verbosity WARNING
done

Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
20:11:42.014 WARN  IntelInflater - Zero Bytes Written : 0
[1 November, 2022 8:11:42 PM SGT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 5.90 minutes.
Runtime.totalMemory()=1243086848
Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
20:12:29.262 WARN  IntelInflater - Zero Bytes Written : 0
[1 November, 2022 8:12:29 PM SGT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.76 minutes.
Runtime.totalMemory()=2022178816
Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
20:13:15.653 WARN  IntelInflater - Zero Bytes Written : 0
[1 November, 2022 8:13:15 PM SGT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.74 minutes.
Runtime.totalMemory()=2403336192
Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gat

In [1]:
#gatk IndexFeatureFile \
#     -I prod/AFonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz \
#     --verbosity WARNING

Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
18:03:47.371 WARN  IntelInflater - Zero Bytes Written : 0
[27 October, 2022 6:03:47 PM SGT] org.broadinstitute.hellbender.tools.IndexFeatureFile done. Elapsed time: 0.43 minutes.
Runtime.totalMemory()=2529165312
Tool returned:
/hyperion/databank/booth/jmcarter/sub.gnomAD/prod/AFonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz.tbi


In [2]:
#gatk SelectVariants \
#    -V prod/AFonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz \
#    -select-type SNP -restrict-alleles-to BIALLELIC \
#    -select "AF > 0.01" \
#    -O prod/BAonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.gz \
#    --lenient --verbosity WARNING

Using GATK jar /opt/datasci_apps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
18:04:40.920 WARN  IntelInflater - Zero Bytes Written : 0
[27 October, 2022 6:04:40 PM SGT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.51 minutes.
Runtime.totalMemory()=2612002816
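
The SelectVariants filters used above (SNP, BIALLELIC, `AF > 0.01`) can be spot-checked on the output without GATK. The sketch below is an approximation in `awk`, not GATK's own logic: it assumes uncompressed VCF text on stdin, that the INFO column carries a single `AF=<number>` entry, and it treats a record as a biallelic SNP when REF is one base and ALT is a single A, C, G or T. The function name `check_ba_records` is hypothetical.

```bash
# Hypothetical spot check: print records that do NOT look like
# biallelic SNPs with AF > 0.01. An empty result means no violations were found.
check_ba_records() {
    awk -F '\t' '
        /^#/ { next }                                  # skip header lines
        {
            af = ""
            n = split($8, info, ";")                   # INFO is column 8
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^AF=/) af = substr(info[i], 4)
            snp = (length($4) == 1 && $5 ~ /^[ACGT]$/) # single-base REF and ALT
            if (!snp || af == "" || af + 0 <= 0.01) print
        }
    '
}
```

A possible use is `zcat prod/BAonly.gnomad.genomes.v3.1.2.sites.chr14.vcf.gz | head -n 100000 | check_ba_records`, which should print nothing if the first records all satisfy the filters. Edge cases that GATK handles differently (for example symbolic or `*` alleles) would be reported here as violations.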
