# Downloading Reference Genome and Whole Genome SRA (Sequence Read Archive) Data for Downstream Use in FreeBayes

## 1. Download reference genome

[Papio anubis](https://www.ncbi.nlm.nih.gov/genome/394?genome_assembly_id=324755) genome information on NCBI

[Additional downloads for Papio anubis](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/264/685/GCF_000264685.3_Panu_3.0/) (e.g. genome annotations)

In [15]:
%%bash
# save ftp download link as a variable
refpapio="ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/264/685/GCF_000264685.3_Panu_3.0/GCF_000264685.3_Panu_3.0_genomic.fna.gz"

# make directory for storing reference file
mkdir -p /moto/eaton/projects/macaques/refpapio

# download file to dir
curl -Lk "$refpapio" -o /moto/eaton/projects/macaques/refpapio/refpapio.fna.gz


In [1]:
ls /moto/eaton/projects/macaques/refpapio

[0m[38;5;9mrefpapio.fna.gz[0m

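A quick sanity check on the downloaded reference can be sketched as below. The helper name `count_fasta_records` is hypothetical (it is not part of the original notebook); it simply counts header lines in a gzipped FASTA file. Reading the file to the end also exercises the whole gzip stream, so a truncated download should raise an error instead of returning a count.

```python
import gzip

def count_fasta_records(path):
    """Count header lines (those starting with '>') in a gzipped FASTA file."""
    n = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                n += 1
    return n
```

Usage would look like `count_fasta_records("/moto/eaton/projects/macaques/refpapio/refpapio.fna.gz")`; expect it to take a while on a file of this size.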

## 2. SRA File Download Using [sratools](https://github.com/ncbi/sra-tools) (`conda install -c bioconda sra-tools`)

Open the CSV of runs to download. An SRR value of NaN means that the data are either not available on NCBI or are spread across multiple runs:

In [None]:
##!conda install pandas

In [2]:
import pandas as pd
import os

In [3]:
df = pd.read_csv("./data/SRA-table.csv")
df[["Species", "Group", "SRR", "BioSample", "Sample", "Study", "PRJ"]]

Unnamed: 0,Species,Group,SRR,BioSample,Sample,Study,PRJ
0,Macaca mulatta northern,mulatta,SRR4454026,SAMN05883679,SRS1762015,SRP092140,PRJNA345528
1,Macaca mulatta southern low altitude,mulatta,SRR4454020,SAMN05883709,SRS1762009,SRP092140,PRJNA345529
2,Macaca mulatta southern high altitude,mulatta,SRR4453966,SAMN05883736,SRS1761955,SRP092140,PRJNA345530
3,Macaca mulatta Indian,mulatta,SRR5628058,SAMN07168901,SRS2238957,SRP049547,PRJNA251548
4,Macaca fascicularis northern,fascicularis,,SAMN00116341,SRS117874,SRP045755,PRJNA51411
5,Macaca fascicularis southern,fascicularis,,SAMD00006158,DRS000787,DRP000438,PRJDB2038
6,Macaca fuscata,mulatta,DRR002233,SAMD00011919,DRS001583,DRP000620,PRJDB2459
7,Macaca thibethana,sinica,SRR1024051,SAMN02390221,SRS498543,SRP032525,PRJNA226187
8,Macaca assamensis,sinica,SRR2981114,SAMN04316321,SRS1196892,SRP067118,PRJNA305009
9,Macaca arctoides,fascicularis,SRR2981139,SAMN04316319,SRS1196879,SRP067118,PRJNA305009
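
The split between samples with one run and samples whose SRR is NaN can be made explicit before downloading. The sketch below uses a toy table with the same column names as `SRA-table.csv`; the helper name `split_by_srr` is an assumption for illustration only.

```python
import pandas as pd

def split_by_srr(table):
    """Return (rows with a run accession, rows where SRR is missing)."""
    has_run = table["SRR"].notna()
    return table[has_run], table[~has_run]

# toy table reusing two rows from the table above
toy = pd.DataFrame({
    "Species": ["Macaca mulatta northern", "Macaca fascicularis northern"],
    "SRR": ["SRR4454026", None],
    "Sample": ["SRS1762015", "SRS117874"],
})
single, multi = split_by_srr(toy)
print(list(single["SRR"]))    # runs that can be fetched directly
print(list(multi["Sample"]))  # samples that need a run lookup first
```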


In [30]:
!mkdir -p /moto/eaton/projects/macaques/SRA

In [None]:
## Download individuals with a single SRR. Rows with NaN need more attention, so they are handled separately.
for i in df["SRR"]:
    # NaN entries are floats, so only string accessions are downloaded
    if isinstance(i, str):
        cmd = 'wget -P /moto/eaton/projects/macaques/SRA/ ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/'+i[0:3]+'/'+i[0:6]+'/'+i+'/'+i+'.sra'
        os.system(cmd)
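
The download path is assembled from the first three and first six characters of the run accession. A minimal sketch of that string construction, with the hypothetical helper name `sra_ftp_url`, makes the pattern easier to inspect before any download starts:

```python
def sra_ftp_url(run):
    """Build the per-run .sra path from the accession's 3- and 6-character prefixes."""
    base = "ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra"
    return f"{base}/{run[:3]}/{run[:6]}/{run}/{run}.sra"

print(sra_ftp_url("SRR4454026"))
```

Printing the URLs first is a cheap way to catch a malformed accession before `wget` is called.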

In [6]:
os.listdir("/moto/eaton/projects/macaques/SRA")

['SRR4454020.sra',
 'SRR2981114.sra',
 'SRR4453966.sra',
 'SRR5947292.sra',
 'SRR4454026.sra',
 'SRR5628058.sra',
 'SRR2981139.sra',
 'SRR5947294.sra',
 'SRR5947293.sra',
 'SRR1024051.sra',
 'DRR002233.sra']
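
Rather than comparing the listing by eye, the expected accessions can be checked against the directory contents. This is a sketch only; `missing_sra` is a hypothetical helper, not something the notebook defines.

```python
import os

def missing_sra(expected_runs, directory):
    """List the accessions in expected_runs that have no <run>.sra file in directory."""
    present = set(os.listdir(directory))
    return [run for run in expected_runs if f"{run}.sra" not in present]
```

For the table above, a call such as `missing_sra(df["SRR"].dropna(), "/moto/eaton/projects/macaques/SRA")` would return an empty list when every single-run download succeeded.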

In [8]:
## separate folders for the samples with more than one run (fascicularis northern and southern)
!mkdir /moto/eaton/projects/macaques/SRA/fasno
!mkdir /moto/eaton/projects/macaques/SRA/fasso

In [9]:
!wget -O /moto/eaton/projects/macaques/SRA/fasno/SRS117874.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRS117874'

--2019-01-30 19:21:27--  http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRS117874
Resolving trace.ncbi.nlm.nih.gov (trace.ncbi.nlm.nih.gov)... 130.14.29.113, 2607:f220:41e:4290::113
Connecting to trace.ncbi.nlm.nih.gov (trace.ncbi.nlm.nih.gov)|130.14.29.113|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRS117874 [following]
--2019-01-30 19:21:27--  https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRS117874
Connecting to trace.ncbi.nlm.nih.gov (trace.ncbi.nlm.nih.gov)|130.14.29.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘/moto/eaton/projects/macaques/SRA/fasno/SRS117874.csv’

    [  <=>                                  ] 69,706       245KB/s   in 0.3s   

2019-01-30 19:21:27 (245 KB/s) - ‘/

In [10]:
!wget -O /moto/eaton/projects/macaques/SRA/fasso/DRS000787.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=DRS000787'

--2019-01-30 19:21:37--  http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=DRS000787
Resolving trace.ncbi.nlm.nih.gov (trace.ncbi.nlm.nih.gov)... 130.14.29.113, 2607:f220:41e:4290::113
Connecting to trace.ncbi.nlm.nih.gov (trace.ncbi.nlm.nih.gov)|130.14.29.113|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=DRS000787 [following]
--2019-01-30 19:21:37--  https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=DRS000787
Connecting to trace.ncbi.nlm.nih.gov (trace.ncbi.nlm.nih.gov)|130.14.29.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘/moto/eaton/projects/macaques/SRA/fasso/DRS000787.csv’

    [ <=>                                   ] 3,558       --.-K/s   in 0.01s   

2019-01-30 19:21:37 (347 KB/s) - ‘/
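
The two runinfo requests differ only in the `term` parameter, so the URL can be built from the sample accession. The sketch below reproduces the query string used in the `wget` calls (using the `https` address that the server redirects to); the function name `runinfo_url` is an assumption for illustration.

```python
from urllib.parse import urlencode

def runinfo_url(accession):
    """Return the runinfo CSV URL for a sample or run accession."""
    base = "https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi"
    query = urlencode({
        "save": "efetch",
        "db": "sra",
        "rettype": "runinfo",
        "term": accession,
    })
    return f"{base}?{query}"

print(runinfo_url("DRS000787"))
```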

In [12]:
df1 = pd.read_csv("/moto/eaton/projects/macaques/SRA/fasno/SRS117874.csv")
df1.head(10)

Unnamed: 0,Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,...,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
0,SRR069635,2011-10-14 06:23:44,2014-05-28 08:49:15,7043321,619812248,7043321,88,286,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,60D8544B4A6ED0CF191794E5649D02C9,4797619F7942C163B4CADE0C534D82CE
1,SRR069636,2011-10-14 06:23:44,2014-05-28 08:49:38,7500713,660062744,7500713,88,312,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,2AE28E669412DDA12B91DE333A01117B,4167A7A269D6DFA8E17E62D34E8129E7
2,SRR069637,2011-10-14 06:23:44,2014-05-28 08:49:51,7530557,662689016,7530557,88,321,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,4CDEF55C6D8D7CECB69C9BF41340E814,B16391E509937ADE17B788478392EC1B
3,SRR069638,2011-10-14 06:23:44,2014-05-28 08:48:13,1372576,120786688,1372576,88,51,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,7975D1FF84E8CA0163A4D222E42896EA,8E2382A431AF23DA94DEF11D9BEA39DB
4,SRR069639,2011-10-14 06:23:44,2014-05-28 08:49:09,6607678,581475664,6607678,88,220,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,9DF27FBA2FC54F15D18CCE7D95F236D1,701DE2C5BE26AB87ED135E0CA38C998D
5,SRR069640,2011-10-14 06:23:44,2014-05-28 08:50:30,11923072,1049230336,11923072,88,471,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,71181A9690AC494A9BBE5E8A007CAEB4,2DA54F0CDA25BC3BCC1F6809C86FE45B
6,SRR069641,2011-10-14 06:23:44,2014-05-28 08:48:57,4427753,389642264,4427753,88,144,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,C90D618E383FF4AC1360FA3D662FBDEA,ACBE637037C263C188BFDAF408829402
7,SRR069642,2011-10-14 06:23:44,2014-05-28 08:50:01,9966047,877012136,9966047,88,374,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,F64CF8638E4BA616B4AA442D20A3C5F2,2DEE57217F72AAF8C18B14E0A6C88730
8,SRR069643,2011-10-14 06:23:44,2014-05-28 08:51:25,13129805,1155422840,13129805,88,609,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,A4A194B42AAA064A1927D0ADD6ADB8B3,C2D4465538B4D6706CDE0334849D35F5
9,SRR069644,2011-10-14 06:23:44,2014-05-28 08:52:07,13057773,1149084024,13057773,88,898,,https://sra-download.ncbi.nlm.nih.gov/traces/s...,...,,,,,BGI,SRA023855,,public,D4970984B96BC4C1BA7409A139FC7584,5007D679BC087607AA20F94BD3E09A9A
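
To view only a few columns of a runinfo table, positional selection goes through `.iloc`, while plain square brackets expect column names. The sketch below uses a toy frame with the first eight runinfo columns shown above; the choice of columns is only an example.

```python
import pandas as pd

toy = pd.DataFrame({
    "Run": ["SRR069635", "SRR069636"],
    "ReleaseDate": ["2011-10-14 06:23:44"] * 2,
    "LoadDate": ["2014-05-28 08:49:15", "2014-05-28 08:49:38"],
    "spots": [7043321, 7500713],
    "bases": [619812248, 660062744],
    "spots_with_mates": [7043321, 7500713],
    "avgLength": [88, 88],
    "size_MB": [286, 312],
})

by_position = toy.iloc[:, [0, 3, 7]]           # columns 0, 3 and 7
by_name = toy[["Run", "spots", "size_MB"]]     # the same columns by name
print(by_position.equals(by_name))
```

Selecting by name is usually the more robust option, since it does not break if the column order changes.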


In [13]:
df2 = pd.read_csv("/moto/eaton/projects/macaques/SRA/fasso/DRS000787.csv")
df2

Unnamed: 0,Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,...,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
0,DRR001228,2012-06-27 00:36:55,2014-05-31 20:17:39,499666247,24983312350,0,50,20110,,https://sra-download.ncbi.nlm.nih.gov/traces/d...,...,,,,,NIBIO,DRA000430,,public,3241B651C9F9267AD5B108D4A5A10330,3B901E233A5ACE2CD7ACA120FD532AAA
1,DRR001229,2012-06-27 00:36:55,2014-05-31 20:11:41,507397307,25369865350,0,50,20508,,https://sra-download.ncbi.nlm.nih.gov/traces/d...,...,,,,,NIBIO,DRA000430,,public,628D6F98D22328BC5590B28C250699E7,ACAEE0C742664F6C48C2737DE269043D
2,DRR001232,2012-06-27 00:36:55,2014-05-31 18:58:49,88731153,4436557650,0,50,3222,,https://sra-download.ncbi.nlm.nih.gov/traces/d...,...,,,,,NIBIO,DRA000430,,public,874652C2F578A348B19E071E615AA6FD,C68643185828B9DB561794F4474190C7
3,DRR001233,2012-06-27 00:36:55,2014-05-31 19:25:36,254084368,12704218400,254084368,50,9303,,https://sra-download.ncbi.nlm.nih.gov/traces/d...,...,,,,,NIG,DRA000430,,public,4614B9F28CE0C8A79AF320B8C35B86F6,6FDBA43445FD2FAC6343E1021DB852E0
4,DRR001230,2012-06-27 00:36:55,2014-05-31 20:09:15,490614702,24530735100,0,50,19595,,https://sra-download.ncbi.nlm.nih.gov/traces/d...,...,,,,,NIBIO,DRA000430,,public,B0A80BAB04441F06203CA7F199F0B33D,E58225887AE71E485B0DFEF56F67461C
5,DRR001231,2012-06-27 00:36:55,2014-05-31 18:59:15,89063351,4453167550,0,50,3246,,https://sra-download.ncbi.nlm.nih.gov/traces/d...,...,,,,,NIBIO,DRA000430,,public,92499A44587878351F64C6B650C57630,A91F59AE6972DAC1F6D9DFE672427886
6,DRR001227,2012-06-26 03:45:10,2014-05-31 20:14:25,476818081,23840904050,0,50,19575,,https://sra-download.ncbi.nlm.nih.gov/traces/d...,...,,,,,NIBIO,DRA000430,,public,D7C3974A1A453D6F4F341DF1DEBBDF8B,3D0B2B9316B3960023959338D754410D
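
The `spots` and `spots_with_mates` columns indicate which runs are paired-end: in the table above only DRR001233 has mates. A minimal sketch of that check follows; the function name `library_layout` and the "mixed" label are assumptions made for illustration.

```python
def library_layout(spots, spots_with_mates):
    """Classify a run from two runinfo columns."""
    if spots_with_mates == 0:
        return "single"
    if spots_with_mates == spots:
        return "paired"
    return "mixed"

# (spots, spots_with_mates) copied from two rows of the table above
runs = {
    "DRR001228": (499666247, 0),
    "DRR001233": (254084368, 254084368),
}
for run, (spots, mates) in runs.items():
    print(run, library_layout(spots, mates))
```

With the full table, the same function could be applied row by row, for example over `zip(df2["Run"], df2["spots"], df2["spots_with_mates"])`.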


In [16]:
for i in df1["Run"]:
    cmd='wget -P /moto/eaton/projects/macaques/SRA/fasno/ ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/'+i[0:3]+'/'+i[0:6]+'/'+i+'/'+i+'.sra'
    os.system(cmd)

In [18]:
os.listdir("/moto/eaton/projects/macaques/SRA/fasno/")

['SRR069709.sra',
 'SRR069753.sra',
 'SRR069710.sra',
 'SRR069694.sra',
 'SRR069760.sra',
 'SRR069733.sra',
 'SRR069658.sra',
 'SRR069765.sra',
 'SRR069696.sra',
 'SRR069773.sra',
 'SRR069747.sra',
 'SRR069670.sra',
 'SRR069698.sra',
 'SRR069666.sra',
 'SRR069758.sra',
 'SRR069678.sra',
 'SRR069726.sra',
 'SRR069748.sra',
 'SRR069781.sra',
 'SRR069778.sra',
 'SRR069647.sra',
 'SRR069645.sra',
 'SRR069671.sra',
 'SRR069727.sra',
 'SRR069662.sra',
 'SRR069661.sra',
 'SRR069766.sra',
 'SRR069728.sra',
 'SRR069652.sra',
 'SRR069648.sra',
 'SRR069636.sra',
 'SRR069641.sra',
 'SRR069749.sra',
 'SRR069653.sra',
 'SRR069716.sra',
 'SRR069722.sra',
 'SRR069657.sra',
 'SRR069723.sra',
 'SRR069638.sra',
 'SRR069751.sra',
 'SRR069718.sra',
 'SRR069699.sra',
 'SRR069717.sra',
 'SRR069738.sra',
 'SRR069763.sra',
 'SRR069687.sra',
 'SRR069681.sra',
 'SRR069692.sra',
 'SRR069770.sra',
 'SRR069664.sra',
 'SRR069701.sra',
 'SRR069745.sra',
 'SRR069688.sra',
 'SRR069695.sra',
 'SRR069679.sra',
 'SRR06968

In [17]:
for i in df2["Run"]:
    cmd='wget -P /moto/eaton/projects/macaques/SRA/fasso ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/'+i[0:3]+'/'+i[0:6]+'/'+i+'/'+i+'.sra'
    os.system(cmd)

In [19]:
os.listdir("/moto/eaton/projects/macaques/SRA/fasso/")

['DRR001233.sra',
 'DRR001231.sra',
 'DRS000787.csv',
 'DRR001228.sra',
 'DRR001227.sra',
 'DRR001229.sra',
 'DRR001230.sra',
 'DRR001232.sra']

In [6]:
##!conda install pigz
##!conda install -c bioconda parallel-fastq-dump

In [None]:
## alternative to parallel-fastq-dump (not run here):
## !fasterq-dump SRR0000018 -t /dev/shm -e 10

In [20]:
!mkdir -p /moto/eaton/projects/macaques/fastqdump

In [4]:
##fastq-dump of all the data found in single SRR runs:
for i in df["SRR"]:
    if type(i) is str:
        cmd='parallel-fastq-dump --sra-id /moto/eaton/projects/macaques/SRA/'+i+'.sra --threads 12 \
            --tmpdir /moto/eaton/projects/macaques/tmp --outdir /moto/eaton/projects/macaques/fastqdump/ \
            --split-3 --gzip --skip-technical --readids --dumpbase --clip'
        os.system(cmd)
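
`os.system` returns the exit status but does not stop the loop when a conversion fails. One possible alternative, sketched below, builds the same command as an argument list and runs it with `subprocess.run(..., check=True)`. The flags are the ones used in this notebook; the helper names `fastq_dump_cmd` and `run_fastq_dump` are hypothetical.

```python
import subprocess

def fastq_dump_cmd(sra_id, outdir, tmpdir, threads=12):
    """Return the parallel-fastq-dump call as a list of arguments."""
    return [
        "parallel-fastq-dump", "--sra-id", sra_id,
        "--threads", str(threads),
        "--tmpdir", tmpdir, "--outdir", outdir,
        "--split-3", "--gzip", "--skip-technical",
        "--readids", "--dumpbase", "--clip",
    ]

def run_fastq_dump(sra_id, outdir, tmpdir, threads=12):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(fastq_dump_cmd(sra_id, outdir, tmpdir, threads), check=True)
```

Passing a list avoids shell quoting problems with paths, and `check=True` turns a failed run into an exception instead of a silently ignored return code.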

In [5]:
!mkdir -p /moto/eaton/projects/macaques/fastqdump/fasno
!mkdir -p /moto/eaton/projects/macaques/fastqdump/fasso

In [None]:
##fastq-dump of all the data found in the northern fascicularis (multiple runs but all paired end):
for i in df1["Run"]:
    cmd='parallel-fastq-dump --sra-id /moto/eaton/projects/macaques/SRA/fasno/'+i+'.sra --threads 12 \
        --tmpdir /moto/eaton/projects/macaques/tmp --outdir /moto/eaton/projects/macaques/fastqdump/fasno/ \
        --split-3 --gzip --skip-technical --readids --dumpbase --clip'
    os.system(cmd)

In [None]:
##fastq-dump of all the data found in the southern fascicularis (multiple runs, paired and single end):
for i in df2["Run"]:
    cmd='parallel-fastq-dump --sra-id /moto/eaton/projects/macaques/SRA/fasso/'+i+'.sra --threads 12 \
        --tmpdir /moto/eaton/projects/macaques/tmp --outdir /moto/eaton/projects/macaques/fastqdump/fasso/ \
        --split-3 --gzip --skip-technical --readids --dumpbase --clip'
    os.system(cmd)

1) Northern _Macaca mulatta_

In [9]:
os.system('mv /moto/eaton/projects/macaques/mulattanorthern/SRR4454026_1.fastq.gz /moto/eaton/projects/macaques/mulattanorthern/mulattanorthernSRR4454026_1.fastq.gz')

0

In [10]:
##renaming to something more human-readable
os.system('mv /moto/eaton/projects/macaques/mulattanorthern/SRR4454026_2.fastq.gz /moto/eaton/projects/macaques/mulattanorthern/mulattanorthernSRR4454026_2.fastq.gz')

0
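
The same renaming step is repeated for every sample and read file, so it could be wrapped in a small function. This is a sketch under the naming scheme used above (sample name prepended to the run accession); `prefix_fastqs` is a hypothetical helper.

```python
from pathlib import Path

def prefix_fastqs(directory, sample, run):
    """Rename <run>_*.fastq.gz in directory to <sample><run>_*.fastq.gz."""
    renamed = []
    # sorted() collects the matches before any file is renamed
    for path in sorted(Path(directory).glob(f"{run}_*.fastq.gz")):
        target = path.with_name(sample + path.name)
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

A call such as `prefix_fastqs("/moto/eaton/projects/macaques/mulattanorthern", "mulattanorthern", "SRR4454026")` would handle both read files at once.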

2) Southern low altitude _Macaca mulatta_

In [1]:
!mkdir -p /moto/eaton/projects/macaques/mulattasouthernlow

In [2]:
%time !parallel-fastq-dump --sra-id SRR4454020 --threads 24 --tmpdir /moto/eaton/projects/macaques/tmp --outdir /moto/eaton/projects/macaques/mulattasouthernlow --split-3 --gzip --skip-technical --readids --dumpbase --clip

SRR ids: ['SRR4454020']
extra args: ['--split-3', '--gzip', '--skip-technical', '--readids', '--dumpbase', '--clip']
tempdir: /moto/eaton/projects/macaques/tmp/pfd_m_2uzo1c
SRR4454020 spots: 123339229
blocks: [[1, 5139134], [5139135, 10278268], [10278269, 15417402], [15417403, 20556536], [20556537, 25695670], [25695671, 30834804], [30834805, 35973938], [35973939, 41113072], [41113073, 46252206], [46252207, 51391340], [51391341, 56530474], [56530475, 61669608], [61669609, 66808742], [66808743, 71947876], [71947877, 77087010], [77087011, 82226144], [82226145, 87365278], [87365279, 92504412], [92504413, 97643546], [97643547, 102782680], [102782681, 107921814], [107921815, 113060948], [113060949, 118200082], [118200083, 123339229]]
Read 5139134 spots for SRR4454020
Written 5139134 spots for SRR4454020
Read 5139134 spots for SRR4454020
Written 5139134 spots for SRR4454020
Read 5139134 spots for SRR4454020
Written 5139134 spots for SRR4454020
Read 5139134 spots for SRR4454020
Written 5139134
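
The `blocks` line in the log shows how the spots are divided among the 24 threads: equal-sized ranges, with the last one absorbing the remainder. The sketch below reproduces the boundaries printed above; it is inferred from the log output and is not claimed to be the tool's own code.

```python
def split_blocks(total, n_pieces):
    """Split spots 1..total into n_pieces contiguous [start, end] ranges."""
    size = total // n_pieces
    blocks = []
    start = 1
    for i in range(n_pieces):
        end = start + size - 1
        if i == n_pieces - 1:
            end = total  # the last block takes the leftover spots
        blocks.append([start, end])
        start = end + 1
    return blocks

blocks = split_blocks(123339229, 24)
print(blocks[0], blocks[-1])  # [1, 5139134] [118200083, 123339229]
```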

In [5]:
os.system('mv /moto/eaton/projects/macaques/mulattasouthernlow/SRR4454020_1.fastq.gz /moto/eaton/projects/macaques/mulattasouthernlow/mulattasouthernlowSRR4454020_1.fastq.gz')

0

In [6]:
os.system('mv /moto/eaton/projects/macaques/mulattasouthernlow/SRR4454020_2.fastq.gz /moto/eaton/projects/macaques/mulattasouthernlow/mulattasouthernlowSRR4454020_2.fastq.gz')

0

3) Southern high altitude _Macaca mulatta_

In [7]:
!mkdir -p /moto/eaton/projects/macaques/mulattasouthernhigh

In [11]:
%time !parallel-fastq-dump --sra-id SRR4453966 --threads 24 --tmpdir /moto/eaton/projects/macaques/tmp --outdir /moto/eaton/projects/macaques/mulattasouthernhigh  --split-3 --gzip --skip-technical --readids --dumpbase --clip

SRR ids: ['SRR4453966']
extra args: ['--split-3', '--gzip', '--skip-technical', '--readids', '--dumpbase', '--clip']
tempdir: /moto/eaton/projects/macaques/tmp/pfd_086gzotw
SRR4453966 spots: 138547703
blocks: [[1, 5772820], [5772821, 11545640], [11545641, 17318460], [17318461, 23091280], [23091281, 28864100], [28864101, 34636920], [34636921, 40409740], [40409741, 46182560], [46182561, 51955380], [51955381, 57728200], [57728201, 63501020], [63501021, 69273840], [69273841, 75046660], [75046661, 80819480], [80819481, 86592300], [86592301, 92365120], [92365121, 98137940], [98137941, 103910760], [103910761, 109683580], [109683581, 115456400], [115456401, 121229220], [121229221, 127002040], [127002041, 132774860], [132774861, 138547703]]
Read 5772820 spots for SRR4453966
Written 5772820 spots for SRR4453966
Read 5772820 spots for SRR4453966
Written 5772820 spots for SRR4453966
Read 5772820 spots for SRR4453966
Written 5772820 spots for SRR4453966
Read 5772820 spots for SRR4453966
Written 577

In [19]:
os.system('mv /moto/eaton/projects/macaques/mulattasouthernhigh/SRR4453966_1.fastq.gz /moto/eaton/projects/macaques/mulattasouthernhigh/mulattasouthernhighSRR4453966_1.fastq.gz')

0

In [20]:
os.system('mv /moto/eaton/projects/macaques/mulattasouthernhigh/SRR4453966_2.fastq.gz /moto/eaton/projects/macaques/mulattasouthernhigh/mulattasouthernhighSRR4453966_2.fastq.gz')

0

4) Indian _Macaca mulatta_ (lab specimen)

In [14]:
!mkdir -p /moto/eaton/projects/macaques/mulattaindian

In [None]:
## ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR119/SRR1192353/SRR1192353.sra

In [3]:
## The file is too large, so download the .sra file first:
%time !wget -P /moto/eaton/projects/macaques/mulattaindian/ ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR562/SRR5628058/SRR5628058.sra

--2019-01-29 14:00:04--  ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR562/SRR5628058/SRR5628058.sra
           => ‘/moto/eaton/projects/macaques/mulattaindian/SRR5628058.sra’
Resolving ftp-trace.ncbi.nih.gov (ftp-trace.ncbi.nih.gov)... 130.14.250.10, 2607:f220:41e:250::10
Connecting to ftp-trace.ncbi.nih.gov (ftp-trace.ncbi.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /sra/sra-instant/reads/ByRun/sra/SRR/SRR562/SRR5628058 ... done.
==> SIZE SRR5628058.sra ... 37901613875
==> PASV ... done.    ==> RETR SRR5628058.sra ... done.
Length: 37901613875 (35G) (unauthoritative)


2019-01-29 14:23:37 (25.6 MB/s) - ‘/moto/eaton/projects/macaques/mulattaindian/SRR5628058.sra’ saved [37901613875]

CPU times: user 18.2 s, sys: 3.5 s, total: 21.7 s
Wall time: 23min 33s
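
The figures in the `wget` log can be cross-checked with a little arithmetic, which also helps to estimate how long the other large runs will take. The sketch assumes that `wget` reports sizes and rates in 1024-based units; the function name is hypothetical.

```python
def mib_per_second(n_bytes, seconds):
    """Average transfer rate in MiB/s."""
    return n_bytes / seconds / 2**20

size = 37901613875        # bytes, from the SIZE line in the log
elapsed = 23 * 60 + 33    # wall time of 23min 33s, in seconds

print(round(size / 2**30, 1))                   # 35.3 (GiB)
print(round(mib_per_second(size, elapsed), 1))  # 25.6 (MiB/s)
```

Both values agree with the `35G` and `25.6 MB/s` reported in the log.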


In [1]:
%time !parallel-fastq-dump --sra-id SRR5628058 --threads 24 --tmpdir /moto/eaton/projects/macaques/tmp --outdir /moto/eaton/projects/macaques/mulattaindian --split-3 --gzip --skip-technical --readids --dumpbase --clip

SRR ids: ['SRR5628058']
extra args: ['--split-3', '--gzip', '--skip-technical', '--readids', '--dumpbase', '--clip']
tempdir: /moto/eaton/projects/macaques/tmp/pfd_bbdqi8xx
SRR5628058 spots: 473742092
blocks: [[1, 19739253], [19739254, 39478506], [39478507, 59217759], [59217760, 78957012], [78957013, 98696265], [98696266, 118435518], [118435519, 138174771], [138174772, 157914024], [157914025, 177653277], [177653278, 197392530], [197392531, 217131783], [217131784, 236871036], [236871037, 256610289], [256610290, 276349542], [276349543, 296088795], [296088796, 315828048], [315828049, 335567301], [335567302, 355306554], [355306555, 375045807], [375045808, 394785060], [394785061, 414524313], [414524314, 434263566], [434263567, 454002819], [454002820, 473742092]]
Read 19739253 spots for SRR5628058
Written 19739253 spots for SRR5628058
Read 19739253 spots for SRR5628058
Written 19739253 spots for SRR5628058
Read 19739253 spots for SRR5628058
Written 19739253 spots for SRR5628058
Read 19739253

In [None]:
os.system('mv /moto/eaton/projects/macaques/mulattaindian/SRR5628058_1.fastq.gz /moto/eaton/projects/macaques/mulattaindian/mulattaindianSRR5628058_1.fastq.gz')

In [None]:
os.system('mv /moto/eaton/projects/macaques/mulattaindian/SRR5628058_2.fastq.gz /moto/eaton/projects/macaques/mulattaindian/mulattaindianSRR5628058_2.fastq.gz')