#  Whole Genome Shotgun metagenomics: de novo Assembly

We have two fastqc files to process:

*Vir1_100k* that contains $100,000$ paired end reads from the same saliva sample but after purification of viral particles. So this is a virome.


In [27]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
FILE_ID = "ECTV"
FASTQ_STR = "@HWUSI-EAS1752R"
MIN_LEN = "70"

## Preprocessing and quality check


In [32]:
%%bash -s "$FILE_ID" "$FASTQ_STR"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
echo "#### Check files FILE_ID=${FILE_ID}, FASTQ_STR=$FASTQ_STR"
cd Documentos/Tema_3
head ${FILE_ID}*fastq
grep -c $FASTQ_STR ${FILE_ID}*fastq

echo "#### Compute quality"
mkdir ${FILE_ID}_Quality
fastqc ${FILE_ID}_R1.fastq -o ${FILE_ID}_Quality/
fastqc ${FILE_ID}_R2.fastq -o ${FILE_ID}_Quality/

echo "#### Replace ' ' by '_' in header"
head -n 1 ${FILE_ID}*fastq
cat ${FILE_ID}_R1.fastq | sed 's/ /_/g' > ${FILE_ID}_R1_.fastq
cat ${FILE_ID}_R2.fastq | sed 's/ /_/g' > ${FILE_ID}_R2_.fastq
head -n 1 ${FILE_ID}*fastq
EOT

#### Check files FILE_ID=ECTV, FASTQ_STR=@HWUSI-EAS1752R
==> ECTV_R1_.fastq <==
@HWUSI-EAS1752R:23:FC62KHPAAXX:6:3:3542:1008_1:N:0:GCCAAT
NGGATCTCCGATTTCTTTACGATATGGATCTATACCGGACGAATTAATAAACAAACATCCAAAAAAATATGGAATT
+HWUSI-EAS1752R:23:FC62KHPAAXX:6:3:3542:1008_1:N:0:GCCAAT
#*(.(,+,,+@@@@@@222@@@@@@@@:@@@@@@@:::<<71757<<:<:22@222:@@@8518500000:::8:5
@HWUSI-EAS1752R:23:FC62KHPAAXX:6:3:3893:1011_1:N:0:GCCAAT
NGAGTAATACGGTTCAAAATCATAAATGTGATAGTTTCCAGACTGGTATCCGAGTTTTTCTTGGATGATGGATACT
+HWUSI-EAS1752R:23:FC62KHPAAXX:6:3:3893:1011_1:N:0:GCCAAT
#,*))33322@C@@@@C@@C@C@C@@@@@@@@CC@@@C@C@@@@@C@C@@@@@@@C@@CC222@@@@@@C@@@@@@
@HWUSI-EAS1752R:23:FC62KHPAAXX:6:3:5526:1010_1:N:0:GCCAAT
NTGGAGTCGTAAAAAAGTTTTATCTCTTTCTCTCTTTGATGGTCTCATAAAAAAGTTTTACAAAAATATTTTTATT

==> ECTV_R1.fastq <==
@HWUSI-EAS1752R:23:FC62KHPAAXX:6:3:3542:1008 1:N:0:GCCAAT
NGGATCTCCGATTTCTTTACGATATGGATCTATACCGGACGAATTAATAAACAAACATCCAAAAAAATATGGAATT
+HWUSI-EAS1752R:23:FC62KHPAAXX:6:3:3542:1008 1:N:0:GCCAAT
#*(.(,+,,+@@@@@@222@@@@@@@@:@

## Trimming and decontaminating
Trimming poor quality ends and short sequences (**Trimmomatic**) and removal of reads aligning to the human and phiX174 genomes (***bowtie2**). The later one is a contaminant used as spike by Illumina kits to control quality of the sequencing process.

We are only filtering only R1 files because forward reads have usually better quality than reverse reads. 

### Process

In [33]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
echo "#### Trimming and decontaminating FILE_ID=${FILE_ID} MIN_LEN=${MIN_LEN}"
kneaddata -i ${FILE_ID}_R1_.fastq -i ${FILE_ID}_R2_.fastq \
-o kneaddata_out -db /home/shared/bowtiedb/GRCh38_PhiX \
--trimmomatic /home/microbioinf/miniconda3/pkgs/trimmomatic-0.38-1/share/trimmomatic-0.38-1/ \
-t 2 --trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:${MIN_LEN}" \
--bowtie2-options "--very-sensitive --dovetail" --remove-intermediate-output
EOT

#### Trimming and decontaminating FILE_ID=ECTV MIN_LEN=70
Initial number of reads ( /home/microbioinf/Documentos/Tema_3/ECTV_R1_.fastq ): 50000
Initial number of reads ( /home/microbioinf/Documentos/Tema_3/ECTV_R2_.fastq ): 50000
Running Trimmomatic ... 
Total reads after trimming ( /home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata.trimmed.1.fastq ): 41996
Total reads after trimming ( /home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata.trimmed.2.fastq ): 41996
Total reads after trimming ( /home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata.trimmed.single.1.fastq ): 4107
Total reads after trimming ( /home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata.trimmed.single.2.fastq ): 1791
Decontaminating ...
Running bowtie2 ... 
Total reads after removing those found in reference database ( /home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_GRCh38_PhiX_bowtie2_paired_clean_1.fastq ): 41839
Total reads after 

### Process statistics

In [38]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
cd kneaddata_out/
l
less *log
kneaddata_read_count_table --input ./ --output kneaddata_read_counts${FILE_ID}.txt 
grep -c $FASTQ_STR *fastq

04/26/2019 08:44:27 AM - kneaddata.knead_data - INFO: Running kneaddata v0.6.1
04/26/2019 08:44:27 AM - kneaddata.knead_data - INFO: Output files will be written to: /home/microbioinf/Documentos/Tema_3/kneaddata_out
04/26/2019 08:44:27 AM - kneaddata.knead_data - DEBUG: Running with the following arguments: 
verbose = False
bmtagger_path = None
minscore = 50
bowtie2_path = /home/microbioinf/miniconda3/bin/bowtie2
maxperiod = 500
no_discordant = False
serial = False
fastqc_start = False
bmtagger = False
cat_final_output = False
log_level = DEBUG
log = /home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata.log
max_memory = 500m
remove_intermediate_output = True
fastqc_path = None
output_dir = /home/microbioinf/Documentos/Tema_3/kneaddata_out
trf_path = None
remove_temp_output = True
reference_db = /home/shared/bowtiedb/GRCh38_PhiX
input = /home/microbioinf/Documentos/Tema_3/ECTV_R1_.fastq /home/microbioinf/Documentos/Tema_3/ECTV_R2_.fastq
pi = 10
reorder = False
pm = 80
tri

In [123]:
data = """
cat Documentos/Tema_3/kneaddata_out/kneaddata_read_counts%s.txt
EOT
""" % FILE_ID
output = !ssh microbioinf@192.168.56.101 /bin/bash <<'EOT' {data}

data = []
# To list of lists
for row in output:
    data.append(row.split('\t'))
# To dataframe
df_knead = pd.DataFrame(data[1:], columns=data[0])
df_knead.style.hide_index().set_properties(**{'text-align': 'right', 'font-family' : 'courier', 'color' : 'darkgreen', "font-size" : "11pt"}).\
set_properties(**{'text-align': 'right', 'font-family' : 'courier', 'color' : 'darkblue', "font-size" : "12pt"}, subset=['Sample'])
df_knead.transpose()

Unnamed: 0,0
Sample,ECTV_R1__kneaddata
raw pair1,50000
raw pair2,50000
trimmed pair1,41996
trimmed pair2,41996
trimmed orphan1,4107
trimmed orphan2,1791
decontaminated GRCh38_PhiX pair1,41839
decontaminated GRCh38_PhiX pair2,41839
decontaminated GRCh38_PhiX orphan1,12


### Check number of reads

With grep we can identify the non-contaminated high-quality files

In [66]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
cd kneaddata_out/
grep -c $FASTQ_STR *fastq

ECTV_R1__kneaddata_GRCh38_PhiX_bowtie2_paired_contam_1.fastq:115
ECTV_R1__kneaddata_GRCh38_PhiX_bowtie2_paired_contam_2.fastq:115
ECTV_R1__kneaddata_GRCh38_PhiX_bowtie2_unmatched_1_contam.fastq:30
ECTV_R1__kneaddata_GRCh38_PhiX_bowtie2_unmatched_2_contam.fastq:89
ECTV_R1__kneaddata_paired_1.fastq:41839
ECTV_R1__kneaddata_paired_2.fastq:41839
ECTV_R1__kneaddata_unmatched_1.fastq:12
ECTV_R1__kneaddata_unmatched_2.fastq:5851


### Check quality

In [68]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
cd kneaddata_out/
echo "#### Compute quality"
mkdir ${FILE_ID}_HighQuality
fastqc ${FILE_ID}_R1__kneaddata_paired_1.fastq -o ${FILE_ID}_HighQuality/
fastqc ${FILE_ID}_R1__kneaddata_paired_2.fastq -o ${FILE_ID}_HighQuality/

#### Compute quality
Analysis complete for ECTV_R1__kneaddata_paired_1.fastq
Analysis complete for ECTV_R1__kneaddata_paired_2.fastq


## Assembly

 We are going to use a Refseq database of viral proteins (around 100Mb) from ncbi (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/), and you have to download it in two separated files that can be joined into one with cat.
 

### Process (spades)

In this step we run command **spades** with the paired high-quality and free of known contaminants reads.

#### K_MER = 35

In [127]:
K_MER = 35

In [124]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN" "35"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 K_MER=$4 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
cd kneaddata_out/
echo "#### Compute assembly K_MER=${K_MER}"
spades.py -1 ${FILE_ID}_R1__kneaddata_paired_1.fastq -2 ${FILE_ID}_R1__kneaddata_paired_2.fastq \
--sc -k ${K_MER} -o ${FILE_ID}-Assembly${K_MER}
EOT

#### Compute assembly K_MER=35
Command line: /home/microbioinf/miniconda3/bin/spades.py	-1	/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_paired_1.fastq	-2	/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_paired_2.fastq	--sc	-k	35	-o	/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV-Assembly35	

System information:
  SPAdes version: 3.13.0
  Python version: 2.7.15
  OS: Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic

Output dir: /home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV-Assembly35
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Single-cell mode
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_paired_1.fastq']
      right reads: ['/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_paired_2.fastq']
      interlaced reads: not specified
      s

In [130]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN" "$K_MER"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 K_MER=$4 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
cd kneaddata_out/
echo "#### Check output K_MER=${K_MER}"
cd ${FILE_ID}-Assembly${K_MER}
rep -c ">" *fasta
grep ">" -m 8 contigs.fasta 
grep ">" -m 8 scaffolds.fasta 
grep "NN" *fasta

#### Check output K_MER=35
>NODE_1_length_61627_cov_16.136024
>NODE_2_length_37406_cov_14.491584
>NODE_3_length_35119_cov_16.619741
>NODE_4_length_23130_cov_14.144966
>NODE_5_length_14681_cov_16.895125
>NODE_6_length_12066_cov_15.782063
>NODE_7_length_9054_cov_26.310456
>NODE_8_length_1060_cov_59.155122
>NODE_1_length_61627_cov_16.136024
>NODE_2_length_60827_cov_14.296881
>NODE_3_length_47271_cov_16.363811
>NODE_4_length_14681_cov_16.895125
>NODE_5_length_9054_cov_26.310456
>NODE_6_length_1060_cov_59.155122
>NODE_7_length_945_cov_8.277551
>NODE_8_length_294_cov_3.567568
scaffolds.fasta:ACCTCAAAAAATTTGTTAATAATGGTATCTTNNNNNNNNNNTATGAATACAAAAGAATTG
scaffolds.fasta:TTTACTATCTGNNNNNNNNNNAAGGTTGTAACATTTTATTACCGTGTGGGATATTAATTT
scaffolds.fasta:AGATAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
scaffolds.fasta:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGGTATAGTACAGG


#### List of different K_MER

In [206]:
K_MERS_LIST = ["25", "35", "45"]
K_MERS =  ",".join(K_MERS_LIST)
print(K_MERS)

25,35,45


In [207]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN" "$K_MERS"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 K_MERS=$4 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
cd kneaddata_out/
echo "#### Compute assembly with no specified K_MER"
spades.py -1 ${FILE_ID}_R1__kneaddata_paired_1.fastq -2 ${FILE_ID}_R1__kneaddata_paired_2.fastq \
--sc -o ${FILE_ID}-Assembly${K_MER}
IFS=","
for K_MER in ${K_MERS}
do
echo "#### Compute assembly K_MER=${K_MER}"
spades.py -1 ${FILE_ID}_R1__kneaddata_paired_1.fastq -2 ${FILE_ID}_R1__kneaddata_paired_2.fastq \
--sc -k ${K_MER} -o ${FILE_ID}-Assembly${K_MER}
done
EOT

#### Compute assembly with no specified M_MER
Command line: /home/microbioinf/miniconda3/bin/spades.py	-1	/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_paired_1.fastq	-2	/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_paired_2.fastq	--sc	-o	/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV-Assembly	

System information:
  SPAdes version: 3.13.0
  Python version: 2.7.15
  OS: Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic

Output dir: /home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV-Assembly
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Single-cell mode
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_paired_1.fastq']
      right reads: ['/home/microbioinf/Documentos/Tema_3/kneaddata_out/ECTV_R1__kneaddata_paired_2.fastq']
      interlaced reads: not specified
  

In [218]:
K_MERS_LIST = ["","25", "35", "45"]
K_MERS =  ",".join(K_MERS_LIST)
print(K_MERS)

,25,35,45


In [219]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN" "$K_MERS"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 K_MERS=$4 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
cd kneaddata_out/
IFS=","
for K_MER in ${K_MERS}
do
    echo
    echo "#### Check output K_MER=${K_MER}"
    cd ${FILE_ID}-Assembly${K_MER}
    rep -c ">" *fasta
    grep ">" -m 8 contigs.fasta 
    grep ">" -m 8 scaffolds.fasta 
    grep "NN" *fasta
done


#### Check output K_MER=
>NODE_1_length_92446_cov_8.851382
>NODE_2_length_61624_cov_7.032646
>NODE_3_length_29587_cov_8.324665
>NODE_4_length_13405_cov_7.334082
>NODE_5_length_421_cov_1.073770
>NODE_6_length_261_cov_1.072816
>NODE_7_length_228_cov_1.942197
>NODE_8_length_227_cov_0.610465
>NODE_1_length_92877_cov_8.814516
>NODE_2_length_75358_cov_7.054686
>NODE_3_length_29587_cov_8.324665
>NODE_4_length_261_cov_1.072816
>NODE_5_length_227_cov_0.610465
>NODE_6_length_90_cov_11.685714
>NODE_7_length_79_cov_49.625000
scaffolds.fasta:TNNNNNNNNNNTACCGCCATTATGGTGGCTAGTGATGTTTGTAAAAAAAATTTGGATTTA
scaffolds.fasta:ATAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
scaffolds.fasta:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGTGGATCCGACGTTTCAACTATTG
scaffolds.fasta:GTTACTGTTACTACTAAAGACTANNNNNNNNNNCTTTTAGAATGACGTCTTGTAATATCA

#### Check output K_MER=25
>NODE_1_length_92446_cov_8.851382
>NODE_2_length_61624_cov_7.032646
>NODE_3_length_29587_cov_8.324665
>NODE_4_length_13405_cov_7.334082
>NODE_5_

## Comparison of assemblies (*quast*)

In [220]:
%%bash -s "$FILE_ID" "$FASTQ_STR" "$MIN_LEN" "$K_MER"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 MIN_LEN=$3 K_MER=$4 2>/dev/null /bin/bash <<'EOT'
export PATH=$PATH:/home/microbioinf/miniconda3/bin
cd Documentos/Tema_3
cd kneaddata_out/
echo "#### Compare assemblies FILE_ID=${FILE_ID}"
for assembly in ${FILE_ID}-Assembly*; 
    do echo "Processing $assembly file..."; 
    cp ${assembly}/contigs.fasta contigs-${assembly}.fasta
    cp ${assembly}/scaffolds.fasta scaffolds-${assembly}.fasta
done
quast.py contigs* scaffolds* -R ../ECTV-MoscowGenome.fasta
EOT

#### Compare assemblies FILE_ID=ECTV
Processing ECTV-Assembly file...
Processing ECTV-Assembly25 file...
Processing ECTV-Assembly35 file...
Processing ECTV-Assembly45 file...
/home/microbioinf/miniconda3/lib/python2.7/site-packages/quast-5.0.2-py2.7.egg-info/scripts/quast.py contigs-ECTV-Assembly25.fasta contigs-ECTV-Assembly35.fasta contigs-ECTV-Assembly45.fasta contigs-ECTV-Assembly.fasta scaffolds-ECTV-Assembly25.fasta scaffolds-ECTV-Assembly35.fasta scaffolds-ECTV-Assembly45.fasta scaffolds-ECTV-Assembly.fasta -R ../ECTV-MoscowGenome.fasta

Version: 5.0.2

System information:
  OS: Linux-4.15.0-47-generic-x86_64-with-debian-buster-sid (linux_64)
  Python version: 2.7.11
  CPUs number: 3

Started: 2019-04-26 13:12:24

Logging to /home/microbioinf/Documentos/Tema_3/kneaddata_out/quast_results/results_2019_04_26_13_12_24/quast.log
NOTICE: Maximum number of threads is set to 1 (use --threads option to set it manually)

CWD: /home/microbioinf/Documentos/Tema_3/kneaddata_out
Main paramet

In [221]:
data = """
cat Documentos/Tema_3/kneaddata_out/quast*/latest/report.tsv
EOT
"""
output = !ssh microbioinf@192.168.56.101 /bin/bash <<'EOT' {data}
data = []
# To list of lists
for row in output:
    data.append(row.split('\t'))
# To dataframe
df_quast = pd.DataFrame(data[1:], columns=data[0])
df_quast.style.hide_index().set_properties(**{'text-align': 'left', 'font-family' : 'courier', 'color' : 'darkgreen', "font-size" : "10pt"}).\
set_properties(**{'text-align': 'left', 'font-family' : 'courier', 'color' : 'darkblue', "font-size" : "10pt"}, \
               subset=['Assembly'])

Assembly,contigs_ECTV_Assembly25,contigs_ECTV_Assembly35,contigs_ECTV_Assembly45,contigs_ECTV_Assembly,scaffolds_ECTV_Assembly25,scaffolds_ECTV_Assembly35,scaffolds_ECTV_Assembly45,scaffolds_ECTV_Assembly
# contigs (>= 0 bp),38,22,34,10,33,19,27,7
# contigs (>= 1000 bp),11,8,13,4,9,6,10,3
# contigs (>= 5000 bp),8,7,9,4,6,5,8,3
# contigs (>= 10000 bp),6,6,6,4,5,4,5,3
# contigs (>= 25000 bp),2,3,3,3,3,3,3,3
# contigs (>= 50000 bp),1,1,0,2,1,2,1,2
Total length (>= 0 bp),198533,196511,196309,198368,198845,196617,196673,198479
Total length (>= 1000 bp),193592,194143,190456,197062,194196,194520,192259,197822
Total length (>= 5000 bp),186718,193083,182355,197062,187322,193460,187864,197822
Total length (>= 10000 bp),171220,184029,158872,197062,179237,184406,164381,197822
