# Introduction

This notebook counts reads around known amplification breakpoints of CRT, MDR1, PM2_PM3 and GCH1. Most of these breakpoints were found by manually inspecting Pf6 BAM files in IGV, but we also included known breakpoints reported in the following papers:

Nair 2007 https://www.ncbi.nlm.nih.gov/pubmed/17124182 MDR1 - see Table 1 for breakpoints

Nair 2008 http://dx.plos.org/10.1371/journal.pgen.1000243 GCH1 - see supplementary table S3 for breakpoints

Finally, we included the following new breakpoints based on Chiyun's manual inspection.

New tandem duplication breakpoints (identified using faceaway reads):
```
Pf3D7_05_v3	946346	946375	978174	978199			PfMDR1_dup_28	PfMDR1_dup_28	MDR1	1	

Pf3D7_12_v3	945388	945416	977926	977949			PfGCH1_dup_10	PfGCH1_dup_10	GCH1	1	
```

New dup/trp-inv/dup (DTD) breakpoints (identified using same direction reads):
```
Pf3D7_05_v3	948197	948217	948453	948471	964505	964540	980012	980040					32kb dup-trpinv-dup MDR1	PfMDR1_DTD_2	MDR1	1	

Pf3D7_12_v3	971147	971178	971233	971251	976399	976426	976456	976503					5kb dup-trpinv-dup GCH1	PfGCH1_DTD_3	GCH1	1	
```

Note that although this notebook collects evidence for DTDs using same-direction reads, that evidence isn't used in the final CNV calls, which use only faceaway evidence, i.e. only evidence for tandem duplications.

This notebook is based on https://gitlab.com/malariagen/gsp/pf7.0/-/blob/69cd1a03ec30e79cd82e0a4ca1b79b8eb644d29c/work/89_call_cnvs_using_pf6_pipeline/20201005_pf7_read_pair_dups.ipynb which in turn was based on https://github.com/malariagen/Pf-6.0-public-release/blob/335ba7e60d7d006e2d817e82cfdd53e9f18e0f91/notebooks/rp7/20170405_pf_60_read_pair_dups.ipynb.



In [1]:
!date
!pwd
!hostname

Mon Feb 24 09:46:06 AM GMT 2025
<INSERT PATH HERE>/malariagen-pf8-cnv-calling/05_faceaway_calls
node-14-10


In [2]:
import collections
import os
import numpy as np
import pandas as pd
import petl as etl
import pysam

etl.config.display_index_header = True

# Setup

In [None]:
todays_date = '20250128'

lustre_dir = "<INSERT PATH HERE>/malariagen-pf8-cnv-calling/05_faceaway_data_generation"

SAMPLE_META_PATH = "../../assets_pf8/Pf_8_samples_20241212.txt"
BAM_PATHS_PATH   = "../../assets_pf8/01_paths_to_bams.tsv"

pf8_qc_pass_bams_fn           = f'pf8_qc_pass_bams_{todays_date}.tsv'
tandem_dup_breakpoints_fn     = f"../breakpoint_searching_notebooks/tandem_dup_breakpoints_{todays_date}.txt"
dup_trpinv_dup_breakpoints_fn = f"../breakpoint_searching_notebooks/dup_trpinv_dup_breakpoints_{todays_date}.txt"
breakpoint_read_counts_fn     = f"../breakpoint_read_counts_{todays_date}.tsv"

py_script_fn  = f'create_breakpoint_read_counts_file_{todays_date}.py'
job_script_fn = f'run_create_breakpoint_read_counts_file_{todays_date}.sh' # desired job script name

output_dir = "all_samples"


In [4]:
!mkdir -p {lustre_dir}/all_samples/create_breakpoint_read_counts_file_logs

---

# Read Pf8 bams file

In [None]:
df_pf8_internal_release_bams = pd.read_csv(BAM_PATHS_PATH, sep='\t', index_col=0, names = ["Sample", "PATH_TO_BAM"])
print(df_pf8_internal_release_bams.shape)
df_pf8_internal_release_bams.head(3)

In [6]:
df_pf8_postqc = pd.read_csv(SAMPLE_META_PATH, sep='\t', index_col=0, usecols=['Sample', 'QC pass'])
print(df_pf8_postqc.shape)
df_pf8_postqc.head(3)

(33325, 1)


Sample,QC pass
FP0008-C,True
FP0009-C,True
FP0010-CW,True


In [None]:
df_pf8_qcpass_bams = df_pf8_postqc.join(df_pf8_internal_release_bams)

print(df_pf8_qcpass_bams.shape)
print()

print(df_pf8_qcpass_bams["QC pass"].value_counts(dropna = False))
print()

df_pf8_qcpass_bams = df_pf8_qcpass_bams.loc[df_pf8_qcpass_bams["QC pass"] == True][["PATH_TO_BAM"]]
df_pf8_qcpass_bams.to_csv(pf8_qc_pass_bams_fn, sep = "\t", header = False)
print(df_pf8_qcpass_bams.shape)
print("Generated", pf8_qc_pass_bams_fn)

df_pf8_qcpass_bams.head(3)

---
---

# Functions

In [None]:
# This function counts the reads in a specific region of the genome, and how many of them are "faceaway" reads.
# The region searched is `start_region`, typically a few hundred base pairs to the right of the left-hand
# breakpoint. A read is considered "faceaway" only if it maps in reverse orientation, its mate maps in forward
# orientation, its mate maps in the `end_region` (typically a few hundred base pairs to the left of the
# right-hand breakpoint), and it has a mapping quality greater than zero.
# Note that reads around the right-hand breakpoint are considered only if they are mates of reads near the
# left-hand breakpoint.
# This function could probably be improved, for example by also looking at reads around the right-hand breakpoint.

def calc_proportion_faceaway(
    bam_fn,
    chrom,
    start_region, # 2-element list containing start and end positions (1-based) of region to right of first breakpoint in which to search for faceaway reads
    end_region,   # 2-element list containing start and end positions (1-based) of region to left of second breakpoint in which to search for faceaway reads
):
    """Returns a tuple of len 3 containing: proportion_faceaway, num_faceaway, num_reads"""
    num_faceaway = 0
    num_reads = 0

    # Open the BAM in a context manager so the file handle is closed after counting
    with pysam.AlignmentFile(bam_fn, "rb") as samfile:
        # Note: pysam's fetch() treats integer start/end as 0-based, half-open,
        # so 1-based inputs are shifted by one base
        for x in samfile.fetch(chrom, start_region[0], start_region[1]):
            num_reads += 1
            if (
                x.is_paired and
                x.is_reverse and
                (not x.mate_is_reverse) and
                x.mpos >= end_region[0] and
                x.mpos <= end_region[1] and
                x.mapping_quality > 0 and # Important, as don't want ambiguously mapped reads
                x.pos < x.mpos            # Important, e.g. in PfGCH1_promoter_dup_1, where start_region very close to end region
            ):
                num_faceaway += 1

    if num_reads > 0:
        proportion_faceaway = num_faceaway/num_reads
    else:
        proportion_faceaway = float(np.nan)

    return (proportion_faceaway, num_faceaway, num_reads)



# Example usage:
calc_proportion_faceaway(
    bam_fn       = df_pf8_qcpass_bams.loc["PH0906-C", "PATH_TO_BAM"],
    chrom        = "Pf3D7_14_v3",
    start_region = [282969, 283269],
    end_region   = [362690, 362990]
)

# The output below shows that 6 of 163 reads (3.7%) were faceaway

(0.03680981595092025, 6, 163)
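
The faceaway criteria can be tried out without a BAM file. The sketch below is illustrative only: `ToyRead`, `is_faceaway` and `proportion_faceaway` are hypothetical names, and the toy reads and coordinates are made up. The conditions are meant to mirror those in `calc_proportion_faceaway` above.

```python
from typing import NamedTuple


class ToyRead(NamedTuple):
    """Hypothetical stand-in for the read attributes used by calc_proportion_faceaway."""
    pos: int               # leftmost mapped position of the read
    mpos: int              # leftmost mapped position of its mate
    is_paired: bool
    is_reverse: bool
    mate_is_reverse: bool
    mapping_quality: int


def is_faceaway(read, end_region):
    """True if the read meets all of the faceaway conditions."""
    return (
        read.is_paired
        and read.is_reverse
        and not read.mate_is_reverse
        and end_region[0] <= read.mpos <= end_region[1]
        and read.mapping_quality > 0
        and read.pos < read.mpos
    )


def proportion_faceaway(reads, end_region):
    """Return (proportion_faceaway, num_faceaway, num_reads) for a list of toy reads."""
    num_reads = len(reads)
    num_faceaway = sum(is_faceaway(r, end_region) for r in reads)
    proportion = num_faceaway / num_reads if num_reads else float("nan")
    return proportion, num_faceaway, num_reads


toy_reads = [
    ToyRead(1000, 5000, True, True, False, 60),  # reverse read, forward mate in end region: faceaway
    ToyRead(1010, 1200, True, False, True, 60),  # ordinary inward-facing pair
    ToyRead(1020, 5050, True, True, False, 0),   # mapping quality 0: rejected
    ToyRead(1030, 9000, True, True, False, 60),  # mate outside the end region
]

result = proportion_faceaway(toy_reads, end_region=(4800, 5400))
print(result)  # (0.25, 1, 4)
```

Only the first toy read satisfies every condition, so the denominator counts all four reads in the region while the numerator counts one.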

In [None]:
# This looks for pairs of reads mapping in the same direction (indicative of an inversion)
# This is used when looking for dup-trpinv-dup (DTD) events
# The logic is similar to that of the calc_proportion_faceaway function above
def calc_proportion_same_direction(
    bam_fn,
    chrom,
    start_region, # 2-element list containing start and end positions (1-based) of region to left of first breakpoint in which to search for same-direction reads
    end_region,   # 2-element list containing start and end positions (1-based) of region to left of second breakpoint in which the mates of those reads must map
    direction     # can be "forward" or "reverse"
):
    """Returns a tuple of len 3 containing: proportion_same_direction, num_same_direction, num_reads"""
    num_same_direction = 0
    num_reads = 0

    # Open the BAM in a context manager so the file handle is closed after counting
    with pysam.AlignmentFile(bam_fn, "rb") as samfile:
        for x in samfile.fetch(chrom, start_region[0], start_region[1]):
            num_reads += 1
            if direction == 'forward':
                if (x.is_paired and (not x.is_reverse) and (not x.mate_is_reverse) and x.mpos >= end_region[0] and x.mpos <= end_region[1]):
                    num_same_direction += 1
            elif direction == 'reverse':
                if (x.is_paired and x.is_reverse and x.mate_is_reverse and x.mpos >= end_region[0] and x.mpos <= end_region[1]):
                    num_same_direction += 1

    if num_reads > 0:
        proportion_same_direction = num_same_direction/num_reads
    else:
        proportion_same_direction = float(np.nan)

    return (proportion_same_direction, num_same_direction, num_reads)

calc_proportion_same_direction(
    bam_fn       = '<INSERT PATH HERE>/2b/PH0267-C/PH0267-C.bam', # PH0267-C
    chrom        = 'Pf3D7_05_v3',
    start_region = [928240, 928840],
    end_region   = [938811, 939411],
    direction    = 'reverse'
)

# The output below shows that 33 of 502 reads (6.6%) had the same orientation as their mate

(0.06573705179282868, 33, 502)
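
To make the difference between the read-pair classes concrete, here is a minimal sketch that labels a pair from its positions and strands. It assumes a standard paired-end library in which ordinary pairs face each other; `pair_orientation` and its labels are hypothetical and are not part of the pipeline.

```python
def pair_orientation(pos, is_reverse, mate_pos, mate_is_reverse):
    """Label a read pair as 'inward', 'faceaway', 'same_forward' or 'same_reverse'."""
    if is_reverse == mate_is_reverse:
        # Both reads on the same strand: the signal counted for DTD events
        return "same_reverse" if is_reverse else "same_forward"
    # Opposite strands: the strand of the leftmost read decides the label
    left_is_reverse = is_reverse if pos <= mate_pos else mate_is_reverse
    # Leftmost read reverse means the reads point away from each other
    return "faceaway" if left_is_reverse else "inward"


print(pair_orientation(100, False, 300, True))   # inward
print(pair_orientation(100, True, 300, False))   # faceaway
print(pair_orientation(100, True, 300, True))    # same_reverse
print(pair_orientation(100, False, 300, False))  # same_forward
```

In these terms, `calc_proportion_faceaway` counts "faceaway" pairs and `calc_proportion_same_direction` counts "same_forward" or "same_reverse" pairs, depending on `direction`.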

In [None]:
%%time

def genotype_duplications(
    bam_fn,
    tandem_dup_breakpoints_fn     = tandem_dup_breakpoints_fn,
    dup_trpinv_dup_breakpoints_fn = dup_trpinv_dup_breakpoints_fn,
    region_size                   = 600,
    region_offset                 = 100,
):
    df_tandem_dup_breakpoints = pd.read_csv(tandem_dup_breakpoints_fn, sep='\t')
    df_dup_trpinv_dup_breakpoints = pd.read_csv(dup_trpinv_dup_breakpoints_fn, sep='\t')
    
    genotype_results = collections.OrderedDict()
    genotype_results['faceaway'] = collections.OrderedDict()
    genotype_results['dup_trpinv_dup'] = collections.OrderedDict()
    
    for _, rec in df_tandem_dup_breakpoints.iterrows():
        genotype_results['faceaway'][rec.iloc[7]] = calc_proportion_faceaway(
            bam_fn,
            chrom = rec.iloc[0],
            start_region = [
                rec.iloc[2] - region_offset,
                rec.iloc[2] - region_offset + region_size
            ],
            end_region = [
                rec.iloc[3] - region_size,
                rec.iloc[3],
            ]
        )
    
    for _, rec in df_dup_trpinv_dup_breakpoints.iterrows():
        genotype_results['dup_trpinv_dup']["%s first" % rec.iloc[13]] = calc_proportion_same_direction(
            bam_fn,
            chrom = rec.iloc[0],
            start_region = [
                rec.iloc[1] - region_offset,
                rec.iloc[1] - region_offset + region_size
            ],
            end_region = [
                rec.iloc[3] - region_offset,
                rec.iloc[3] - region_offset + region_size,
            ],
            direction = 'reverse'
        )
        genotype_results['dup_trpinv_dup']["%s second" % rec.iloc[13]] = calc_proportion_same_direction(
            bam_fn,
            chrom = rec.iloc[0],
            start_region = [
                rec.iloc[6] - region_size + region_offset,
                rec.iloc[6] + region_offset
            ],
            end_region = [
                rec.iloc[8] - region_size + region_offset,
                rec.iloc[8] + region_offset,
            ],
            direction = 'forward'
        )
    
    return(genotype_results)

# Example usage:
genotype_duplications(
    bam_fn                        = '<INSERT PATH HERE>/74/PH0906-C/PH0906-C.bam',
    tandem_dup_breakpoints_fn     = tandem_dup_breakpoints_fn,
    dup_trpinv_dup_breakpoints_fn = dup_trpinv_dup_breakpoints_fn,
    region_size                   = 600,
    region_offset                 = 100,
)

# The output below includes the line `('Nair 2.2kb GCH1', (0.012048192771084338, 2, 166))`, meaning that 2 of the
# 166 reads (1.2%) near the "Nair 2.2kb GCH1" breakpoint showed evidence for that amplification

CPU times: user 301 ms, sys: 57.9 ms, total: 358 ms
Wall time: 366 ms


OrderedDict([('faceaway',
              OrderedDict([('42kb around MDR1', (0.0, 0, 78)),
                           ('17.7kb around MDR1', (0.0, 0, 248)),
                           ('15kb around MDR1', (0.0, 0, 126)),
                           ('19kb around MDR1', (0.0, 0, 124)),
                           ('11kb around MDR1', (0.0, 0, 124)),
                           ('22kb around MDR1', (0.0, 0, 131)),
                           ('16kb around MDR1', (0.0, 0, 124)),
                           ('199kb around MDR1', (0.0, 0, 88)),
                           ('94kb around MDR1', (0.0, 0, 143)),
                           ('169kb around MDR1', (0.0, 0, 223)),
                           ('82kb around MDR1', (0.0, 0, 101)),
                           ('96kb around MDR1', (0.0, 0, 163)),
                           ('18kb around MDR1', (0.0, 0, 162)),
                           ('24kb around MDR1', (0.0, 0, 162)),
                           ('PfMDR1_dup_15', (0.0, 0, 149)),
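
The search windows that `genotype_duplications` passes to `calc_proportion_faceaway` follow from simple arithmetic on the breakpoint coordinates. The sketch below reproduces that arithmetic for the `PfMDR1_dup_28` row listed in the introduction, assuming that row uses the same column order as the breakpoint file (third column as the end of the left breakpoint, fourth column as the start of the right breakpoint). `tandem_dup_regions` is a hypothetical helper.

```python
def tandem_dup_regions(left_bp_end, right_bp_start, region_size=600, region_offset=100):
    """Return (start_region, end_region) as computed for tandem duplications."""
    # Window starts region_offset bases before the left breakpoint end and spans region_size bases
    start_region = [left_bp_end - region_offset, left_bp_end - region_offset + region_size]
    # Window covers the region_size bases immediately before the right breakpoint start
    end_region = [right_bp_start - region_size, right_bp_start]
    return start_region, end_region


start_region, end_region = tandem_dup_regions(946375, 978174)
print(start_region)  # [946275, 946875]
print(end_region)    # [977574, 978174]
```

With the default settings, reads are counted in a 600 bp window just inside the left breakpoint, and a read only counts as faceaway if its mate starts in the 600 bp just inside the right breakpoint.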
               

In [None]:
# Example usage: This should be a triplication at SCO1-MED14
genotype_duplications('<INSERT PATH HERE>/5e/PF0149-C/PF0149-C.bam')

# We see "('SCO1-MED14', (0.028985507246376812, 2, 69))", which tells us that 2 of the 69 reads (2.9%) at this specific
# breakpoint were faceaway reads

OrderedDict([('faceaway',
              OrderedDict([('42kb around MDR1', (0.0, 0, 24)),
                           ('17.7kb around MDR1', (0.0, 0, 141)),
                           ('15kb around MDR1', (0.0, 0, 49)),
                           ('19kb around MDR1', (0.0, 0, 40)),
                           ('11kb around MDR1', (0.0, 0, 40)),
                           ('22kb around MDR1', (0.0, 0, 42)),
                           ('16kb around MDR1', (0.0, 0, 40)),
                           ('199kb around MDR1', (0.0, 0, 23)),
                           ('94kb around MDR1', (0.0, 0, 72)),
                           ('169kb around MDR1', (0.0, 0, 120)),
                           ('82kb around MDR1', (0.0, 0, 28)),
                           ('96kb around MDR1', (0.0, 0, 55)),
                           ('18kb around MDR1', (0.0, 0, 68)),
                           ('24kb around MDR1', (0.0, 0, 68)),
                           ('PfMDR1_dup_15', (0.0, 0, 74)),
                          

In [None]:
# This should be PfMDR1_dup_28
genotype_duplications('<INSERT PATH HERE>/c6/QC0129-C/QC0129-C.bam')

# ('PfMDR1_dup_28', (0.04804270462633452, 27, 562)) tells us that 27 of 562 reads (4.8%) were faceaway

OrderedDict([('faceaway',
              OrderedDict([('42kb around MDR1', (0.0, 0, 220)),
                           ('17.7kb around MDR1', (0.0, 0, 737)),
                           ('15kb around MDR1', (0.0, 0, 704)),
                           ('19kb around MDR1', (0.0, 0, 576)),
                           ('11kb around MDR1', (0.0, 0, 576)),
                           ('22kb around MDR1', (0.0, 0, 698)),
                           ('16kb around MDR1', (0.0, 0, 576)),
                           ('199kb around MDR1', (0.0, 0, 290)),
                           ('94kb around MDR1', (0.0, 0, 326)),
                           ('169kb around MDR1', (0.0, 0, 366)),
                           ('82kb around MDR1', (0.0, 0, 241)),
                           ('96kb around MDR1', (0.0, 0, 324)),
                           ('18kb around MDR1', (0.0, 0, 655)),
                           ('24kb around MDR1', (0.0, 0, 655)),
                           ('PfMDR1_dup_15', (0.0, 0, 562)),
             

In [10]:
def create_breakpoint_read_counts_file(
    sample,
    bam_fn,
    tandem_dup_breakpoints_fn     = tandem_dup_breakpoints_fn,
    dup_trpinv_dup_breakpoints_fn = dup_trpinv_dup_breakpoints_fn,
    region_size                   = 600,
    region_offset                 = 100,
    output_dir                    = output_dir,
    overwrite                     = False,
):
    
    """Write one tab-separated line of read counts for `sample` to {output_dir}/{sample}.tsv"""
    os.makedirs(output_dir, exist_ok=True)

    output_fn = f"{output_dir}/{sample}.tsv"

    if os.path.exists(output_fn) and not overwrite:
        print(f"File {output_fn} already exists, exiting")
        return

    # Pass the breakpoint files and region settings through to the genotyping step
    duplication_genotypes = genotype_duplications(
        bam_fn,
        tandem_dup_breakpoints_fn     = tandem_dup_breakpoints_fn,
        dup_trpinv_dup_breakpoints_fn = dup_trpinv_dup_breakpoints_fn,
        region_size                   = region_size,
        region_offset                 = region_offset,
    )

    # One line per sample: sample ID, then (supporting reads, total reads) for each breakpoint
    with open(output_fn, 'w') as fo:
        print("%s" % sample, end="", file=fo)
        for results in duplication_genotypes['faceaway'].values():
            print("\t%d\t%d" % (results[1], results[2]), end="", file=fo)
        for results in duplication_genotypes['dup_trpinv_dup'].values():
            print("\t%d\t%d" % (results[1], results[2]), end="", file=fo)
        print("\n", end="", file=fo)

---

#### Now with functions defined, let's test them on a few samples

In [None]:
create_breakpoint_read_counts_file(
    sample = 'PH0906-C',
    bam_fn = '<INSERT PATH HERE>/74/PH0906-C/PH0906-C.bam',
)

In [None]:
create_breakpoint_read_counts_file(
    sample = 'PF0149-C',
    bam_fn = '<INSERT PATH HERE>/5e/PF0149-C/PF0149-C.bam',
)

---

# These functions were then added to `create_breakpoint_read_counts_file_20250128.py`

---

## Now, let's start running the script on all samples

In [55]:
job_script_fn

'<INSERT PATH HERE>/malariagen-pf8-cnv-calling/05_faceaway_calls/run_create_breakpoint_read_counts_file_20250128.sh'

In [147]:
# Create the job script file
with open(job_script_fn, "w") as fo:
    fo.write("""
MANIFEST_FN=$1
JOB=$LSB_JOBINDEX

IN=$(sed "$JOB q;d" $MANIFEST_FN)
read -a LINE <<< "$IN"
SAMPLE=${LINE[0]}
BAM_FN=${LINE[1]}

echo $SAMPLE
echo $BAM_FN

""" + 

f"""

python3 {py_script_fn} \
    --sample $SAMPLE \
    --bam_fn $BAM_FN \
    --output_dir {output_dir} \
    --tandem_dup_breakpoints_fn {tandem_dup_breakpoints_fn} \
    --dup_trpinv_dup_breakpoints_fn {dup_trpinv_dup_breakpoints_fn}
""")


In [148]:
# Submit array job to LSF queue
wc_output = !wc -l {pf8_qc_pass_bams_fn}
num_samples = int(wc_output[0].split(' ')[0])
print(f"Number of samples = {num_samples}")
print()

out_fn = f"{lustre_dir}/all_samples/create_breakpoint_read_counts_file_logs/output_%J-%I.out"
err_fn = f"{lustre_dir}/all_samples/create_breakpoint_read_counts_file_logs/output_%J-%I.err"

bsub_command_str = f"bsub -J cbrcj_[1-{num_samples}] -q normal -o {out_fn} -e {err_fn} -M 4000 -R'span[hosts=1] select[type==X86_64 && model==Intel_Platinum && mem>4000] rusage[mem=4000]' bash {job_script_fn} {pf8_qc_pass_bams_fn}"

print()
print(bsub_command_str)
print()

!{bsub_command_str}

Number of samples = 24409


bsub -J cbrcj_[1-24409] -q normal -o <INSERT PATH HERE>/malariagen-pf8-cnv-calling/05_faceaway_calls/all_samples/create_breakpoint_read_counts_file_logs/output_%J-%I.out -e <INSERT PATH HERE>/malariagen-pf8-cnv-calling/05_faceaway_calls/all_samples/create_breakpoint_read_counts_file_logs/output_%J-%I.err -M 4000 -R'span[hosts=1] select[type==X86_64 && model==Intel_Platinum && mem>4000] rusage[mem=4000]' bash run_create_breakpoint_read_counts_file_20250128.sh pf8_qc_pass_bams_20250128.tsv

Job <655385> is submitted to queue <normal>.
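
Each element of the array job processes one line of the manifest: the job script uses `sed "$JOB q;d"` to print line number `$LSB_JOBINDEX` and `read -a` to split it on whitespace into the sample ID and BAM path. A rough Python equivalent is sketched below; `manifest_entry` and the toy manifest are hypothetical and only illustrate the mapping from job index to sample.

```python
def manifest_entry(manifest_text, job_index):
    """Return (sample, bam_fn) from line `job_index` (1-based) of a manifest."""
    # Job indices start at 1, so subtract one to index into the list of lines
    line = manifest_text.splitlines()[job_index - 1]
    # Split on any whitespace, as `read -a` does with the default IFS
    fields = line.split()
    return fields[0], fields[1]


toy_manifest = (
    "SAMPLE_A\t/toy/path/SAMPLE_A.bam\n"
    "SAMPLE_B\t/toy/path/SAMPLE_B.bam\n"
)

print(manifest_entry(toy_manifest, 2))  # ('SAMPLE_B', '/toy/path/SAMPLE_B.bam')
```

Because the split is on whitespace, this scheme relies on sample IDs and BAM paths containing no spaces.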


In [11]:
# How many jobs currently running? Note this was done soon after jobs started
!bjobs | grep 655385 | grep RUN | wc -l

0


In [12]:
!bjobs | grep 655385 | grep PEND | wc -l

0


In [13]:
!bjobs | grep 655385 | grep UNKWN | wc -l

0


In [15]:
# How many outputs?
!ls -1 all_samples/*.tsv | wc -l

24409
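
Counting the output files shows how many samples have finished, but not which ones are missing if the count falls short. The sketch below is one way to list them, assuming the manifest is a tab-separated file with the sample ID in the first column and that each sample writes `<sample>.tsv` to the output directory. `find_missing_samples` is a hypothetical helper, and the demonstration uses a temporary directory with made-up sample names.

```python
import os
import tempfile


def find_missing_samples(manifest_path, output_dir):
    """Return (job_index, sample) for every manifest line without an output file."""
    missing = []
    with open(manifest_path) as manifest:
        for job_index, line in enumerate(manifest, start=1):
            if not line.strip():
                continue
            sample = line.split("\t")[0]
            if not os.path.exists(os.path.join(output_dir, f"{sample}.tsv")):
                missing.append((job_index, sample))
    return missing


# Toy demonstration: two samples in the manifest, only one output file present
with tempfile.TemporaryDirectory() as tmp_dir:
    manifest_path = os.path.join(tmp_dir, "toy_manifest.tsv")
    with open(manifest_path, "w") as fo:
        fo.write("SAMPLE_A\t/toy/SAMPLE_A.bam\nSAMPLE_B\t/toy/SAMPLE_B.bam\n")
    toy_output_dir = os.path.join(tmp_dir, "all_samples")
    os.makedirs(toy_output_dir)
    open(os.path.join(toy_output_dir, "SAMPLE_A.tsv"), "w").close()
    missing = find_missing_samples(manifest_path, toy_output_dir)

print(missing)  # [(2, 'SAMPLE_B')]
```

The returned job indices correspond to manifest line numbers, so they could be used to resubmit only the failed array elements. Note that this check treats an empty output file as present.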


---

## Ensure all samples have run by rerunning the cells above until all files have been generated, then concatenate all files into a single file

In [20]:
%%time
!cat all_samples/*.tsv > breakpoint_read_counts_noheader_{todays_date}.tsv

CPU times: user 223 ms, sys: 45.6 ms, total: 269 ms
Wall time: 25.8 s


In [24]:
!wc -l breakpoint_read_counts_noheader_{todays_date}.tsv

print()

!head -n 2 breakpoint_read_counts_noheader_{todays_date}.tsv

24409 <INSERT PATH HERE>/malariagen-pf8-cnv-calling/05_faceaway_calls/breakpoint_read_counts_noheader_20250128.tsv

FP0008-C	0	101	0	360	0	168	0	210	0	210	0	159	0	210	0	73	0	213	0	309	0	108	0	196	0	151	0	151	0	123	0	140	0	123	0	210	0	210	0	151	0	123	0	145	0	118	0	42	0	101	0	210	0	185	0	123	0	141	0	139	0	127	0	209	0	209	0	113	0	209	0	166	0	147	0	209	0	209	0	186	0	191	0	110	0	319	0	319	0	282	0	77	0	238	0	191	0	370	0	161	0	291	0	370	0	185	0	77	0	238	0	269	0	214	0	81	0	143	0	233	0	114	0	263	0	91	0	127	0	38	0	256	0	237	0	140	0	238	0	61	0	172	0	152	0	206	0	227	0	61
FP0009-C	0	311	0	1334	0	531	0	572	0	572	0	514	0	572	0	236	0	837	0	1118	0	458	0	621	0	575	0	575	0	505	0	541	0	505	0	572	0	572	0	575	0	505	0	570	0	497	0	141	0	311	0	572	0	551	0	505	0	555	0	530	0	502	0	570	0	570	0	488	0	570	0	537	0	578	0	570	0	570	0	763	0	606	0	458	0	1027	0	1027	0	989	0	246	0	836	0	593	0	1240	0	668	0	1250	0	1240	0	604	0	246	0	836	0	905	0	765	0	235	0	452	0	813	0	357	0	803	0	314	0	433	0	183	0	989	0	1016	0	531	0	1010	0	

In [None]:
# Saving a file which stores the column names
with open(f"breakpoint_read_counts_header_{todays_date}.tsv", 'w') as fo:
    header_line = "Sample"
    
    for breakpoint_name in etl.fromtsv(tandem_dup_breakpoints_fn).values('breakpoint_id'):
        header_line += "\t%s faceaways\t%s read pairs" % (breakpoint_name, breakpoint_name)
        
    for breakpoint_name in etl.fromtsv(dup_trpinv_dup_breakpoints_fn).values('breakpoint_id'):
        header_line += "\t%s same directions first\t%s read pairs first\t%s same directions second\t%s read pairs second" % (breakpoint_name, breakpoint_name, breakpoint_name, breakpoint_name)

    print(header_line, file=fo)
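
Because the header and the data rows are generated separately, it can be worth checking that they have the same number of columns before concatenating them. The sketch below rebuilds the header layout (two columns per tandem duplication breakpoint, four per DTD breakpoint) for a toy pair of breakpoint IDs and compares it with toy data rows. `build_header` and `row_matches_header` are hypothetical helpers, and the rows are made up.

```python
def build_header(tandem_ids, dtd_ids):
    """Return the list of column names for the given breakpoint IDs."""
    columns = ["Sample"]
    for bp in tandem_ids:
        columns += [f"{bp} faceaways", f"{bp} read pairs"]
    for bp in dtd_ids:
        columns += [
            f"{bp} same directions first",
            f"{bp} read pairs first",
            f"{bp} same directions second",
            f"{bp} read pairs second",
        ]
    return columns


def row_matches_header(row_line, header_columns):
    """True if a tab-separated data row has one field per header column."""
    return len(row_line.rstrip("\n").split("\t")) == len(header_columns)


header = build_header(["PfMDR1_dup_28"], ["PfMDR1_DTD_2"])
good_row = "TOY-SAMPLE\t27\t562\t0\t100\t0\t120"
short_row = "TOY-SAMPLE\t27\t562"

print(len(header))                            # 7
print(row_matches_header(good_row, header))   # True
print(row_matches_header(short_row, header))  # False
```

A mismatch would suggest that the breakpoint files used for the header differ from those used when the per-sample rows were generated.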

In [None]:
# Concatenate the header file and the data file to produce the final file
!cat breakpoint_read_counts_header_{todays_date}.tsv breakpoint_read_counts_noheader_{todays_date}.tsv > {breakpoint_read_counts_fn}

In [None]:
# Sanity check
!wc -l {breakpoint_read_counts_fn}

print()

!head -n 2 {breakpoint_read_counts_fn}

24410 breakpoint_read_counts_20250128.tsv

Sample	PfMDR1_dup_1 faceaways	PfMDR1_dup_1 read pairs	PfMDR1_dup_2 faceaways	PfMDR1_dup_2 read pairs	PfMDR1_dup_3 faceaways	PfMDR1_dup_3 read pairs	PfMDR1_dup_4 faceaways	PfMDR1_dup_4 read pairs	PfMDR1_dup_5 faceaways	PfMDR1_dup_5 read pairs	PfMDR1_dup_6 faceaways	PfMDR1_dup_6 read pairs	PfMDR1_dup_7 faceaways	PfMDR1_dup_7 read pairs	PfMDR1_dup_8 faceaways	PfMDR1_dup_8 read pairs	PfMDR1_dup_9 faceaways	PfMDR1_dup_9 read pairs	PfMDR1_dup_10 faceaways	PfMDR1_dup_10 read pairs	PfMDR1_dup_11 faceaways	PfMDR1_dup_11 read pairs	PfMDR1_dup_12 faceaways	PfMDR1_dup_12 read pairs	PfMDR1_dup_13 faceaways	PfMDR1_dup_13 read pairs	PfMDR1_dup_14 faceaways	PfMDR1_dup_14 read pairs	PfMDR1_dup_15 faceaways	PfMDR1_dup_15 read pairs	PfMDR1_dup_16 faceaways	PfMDR1_dup_16 read pairs	PfMDR1_dup_17 faceaways	PfMDR1_dup_17 read pairs	PfMDR1_dup_18 faceaways	PfMDR1_dup_18 read pairs	PfMDR1_dup_19 faceaways	PfMDR1_dup_19 read pairs	PfMDR1_dup_20 faceaways	PfMDR1_dup_20

# Conclusions

Here I have created what should be the final breakpoint read counts for Pf8: one row per QC-passed sample (24,409 samples) plus a header row.