# Classify reads covering close-by positions into four groups

**Part 1 of the "linked mutations" analyses.**

See the section of the paper on linked positions for details. For each genome, this produces a mapping named `pospair2groupcts`: a `defaultdict` that maps each pair of positions (a tuple) to a list of four counts, initially `[0, 0, 0, 0]`. Each entry in the list is the number of reads that fall into one of the four groups for that pair.
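As a rough illustration of this structure, the sketch below tallies a few made-up reads for a single pair of positions. The group order (both positions mutated, only the left one, only the right one, neither) follows the comments in the code cell further down. The `group_index` helper and the toy reads are hypothetical and are not part of the notebook.

```python
from collections import defaultdict


def group_index(i_mutated, j_mutated):
    """Return the list index (0-3) of the group a read falls into for a pair (i, j)."""
    if i_mutated:
        return 0 if j_mutated else 1
    return 2 if j_mutated else 3


# Hypothetical reads: each maps a covered position to whether the read is mutated there.
toy_reads = [
    {100: True, 105: True},    # supports mutations at both positions
    {100: True, 105: False},   # supports a mutation at 100 only
    {100: False, 105: False},  # supports neither
    {100: False, 105: False},  # supports neither
]

# A lambda is enough for this in-memory sketch (the notebook uses emptyListOf4 instead).
pospair2groupcts = defaultdict(lambda: [0, 0, 0, 0])
for read in toy_reads:
    pospair2groupcts[(100, 105)][group_index(read[100], read[105])] += 1

print(dict(pospair2groupcts))  # {(100, 105): [1, 1, 0, 2]}
```

Keying on a sorted `(left, right)` tuple means each pair has exactly one entry, whichever position happens to be seen first.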

This section takes a while (roughly 2.5 hours for the whole notebook, as of writing). So that later steps can be rerun without rerunning this one, we write each `pospair2groupcts` object to the `pospair2groupcts/` folder within the `notebooks` directory.
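A minimal sketch of this write-then-reload pattern is shown below. It uses a toy mapping (a plain `dict` rather than a `defaultdict`, for simplicity) and a temporary directory, so the file name and path are placeholders rather than the notebook's real output. It also shows why pickle is used here: `json.dumps` rejects tuple keys.

```python
import json
import os
import pickle
import tempfile

# Toy stand-in for one genome's pospair2groupcts (made-up counts).
pospair2groupcts = {(100, 105): [1, 1, 0, 2], (105, 230): [0, 3, 1, 7]}

# JSON object keys must be strings or other scalars, so tuple keys raise a TypeError.
try:
    json.dumps(pospair2groupcts)
    json_ok = True
except TypeError:
    json_ok = False

# Pickle keeps the tuple keys intact. The temporary directory and file name below are
# placeholders for the pospair2groupcts/ folder and its per-genome files.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_pospair2groupcts.pickle")
    with open(path, "wb") as f:
        pickle.dump(pospair2groupcts, f)
    with open(path, "rb") as f:
        reloaded = pickle.load(f)

print(json_ok)                       # False
print(reloaded == pospair2groupcts)  # True
```

One caveat when reloading the real files: unpickling a `defaultdict` requires its default factory to be importable, so a later notebook would presumably need `linked_mutations_utils` on its path.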

In [1]:
%run "Header.ipynb"
%run "LoadMutationJSONData.ipynb"

In [2]:
import time
import pickle
import pysam
import skbio
from collections import defaultdict
from itertools import combinations
from linked_mutations_utils import (
    MAX_DIST_BTWN_LINKED_POSITIONS_NONINCLUSIVE, emptyListOf4, find_mutated_positions
)

In [3]:
bf = pysam.AlignmentFile("../main-workflow/output/fully-filtered-and-sorted-aln.bam", "rb")

t1 = time.time()
for seq in SEQS:
    fasta = skbio.DNA.read("../seqs/{}.fasta".format(seq))
    # Maps tuple of (left integer pos, right integer pos) to a list of [0, 0, 0, 0].
    # (Since a pair (i, j) is equal to a pair (j, i), we just index this so that the leftmost position is the
    # first element in the tuple and the rightmost position is the second element. This seems like a more intuitive
    # way of structuring this than as a nested dict of leftpos2rightpos2groupcts.)
    # https://stackoverflow.com/a/13065439
    #
    # Each entry in the list indicates counts of types of reads connecting these two positions we've seen thus
    # far. In 0-indexed coordinates:
    #
    # 0. Reads(i, j): reads that support mutations at both positions
    # 1. Reads(i, -): reads that only support mutations at i
    # 2. Reads(-, j): reads that only support mutations at j
    # 3. Reads(-, -): reads that do not support mutations at either position
    #
    # This matches the definitions in the paper (currently that is section 3.6.2, but that number may change as
    # the paper is edited and restructured).
    
    pospair2groupcts = defaultdict(emptyListOf4)
    
    # Identify all mutated positions in this genome up front to save time.
    print(f"Identifying mutated positions in genome {seq2name[seq]}...")
    mutated_positions = find_mutated_positions(seq, seq2pos2matchct, seq2pos2mismatchct)
    print(f"Found {len(mutated_positions)} mutated positions in {seq2name[seq]}.")
    
    # This should already be implicitly sorted, I think, but the code below relies on mutated_positions being
    # sorted in ascending order. So we may as well be paranoid.
    mutated_positions = sorted(mutated_positions)
    
    # Maps read.query_name to mutated positions to a bool of True (this read is mutated at this position, compared
    # to the reference) or False (this read is not mutated at this position, compared to the reference).
    # The absence of a mutated position from the inner dict implies that this position is not seen in this read
    # (either due to indels/skips or this read just not being aligned to cover it).
    # Updated as we go through bf.fetch(), since we want to count supplementary alignments of a given read
    # together.
    readname2mutpos2ismutated = defaultdict(dict)
    
    # Go through all aligned segments for this genome...
    # (NOTE that "read" in the below loop really means "aligned segment", since a read can have multiple
    # aligned segments in the case of supplementary alignments)
    ts1 = time.time()
    for ri, read in enumerate(bf.fetch(seq), 1):
        if ri % 1000 == 0:
            print(
                f"On read {ri} in seq {seq2name[seq]}. "
                f"Time spent on this seq so far: {time.time() - ts1:.2f} sec."
            )
        ap = read.get_aligned_pairs(matches_only=True)
        
        # Iterating through the aligned pairs is expensive. Since read lengths are generally in the thousands
        # to tens of thousands of bp (which is much less than the > 1 million bp length of any bacterial genome),
        # we set things up so that we only iterate through the aligned pairs once. We maintain an integer, mpi,
        # that is a poor man's "pointer" to an index in mutated_positions.
        
        mpi = 0
        
        # Go through this read's aligned pairs. As we see each pair, compare the pair's reference position
        # (refpos) to the mpi-th mutated position (herein referred to as "mutpos").
        #
        # If refpos >  mutpos, increment mpi until refpos <= mutpos (stopping as early as possible).
        # If refpos == mutpos, we have a match! Update readname2mutpos2ismutated[readname][mutpos] based on
        #                      comparing the read to the reference at the aligned positions.
        # If refpos <  mutpos, continue to the next pair.
        
        readname = read.query_name
        for pair in ap:
            
            refpos = pair[1]
            mutpos = mutated_positions[mpi]
            
            no_mutations_to_right_of_here = False
            
            # Increment mpi until we get to the next mutated position at or after the reference pos for this
            # aligned pair (or until we run out of mutated positions).
            while refpos > mutpos:
                mpi += 1
                if mpi < len(mutated_positions):
                    mutpos = mutated_positions[mpi]
                else:
                    no_mutations_to_right_of_here = True
                    break
            
            # I expect this should happen only for reads aligned near the right end of the genome.
            if no_mutations_to_right_of_here:
                break
            
            # If the next mutation occurs after this aligned pair, continue on to a later pair.
            if refpos < mutpos:
                continue
                
            # If we've made it here, refpos == mutpos!
            # (...unless I messed something up in how I designed this code.)
            if refpos != mutpos:
                raise ValueError("This should never happen!")
                
            # Finally, compare the read to the reference at this position, and update readname2mutpos2ismutated.
            readpos = pair[0]
            
            # WE NEED TO CONVERT TO A STRING because slicing a skbio.DNA object returns another DNA object.
            # May or may not have spent an hour staring at the screen due to that ._.
            refval = str(fasta[refpos])
            readval = read.query_sequence[readpos]

            mutated = (readval != refval)

            # (For debugging -- this is the first "highly mutated" position in G1217, the binary CAMP gene)
            # if mutpos == 1209000:
            #    text += (f"{read.query_name} @ {refpos}. ref = {refval}, read = {readval}, mutated = {mutated}")
            
            # This means that if a given mutated position is covered by multiple alignments of a single read,
            # the _last_ alignment we see will trump previous alignments. It's arbitrary, but we could modify this
            # if desired (e.g. respecting the primary alignment if possible -- but then what to do when there are
            # 2 supplementary alignments that overlap with each other?)
            readname2mutpos2ismutated[readname][mutpos] = mutated
        
    # Now that we've seen all alignments of each read, count the read groups for each pair of nearby mutated positions.
    for readname in readname2mutpos2ismutated:
        mutated_positions_covered_in_read = readname2mutpos2ismutated[readname].keys()
        # Now that we've seen all mutated positions covered by this read, update pair information.
        
        for (ii, jj) in combinations(mutated_positions_covered_in_read, 2):
            
            # To make life easier, just sort the pair and save that as i and j.
            # I *think* we could sort mutated_positions_covered_in_read and then combinations() should
            # automatically generate sorted combinations, but I'm not sure if that is guaranteed -- so to
            # reduce the probability of weird bugs we can just sort things here.
            i, j = sorted([ii, jj])
            
            # See if i and j are close enough to each other. There are two ways this can happen (these aren't
            # necessarily mutually exclusive but in practice probs will be, depending on genome size and
            # MAX_DIST_BTWN_LINKED_POSITIONS_NONINCLUSIVE)
            #
            # 1. i and j are close to each other without looping around the genome
            #    (e.g. i = 15,000; j = 15,001)
            #
            # 2. i and j are close to each other when you loop around the genome
            #    (e.g. genome length = 1,000,000; i = 0; j = 999,999)
            #    This case is only allowed when seq2iscircular[seq] is True. (For edges that aren't circular --
            #    e.g. edge 6104 [CAMP], as of writing, which is a linear edge within a circular component --
            #    we don't allow this case to ever be True.)
            
            # Case 1
            close_enough_nolooping = (j - i) < MAX_DIST_BTWN_LINKED_POSITIONS_NONINCLUSIVE
            
            # Case 2
            if seq2iscircular[seq]:
                close_enough_looping = (seq2len[seq] + i - j) < MAX_DIST_BTWN_LINKED_POSITIONS_NONINCLUSIVE
            else:
                close_enough_looping = False
                
            if close_enough_nolooping or close_enough_looping:
                im = readname2mutpos2ismutated[readname][i]
                jm = readname2mutpos2ismutated[readname][j]
                if im:
                    if jm:
                        # Read supports mutations at both i and j
                        pospair2groupcts[(i, j)][0] += 1
                    else:
                        # Read supports a mutation at i but not j
                        pospair2groupcts[(i, j)][1] += 1
                else:
                    if jm:
                        # Read supports a mutation at j but not i
                        pospair2groupcts[(i, j)][2] += 1
                    else:
                        # Read doesn't support mutations at either i or j
                        pospair2groupcts[(i, j)][3] += 1

    print(f"Finished going through reads in {seq2name[seq]}.")
    
    # Write out pospair2groupcts to a safe location, just because this is probably going to take a while
    # and I don't want to risk losing this work.
    #
    # We use pickle instead of JSON because JSON can't handle tuples as the keys of pospair2groupcts:
    # see https://stackoverflow.com/a/16439720.
    # 
    # We use the file suffix ".pickle" and "wb" based on the conventions described in
    # https://stackoverflow.com/a/40433504 (...which in turn just reference the python docs).
    
    with open(f"pospair2groupcts/{seq}_pospair2groupcts.pickle", "wb") as dumpster:
        pickle.dump(pospair2groupcts, dumpster)
        
print(f"Time taken: {time.time() - t1} sec.")

Identifying mutated positions in genome CAMP...
Found 470 mutated positions in CAMP.
On read 1000 in seq CAMP.Time spent on this seq so far: 1.93 sec.
On read 2000 in seq CAMP.Time spent on this seq so far: 3.66 sec.
On read 3000 in seq CAMP.Time spent on this seq so far: 5.59 sec.
On read 4000 in seq CAMP.Time spent on this seq so far: 7.33 sec.
On read 5000 in seq CAMP.Time spent on this seq so far: 9.50 sec.
On read 6000 in seq CAMP.Time spent on this seq so far: 12.82 sec.
On read 7000 in seq CAMP.Time spent on this seq so far: 16.01 sec.
On read 8000 in seq CAMP.Time spent on this seq so far: 19.20 sec.
On read 9000 in seq CAMP.Time spent on this seq so far: 22.40 sec.
On read 10000 in seq CAMP.Time spent on this seq so far: 26.14 sec.
On read 11000 in seq CAMP.Time spent on this seq so far: 29.48 sec.
On read 12000 in seq CAMP.Time spent on this seq so far: 32.73 sec.
On read 13000 in seq CAMP.Time spent on this seq so far: 36.01 sec.
[...]

On read 119000 in seq CAMP.Time spent on this seq so far: 396.85 sec.
On read 120000 in seq CAMP.Time spent on this seq so far: 400.20 sec.
On read 121000 in seq CAMP.Time spent on this seq so far: 403.63 sec.
On read 122000 in seq CAMP.Time spent on this seq so far: 407.12 sec.
On read 123000 in seq CAMP.Time spent on this seq so far: 410.61 sec.
On read 124000 in seq CAMP.Time spent on this seq so far: 414.05 sec.
On read 125000 in seq CAMP.Time spent on this seq so far: 417.50 sec.
On read 126000 in seq CAMP.Time spent on this seq so far: 420.94 sec.
On read 127000 in seq CAMP.Time spent on this seq so far: 424.33 sec.
On read 128000 in seq CAMP.Time spent on this seq so far: 427.70 sec.
On read 129000 in seq CAMP.Time spent on this seq so far: 431.44 sec.
On read 130000 in seq CAMP.Time spent on this seq so far: 434.80 sec.
On read 131000 in seq CAMP.Time spent on this seq so far: 438.19 sec.
On read 132000 in seq CAMP.Time spent on this seq so far: 441.59 sec.
[...]

On read 237000 in seq CAMP.Time spent on this seq so far: 805.79 sec.
On read 238000 in seq CAMP.Time spent on this seq so far: 809.17 sec.
On read 239000 in seq CAMP.Time spent on this seq so far: 812.52 sec.
On read 240000 in seq CAMP.Time spent on this seq so far: 815.92 sec.
On read 241000 in seq CAMP.Time spent on this seq so far: 819.45 sec.
On read 242000 in seq CAMP.Time spent on this seq so far: 822.99 sec.
On read 243000 in seq CAMP.Time spent on this seq so far: 826.54 sec.
On read 244000 in seq CAMP.Time spent on this seq so far: 830.02 sec.
On read 245000 in seq CAMP.Time spent on this seq so far: 833.56 sec.
On read 246000 in seq CAMP.Time spent on this seq so far: 837.14 sec.
On read 247000 in seq CAMP.Time spent on this seq so far: 841.15 sec.
On read 248000 in seq CAMP.Time spent on this seq so far: 844.63 sec.
On read 249000 in seq CAMP.Time spent on this seq so far: 848.05 sec.
On read 250000 in seq CAMP.Time spent on this seq so far: 851.51 sec.
[...]

On read 354000 in seq CAMP.Time spent on this seq so far: 1219.88 sec.
On read 355000 in seq CAMP.Time spent on this seq so far: 1225.26 sec.
On read 356000 in seq CAMP.Time spent on this seq so far: 1230.78 sec.
On read 357000 in seq CAMP.Time spent on this seq so far: 1235.18 sec.
On read 358000 in seq CAMP.Time spent on this seq so far: 1238.83 sec.
On read 359000 in seq CAMP.Time spent on this seq so far: 1242.49 sec.
On read 360000 in seq CAMP.Time spent on this seq so far: 1246.05 sec.
On read 361000 in seq CAMP.Time spent on this seq so far: 1249.65 sec.
On read 362000 in seq CAMP.Time spent on this seq so far: 1253.19 sec.
On read 363000 in seq CAMP.Time spent on this seq so far: 1256.72 sec.
On read 364000 in seq CAMP.Time spent on this seq so far: 1260.23 sec.
On read 365000 in seq CAMP.Time spent on this seq so far: 1264.07 sec.
On read 366000 in seq CAMP.Time spent on this seq so far: 1267.72 sec.
On read 367000 in seq CAMP.Time spent on this seq so far: 1271.25 sec.
[...]

On read 470000 in seq CAMP.Time spent on this seq so far: 1643.35 sec.
On read 471000 in seq CAMP.Time spent on this seq so far: 1647.29 sec.
On read 472000 in seq CAMP.Time spent on this seq so far: 1650.82 sec.
On read 473000 in seq CAMP.Time spent on this seq so far: 1654.29 sec.
On read 474000 in seq CAMP.Time spent on this seq so far: 1657.79 sec.
On read 475000 in seq CAMP.Time spent on this seq so far: 1661.26 sec.
On read 476000 in seq CAMP.Time spent on this seq so far: 1664.52 sec.
On read 477000 in seq CAMP.Time spent on this seq so far: 1667.37 sec.
Finished going through reads in CAMP.
Identifying mutated positions in genome BACTERIA...
Found 24188 mutated positions in BACTERIA.
On read 1000 in seq BACTERIA.Time spent on this seq so far: 2.70 sec.
On read 2000 in seq BACTERIA.Time spent on this seq so far: 6.70 sec.
On read 3000 in seq BACTERIA.Time spent on this seq so far: 11.58 sec.
On read 4000 in seq BACTERIA.Time spent on this seq so far: 15.42 sec.
[...]

On read 104000 in seq BACTERIA.Time spent on this seq so far: 647.47 sec.
On read 105000 in seq BACTERIA.Time spent on this seq so far: 654.39 sec.
On read 106000 in seq BACTERIA.Time spent on this seq so far: 661.51 sec.
On read 107000 in seq BACTERIA.Time spent on this seq so far: 669.23 sec.
On read 108000 in seq BACTERIA.Time spent on this seq so far: 677.08 sec.
On read 109000 in seq BACTERIA.Time spent on this seq so far: 685.88 sec.
On read 110000 in seq BACTERIA.Time spent on this seq so far: 693.48 sec.
On read 111000 in seq BACTERIA.Time spent on this seq so far: 700.30 sec.
On read 112000 in seq BACTERIA.Time spent on this seq so far: 707.09 sec.
On read 113000 in seq BACTERIA.Time spent on this seq so far: 715.21 sec.
On read 114000 in seq BACTERIA.Time spent on this seq so far: 723.49 sec.
On read 115000 in seq BACTERIA.Time spent on this seq so far: 730.36 sec.
On read 116000 in seq BACTERIA.Time spent on this seq so far: 737.16 sec.
[...]

On read 214000 in seq BACTERIA.Time spent on this seq so far: 1554.45 sec.
On read 215000 in seq BACTERIA.Time spent on this seq so far: 1565.61 sec.
On read 216000 in seq BACTERIA.Time spent on this seq so far: 1574.93 sec.
On read 217000 in seq BACTERIA.Time spent on this seq so far: 1586.05 sec.
On read 218000 in seq BACTERIA.Time spent on this seq so far: 1596.16 sec.
On read 219000 in seq BACTERIA.Time spent on this seq so far: 1604.72 sec.
On read 220000 in seq BACTERIA.Time spent on this seq so far: 1612.50 sec.
On read 221000 in seq BACTERIA.Time spent on this seq so far: 1621.51 sec.
On read 222000 in seq BACTERIA.Time spent on this seq so far: 1631.19 sec.
On read 223000 in seq BACTERIA.Time spent on this seq so far: 1641.52 sec.
On read 224000 in seq BACTERIA.Time spent on this seq so far: 1650.64 sec.
On read 225000 in seq BACTERIA.Time spent on this seq so far: 1659.64 sec.
On read 226000 in seq BACTERIA.Time spent on this seq so far: 1669.15 sec.
[...]

On read 57000 in seq BACTEROIDALES.Time spent on this seq so far: 177.86 sec.
On read 58000 in seq BACTEROIDALES.Time spent on this seq so far: 181.05 sec.
On read 59000 in seq BACTEROIDALES.Time spent on this seq so far: 184.23 sec.
On read 60000 in seq BACTEROIDALES.Time spent on this seq so far: 187.49 sec.
On read 61000 in seq BACTEROIDALES.Time spent on this seq so far: 190.76 sec.
On read 62000 in seq BACTEROIDALES.Time spent on this seq so far: 194.10 sec.
On read 63000 in seq BACTEROIDALES.Time spent on this seq so far: 197.47 sec.
On read 64000 in seq BACTEROIDALES.Time spent on this seq so far: 200.83 sec.
On read 65000 in seq BACTEROIDALES.Time spent on this seq so far: 204.17 sec.
On read 66000 in seq BACTEROIDALES.Time spent on this seq so far: 207.54 sec.
On read 67000 in seq BACTEROIDALES.Time spent on this seq so far: 210.81 sec.
On read 68000 in seq BACTEROIDALES.Time spent on this seq so far: 214.19 sec.
[...]

On read 162000 in seq BACTEROIDALES.Time spent on this seq so far: 538.64 sec.
On read 163000 in seq BACTEROIDALES.Time spent on this seq so far: 542.11 sec.
On read 164000 in seq BACTEROIDALES.Time spent on this seq so far: 545.54 sec.
On read 165000 in seq BACTEROIDALES.Time spent on this seq so far: 548.97 sec.
On read 166000 in seq BACTEROIDALES.Time spent on this seq so far: 552.47 sec.
On read 167000 in seq BACTEROIDALES.Time spent on this seq so far: 555.79 sec.
On read 168000 in seq BACTEROIDALES.Time spent on this seq so far: 559.16 sec.
On read 169000 in seq BACTEROIDALES.Time spent on this seq so far: 562.81 sec.
On read 170000 in seq BACTEROIDALES.Time spent on this seq so far: 566.19 sec.
On read 171000 in seq BACTEROIDALES.Time spent on this seq so far: 569.53 sec.
On read 172000 in seq BACTEROIDALES.Time spent on this seq so far: 572.94 sec.
On read 173000 in seq BACTEROIDALES.Time spent on this seq so far: 576.29 sec.
[...]

On read 266000 in seq BACTEROIDALES.Time spent on this seq so far: 900.17 sec.
On read 267000 in seq BACTEROIDALES.Time spent on this seq so far: 903.56 sec.
On read 268000 in seq BACTEROIDALES.Time spent on this seq so far: 907.07 sec.
On read 269000 in seq BACTEROIDALES.Time spent on this seq so far: 910.48 sec.
On read 270000 in seq BACTEROIDALES.Time spent on this seq so far: 913.92 sec.
On read 271000 in seq BACTEROIDALES.Time spent on this seq so far: 917.40 sec.
On read 272000 in seq BACTEROIDALES.Time spent on this seq so far: 920.91 sec.
On read 273000 in seq BACTEROIDALES.Time spent on this seq so far: 924.30 sec.
On read 274000 in seq BACTEROIDALES.Time spent on this seq so far: 927.74 sec.
On read 275000 in seq BACTEROIDALES.Time spent on this seq so far: 932.33 sec.
On read 276000 in seq BACTEROIDALES.Time spent on this seq so far: 936.15 sec.
On read 277000 in seq BACTEROIDALES.Time spent on this seq so far: 939.85 sec.
[...]

On read 369000 in seq BACTEROIDALES.Time spent on this seq so far: 1262.88 sec.
On read 370000 in seq BACTEROIDALES.Time spent on this seq so far: 1266.34 sec.
On read 371000 in seq BACTEROIDALES.Time spent on this seq so far: 1269.76 sec.
On read 372000 in seq BACTEROIDALES.Time spent on this seq so far: 1273.26 sec.
On read 373000 in seq BACTEROIDALES.Time spent on this seq so far: 1276.76 sec.
On read 374000 in seq BACTEROIDALES.Time spent on this seq so far: 1280.29 sec.
On read 375000 in seq BACTEROIDALES.Time spent on this seq so far: 1283.79 sec.
On read 376000 in seq BACTEROIDALES.Time spent on this seq so far: 1287.40 sec.
On read 377000 in seq BACTEROIDALES.Time spent on this seq so far: 1290.91 sec.
On read 378000 in seq BACTEROIDALES.Time spent on this seq so far: 1294.46 sec.
On read 379000 in seq BACTEROIDALES.Time spent on this seq so far: 1297.95 sec.
On read 380000 in seq BACTEROIDALES.Time spent on this seq so far: 1301.46 sec.
[...]

On read 472000 in seq BACTEROIDALES.Time spent on this seq so far: 1629.98 sec.
On read 473000 in seq BACTEROIDALES.Time spent on this seq so far: 1633.83 sec.
On read 474000 in seq BACTEROIDALES.Time spent on this seq so far: 1637.21 sec.
On read 475000 in seq BACTEROIDALES.Time spent on this seq so far: 1640.66 sec.
On read 476000 in seq BACTEROIDALES.Time spent on this seq so far: 1644.15 sec.
On read 477000 in seq BACTEROIDALES.Time spent on this seq so far: 1647.62 sec.
On read 478000 in seq BACTEROIDALES.Time spent on this seq so far: 1651.07 sec.
On read 479000 in seq BACTEROIDALES.Time spent on this seq so far: 1654.49 sec.
On read 480000 in seq BACTEROIDALES.Time spent on this seq so far: 1658.04 sec.
On read 481000 in seq BACTEROIDALES.Time spent on this seq so far: 1661.48 sec.
On read 482000 in seq BACTEROIDALES.Time spent on this seq so far: 1664.91 sec.
On read 483000 in seq BACTEROIDALES.Time spent on this seq so far: 1668.44 sec.
[...]

On read 575000 in seq BACTEROIDALES.Time spent on this seq so far: 1998.03 sec.
On read 576000 in seq BACTEROIDALES.Time spent on this seq so far: 2001.58 sec.
On read 577000 in seq BACTEROIDALES.Time spent on this seq so far: 2005.13 sec.
On read 578000 in seq BACTEROIDALES.Time spent on this seq so far: 2008.68 sec.
On read 579000 in seq BACTEROIDALES.Time spent on this seq so far: 2012.62 sec.
On read 580000 in seq BACTEROIDALES.Time spent on this seq so far: 2016.10 sec.
On read 581000 in seq BACTEROIDALES.Time spent on this seq so far: 2019.64 sec.
On read 582000 in seq BACTEROIDALES.Time spent on this seq so far: 2023.25 sec.
On read 583000 in seq BACTEROIDALES.Time spent on this seq so far: 2026.83 sec.
On read 584000 in seq BACTEROIDALES.Time spent on this seq so far: 2030.48 sec.
On read 585000 in seq BACTEROIDALES.Time spent on this seq so far: 2034.06 sec.
On read 586000 in seq BACTEROIDALES.Time spent on this seq so far: 2037.51 sec.
[...]

On read 678000 in seq BACTEROIDALES.Time spent on this seq so far: 2378.00 sec.
On read 679000 in seq BACTEROIDALES.Time spent on this seq so far: 2381.68 sec.
On read 680000 in seq BACTEROIDALES.Time spent on this seq so far: 2385.00 sec.
On read 681000 in seq BACTEROIDALES.Time spent on this seq so far: 2388.61 sec.
On read 682000 in seq BACTEROIDALES.Time spent on this seq so far: 2392.19 sec.
On read 683000 in seq BACTEROIDALES.Time spent on this seq so far: 2395.82 sec.
On read 684000 in seq BACTEROIDALES.Time spent on this seq so far: 2399.31 sec.
On read 685000 in seq BACTEROIDALES.Time spent on this seq so far: 2403.01 sec.
On read 686000 in seq BACTEROIDALES.Time spent on this seq so far: 2407.12 sec.
On read 687000 in seq BACTEROIDALES.Time spent on this seq so far: 2410.77 sec.
On read 688000 in seq BACTEROIDALES.Time spent on this seq so far: 2414.41 sec.
On read 689000 in seq BACTEROIDALES.Time spent on this seq so far: 2418.08 sec.
[...]