# Test the `cs_tag.Alignment.extract_cs` method
This notebook is designed to be run with `nbval` as a test.

First, create a `DummyAlignedSegment` class that emulates the parts of a `pysam.AlignedSegment` that `cs_tag.Alignment` needs.
It provides every attribute that `cs_tag.Alignment` uses internally, although most of them hold only dummy values:

In [1]:
class DummyAlignedSegment:
    """Emulates `pysam.AlignedSegment` for testing `cs_tag.Alignment`."""
    
    def __init__(self, *, reference_start, cs,
                 query_alignment_qualities=None):
        # attributes with hardcoded values are irrelevant to these tests
        self.query_name = 'queryname'
        self.reference_name = 'refname'
        self.query_alignment_start = 0
        self.query_length = 0
        self.query_alignment_end = 0
        self.is_reverse = False  # boolean in `pysam.AlignedSegment`
        self.reference_end = 0
        self.is_unmapped = False
        
        # relevant attributes
        self.reference_start = reference_start
        self._cs = cs
        self.query_alignment_qualities = query_alignment_qualities
        
    def get_tag(self, tag):
        if tag == 'cs':
            return self._cs
        else:
            raise ValueError('invalid tag')

Import the `cs_tag.Alignment` class that we will test:

In [2]:
from alignparse.cs_tag import Alignment

## Segment that is all identities:

In [3]:
a_ident = Alignment(DummyAlignedSegment(
                reference_start=0,
                cs=':20'))

In [4]:
a_ident.extract_cs(0, 20)

(':20', 0, 0)

In [5]:
a_ident.extract_cs(5, 26)

(':15', 0, 6)

## Segment that is all identities but starts shifted:

In [6]:
a_ident_shifted = Alignment(DummyAlignedSegment(
                reference_start=5,
                cs=':20'))

In [7]:
a_ident_shifted.extract_cs(0, 20)

(':15', 5, 0)

In [8]:
a_ident_shifted.extract_cs(4, 26)

(':20', 1, 1)

In [9]:
a_ident_shifted.extract_cs(3, 18)

(':13', 2, 0)
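
The outputs above are consistent with reading each result as `(cs, clip5, clip3)`, where the last two values count feature positions left uncovered by the alignment at the feature's start and end. That reading is inferred from the outputs here, not taken from the `alignparse` source. A minimal sketch of the arithmetic for an identity-only alignment, using a hypothetical helper `identity_extract` that is not part of `alignparse`:

```python
def identity_extract(aln_start, aln_len, feat_start, feat_end):
    """Hypothetical helper (not alignparse code) for an all-identity alignment.

    The alignment covers reference [aln_start, aln_start + aln_len) and the
    feature covers [feat_start, feat_end). Returns (cs, clip5, clip3), or
    None if the two intervals do not overlap.
    """
    aln_end = aln_start + aln_len
    lo = max(aln_start, feat_start)  # first reference position in both
    hi = min(aln_end, feat_end)      # one past the last shared position
    if lo >= hi:
        return None
    return f':{hi - lo}', lo - feat_start, feat_end - hi


print(identity_extract(0, 20, 5, 26))  # (':15', 0, 6)
print(identity_extract(5, 20, 4, 26))  # (':20', 1, 1)
```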

## Segment that is all deletions:

In [10]:
a_del = Alignment(DummyAlignedSegment(
                reference_start=0,
                cs='-aatgccgcttcaatgcc'))

In [11]:
a_del.extract_cs(6, 10)

('-gctt', 0, 0)

In [12]:
a_del.extract_cs(6, 7)

('-g', 0, 0)

In [13]:
a_del.extract_cs(1, 19)

('-atgccgcttcaatgcc', 0, 2)

## Segment that is deletions but shifted:

In [14]:
a_del_shifted = Alignment(DummyAlignedSegment(
                    reference_start=7,
                    cs='-aatgccgcttcaatgcc'))

In [15]:
a_del_shifted.extract_cs(6, 10)

('-aat', 1, 0)

In [16]:
a_del_shifted.extract_cs(8, 27)

('-atgccgcttcaatgcc', 0, 3)

## Segment with insertions

In [17]:
a_ins_split = Alignment(DummyAlignedSegment(
                reference_start=0,
                cs=':5+atg:3'))

An insertion at a feature boundary is assigned to the end of the preceding feature:

In [18]:
a_ins_split.extract_cs(0, 5)

(':5+atg', 0, 0)

An insertion at a feature boundary is **not** assigned to the start of the following feature:

In [19]:
a_ins_split.extract_cs(5, 8)

(':3', 0, 0)

More checks:

In [20]:
a_ins_split.extract_cs(4, 6)

(':1+atg:1', 0, 0)
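
The three results above fit a simple half-open rule: an insertion sitting at reference position `p` belongs to the feature `[start, end)` when `start < p <= end`. The predicate below is a hypothetical illustration of that rule, inferred from these outputs and not a copy of the `alignparse` logic:

```python
def insertion_in_feature(ins_pos, feat_start, feat_end):
    """Hypothetical rule: an insertion at the boundary goes to the feature
    that ends there, not to the feature that starts there."""
    return feat_start < ins_pos <= feat_end


ins_pos = 5  # in ':5+atg:3' with reference_start=0, the insertion sits at 5
for feat in [(0, 5), (5, 8), (4, 6)]:
    print(feat, insertion_in_feature(ins_pos, *feat))
# (0, 5) True
# (5, 8) False
# (4, 6) True
```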

## Segment with insertions that is shifted

In [21]:
a_ins_split_shifted = Alignment(DummyAlignedSegment(
                        reference_start=2,
                        cs=':5+atg-caat:3'))

In [22]:
a_ins_split_shifted.extract_cs(7, 9)

('-ca', 0, 0)

In [23]:
a_ins_split_shifted.extract_cs(0, 7)

(':5+atg', 2, 0)

In [24]:
a_ins_split_shifted.extract_cs(1, 16)

(':5+atg-caat:3', 1, 2)

In [25]:
a_ins_split_shifted.extract_cs(3, 13)

(':4+atg-caat:2', 0, 0)

## Longer, more complex alignment
This alignment is:
  - clipped 0 to 3
  - identity 3 to 8
  - substitution 8 to 9
  - identity 9 to 11
  - deletion 11 to 15
  - substitution 15 to 16
  - identity 16 to 19
  - insertion at 19
  - identity 19 to 22
  - substitution 22 to 23

In [26]:
a_long = Alignment(DummyAlignedSegment(
            reference_start=3,
            cs=':5*na:2-accg*ta:3+tt:3*ga'))
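
The breakdown listed above can be reproduced by walking the `cs` string and tracking the reference position. The sketch below assumes the short `cs` form with only the four operation types used in this notebook (`:`, `*`, `-`, `+`); the regular expression `CS_OP` and the function `cs_intervals` are hypothetical names, not part of `alignparse`:

```python
import re

# One token per operation: ':N' identity, '*xy' substitution,
# '-seq' deletion, '+seq' insertion (assumed subset of the cs grammar).
CS_OP = re.compile(r':[0-9]+|\*[a-z]{2}|[-+][a-z]+')


def cs_intervals(cs, reference_start):
    """Return (operation, ref_start, ref_end) for each cs operation."""
    pos = reference_start
    out = []
    for op in CS_OP.findall(cs):
        kind = op[0]
        if kind == ':':
            length = int(op[1:])   # identity run of N reference bases
        elif kind == '*':
            length = 1             # substitution covers one reference base
        elif kind == '-':
            length = len(op) - 1   # deleted reference bases
        else:
            length = 0             # insertion consumes no reference
        out.append((op, pos, pos + length))
        pos += length
    return out


for row in cs_intervals(':5*na:2-accg*ta:3+tt:3*ga', 3):
    print(row)
```

With `reference_start=3`, the first operation starts at 3, which is why positions 0 to 3 count as clipped.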

In [27]:
a_long.extract_cs(0, 3) is None

True

In [28]:
a_long.extract_cs(3, 8)

(':5', 0, 0)

In [29]:
a_long.extract_cs(8, 9)

('*na', 0, 0)

In [30]:
a_long.extract_cs(9, 16)

(':2-accg*ta', 0, 0)

In [31]:
a_long.extract_cs(16, 23)

(':3+tt:3*ga', 0, 0)

In [32]:
a_long.extract_cs(12, 21)

('-ccg*ta:3+tt:2', 0, 0)

In [33]:
a_long.extract_cs(1, 14)

(':5*na:2-acc', 2, 0)

In [34]:
a_long.extract_cs(19, 25)

(':3*ga', 0, 2)

In [35]:
a_long.extract_cs(8, 19)

('*na:2-accg*ta:3+tt', 0, 0)
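
For readers who want to see how results like these could arise, here is a simplified, standalone sketch that reproduces the outputs in this notebook. It is an assumption-laden reimplementation, not the `alignparse` code: `extract_cs_sketch` is a hypothetical name, only the `:`, `*`, `-` and `+` operations are handled, and the insertion rule (`start < position <= end`) is inferred from the examples above.

```python
import re

CS_OP = re.compile(r':[0-9]+|\*[a-z]{2}|[-+][a-z]+')


def extract_cs_sketch(cs, reference_start, start, end):
    """Return (cs, clip5, clip3) for feature [start, end), or None."""
    pieces = []
    pos = reference_start
    first = last = None  # reference span of the feature actually covered
    for op in CS_OP.findall(cs):
        kind, body = op[0], op[1:]
        if kind == '+':
            # Insertions consume no reference; keep one only if it lies
            # inside the feature or at the feature's end boundary.
            if start < pos <= end:
                pieces.append(op)
            continue
        if kind == ':':
            length = int(body)
        elif kind == '*':
            length = 1
        else:  # deletion
            length = len(body)
        lo, hi = max(pos, start), min(pos + length, end)
        if lo < hi:  # operation overlaps the feature
            if kind == ':':
                pieces.append(f':{hi - lo}')
            elif kind == '-':
                pieces.append('-' + body[lo - pos:hi - pos])
            else:
                pieces.append(op)
            if first is None:
                first = lo
            last = hi
        pos += length
    if first is None:
        return None  # no reference position of the feature is aligned
    return ''.join(pieces), first - start, end - last


long_cs = ':5*na:2-accg*ta:3+tt:3*ga'
print(extract_cs_sketch(long_cs, 3, 12, 21))  # ('-ccg*ta:3+tt:2', 0, 0)
print(extract_cs_sketch(long_cs, 3, 1, 14))   # (':5*na:2-acc', 2, 0)
```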