### Sequence numbering
In this notebook, we illustrate how to use the SingleChainAnnotator tool to determine
whether a sequence is a heavy or light (kappa or lambda) chain and to number it.

In [1]:
from antpack import SingleChainAnnotator

my_sequence = "AAAAAAAEVHLQQSGAELMKPGASVKISCKASGYTFITYWIEWVKQRPGHGLEWIGDILPGSGSTNYNENFKGKATFTADSSSNTAYMQLSSLTSEDSAVYYCARSGYYGNSGFAYWGQGTLVTVSA"


By passing ["H", "K", "L"] (the default), we ensure the annotator aligns each sequence to
heavy, kappa and lambda (kappa and lambda are different variants of light chains)
and determines the chain type based on which option returns the best
alignment. If we KNOW our chains are all "H", we could pass "H" as the only
option, which improves speed slightly. In general, however, this isn't
necessary.

In [2]:
my_annotator = SingleChainAnnotator(["H", "K", "L"], scheme = "imgt")

In [3]:
numbering, percent_identity, chain_type, err_message = my_annotator.analyze_seq(my_sequence)

The numbering is a list of the same length as the input sequence, where each element is either
a numbering assignment or "-" (a gap, meaning no numbering assignment corresponds to that
amino acid).

In [4]:
print(f"{numbering}\n\n")
print(f"{my_sequence}\n\n")
print([(a, z) for a, z in zip(numbering, my_sequence)])

['-', '-', '-', '-', '-', '-', '-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128']


AAAAAAAEVHLQQSGAELMKPGASVKISCKASGYTFITYWIEWVKQRPGHGLEWIGDILPGSGSTNYNENFKGKATFTADSSSNTAYMQLSSLTSEDSAVYYCARSGYYGNSGFAYWGQGTLVTVSA


[('-', 'A'), ('-', 'A'), ('-', 'A'), ('-', 'A'), ('-', 'A'), ('-', 'A'), ('-', 'A'), ('1', '
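
Because the numbering and the sequence have the same length, the unnumbered residues can be dropped by pairing the two and skipping the gaps. The helper below is a minimal sketch, not part of antpack: `numbered_residues` is a hypothetical name, and the toy input only mimics the start of the output above.

```python
def numbered_residues(numbering, sequence):
    """Return (position, residue) pairs, skipping residues marked as gaps ("-")."""
    if len(numbering) != len(sequence):
        raise ValueError("numbering and sequence must have the same length")
    return [(pos, aa) for pos, aa in zip(numbering, sequence) if pos != "-"]


# Toy input (an assumption for illustration): two unnumbered residues
# followed by three numbered ones.
toy_numbering = ["-", "-", "1", "2", "3"]
toy_sequence = "AAEVH"

pairs = numbered_residues(toy_numbering, toy_sequence)
print(pairs)
```

With the notebook's own variables, the equivalent call would be `numbered_residues(numbering, my_sequence)`.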

SingleChainAnnotator determined this was a heavy chain ("H"). "K" and "L" both correspond to light
chains (kappa and lambda).

In [5]:
print(chain_type)

H
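
If a more readable label is useful downstream, the single-letter code can be mapped to a description. This is a small sketch based only on the three codes described above; `CHAIN_NAMES` and `describe_chain` are hypothetical names, not part of antpack.

```python
# Hypothetical lookup table built from the three chain codes used in this notebook.
CHAIN_NAMES = {
    "H": "heavy",
    "K": "light (kappa)",
    "L": "light (lambda)",
}


def describe_chain(chain_type):
    """Translate a chain code into a readable label; unknown codes are reported as such."""
    return CHAIN_NAMES.get(chain_type, f"unknown ({chain_type!r})")


print(describe_chain("H"))
```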


A low percent identity (<< 0.8) could mean that the sequence isn't an antibody, that it contains
large deletions, or that there is some other issue. In this case there are no problems.

In [6]:
print(percent_identity)

0.9659090909090909


Finally, if the error message is anything other than "", something went wrong. The most common
error message indicates an unexpected amino acid was found at a highly conserved position --
this usually occurs if there is a very large N- or C-terminal deletion. In this case we're fine.

In [7]:
print(err_message)
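
The two checks described above (percent identity and error message) can be combined into one sanity test. The sketch below is illustrative: `is_reliable` is a hypothetical helper, and the 0.8 cutoff is an assumption taken from the rough guideline above rather than a threshold defined by antpack, so a different value may suit your data.

```python
def is_reliable(percent_identity, err_message, min_identity=0.8):
    """Return True if there is no error message and identity meets the cutoff.

    min_identity=0.8 is an assumed cutoff for illustration, not an antpack default.
    """
    return err_message == "" and percent_identity >= min_identity


# Values printed earlier in this notebook: high identity, empty error message.
print(is_reliable(0.9659090909090909, ""))
```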




Note that above we used ``analyze_seq`` to analyze a single sequence. If we have a list
of sequences we can use ``analyze_seqs``, or we can loop over our list and feed each
sequence to ``analyze_seq`` as we go.
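
The looping approach could look like the sketch below. It assumes only what is shown above, namely that ``analyze_seq`` returns the four values in the order numbering, percent identity, chain type, error message. `annotate_all` and the dictionary keys are hypothetical names chosen for illustration.

```python
def annotate_all(annotator, sequences):
    """Call analyze_seq on each sequence and collect the results as dicts."""
    records = []
    for seq in sequences:
        # Unpack in the same order used earlier in this notebook.
        numbering, percent_identity, chain_type, err_message = annotator.analyze_seq(seq)
        records.append({
            "sequence": seq,
            "numbering": numbering,
            "percent_identity": percent_identity,
            "chain_type": chain_type,
            "err_message": err_message,
        })
    return records


# Usage with the annotator created earlier in this notebook:
# results = annotate_all(my_annotator, [my_sequence])
```

Keeping each result next to its input sequence makes it easier to trace a failed annotation back to the sequence that caused it.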