# Influenza HAs (H3, H4, H14 subtypes)

In this example, we align a small set of influenza hemagglutinins (HAs): one each from the H3, H4, and H14 subtypes.

The **unaligned** set of HAs is in [input_files/HA_H3_H4_H14.fa](input_files/HA_H3_H4_H14.fa):

In [1]:
! cat input_files/HA_H3_H4_H14.fa

>cds:CAA29337 A/England/321/1977 1977// HA
MKTIIALSYIFCQVLAQNLPGNDNSTATLCLAHHAVPNGTLVKTITNDQIEVTNATELVQSSSTGRICDSPHRILDGKNCTLIDALLGDPHCDGFQNEKWDLFVERSKAFSNCYPYDVPDYASLRSLVASSGTLEFINEGFNWTGVTQNGGSYACKRGPDNSFFSRLNWLYKSESTYPVLNVTMPNNDNFDKLYIWGVHHPSTDKEQTKLYVQASGRVTVSTKRSQQTIIPNVGSRPWVRGLSSRISIYWTIVKPGDILLINSNGNLIAPRGYFKIRTGKSSIMRSDAPIGTCSSECITPNGSIPNDKPFQNVNKITYGACPKYVKQNTLKLATGMRNVPEKQTRGIFGAIAGFIENGWEGMIDGWYGFRHQNSEGTGQAADLKSTQAAIDQINGKLNRVIEKTNEKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFEKTRRQLRENAEDMGNGCFKIYHKCDNACIGSIRNGTYDHDVYRDEALNNRFQIKGVELKSGYKDWILWISFAISCFLLCVVLLGFIMWACQKGNIRCNICI
>cds:BAA14332 A/duck/Czechoslovakia/1956 1956// HA
MLSIVILFLLIAENSSQNYTGNPVICMGHHAVANGTMVKTLADDQVEVVTAQELVESQNLPELCPSPLRLVDGQTCDIINGALGSPGCDHLNGAEWDVFIERPNAVDTCYPFDVPEYQSLRSILANNGKFEFIAEEFQWNTVKQNGKSGACKRANVDDFFNRLNWLVKSDGNAYPLQNLTKINNGDYARLYIWGVHHPSTSTEQTNLYKNNPGRVTVSTKTSQTSVVPDIGSRPLVRGQSGRVSFYWTIVEPGDLIVFNTIGNLIAPRGHYKLNNQKKSTILNTAIPIGSCVSKCHTDKGSLSTTKPFQNISRIAVGDCPRYVKQGSLKLATGMRNIPE

We would like to align these HA proteins to each other and to the protein chains for a trimer of the H3 HA in [PDB 4o5n](https://www.rcsb.org/structure/4o5n).
This PDB entry contains only a monomer, so a full trimer was generated using [makemultimer.py](http://watcut.uwaterloo.ca/tools/makemultimer/) and saved as [input_files/4o5n_trimer.pdb](input_files/4o5n_trimer.pdb).
Chains `A`, `C`, and `E` correspond to HA1, and chains `B`, `D`, and `F` correspond to HA2.
Here are the first few lines of the PDB file:

In [2]:
! head -n 13 input_files/4o5n_trimer.pdb

REMARK  Multimer expanded from BIOMT matrix in pdb file 4O5N
REMARK  by MakeMultimer.py (watcut.uwaterloo.ca/makemultimer)
REMARK  
REMARK  -------------------------------------------------------------
REMARK  Chain  original  1st resid.  last resid.  1st atom  last atom
REMARK  -------------------------------------------------------------
REMARK      A         A           9          325         1       2498
REMARK      C         A           9          325         1       2498
REMARK      E         A           9          325         1       2498
REMARK      B         B           1          173         1       1431
REMARK      D         B           1          173         1       1431
REMARK      F         B           1          173         1       1431
REMARK  -------------------------------------------------------------
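
The chain table in this header can also be read programmatically. The following is a minimal sketch, not part of `pdb_prot_align` or `makemultimer.py`: it assumes the `REMARK` layout shown above (new chain ID, original chain ID, then four integers) and groups the trimer chains by the monomer chain they were copied from. The regular expression and variable names are illustrative.

```python
import re
from collections import defaultdict

# Header lines copied from the `head` output above.
header = """\
REMARK  Chain  original  1st resid.  last resid.  1st atom  last atom
REMARK  -------------------------------------------------------------
REMARK      A         A           9          325         1       2498
REMARK      C         A           9          325         1       2498
REMARK      E         A           9          325         1       2498
REMARK      B         B           1          173         1       1431
REMARK      D         B           1          173         1       1431
REMARK      F         B           1          173         1       1431
REMARK  -------------------------------------------------------------
"""

# Assumed row layout: two single-character chain IDs, then four integers.
row = re.compile(r"^REMARK\s+(\w)\s+(\w)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

chains_by_original = defaultdict(list)
for line in header.splitlines():
    match = row.match(line)
    if match:  # skips the column-title and dashed separator lines
        new_chain, original_chain = match.group(1), match.group(2)
        chains_by_original[original_chain].append(new_chain)

print(dict(chains_by_original))
```

Under these assumptions, the chains copied from original chain `A` are the three HA1 copies and those copied from original chain `B` are the three HA2 copies, matching the description above.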


For our reference sequence in the alignment, we choose the H3 HA from *A/England/321/1977*.
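
The run below writes its results to `./output_files/`, which has to exist beforehand. One way to make sure it does, sketched here with Python's standard library (the variable name `output_dir` is illustrative):

```python
from pathlib import Path

# Create the output directory if it is missing; do nothing if it already exists.
output_dir = Path("output_files")
output_dir.mkdir(parents=True, exist_ok=True)
```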

Run `pdb_prot_align`, sending the output files to the subdirectory `./output_files/` (which must already exist).
We add the `--reorder` option to our call of `mafft`:

In [3]:
! pdb_prot_align --protsfile input_files/HA_H3_H4_H14.fa \
                 --refprot_regex A/England/321/1977 \
                 --pdbfile input_files/4o5n_trimer.pdb \
                 --chain_ids A B C D E F \
                 --outprefix output_files/HA_H3_H4_H14 \
                 --mafft "mafft --reorder"


Running `pdb_prot_align` 0.5.0

Parsing PDB input_files/4o5n_trimer.pdb chains A B C D E F
For chain A, parsed 317 residues, ranging from 9 to 325 in PDB numbering.
For chain B, parsed 173 residues, ranging from 1 to 173 in PDB numbering.
For chain C, parsed 317 residues, ranging from 9 to 325 in PDB numbering.
For chain D, parsed 173 residues, ranging from 1 to 173 in PDB numbering.
For chain E, parsed 317 residues, ranging from 9 to 325 in PDB numbering.
For chain F, parsed 173 residues, ranging from 1 to 173 in PDB numbering.

Read 3 sequences from input_files/HA_H3_H4_H14.fa
Reference protein is of length 566 and has the following header:
cds:CAA29337 A/England/321/1977 1977// HA

Using `mafft` to align sequences to output_files/HA_H3_H4_H14_unstripped_alignment.fa
Stripping gaps relative to reference cds:CAA29337 A/England/321/1977 1977// HA
Dropping PDB chains from alignment
Writing gap-stripped alignment to output_files/HA_H3_H4_H14_alignment.fa

Writing CSV with detailed infor

The alignment output file ([output_files/HA_H3_H4_H14_alignment.fa](output_files/HA_H3_H4_H14_alignment.fa)) has the HA alignment with all gaps stripped relative to the reference sequence:

In [4]:
! cat output_files/HA_H3_H4_H14_alignment.fa

>cds:CAA29337 A/England/321/1977 1977// HA
MKTIIALSYIFCQVLAQNLPGNDNSTATLCLAHHAVPNGTLVKTITNDQIEVTNATELVQSSSTGRICDSPHRILDGKNCTLIDALLGDPHCDGFQNEKWDLFVERSKAFSNCYPYDVPDYASLRSLVASSGTLEFINEGFNWTGVTQNGGSYACKRGPDNSFFSRLNWLYKSESTYPVLNVTMPNNDNFDKLYIWGVHHPSTDKEQTKLYVQASGRVTVSTKRSQQTIIPNVGSRPWVRGLSSRISIYWTIVKPGDILLINSNGNLIAPRGYFKIRTGKSSIMRSDAPIGTCSSECITPNGSIPNDKPFQNVNKITYGACPKYVKQNTLKLATGMRNVPEKQTRGIFGAIAGFIENGWEGMIDGWYGFRHQNSEGTGQAADLKSTQAAIDQINGKLNRVIEKTNEKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFEKTRRQLRENAEDMGNGCFKIYHKCDNACIGSIRNGTYDHDVYRDEALNNRFQIKGVELKSGYKDWILWISFAISCFLLCVVLLGFIMWACQKGNIRCNICI
>cds:BAA14332 A/duck/Czechoslovakia/1956 1956// HA
MLSIVILFLLIAENSSQNYTGN----PVICMGHHAVANGTMVKTLADDQVEVVTAQELVESQNLPELCPSPLRLVDGQTCDIINGALGSPGCDHLNGAEWDVFIERPNAVDTCYPFDVPEYQSLRSILANNGKFEFIAEEFQWNTVKQNGKSGACKRANVDDFFNRLNWLVKSDNAYPLQNLTKINNGDYARLYIWGVHHPSTSTEQTNLYKNNPGRVTVSTKTSQTSVVPDIGSRPLVRGQSGRVSFYWTIVEPGDLIVFNTIGNLIAPRGHYKLNNKKSTILNTAIPIGSCVSKCHTDKGSLSTTKPFQNISRIAVGDCPRYVKQGSLKLATGMRNI
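
To make "gaps stripped relative to the reference" concrete, here is a minimal sketch of the idea. It is not the code `pdb_prot_align` uses internally; the function name and the toy sequences are hypothetical. Every alignment column where the reference has a gap is dropped, so the reference comes out ungapped while other sequences may keep gaps at the remaining columns.

```python
def strip_gaps_relative_to_ref(aligned, ref_name, gap="-"):
    """Drop every alignment column in which the reference has a gap.

    `aligned` maps sequence names to aligned sequences of equal length.
    """
    ref = aligned[ref_name]
    if any(len(seq) != len(ref) for seq in aligned.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns where the reference has a residue rather than a gap.
    keep = [i for i, char in enumerate(ref) if char != gap]
    return {name: "".join(seq[i] for i in keep) for name, seq in aligned.items()}


# Toy alignment (hypothetical sequences, not taken from the HA files).
toy = {"ref": "MK-TI-A", "other": "MLSTIV-"}
stripped = strip_gaps_relative_to_ref(toy, "ref")
print(stripped)
```

After stripping, every sequence has the same length as the ungapped reference, which is why sequence positions in the stripped alignment line up with the reference's own residue numbering.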

However, the most useful information is in the output CSV file, [output_files/HA_H3_H4_H14_sites.csv](output_files/HA_H3_H4_H14_sites.csv).
Here are some lines of that file:

In [5]:
! head -n 5000 output_files/HA_H3_H4_H14_sites.csv | tail -n 35

99,K,E,83,K,1.58496,3.00000,F,0.00000
99,K,E,83,K,1.58496,3.00000,G,0.00000
99,K,E,83,K,1.58496,3.00000,H,0.00000
99,K,E,83,K,1.58496,3.00000,I,0.00000
99,K,E,83,K,1.58496,3.00000,K,0.33333
99,K,E,83,K,1.58496,3.00000,L,0.00000
99,K,E,83,K,1.58496,3.00000,M,0.00000
99,K,E,83,K,1.58496,3.00000,N,0.00000
99,K,E,83,K,1.58496,3.00000,P,0.00000
99,K,E,83,K,1.58496,3.00000,Q,0.00000
99,K,E,83,K,1.58496,3.00000,R,0.00000
99,K,E,83,K,1.58496,3.00000,S,0.00000
99,K,E,83,K,1.58496,3.00000,T,0.33333
99,K,E,83,K,1.58496,3.00000,V,0.00000
99,K,E,83,K,1.58496,3.00000,W,0.00000
99,K,E,83,K,1.58496,3.00000,Y,0.00000
100,W,A,84,W,0.00000,1.00000,A,0.00000
100,W,A,84,W,0.00000,1.00000,C,0.00000
100,W,A,84,W,0.00000,1.00000,D,0.00000
100,W,A,84,W,0.00000,1.00000,E,0.00000
100,W,A,84,W,0.00000,1.00000,F,0.00000
100,W,A,84,W,0.00000,1.00000,G,0.00000
100,W,A,84,W,0.00000,1.00000,H,0.00000
100,W,A,84,W,0.00000,1.00000,I,0.00000
100,W,A,84,W,0.00000,1.00000,K,0.00000
100,W,A,84,W,0.00000,1.00000,L,0.00000
10
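
The CSV header row is not shown in this excerpt, so the meaning of each column is an assumption here. The values are consistent with the sixth column being the Shannon entropy of the site in bits, the seventh being the effective number of amino acids (2 raised to the entropy), and the last two being an amino acid and its frequency at the site. The sketch below checks that reading; the function names are illustrative, and the assumption for site 99 is that three amino acids each occur at frequency 1/3 (the rows shown list `K` and `T` at 0.33333).

```python
import math


def site_entropy_bits(freqs):
    """Shannon entropy in bits of amino-acid frequencies at one site."""
    return sum(-p * math.log2(p) for p in freqs if p > 0)


def n_effective(freqs):
    """Effective number of amino acids, taken here as 2 ** entropy."""
    return 2 ** site_entropy_bits(freqs)


# Site 99 (assumed): three amino acids at equal frequency.
site_99 = [1 / 3, 1 / 3, 1 / 3]
# Site 100 (assumed): a single amino acid (W) in every sequence.
site_100 = [1.0]

print(f"{site_entropy_bits(site_99):.5f} {n_effective(site_99):.5f}")
print(f"{site_entropy_bits(site_100):.5f} {n_effective(site_100):.5f}")
```

Under these assumptions, the computed values agree with the `1.58496,3.00000` and `0.00000,1.00000` entries in the rows above.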