# EspY3 flanking-region homologue search
## 4/11/25

EspY3 contains a domain of two pentapeptide repeats (PPRs). This is the only region annotated in Pfam, and it is conserved across 10,000 proteins. Whilst the PPR domain is highly conserved and structurally stable, the surrounding flanks evolve more rapidly and often determine function or interactions. Variation in these flanks can therefore signal functional divergence or specialisation among homologous proteins. Phylogenetic analysis of the flanks may uncover lineage-specific adaptations and link sequence evolution outside the repeat domain to potential differences in protein function within pathogens.


In this notebook I aim to:
1. Extract the N- and C-terminal flanks *outside* the PPR domain (residues 165-342).
2. Run BLASTP searches with each flanking sequence to find homologues
3. Curate hits with CD-HIT to remove redundancy
4. Annotate with Pfam/InterPro to see if any new domains appear in the flanks
5. Build alignments and trees


In [13]:
# 1. Isolating the N- and C-terminal flanks.

# N flank residues 1 - 164
!seqkit subseq -r 1:164 ../../data/raw/espy3.fasta > ../../results/flank-search/flank_N.fasta

# C flank residues 343 - 523
!seqkit subseq -r 343:-1 ../../data/raw/espy3.fasta > ../../results/flank-search/flank_C.fasta

[INFO] create or read FASTA index ...
[INFO] read FASTA index from ../../data/raw/espy3.fasta.seqkit.fai
[INFO]   1 records loaded from ../../data/raw/espy3.fasta.seqkit.fai
[INFO] create or read FASTA index ...
[INFO] read FASTA index from ../../data/raw/espy3.fasta.seqkit.fai
[INFO]   1 records loaded from ../../data/raw/espy3.fasta.seqkit.fai
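
As a sanity check on the coordinates used above, the same split can be sketched in plain Python. This is only an illustration: `split_flanks` is a hypothetical helper, and the toy sequence is a placeholder of the same length (523 residues), not the real EspY3 sequence.

```python
def split_flanks(seq: str, domain_start: int, domain_end: int) -> tuple[str, str]:
    """Return (N flank, C flank) around a domain given 1-based inclusive coordinates."""
    if not 1 <= domain_start <= domain_end <= len(seq):
        raise ValueError("domain coordinates fall outside the sequence")
    n_flank = seq[: domain_start - 1]  # residues 1 .. domain_start - 1
    c_flank = seq[domain_end:]         # residues domain_end + 1 .. end
    return n_flank, c_flank


# Placeholder sequence: 164 + 178 + 181 = 523 residues (NOT the real EspY3)
toy = "A" * 164 + "P" * 178 + "C" * 181
n_flank, c_flank = split_flanks(toy, 165, 342)
print(len(n_flank), len(c_flank))  # 164 181
```

The expected flank lengths (164 and 181 residues) can be compared against the lengths of the sequences written to `flank_N.fasta` and `flank_C.fasta`.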


## 2. BLAST search

I used the online NCBI BLAST service. I searched each flanking region against the refseq_protein database with the maximum number of target sequences set to 5000. All other settings were left at the blastp defaults.
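
If the hits are downloaded as a tab-separated hit table, they could be filtered before curation with something like the sketch below. It assumes the standard 12-column tabular layout (any extra trailing columns are ignored); the cut-offs `max_evalue` and `min_length` are illustrative assumptions, not values used in this notebook, and the example rows are made up.

```python
import csv
import io

# Standard 12-column BLAST tabular layout (assumed here)
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-5, min_length=50):
    """Keep hits at or below an e-value cut-off and at or above an alignment length."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["evalue"]) <= max_evalue and int(rec["length"]) >= min_length:
            hits.append(rec)
    return hits


# Made-up rows standing in for a downloaded hit table
example = io.StringIO(
    "# BLASTP hit table (made-up rows)\n"
    "flank_N\tWP_000000001.1\t98.8\t164\t2\t0\t1\t164\t1\t164\t1e-110\t320\n"
    "flank_N\tWP_000000002.1\t31.0\t42\t25\t2\t10\t51\t5\t44\t0.5\t30.1\n"
)
kept = filter_hits(example)
print([h["sseqid"] for h in kept])  # ['WP_000000001.1']
```

In practice the `io.StringIO` object would be replaced by an open file handle on the downloaded table.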

The hits for both flanks are restricted to Enterobacterales. This is consistent with a model in which this protein evolved from a general PPR scaffold but specialised within enteric pathogens.
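
One way to check the taxonomic spread of the hits is to tally the organism label that RefSeq protein headers carry in trailing square brackets. The sketch below is a rough illustration with made-up headers; `genus_counts` is a hypothetical helper, and it takes the first word of the label as the genus, which will be wrong for records labelled at a higher rank. Mapping genera to an order such as Enterobacterales would need a separate taxonomy lookup that is not shown here.

```python
import re
from collections import Counter


def genus_counts(headers):
    """Count the first word of the trailing [organism] label in each FASTA header."""
    counts = Counter()
    for header in headers:
        match = re.search(r"\[([^\[\]]+)\]\s*$", header)
        if match:
            counts[match.group(1).split()[0]] += 1
        else:
            counts["unknown"] += 1  # header without an organism label
    return counts


# Made-up headers, not real hits from this search
headers = [
    ">WP_000000001.1 hypothetical protein [Escherichia coli]",
    ">WP_000000002.1 pentapeptide repeat-containing protein [Salmonella enterica]",
    ">WP_000000003.1 hypothetical protein [Escherichia albertii]",
]
print(genus_counts(headers).most_common())
# [('Escherichia', 2), ('Salmonella', 1)]
```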