# Homology Search Using BLAST

Once the protein sequence has been validated, a sequence similarity search is run to identify related proteins in public databases.

Sequence similarity can provide insights into the potential function of an unknown protein.

## Rationale for Homology Search

Proteins with similar amino acid sequences often share similar structures and biological functions.
Therefore, identifying homologous sequences is a common approach for predicting the function of uncharacterized proteins.

In this project, BLAST (Basic Local Alignment Search Tool) is used to compare the target protein sequence against known protein sequences.

## Choice of BLAST Program

Because the input sequence is a protein sequence, BLASTp is selected.
BLASTp compares an amino acid query sequence against a protein database, making it appropriate for this analysis.
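The selection logic can be sketched as a small lookup from query and database type to program name. The helper `choose_blast_program` and the `BLAST_PROGRAMS` table are hypothetical names introduced for illustration; only the program names themselves are standard BLAST programs. `tblastx` (translated nucleotide query against a translated nucleotide database) is left out because it shares its input types with `blastn`.

```python
# Standard BLAST programs keyed by (query type, database type).
# The table and helper below are illustrative, not part of any library.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the BLAST program for a query/database pairing (hypothetical helper)."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {query_type!r} vs {database_type!r}"
        ) from None


# A protein query against a protein database, as in this project.
print(choose_blast_program("protein", "protein"))
```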

## BLAST Execution Strategy

BLAST can be performed either locally or by submitting the sequence to the NCBI BLAST servers.
In this project, the search runs online on NCBI servers, with programmatic submission through Biopython as the intended route.

This approach ensures access to up-to-date protein databases without requiring local database installation.

## BLAST Input Sequence

The input sequence for BLAST analysis is the validated protein sequence stored in FASTA format.

FASTA is the standard text format for sequence data and is accepted by BLAST as query input: a header line starting with `>` is followed by one or more lines of sequence.
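To make the format concrete, the following plain-Python sketch parses FASTA text that holds exactly one record. It is a simplified stand-in for illustration and not the Biopython parser used in this notebook; the function name `read_single_fasta` and the example record are made up.

```python
def read_single_fasta(text: str) -> tuple[str, str]:
    """Parse FASTA text holding exactly one record; return (header, sequence)."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one record")
    # Sequence lines may be wrapped, so join them into one string.
    return lines[0][1:], "".join(lines[1:])


# Hypothetical record, for illustration only.
example = """>example_protein hypothetical record for illustration
MKTAYIAKQR
QISFVKSHFS
"""

header, sequence = read_single_fasta(example)
print(header)
print(sequence)
```

Rejecting input with zero or several records means a malformed file fails early, before a query is sent to BLAST.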

In [2]:
from Bio.Blast import NCBIWWW
from Bio import SeqIO

In [3]:
record = SeqIO.read("../data/input_sequence.fasta", "fasta")
sequence = str(record.seq)

In [4]:
# Submit the protein query to NCBI's remote BLAST service (requires network access)
result_handle = NCBIWWW.qblast(
    program="blastp",
    database="nr",
    sequence=sequence
)

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Basic Constraints of CA cert not marked critical (_ssl.c:1032)>

In [None]:
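# Save the raw XML returned by NCBI, then release the network handle
with open("../results/blast_results.xml", "w") as out_handle:
    out_handle.write(result_handle.read())
result_handle.close()

print("BLAST results saved to ../results/blast_results.xml")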

## Note on BLAST Execution

Programmatic submission through Biopython failed with the `CERTIFICATE_VERIFY_FAILED` error shown above, caused by SSL certificate verification on the local system.
The BLAST search was therefore run through the NCBI web interface instead.

The resulting BLAST output was downloaded in XML format and used for downstream analysis.
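The message "Basic Constraints of CA cert not marked critical" may come from the strict X.509 checks that recent Python versions (3.13 and later) enable by default in `ssl.create_default_context()`. The sketch below is one possible workaround and was not used in this project. It assumes that this strict check is the cause and that Biopython sends its requests through `urllib.request.urlopen`. The function name `build_relaxed_strict_context` is hypothetical.

```python
import ssl
import urllib.request


def build_relaxed_strict_context() -> ssl.SSLContext:
    """Default TLS context with only the strict X.509 flag cleared (assumed workaround)."""
    context = ssl.create_default_context()
    # Certificate chain and hostname verification stay enabled; only the
    # strict RFC 5280 conformance checks on certificates are relaxed.
    context.verify_flags &= ~ssl.VERIFY_X509_STRICT
    return context


context = build_relaxed_strict_context()

# urlopen() uses the globally installed opener when it is called without an
# explicit context, so later urlopen() calls pick up this context.
opener = urllib.request.build_opener(urllib.request.HTTPSHandler(context=context))
urllib.request.install_opener(opener)
```

Unlike disabling verification altogether, this keeps `CERT_REQUIRED` and hostname checking in place. If the failure has a different cause, such as a missing CA bundle, this change will not help.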

## BLAST Output

The web-based BLAST search completed successfully.
Its XML output is stored at:

`results/blast_results.xml`
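As a rough sketch of how such a file could be read for downstream analysis, the example below pulls the accession, description, E-value and percent identity of each hit using only the standard library. It assumes the classic BLAST XML layout with `Hit` and `Hsp` elements. The function name `top_hits` is hypothetical, and the embedded XML is a made-up placeholder record, not a result from this project.

```python
import xml.etree.ElementTree as ET


def top_hits(xml_text: str, max_hits: int = 5) -> list[dict]:
    """Summarise the first HSP of each hit in classic BLAST XML (assumed layout)."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        hsp = hit.find("Hit_hsps/Hsp")  # first HSP listed for this hit
        if hsp is None:
            continue
        identity = int(hsp.findtext("Hsp_identity"))
        align_len = int(hsp.findtext("Hsp_align-len"))
        hits.append({
            "accession": hit.findtext("Hit_accession"),
            "description": hit.findtext("Hit_def"),
            "evalue": float(hsp.findtext("Hsp_evalue")),
            "percent_identity": 100.0 * identity / align_len,
        })
        if len(hits) == max_hits:
            break
    return hits


# Placeholder record for illustration only; not real BLAST output.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_accession>EXAMPLE_0001</Hit_accession>
          <Hit_def>hypothetical protein (placeholder)</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-50</Hsp_evalue>
              <Hsp_identity>90</Hsp_identity>
              <Hsp_align-len>100</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

for hit in top_hits(SAMPLE_XML):
    print(hit)
```

To apply this to the saved results, the file contents would be read as text and passed to `top_hits` in place of `SAMPLE_XML`. Biopython's own BLAST parsers are an alternative if the library is available.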
