# 🧬 Introduction to BLAST

**BLAST** (Basic Local Alignment Search Tool) is one of the most widely used tools in bioinformatics. It allows you to compare a query sequence (DNA or protein) against a database to find regions of local similarity.

### 🧠 What does BLAST do?
- Finds regions of local similarity
- Identifies homologous genes or proteins
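- Reports alignment scores and E-values that indicate how significant each match is (a starting point for evolutionary analysis, not a distance estimate in itself)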
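To make "local similarity" concrete, here is a minimal sketch of a local alignment score. Note that this is the classic Smith-Waterman dynamic programming recurrence, not BLAST's own algorithm (BLAST uses a faster seed-and-extend heuristic). The function name and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a vs b (Smith-Waterman, linear gap cost)."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, which lets an alignment restart anywhere
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared core "ACGT" is found even though the flanking regions differ
print(local_alignment_score("GGGACGTCCC", "TTTACGTAAA"))
```

Because scores are clamped at zero, unrelated flanking sequence does not penalize a well-matching region, which is the property that distinguishes local from global alignment.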

### 🔍 Types of BLAST
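- `blastn`: nucleotide query vs nucleotide database
- `blastp`: protein query vs protein database
- `blastx`: translated nucleotide query vs protein database
- `tblastn`: protein query vs translated nucleotide database
- `tblastx`: translated nucleotide query vs translated nucleotide database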
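The choice of program follows from the type of the query and the type of the database. A small sketch of that lookup is shown here; the helper `choose_blast_program` and its argument names are hypothetical and not part of Biopython or BLAST.

```python
# Maps (query type, database type) to the BLAST program name.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, db_type, translate_both=False):
    """Return the BLAST program for a query/database pair (illustrative helper)."""
    if translate_both:
        # tblastx translates both sides, so both must be nucleotide
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]


print(choose_blast_program("nucleotide", "protein"))
```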

## ⚡ Quick BLAST Demo (Online)
This example shows how to query a short DNA sequence using `blastn` against the NCBI nucleotide database.

In [3]:
from Bio.Blast import NCBIWWW, NCBIXML

result_handle = NCBIWWW.qblast(
    program="blastn",
    database="nt",
    sequence="GATTTGGGGTTTTAGTAGAATTCTCGC",
)

# Save to file for inspection later
with open("blast_result.xml", "w") as f:
    f.write(result_handle.read())
result_handle.close()
print("BLAST result saved to blast_result.xml")
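Online searches are slow and NCBI asks users to limit request rates, so it can help to reuse a saved result instead of re-running the query every time the notebook is executed. The sketch below is one possible wrapper; the function name `blastn_to_file` and its defaults are assumptions, while `expect` and `hitlist_size` are optional arguments of `NCBIWWW.qblast`.

```python
from pathlib import Path


def blastn_to_file(sequence, out_path, max_hits=10, evalue=10.0, force=False):
    """Run an online blastn search against nt unless out_path already exists."""
    out_path = Path(out_path)
    if out_path.exists() and not force:
        return out_path  # reuse the earlier result, no network request

    # Imported here so the cached path works even without network access
    from Bio.Blast import NCBIWWW

    result_handle = NCBIWWW.qblast(
        "blastn", "nt", sequence, expect=evalue, hitlist_size=max_hits
    )
    try:
        out_path.write_text(result_handle.read())
    finally:
        result_handle.close()
    return out_path


# Example call (contacts NCBI only if blast_result.xml is missing):
# blastn_to_file("GATTTGGGGTTTTAGTAGAATTCTCGC", "blast_result.xml")
```

Passing `force=True` repeats the search, for example after changing the query sequence.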



In [6]:
# Parse and display quick BLAST result
with open("blast_result.xml") as result_file:
    blast_record = NCBIXML.read(result_file)

print(f"Found {len(blast_record.alignments)} alignments.")

for alignment in blast_record.alignments[:3]:
    hsp = alignment.hsps[0]
    print("\n****Alignment****")
    print(f"Title: {alignment.title}")
    print(f"Length: {alignment.length}")
    print(f"E-value: {hsp.expect}")
    print(hsp.query[0:75] + "...")
    print(hsp.match[0:75] + "...")
    print(hsp.sbjct[0:75] + "...")
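Printing the first three alignments does not say whether they are meaningful. A common next step is to keep only alignments whose best HSP falls under an E-value cutoff. The sketch below assumes a record with the same `alignments`, `hsps`, `title`, and `expect` attributes used above; the function name and the `1e-5` cutoff are illustrative choices, and the demo uses made-up stand-in objects rather than real BLAST output.

```python
from types import SimpleNamespace


def significant_hits(blast_record, e_threshold=1e-5):
    """Return (title, best E-value) pairs at or below the cutoff, best first."""
    hits = []
    for alignment in blast_record.alignments:
        if not alignment.hsps:
            continue
        best_hsp = min(alignment.hsps, key=lambda hsp: hsp.expect)
        if best_hsp.expect <= e_threshold:
            hits.append((alignment.title, best_hsp.expect))
    return sorted(hits, key=lambda pair: pair[1])


# Stand-in record with invented values, only to show the call
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(title="weak hit", hsps=[SimpleNamespace(expect=0.5)]),
    SimpleNamespace(title="strong hit", hsps=[SimpleNamespace(expect=1e-30)]),
])
for title, evalue in significant_hits(demo_record):
    print(title, evalue)
```

With a real result, pass the `blast_record` returned by `NCBIXML.read`. Very short queries, such as the 27-nucleotide example here, often produce large E-values, so a looser cutoff may be needed.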



# 🔬 Full BLAST Workflow: From Genome to Protein Analysis

In this section, we will:
1. Fetch the **E. coli K12 genome** from NCBI
2. Extract coding sequences (CDS) and proteins
3. Save a selected protein in FASTA format
4. Perform a BLAST search using the protein
5. Analyze and interpret the results

## 📥 Step 1: Fetch E. coli K12 Genome from NCBI

In [7]:
from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.com"  # Replace with your own email; NCBI uses it to contact you about problems
handle = Entrez.efetch(db="nucleotide", id="U00096.3", rettype="gb", retmode="text")  # E. coli K12 MG1655
with open("ecoli_k12.gb", "w") as f:
    f.write(handle.read())
handle.close()
print("Genome saved to ecoli_k12.gb")



## 🧬 Step 2: Parse Genome and Extract Proteins

In [8]:
record = SeqIO.read("ecoli_k12.gb", "genbank")
proteins = []

for feature in record.features:
    if feature.type == "CDS" and "translation" in feature.qualifiers:
        protein = feature.qualifiers["translation"][0]
        if len(protein) > 20:
            proteins.append(protein)

print(f"Extracted {len(proteins)} protein sequences longer than 20 amino acids.")
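The loop above keeps only the amino acid strings, so it is no longer possible to tell which gene each protein came from. A possible variant keeps a few identifying qualifiers next to each sequence. In this sketch the function name `cds_proteins` is hypothetical, `locus_tag`, `gene`, and `product` are standard GenBank CDS qualifiers that may be missing on some features, and the demo features are invented stand-ins.

```python
from types import SimpleNamespace


def cds_proteins(features, min_length=20):
    """Collect translations longer than min_length together with basic annotation."""
    rows = []
    for feature in features:
        qualifiers = feature.qualifiers
        if feature.type != "CDS" or "translation" not in qualifiers:
            continue
        sequence = qualifiers["translation"][0]
        if len(sequence) <= min_length:
            continue
        rows.append({
            "locus_tag": qualifiers.get("locus_tag", ["unknown"])[0],
            "gene": qualifiers.get("gene", [""])[0],
            "product": qualifiers.get("product", [""])[0],
            "sequence": sequence,
        })
    return rows


# Invented features, only to show the expected input shape
demo_features = [
    SimpleNamespace(type="gene", qualifiers={"gene": ["geneA"]}),
    SimpleNamespace(type="CDS", qualifiers={
        "locus_tag": ["x0001"], "gene": ["geneA"],
        "product": ["example protein"], "translation": ["M" + "A" * 30],
    }),
    SimpleNamespace(type="CDS", qualifiers={"translation": ["MAAA"]}),
]
print(len(cds_proteins(demo_features)))
```

With the real genome, the call would be `cds_proteins(record.features)`.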



## 💾 Step 3: Save a Protein to FASTA

In [9]:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

selected_protein = proteins[0]
# Use a separate name so the genome `record` from Step 2 is not overwritten
protein_record = SeqRecord(Seq(selected_protein), id="EcoliK12_protein1", description="E. coli K12 protein")
SeqIO.write(protein_record, "ecoli_protein.fasta", "fasta")
print("Saved first protein to ecoli_protein.fasta")



## 🔍 Step 4: Run BLAST and Parse Results
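We now run `blastp` against the `nr` database through NCBI's web service, save the XML output, and parse it with `SearchIO`. The same parsing code also works on XML produced by a local BLAST run.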

In [10]:
from Bio.Blast import NCBIWWW
from Bio import SearchIO

# Run online BLAST (this can take a while)
with open("ecoli_protein.fasta") as fasta_file:
    result_handle = NCBIWWW.qblast("blastp", "nr", fasta_file.read())

# Save and parse
with open("blast_result_ecoli.xml", "w") as out_handle:
    out_handle.write(result_handle.read())
result_handle.close()

blast_qresult = SearchIO.read("blast_result_ecoli.xml", "blast-xml")

print(f"Query ID: {blast_qresult.id}")
print(f"Number of hits: {len(blast_qresult)}")



## 📊 Step 5: Show Top BLAST Hits

In [12]:
# Show only the five best hits (BLAST lists hits best-first)
for hit in blast_qresult.hits[:5]:
    print(f"Hit ID: {hit.id}\nDescription: {hit.description}\nE-value: {hit.hsps[0].evalue}\n")
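E-values alone can be hard to compare across hits, so it is often useful to tabulate bit score and percent identity as well. The sketch below assumes hit and HSP objects with the `SearchIO` BLAST attributes `evalue`, `bitscore`, `ident_num`, and `aln_span`; the function name `summarize_hits` is hypothetical and the demo objects are invented stand-ins, not real search results.

```python
from types import SimpleNamespace


def summarize_hits(qresult, top_n=5):
    """Summarize the first HSP of the top_n hits as a list of dicts."""
    rows = []
    for hit in list(qresult)[:top_n]:
        hsp = hit.hsps[0]
        rows.append({
            "id": hit.id,
            "evalue": hsp.evalue,
            "bitscore": hsp.bitscore,
            # Identical positions as a percentage of the alignment length
            "identity_pct": round(100.0 * hsp.ident_num / hsp.aln_span, 1),
        })
    return rows


# Invented stand-in hits, only to show the call
demo_hits = [
    SimpleNamespace(id="hit_1", hsps=[SimpleNamespace(
        evalue=1e-50, bitscore=200.0, ident_num=90, aln_span=100)]),
    SimpleNamespace(id="hit_2", hsps=[SimpleNamespace(
        evalue=1e-10, bitscore=60.0, ident_num=45, aln_span=60)]),
]
for row in summarize_hits(demo_hits):
    print(row)
```

With the real search, the call would be `summarize_hits(blast_qresult)`.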

