<a href="https://colab.research.google.com/github/lordchipo/proteinblast/blob/main/PDB_Protein_BLAST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Protein Sequence Similarity Search on PDB Using NCBI-BLAST
This notebook uses NCBI-BLAST to search the Protein Data Bank (PDB) for protein structures whose sequences are similar to a query protein sequence.

## 1. Install and import the required packages
- Install Biopython and pandas (pandas is used to save the search results).
- Import the Bio.Blast.NCBIWWW module, which submits searches to the NCBI-BLAST server over the Internet.
- Import the SeqIO and SearchIO modules to read the query sequence and parse the search results.
- Import the "files" module from the google.colab library to upload the query file and download the saved results.

In [None]:
!pip install biopython
!pip install pandas
from Bio.Blast import NCBIWWW
from Bio import SeqIO, SearchIO
from google.colab import files
import pandas as pd

## 2. Upload a protein query sequence file in the FASTA format

In [None]:
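# Open the Colab upload dialog; the result maps each file name to its contents
upload = files.upload()
query_path = list(upload.keys())[0]

# SeqIO.read expects exactly one record in the file
query = SeqIO.read(query_path, format="fasta")
print(query.description)
print("Length of the query protein is: " + str(len(query)))
print(query.seq)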
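
SeqIO.read expects exactly one record and raises a ValueError for an empty or multi-record file. As a minimal sketch, the check below counts FASTA header lines in plain text before parsing. The helper names `count_fasta_records` and `check_single_record` and the example sequences are hypothetical and not part of Biopython.

```python
def count_fasta_records(text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def check_single_record(text):
    """Raise ValueError unless the text holds exactly one FASTA record."""
    n = count_fasta_records(text)
    if n != 1:
        raise ValueError(f"Expected exactly 1 FASTA record, found {n}")


# Hypothetical two-record input, used only for illustration
example = ">seq1 demo\nMKTAYIAKQR\n>seq2 demo\nGSHMLEDP\n"
print(count_fasta_records(example))  # prints 2
```

In this notebook the uploaded contents arrive as bytes, so the check would presumably be called on the decoded text of the uploaded file before SeqIO.read.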

## 3. Search the PDB for target proteins based on sequence similarity to the query

In [None]:
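# Submit the query to the NCBI server (the search runs remotely and may take a while)
result_handle = NCBIWWW.qblast("blastp", "pdb", query.seq)
# qblast returns XML by default, which SearchIO parses as "blast-xml"
search_results = SearchIO.read(result_handle, "blast-xml")
result_handle.close()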

# Display the search results
print(search_results)
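
The printed summary lists every hit, so it can help to narrow the list to the more convincing matches. The sketch below keeps hits whose first reported HSP passes an E-value and a percent-identity cutoff. The function name `filter_hits` and both default thresholds are assumptions chosen for illustration, and the demo objects are stand-ins with made-up values that only mimic the attributes used (`id`, `hsps`, `evalue`, `ident_num`, `aln_span`).

```python
from types import SimpleNamespace


def filter_hits(hits, max_evalue=1e-5, min_identity_pct=30.0):
    """Keep hits whose first HSP passes both thresholds (assumed cutoffs)."""
    kept = []
    for hit in hits:
        if not hit.hsps:
            continue  # nothing to score for this hit
        hsp = hit.hsps[0]
        identity_pct = 100.0 * hsp.ident_num / hsp.aln_span
        if hsp.evalue <= max_evalue and identity_pct >= min_identity_pct:
            kept.append((hit.id, hsp.evalue, round(identity_pct, 1)))
    return kept


# Stand-in objects with made-up values, not real BLAST output
demo_hits = [
    SimpleNamespace(id="demo_A", hsps=[SimpleNamespace(evalue=1e-50, ident_num=90, aln_span=100)]),
    SimpleNamespace(id="demo_B", hsps=[SimpleNamespace(evalue=0.5, ident_num=20, aln_span=80)]),
]
print(filter_hits(demo_hits))  # prints [('demo_A', 1e-50, 90.0)]
```

Because the function only iterates over its argument and reads those attributes, it should accept the parsed results directly, for example `filter_hits(search_results)`.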

## 4. Save the search results in a CSV or Excel file

In [None]:
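# Build one table row per hit from the first HSP reported for that hit
rows = []
for hit in search_results:
  hsp = hit.hsps[0]
  rows.append({"hit_id": hit.id, "description": hit.description,
               "evalue": hsp.evalue, "bitscore": hsp.bitscore,
               "identity_pct": 100 * hsp.ident_num / hsp.aln_span})
df = pd.DataFrame(rows)
save_choice = input("Type 0 to save as CSV or 1 to save as Excel: ").strip()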

if save_choice == "0":
  df.to_csv("BLAST_PDB_search_results.csv", index=False)
  files.download("BLAST_PDB_search_results.csv")
elif save_choice == "1":
  df.to_excel("BLAST_PDB_search_results.xlsx", index=False)
  files.download("BLAST_PDB_search_results.xlsx")
else:
  print("Error: Invalid choice")

## (Optional) Display a specific range of search results (e.g. results #10-20)

In [None]:
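# Result numbers are 1-based and inclusive, as in the heading's example
i1 = int(input("Enter the first result number (1 = top hit): "))
i2 = int(input("Enter the last result number: "))

# Python slices are 0-based and exclude the end index
print(search_results[i1 - 1:i2])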
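
Python slices are 0-based and exclude the upper bound, so results #10-20 counted from 1 correspond to the slice `[9:20]`. A minimal sketch of that conversion with basic bounds checking follows. The helper name `rank_range_to_slice` is hypothetical, and a plain list stands in for the search results.

```python
def rank_range_to_slice(first, last, total):
    """Convert 1-based inclusive ranks to a Python slice, clamped to total."""
    if first < 1 or last < first:
        raise ValueError("Need 1 <= first <= last")
    return slice(first - 1, min(last, total))


# Placeholder list of 50 items standing in for the search results
results = [f"hit{i}" for i in range(1, 51)]
window = results[rank_range_to_slice(10, 20, len(results))]
print(window[0], window[-1], len(window))  # prints hit10 hit20 11
```

The same slice object should work with the parsed results, for example `search_results[rank_range_to_slice(10, 20, len(search_results))]`.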