<a href="https://colab.research.google.com/github/lordchipo/proteinblast/blob/main/UniProtKB_Protein_BLAST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Protein Sequence Similarity Search on UniProtKB/Swiss-Prot
## 1. Install Biopython and import the required modules
- Install Biopython into the Google Colab runtime.
- Import the Bio.Blast.NCBIWWW module, which submits BLAST searches to the NCBI BLAST server over the Internet.
- Import the SeqIO module for reading and writing sequence files, and the SearchIO module for parsing search results.
- Import "files" from the google.colab library to upload the query file and download the results.
- Install and import pandas, which is used to save the search results as a CSV file.

In [None]:
!pip install biopython
!pip install pandas
from Bio.Blast import NCBIWWW
from Bio import SeqIO, SearchIO
from google.colab import files
import pandas as pd

## 2. Upload a protein query sequence file
The query file must be in FASTA format (for example, a .fasta file) and contain exactly one protein sequence.

In [None]:
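# Upload the FASTA file; files.upload() returns a dict of {filename: file contents}
file_upload = files.upload()
query_file = list(file_upload.keys())[0]

# SeqIO.read expects exactly one record and raises ValueError otherwise
query = SeqIO.read(query_file, format="fasta")
print(query.description)
print("Length of the query protein is: " + str(len(query)))
print(query.seq)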
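The FASTA format itself is simple: a header line starting with ">", followed by one or more lines of sequence. The stdlib-only sketch below is purely illustrative (SeqIO already does this parsing for you); it shows the format together with the one-record rule that SeqIO.read enforces. The function names parse_fasta and read_single_fasta and the toy sequence are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def read_single_fasta(text):
    """Mimic the one-record rule of SeqIO.read: fail unless exactly one record."""
    records = parse_fasta(text)
    if len(records) != 1:
        raise ValueError(f"expected exactly one FASTA record, found {len(records)}")
    return records[0]


# Hypothetical single-record query file, with the sequence wrapped over two lines
example = """>toy_protein A made-up example sequence
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQ
"""
header, seq = read_single_fasta(example)
print(header)
print("Length of the query protein is: " + str(len(seq)))
```

A file containing several sequences would make read_single_fasta (like SeqIO.read) raise an error, so multi-sequence files need to be split before upload.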

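## 3. Search UniProtKB/Swiss-Prot for proteins similar to the query
The query is submitted to NCBI BLASTP and searched against NCBI's swissprot database, which is built from UniProtKB/Swiss-Prot sequences. The remote search may take several minutes.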

In [None]:
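# Submit a BLASTP search against NCBI's swissprot database (runs remotely)
result_handle = NCBIWWW.qblast("blastp", "swissprot", str(query.seq))
search_results = SearchIO.read(result_handle, "blast-xml")
result_handle.close()

# Display a summary of the hits
print(search_results)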

*The UniProt accession in each hit ID can be used to look up predicted protein structures in the AlphaFold Protein Structure Database (alphafold.ebi.ac.uk).*
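Going from a hit ID to an AlphaFold entry mostly means stripping the ID down to a bare accession. The sketch below assumes hit IDs are either pipe-delimited (like "sp|P69905.2|HBA_HUMAN") or a bare versioned accession; the exact form depends on the BLAST database release, so check a few printed hit IDs first (for BLAST XML, SearchIO also exposes hit.accession, which may already hold the accession). The entry-page URL pattern and the helper names uniprot_accession and alphafold_entry_url are assumptions, not an official API.

```python
def uniprot_accession(hit_id):
    """Extract a UniProt accession from a BLAST hit ID.

    Assumes IDs look like 'sp|P69905.2|HBA_HUMAN' or a bare 'P69905.2';
    the trailing version ('.2') is dropped because UniProt accessions have none.
    """
    parts = hit_id.split("|")
    # In the pipe-delimited form, the accession is the second field
    acc = parts[1] if len(parts) >= 2 else parts[0]
    return acc.split(".")[0]


def alphafold_entry_url(accession):
    """Build an AlphaFold DB entry-page URL (assumed URL pattern)."""
    return f"https://alphafold.ebi.ac.uk/entry/{accession}"


# Illustrative hit IDs in the two assumed formats
for hit_id in ["sp|P69905.2|HBA_HUMAN", "P69905.2"]:
    acc = uniprot_accession(hit_id)
    print(hit_id, "->", acc, alphafold_entry_url(acc))
```

In the notebook, the same helpers could be applied to each hit, for example `[uniprot_accession(hit.id) for hit in search_results]`.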

## 4. Save the search results as a CSV file

In [None]:
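# One row per hit, using each hit's first HSP (BLAST lists the best-scoring HSP first)
rows = [{"hit_id": h.id, "description": h.description, "evalue": h.hsps[0].evalue,
         "bitscore": h.hsps[0].bitscore, "identity": h.hsps[0].ident_num / h.hsps[0].aln_span}
        for h in search_results]
pd.DataFrame(rows).to_csv("UniProtKB_search_results.csv", index=False)
files.download("UniProtKB_search_results.csv")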
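Swiss-Prot searches often return many weak hits, so a common follow-up is to keep only hits below an E-value cutoff, ranked by bit score, before saving. The stdlib sketch below shows that filtering step on plain row dicts. The column names, the toy rows (made-up values), the helper names filter_hits and rows_to_csv, and the 1e-5 cutoff are all assumptions for illustration, not recommendations from this notebook.

```python
import csv
import io

FIELDS = ["hit_id", "description", "evalue", "bitscore", "identity"]


def filter_hits(rows, max_evalue=1e-5):
    """Keep rows with E-value <= max_evalue, highest bit score first."""
    kept = [r for r in rows if r["evalue"] <= max_evalue]
    return sorted(kept, key=lambda r: r["bitscore"], reverse=True)


def rows_to_csv(rows):
    """Serialise row dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical rows with made-up values (not real BLAST output)
toy_rows = [
    {"hit_id": "hitA", "description": "toy hit A", "evalue": 1e-40, "bitscore": 150.0, "identity": 0.92},
    {"hit_id": "hitB", "description": "toy hit B", "evalue": 0.3, "bitscore": 30.5, "identity": 0.25},
    {"hit_id": "hitC", "description": "toy hit C", "evalue": 2e-12, "bitscore": 70.1, "identity": 0.48},
]
filtered = filter_hits(toy_rows)
print(rows_to_csv(filtered))
```

With pandas already loaded, the equivalent filter on a DataFrame of the same columns is a boolean mask such as `df[df["evalue"] <= 1e-5]`.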

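## (Optional) Display a specific range of search results (e.g. hits 10 to 20)
The bounds are Python slice indices: counting starts at 0 and the upper bound is excluded, so entering 10 and 20 prints the 11th through 20th hits.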

In [None]:
i1 = input("Enter lower bound of range: ")
i2 = input("Enter upper bound of range: ")

print(search_results[int(i1):int(i2)])
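To enter hit numbers as they are usually counted (starting at 1, with both ends included), a small helper can convert them into a slice object. Since SearchIO's QueryResult supports slicing, `search_results[hit_range_to_slice(10, 20)]` should then select hits 10 through 20. The helper name hit_range_to_slice is hypothetical, and a plain list stands in for the search results so the sketch runs offline.

```python
def hit_range_to_slice(first, last):
    """Convert a 1-based, inclusive hit range (e.g. hits 10-20) to a Python slice.

    Hits 10-20 become slice(9, 20): shift the start down by one and keep
    the end as-is, because the slice's upper bound is excluded.
    """
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    return slice(first - 1, last)


hits = [f"hit{n}" for n in range(1, 31)]  # stand-in for a QueryResult with 30 hits
selected = hits[hit_range_to_slice(10, 20)]
print(selected[0], selected[-1], len(selected))
```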