<a href="https://colab.research.google.com/github/marlanaswann/spr5-kdm1a-conservation-/blob/main/notebooks/01_RBH_spr5_kdm1a.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Set Up NCBI BLAST+

In [1]:
# Install BLAST+ (-y answers apt's confirmation prompt so the cell runs unattended)
!apt-get install -y ncbi-blast+

# Check version
!blastp -version



Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  ncbi-data
The following NEW packages will be installed:
  ncbi-blast+ ncbi-data
0 upgraded, 2 newly installed, 0 to remove and 41 not upgraded.
Need to get 15.8 MB of archives.
After this operation, 71.8 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 ncbi-data all 6.1.20170106+dfsg1-9 [3,519 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 ncbi-blast+ amd64 2.12.0+ds-3build1 [12.3 MB]
Fetched 15.8 MB in 0s (34.7 MB/s)
Selecting previously unselected package ncbi-data.
(Reading database ... 121713 files and directories currently installed.)
Preparing to unpack .../ncbi-data_6.1.20170106+dfsg1-9_all.deb ...
Unpacking ncbi-data (6.1.20170106+dfsg1-9) ...
Selecting previously unselected package ncbi-blast+.
Preparing to unpack .../ncbi-blast+_2.12.0+ds-3build1_amd64.deb .

## Upload Your Files

In [2]:
from google.colab import files
uploaded = files.upload()

# Verify that the uploaded files exist
!ls -lh

Saving celegansproteome.fasta.gz to celegansproteome.fasta.gz
Saving humanproteome.fasta.gz to humanproteome.fasta.gz
Saving KDM1Afin.fasta to KDM1Afin.fasta
Saving SPR5fin.fasta to SPR5fin.fasta
total 31M
-rw-r--r-- 1 root root 7.1M Nov 20 18:40 celegansproteome.fasta.gz
-rw-r--r-- 1 root root  24M Nov 20 18:40 humanproteome.fasta.gz
-rw-r--r-- 1 root root  972 Nov 20 18:40 KDM1Afin.fasta
drwxr-xr-x 1 root root 4.0K Nov 17 14:29 sample_data
-rw-r--r-- 1 root root  897 Nov 20 18:40 SPR5fin.fasta


## Decompress Proteome Files & Verify Format

In [6]:
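# gunzip deletes each .gz archive after decompressing it, so rerunning this cell
# prints "No such file or directory"; that is harmless if the .fasta files exist.
!gunzip humanproteome.fasta.gz
!gunzip celegansproteome.fasta.gz

# Preview the first five lines of each proteome to confirm FASTA format
!head -n 5 humanproteome.fasta
!head -n 5 celegansproteome.fasta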


gzip: humanproteome.fasta.gz: No such file or directory
gzip: celegansproteome.fasta.gz: No such file or directory
>tr|A0A087WVL8|A0A087WVL8_HUMAN Fragile X messenger ribonucleoprotein 1 OS=Homo sapiens OX=9606 GN=FMR1 PE=1 SV=1
MEELVVEVRGSNGAFYKAFVKDVHEDSITVAFENNWQPDRQIPFHDVRFPPPVGYNKDIN
ESDEVEVYSRANEKEPCCWWLAKVRMIKGEFYVIEYAACDATYNEIVTIERLRSVNPNKP
ATKDTFHKIKLDVPEDLRQMCAKEAAHKDFKKAVGAFSVTYDPENYQLVILSINEVTSKR
AHMLIDMHFRSLRTKLSLIMRNEEASKQLESSRQLASRFHEQFIVREDLMGLAIGTHGAN
>sp|A0A061ACU2|PIEZ1_CAEEL Piezo-type mechanosensitive ion channel component 1 OS=Caenorhabditis elegans OX=6239 GN=pezo-1 PE=1 SV=1
MTVPPLLKSCVVKLLLPAALLAAAIIRPSFLSIGYVLLALVSAVLPPIRKSLALPKLVGT
FVIITFLFCLAVALGVGSYQISEQVVHKNDRTYICNRSDTTLFRSIGLVRFHPTGTFEST
RAFLPEIIATSAALLTIIIVMFLSHRDEQLDVVGDVVTVRSESGREQRRQRKLAAIMWSA
IGNSLRRLTNFVLFLFTAYVGIVKPSLSNSIYFLAFLFISTWWSTYTPLRHGVYNQIKKF
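
The `head` preview only shows the first record of each file. As a fuller check, the sketch below counts the records in a FASTA stream and flags headers that have no sequence lines. The function name `summarize_fasta` and the demo records are illustrative assumptions, not part of the original notebook.

```python
import io

def summarize_fasta(handle):
    """Return (record_count, ids_of_records_without_sequence) for a FASTA stream."""
    n_records = 0
    empty = []
    current = None      # ID of the record being read, or None before the first header
    has_seq = False     # whether the current record has at least one sequence line
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None and not has_seq:
                empty.append(current)
            parts = line[1:].split()
            current = parts[0] if parts else ""
            n_records += 1
            has_seq = False
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            has_seq = True
    if current is not None and not has_seq:
        empty.append(current)
    return n_records, empty

# Made-up records for demonstration only
demo = io.StringIO(
    ">sp|A|A_TEST demo\nMEEL\nVVEV\n"
    ">sp|B|B_TEST empty\n"
    ">sp|C|C_TEST demo\nMTVP\n"
)
print(summarize_fasta(demo))  # (3, ['sp|B|B_TEST'])
```

On the real data, something like `with open("humanproteome.fasta") as fh: summarize_fasta(fh)` should give a record count that can be compared against the "added N sequences" line that `makeblastdb` prints.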


## Create BLAST databases

In [8]:
# For human proteome / KDM1A
!makeblastdb -in humanproteome.fasta -dbtype prot -parse_seqids -out human_db

# For C. elegans proteome / SPR-5
!makeblastdb -in celegansproteome.fasta -dbtype prot -parse_seqids -out celegans_db

# Verify that the database files were created
!ls -lh human_db*




Building a new DB, current time: 11/20/2025 18:42:17
New DB name:   /content/human_db
New DB title:  humanproteome.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 105719 sequences in 2.94265 seconds.




Building a new DB, current time: 11/20/2025 18:42:21
New DB name:   /content/celegans_db
New DB title:  celegansproteome.fasta
Sequence type: Protein
Deleted existing Protein BLAST database named /content/celegans_db
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 26695 sequences in 0.768257 seconds.


-rw-r--r-- 1 root root 2.3M Nov 20 18:42 human_db.pdb
-rw-r--r-- 1 root root  17M Nov 20 18:42 human_db.phr
-rw-r--r-- 1 root root 827K Nov 20 18:42 human_db.pin
-rw-r--r-- 1 root root 413K Nov 20 18:42 human_db.pog
-rw-r--r-- 1 root root 1.7M Nov 20 18:42 human_db.pos
-rw-r--r-- 1 root root 1.3M Nov 20 18:42 human_db.pot
-rw-r--r-- 1 root root  43M Nov 20 18:42 human_db.psq
-rw-r--r-- 1 root r
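
The `ls` call above lists only `human_db*`. A minimal sketch for checking both databases from Python follows. It assumes single-volume protein databases whose core files end in `.phr`, `.pin`, and `.psq` (as in the listing above); the helper name `missing_db_files` is hypothetical, and the demo runs in a throwaway directory rather than `/content`.

```python
import tempfile
from pathlib import Path

# Core files of a single-volume protein BLAST database (assumption: no volume splitting)
CORE_EXTENSIONS = (".phr", ".pin", ".psq")

def missing_db_files(prefix, directory="."):
    """Return the core database files that are absent for the given prefix."""
    base = Path(directory)
    return [prefix + ext for ext in CORE_EXTENSIONS
            if not (base / (prefix + ext)).is_file()]

# Demo: create two of the three core files and report the missing one
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".phr", ".pin"):
        (Path(tmp) / ("demo_db" + ext)).touch()
    print(missing_db_files("demo_db", tmp))  # ['demo_db.psq']
```

In the notebook, calling `missing_db_files("human_db")` and `missing_db_files("celegans_db")` should each return an empty list if both builds succeeded.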

## Run Forward BLAST (SPR-5 -> Human Proteome)

In [9]:
!blastp \
  -query SPR5fin.fasta \
  -db human_db \
  -out SPR5_vs_human.tbl \
  -outfmt "6 qseqid sseqid pident length evalue bitscore stitle" \
  -max_target_seqs 5
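
The `-outfmt` string above writes one tab-separated line per hit, with the seven fields in the order listed. As a sketch of how that table could be read back and the top hit picked by bit score, the block below uses the standard `csv` module. The helper names `read_hits` and `best_hit` and the demo rows are made up for illustration; they are not real BLAST results.

```python
import csv
import io

# Field order taken from the -outfmt string used in the blastp call
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "stitle"]

def read_hits(handle):
    """Parse tab-separated BLAST hits into a list of dicts with numeric fields converted."""
    hits = []
    # QUOTE_NONE keeps any quote characters inside stitle as literal text
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits

def best_hit(hits):
    """Return the hit with the highest bit score, or None if there are no hits."""
    return max(hits, key=lambda h: h["bitscore"]) if hits else None

# Made-up rows for demonstration only
demo = io.StringIO(
    "queryA\tsubj1\t31.5\t700\t2e-90\t290\tmade-up protein one\n"
    "queryA\tsubj2\t28.0\t650\t1e-40\t150\tmade-up protein two\n"
)
top = best_hit(read_hits(demo))
print(top["sseqid"], top["bitscore"])  # subj1 290.0
```

After the cell has run, `with open("SPR5_vs_human.tbl") as fh: best_hit(read_hits(fh))` would return the strongest human hit for SPR-5, which is the candidate to carry into a reverse search.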




Run