High-Performance PfamScan

The objective of this pure Python implementation of PfamScan is to parallelize the work of pfam_scan.pl so that a complete proteome can be scanned.

Installation instructions

If Miniconda is not already available in your Linux environment, run its silent installation:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3

Once you have installed Miniconda/Anaconda, clone the repository and create the Python 3 environment.

git clone https://github.com/fpozoc/hp-pfamscan.git
cd hp-pfamscan
conda env create --file environment.yml
conda activate pfamscan

If you prefer not to use Conda, the instructions described here can be followed instead.

Disclaimer: Pfam-B has not been uploaded since version 27. You can either take Pfam-A.hmm from current_release together with Pfam-B.hmm from version 27, or use Pfam-A.hmm alone. More info here.

mkdir -p pfam_db
curl ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz | gunzip > pfam_db/Pfam-A.hmm
curl ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz | gunzip > pfam_db/Pfam-A.hmm.dat
curl ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/active_site.dat.gz | gunzip > pfam_db/active_site.dat
curl ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam27.0/Pfam-B.hmm.gz | gunzip > pfam_db/Pfam-B.hmm
curl ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam27.0/Pfam-B.hmm.dat.gz | gunzip > pfam_db/Pfam-B.hmm.dat

hmmpress pfam_db/Pfam-A.hmm
hmmpress pfam_db/Pfam-B.hmm ### Optional

### Download GRCh38 Gencode v33
mkdir -p genome_annotation/GRCh38/g33/
curl ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/gencode.v33.pc_transcripts.fa.gz > genome_annotation/GRCh38/g33/gencode.v33.pc_transcripts.fa.gz
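
Before scanning, it can help to confirm that the database directory is complete. The following is a minimal sketch, not part of this repository: it assumes the files were downloaded into pfam_db as shown above and pressed with hmmpress, which writes .h3f, .h3i, .h3m and .h3p index files next to each .hmm file. The function name missing_pfam_files is hypothetical.

```python
from pathlib import Path

# Index files that hmmpress writes next to each pressed .hmm file.
HMMPRESS_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_pfam_files(db_dir, include_pfam_b=False):
    """Return the names of expected database files absent from db_dir.

    Hypothetical helper: the expected file list mirrors the download
    commands in this README, not any check done by the tool itself.
    """
    db_dir = Path(db_dir)
    expected = ["Pfam-A.hmm", "Pfam-A.hmm.dat", "active_site.dat"]
    expected += ["Pfam-A.hmm" + suffix for suffix in HMMPRESS_SUFFIXES]
    if include_pfam_b:
        # Pfam-B is optional and only available up to release 27.
        expected += ["Pfam-B.hmm", "Pfam-B.hmm.dat"]
        expected += ["Pfam-B.hmm" + suffix for suffix in HMMPRESS_SUFFIXES]
    return [name for name in expected if not (db_dir / name).is_file()]
```

Calling missing_pfam_files("pfam_db") returns an empty list when every expected file is present; any names it returns point to a download or hmmpress step that still needs to be run.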

Split the multi-FASTA file into several files, each named after its transcript ID. They will be stored in genomes/GRCh38/g33/seqs.
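
The splitting step could look roughly like the sketch below. It is an illustration rather than the code in src: the function name split_multifasta is hypothetical, and it assumes that the transcript ID is the header text up to the first '|' (as in GENCODE headers) and that every header is non-empty.

```python
import gzip
from pathlib import Path


def split_multifasta(fasta_path, out_dir):
    """Write each record of a FASTA file (plain or .gz) to its own file.

    Each output file is named <transcript_id>.fa. The transcript ID is
    assumed to be the header text up to the first '|' or whitespace.
    Returns the list of written paths in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    written = []
    handle_out = None
    with opener(fasta_path, "rt") as handle_in:
        for line in handle_in:
            if line.startswith(">"):
                # A new record starts: close the previous file, open the next.
                if handle_out is not None:
                    handle_out.close()
                seq_id = line[1:].split("|")[0].split()[0]
                path = out_dir / f"{seq_id}.fa"
                handle_out = open(path, "w")
                written.append(path)
            if handle_out is not None:
                # Header and sequence lines are copied unchanged.
                handle_out.write(line)
    if handle_out is not None:
        handle_out.close()
    return written
```

For example, split_multifasta("genome_annotation/GRCh38/g33/gencode.v33.pc_transcripts.fa.gz", "genomes/GRCh38/g33/seqs") would produce one .fa file per transcript in the seqs directory.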

Once the sequences are in place, run src/pfamscan.py to process them locally in batches.

python -m src.run --seqs genome_annotation/GRCh38/g33/gencode.v33.pc_transcripts.fa.gz --outdir out/GRCh38/g33 --pfamdb pfam_db --jobs 10
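
To illustrate the idea behind the --jobs option, here is a minimal sketch of running pfam_scan.pl on many per-transcript files concurrently. It is not the repository's implementation: build_command, scan_all, the runner parameter and the .pfam output suffix are all hypothetical, and it assumes pfam_scan.pl is on the PATH. Only the -fasta, -dir and -outfile options of pfam_scan.pl are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(fasta_file, pfam_db, out_dir):
    """Build the pfam_scan.pl command line for one FASTA file."""
    # Hypothetical naming scheme: <input stem>.pfam inside out_dir.
    out_file = Path(out_dir) / (Path(fasta_file).stem + ".pfam")
    return [
        "pfam_scan.pl",
        "-fasta", str(fasta_file),
        "-dir", str(pfam_db),
        "-outfile", str(out_file),
    ]


def scan_all(fasta_files, pfam_db, out_dir, jobs=10, runner=subprocess.run):
    """Run one pfam_scan.pl process per file, at most `jobs` at a time.

    Threads are sufficient here because the heavy work happens in the
    child processes. Results are returned in the order of fasta_files.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = [build_command(f, pfam_db, out_dir) for f in fasta_files]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(runner, commands))
```

The runner argument defaults to subprocess.run and exists only so that the scheduling logic can be exercised without HMMER or pfam_scan.pl installed, for instance by passing a function that just records each command.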

Links of interest

Some old pfam_scan.pl starting instructions here.
