Remove local.json in favor of system variables (#19)
* remove local.json

* refactor to use third party binaries as system variables

* add THIRDPARTY.md

* update dockerfile

* update tests
rvhonorato committed Apr 18, 2024
1 parent 1df641c commit 9dcc312
Showing 13 changed files with 252 additions and 205 deletions.
65 changes: 37 additions & 28 deletions Dockerfile
@@ -1,56 +1,65 @@
#==============================================================================================
FROM python:3.11 as base
FROM python:3.11 AS base

LABEL author="Rodrigo V. Honorato <r.vargashonorato@uu.nl>"

ARG SOFTWARE_PATH=/opt/software

#------------------------------------------------------------------------------------------
# System dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
libboost-all-dev \
&& \
apt-get clean && rm -rf /var/lib/apt/lists/*

# Copy Whiscy
WORKDIR /opt/software/whiscy
COPY . .

# install BioPython
RUN pip install biopython==1.79

# Build protdist
WORKDIR /opt/software/whiscy/bin/protdist
RUN sh compile.sh

#------------------------------------------------------------------------------------------
# Build Muscle
WORKDIR /opt/software/whiscy/muscle3.8.1551
WORKDIR ${SOFTWARE_PATH}/muscle3.8.1551
RUN curl https://drive5.com/muscle/muscle_src_3.8.1551.tar.gz | tar xzv && \
make && \
mv muscle /opt/software/whiscy/bin/muscle3.8.1551 && \
sed -i "s/\/Users\/bjimenez\/bin\/muscle\/muscle3.8.31_i86darwin64/\/opt\/software\/whiscy\/bin\/muscle3.8.1551/g" /opt/software/whiscy/etc/local.json
make
ENV MUSCLE_BIN=${SOFTWARE_PATH}/muscle3.8.1551/muscle

#------------------------------------------------------------------------------------------
# Build hsspconv
WORKDIR ${SOFTWARE_PATH}
RUN wget https://github.com/cmbi/hssp/archive/3.1.5.tar.gz && \
tar -zxvf 3.1.5.tar.gz && \
cd hssp-3.1.5 && \
./autogen.sh && \
./configure && \
make hsspconv && \
mv hsspconv ../ && \
sed -i "s/\/Users\/bjimenez\/bin\/hssp\/hsspconv/\/opt\/software\/whiscy\/bin\/hsspconv/g" /opt/software/whiscy/etc/local.json

make hsspconv
ENV HSSPCONV_BIN=${SOFTWARE_PATH}/hssp-3.1.5/hsspconv

#------------------------------------------------------------------------------------------
# Build freesasa
WORKDIR ${SOFTWARE_PATH}
RUN wget https://github.com/mittinatten/freesasa/releases/download/2.0.3/freesasa-2.0.3.tar.gz && \
tar -zxvf freesasa-2.0.3.tar.gz && \
cd freesasa-2.0.3 && \
./configure --disable-json --prefix=/opt/software/whiscy/bin/freesasa && \
make && make install

# WHISCY exports
ENV WHISCY_PATH=/opt/software/whiscy
ENV PYTHONPATH="${PYTHONPATH}:${WHISCY_PATH}"
ENV WHISCY_BIN="${WHISCY_PATH}/whiscy.py"
ENV PATH="${WHISCY_PATH}:${WHISCY_PATH}/bin/freesasa/bin:${PATH}"
./configure --disable-json --prefix=`pwd` && \
make && \
make install
ENV FREESASA_BIN=${SOFTWARE_PATH}/freesasa-2.0.3/bin/freesasa

#------------------------------------------------------------------------------------------
# Install Whiscy
WORKDIR ${SOFTWARE_PATH}/whiscy
COPY . .

#------------------------------------------------------------------------------------------
# install BioPython
RUN pip install biopython==1.79

#------------------------------------------------------------------------------------------
# Build protdist
WORKDIR ${SOFTWARE_PATH}/whiscy/bin/protdist
RUN sh compile.sh
ENV PROTDIST_BIN=${SOFTWARE_PATH}/whiscy/bin/protdist/protdist

#------------------------------------------------------------------------------------------
# Set data directory

WORKDIR /data

71 changes: 71 additions & 0 deletions THIRDPARTY.md
@@ -0,0 +1,71 @@
# Installation of Third-Party dependencies

These instructions assume you are installing locally on an Ubuntu Linux system; different steps may be required for other systems.

All dependencies will be installed in a `software` directory inside your `$HOME` directory.

## System dependencies

```bash
$ sudo apt-get update && \
sudo apt-get install -y build-essential libboost-all-dev
```

## Muscle

```bash
$ mkdir -p $HOME/software && cd $HOME/software
$ mkdir muscle3.8.1551 && cd muscle3.8.1551
$ wget https://drive5.com/muscle/muscle_src_3.8.1551.tar.gz
$ tar -zxf muscle_src_3.8.1551.tar.gz && rm muscle_src_3.8.1551.tar.gz
$ make
$ export MUSCLE_BIN=$HOME/software/muscle3.8.1551/muscle
```

## Freesasa
```bash
$ mkdir -p $HOME/software && cd $HOME/software
$ wget https://github.com/mittinatten/freesasa/releases/download/2.0.3/freesasa-2.0.3.tar.gz
$ tar -zxf freesasa-2.0.3.tar.gz && rm freesasa-2.0.3.tar.gz
$ cd freesasa-2.0.3
$ ./configure --disable-json --prefix=`pwd`
$ make
$ make install
$ export FREESASA_BIN=$HOME/software/freesasa-2.0.3/bin/freesasa
```

## HSSPCONV
```bash
$ mkdir -p $HOME/software && cd $HOME/software
$ wget https://github.com/cmbi/hssp/archive/3.1.5.tar.gz
$ tar -zxf 3.1.5.tar.gz && rm 3.1.5.tar.gz
$ cd hssp-3.1.5
$ ./autogen.sh
$ ./configure
$ make hsspconv
$ export HSSPCONV_BIN=$HOME/software/hssp-3.1.5/hsspconv
```

## Protdist

Protdist is distributed together with WHISCY; you can find it in the `bin/protdist` directory of the `whiscy` repository. We are working on a better way to install this dependency 🙂

```bash
$ mkdir -p $HOME/software && cd $HOME/software
$ git clone https://github.com/haddocking/whiscy
$ mv whiscy/bin/protdist . && rm -rf whiscy
$ cd protdist
$ bash compile.sh
$ export PROTDIST_BIN=$HOME/software/protdist/protdist
```

---

In the end the system variables that define the third-party dependencies should look like this:

```bash
export MUSCLE_BIN=$HOME/software/muscle3.8.1551/muscle
export FREESASA_BIN=$HOME/software/freesasa-2.0.3/bin/freesasa
export HSSPCONV_BIN=$HOME/software/hssp-3.1.5/hsspconv
export PROTDIST_BIN=$HOME/software/protdist/protdist
```
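After exporting the variables, it can help to confirm that each one points at a real executable before running WHISCY. The following is a minimal sketch, not part of WHISCY: `check_thirdparty` is a hypothetical helper, and it only assumes the four variable names listed above.

```python
import os
from pathlib import Path

# Variable names taken from the export lines above.
REQUIRED_VARS = ("MUSCLE_BIN", "FREESASA_BIN", "HSSPCONV_BIN", "PROTDIST_BIN")


def check_thirdparty(env=None):
    """Return {variable name: problem description}; empty means all look fine.

    Hypothetical helper for illustration only, not part of WHISCY.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_file():
            problems[name] = f"{value} is not a file"
        elif not os.access(value, os.X_OK):
            problems[name] = f"{value} is not executable"
    return problems


if __name__ == "__main__":
    # Print one line per problem; prints nothing when everything is in place.
    for name, problem in check_thirdparty().items():
        print(f"{name}: {problem}")
```

Running the script in the same shell where the `export` lines were executed should print nothing if all four builds succeeded.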
19 changes: 0 additions & 19 deletions etc/local.json

This file was deleted.

38 changes: 38 additions & 0 deletions libwhiscy/__init__.py
@@ -0,0 +1,38 @@
import os
from pathlib import Path

MUSCLE_BIN = os.environ.get("MUSCLE_BIN")
FREESASA_BIN = os.environ.get("FREESASA_BIN")
HSSPCONV_BIN = os.environ.get("HSSPCONV_BIN")
PROTDIST_BIN = os.environ.get("PROTDIST_BIN")

# Make sure none of them are None
if MUSCLE_BIN is None:
raise ValueError("MUSCLE_BIN not found in system variables")

if FREESASA_BIN is None:
raise ValueError("FREESASA_BIN not found in system variables")

if HSSPCONV_BIN is None:
raise ValueError("HSSPCONV_BIN not found in system variables")

if PROTDIST_BIN is None:
raise ValueError("PROTDIST_BIN not found in system variables")

PARAM_PATH = Path(Path(__file__).parent.parent, "param")


CUTOFF = {
"sa_pred_cutoff": 15.0,
"sa_act_cutoff": 40.0,
"air_cutoff": 0.18,
"air_dist_cutoff": 6.5,
}

AIR = {
"air_pro_percentage": 10.0,
"air_wm_pro_or": 98.52,
"air_wm_whis_or": 0.370515,
"air_wm_pro_and": 55.42,
"air_wm_whis_and": 0.106667,
}
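The four `if ... is None` checks in `libwhiscy/__init__.py` stop at the first missing variable, so a user with several unset variables has to fix them one at a time. As a possible alternative (a sketch only, not what the commit does), the same check could be written as a loop that reports every missing name at once. `load_binaries` and `REQUIRED_VARS` are hypothetical names introduced here for illustration.

```python
import os

# Same variable names as in libwhiscy/__init__.py above.
REQUIRED_VARS = ("MUSCLE_BIN", "FREESASA_BIN", "HSSPCONV_BIN", "PROTDIST_BIN")


def load_binaries(env=None):
    """Return {name: path} for all required variables.

    Raises ValueError naming every variable that is missing, rather than
    only the first one. Hypothetical helper, not part of WHISCY.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if env.get(name) is None]
    if missing:
        raise ValueError(f"{', '.join(missing)} not found in system variables")
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing `env` explicitly keeps the function easy to test without touching the real environment; calling `load_binaries()` with no argument reads `os.environ`.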
7 changes: 4 additions & 3 deletions libwhiscy/access.py
@@ -1,16 +1,17 @@
import os
import subprocess

from libwhiscy import FREESASA_BIN


def calculate_accessibility(pdb_file_name, output_file_name):
"""Calculates the SASA using freesasa.
Uses the command line interface and not the Python bindings to be able to get
a RSA NACCESS-format like file.
"""
cmd = "freesasa {} -n 20 --format=rsa --radii=naccess -o {}".format(
pdb_file_name, output_file_name
)
cmd = f"{FREESASA_BIN} {pdb_file_name} -n 20 --format=rsa --radii=naccess -o {output_file_name}"

try:
subprocess.run(cmd, shell=True)
except:
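The new `cmd` string is still passed through the shell (`shell=True`), so a file name containing spaces or shell metacharacters would be split or interpreted. A possible variant, shown here only as a sketch and not as the commit's implementation, passes the same flags as an argument list. `build_freesasa_command` and `run_freesasa` are hypothetical names; the flags are copied from the f-string above.

```python
import subprocess


def build_freesasa_command(freesasa_bin, pdb_file_name, output_file_name):
    """Build the freesasa call as an argument list (same flags as the f-string)."""
    return [
        str(freesasa_bin),
        str(pdb_file_name),
        "-n", "20",
        "--format=rsa",
        "--radii=naccess",
        "-o", str(output_file_name),
    ]


def run_freesasa(freesasa_bin, pdb_file_name, output_file_name):
    """Run freesasa without a shell.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit status instead of failing silently.
    """
    cmd = build_freesasa_command(freesasa_bin, pdb_file_name, output_file_name)
    subprocess.run(cmd, check=True)
```

With an argument list, each file name reaches freesasa as a single argument regardless of the characters it contains.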
48 changes: 29 additions & 19 deletions libwhiscy/pdbutil.py
@@ -1,4 +1,3 @@

"""Util functions involving structure and PDB files"""

import os
@@ -13,21 +12,26 @@

class NotAlternative(Select):
"""Removes alternative AAs"""

def accept_residue(self, residue):
return (is_aa(residue) and residue.id[2] == ' ')
return is_aa(residue) and residue.id[2] == " "


_hydrogen = re.compile("[123 ]*H.*")


def is_hydrogen(atom):
"""Checks if atom is a hydrogen"""
name = atom.get_id()
name = atom.get_id()
return _hydrogen.match(name)


def download_pdb_structure(pdb_code, pdb_file_name, file_path='.'):
def download_pdb_structure(pdb_code, pdb_file_name, file_path="."):
"""Downloads a PDB structure from the Protein Data Bank"""
pdbl = PDBList()
file_name = pdbl.retrieve_pdb_file(pdb_code, file_format='pdb', pdir=file_path, overwrite=True)
file_name = pdbl.retrieve_pdb_file(
pdb_code, file_format="pdb", pdir=file_path, overwrite=True
)
if os.path.exists(file_name):
os.rename(file_name, pdb_file_name)
else:
@@ -44,7 +48,7 @@ def get_pdb_sequence(input_pdb_file, chain_id, mapping_output=False, with_gaps=F
residues = list(chain)
for res in residues:
# Remove alternative location residues
if "CA" in res.child_dict and is_aa(res) and res.id[2] == ' ':
if "CA" in res.child_dict and is_aa(res) and res.id[2] == " ":
try:
mapping[res.id[1]] = three_to_one(res.get_resname())
except KeyError:
@@ -57,35 +61,41 @@ def get_pdb_sequence(input_pdb_file, chain_id, mapping_output=False, with_gaps=F
start, end = res_numbers[0], res_numbers[-1]
missing = sorted(set(range(start, end + 1)).difference(res_numbers))
for m in missing:
mapping[m] = '-'
mapping[m] = "-"

if mapping_output:
return mapping
else:
return ''.join([mapping[k] for k in sorted(mapping.keys())])
return "".join([mapping[k] for k in sorted(mapping.keys())])


def map_protein_to_sequence_alignment(pdb_file, chain_id, sequence, phylip_file, output_file_name):
def map_protein_to_sequence_alignment(
pdb_file, chain_id, sequence, phylip_file, output_file_name
):
"""Creates a dictionary .conv file mapping protein residue numbering to alignment"""
mapping = get_pdb_sequence(pdb_file, chain_id, mapping_output=True)
# Check if sequence is the same
pdb_seq = ''.join([mapping[k] for k in sorted(mapping.keys())])
pdb_seq = "".join([mapping[k] for k in sorted(mapping.keys())])
if pdb_seq != sequence:
raise SystemExit("ERROR: PDB sequence does not match sequence alignment")

# Account for gaps in phylipseq file
#alignment = list(AlignIO.parse(phylip_file, format='phylip-sequential'))[0]
#master_phylip = alignment[0].seq
#if str(master_phylip.ungap('-')) != pdb_seq:
# alignment = list(AlignIO.parse(phylip_file, format='phylip-sequential'))[0]
# master_phylip = alignment[0].seq
# if str(master_phylip.ungap('-')) != pdb_seq:
# raise SystemExit("ERROR: PDB sequence doest not match sequence alignment in phylip file")

with open(output_file_name, 'w') as output_handle:
output_handle.write("# Conversion table from {} and chain {} to sequence{}".format(pdb_file,
chain_id,
os.linesep))
with open(output_file_name, "w") as output_handle:
output_handle.write(
"# Conversion table from {} and chain {} to sequence{}".format(
pdb_file, chain_id, os.linesep
)
)
seq_res_id = 1
for pdb_res_id in sorted(mapping.keys()):
# Do not map gaps if any
if mapping[pdb_res_id] != '-':
output_handle.write("{0} {1}{2}".format(pdb_res_id, seq_res_id, os.linesep))
if mapping[pdb_res_id] != "-":
output_handle.write(
"{0} {1}{2}".format(pdb_res_id, seq_res_id, os.linesep)
)
seq_res_id += 1
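The `_hydrogen` pattern used by `is_hydrogen` in `libwhiscy/pdbutil.py` is applied with `re.match`, which anchors at the start of the string. The short sketch below uses the same pattern on a few example atom names to show what it accepts; the names and the `is_hydrogen_name` wrapper are illustrative assumptions, not taken from the repository.

```python
import re

# Same pattern as `_hydrogen` in pdbutil.py.
_hydrogen = re.compile("[123 ]*H.*")


def is_hydrogen_name(name):
    """Return True if the atom name matches the hydrogen pattern.

    Hypothetical wrapper that takes a plain string instead of an atom object.
    """
    return _hydrogen.match(name) is not None


# Optional leading digits/spaces, then an "H", then anything.
for name in ("H", "HA", "1HB", " HG ", "CA", "OH", "NH1"):
    print(f"{name!r:8} {is_hydrogen_name(name)}")
```

Names such as `OH` or `NH1` are rejected because the `H` is not at the start (after optional `1`, `2`, `3` or space characters). Note that any name beginning with `H` is accepted, so the pattern alone cannot distinguish a hydrogen from another atom whose name happens to start with that letter.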
18 changes: 0 additions & 18 deletions tests/__init__.py
@@ -2,24 +2,6 @@

GOLDEN_DATA_PATH = Path(Path(__file__).parent, "golden_data")

WHISCY_PATH = Path(__file__).parent.parent

WHISCY_BIN = Path(WHISCY_PATH, "whiscy.py")

FREESASA_PATH = Path(WHISCY_PATH, "bin", "freesasa", "bin")
FREESASA_EXEC = Path(FREESASA_PATH, "freesasa")

# If `FREESASA_EXEC` is found, add it to the PATH
if FREESASA_EXEC.exists():
import os

os.environ["PATH"] += os.pathsep + str(FREESASA_PATH)
else:
# Fail the tests, we need it to run the tests
raise FileNotFoundError(f"{FREESASA_EXEC} not found")

MUSCLE_BIN = Path(WHISCY_PATH, "bin", "muscle3.8.1551")


PAM_EXPECTED_SCORES = [
0.001239810431396115,