# Third-party models


In this notebook, we show how to integrate "external" models into pepme.

In [None]:
from functools import partial

import numpy as np

from pepme import compute_metrics, show_table
from pepme.metrics import FBD, ID
from pepme.models import ThirdPartyModel

Some models of interest are not distributed through channels such as PyPI or Hugging Face; only a git repository may be available. Here we show how to run such models in pepme.


An external model is compatible with pepme if it satisfies the following three requirements:

- The repository is accessible using `git clone`, e.g., a public repository.
- The repository's dependencies are installable using `pip install .`, e.g., via `setup.py` or `pyproject.toml`.
- The repository contains a function with signature `Callable[[list[str], ...], np.ndarray]` whose first parameter is called `sequences`.
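
To illustrate the third requirement, a compatible entry point might look like the sketch below. The function name `embed`, its `batch_size` parameter, and the composition-based embedding are hypothetical and not taken from any real model repository; only the `sequences` first parameter and the `np.ndarray` return type come from the requirements above.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def embed(sequences: list[str], batch_size: int = 32) -> np.ndarray:
    """Hypothetical entry point: one amino-acid composition vector per sequence."""
    rows = []
    # Walk over the input in chunks, as a real model might do to bound memory use.
    for start in range(0, len(sequences), batch_size):
        for seq in sequences[start : start + batch_size]:
            counts = [seq.count(aa) for aa in AMINO_ACIDS]
            # Normalize counts to fractions; guard against empty sequences.
            rows.append(np.array(counts, dtype=float) / max(len(seq), 1))
    return np.array(rows).reshape(len(sequences), len(AMINO_ACIDS))


embeddings = embed(["MKQW", "RKSPL"], batch_size=1)
print(embeddings.shape)  # (2, 20)
```

Extra parameters such as `batch_size` are allowed after `sequences`, as long as `sequences` stays first.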


Let's use a toy model in a GitHub repository that satisfies all three requirements. To do so, we need to define the function entry point, the repository URL, and the save directory.


In [None]:
thirdparty_model = ThirdPartyModel(
    entry_point="pepmem.model:embed",
    repo_url="https://github.com/RasmusML/pepme-models",
    save_dir="../plugins/pepme-models/main",
    # Path to an environment's Python executable.
    # If None, a venv is created using the current Python executable.
    python_bin=None,
    branch="main",  # Defaults: whole repository
)

Collecting pip
  Using cached pip-25.1.1-py3-none-any.whl.metadata (3.6 kB)
Using cached pip-25.1.1-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.0
    Uninstalling pip-24.0:
      Successfully uninstalled pip-24.0
Successfully installed pip-25.1.1


Cloning into '../plugins/pepme-models/main/repo'...


Obtaining file:///Users/rasmus.larsen/work/hackathon-2025/pepme/docs/plugins/pepme-models/main/repo
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Collecting numpy (from pepmem==0.0.1)
  Using cached numpy-2.3.1-cp311-cp311-macosx_14_0_arm64.whl.metadata (62 kB)
Using cached numpy-2.3.1-cp311-cp311-macosx_14_0_arm64.whl (5.4 MB)
Building wheels for collected packages: pepmem
  Building editable for pepmem (pyproject.toml): started
  Building editable for pepmem (pyproject.toml): finished with status 'done'
  Created wheel for pepmem: filename=pe

`ThirdPartyModel` clones the model repository, creates a virtual environment (venv) if `python_bin=None`, and installs the dependencies using `pip install .`.
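
The three steps can be sketched as a list of shell commands. This is only an illustration of the sequence described above, not pepme's actual implementation: the helper `setup_commands` is hypothetical, the `venv` subdirectory is an assumed location, and the `repo` subdirectory is inferred from the "Cloning into" log line. The sketch builds the commands without running them.

```python
from pathlib import Path


def setup_commands(repo_url, save_dir, branch=None, python_bin=None):
    """Return the commands for an assumed clone / venv / install sequence."""
    repo_dir = Path(save_dir) / "repo"  # inferred from the "Cloning into" log line
    venv_dir = Path(save_dir) / "venv"  # assumed location, not confirmed by pepme

    clone = ["git", "clone"]
    if branch is not None:
        clone += ["--branch", branch]
    clone += [repo_url, str(repo_dir)]

    commands = [clone]
    if python_bin is None:
        # No interpreter given: create a venv and use its interpreter afterwards.
        commands.append(["python", "-m", "venv", str(venv_dir)])
        python_bin = str(venv_dir / "bin" / "python")  # POSIX layout
    # Install the cloned repository with the chosen interpreter's pip.
    commands.append([python_bin, "-m", "pip", "install", str(repo_dir)])
    return commands


for command in setup_commands(
    "https://github.com/RasmusML/pepme-models",
    "../plugins/pepme-models/main",
    branch="main",
):
    print(" ".join(command))
```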

Assuming everything went well, let's now compute a metric using this embedding model.


In [None]:
def embedder(seq: list[str]) -> np.ndarray:
    return thirdparty_model(seq, batch_size=32)


embedder(["MKQW", "RKSPL"])

array([[44.,  8., 52., 32.],
       [55., 10., 65., 40.]])
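
The `embedder` wrapper fixes `batch_size` so that the result can be called with sequences alone. `functools.partial`, which this notebook already imports, is an equivalent way to pre-bind keyword arguments. The sketch below demonstrates the equivalence on `toy_model`, a hypothetical stand-in that needs no cloned repository.

```python
from functools import partial

import numpy as np


def toy_model(sequences: list[str], batch_size: int = 8) -> np.ndarray:
    """Hypothetical stand-in for a model call; batch_size is accepted but unused."""
    return np.array([[float(len(s))] for s in sequences])


def wrapped(seq: list[str]) -> np.ndarray:
    # Explicit wrapper, in the style of `embedder`.
    return toy_model(seq, batch_size=32)


# Same behavior, with the keyword argument bound up front.
bound = partial(toy_model, batch_size=32)

same = np.array_equal(wrapped(["MKQW", "RKSPL"]), bound(["MKQW", "RKSPL"]))
print(same)  # True
```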

In [None]:
sequences = {
    "HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
    "hyformer": ["MKQW", "RKSPL"],
    "Random": ["KKKKK", "PLQ", "RKSPL"],
}

metrics = [FBD(reference=sequences["Random"], embedder=embedder)]
df = compute_metrics(sequences, metrics)

show_table(df)

  covmean, err = sqrtm(sigma1.dot(sigma2), disp=False)
100%|██████████| 3/3 [00:00<00:00, 11.80it/s, data=Random, metric=FBD]  


| | FBD↓ |
|---|---|
| HydrAMP | 215.6 |
| hyformer | 81.67 |
| Random | 0.0 |
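
The reference set scores 0.0 against itself. Assuming FBD follows the standard Fréchet distance between Gaussians fitted to the two embedding sets (an assumption; pepme's implementation may differ in details), this can be sketched in plain NumPy. The function `frechet_distance` is a hypothetical helper, not part of pepme.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets (rows = samples)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr((cov_x cov_y)^(1/2)) is the sum of square roots of the eigenvalues of cov_x @ cov_y.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
shifted = reference + np.array([1.0, 2.0, 2.0])  # same covariance, shifted mean

d_self = frechet_distance(reference, reference)
d_shift = frechet_distance(reference, shifted)
```

With identical covariances the covariance terms cancel, so `d_self` is zero and `d_shift` equals the squared length of the shift, up to floating-point error.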


## AMPlify


Next, let's use AMPlify, an antimicrobial peptide (AMP) classifier, i.e., a model that outputs the probability that a peptide has antimicrobial properties.

First, you need an environment with Python 3.9 (a requirement of AMPlify), e.g., created using conda:

In [None]:
# !conda create -n amplify_env python=3.9 -y
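
Before passing an interpreter path as `python_bin`, it can help to confirm which Python version that path actually runs. The sketch below uses only the standard library; `python_version` is a hypothetical helper, and it is demonstrated on the current interpreter because the conda environment path depends on the local installation.

```python
import subprocess
import sys


def python_version(python_bin: str) -> tuple[int, int]:
    """Ask the interpreter at `python_bin` for its (major, minor) version."""
    result = subprocess.run(
        [python_bin, "-c", "import sys; print(*sys.version_info[:2])"],
        capture_output=True,
        text=True,
        check=True,  # raise if the interpreter cannot be run
    )
    major, minor = result.stdout.split()
    return int(major), int(minor)


# Demonstrated on the running interpreter; substitute the environment's python path.
current = python_version(sys.executable)
print(current)
```

For AMPlify, the result for the chosen interpreter should be `(3, 9)`.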

Let's set up the model.

In [None]:
thirdparty_model = ThirdPartyModel(
    entry_point="predict:predict",
    repo_url="https://github.com/RasmusML/pepme-models",
    save_dir="../plugins/pepme-models/amplify",
    # Set path to the python3.9 executable. If your default python version is 3.9,
    # then 'python_bin=None' will work out of the box.
    python_bin="/opt/anaconda3/envs/amplify_env/bin/python",
    branch="amplify",
)

Cloning into '../plugins/pepme-models/amplify/repo'...


Obtaining file:///Users/rasmus.larsen/work/hackathon-2025/pepme/docs/plugins/pepme-models/amplify/repo
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: amplify
  Building editable for amplify (pyproject.toml): started
  Building editable for amplify (pyproject.toml): finished with status 'done'
  Created wheel for amplify: filename=amplify-0.0.1-0.editable-py3-none-any.whl size=16947 sha256=7773082d0577eddc54b40e4ad1eb82872c137ed32872f220428d78d343fd0bf2
  Stored in directory: /private/var/folders/dj/lbjr_33

Assuming everything went well, let's now compute a metric using this predictive model.


In [None]:
thirdparty_model(["MKQW", "RKSPL"], model_type="imbalanced", batch_size=128, n_ensembles=2)

array([0.00635906, 0.49806994], dtype=float32)

In [None]:
sequences = {
    "HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
    "hyformer": ["MKQW", "RKSPL"],
    "Random": ["KKKKK", "PLQ", "RKSPL"],
}

discriminator = partial(thirdparty_model, model_type="balanced", n_ensembles=5, batch_size=128)

metrics = [ID(predictor=discriminator, name="p_AMP (AMPlify)", objective="maximize")]
df = compute_metrics(sequences, metrics)

show_table(df)

100%|██████████| 3/3 [00:15<00:00,  5.09s/it, data=Random, metric=p_AMP (AMPlify)]  


| | p_AMP (AMPlify)↑ |
|---|---|
| HydrAMP | 0.26±0.17 |
| hyformer | 0.24±0.24 |
| Random | 0.40±0.30 |


## amPEPpy


Finally, let's use amPEPpy, another antimicrobial peptide (AMP) classifier that outputs the probability that a peptide has antimicrobial properties.

First, you need an environment with Python 3.8, e.g., created using conda:

In [None]:
# !conda create -n ampep_env python=3.8 -y

Let's set up the model.

In [None]:
thirdparty_model = ThirdPartyModel(
    entry_point="amPEPpy.predict:predict",
    repo_url="https://github.com/RasmusML/pepme-models",
    save_dir="../plugins/pepme-models/ampep",
    python_bin="/opt/anaconda3/envs/ampep_env/bin/python",
    branch="ampeppy",
)

Assuming everything went well, let's now compute a metric using this predictive model.


In [None]:
thirdparty_model(["MKQW", "RKSPL"])

array([0.49427083, 0.28333333])

In [None]:
sequences = {
    "HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
    "hyformer": ["MKQW", "RKSPL"],
    "Random": ["KKKKK", "PLQ", "RKSPL"],
}

discriminator = thirdparty_model

metrics = [ID(predictor=discriminator, name="p_AMP (amPEPpy)", objective="maximize")]
df = compute_metrics(sequences, metrics)

show_table(df)

100%|██████████| 3/3 [00:03<00:00,  1.27s/it, data=Random, metric=p_AMP (amPEPpy)]  


| | p_AMP (amPEPpy)↑ |
|---|---|
| HydrAMP | 0.41±0.13 |
| hyformer | 0.39±0.11 |
| Random | 0.39±0.12 |
