# Third-party models


In [None]:
from typing import Literal

import numpy as np

from pepme.core import compute_metrics, show_table
from pepme.metrics.fid import FID
from pepme.metrics.id import ID
from pepme.third_party import ThirdPartyModel

Some models of interest are not distributed through package hubs such as PyPI or Hugging Face; often only a git repository is available. Here we show how to run such models in pepme.


An external model is compatible with pepme if it satisfies the following three requirements:

- The repository is accessible with `git clone`, e.g., it is public.
- The repository's dependencies are installable with `pip install .`, e.g., via a `setup.py` or `pyproject.toml`.
- It contains a function that returns an `np.ndarray` and whose first parameter, named `sequences`, takes a `list[str]`; further parameters (such as `batch_size`) may follow.
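To make the three requirements concrete, here is a hypothetical entry point that a repository could expose (say in a module `mypkg/model.py`, referenced as `mypkg.model:embed`): a toy amino-acid-composition embedder, plus a helper that checks the third requirement with `inspect`. Both `embed` and `is_compatible_entry_point` are illustrative names, not part of pepme or of the toy repository used below.

```python
import inspect

import numpy as np

# The 20 standard amino acids, one embedding dimension each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequences: list[str], batch_size: int = 32) -> np.ndarray:
    """Toy entry point: per-sequence amino-acid counts, shape (n, 20)."""
    # batch_size is accepted only to mirror the keyword used in this notebook;
    # this toy embedder does not need batching.
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    out = np.zeros((len(sequences), len(AMINO_ACIDS)))
    for row, seq in enumerate(sequences):
        for aa in seq:
            if aa in index:  # silently skip non-standard residues
                out[row, index[aa]] += 1
    return out


def is_compatible_entry_point(fn) -> bool:
    """Check requirement 3: the first parameter must be named `sequences`."""
    params = list(inspect.signature(fn).parameters)
    return bool(params) and params[0] == "sequences"


print(embed(["MKQW", "RKSPL"]).shape)  # (2, 20)
print(is_compatible_entry_point(embed))  # True
```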


Let's use a toy model from a GitHub repository that satisfies all three requirements. To do so, we define the function entry point (in `module:function` form), the repository URL and the save directory.
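An entry point such as `pepmem.model:embed` names a module and a function inside it, separated by a colon. How pepme resolves it internally is not shown here; the sketch below, with a hypothetical `resolve_entry_point` helper, only illustrates how such a string can be turned into a callable with `importlib`, demonstrated on a standard-library function.

```python
import importlib
from typing import Callable


def resolve_entry_point(spec: str) -> Callable:
    """Resolve a 'package.module:function' string to the function object."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Allow dotted attributes after the colon, e.g. 'pkg.mod:Class.method'.
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


dumps = resolve_entry_point("json:dumps")
print(dumps([1, 2]))  # [1, 2]
```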


In [None]:
thirdparty_model = ThirdPartyModel(
    entry_point="pepmem.model:embed",
    repo_url="git+https://github.com/RasmusML/pepme-models",
    save_dir="../plugins/pepme-models/main",
    # Path to an environment's Python executable.
    # If None, a venv is created using the current Python executable.
    python_bin=None,
    # Uncomment to use a branch other than 'main'.
    # branch="embed-1",
)

`ThirdPartyModel` clones the model repository into `save_dir`, creates a virtual environment (venv) if `python_bin=None`, and installs the dependencies with `pip install .`.
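The three setup steps roughly correspond to the commands below. This is a hypothetical reconstruction: the helper `setup_commands`, the `.venv` location inside the clone and the stripping of the pip-style `git+` prefix are assumptions, not pepme's actual internals. It only builds the command lists, so it can be inspected without touching the network.

```python
import os
import sys
from typing import Optional


def setup_commands(
    repo_url: str, save_dir: str, python_bin: Optional[str] = None, branch: str = "main"
) -> list[list[str]]:
    """Return the commands for: clone -> (optional) venv -> pip install."""
    # Assumption: a pip-style 'git+https://...' URL becomes a plain git URL.
    git_url = repo_url.removeprefix("git+")
    commands = [["git", "clone", "--branch", branch, git_url, save_dir]]
    if python_bin is None:
        # Hypothetical venv location inside the clone; POSIX layout (bin/python).
        venv_dir = os.path.join(save_dir, ".venv")
        commands.append([sys.executable, "-m", "venv", venv_dir])
        python_bin = os.path.join(venv_dir, "bin", "python")
    # Same effect as running `pip install .` from inside the cloned repository.
    commands.append([python_bin, "-m", "pip", "install", save_dir])
    return commands


for cmd in setup_commands(
    "git+https://github.com/RasmusML/pepme-models",
    "../plugins/pepme-models/main",
):
    print(" ".join(cmd))
```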

Assuming everything went well, let's now compute a metric using this embedding model.


In [None]:
def embedder(seq: list[str]) -> np.ndarray:
    return thirdparty_model(seq, batch_size=32)


embedder(["MKQW", "RKSPL"])

array([[44.,  8., 52., 32.],
       [55., 10., 65., 40.]])

In [None]:
sequences = {
    "HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
    "hyformer": ["MKQW", "RKSPL"],
    "Random": ["KKKKK", "PLQ", "RKSPL"],
}

metrics = [FID(reference=sequences["Random"], embedder=embedder)]
df = compute_metrics(sequences, metrics)

show_table(df)

100%|██████████| 3/3 [00:00<00:00, 14.07it/s, data=Random, metric=FID]  


| Data | FID↓ |
|---|---|
| HydrAMP | 215.6 |
| hyformer | 81.67 |
| Random | 0.0 |
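FID presumably follows the standard Fréchet distance between Gaussians fitted to the embeddings of a set and of the reference, $\lVert\mu_x-\mu_y\rVert^2 + \operatorname{Tr}\big(\Sigma_x+\Sigma_y-2(\Sigma_x\Sigma_y)^{1/2}\big)$. The NumPy-only sketch below (the name `frechet_distance` is ours, and pepme's implementation may differ, e.g., in how it handles tiny or rank-deficient sets) shows why the Random row is 0.0: it is compared against itself as the reference.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two (n, d) embedding sets."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^(1/2)) equals the trace of the square root of the
    # symmetric PSD matrix sqrt(cov_x) @ cov_y @ sqrt(cov_x) (same eigenvalues).
    sqrt_cov_x = _sqrtm_psd(cov_x)
    middle = sqrt_cov_x @ cov_y @ sqrt_cov_x
    eigvals = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    tr_covmean = np.sqrt(eigvals).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
print(frechet_distance(x, x))        # close to 0.0: a set versus itself
print(frechet_distance(x, x + 3.0))  # close to 36.0: 4 dims, each shifted by 3
```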


## AMPlify


Let's also use AMPlify, an antimicrobial peptide (AMP) classifier.


In [None]:
thirdparty_model = ThirdPartyModel(
    entry_point="predict:predict",
    repo_url="git+https://github.com/RasmusML/pepme-models",
    save_dir="../plugins/pepme-models/amplify",
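    # Reuse an existing environment's Python instead of creating a venv.
    python_bin="/opt/anaconda3/envs/amplify/bin/python",
    # The AMPlify wrapper lives on the 'amplify' branch.
    branch="amplify",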
)
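Because `python_bin` here points at a pre-existing interpreter (a conda environment), it can be worth checking the path before constructing `ThirdPartyModel`. A minimal sketch with a hypothetical `check_python_bin` helper, using only `os` and `subprocess`:

```python
import os
import subprocess
import sys


def check_python_bin(python_bin: str) -> str:
    """Return the interpreter's version string; raise if it is unusable."""
    if not (os.path.isfile(python_bin) and os.access(python_bin, os.X_OK)):
        raise FileNotFoundError(f"not an executable file: {python_bin}")
    result = subprocess.run(
        [python_bin, "--version"], capture_output=True, text=True, check=True
    )
    # Python 3.4+ prints "Python 3.x.y" to stdout; older versions used stderr.
    return result.stdout.strip() or result.stderr.strip()


print(check_python_bin(sys.executable))
```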

Assuming everything went well, let's now compute a metric using this classifier.


In [None]:
def discriminator(
    seq: list[str],
    model_type: Literal["balanced", "imbalanced"] = "balanced",
    n_ensembles: int = 5,
) -> np.ndarray:
    return thirdparty_model(seq, model_type=model_type, n_ensembles=n_ensembles)


discriminator(["MKQW", "RKSPL"], model_type="imbalanced", n_ensembles=1)

array([0.01271812, 0.7690338 ], dtype=float32)
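The two wrappers return different shapes: the embedder gives one row per sequence (shape `(2, 4)` above), while the classifier gives one probability per sequence (shape `(2,)`). A hypothetical helper `check_output` encoding this contract, inferred from the outputs in this notebook rather than from pepme's documentation, can catch a misconfigured entry point early:

```python
import numpy as np


def check_output(arr, n_sequences: int, kind: str) -> np.ndarray:
    """Validate a model output: 'embedder' -> (n, d), 'predictor' -> (n,)."""
    arr = np.asarray(arr)
    expected_ndim = {"embedder": 2, "predictor": 1}[kind]
    if arr.ndim != expected_ndim or arr.shape[0] != n_sequences:
        raise ValueError(
            f"{kind} output has shape {arr.shape}, expected "
            f"{expected_ndim}-D with {n_sequences} rows"
        )
    return arr


# The outputs printed above pass the check.
check_output([[44.0, 8.0, 52.0, 32.0], [55.0, 10.0, 65.0, 40.0]], 2, "embedder")
check_output([0.01271812, 0.7690338], 2, "predictor")
```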

In [None]:
sequences = {
    "HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
    "hyformer": ["MKQW", "RKSPL"],
    "Random": ["KKKKK", "PLQ", "RKSPL"],
}

metrics = [ID(predictor=discriminator, name="p_AMP (AMPlify)", objective="maximize")]
df = compute_metrics(sequences, metrics)

show_table(df)

100%|██████████| 3/3 [00:14<00:00,  4.97s/it, data=Random, metric=p_AMP (AMPlify)]  


| Data | p_AMP (AMPlify)↑ |
|---|---|
| HydrAMP | 0.26±0.17 |
| hyformer | 0.24±0.24 |
| Random | 0.40±0.30 |
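The `±` entries suggest that the metric aggregates the per-sequence classifier scores into a mean and a spread. Assuming the spread is a population standard deviation (the table does not say which estimator pepme uses), the aggregation can be sketched with a hypothetical `summarize_scores` helper:

```python
import numpy as np


def summarize_scores(scores) -> str:
    """Format per-sequence scores as 'mean±std' with two decimals."""
    scores = np.asarray(scores, dtype=float)
    # Assumption: population std (ddof=0); pepme may use another estimator.
    return f"{scores.mean():.2f}±{scores.std():.2f}"


print(summarize_scores([0.2, 0.4]))  # 0.30±0.10
```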
