This notebook shows how to use METL models through Hugging Face (🤗) to predict on more sequences than the demo allows.

The example in this notebook uses a pretrained METL model to predict GB1 binding affinity.

First, we install a library not included in Colab, download an example PDB file, and import the modules needed to load a METL model through 🤗 and predict with it.

In [1]:
# @title Installing libraries not included with colab
!pip install -q biopandas==0.5.1

In [2]:
# @title Download the example pdb file
!wget -O 2qmt_p.pdb https://raw.githubusercontent.com/gitter-lab/metl-pretrained/main/pdbs/2qmt_p.pdb

--2025-05-05 20:03:28--  https://raw.githubusercontent.com/gitter-lab/metl-pretrained/main/pdbs/2qmt_p.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 78764 (77K) [text/plain]
Saving to: ‘2qmt_p.pdb’


2025-05-05 20:03:28 (3.38 MB/s) - ‘2qmt_p.pdb’ saved [78764/78764]



In [3]:
# @title Importing required libraries
from transformers import AutoModel, AutoConfig, logging
import ipywidgets as widgets
from IPython.display import clear_output, HTML, display
import pandas as pd
import torch
import io
import json
import biopandas

logging.set_verbosity_error()

# Declared here so that these are defined regardless of whether later cells are run
variant_file = None
pdb_file_path = '2qmt_p.pdb'

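def to_zero_based(variants):
    """Convert 1-based mutation strings (e.g. "T18P") to 0-based ("T17P").

    `variants` is an iterable of lines, each a JSON list of comma-separated
    mutations. Returns a list of lists of converted variant strings.
    """
    zero_based = []
    for line in variants:
        line_as_json = json.loads(line)
        new_variants = []
        for variant in line_as_json:
            new_variant = []
            # Tolerate spaces after commas, e.g. "T18P, T55F"
            mutations = [m.strip() for m in variant.split(',')]
            for mutation in mutations:
                # Format: wild-type residue, residue number, mutant residue
                residue_zero_based = int(mutation[1:-1]) - 1
                new_variant.append(f"{mutation[0]}{residue_zero_based}{mutation[-1]}")
            new_variants.append(",".join(new_variant))
        zero_based.append(new_variants)

    return zero_based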

Next, we load the METL wrapper through the 🤗 API. `trust_remote_code=True` is required because the METL model code is hosted in the 🤗 model repository rather than shipped with the transformers library.

In [4]:
# @title Loading METL from 🤗
metl = AutoModel.from_pretrained('gitter-lab/METL', trust_remote_code=True)

After the `metl` wrapper above is initialized, a specific METL model still has to be loaded into it. Use the dropdown below to select the model to predict with.

The publicly available METL models are hosted on [Zenodo](https://zenodo.org/doi/10.5281/zenodo.11051644). The [metl-pretrained](https://github.com/gitter-lab/metl-pretrained#available-models) repo provides a table describing the available models.

In [5]:
# @title Available METL models
# @markdown Use this dropdown to choose a METL model. Running the cell will load the selected model by UUID.
dropdown_selection = "Source / Global / 1D — [METL-G-20M-1D] (D72M9aEp)"  # @param ["Source / Global / 1D — [METL-G-20M-1D] (D72M9aEp)", "Source / Global / 3D — [METL-G-20M-3D] (Nr9zCKpR)", "Source / Global / 1D — [METL-G-50M-1D] (auKdzzwX)", "Source / Global / 3D — [METL-G-50M-3D] (6PSAzdfv)", "Source / Local / GFP / 1D — [METL-L-2M-1D-GFP] (8gMPQJy4)", "Source / Local / GFP / 3D — [METL-L-2M-3D-GFP] (Hr4GNHws)", "Source / Local / DLG4 / 1D — [METL-L-2M-1D-DLG4_2022] (8iFoiYw2)", "Source / Local / DLG4 / 3D — [METL-L-2M-3D-DLG4_2022] (kt5DdWTa)", "Source / Local / GB1 / 1D — [METL-L-2M-1D-GB1] (DMfkjVzT)", "Source / Local / GB1 / 3D — [METL-L-2M-3D-GB1] (epegcFiH)", "Source / Local / GRB2 / 1D — [METL-L-2M-1D-GRB2] (kS3rUS7h)", "Source / Local / GRB2 / 3D — [METL-L-2M-3D-GRB2] (X7w83g6S)", "Source / Local / Pab1 / 1D — [METL-L-2M-1D-Pab1] (UKebCQGz)", "Source / Local / Pab1 / 3D — [METL-L-2M-3D-Pab1] (2rr8V4th)", "Source / Local / PTEN / 1D — [METL-L-2M-1D-PTEN] (CEMSx7ZC)", "Source / Local / PTEN / 3D — [METL-L-2M-3D-PTEN] (PjxR5LW7)", "Source / Local / TEM-1 / 1D — [METL-L-2M-1D-TEM-1] (PREhfC22)", "Source / Local / TEM-1 / 3D — [METL-L-2M-3D-TEM-1] (9ASvszux)", "Source / Local / Ube4b / 1D — [METL-L-2M-1D-Ube4b] (HscFFkAb)", "Source / Local / Ube4b / 3D — [METL-L-2M-3D-Ube4b] (H48oiNZN)", "METL-BIND / Source / Local / GB1 / 3D — [METL-BIND-2M-3D-GB1-STANDARD] (K6mw24Rg)", "METL-BIND / Source / Local / GB1 / 3D — [METL-BIND-2M-3D-GB1-BINDING] (Bo5wn2SG)", "Finetuned / Global / GFP / 1D — (PeT2D92j)", "Finetuned / Global / GFP / 3D — (6JBzHpkQ)", "Finetuned / Global / DLG4-Abundance / 1D — (4Rh3WCbG)", "Finetuned / Global / DLG4-Abundance / 3D — (RBtqxzvu)", "Finetuned / Global / DLG4-Binding / 1D — (4xbuC5y7)", "Finetuned / Global / DLG4-Binding / 3D — (BuvxgE2x)", "Finetuned / Global / GB1 / 1D — (dAndZfJ4)", "Finetuned / Global / GB1 / 3D — (9vSB3DRM)", "Finetuned / Global / GRB2-Abundance / 1D — (HenDpDWe)", "Finetuned / Global / GRB2-Abundance / 3D — (dDoCCvfr)", "Finetuned / Global / GRB2-Binding / 1D — (cvnycE5Q)", "Finetuned / Global / GRB2-Binding / 3D — (jYesS9Ki)", "Finetuned / Global / Pab1 / 1D — (ho54gxzv)", "Finetuned / Global / Pab1 / 3D — (jhbL2FeB)", "Finetuned / Global / PTEN-Abundance / 1D — (UEuMtmfx)", "Finetuned / Global / PTEN-Abundance / 3D — (eJPPQYEW)", "Finetuned / Global / PTEN-Activity / 1D — (U3X8mSeT)", "Finetuned / Global / PTEN-Activity / 3D — (4gqYnW6V)", "Finetuned / Global / TEM-1 / 1D — (ELL4GGQq)", "Finetuned / Global / TEM-1 / 3D — (K6BjsWXm)", "Finetuned / Global / Ube4b / 1D — (BAWw23vW)", "Finetuned / Global / Ube4b / 3D — (G9piq6WH)", "Finetuned / Local / GFP / 1D — (HaUuRwfE)", "Finetuned / Local / GFP / 3D — (LWEY95Yb)", "Finetuned / Local / DLG4-Abundance / 1D — (RMFA6dnX)", "Finetuned / Local / DLG4-Abundance / 3D — (V3uTtXVe)", "Finetuned / Local / DLG4-Binding / 1D — (YdzBYWHs)", "Finetuned / Local / DLG4-Binding / 3D — (iu6ZahPw)", "Finetuned / Local / GB1 / 1D — (Pgcseywk)", "Finetuned / Local / GB1 / 3D — (UvMMdsq4)", "Finetuned / Local / GRB2-Abundance / 1D — (VNpi9Zjt)", "Finetuned / Local / GRB2-Abundance / 3D — (PqBMjXkA)", "Finetuned / Local / GRB2-Binding / 1D — (Z59BhUaE)", "Finetuned / Local / GRB2-Binding / 3D — (VwcRN6UB)", "Finetuned / Local / Pab1 / 1D — (TdjCzoQQ)", "Finetuned / Local / Pab1 / 3D — (5SjoLx3y)", "Finetuned / Local / PTEN-Abundance / 1D — (oUScGeHo)", "Finetuned / Local / PTEN-Abundance / 3D — (DhuasDEr)", "Finetuned / Local / PTEN-Activity / 1D — (m9UsG7dq)", "Finetuned / Local / PTEN-Activity / 3D — (8Vi7ENcC)", "Finetuned / Local / TEM-1 / 1D — (64ncFxBR)", "Finetuned / Local / TEM-1 / 3D — (PncvgiJU)", "Finetuned / Local / Ube4b / 1D — (e9uhhnAv)", "Finetuned / Local / Ube4b / 3D — (NfbZL7jK)", "GFP DESIGN / Finetuned / Local / GFP / 1D — (YoQkzoLD)", "GFP DESIGN / Finetuned / Local / GFP / 3D — (PEkeRuxb)"]

uuid_lookup = {
    "Source / Global / 1D — [METL-G-20M-1D] (D72M9aEp)": "D72M9aEp",
    "Source / Global / 3D — [METL-G-20M-3D] (Nr9zCKpR)": "Nr9zCKpR",
    "Source / Global / 1D — [METL-G-50M-1D] (auKdzzwX)": "auKdzzwX",
    "Source / Global / 3D — [METL-G-50M-3D] (6PSAzdfv)": "6PSAzdfv",
    "Source / Local / GFP / 1D — [METL-L-2M-1D-GFP] (8gMPQJy4)": "8gMPQJy4",
    "Source / Local / GFP / 3D — [METL-L-2M-3D-GFP] (Hr4GNHws)": "Hr4GNHws",
    "Source / Local / DLG4 / 1D — [METL-L-2M-1D-DLG4_2022] (8iFoiYw2)": "8iFoiYw2",
    "Source / Local / DLG4 / 3D — [METL-L-2M-3D-DLG4_2022] (kt5DdWTa)": "kt5DdWTa",
    "Source / Local / GB1 / 1D — [METL-L-2M-1D-GB1] (DMfkjVzT)": "DMfkjVzT",
    "Source / Local / GB1 / 3D — [METL-L-2M-3D-GB1] (epegcFiH)": "epegcFiH",
    "Source / Local / GRB2 / 1D — [METL-L-2M-1D-GRB2] (kS3rUS7h)": "kS3rUS7h",
    "Source / Local / GRB2 / 3D — [METL-L-2M-3D-GRB2] (X7w83g6S)": "X7w83g6S",
    "Source / Local / Pab1 / 1D — [METL-L-2M-1D-Pab1] (UKebCQGz)": "UKebCQGz",
    "Source / Local / Pab1 / 3D — [METL-L-2M-3D-Pab1] (2rr8V4th)": "2rr8V4th",
    "Source / Local / PTEN / 1D — [METL-L-2M-1D-PTEN] (CEMSx7ZC)": "CEMSx7ZC",
    "Source / Local / PTEN / 3D — [METL-L-2M-3D-PTEN] (PjxR5LW7)": "PjxR5LW7",
    "Source / Local / TEM-1 / 1D — [METL-L-2M-1D-TEM-1] (PREhfC22)": "PREhfC22",
    "Source / Local / TEM-1 / 3D — [METL-L-2M-3D-TEM-1] (9ASvszux)": "9ASvszux",
    "Source / Local / Ube4b / 1D — [METL-L-2M-1D-Ube4b] (HscFFkAb)": "HscFFkAb",
    "Source / Local / Ube4b / 3D — [METL-L-2M-3D-Ube4b] (H48oiNZN)": "H48oiNZN",
    "METL-BIND / Source / Local / GB1 / 3D — [METL-BIND-2M-3D-GB1-STANDARD] (K6mw24Rg)": "K6mw24Rg",
    "METL-BIND / Source / Local / GB1 / 3D — [METL-BIND-2M-3D-GB1-BINDING] (Bo5wn2SG)": "Bo5wn2SG",
    "Finetuned / Global / GFP / 1D — (PeT2D92j)": "PeT2D92j",
    "Finetuned / Global / GFP / 3D — (6JBzHpkQ)": "6JBzHpkQ",
    "Finetuned / Global / DLG4-Abundance / 1D — (4Rh3WCbG)": "4Rh3WCbG",
    "Finetuned / Global / DLG4-Abundance / 3D — (RBtqxzvu)": "RBtqxzvu",
    "Finetuned / Global / DLG4-Binding / 1D — (4xbuC5y7)": "4xbuC5y7",
    "Finetuned / Global / DLG4-Binding / 3D — (BuvxgE2x)": "BuvxgE2x",
    "Finetuned / Global / GB1 / 1D — (dAndZfJ4)": "dAndZfJ4",
    "Finetuned / Global / GB1 / 3D — (9vSB3DRM)": "9vSB3DRM",
    "Finetuned / Global / GRB2-Abundance / 1D — (HenDpDWe)": "HenDpDWe",
    "Finetuned / Global / GRB2-Abundance / 3D — (dDoCCvfr)": "dDoCCvfr",
    "Finetuned / Global / GRB2-Binding / 1D — (cvnycE5Q)": "cvnycE5Q",
    "Finetuned / Global / GRB2-Binding / 3D — (jYesS9Ki)": "jYesS9Ki",
    "Finetuned / Global / Pab1 / 1D — (ho54gxzv)": "ho54gxzv",
    "Finetuned / Global / Pab1 / 3D — (jhbL2FeB)": "jhbL2FeB",
    "Finetuned / Global / PTEN-Abundance / 1D — (UEuMtmfx)": "UEuMtmfx",
    "Finetuned / Global / PTEN-Abundance / 3D — (eJPPQYEW)": "eJPPQYEW",
    "Finetuned / Global / PTEN-Activity / 1D — (U3X8mSeT)": "U3X8mSeT",
    "Finetuned / Global / PTEN-Activity / 3D — (4gqYnW6V)": "4gqYnW6V",
    "Finetuned / Global / TEM-1 / 1D — (ELL4GGQq)": "ELL4GGQq",
    "Finetuned / Global / TEM-1 / 3D — (K6BjsWXm)": "K6BjsWXm",
    "Finetuned / Global / Ube4b / 1D — (BAWw23vW)": "BAWw23vW",
    "Finetuned / Global / Ube4b / 3D — (G9piq6WH)": "G9piq6WH",
    "Finetuned / Local / GFP / 1D — (HaUuRwfE)": "HaUuRwfE",
    "Finetuned / Local / GFP / 3D — (LWEY95Yb)": "LWEY95Yb",
    "Finetuned / Local / DLG4-Abundance / 1D — (RMFA6dnX)": "RMFA6dnX",
    "Finetuned / Local / DLG4-Abundance / 3D — (V3uTtXVe)": "V3uTtXVe",
    "Finetuned / Local / DLG4-Binding / 1D — (YdzBYWHs)": "YdzBYWHs",
    "Finetuned / Local / DLG4-Binding / 3D — (iu6ZahPw)": "iu6ZahPw",
    "Finetuned / Local / GB1 / 1D — (Pgcseywk)": "Pgcseywk",
    "Finetuned / Local / GB1 / 3D — (UvMMdsq4)": "UvMMdsq4",
    "Finetuned / Local / GRB2-Abundance / 1D — (VNpi9Zjt)": "VNpi9Zjt",
    "Finetuned / Local / GRB2-Abundance / 3D — (PqBMjXkA)": "PqBMjXkA",
    "Finetuned / Local / GRB2-Binding / 1D — (Z59BhUaE)": "Z59BhUaE",
    "Finetuned / Local / GRB2-Binding / 3D — (VwcRN6UB)": "VwcRN6UB",
    "Finetuned / Local / Pab1 / 1D — (TdjCzoQQ)": "TdjCzoQQ",
    "Finetuned / Local / Pab1 / 3D — (5SjoLx3y)": "5SjoLx3y",
    "Finetuned / Local / PTEN-Abundance / 1D — (oUScGeHo)": "oUScGeHo",
    "Finetuned / Local / PTEN-Abundance / 3D — (DhuasDEr)": "DhuasDEr",
    "Finetuned / Local / PTEN-Activity / 1D — (m9UsG7dq)": "m9UsG7dq",
    "Finetuned / Local / PTEN-Activity / 3D — (8Vi7ENcC)": "8Vi7ENcC",
    "Finetuned / Local / TEM-1 / 1D — (64ncFxBR)": "64ncFxBR",
    "Finetuned / Local / TEM-1 / 3D — (PncvgiJU)": "PncvgiJU",
    "Finetuned / Local / Ube4b / 1D — (e9uhhnAv)": "e9uhhnAv",
    "Finetuned / Local / Ube4b / 3D — (NfbZL7jK)": "NfbZL7jK",
    "GFP DESIGN / Finetuned / Local / GFP / 1D — (YoQkzoLD)": "YoQkzoLD",
    "GFP DESIGN / Finetuned / Local / GFP / 3D — (PEkeRuxb)": "PEkeRuxb"
}

uuid = uuid_lookup[dropdown_selection]
metl.load_from_uuid(uuid)
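The `uuid_lookup` dictionary repeats information already contained in the labels: each label ends with the model's UUID in parentheses. As a minimal sketch, assuming every label keeps that trailing `(...)` suffix, a hypothetical helper `uuid_from_label` could parse the UUID directly, so the dropdown list and the dictionary cannot drift out of sync.

```python
import re

# Matches a trailing "(XXXXXXXX)" token at the end of a dropdown label.
_UUID_AT_END = re.compile(r"\(([A-Za-z0-9]+)\)\s*$")


def uuid_from_label(label: str) -> str:
    """Return the UUID found in the trailing parentheses of a dropdown label."""
    match = _UUID_AT_END.search(label)
    if match is None:
        raise ValueError(f"No trailing (UUID) found in label: {label!r}")
    return match.group(1)
```

In the dropdown cell, `uuid = uuid_from_label(dropdown_selection)` would then replace the dictionary lookup.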


Depending on the model chosen, different inputs might be needed. The inputs below are set up for GB1, for example the METL-L-2M-3D-GB1 model (UUID epegcFiH, selectable in the dropdown above; note that the dropdown defaults to METL-G-20M-1D).

Specifically, this 3D GB1 model needs:
- a wild-type sequence
- a PDB structure file (because this is a 3D model)
- variants to predict on

In [6]:
# @title Protein wild type sequence
# @markdown Enter the wild-type sequence of your protein here. The GB1 wild type is provided for use with the default example.
wildtype = 'MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE' # @param {type:"string", placeholder:"Enter a wildtype here"}
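Because the wild type is pasted as free text, a quick sanity check before predicting can catch stray whitespace or typos. The sketch below is a hypothetical helper, `check_wildtype`, and it assumes the model expects only the 20 standard one-letter amino acid codes; the METL encoder's actual alphabet may differ.

```python
# The 20 standard amino acids as one-letter codes (an assumption about the
# alphabet the model accepts).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def check_wildtype(seq: str) -> str:
    """Normalize a pasted wild-type sequence and reject non-standard codes."""
    seq = "".join(seq.split()).upper()  # drop spaces/newlines, uppercase
    if not seq:
        raise ValueError("Wild-type sequence is empty")
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"Non-standard residue codes in wild type: {bad}")
    return seq


gb1 = check_wildtype("MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE")
print(len(gb1))  # 56
```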

In [7]:
# @title PDB file upload
# @markdown If your model needs a PDB file, run this cell and upload the file with the button that appears below.
# @markdown
# @markdown To change the file, simply upload another one; the most recently uploaded file is used.
# @markdown The GB1 structure ([2qmt_p.pdb](https://github.com/gitter-lab/metl-pretrained/blob/main/pdbs/2qmt_p.pdb)) was already downloaded above and is used by default, so no upload is needed for the GB1 example.

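def update_pdb_file(change):
  # Called on upload; change['new'] maps each file name to its upload data
  global pdb_file_path
  for name, data in change['new'].items():
    clear_output()
    display(pdb_upload)
    print(f"Selected file: {name}")
    pdb_file_path = f'./{name}'

    # Save the uploaded bytes to disk so the model can read the PDB by path
    with open(name, 'wb') as f:
      f.write(data['content'])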

pdb_upload = widgets.FileUpload(
    accept='.pdb',
    multiple=False
)
pdb_upload.observe(update_pdb_file, names='value')
pdb_upload
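The upload callbacks in this notebook iterate over `change['new'].items()`, which matches the ipywidgets 7 `FileUpload` format (a dict mapping each file name to a dict with a `content` entry). ipywidgets 8 changed `value` to a tuple of dicts with `name` and `content` keys. If the notebook is run in an environment with ipywidgets 8, a small adapter could be used in both callbacks; the helper `iter_uploads` below is a hypothetical sketch based on those two formats.

```python
def iter_uploads(value):
    """Yield (filename, bytes) pairs from a FileUpload `value`.

    ipywidgets 7: {filename: {"metadata": ..., "content": bytes}}
    ipywidgets 8: ({"name": filename, "content": memoryview, ...}, ...)
    """
    if isinstance(value, dict):  # ipywidgets 7.x format
        for name, data in value.items():
            yield name, bytes(data["content"])
    else:  # ipywidgets 8.x format: a tuple (or list) of dicts
        for item in value:
            yield item["name"], bytes(item["content"])
```

A callback would then loop with `for name, content in iter_uploads(change['new']):` instead of calling `.items()` directly.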


Lastly, we collect the variants to predict on. The code in this notebook expects each line of input to be a JSON list of variants, where each variant is a comma-separated string of mutations (for example, `["T17P,T54F", "V28L,F51A"]`). Either enter variants in the text box below or upload a variant file in the following cell. Run the cell to activate the text entry or file upload.
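Each mutation names the wild-type residue, its position, and the replacement residue, so the stated wild-type residue can be checked against the sequence before predicting. The sketch below uses hypothetical helpers (`parse_variant_line`, `mismatched_mutations`) and assumes single-letter codes in the form `T17P`. With the GB1 sequence used in this notebook, it flags the second placeholder line, `["T13P,T33F"]`: at 0-based positions 13 and 33 the sequence has G and A, not T. We are not sure whether `metl.encoder.encode_variants` performs a similar check itself.

```python
import json
import re

# A mutation such as "T17P": wild-type residue, position, replacement residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant_line(line: str) -> list:
    """Parse one input line: a JSON list of comma-separated mutation strings."""
    variants = json.loads(line)
    if not isinstance(variants, list):
        raise ValueError(f"Expected a JSON list, got: {line!r}")
    return variants


def mismatched_mutations(wildtype: str, variant: str, zero_based: bool = True) -> list:
    """Return the mutations whose stated wild-type residue differs from the sequence."""
    offset = 0 if zero_based else 1
    bad = []
    for mutation in variant.split(","):
        mutation = mutation.strip()
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"Unrecognized mutation format: {mutation!r}")
        wt_aa, pos, _ = match.groups()
        index = int(pos) - offset
        if not (0 <= index < len(wildtype)) or wildtype[index] != wt_aa:
            bad.append(mutation)
    return bad


wt = "MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE"
for line in ['["T17P,T54F", "V28L,F51A", "T17P,V28L,F51A,T54F"]', '["T13P,T33F"]']:
    for variant in parse_variant_line(line):
        print(variant, mismatched_mutations(wt, variant))
```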

In [8]:
# @title Variant text input
# @markdown The placeholder variants below use 0-based indexing.
variants_string = """["T17P,T54F", "V28L,F51A", "T17P,V28L,F51A,T54F"]
["T13P,T33F"]"""
style = {'description_width':'initial'}

variant_text = widgets.Textarea(
    value='',
    placeholder=variants_string,
    description='Variant String:',
    disabled=False,
    style = style,
    layout=widgets.Layout(height='100px', width='500px'),
)

variant_text.add_class('variant_text_area')

style = """
<style>
  .variant_text_area > textarea::placeholder {
    color: var(--colab-primary-text-color);
  }

  .variant_text_area > textarea {
    background-color: var(--colab-secondary-surface-color);
    color: var(--colab-primary-text-color);
  }
</style>
"""

display(HTML(style))
display(variant_text)


If you would rather upload a file, run the cell below and use its button to upload one. Note that text entered in the box above takes precedence: an uploaded file is used only when the text box is empty.
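An uploaded variant file is read line by line, so it should contain one JSON list per line, in the same format as the text box. This makes the file a sequence of JSON lines rather than a single JSON document. As a sketch reusing variants from the placeholder, such a file could be generated from Python as follows; the file name `variants.json` is arbitrary.

```python
import json

# Each inner list becomes one line of the file: a JSON list of
# comma-separated mutation strings, as in the text box.
variant_lines = [
    ["T17P,T54F", "V28L,F51A"],
    ["T17P,V28L,F51A,T54F"],
]

with open("variants.json", "w") as f:
    for line in variant_lines:
        f.write(json.dumps(line) + "\n")
```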


In [9]:
# @title Variant file upload
# @markdown If you want to upload a variant JSON file, run this cell and upload the file with the provided button that appears below.


def update_variant_file(button_input):
  global variant_file
  for name, data in button_input['new'].items():
    clear_output()
    display(variant_upload)
    print(f'Loaded file: {name}')
    variant_file = data['content'].decode('utf-8').splitlines()

variant_upload = widgets.FileUpload(
    accept='.json, .txt',
    multiple=False
)

variant_upload.observe(update_variant_file, names='value')
variant_upload


In [10]:
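# @title Variant selecting logic (always run this)
# Priority: text box input, then an uploaded file, then the placeholder example

clear_output()
if len(variant_text.value.strip()) > 0:
  print("Using text area input")
  # Split into lines so each line is handled as one JSON list of variants
  variants = variant_text.value.splitlines()
elif variant_file:
  print("Using variants file")
  variants = variant_file
else:
  print("Using variant placeholder")
  variants = variant_text.placeholder.splitlines()

# Drop blank lines so that json.loads() does not fail on them later
variants = [line for line in variants if line.strip()]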

Using variant placeholder
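The selection cell above checks the text box first, then an uploaded file, and otherwise falls back to the placeholder. Written as a pure function (the name `select_variant_lines` is hypothetical), the same precedence can be tested without any widgets. This sketch also splits the text box value into lines and drops blank lines, which would otherwise make `json.loads` fail.

```python
def select_variant_lines(text_value: str, file_lines, placeholder: str) -> list:
    """Pick variant lines: text box first, then uploaded file, then placeholder."""
    if text_value.strip():
        lines = text_value.splitlines()
    elif file_lines:
        lines = file_lines
    else:
        lines = placeholder.splitlines()
    # Blank lines are not valid JSON, so skip them
    return [line for line in lines if line.strip()]
```

In the notebook this would be called as `select_variant_lines(variant_text.value, variant_file, variant_text.placeholder)`.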


Biologists commonly use 1-based residue numbering, but METL models expect 0-based indexing. If your variants use 1-based numbering, select `1` in the dropdown below and the variants will be converted to 0-based indexing before prediction.

In [11]:
# @title Transform input from 1-based indexing to 0-based indexing
# @markdown Select indexing for residue mutations
indexing = "0" # @param ['0', '1']
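`to_zero_based` subtracts 1 from each residue number and leaves the wild-type and mutant residues unchanged. For GB1, the threonine at 1-based residue 18 appears as T18P in 1-based numbering and as T17P in 0-based numbering. The hypothetical helpers below show this conversion on a single variant string.

```python
def shift_mutation(mutation: str, offset: int) -> str:
    """Shift the residue number of a mutation like 'T18P' by `offset`."""
    mutation = mutation.strip()
    wt_aa, position, mut_aa = mutation[0], int(mutation[1:-1]), mutation[-1]
    return f"{wt_aa}{position + offset}{mut_aa}"


def one_based_to_zero_based(variant: str) -> str:
    """Convert every mutation in a comma-separated variant to 0-based numbering."""
    return ",".join(shift_mutation(m, -1) for m in variant.split(","))


print(one_based_to_zero_based("T18P,T55F"))  # T17P,T54F
```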

Whichever input source was used, the selected lines are stored in the `variants` variable, which the prediction loop below uses.

To predict with METL, we first encode the variants relative to the wild-type sequence with the loaded model's encoder, then pass the encoded variants (and the PDB file, if one is set) to the model. Because the input can contain multiple lines of variants, we loop over the lines and predict on each line as one batch.

In [12]:
# @title METL predicting loop
output = []

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

metl = metl.to(device)
metl.eval()  # ensure dropout and similar layers run in inference mode

if indexing == "1":
  predict_variants = to_zero_based(variants)
else:
  predict_variants = variants

for variant in predict_variants:
    # to_zero_based() already returns lists; raw input lines are JSON strings
    if not isinstance(variant, list):
      variant = json.loads(variant)

    # First, encode the variants relative to the wild-type sequence
    encoded_variants = metl.encoder.encode_variants(wildtype, variant)

    # Next, predict, passing the PDB file when one is set (needed by 3D models)
    with torch.no_grad():
        if pdb_file_path:
            predictions = metl(torch.tensor(encoded_variants).to(device), pdb_fn=pdb_file_path)
        else:
            predictions = metl(torch.tensor(encoded_variants).to(device))

    output.append({
        "wt": wildtype,
        "variants": variant,
        "output": predictions.tolist()
    })
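In the loop above, each line of variants is encoded and passed to the model as one batch. For a line with many thousands of variants this may exhaust GPU memory. A minimal sketch of splitting a line into smaller batches follows; the helper `chunked`, the batch size of 256, and the `outputs` list in the commented usage are hypothetical and would need tuning for your hardware.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical use inside the prediction loop (not run here):
# for batch in chunked(variant, 256):
#     encoded = metl.encoder.encode_variants(wildtype, batch)
#     with torch.no_grad():
#         preds = metl(torch.tensor(encoded).to(device), pdb_fn=pdb_file_path)
#     outputs.extend(preds.tolist())

print(list(chunked(["v1", "v2", "v3"], 2)))  # [['v1', 'v2'], ['v3']]
```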

In [13]:
# @title Display METL predictions
from IPython.display import Javascript

display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 300})'''))
print(json.dumps(output, indent=2))


[
  {
    "wt": "MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE",
    "variants": [
      "T17P,T54F",
      "V28L,F51A",
      "T17P,V28L,F51A,T54F"
    ],
    "output": [
      [
        4.2775397300720215,
        3.786275863647461,
        -1.6370080709457397,
        0.1933237612247467,
        -0.13506530225276947,
        -1.456122875213623,
        0.40609604120254517,
        -0.14475120604038239,
        1.3374593257904053,
        2.163062572479248,
        0.4950665831565857,
        0.4560806155204773,
        -1.0170670747756958,
        2.858417510986328,
        5.019073963165283,
        0.013847976922988892,
        2.2520668506622314,
        0.967809796333313,
        0.4779720604419708,
        -2.347883701324463,
        -0.13269801437854767,
        -1.4652981758117676,
        -1.1244169473648071,
        -1.6959165334701538,
        -1.374971866607666,
        -0.6227024793624878,
        -0.8287121057510376,
        1.7804256677627563,
        2.1719

Finally, we save the predictions to `output.json` as a list of JSON objects, one per input line. Access the saved file through the Files icon in the left sidebar.

In [14]:
# @title Saving the predictions
with open('./output.json', 'w') as f:
    f.write(json.dumps(output, indent=2))
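`output.json` nests one prediction list per variant under each input line. For analysis in a spreadsheet or pandas, a flat table may be handier. This sketch assumes, based on the saved structure, that `output[i]` holds the prediction vector for `variants[i]`; the helper name `output_json_to_csv` and the column names `pred_0`, `pred_1`, ... are made up here, since the notebook does not name the model's outputs.

```python
import csv
import json


def output_json_to_csv(json_path: str, csv_path: str) -> int:
    """Write one CSV row per variant from the notebook's output.json.

    Assumes a list of {"wt", "variants", "output"} records where
    output[i] is the prediction vector for variants[i].
    Returns the number of data rows written.
    """
    with open(json_path) as f:
        records = json.load(f)

    rows = 0
    header_written = False
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for record in records:
            for variant, prediction in zip(record["variants"], record["output"]):
                if not header_written:
                    writer.writerow(["variant"] + [f"pred_{i}" for i in range(len(prediction))])
                    header_written = True
                writer.writerow([variant, *prediction])
                rows += 1
    return rows
```

For example, `output_json_to_csv('./output.json', './output.csv')` would produce a CSV next to the JSON file.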