# Sequence Optimization Demo: A Disordered, Self-Interacting Sequence That Avoids a Target

This notebook demonstrates how to use the `SequenceOptimizer` class to generate a protein sequence that:
- Is predicted to be fully disordered (using `FractionDisorder`)
- Interacts strongly with itself (using `SelfEpsilon`)
- Does **not** interact with a specified target sequence (using `EpsilonByValue`)

We will use the `goose` package and its optimization backend for this demonstration.

In [1]:
# Import required libraries
import goose
from goose.optimize import SequenceOptimizer
from goose.backend.optimizer_properties import FractionDisorder, SelfEpsilon, EpsilonByValue
from sparrow.protein import Protein
import numpy as np
import metapredict as meta

# For reproducibility
import random
random.seed(42)
np.random.seed(42)

## Setup: Define Parameters and Target Sequence

We will define the target sequence length, the sequence we want to avoid interacting with, and the property targets for the optimizer.

In [2]:
# Define parameters
sequence_length = 50  # Target length for the designed sequence

# Example target sequence that the designed sequence should not interact with
target_sequence = "MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGT"

# Set property targets
fraction_disorder_target = 1.0  # Fully disordered
self_epsilon_target = -10.0     # Strong self-attraction (negative value)
epsilon_by_value_target = 0.0   # No interaction with the target sequence

# You may adjust these values as needed for your use case.
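
Before handing the target sequence to the optimizer, it can help to confirm that it contains only the 20 standard amino acids. The sketch below is a minimal pre-flight check in plain Python; `find_invalid_residues` is a hypothetical helper written for this notebook, not part of `goose`.

```python
# Hypothetical helper (not part of goose): flag non-standard residues.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence):
    """Return a sorted list of characters that are not standard amino acids."""
    return sorted(set(sequence.upper()) - STANDARD_AMINO_ACIDS)


target_sequence = "MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGT"

invalid = find_invalid_residues(target_sequence)
if invalid:
    raise ValueError(f"Target sequence contains non-standard residues: {invalid}")
print("Target sequence contains only standard amino acids.")
```

If the check raises, fix or replace the offending residues before adding `EpsilonByValue` to the optimizer.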

## Initialize the SequenceOptimizer and Add Properties

We will create a `SequenceOptimizer` instance and add the three properties to optimize:

In [3]:
# Initialize the optimizer
optimizer = SequenceOptimizer(target_length=sequence_length, gap_to_report=100, num_shuffles=1)

# Add FractionDisorder property (target: fully disordered)
optimizer.add_property(FractionDisorder, target_value=fraction_disorder_target, weight=1.0)

# Add SelfEpsilon property (target: strongly attractive self-interaction)
optimizer.add_property(SelfEpsilon, target_value=self_epsilon_target, weight=1.0, model='mpipi')

# Add EpsilonByValue property (target: zero net interaction with target_sequence)
optimizer.add_property(EpsilonByValue, target_value=epsilon_by_value_target, target_sequence=target_sequence, weight=1.0, model='mpipi')

# Set optimization parameters
optimizer.set_optimization_params(max_iterations=2000, tolerance=0.01, shuffle_interval=5)

2025-05-12 11:19:38,481 - INFO - Using amino_acids.py for kmer properties
2025-05-12 11:19:38,482 - INFO - Added new property FractionDisorder
2025-05-12 11:19:38,483 - INFO - Added new property SelfEpsilon
2025-05-12 11:19:38,484 - INFO - Added new property EpsilonByValue
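
Each property is added with a target value and a weight. One plausible way to combine them into a single score is a weighted sum of absolute deviations from the targets. This is an assumption about how the objective is built, not a statement of the `goose` implementation, although it does reproduce the `Best Error` printed at iteration 0 of the run in this notebook. `combined_error` is a hypothetical name used only for this sketch.

```python
# Assumed objective: weighted sum of absolute deviations from each target.
# This is an illustration, not the confirmed goose implementation.
def combined_error(current, targets, weights):
    """Sum weight * |current - target| over all properties."""
    return sum(weights[name] * abs(current[name] - targets[name]) for name in targets)


targets = {"FractionDisorder": 1.0, "SelfEpsilon": -10.0, "EpsilonByValue": 0.0}
weights = {name: 1.0 for name in targets}

# "Current" values reported at iteration 0 of the run in this notebook.
iteration_0 = {"FractionDisorder": 1.000, "SelfEpsilon": -0.257, "EpsilonByValue": -2.682}

error = combined_error(iteration_0, targets, weights)
print(f"Combined error: {error:.3f}")
```

Under this assumption, raising one property's weight makes deviations in that property cost more relative to the others.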


## Run the Optimization

We will now run the optimizer to generate a sequence that meets the specified criteria.

In [4]:
# Run the optimizer
designed_sequence = optimizer.run()
print("Optimized Sequence:")
print(designed_sequence)

2025-05-12 11:19:53,048 - INFO - Starting sequence optimization
  0%|          | 0/2000 [00:00<?, ?it/s]

Iteration 0: Best Error = 12.425006993258844
Iteration 0: Best Sequence = ISTHCVWTNLASFFVWFNACAHEVHTNCRHVVICAQHSMGRKKPLSEMTG
Iteration 0: Target FractionDisorder = 1.000
Iteration 0: Current FractionDisorder = 1.000
Iteration 0: Target SelfEpsilon = -10.000
Iteration 0: Current SelfEpsilon = -0.257
Iteration 0: Target EpsilonByValue = 0.000
Iteration 0: Current EpsilonByValue = -2.682


Best Error = 6.70:   5%|▌         | 100/2000 [00:00<00:13, 145.93it/s]

Iteration 100: Best Error = 6.6995574247407115
Iteration 100: Best Sequence = KMYFNDFGYMYLYQSGWAWTGRWYMYFDGFFIIVSHYHPCEFYMYALVQN
Iteration 100: Target FractionDisorder = 1.000
Iteration 100: Current FractionDisorder = 0.580
Iteration 100: Target SelfEpsilon = -10.000
Iteration 100: Current SelfEpsilon = -8.720
Iteration 100: Target EpsilonByValue = 0.000
Iteration 100: Current EpsilonByValue = -6.117


Best Error = 6.30:  10%|█         | 200/2000 [00:01<00:12, 146.97it/s]

Iteration 200: Best Error = 6.295142213027799
Iteration 200: Best Sequence = KMYFNDFIYMYLYQSIWAWTGRWYMYFDCFFIIVSHYHPCEFYMYALVQM
Iteration 200: Target FractionDisorder = 1.000
Iteration 200: Current FractionDisorder = 0.660
Iteration 200: Target SelfEpsilon = -10.000
Iteration 200: Current SelfEpsilon = -7.994
Iteration 200: Target EpsilonByValue = 0.000
Iteration 200: Current EpsilonByValue = -5.020


Best Error = 6.21:  15%|█▌        | 300/2000 [00:02<00:11, 144.16it/s]

Iteration 300: Best Error = 6.214579117899744
Iteration 300: Best Sequence = KLYFNDFIYMYLYQSIWAWTGRWYLYFDTFFIIVSHYHPCEFYMYALVQM
Iteration 300: Target FractionDisorder = 1.000
Iteration 300: Current FractionDisorder = 1.000
Iteration 300: Target SelfEpsilon = -10.000
Iteration 300: Current SelfEpsilon = -11.739
Iteration 300: Target EpsilonByValue = 0.000
Iteration 300: Current EpsilonByValue = -7.327


Best Error = 6.20:  20%|██        | 400/2000 [00:02<00:10, 145.94it/s]

Iteration 400: Best Error = 6.201095650693099
Iteration 400: Best Sequence = KLYFNDFIYMYLYQSIWAWTGRWYLYFDTFFIIVSHYHPCEFYIYALVQM
Iteration 400: Target FractionDisorder = 1.000
Iteration 400: Current FractionDisorder = 0.780
Iteration 400: Target SelfEpsilon = -10.000
Iteration 400: Current SelfEpsilon = -8.701
Iteration 400: Target EpsilonByValue = 0.000
Iteration 400: Current EpsilonByValue = -5.375


Best Error = 5.85:  25%|██▌       | 500/2000 [00:03<00:10, 147.00it/s]

Iteration 500: Best Error = 5.852295966675451
Iteration 500: Best Sequence = KLYFNDFIYMYLYQSIWIITGQWYIYFDVFFIIVSHYHPIEFYIYAIVQM
Iteration 500: Target FractionDisorder = 1.000
Iteration 500: Current FractionDisorder = 0.980
Iteration 500: Target SelfEpsilon = -10.000
Iteration 500: Current SelfEpsilon = -8.675
Iteration 500: Target EpsilonByValue = 0.000
Iteration 500: Current EpsilonByValue = -5.164


Best Error = 4.68:  30%|███       | 600/2000 [00:04<00:09, 147.32it/s]

Iteration 600: Best Error = 4.677178142444644
Iteration 600: Best Sequence = ILYLNEFIYMYLYQSIWIITQQWYIYLDVFFIIVDYYLIIEFYIYAIVHM
Iteration 600: Target FractionDisorder = 1.000
Iteration 600: Current FractionDisorder = 0.780
Iteration 600: Target SelfEpsilon = -10.000
Iteration 600: Current SelfEpsilon = -8.763
Iteration 600: Target EpsilonByValue = 0.000
Iteration 600: Current EpsilonByValue = -4.085


Best Error = 2.93:  35%|███▌      | 700/2000 [00:04<00:08, 146.09it/s]

Iteration 700: Best Error = 2.9253123645169454
Iteration 700: Best Sequence = IIYVFEHIYMYLYHSIWIITQCWVIVIDVHLIIVDYYVIIEVYIVAIVHL
Iteration 700: Target FractionDisorder = 1.000
Iteration 700: Current FractionDisorder = 0.000
Iteration 700: Target SelfEpsilon = -10.000
Iteration 700: Current SelfEpsilon = -4.449
Iteration 700: Target EpsilonByValue = 0.000
Iteration 700: Current EpsilonByValue = -2.189


Best Error = 2.73:  40%|████      | 800/2000 [00:05<00:08, 145.40it/s]

Iteration 800: Best Error = 2.7313772020963616
Iteration 800: Best Sequence = IIYVFDHIFMYLYHSIWIITQCWVIVIDVHLIIVHFYVIIEVYIVAIVHL
Iteration 800: Target FractionDisorder = 1.000
Iteration 800: Current FractionDisorder = 1.000
Iteration 800: Target SelfEpsilon = -10.000
Iteration 800: Current SelfEpsilon = -9.711
Iteration 800: Target EpsilonByValue = 0.000
Iteration 800: Current EpsilonByValue = -3.026


Best Error = 2.58:  45%|████▌     | 900/2000 [00:06<00:07, 145.21it/s]

Iteration 900: Best Error = 2.5834390891639227
Iteration 900: Best Sequence = IIYVFDHIMIYLYHSIWIITQCWVIVIDVHLIIVHFYVIIDVYIVAIVHL
Iteration 900: Target FractionDisorder = 1.000
Iteration 900: Current FractionDisorder = 0.080
Iteration 900: Target SelfEpsilon = -10.000
Iteration 900: Current SelfEpsilon = -7.038
Iteration 900: Target EpsilonByValue = 0.000
Iteration 900: Current EpsilonByValue = -2.761


Best Error = 2.00:  50%|█████     | 1000/2000 [00:06<00:06, 144.74it/s]

Iteration 1000: Best Error = 2.001503852871558
Iteration 1000: Best Sequence = IIYVFDHIIIYSYHSIVIIPQCWVIVIDVHLIIVHAYVIIDVYIVIIVHL
Iteration 1000: Target FractionDisorder = 1.000
Iteration 1000: Current FractionDisorder = 1.000
Iteration 1000: Target SelfEpsilon = -10.000
Iteration 1000: Current SelfEpsilon = -9.373
Iteration 1000: Target EpsilonByValue = 0.000
Iteration 1000: Current EpsilonByValue = -1.375


Best Error = 1.68:  55%|█████▌    | 1100/2000 [00:07<00:06, 141.81it/s]

Iteration 1100: Best Error = 1.6796059931550111
Iteration 1100: Best Sequence = IIYVYDHIIIVIYHSIVIIPQCWVIVIDVHLIIVHMVVIIDVYIVIIVHL
Iteration 1100: Target FractionDisorder = 1.000
Iteration 1100: Current FractionDisorder = 0.000
Iteration 1100: Target SelfEpsilon = -10.000
Iteration 1100: Current SelfEpsilon = -4.558
Iteration 1100: Target EpsilonByValue = 0.000
Iteration 1100: Current EpsilonByValue = -0.300


Best Error = 0.75:  60%|██████    | 1200/2000 [00:08<00:05, 141.69it/s]

Iteration 1200: Best Error = 0.7463228097654019
Iteration 1200: Best Sequence = IIYVYDHIIIVIYHQIVIISNIWVIVIDVHLIIVHLVVIIDVYIVIIVHL
Iteration 1200: Target FractionDisorder = 1.000
Iteration 1200: Current FractionDisorder = 0.000
Iteration 1200: Target SelfEpsilon = -10.000
Iteration 1200: Current SelfEpsilon = -10.700
Iteration 1200: Target EpsilonByValue = 0.000
Iteration 1200: Current EpsilonByValue = -0.872


Best Error = 0.68:  65%|██████▌   | 1300/2000 [00:09<00:04, 141.64it/s]

Iteration 1300: Best Error = 0.6808235047608625
Iteration 1300: Best Sequence = IIYVYDHIIIVIYHQIVIISNIWVIVIEVHLIIVHLVVIIDVYIVIIVHL
Iteration 1300: Target FractionDisorder = 1.000
Iteration 1300: Current FractionDisorder = 0.940
Iteration 1300: Target SelfEpsilon = -10.000
Iteration 1300: Current SelfEpsilon = -12.009
Iteration 1300: Target EpsilonByValue = 0.000
Iteration 1300: Current EpsilonByValue = -3.415


Best Error = 0.39:  70%|███████   | 1400/2000 [00:09<00:04, 141.60it/s]

Iteration 1400: Best Error = 0.38808204631453413
Iteration 1400: Best Sequence = IIYIYEHIIIIIYHQIIIISNIWVIVIEVHLIIINLVIIIDIYIIIIVNL
Iteration 1400: Target FractionDisorder = 1.000
Iteration 1400: Current FractionDisorder = 0.920
Iteration 1400: Target SelfEpsilon = -10.000
Iteration 1400: Current SelfEpsilon = -9.749
Iteration 1400: Target EpsilonByValue = 0.000
Iteration 1400: Current EpsilonByValue = -0.057


Best Error = 0.39:  75%|███████▌  | 1500/2000 [00:10<00:03, 141.48it/s]

Iteration 1500: Best Error = 0.3868221891836918
Iteration 1500: Best Sequence = IIYIYEHIIIIIYHQIIIISNIWVIVIEVHLIIIHLVIIIDIYIIIIVNL
Iteration 1500: Target FractionDisorder = 1.000
Iteration 1500: Current FractionDisorder = 0.000
Iteration 1500: Target SelfEpsilon = -10.000
Iteration 1500: Current SelfEpsilon = -5.808
Iteration 1500: Target EpsilonByValue = 0.000
Iteration 1500: Current EpsilonByValue = -1.090


Best Error = 0.29:  80%|████████  | 1600/2000 [00:11<00:02, 141.29it/s]

Iteration 1600: Best Error = 0.2913241357625975
Iteration 1600: Best Sequence = IIYIYEHIIIIIYHQIIIITNIWVIVIEVHLIIIHLVIIIDIYIIIIVNL
Iteration 1600: Target FractionDisorder = 1.000
Iteration 1600: Current FractionDisorder = 0.000
Iteration 1600: Target SelfEpsilon = -10.000
Iteration 1600: Current SelfEpsilon = -10.006
Iteration 1600: Target EpsilonByValue = 0.000
Iteration 1600: Current EpsilonByValue = -0.284


Best Error = 0.29:  85%|████████▌ | 1700/2000 [00:11<00:02, 141.30it/s]

Iteration 1700: Best Error = 0.2913241357625975
Iteration 1700: Best Sequence = IIYIYEHIIIIIYHQIIIITNIWVIVIEVHLIIIHLVIIIDIYIIIIVNL
Iteration 1700: Target FractionDisorder = 1.000
Iteration 1700: Current FractionDisorder = 0.000
Iteration 1700: Target SelfEpsilon = -10.000
Iteration 1700: Current SelfEpsilon = -2.847
Iteration 1700: Target EpsilonByValue = 0.000
Iteration 1700: Current EpsilonByValue = -0.615


Best Error = 0.29:  90%|█████████ | 1800/2000 [00:12<00:01, 141.26it/s]

Iteration 1800: Best Error = 0.2913241357625975
Iteration 1800: Best Sequence = IIYIYEHIIIIIYHQIIIITNIWVIVIEVHLIIIHLVIIIDIYIIIIVNL
Iteration 1800: Target FractionDisorder = 1.000
Iteration 1800: Current FractionDisorder = 1.000
Iteration 1800: Target SelfEpsilon = -10.000
Iteration 1800: Current SelfEpsilon = -8.407
Iteration 1800: Target EpsilonByValue = 0.000
Iteration 1800: Current EpsilonByValue = -3.009


Best Error = 0.29:  95%|█████████▌| 1900/2000 [00:13<00:00, 138.92it/s]

Iteration 1900: Best Error = 0.2871369004528468
Iteration 1900: Best Sequence = IIYIYEHIIIIIYHQIIIITNIWVIVIEVHLIIIHVVIIIDIYIIIIVNL
Iteration 1900: Target FractionDisorder = 1.000
Iteration 1900: Current FractionDisorder = 0.000
Iteration 1900: Target SelfEpsilon = -10.000
Iteration 1900: Current SelfEpsilon = -6.962
Iteration 1900: Target EpsilonByValue = 0.000
Iteration 1900: Current EpsilonByValue = -1.464


Best Error = 0.27: 100%|██████████| 2000/2000 [00:14<00:00, 142.80it/s]
2025-05-12 11:20:07,389 - INFO - Sequence optimization completed
2025-05-12 11:20:07,390 - INFO - Optimized Sequence: IIYIYEHIIIIIYHQIIIITNIWVIVIDVHLIIIHVVIIIDIYIIIIVNL
2025-05-12 11:20:07,395 - INFO - FractionDisorder: 0.94 (Target: 1.00)
2025-05-12 11:20:07,397 - INFO - SelfEpsilon: -10.00 (Target: -10.00)
2025-05-12 11:20:07,399 - INFO - EpsilonByValue: -0.20 (Target: 0.00)


Optimized Sequence:
IIYIYEHIIIIIYHQIIIITNIWVIVIDVHLIIIHVVIIIDIYIIIIVNL
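
A quick composition count is a useful sanity check on any designed sequence. The sketch below uses only the standard library. The sequence is copied from the output above so that the block runs on its own; in the notebook you would pass `designed_sequence` directly.

```python
from collections import Counter

# Sequence copied from the optimizer output above.
designed_sequence = "IIYIYEHIIIIIYHQIIIITNIWVIVIDVHLIIIHVVIIIDIYIIIIVNL"

# Count each residue and convert counts to fractions, most frequent first.
counts = Counter(designed_sequence)
composition = {aa: n / len(designed_sequence) for aa, n in counts.most_common()}

for aa, fraction in composition.items():
    print(f"{aa}: {fraction:.2f}")
```

A heavily skewed composition is a cue to inspect the result more closely before relying on the predicted properties.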


## Evaluate the Designed Sequence

Let's check the properties of the designed sequence to confirm it meets our goals:

In [None]:
# Compute disorder fraction using metapredict.
# np.asarray makes the threshold comparison work whether predict_disorder
# returns a list or a NumPy array.
predicted_disorder = np.asarray(meta.predict_disorder(designed_sequence))
fraction_disorder = (predicted_disorder > 0.5).sum() / len(predicted_disorder)

# Compute self-epsilon and epsilon with target using finches (mpipi)
from finches.frontend.mpipi_frontend import Mpipi_frontend
model = Mpipi_frontend()

self_interaction = model.epsilon(designed_sequence, designed_sequence)
target_interaction = model.epsilon(designed_sequence, target_sequence)

print(f"Fraction Disorder: {fraction_disorder:.2f}")
print(f"Self Epsilon (should be strongly negative): {self_interaction:.2f}")
print(f"Epsilon with Target Sequence (should be near zero): {target_interaction:.2f}")
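
To turn the printed values into an explicit pass/fail check, you can compare each achieved value against its target with a per-property tolerance. The sketch below uses the final values reported in the optimizer log above. The tolerances and the `within_tolerance` helper are hypothetical choices for illustration, not `goose` defaults.

```python
# Hypothetical helper: check each property against its target.
def within_tolerance(achieved, targets, tolerances):
    """Map each property name to True if |achieved - target| <= tolerance."""
    return {
        name: abs(achieved[name] - targets[name]) <= tolerances[name]
        for name in targets
    }


targets = {"FractionDisorder": 1.0, "SelfEpsilon": -10.0, "EpsilonByValue": 0.0}

# Final values reported in the optimizer log above.
achieved = {"FractionDisorder": 0.94, "SelfEpsilon": -10.00, "EpsilonByValue": -0.20}

# Illustrative tolerances; choose values that suit your application.
tolerances = {"FractionDisorder": 0.1, "SelfEpsilon": 0.5, "EpsilonByValue": 0.5}

report = within_tolerance(achieved, targets, tolerances)
for name, passed in report.items():
    print(f"{name}: {'OK' if passed else 'OUT OF TOLERANCE'}")
```

In the notebook, replace the hard-coded `achieved` values with `fraction_disorder`, `self_interaction`, and `target_interaction` from the cell above.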