# Sequence Optimization with GOOSE

This notebook demonstrates the `SequenceOptimizer` from GOOSE, using protein properties defined in the `goose.backend.optimizer_properties` module. We will:
- Initialize a SequenceOptimizer
- Add multiple properties (e.g., Hydrophobicity, FCR, NCPR)
- Set optimization parameters
- Run the optimization
- Analyze the optimized sequence


In [9]:
# Import required modules
import goose
from goose.optimize import SequenceOptimizer
from goose.backend.optimizer_properties import Hydrophobicity, FCR, NCPR, ComputeIWD
from sparrow import Protein


## Initialize SequenceOptimizer

Create an instance of `SequenceOptimizer` with a target sequence length. This example also passes `verbose=True` and `gap_to_report=100`; judging by the output of the optimization run, progress is then reported every 100 iterations.

In [14]:
# Set the target sequence length
sequence_length = 50

# Initialize the optimizer
optimizer = SequenceOptimizer(target_length=sequence_length, verbose=True, gap_to_report=100)


2025-05-12 10:20:03,272 - INFO - Using amino_acids.py for kmer properties


## Add Protein Properties

Add one or more properties to optimize. Here, we add Hydrophobicity, FCR, and NCPR as examples. You can add as many properties as you like, each with its own target value and weight.

In [15]:
# Add properties to optimize
optimizer.add_property(Hydrophobicity, target_value=0.5, weight=1.0)
optimizer.add_property(FCR, target_value=0.35, weight=1.0)
optimizer.add_property(NCPR, target_value=0.0, weight=1.0)
# Example: add ComputeIWD for serine residues
# (note the trailing comma: ("S",) is a one-element tuple, while ("S") is just the string "S")
#optimizer.add_property(ComputeIWD, residues=("S",), target_value=2.0, weight=0.5)


2025-05-12 10:20:03,579 - INFO - Added new property Hydrophobicity
2025-05-12 10:20:03,580 - INFO - Added new property FCR
2025-05-12 10:20:03,581 - INFO - Added new property NCPR


## Set Optimization Parameters

Configure the optimization process. You can adjust parameters such as the number of iterations, tolerance, and shuffle interval to control the optimization behavior.

In [21]:
# Set optimization parameters
optimizer.set_optimization_params(
    max_iterations=10000,
    tolerance=0.01,
    shuffle_interval=20,
    window_size=10,
    num_shuffles=2
)
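
To give a feel for what `max_iterations` and `tolerance` control, here is a toy mutate-and-accept loop in plain Python. It is only a sketch under stated assumptions, not GOOSE's algorithm: the functions `toy_optimize` and `toy_error_fn` are hypothetical, only FCR and NCPR are targeted, and the shuffle-related parameters (`shuffle_interval`, `window_size`, `num_shuffles`) are not modelled at all.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITIVE = set("KR")
NEGATIVE = set("DE")


def toy_error_fn(seq, targets):
    """Sum of absolute deviations from the FCR and NCPR targets (toy objective)."""
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    fcr = (pos + neg) / len(seq)
    ncpr = (pos - neg) / len(seq)
    # Round so that moves with equal error compare as equal despite float noise.
    return round(abs(fcr - targets["FCR"]) + abs(ncpr - targets["NCPR"]), 9)


def toy_optimize(length, targets, max_iterations=10000, tolerance=0.01, seed=0):
    """Hypothetical hill climber: mutate one residue, keep it if the error does not grow."""
    rng = random.Random(seed)
    best = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    best_error = toy_error_fn(best, targets)
    for _ in range(max_iterations):
        if best_error <= tolerance:
            break  # close enough to the targets, stop early
        candidate = best.copy()
        candidate[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        candidate_error = toy_error_fn(candidate, targets)
        if candidate_error <= best_error:
            best, best_error = candidate, candidate_error
    return "".join(best), best_error


toy_sequence, toy_error = toy_optimize(50, {"FCR": 0.35, "NCPR": 0.0})
print(toy_sequence, toy_error)
```

In this toy version, `max_iterations` caps the number of mutation attempts and `tolerance` is the error at which the loop stops early. Whether GOOSE interprets the two parameters in exactly this way is an assumption.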


## Run Sequence Optimization

Run the optimizer to generate a sequence that best matches the specified property targets.

In [22]:
# Run the optimization
print("Starting optimization...")
optimized_sequence = optimizer.run()
print(f"Optimized Sequence: {optimized_sequence}")


2025-05-12 10:20:42,691 - INFO - Starting sequence optimization


Starting optimization...


Best Error = 0.39:  18%|█▊        | 1800/10000 [00:00<00:01, 6718.04it/s]

Iteration 0: Best Error = 3.5180000000000002
Iteration 0: Best Sequence = TPEYFDGDPRNGIIPTSWPTREFGRLWAMAYCLPKDPYQHIVRRMAVHRF
Iteration 0: Target Hydrophobicity = 0.500
Iteration 0: Current Hydrophobicity = 3.868
Iteration 0: Target FCR = 0.350
Iteration 0: Current FCR = 0.240
Iteration 0: Target NCPR = 0.000
Iteration 0: Current NCPR = 0.040
Iteration 100: Best Error = 1.45
Iteration 100: Best Sequence = HNDENCTPKEGNNPRRFQEDRKAPSREERDNQHYENYGKDPPPNRNRPAH
Iteration 100: Target Hydrophobicity = 0.500
Iteration 100: Current Hydrophobicity = 1.952
Iteration 100: Target FCR = 0.350
Iteration 100: Current FCR = 0.360
Iteration 100: Target NCPR = 0.000
Iteration 100: Current NCPR = -0.040
Iteration 200: Best Error = 0.7739999999999998
Iteration 200: Best Sequence = HNDENHWNKEGNNPRRHQEDRKQNHREERDNQHQENYGKDHHPNRNRPHH
Iteration 200: Target Hydrophobicity = 0.500
Iteration 200: Current Hydrophobicity = 1.338
Iteration 200: Target FCR = 0.350
Iteration 200: Current FCR = 0.400
Iteration 200: Targe

Best Error = 0.39:  20%|██        | 2000/10000 [00:00<00:00, 9917.89it/s]

Iteration 1900: Best Error = 0.386
Iteration 1900: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 1900: Target Hydrophobicity = 0.500
Iteration 1900: Current Hydrophobicity = 0.836
Iteration 1900: Target FCR = 0.350
Iteration 1900: Current FCR = 0.400
Iteration 1900: Target NCPR = 0.000
Iteration 1900: Current NCPR = 0.000


Best Error = 0.39:  43%|████▎     | 4300/10000 [00:00<00:00, 10754.09it/s]

Iteration 2000: Best Error = 0.386
Iteration 2000: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 2000: Target Hydrophobicity = 0.500
Iteration 2000: Current Hydrophobicity = 1.146
Iteration 2000: Target FCR = 0.350
Iteration 2000: Current FCR = 0.400
Iteration 2000: Target NCPR = 0.000
Iteration 2000: Current NCPR = 0.000
Iteration 2100: Best Error = 0.386
Iteration 2100: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 2100: Target Hydrophobicity = 0.500
Iteration 2100: Current Hydrophobicity = 0.980
Iteration 2100: Target FCR = 0.350
Iteration 2100: Current FCR = 0.360
Iteration 2100: Target NCPR = 0.000
Iteration 2100: Current NCPR = -0.040
Iteration 2200: Best Error = 0.386
Iteration 2200: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 2200: Target Hydrophobicity = 0.500
Iteration 2200: Current Hydrophobicity = 1.146
Iteration 2200: Target FCR = 0.350
Iteration 2200: Current FCR = 0.400
Iterati

Best Error = 0.39:  45%|████▌     | 4500/10000 [00:00<00:00, 11214.19it/s]

Iteration 4400: Best Error = 0.386
Iteration 4400: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 4400: Target Hydrophobicity = 0.500
Iteration 4400: Current Hydrophobicity = 1.012
Iteration 4400: Target FCR = 0.350
Iteration 4400: Current FCR = 0.400
Iteration 4400: Target NCPR = 0.000
Iteration 4400: Current NCPR = 0.000


Best Error = 0.39:  69%|██████▉   | 6900/10000 [00:00<00:00, 11617.98it/s]

Iteration 4500: Best Error = 0.386
Iteration 4500: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 4500: Target Hydrophobicity = 0.500
Iteration 4500: Current Hydrophobicity = 0.836
Iteration 4500: Target FCR = 0.350
Iteration 4500: Current FCR = 0.360
Iteration 4500: Target NCPR = 0.000
Iteration 4500: Current NCPR = 0.040
Iteration 4600: Best Error = 0.386
Iteration 4600: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 4600: Target Hydrophobicity = 0.500
Iteration 4600: Current Hydrophobicity = 0.988
Iteration 4600: Target FCR = 0.350
Iteration 4600: Current FCR = 0.400
Iteration 4600: Target NCPR = 0.000
Iteration 4600: Current NCPR = 0.000
Iteration 4700: Best Error = 0.386
Iteration 4700: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 4700: Target Hydrophobicity = 0.500
Iteration 4700: Current Hydrophobicity = 0.988
Iteration 4700: Target FCR = 0.350
Iteration 4700: Current FCR = 0.400
Iteratio

Best Error = 0.39:  70%|███████   | 7000/10000 [00:00<00:00, 11617.98it/s]

Iteration 6900: Best Error = 0.386
Iteration 6900: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 6900: Target Hydrophobicity = 0.500
Iteration 6900: Current Hydrophobicity = 1.052
Iteration 6900: Target FCR = 0.350
Iteration 6900: Current FCR = 0.400
Iteration 6900: Target NCPR = 0.000
Iteration 6900: Current NCPR = 0.000


Best Error = 0.39:  94%|█████████▍| 9400/10000 [00:00<00:00, 11712.37it/s]

Iteration 7000: Best Error = 0.386
Iteration 7000: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 7000: Target Hydrophobicity = 0.500
Iteration 7000: Current Hydrophobicity = 1.606
Iteration 7000: Target FCR = 0.350
Iteration 7000: Current FCR = 0.400
Iteration 7000: Target NCPR = 0.000
Iteration 7000: Current NCPR = 0.000
Iteration 7100: Best Error = 0.386
Iteration 7100: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 7100: Target Hydrophobicity = 0.500
Iteration 7100: Current Hydrophobicity = 0.960
Iteration 7100: Target FCR = 0.350
Iteration 7100: Current FCR = 0.360
Iteration 7100: Target NCPR = 0.000
Iteration 7100: Current NCPR = 0.040
Iteration 7200: Best Error = 0.386
Iteration 7200: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 7200: Target Hydrophobicity = 0.500
Iteration 7200: Current Hydrophobicity = 1.316
Iteration 7200: Target FCR = 0.350
Iteration 7200: Current FCR = 0.400
Iteratio

Best Error = 0.39: 100%|██████████| 10000/10000 [00:00<00:00, 11277.71it/s]
2025-05-12 10:20:43,582 - INFO - Sequence optimization completed
2025-05-12 10:20:43,582 - INFO - Optimized Sequence: QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
2025-05-12 10:20:43,582 - INFO - Hydrophobicity: 0.84 (Target: 0.50)
2025-05-12 10:20:43,583 - INFO - FCR: 0.40 (Target: 0.35)
2025-05-12 10:20:43,583 - INFO - NCPR: 0.00 (Target: 0.00)


Iteration 9500: Best Error = 0.386
Iteration 9500: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 9500: Target Hydrophobicity = 0.500
Iteration 9500: Current Hydrophobicity = 1.012
Iteration 9500: Target FCR = 0.350
Iteration 9500: Current FCR = 0.400
Iteration 9500: Target NCPR = 0.000
Iteration 9500: Current NCPR = 0.000
Iteration 9600: Best Error = 0.386
Iteration 9600: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 9600: Target Hydrophobicity = 0.500
Iteration 9600: Current Hydrophobicity = 1.268
Iteration 9600: Target FCR = 0.350
Iteration 9600: Current FCR = 0.400
Iteration 9600: Target NCPR = 0.000
Iteration 9600: Current NCPR = 0.000
Iteration 9700: Best Error = 0.386
Iteration 9700: Best Sequence = QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN
Iteration 9700: Target Hydrophobicity = 0.500
Iteration 9700: Current Hydrophobicity = 0.836
Iteration 9700: Target FCR = 0.350
Iteration 9700: Current FCR = 0.360
Iteratio
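
The logged `Best Error` values are consistent with a weighted sum of absolute deviations from the targets. That is an inference from the numbers in the output above, not something confirmed from the GOOSE source, and `combined_error` below is a hypothetical helper written for this check. The sketch reproduces the error at iteration 0 and for the final sequence.

```python
def combined_error(current, targets, weights):
    """Assumed objective: weighted sum of |current - target| over all properties."""
    return sum(weights[name] * abs(current[name] - targets[name]) for name in targets)


targets = {"Hydrophobicity": 0.5, "FCR": 0.35, "NCPR": 0.0}
weights = {"Hydrophobicity": 1.0, "FCR": 1.0, "NCPR": 1.0}

# Property values copied from the log output above
iteration_0 = {"Hydrophobicity": 3.868, "FCR": 0.240, "NCPR": 0.040}
final = {"Hydrophobicity": 0.84 - 0.004, "FCR": 0.400, "NCPR": 0.000}  # 0.836 in the analysis below

initial_error = round(combined_error(iteration_0, targets, weights), 3)
final_error = round(combined_error(final, targets, weights), 3)
print(initial_error)  # 3.518, matching "Iteration 0: Best Error"
print(final_error)    # 0.386, matching the final "Best Error"
```

At some intermediate iterations (for example iteration 2000) the `Current ...` values do not add up to the reported `Best Error`, which suggests those lines describe the candidate being evaluated at that step and not the best sequence so far.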

## Analyze Optimized Sequence

Evaluate the optimized sequence by calculating the values of the properties and comparing them to the targets.

In [23]:
# Create a Protein object from the optimized sequence
final_protein = Protein(optimized_sequence)

# Calculate and print property values
hydro = final_protein.hydrophobicity
fcr = final_protein.FCR
ncpr = final_protein.NCPR


print(f"Final Hydrophobicity: {hydro:.3f} (Target: 0.5)")
print(f"Final FCR: {fcr:.3f} (Target: 0.35)")
print(f"Final NCPR: {ncpr:.3f} (Target: 0.0)")


Final Hydrophobicity: 0.836 (Target: 0.5)
Final FCR: 0.400 (Target: 0.35)
Final NCPR: 0.000 (Target: 0.0)
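
As a cross-check that does not depend on `sparrow`, the same three values can be recomputed from the sequence with plain Python. This sketch assumes FCR and NCPR count D/E as negative and K/R as positive (histidine uncharged), and that hydrophobicity is the mean Kyte-Doolittle value shifted by +4.5 onto a 0 to 9 scale. Both assumptions reproduce the numbers printed above, but neither is taken from the library source.

```python
# Kyte-Doolittle hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

sequence = "QNDENQQNKENNNQRRQQEDRKQNQREERDNQQQENNNKDQQNNRNRNQN"

n_pos = sum(aa in "KR" for aa in sequence)  # positively charged residues
n_neg = sum(aa in "DE" for aa in sequence)  # negatively charged residues

check_fcr = (n_pos + n_neg) / len(sequence)   # fraction of charged residues
check_ncpr = (n_pos - n_neg) / len(sequence)  # net charge per residue
# Assumed scale: Kyte-Doolittle shifted so the minimum (R) is 0 and the maximum (I) is 9
check_hydro = sum(KYTE_DOOLITTLE[aa] + 4.5 for aa in sequence) / len(sequence)

print(f"FCR: {check_fcr:.3f}")              # FCR: 0.400
print(f"NCPR: {check_ncpr:.3f}")            # NCPR: 0.000
print(f"Hydrophobicity: {check_hydro:.3f}")  # Hydrophobicity: 0.836
```

On this assumed scale, Q, N, D and E all score 1.0 and R scores 0.0, which may explain why the hydrophobicity stays well above the 0.5 target while the charge targets are nearly met.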
