# Constraining ensembles in STARLING

This notebook shows how to constrain ensembles generated by STARLING using several types of constraints.
**Before starting**, install STARLING locally. See https://github.com/idptools/starling/ for more information.

This notebook covers the following:
1. Constraining the distance between two specific residues
2. Constraining to a target radius of gyration or end-to-end distance
3. Adding helicity constraints

***NOTE***: Ensemble generation with constraints is much slower than without!

***NOTE***: STARLING can only generate ensembles of sequences up to 380 residues!
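Because of the 380-residue limit, it can help to check a sequence's length before calling `starling.generate`. The sketch below uses hypothetical helpers (`within_starling_limit` and `require_starling_length` are not part of STARLING) and takes the limit from the note above; STARLING may also perform its own check.

```python
# Hypothetical helpers (not part of STARLING): check sequence length before generation.
MAX_STARLING_LENGTH = 380  # limit stated in the note above


def within_starling_limit(sequence: str, max_length: int = MAX_STARLING_LENGTH) -> bool:
    """Return True if the sequence is non-empty and no longer than max_length."""
    return 0 < len(sequence) <= max_length


def require_starling_length(sequence: str, max_length: int = MAX_STARLING_LENGTH) -> str:
    """Return the sequence unchanged, or raise ValueError if it is empty or too long."""
    if not within_starling_limit(sequence, max_length):
        raise ValueError(
            f"sequence has {len(sequence)} residues; expected 1 to {max_length}"
        )
    return sequence
```

For example, `require_starling_length(ctcf_idr)` would pass for the 262-residue CTCF IDR used below, while the full-length `ctcf` sequence would raise, which is why only the IDR is simulated.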

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import starling
from starling.inference.constraints import HelicityConstraint, DistanceConstraint, RgConstraint, ReConstraint

# NOTE: if this is the first time you are running STARLING, it will download the model for you!
# This can take a few minutes depending on your internet speed.

In [None]:
# first we are going to set our target sequence. 
# we are going to analyze the N-terminal IDR of Homo sapiens CTCF, UniProt ID P49711
ctcf='MEGDAVEAIVEESETFIKGKERKTYQRRREGGQEEDACHLPQNQTDGGEVVQDVNSSVQMVMMEQLDPTLLQMKTEVMEGTVAPEAEAAVDDTQIITLQVVNMEEQPINIGELQLVQVPVPVTVPVATTSVEELQGAYENEVSKEGLAESEPMICHTLPLPEGFQVVKVGANGEVETLEQGELPPQEDPSWQKDPDYQPPAKKTKKTKKSKLRYTEEGKDVDVSVYDFEEEQQEGLLSEVNAEKVVGNMKPPKPTKIKKKGVKKTFQCELCSYTCPRRSNLDRHMKSHTDERPHKCHLCGRAFRTVTLLRNHLNTHTGTRPHKCPDCDMAFVTSGELVRHRRYKHTHEKPFKCSMCDYASVEVSKLKRHIRSHTGERPFQCSLCSYASRDTYKLKRHMRTHSGEKPYECYICHARFTQSGTMKMHILQKHTENVAKFHCPHCDTVIARKSDLGVHLRKQHSYIEQGKKCRYCDAVFHERYALIQHQKSHKNEKRFKCDQCDYACRQERHMIMHKRTHTGEKPYACSHCDKTFRQKQLLDMHFKRYHDPNFVPAAFVCSKCGKTFTRRNTMARHADNCAGPDGVEGENGGETKKSKRGRKRKMRSKKEDSSDSENAEPDLDDNEDEEEPAVEIEPEPEPQPVTPAPPPAKKRRGRPPGRTNQPKQNQPTAIIQVEDQNTGAIENIIVEVKKEPDAEPAEGEEEEAQPAATDAPNGDLTPEMILSMMDR'
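# we used metapredict V3 to define the boundaries of the N-terminal IDR of CTCF (residues 1-262)
# Python slices are 0-indexed and end-exclusive, so [:262] keeps residues 1-262
ctcf_idr = ctcf[:262]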
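The comments in this notebook switch between 1-based residue numbers (as in UniProt and in the plot titles) and 0-based Python indices (as used in `ctcf[:262]`, in `rij`, and in the constraint arguments). These hypothetical helpers (not STARLING functions) make the conversion explicit; the toy sequence is only for illustration.

```python
def residue_to_index(residue_number: int) -> int:
    """Convert a 1-based residue number (UniProt-style) to a 0-based Python index."""
    if residue_number < 1:
        raise ValueError("residue numbers start at 1")
    return residue_number - 1


def index_to_residue(index: int) -> int:
    """Convert a 0-based Python index back to a 1-based residue number."""
    return index + 1


def residue_range_to_slice(first: int, last: int) -> slice:
    """Slice covering residues first..last (1-based, inclusive on both ends)."""
    return slice(first - 1, last)


toy_sequence = "M" + "A" * 300  # toy sequence, not CTCF
print(len(toy_sequence[:262]))                              # 262: residues 1-262
print(len(toy_sequence[residue_range_to_slice(151, 161)]))  # 11 residues
print(residue_to_index(11), residue_to_index(101))          # 10 100
```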

In [None]:
# now we are going to generate the ensemble with no constraints
# NOTE: we are going to reduce the number of steps and conformations for this demo to speed things up.
# however, you should use the default values for real applications.
# the defaults are conformations=400 and steps=30.
no_constraints = starling.generate(ctcf_idr, return_single_ensemble=True,
                                   conformations=100, steps=10)

## Constraining by residue distance
You can use a ``DistanceConstraint`` to set a target distance (in Å) between two chosen residues. Residues are specified by their 0-based index.
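Conceptually, the quantity being constrained (and later reported by `rij`) is the Euclidean distance between two residue positions in each conformation. The sketch below computes it with NumPy, assuming coordinates stored as an array of shape `(n_conformations, n_residues, 3)` in Å; how STARLING represents conformations internally is not shown in this notebook, and `pairwise_distance` is a hypothetical helper.

```python
import numpy as np


def pairwise_distance(coords: np.ndarray, i: int, j: int) -> np.ndarray:
    """Distance between residues i and j (0-based) in every conformation.

    coords is assumed to have shape (n_conformations, n_residues, 3), in angstroms.
    Returns an array of shape (n_conformations,).
    """
    return np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=-1)


# Toy ensemble (not STARLING output): 2 conformations of 3 residues.
coords = np.array([
    [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]],  # straight chain
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [3.0, 4.0, 0.0]],  # bent chain
])
d_0_2 = pairwise_distance(coords, 0, 2)  # one distance per conformation
print(d_0_2, d_0_2.mean())
```

Returning one value per conformation, rather than only the mean, is what allows the histograms below to show the full distribution.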

In [None]:
# set the constraint
# NOTE: target is a distance in angstroms (Å).
# resid1 and resid2 are 0-based indices, so this constrains residues 11 and 101.
constraint = DistanceConstraint(resid1=10, resid2=100, target=30)

In [None]:
# now we can generate an ensemble with the DistanceConstraint. 
dist_constraints = starling.generate(ctcf_idr, constraint=constraint, return_single_ensemble=True,
                                     conformations=100, steps=10)

In [None]:
# now let's calculate the distance between residues 11 and 101 in every conformation of both ensembles
# NOTE: the first time we run this, it will calculate the ensemble distances and store them in the object.
# subsequent calls will be much faster.
# remember, indices are 0-based, so index 10 is residue 11 and index 100 is residue 101.
unconstrained_distance = no_constraints.rij(10, 100, return_mean=False)
constrained_distance = dist_constraints.rij(10, 100, return_mean=False)

# now we are going to plot the histogram of the distances and then mark the mean and the target
plt.figure(figsize=(8,5))
plt.hist(unconstrained_distance, bins=10, alpha=0.5, label='Unconstrained', color='blue', density=True, edgecolor='black')
plt.hist(constrained_distance, bins=10, alpha=0.5, label='Constrained', color='orange', density=True, edgecolor='black')
plt.axvline(np.mean(unconstrained_distance), color='blue', linestyle='dashed', linewidth=1)
plt.axvline(np.mean(constrained_distance), color='orange', linestyle='dashed', linewidth=1)
plt.axvline(30, color='red', linestyle='dashed', linewidth=1, label='Target Distance')
plt.xlabel('Distance (Å)')
plt.ylabel('Density')
plt.title('Distance between Residues 11 and 101')
plt.legend()
plt.show()
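The histogram gives a visual comparison; a few summary numbers can make it explicit. `summarize_against_target` is a hypothetical helper (not a STARLING function) that takes the per-conformation values returned by `rij(..., return_mean=False)`, assumed here to be array-like, and reports the mean, spread, and fraction of conformations within a chosen tolerance of the target. The 5 Å tolerance is an arbitrary illustrative choice, and the toy values are not STARLING output.

```python
import numpy as np


def summarize_against_target(values, target: float, tolerance: float = 5.0) -> dict:
    """Summarize how a per-conformation observable compares to a target value."""
    values = np.asarray(values, dtype=float)
    deviation = np.abs(values - target)  # distance of each conformation from the target
    return {
        "mean": float(values.mean()),
        "std": float(values.std()),
        "mean_abs_deviation": float(deviation.mean()),
        "fraction_within_tolerance": float(np.mean(deviation <= tolerance)),
    }


# Toy values to show the call.
summary = summarize_against_target([28.0, 31.0, 35.0, 45.0], target=30.0, tolerance=5.0)
print(summary)
```

In the notebook, one could compare `summarize_against_target(unconstrained_distance, target=30)` with `summarize_against_target(constrained_distance, target=30)`.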

## Constraining by target Radius of Gyration (Rg) or end-to-end distance (Re)

In this section we will also adjust the ``force_constant`` parameter when setting up the Rg constraint.
Lower values apply a softer constraint, whereas higher values enforce the target more strictly.

The default value is 2.0, and you can set ``force_constant`` for any of the constraints!
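This notebook does not spell out the functional form STARLING uses for ``force_constant``. To build intuition, the sketch below assumes a generic harmonic restraint, E = 0.5 * k * (x - target)^2, where k plays the role of the force constant. This is an illustration of how a force constant typically scales a restraint, not STARLING's implementation; the helper names and the Rg value of 50 Å are hypothetical.

```python
def harmonic_penalty(value: float, target: float, force_constant: float) -> float:
    """Assumed harmonic restraint energy: 0.5 * k * (value - target)**2."""
    return 0.5 * force_constant * (value - target) ** 2


def harmonic_pull(value: float, target: float, force_constant: float) -> float:
    """Negative derivative of the penalty; its sign points back toward the target."""
    return -force_constant * (value - target)


# A hypothetical conformation with Rg = 50 Å against a 40 Å target.
for k in (0.1, 2.0):
    print(f"k={k}: penalty={harmonic_penalty(50.0, 40.0, k):.2f}, "
          f"pull={harmonic_pull(50.0, 40.0, k):.2f}")
```

Under this assumed form, the pull toward the target scales linearly with k, so k = 0.1 pushes 20 times more gently than the default of 2.0, which is consistent with the softer Rg constraint used below.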

In [None]:
# set the constraint
# NOTE: targets are distances in angstroms (Å).
# NOTE: for Rg we lower the force_constant to 0.1 to make the constraint softer.
# this is something you can adjust depending on how the distribution of conformations looks for you.
rg_constraint = RgConstraint(target=40, force_constant=0.1)
re_constraint = ReConstraint(target=100)

In [None]:
re_constraint_ensemble = starling.generate(ctcf_idr, constraint=re_constraint, return_single_ensemble=True,
                                         conformations=100, steps=10)
rg_constraint_ensemble = starling.generate(ctcf_idr, constraint=rg_constraint, return_single_ensemble=True,
                                         conformations=100, steps=10)

In [None]:
# NOTE: when we plot this, you should notice that the Rg ensemble is not as close to its target
# as the Re ensemble is to its target. This is because we set a lower force constant to make the Rg constraint softer.
# You can adjust this value to get a distribution that works for you.

# now let's plot the Rg and Re values in two plots similar to above
# first we need to get the Rg and Re values for each ensemble
no_constraints_rg = no_constraints.radius_of_gyration(return_mean=False)
rg_constrained_rg = rg_constraint_ensemble.radius_of_gyration(return_mean=False)
no_constraints_re = no_constraints.end_to_end_distance(return_mean=False)
re_constrained_re = re_constraint_ensemble.end_to_end_distance(return_mean=False)

# now we can plot the histograms
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,5))
# Rg plot
ax1.hist(no_constraints_rg, bins=10, alpha=0.5, label='Unconstrained', color='blue', density=True, edgecolor='black')
ax1.hist(rg_constrained_rg, bins=10, alpha=0.5, label='Rg Constrained', color='orange', density=True, edgecolor='black')
ax1.axvline(np.mean(no_constraints_rg), color='blue', linestyle='dashed', linewidth=1)
ax1.axvline(np.mean(rg_constrained_rg), color='orange', linestyle='dashed', linewidth=1)
ax1.axvline(40, color='red', linestyle='dashed', linewidth=1, label='Target Rg')
ax1.set_xlabel('Radius of Gyration (Å)')
ax1.set_ylabel('Density')
ax1.set_title('Radius of Gyration (Rg)')
ax1.legend()
# Re plot
ax2.hist(no_constraints_re, bins=10, alpha=0.5, label='Unconstrained', color='blue', density=True, edgecolor='black')
ax2.hist(re_constrained_re, bins=10, alpha=0.5, label='Re Constrained', color='orange', density=True, edgecolor='black')
ax2.axvline(np.mean(no_constraints_re), color='blue', linestyle='dashed', linewidth=1)
ax2.axvline(np.mean(re_constrained_re), color='orange', linestyle='dashed', linewidth=1)
ax2.axvline(100, color='red', linestyle='dashed', linewidth=1, label='Target Re')
ax2.set_xlabel('End-to-End Distance (Å)')
ax2.set_ylabel('Density')
ax2.set_title('End-to-End Distance (Re)')
ax2.legend()
plt.show()
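The Rg and Re values plotted above are computed per conformation. The sketch below uses the standard definitions, Rg = sqrt(mean over residues of |r_i - r_center|^2) and Re = |r_last - r_first|, assuming equal-mass beads and coordinates of shape `(n_conformations, n_residues, 3)`; STARLING's own calculation may differ in detail. It also shows that Rg can be recovered from pairwise distances alone via Rg^2 = sum_ij r_ij^2 / (2 N^2), and Re is simply the distance between the first and last residue, which is handy if only distance maps are available. All function names are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords: np.ndarray) -> np.ndarray:
    """Rg of each conformation, assuming equal-mass beads; coords shape (n_conf, n_res, 3)."""
    centered = coords - coords.mean(axis=1, keepdims=True)     # subtract each conformation's center
    return np.sqrt((centered ** 2).sum(axis=-1).mean(axis=1))  # root-mean-square distance from center


def end_to_end_distance(coords: np.ndarray) -> np.ndarray:
    """Distance between the first and last residue of each conformation."""
    return np.linalg.norm(coords[:, -1, :] - coords[:, 0, :], axis=-1)


def distance_maps(coords: np.ndarray) -> np.ndarray:
    """All pairwise distances, shape (n_conf, n_res, n_res)."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)


def radius_of_gyration_from_distances(dmaps: np.ndarray) -> np.ndarray:
    """Rg from pairwise distances only: Rg^2 = sum_ij r_ij^2 / (2 N^2)."""
    n = dmaps.shape[-1]
    return np.sqrt((dmaps ** 2).sum(axis=(1, 2)) / (2 * n ** 2))


# Toy example (not STARLING output): one straight conformation, beads 3.8 Å apart.
chain = np.array([[[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]])
print(end_to_end_distance(chain))
print(radius_of_gyration(chain), radius_of_gyration_from_distances(distance_maps(chain)))
```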

## Helicity Constraint

In [None]:
# for the helicity constraint, we are going to constrain residues 151 to 161 (0-based indices 150 to 160)
helix_constraint = HelicityConstraint(resid_start=150, resid_end=160)
helix_constrained_ensemble = starling.generate(ctcf_idr, constraint=helix_constraint, return_single_ensemble=True,
                                              conformations=100, steps=10)

In [None]:
# now we are going to plot the distances between residues 151 and 161 in both ensembles
unconstrained_helix_distance = no_constraints.rij(150, 160, return_mean=False)
constrained_helix_distance = helix_constrained_ensemble.rij(150, 160, return_mean=False)
plt.figure(figsize=(8,5))
plt.hist(unconstrained_helix_distance, bins=10, alpha=0.5, label='Unconstrained', color='blue', density=True, edgecolor='black')
plt.hist(constrained_helix_distance, bins=10, alpha=0.5, label='Helix Constrained', color='orange', density=True, edgecolor='black')
plt.axvline(np.mean(unconstrained_helix_distance), color='blue', linestyle='dashed', linewidth=1)
plt.axvline(np.mean(constrained_helix_distance), color='orange', linestyle='dashed', linewidth=1)
plt.xlabel('Distance (Å)')
plt.ylabel('Density')
plt.title('Distance between Residues 151 and 161')
plt.legend()
plt.show()
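To put the helix-constrained distances in context, one can estimate the residue 151 to residue 161 distance (10 residues apart) expected for an ideal α-helix. The sketch below uses standard textbook helix geometry as assumptions: a rise of about 1.5 Å and a rotation of 100° per residue (3.6 residues per turn), with a Cα radius of roughly 2.3 Å. It also treats each STARLING residue as a single bead near the Cα position, which is an assumption. `ideal_helix_distance` is a hypothetical helper.

```python
import math


def ideal_helix_distance(n: int, rise: float = 1.5, radius: float = 2.3,
                         degrees_per_residue: float = 100.0) -> float:
    """Approximate Cα-Cα distance between residues i and i+n in an ideal α-helix."""
    angle = math.radians(n * degrees_per_residue)  # total rotation about the helix axis
    chord = 2.0 * radius * math.sin(angle / 2.0)   # separation across the helix cross-section
    axial = n * rise                               # separation along the helix axis
    return math.hypot(chord, axial)


print(f"i to i+1:  {ideal_helix_distance(1):.2f} Å")   # sanity check: close to 3.8 Å
print(f"i to i+10: {ideal_helix_distance(10):.2f} Å")  # residues 151 and 161
```

Under these assumptions, a fully helical segment would place residues 151 and 161 roughly 15 Å apart, which gives a reference point for reading the helix-constrained histogram.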