# Working with Ensembles of structures

Ensembles allow several structures to be analyzed simultaneously. The following ensembles are currently supported:

- Ensemble: a blank Ensemble to which Structure objects can be appended.
- UniprotEnsemble: given a UniProt code, all the Structures for this protein are automatically loaded into the Ensemble.
- PdbFlexEnsemble: an alpha-carbon trajectory downloaded from [pdbFlex](https://pdbflex.org/).

All ensembles will try to align and renumber structures to maintain consistency.
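To make the idea of renumbering concrete, here is a minimal sketch that maps the residues of one chain onto the numbering of a reference sequence. It is not pyfoldx's actual implementation: the helper `renumber_to_reference` is hypothetical, it uses the standard library's `difflib.SequenceMatcher` rather than a proper sequence alignment, and the `observed` chain is an invented example in which the first two residues are missing.

```python
from difflib import SequenceMatcher


def renumber_to_reference(reference, observed, first_position=1):
    """Return, for each residue in `observed`, its position in `reference`.

    Residues that cannot be matched to the reference get None.
    Hypothetical helper for illustration only.
    """
    numbering = [None] * len(observed)
    matcher = SequenceMatcher(a=reference, b=observed, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block says: reference[a:a+size] == observed[b:b+size]
        for offset in range(block.size):
            numbering[block.b + offset] = first_position + block.a + offset
    return numbering


reference = "MTEYKLVVVG"  # first ten residues of HRAS (one-letter codes)
observed = "EYKLVVVG"     # hypothetical chain lacking the first two residues

numbering = renumber_to_reference(reference, observed)
print(numbering)
```

With a shared numbering like this, position 3 refers to the same glutamate in every structure, which is what makes per-residue comparisons across an ensemble meaningful.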

In [1]:
from pyfoldx.structure import Ensemble, UniprotEnsemble
from pyfoldx.structure import Structure


In [None]:
# The package itself must be imported before its version can be printed
import pyfoldx
print(pyfoldx.version)

In [2]:
# Some other imports to work with the data and to plot
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

# We want to print full tables
pd.set_option("display.max_rows", 1000, "display.max_columns", 1000)
# We ignore warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')


# Creating a UniProt Ensemble

In [3]:
# We will create an ensemble of all the crystal structures for UniProt P01112 (GTPase HRAS) with good resolution (< 1.6 Å)
ensemblePath = "/home/lradusky/Downloads/P01112/"
t = UniprotEnsemble("P01112", ensemblePath, just_xray=True, max_resolution=1.6)


Note that working with a large set of structures can be memory-intensive.

Below, we save the ensemble to a PDB file and then show how an Ensemble can be created from that saved file.

In [4]:
t.toPdbFile(ensemblePath+"ensemble.pdb")

In [5]:
tFile = "ensemble.pdb"

# We overwrite the t object with a basic Ensemble object loaded from the file saved above.
t = Ensemble("P01112", ensemblePath, ensemblePath+tFile)

In [6]:
len(t.frames)

31

# Computing energy on ensembles

To compute the $\Delta G$ of each structure within the ensemble, we just have to call the *getTotalEnergy* method.

In [7]:
totalEnergyDf = t.getTotalEnergy()

Computing total energy for ensemble...
  0%|          | 0/31 [00:00<?, ?it/s]


TypeError: expected str, bytes or os.PathLike object, not NoneType

*getTotalEnergy* returns a pandas DataFrame with the total energy for each structure and all the computed energy terms.

In [8]:
totalEnergyDf 

structure,total,backHbond,sideHbond,energy_VdW,electro,energy_SolvP,energy_SolvH,energy_vdwclash,entrop_sc,entrop_mc,sloop_entropy,mloop_entropy,cis_bond,energy_torsion,backbone_vdwclash,energy_dipole,water,disulfide,energy_kon,partcov,energyIonisation,entr_complex
121P_A,46.5261,-110.949,-27.7819,-189.976,-9.65533,273.043,-247.383,13.0638,99.1815,246.653,0,0,0,6.62009,108.465,-2.80515,0,0,0,-3.7553,0.271471,0
5P21_A,63.8615,-109.438,-24.8353,-198.859,-15.5729,301.747,-253.395,16.9063,99.9202,253.316,0,0,0,4.3484,104.747,-4.45109,0,0,0,-6.2234,0.398109,0
821P_A,56.8177,-112.771,-22.8938,-202.253,-14.0575,301.024,-258.982,13.4614,99.9937,255.124,0,0,0,7.84693,115.566,-4.25729,0,0,0,-5.60047,0.18319,0
3TGP_A,96.8114,-101.32,-24.4528,-198.879,-13.2404,299.272,-255.001,28.2767,98.8799,257.332,0,0,0,18.8562,111.526,-3.84169,0,0,0,-9.24046,0.171218,0
1QRA_X,75.0375,-112.564,-26.6937,-208.336,-15.53,315.016,-267.265,24.3109,105.449,254.47,0,0,0,18.8569,115.312,-3.64739,0,0,0,-9.43555,0.405796,0
1CTQ_A,55.5804,-115.263,-32.5994,-206.345,-13.6862,304.176,-265.478,27.3148,104.372,249.903,0,0,0,18.1108,110.635,-5.49168,0,0,0,-9.64484,0.212345,0
3OIW_A,63.7516,-113.204,-27.7604,-209.165,-17.2555,331.157,-268.327,27.0207,107.819,254.285,0,0,0,6.21235,114.713,-4.3492,0,0,0,-23.0598,0.379287,0
3K8Y_X,52.2302,-114.353,-31.8432,-206.061,-15.0242,321.396,-264.497,17.5806,107.422,255.162,0,0,0,6.98105,113.708,-3.47403,0,0,0,-21.6447,0.584865,0
4DLU_A,46.87,-116.769,-31.0041,-207.93,-16.344,324.147,-266.702,18.5902,107.381,253.654,0,0,0,7.01391,114.732,-4.82815,0,0,0,-21.2088,0.868752,0
3OIU_A,45.232,-112.323,-29.897,-207.381,-16.8151,329.003,-267.914,17.3757,105.054,251.645,0,0,0,6.99962,115.792,-3.21989,0,0,0,-27.7124,0.418258,0
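Because the result is an ordinary pandas DataFrame, the usual pandas tools apply. The sketch below ranks structures by total energy, where a lower value is generally read as a more stable structure. To stay self-contained it does not call pyfoldx: `total_energy` is a stand-in built by hand from three rows and three columns of the table above.

```python
import pandas as pd

# Stand-in for the DataFrame returned by getTotalEnergy():
# three rows and three columns copied from the table above.
total_energy = pd.DataFrame(
    {
        "total": [46.5261, 63.8615, 96.8114],
        "energy_vdwclash": [13.0638, 16.9063, 28.2767],
        "energy_torsion": [6.62009, 4.3484, 18.8562],
    },
    index=["121P_A", "5P21_A", "3TGP_A"],
)

# Sort from lowest to highest total energy
ranked = total_energy.sort_values("total")

most_stable = ranked.index[0]
least_stable = ranked.index[-1]
spread = ranked["total"].iloc[-1] - ranked["total"].iloc[0]

print(ranked)
print(f"lowest: {most_stable}, highest: {least_stable}, spread: {spread:.4f}")
```

Keeping the individual terms next to the total lets you see which contributions (here the van der Waals clash and torsion terms) accompany a high total energy.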


The same applies to the energy at the residue level. In this case, the columns are the structures within the ensemble and the rows are the residues. If a position differs in sequence in any of the structures, it will occupy a separate row. Positions not resolved in some of the crystals will have an *np.nan* value in the corresponding cell.

In [9]:
residueEnergy = t.getResiduesEnergy()

Computing residue energy for ensemble...
100%|██████████| 31/31 [01:28<00:00,  2.85s/it]
Energy computed.


In [10]:
residueEnergy

Code,Mol,Pos,121P_A,5P21_A,821P_A,3TGP_A,1QRA_X,1CTQ_A,3OIW_A,3K8Y_X,4DLU_A,3OIU_A,3L8Z_A,5WDP_X,1ZW6_A,4DLY_X,4DLR_A,5B2Z_A,4DLV_A,3RRZ_A,3RSO_A,2RGC_A,3RS0_A,3RRY_A,2RGB_X,5WDQ_A,5B30_A,2QUZ_A,5VBE_X,2RGE_A,5E95_A,2RGG_A,6MQT_A
MET,A,1,-1.00937,-1.09656,-0.962419,-0.816734,-0.904175,-1.17527,-1.09956,-1.09298,-1.06901,-1.04377,-1.03429,-1.01167,-0.687371,-1.2296,-1.17605,-1.03211,-1.16488,0.53581,-0.958834,-0.64168,-0.568456,-1.14554,-1.21772,-1.14947,-0.632468,-1.05389,-0.940336,-1.11413,,-1.14099,-0.353219
THR,A,2,0.16575,0.255197,0.278736,0.158651,0.267353,0.266082,0.230636,0.190952,0.24392,0.115735,0.324846,0.508138,0.774228,0.0552634,0.283894,1.11235,0.552353,2.47087,0.558106,0.381045,1.04143,0.486063,0.59046,0.328507,0.208252,0.140549,0.389076,0.433308,0.0718744,0.182309,0.52026
GLU,A,3,1.574,1.49316,1.59569,1.64146,1.56853,1.85307,1.60162,1.89767,1.31815,0.462201,1.91283,1.26551,1.82585,4.26374,1.7875,5.89218,1.75358,3.34686,2.14298,6.12593,4.26114,4.02438,5.13781,1.43765,2.06619,1.13059,1.15615,1.51788,2.03497,1.4627,1.99463
TYR,A,4,-2.65386,-2.68354,-2.73943,-2.80166,-2.67664,-2.63134,-2.49822,-2.54609,-2.91197,-2.51025,-2.75674,-2.79331,-2.42961,-2.85335,-2.51474,-2.87491,-2.61819,-2.51294,-2.4955,-2.35094,-2.54226,-2.73256,-2.57843,-2.63231,-2.58982,-2.86603,-2.86621,-2.75904,-2.80485,-2.64865,-3.07458
LYS,A,5,2.02746,1.09628,2.23902,0.814047,-0.0145071,-0.672768,-0.836744,-0.895402,-0.963739,-1.06931,-0.102721,-0.512979,-0.308176,-0.668866,-0.633515,0.70657,0.105684,-0.554472,-0.617707,-0.230735,-0.855163,0.0517106,-0.743987,-0.565568,-0.549234,0.294052,0.611332,-0.161846,-0.423205,0.187568,1.75785
LEU,A,6,-2.96297,-2.16659,-2.82231,-2.29118,-2.2915,-2.13828,-2.13788,-2.09669,-2.16809,-2.14972,-2.47039,-2.19721,-2.18507,-2.29431,-2.31295,-2.20388,-2.23993,-2.39006,-2.33069,-2.16521,-2.323,-2.24767,-2.17337,-2.33719,-2.34033,-2.20341,-2.42103,-2.23931,-2.59068,-2.27295,-2.36804
VAL,A,7,-2.64768,-2.37909,-2.62514,-2.52318,-2.56675,-2.44903,-2.4319,-2.40144,-2.4549,-2.46928,-2.58469,-2.45416,-2.39016,-2.65083,-2.52482,-2.52381,-2.43129,-2.51218,-2.49545,-2.55438,-2.47564,-2.58438,-2.52225,-2.46491,-2.17152,-2.47115,-2.24634,-2.38916,-2.58668,-2.3547,-2.36014
VAL,A,8,-2.20235,-1.97276,-2.28684,-1.96786,-2.19515,-2.24931,-2.4518,-2.44592,-2.54166,-2.50425,-2.73787,-2.43128,-2.50108,-2.40046,-2.57796,-2.54098,-2.39062,-2.58386,-2.38098,-2.4392,-2.57953,-2.42296,-2.27619,-2.56478,-2.40677,-2.20718,-1.96839,-2.03714,-2.75586,-2.15644,-2.58869
VAL,A,9,-1.921,-1.87061,-1.83184,-1.90365,-2.07312,-2.0986,-2.11818,-2.20432,-2.17303,-2.19532,-2.33569,-2.11345,-2.12842,-2.29118,-2.17153,-2.29555,-1.73522,-2.25977,-2.12477,-2.2044,-1.95225,-2.25546,-2.11174,-2.2722,-1.82388,-1.61503,-1.53342,-1.90161,-2.08652,-1.91522,-1.64285
GLY,A,10,2.01446,2.03398,2.01897,2.10976,1.98722,2.01123,2.10914,2.13396,2.12133,2.05727,2.17434,2.0197,2.08339,2.0547,2.04309,2.04617,2.12133,2.05298,2.12201,2.07119,2.04133,2.07565,2.12589,2.04451,2.0905,1.71911,1.78196,2.04828,1.86463,1.99629,1.85245
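A per-residue table like this one can be summarized across structures with row-wise statistics, which by default skip the missing values left by unresolved positions. The sketch below is a stand-in that does not call pyfoldx: `residue_energy` is built by hand from three rows and three columns of the table above, and it assumes the rows are indexed by (Code, Mol, Pos) as the header suggests.

```python
import numpy as np
import pandas as pd

# Stand-in for the DataFrame returned by getResiduesEnergy(),
# using values copied from the table above.
index = pd.MultiIndex.from_tuples(
    [("MET", "A", 1), ("THR", "A", 2), ("GLU", "A", 3)],
    names=["Code", "Mol", "Pos"],
)
residue_energy = pd.DataFrame(
    {
        "121P_A": [-1.00937, 0.16575, 1.574],
        "5E95_A": [np.nan, 0.0718744, 2.03497],  # MET 1 is not resolved here
        "6MQT_A": [-0.353219, 0.52026, 1.99463],
    },
    index=index,
)

# Row-wise statistics across structures; NaN cells are skipped by default
summary = pd.DataFrame(
    {
        "mean": residue_energy.mean(axis=1),
        "std": residue_energy.std(axis=1),
        "n_resolved": residue_energy.count(axis=1),
    }
)

print(summary)
```

The `n_resolved` column is worth keeping alongside the mean, since an average over two structures is less informative than one over the full ensemble.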
