# Feature extraction from conformational ensembles in IDPET

This notebook shows how to use IDPET to extract structural features from conformational ensembles.

# Load data from PED

In [2]:
%load_ext autoreload
%autoreload 2
import matplotlib.pyplot as plt
from idpet.ensemble import Ensemble
from idpet.ensemble_analysis import EnsembleAnalysis
from idpet.visualization import *
from idpet.utils import set_verbosity

set_verbosity("INFO")  # Change verbosity level to show more information when performing the analysis.

ens_codes = [
    Ensemble('PED00156e001', database='ped'),
    Ensemble('PED00157e001', database='ped'),
    Ensemble('PED00158e001', database='ped'),
]
analysis = EnsembleAnalysis(ens_codes)
analysis.load_trajectories()
vis = Visualization(analysis);

Ensemble PED00156e001 already downloaded. Skipping.
File PED00156e001.pdb already exists. Skipping extraction.
Trajectory file already exists for ensemble PED00156e001.
Ensemble PED00157e001 already downloaded. Skipping.
File PED00157e001.pdb already exists. Skipping extraction.
Trajectory file already exists for ensemble PED00157e001.
Ensemble PED00158e001 already downloaded. Skipping.
File PED00158e001.pdb already exists. Skipping extraction.
Trajectory file already exists for ensemble PED00158e001.
Loading trajectory for PED00156e001...
Loading trajectory for PED00157e001...
Loading trajectory for PED00158e001...




## 1- Using `extract_features` method from `EnsembleAnalysis` class
This method extracts structural features that can then be used as input for dimensionality reduction algorithms. The available featurizations include:
- Phi and psi backbone dihedral angles ('phi_psi')
- Pairwise distances between Cα atoms ('ca_dist')
- Alpha angles, defined as the dihedral angles formed by four consecutive Cα atoms ('a_angle')
- trRosetta omega angle, defined as the inter-residue dihedral angle between the CA and CB atoms of a first residue and the CB and CA atoms of a second residue ('tr_omega')
- trRosetta phi angle, defined as the inter-residue angle between the CA and CB atoms of a first residue and the CB atom of a second residue ('tr_phi')
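To make the definition of the alpha angles concrete, here is a minimal NumPy sketch that computes the dihedral angle formed by every four consecutive Cα atoms. It is not IDPET's implementation: the function name `alpha_angles`, the `(n_frames, n_residues, 3)` input layout, and the sign convention are assumptions made for illustration, and the coordinates are synthetic.

```python
import numpy as np


def alpha_angles(ca):
    """Dihedral angles (radians) of each four consecutive points.

    ca: array of shape (n_frames, n_residues, 3) holding Cα coordinates.
    Returns an array of shape (n_frames, n_residues - 3).
    Hypothetical helper for illustration, not part of IDPET.
    """
    p0, p1, p2, p3 = ca[:, :-3], ca[:, 1:-2], ca[:, 2:-1], ca[:, 3:]
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond of each window.
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.arctan2(y, x)


# Synthetic coordinates: 5 conformations of a 10-residue chain.
rng = np.random.default_rng(0)
ca_demo = rng.normal(size=(5, 10, 3))
angles_demo = alpha_angles(ca_demo)
print(angles_demo.shape)  # (5, 7): one angle per window of four residues
```

A chain of N residues therefore yields N - 3 alpha angles per conformation.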

The example below extracts the pairwise distances between Cα atoms and stores the result in the `feat` variable. The result is a dictionary whose keys are the ensemble codes and whose values are NumPy arrays of pairwise distances, with one row per conformation.

In [35]:
feat = analysis.extract_features(featurization='ca_dist')
print(f'feat is a {type(feat)} \nthe keys are: {feat.keys()},\nthe shape of the data is: {feat["PED00156e001"].shape} ,\nthe type of the data is: {type(feat["PED00156e001"])}')

Performing feature extraction for Ensemble: PED00156e001.
Transformed ensemble shape: (100, 1653)
Performing feature extraction for Ensemble: PED00157e001.
Transformed ensemble shape: (100, 1653)
Performing feature extraction for Ensemble: PED00158e001.
Transformed ensemble shape: (88, 1653)


feat is a <class 'dict'> 
the keys are: dict_keys(['PED00156e001', 'PED00157e001', 'PED00158e001']),
the shape of the data is: (100, 1653) ,
the type of the data is: <class 'numpy.ndarray'>
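Because every array in the dictionary has the same number of columns, the per-ensemble arrays can be stacked into a single matrix before dimensionality reduction. The sketch below uses random arrays with the shapes printed above in place of the real `feat`. The names `feat_demo`, `X`, and `labels` are illustrative only.

```python
import numpy as np

# Stand-in for the dictionary returned above: same keys and shapes, random values.
rng = np.random.default_rng(0)
n_conformations = {"PED00156e001": 100, "PED00157e001": 100, "PED00158e001": 88}
feat_demo = {code: rng.random((n, 1653)) for code, n in n_conformations.items()}

# Stack all conformations row-wise and remember which ensemble each row came from.
X = np.concatenate(list(feat_demo.values()), axis=0)
labels = np.concatenate(
    [np.full(len(arr), code) for code, arr in feat_demo.items()]
)

print(X.shape, labels.shape)  # (288, 1653) (288,)
```

Keeping a `labels` array alongside `X` makes it possible to color or split the reduced coordinates by ensemble afterwards.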


## 2- Other options in `extract_features`

- `normalize`: if `True`, the extracted features are normalized. This option applies only to `ca_dist`.
- `min_sep`: Minimum sequence separation between the two residues of a pair, available for `ca_dist`, `tr_omega`, and `tr_phi`. Default is 2.
- `max_sep`: Maximum sequence separation between the two residues of a pair, available for `ca_dist`, `tr_omega`, and `tr_phi`. Default is None.

The example below shows how to set these options when extracting pairwise distances between Cα atoms.

In [38]:
feat = analysis.extract_features(featurization='ca_dist', 
                                 normalize=True,
                                 min_sep=2,
                                 max_sep=None)


Performing feature extraction for Ensemble: PED00156e001.
Transformed ensemble shape: (100, 1653)


Performing feature extraction for Ensemble: PED00157e001.
Transformed ensemble shape: (100, 1653)
Performing feature extraction for Ensemble: PED00158e001.
Transformed ensemble shape: (88, 1653)
Concatenated featurized ensemble shape: (288, 1653)
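The 1653 columns reported above are consistent with a 59-residue chain when only residue pairs at least `min_sep=2` positions apart are kept, since 57 × 58 / 2 = 1653. The following NumPy sketch reproduces that count on synthetic coordinates. It assumes that `min_sep` and `max_sep` are sequence separations `j - i`, that `max_sep` is inclusive, and that the chain has 59 residues. None of these assumptions is confirmed by the text above, and `ca_pair_distances` is a hypothetical helper, not an IDPET function.

```python
import numpy as np


def ca_pair_distances(ca, min_sep=2, max_sep=None):
    """Pairwise Cα distances for residue pairs (i, j) with j - i >= min_sep.

    ca: array of shape (n_frames, n_residues, 3).
    max_sep is treated as an inclusive upper bound on j - i (assumption).
    Returns an array of shape (n_frames, n_pairs).
    """
    n_res = ca.shape[1]
    # Upper-triangle indices starting at the min_sep-th diagonal.
    i, j = np.triu_indices(n_res, k=min_sep)
    if max_sep is not None:
        keep = (j - i) <= max_sep
        i, j = i[keep], j[keep]
    return np.linalg.norm(ca[:, i] - ca[:, j], axis=-1)


# Synthetic coordinates: 4 conformations of a 59-residue chain.
rng = np.random.default_rng(0)
ca_demo = rng.normal(size=(4, 59, 3))

print(ca_pair_distances(ca_demo).shape)             # (4, 1653)
print(ca_pair_distances(ca_demo, max_sep=2).shape)  # (4, 57)
```

Raising `min_sep` drops pairs that are close in sequence, whose distances vary little between conformations, while `max_sep` restricts the features to local contacts.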


## 3- Using `get_features` method from `EnsembleAnalysis` class

In [43]:
feat = analysis.get_features(featurization='tr_phi')