# Extract eigenvectors
Extract eigenvectors from DDPT elastic network model (ENM) simulations.

## Purpose
Explore ways to extract eigenvector data from the `matrix.eigenfacs` file.

## Methodology
Use NumPy and/or pandas to parse the data and save it as `.csv` files.
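A minimal sketch of the parsing idea, assuming each mode is stored as a header line starting with `VECTOR` (mode number second, eigenvalue last), a dashed separator line and one `x y z` line per bead. This is the layout that the fixed-width slicing later in this notebook relies on. The sketch splits on whitespace instead of fixed columns. `iter_modes` and `SAMPLE` are hypothetical names, the `VALUE` label in the sample is a guess that the parser ignores, and the second mode's numbers are made up.

```python
import numpy as np

# Illustrative excerpt (2 modes x 2 beads); real files hold many more blocks
SAMPLE = """\
 VECTOR    1       VALUE  4.5288E-09
 -----------------------------------
  -0.049686   0.003702  -0.003721
  -0.043219   0.005798  -0.003236
 VECTOR    2       VALUE  4.7153E-09
 -----------------------------------
   0.010000   0.020000   0.030000
   0.040000   0.050000   0.060000
"""


def iter_modes(lines):
    """Yield (mode, eigenvalue, components) per VECTOR block; components has shape (no_beads, 3)."""
    header, rows = None, []
    for line in lines:
        tokens = line.split()
        if not tokens or set(line.strip()) == {"-"}:
            continue  # skip blank lines and dashed separators
        if tokens[0] == "VECTOR":
            if header is not None:  # close the previous block
                yield header[0], header[1], np.array(rows, dtype=float)
            # Mode number is the second token, eigenvalue the last one
            header, rows = (int(tokens[1]), float(tokens[-1])), []
        else:
            rows.append([float(t) for t in tokens])
    if header is not None:  # close the last block
        yield header[0], header[1], np.array(rows, dtype=float)


for mode, eigenvalue, components in iter_modes(SAMPLE.splitlines()):
    print(mode, eigenvalue, components.shape)
```

A generator keeps only one block in memory at a time, which matters little for a file of this size but makes each block easy to inspect on its own.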

## WIP - improvements
Use this section only if the notebook is not final.

Notable TODOs:
- todo 1;
- todo 2;
- todo 3.

## Results
Describe and comment on the most important results.

## Suggested next steps
State suggested next steps based on the results obtained in this notebook.

# Setup

## Library import
We import all the required Python libraries.

In [1]:
# Data manipulation
import pandas as pd
import numpy as np
import os
import glob
from biopandas.pdb import PandasPdb
from pymol import cmd

# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# Visualizations
import plotly
import plotly.graph_objs as go
import plotly.offline as ply
plotly.offline.init_notebook_mode(connected=True)

import matplotlib.pyplot as plt
import seaborn as sns

# Autoreload extension
if 'autoreload' not in get_ipython().extension_manager.loaded:
    %load_ext autoreload
    
%autoreload 2

### Change directory
If JupyterLab starts in the `notebooks` folder, move up to the project root so that relative paths resolve correctly.

In [2]:
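# Move up one level only when the working directory is the notebooks/ folder itself
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")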

## Local library import
We import all the required local libraries.

In [3]:
# Include local library paths
import sys
sys.path.append("./src")  # make modules in src/ importable by name

# Import local libraries
import src.utilities as utils

# Parameter definition
We set all relevant parameters for our notebook. By convention, parameters are uppercase, while all other variables follow the usual Python naming conventions.

In [4]:
EIGENFACS_FILEPATH = "data/raw/matrix.eigenfacs"


# Data import
We retrieve all the required data for the analysis.

In [5]:
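with open(EIGENFACS_FILEPATH) as file:
    # Keep one string per line, without trailing newlines
    eigenfacs = file.read().splitlines()

# np.loadtxt cannot read the raw file: the VECTOR headers and separator lines are not numeric
# eigenfacs = np.loadtxt(EIGENFACS_FILEPATH)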
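Calling `np.loadtxt` on the whole file fails because of the non-numeric header and separator lines. A minimal sketch of a workaround, assuming the same layout as the slicing code in this notebook (a header starting with `VECTOR`, a dashed separator, then `x y z` lines): drop the non-numeric lines, parse the rest in a single `np.loadtxt` call, and derive the number of modes and beads from the header count. Unlike a loop that stops at the second header, this also works for a file with a single mode. `is_header`, `is_separator` and the sample `lines` are hypothetical, and the second mode's numbers are made up.

```python
import numpy as np

# Illustrative excerpt standing in for the notebook's `eigenfacs` list of lines
lines = [
    " VECTOR    1       VALUE  4.5288E-09",
    " -----------------------------------",
    "  -0.049686   0.003702  -0.003721",
    "  -0.043219   0.005798  -0.003236",
    " VECTOR    2       VALUE  4.7153E-09",
    " -----------------------------------",
    "   0.010000   0.020000   0.030000",
    "   0.040000   0.050000   0.060000",
]


def is_header(line):
    return line.strip().startswith("VECTOR")


def is_separator(line):
    # Only dashes; a plain startswith("-") would also catch negative numbers
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {"-"}


# Keep only the component lines; np.loadtxt accepts a list of strings
data_lines = [
    line for line in lines
    if line.strip() and not is_header(line) and not is_separator(line)
]
components = np.loadtxt(data_lines)  # shape (no_modes * no_beads, 3)

no_modes = sum(is_header(line) for line in lines)
no_beads, remainder = divmod(len(components), no_modes)
if remainder:
    raise ValueError("Component lines do not split evenly across modes")
```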

In [29]:
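# Count the number of elastic network (EN) beads.
# Each mode block starts with a " VECTOR ..." header line and a "---" separator line,
# followed by one line per bead.
if eigenfacs[0][1:7] != "VECTOR":
    raise ValueError(f"Unexpected first line in {EIGENFACS_FILEPATH}: {eigenfacs[0]!r}")
i = 0
# Walk through the first block until the next VECTOR header
while eigenfacs[i + 2][1:7] != "VECTOR":
    i += 1
no_beads = i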
    
    
# Extract eigenvalues
eigenvalues = eigenfacs[::no_beads+2]
mode_numbers = [int(line[8:12]) for line in eigenvalues]
eigenvalues = [float(line[-10:]) for line in eigenvalues]

eigenvalues = pd.DataFrame(data=eigenvalues, index=mode_numbers)
eigenvalues.columns = ['eigenvalue']
eigenvalues.index.name = 'mode'

print("# of beads = {}\n# of modes = {}".format(no_beads, len(mode_numbers)))
eigenvalues.head(7)


# of beads = 399
# of modes = 1197


| mode | eigenvalue |
|-----:|-----------:|
| 1 | 4.5288e-09 |
| 2 | 4.7153e-09 |
| 3 | 4.7536e-09 |
| 4 | 4.8315e-09 |
| 5 | 4.9739e-09 |
| 6 | 5.0087e-09 |
| 7 | 0.00075857 |
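The first six eigenvalues in the table are about 5e-09, while mode 7 jumps to about 7.6e-04. This matches the usual ENM picture: a free network has six zero-frequency modes (three rigid-body translations and three rotations), so internal motions start at mode 7. A small sketch that locates the first internal mode; the `ZERO_TOL = 1e-6` cutoff is an assumption chosen to sit between the two groups of values shown here, not a DDPT setting.

```python
import pandas as pd

# Eigenvalues copied from the eigenvalues.head(7) output of this notebook
eigenvalues = pd.DataFrame(
    {"eigenvalue": [4.5288e-09, 4.7153e-09, 4.7536e-09, 4.8315e-09,
                    4.9739e-09, 5.0087e-09, 0.00075857]},
    index=pd.Index(range(1, 8), name="mode"),
)

# Hypothetical cutoff: eigenvalues below it are treated as numerically zero
ZERO_TOL = 1e-6
is_trivial = eigenvalues["eigenvalue"] < ZERO_TOL
n_trivial = int(is_trivial.sum())
first_internal_mode = int(is_trivial[~is_trivial].index[0])
print(f"{n_trivial} near-zero modes; first internal mode: {first_internal_mode}")
```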


In [None]:
os.makedirs("tmp", exist_ok=True)  # to_csv fails if the target folder does not exist
eigenvalues.to_csv("tmp/eigenvalues.csv")
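pandas does not create missing folders when writing, so the output directory has to exist before `to_csv` is called. A sketch of a write and read-back check, using a temporary folder as a stand-in for `tmp/` and two of the eigenvalues shown above; reading with `index_col="mode"` restores the mode index that `to_csv` wrote as the first column.

```python
import os
import tempfile

import pandas as pd

eigenvalues = pd.DataFrame(
    {"eigenvalue": [4.5288e-09, 0.00075857]},
    index=pd.Index([1, 7], name="mode"),
)

out_dir = os.path.join(tempfile.mkdtemp(), "tmp")  # stand-in for the notebook's tmp/ folder
os.makedirs(out_dir, exist_ok=True)
path = os.path.join(out_dir, "eigenvalues.csv")
eigenvalues.to_csv(path)  # the named index becomes a "mode" column

restored = pd.read_csv(path, index_col="mode")
pd.testing.assert_frame_equal(restored, eigenvalues)  # raises if the round trip changed anything
```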

In [88]:
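no_modes = len(mode_numbers)
# Each mode occupies no_beads + 2 consecutive lines (np.split raises if they do not divide evenly)
eigenvectors = np.split(np.array(eigenfacs), no_modes)
# Drop the VECTOR header and the --- separator of each block
eigenvectors = [block[2:] for block in eigenvectors]
# Parse the remaining "x y z" lines into a (no_modes * no_beads, 3) float array
eigenvectors = np.loadtxt(np.concatenate(eigenvectors))

# Label every row with its mode number and its 1-based bead number
mode = np.repeat(mode_numbers, no_beads)
bead_number = np.tile(np.arange(no_beads) + 1, no_modes)
eigenvectors = np.column_stack((mode, bead_number, eigenvectors))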
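Because the blocks are stored mode by mode, row `k` of the parsed component array belongs to mode `mode_numbers[k // no_beads]` and to bead `k % no_beads + 1`. `np.repeat` and `np.tile` generate exactly these labels. A toy example with two modes of three beads (the component values are placeholders):

```python
import numpy as np

# Toy sizes; the notebook's file has 1197 modes and 399 beads
mode_numbers = [1, 2]
no_modes, no_beads = len(mode_numbers), 3

mode = np.repeat(mode_numbers, no_beads)                  # [1 1 1 2 2 2]
bead_number = np.tile(np.arange(no_beads) + 1, no_modes)  # [1 2 3 1 2 3]

# Stand-in for the parsed (no_modes * no_beads, 3) component array
components = np.arange(no_modes * no_beads * 3, dtype=float).reshape(-1, 3)
table = np.column_stack((mode, bead_number, components))  # shape (6, 5)

# Row k belongs to mode_numbers[k // no_beads] and bead k % no_beads + 1
k = 4
assert table[k, 0] == mode_numbers[k // no_beads]
assert table[k, 1] == k % no_beads + 1
```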

In [86]:
eigenvectors.shape

(477603, 3)

In [91]:
eigenvectors = pd.DataFrame(
    data=eigenvectors, columns=['mode', 'bead_number', 'x_comp', 'y_comp', 'z_comp']
).astype({'mode': int, 'bead_number': int})
eigenvectors.head()

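|   | mode | bead_number | x_comp | y_comp | z_comp |
|--:|-----:|------------:|-------:|-------:|-------:|
| 0 | 1 | 1 | -0.049686 | 0.003702 | -0.003721 |
| 1 | 1 | 2 | -0.043219 | 0.005798 | -0.003236 |
| 2 | 1 | 3 | -0.03582 | 0.004682 | -0.001633 |
| 3 | 1 | 4 | -0.03914 | 0.010346 | -0.003891 |
| 4 | 1 | 5 | -0.041187 | 0.016402 | -0.006047 |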
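Since the rows are ordered mode by mode and, within a mode, bead by bead, the three component columns can also be reshaped into a `(no_modes, no_beads, 3)` array. Selecting a mode then becomes an index lookup instead of a boolean filter over the whole table. A sketch on a toy DataFrame with the same column names; applying it to the real table assumes this row ordering is kept (sorting or filtering rows first would break it). `cube` and `mode_index` are hypothetical names.

```python
import numpy as np
import pandas as pd

no_modes, no_beads = 2, 3
df = pd.DataFrame({
    "mode": np.repeat([7, 8], no_beads),
    "bead_number": np.tile(np.arange(1, no_beads + 1), no_modes),
    "x_comp": np.arange(0, 6, dtype=float),
    "y_comp": np.arange(6, 12, dtype=float),
    "z_comp": np.arange(12, 18, dtype=float),
})

# One (no_beads, 3) block per mode, in the order the modes appear
cube = df[["x_comp", "y_comp", "z_comp"]].to_numpy().reshape(no_modes, no_beads, 3)
mode_index = {m: i for i, m in enumerate(df["mode"].unique())}
mode_7 = cube[mode_index[7]]
```

For the real data, with consecutive modes starting at 1, mode 7 would sit at `cube[6]`.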


In [72]:
eigenvectors.to_csv("tmp/eigenvectors.csv", index=False)

In [73]:
eigenvectors.loc[eigenvectors['mode'] == 7, ['x_comp', 'y_comp', 'z_comp']].to_numpy()

array([[-0.048104 ,  0.0054316,  0.0041839],
       [-0.043453 ,  0.010777 ,  0.0064199],
       [-0.036101 ,  0.0083899,  0.011287 ],
       ...,
       [-0.025943 ,  0.0092304, -0.053933 ],
       [-0.027371 ,  0.0096976, -0.059186 ],
       [-0.028697 , -0.0029617, -0.062537 ]])
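A quick sanity check on extracted modes: the ENM Hessian is real and symmetric, so its eigenvectors are mutually orthogonal, and if the file stores them normalized to unit length (an assumption worth verifying), the Gram matrix of the flattened mode vectors should be close to the identity. Components are printed with only a few significant digits, so on real data expect deviations of roughly 1e-4 rather than machine precision. The sketch below uses eigenvectors of a random symmetric matrix as stand-in data; `orthonormality_error` is a hypothetical helper.

```python
import numpy as np


def orthonormality_error(cube):
    """Max deviation of the mode-by-mode Gram matrix from the identity.

    cube: array of shape (no_modes, no_beads, 3), one eigenvector per mode.
    """
    flat = cube.reshape(cube.shape[0], -1)  # one 3 * no_beads vector per row
    gram = flat @ flat.T
    return np.abs(gram - np.eye(len(flat))).max()


# Stand-in data: eigenvectors of a random symmetric 6x6 matrix (2 beads x 3 components)
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
_, vecs = np.linalg.eigh(a + a.T)  # columns are orthonormal eigenvectors
cube = vecs.T.reshape(6, 2, 3)     # 6 modes, 2 beads, xyz components
print(orthonormality_error(cube))
```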

# Data processing
Put the core of the notebook here. Feel free to split this section further into subsections.
