Here we are trying to understand the weights and how to use them for averaging. The first thing we need to do is retrieve the weight of every snapshot in each histogram grid we extracted.


This may seem a bit backwards, since we made the histograms first, but we essentially need to back-calculate from the cluster logs where each snapshot sits in `cv_coordinates`, so that we can then find the weight per snapshot.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import sys
import os
import pandas as pd
import MDAnalysis as mda

Obviously, it would be much faster to do these steps together, but I want to keep them separate for clarity.

In [2]:
name_sim = "influx_BFRU_gate_CV"
simulation_directory = f"/data2/GLUT5_string/string/string_sims/TMD_initial_path/{name_sim}"

In [3]:
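def process_grid_txt(grid, start = 100):
    ## read the confout names (one per line) logged for this histogram grid
    confouts = np.loadtxt(f'../confout_files/FES_grids_confouts/influx_BFRU_gate_CV/histogram_{grid}/cluster_logs.txt',
                          dtype = str)
    processed_confout = []
    ## np.atleast_1d keeps the loop working if the log contains a single line
    for confout in np.atleast_1d(confouts):
        ## name format: <iteration>-<bead>-<prefix><swarm>.<extension>
        confout = confout.split('.')[0]
        confout = confout.split('-')
        iteration = int(confout[0]) - start  ## iterations are counted from `start`
        bead = int(confout[1])
        swarm = int(confout[2][1:])  ## drop the one-character prefix
        processed_confout.append([iteration, bead, swarm])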
       
    return(processed_confout)
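To make the parsing step concrete, here is a minimal sketch of the same logic applied to a single name. The file name `272-14-s5.gro` is hypothetical: it only follows the `<iteration>-<bead>-<prefix><swarm>.<extension>` pattern that the splitting above implies, and `parse_confout_name` is a helper introduced just for this illustration.

```python
def parse_confout_name(name, start=100):
    """Return [iteration - start, bead, swarm] for one confout name."""
    stem = name.split('.')[0]                  # drop the file extension
    iteration, bead, swarm = stem.split('-')   # three dash-separated fields
    # swarm[1:] drops the one-character prefix in front of the swarm number
    return [int(iteration) - start, int(bead), int(swarm[1:])]

parsed = parse_confout_name('272-14-s5.gro')   # hypothetical file name
print(parsed)
```

The iteration is stored relative to `start`, which is what the index arithmetic in the next cell expects.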

In [4]:
def get_original_CV(processed_confout):

    n_beads = 14
    n_swarms = 32

    cv_proj_converted = []
    for grid in processed_confout:
        ## each iteration spans n_beads * n_swarms rows of cv_proj
        iteration_start_cv_proj = grid[0] * (n_beads*n_swarms)
        ## subtract 1 from the bead number: beads are counted from 0 in cv_proj,
        ## but from 1 in the confout file names (I checked this very carefully)
        bead_start_cv_proj = iteration_start_cv_proj + ((grid[1]-1) * n_swarms)
        swarm_start_cv_proj = bead_start_cv_proj + grid[2]
        cv_proj_converted.append(swarm_start_cv_proj)

        ## +100 undoes the default `start` offset of process_grid_txt, for display only
        print(f'[{grid[0]+100}, {grid[1]}, {grid[2]}], {swarm_start_cv_proj}')
    return cv_proj_converted
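The index arithmetic above is a row-major flattening of (iteration, bead, swarm). As a sanity check, the sketch below recomputes one index by hand and again with `np.ravel_multi_index`. `flat_index` is a helper introduced here for illustration, and `n_iterations = 450` is an assumed upper bound that is only needed to define the array shape.

```python
import numpy as np

n_beads, n_swarms = 14, 32
start = 100

def flat_index(iteration, bead, swarm):
    # iteration is counted from `start`; bead is 1-based in the file names,
    # swarm is 0-based
    return (iteration - start) * n_beads * n_swarms + (bead - 1) * n_swarms + swarm

idx = flat_index(272, 14, 5)

# Same result with NumPy, treating the flat axis as a C-ordered
# (iteration, bead, swarm) array.
n_iterations = 450  # assumption: any value larger than the last iteration index works
idx_np = np.ravel_multi_index((272 - start, 14 - 1, 5),
                              (n_iterations, n_beads, n_swarms))
print(idx, idx_np)
```

Both values should match the index that `get_original_CV` prints for iteration 272, bead 14, swarm 5.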
        

In [5]:
test = get_original_CV(process_grid_txt(grid = 210))

[272, 14, 5], 77477
[516, 14, 13], 186797
[498, 14, 27], 178747
[490, 14, 2], 175138
[464, 14, 23], 163511
[523, 14, 14], 189934
[523, 14, 10], 189930
[266, 14, 19], 74803
[524, 14, 6], 190374
[411, 14, 20], 139764
[223, 14, 29], 55549
[329, 14, 29], 103037
[537, 14, 5], 196197
[544, 14, 19], 199347
[507, 14, 19], 182771
[461, 14, 17], 162161
[525, 14, 0], 190816
[265, 14, 12], 74348
[534, 14, 31], 194879
[463, 14, 26], 163066
[524, 14, 26], 190394
[411, 14, 9], 139753
[480, 14, 10], 170666
[327, 14, 15], 102127
[516, 14, 12], 186796
[222, 14, 29], 55101
[400, 14, 21], 134837
[222, 14, 0], 55072
[408, 14, 6], 138406
[499, 14, 2], 179170
[445, 14, 19], 154995
[436, 14, 25], 150969
[271, 14, 18], 77042
[488, 14, 2], 174242
[518, 14, 28], 187708
[549, 14, 11], 201579
[472, 14, 2], 167074
[535, 14, 12], 195308
[295, 14, 18], 87794
[514, 14, 29], 185917
[440, 14, 16], 152752
[518, 14, 12], 187692
[210, 14, 23], 49719
[327, 14, 29], 102141
[514, 14, 10], 185898
[459, 14, 11], 161259
[548, 14

Now we can load the arrays that these indices point into.

In [6]:
cv_coordinates = np.load(f'{simulation_directory}/cv_coordinates.final.MSM.npy')
cv_proj = cv_coordinates[:, :, [1,0]]  #IC,EC

weights = np.load(f'{simulation_directory}/weights_MSM.npy')
F = np.load(f'{simulation_directory}/F_MSM.npy')

## Now, convert weights so we can access the proper element

`weights` originally has `len(cv_coordinates)*2` entries.

The weights belong to each **frame**, **not to each CV**.

Example: `weights[0]`, `weights[2]`, ..., `weights[32]`, and so on for the remaining even indices of bead 1, all hold the same weight. That is because swarms 0-31 of bead 1 all share the same CV start position. Remember that we did not take the starting (bead) position of any swarm, because we were trying to avoid this sampling bias.

In effect, this means we should skip every other weight when indexing, since we never take values from the start frame of the swarms.

In [7]:
## I leave this here for you so you can double check yourself one day if you want to
#weights_reshaped = np.vstack((weights[::2], weights[1::2])).T
weights_reshaped = weights[1::2]
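The slicing can be checked on a toy array. This is only a sketch under the assumption that the weights are interleaved as `[start, end, start, end, ...]`, one pair per swarm. The sizes (2 beads, 3 swarms) and the values are made up for illustration.

```python
import numpy as np

n_beads, n_swarms = 2, 3  # toy sizes, not the 14 x 32 used in this notebook

# Assumed layout: one (start, end) pair of weights per swarm.
start_w = np.repeat([0.1, 0.2], n_swarms)   # all swarms of a bead share the start weight
end_w = np.array([0.11, 0.12, 0.13, 0.21, 0.22, 0.23])  # made-up end-frame weights

toy_weights = np.empty(2 * n_beads * n_swarms)
toy_weights[0::2] = start_w   # even indices: start frames
toy_weights[1::2] = end_w     # odd indices: end frames

toy_end = toy_weights[1::2]   # same slicing as weights_reshaped
print(toy_weights[0::2])      # repeated start weights, one block per bead
print(toy_end)                # one weight per swarm end frame
```

After the slicing, the array has one entry per swarm, so it can be indexed directly with the flat indices returned by `get_original_CV`.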

In [8]:
weights_reshaped[test]*10000

array([0.00311203, 0.00311203, 0.00311203, 0.00311203, 0.00311203,
       0.00311203, 0.00311203, 0.00311203, 0.00311203, 0.00995053,
       0.00311203, 0.00995053, 0.00230383, 0.00311203, 0.00995053,
       0.00311203, 0.00311203, 0.00311203, 0.00311203, 0.00311203,
       0.00311203, 0.00311203, 0.00311203, 0.00995053, 0.00311203,
       0.00311203, 0.00311203, 0.00311203, 0.00311203, 0.00311203,
       0.00995053, 0.00230383, 0.00311203, 0.00311203, 0.00311203,
       0.00311203, 0.00311203, 0.00311203, 0.00311203, 0.00995053,
       0.00995053, 0.00311203, 0.00311203, 0.00311203, 0.00311203,
       0.00311203, 0.00311203, 0.00995053, 0.00311203, 0.00311203,
       0.00311203, 0.00995053, 0.00311203, 0.00311203, 0.00311203,
       0.00995053, 0.00995053, 0.00311203, 0.00311203, 0.00311203,
       0.00311203, 0.00311203, 0.00311203, 0.00311203, 0.00311203,
       0.00995053, 0.00995053, 0.00311203, 0.00311203, 0.00995053,
       0.00995053, 0.00311203, 0.00995053, 0.00311203, 0.00995

Now I am testing how this works on a practical example. The code below is copied from `TM7b_TM10`.

In [12]:
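def theta_of_angle(u, s1, s2, s3):
    ## angle (in degrees) at B between the centers of geometry of three selections
    from numpy.linalg import norm
    A = u.select_atoms(s1).center_of_geometry()
    B = u.select_atoms(s2).center_of_geometry()
    C = u.select_atoms(s3).center_of_geometry()

    BA = A - B
    BC = C - B
    ## clip guards against rounding pushing the cosine just outside [-1, 1]
    theta = np.arccos(np.clip(np.dot(BA, BC)/(norm(BA)*norm(BC)), -1.0, 1.0))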
    return np.rad2deg(theta)
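The angle formula can be verified without a trajectory. The sketch below applies the same arccos of the normalized dot product to plain NumPy points. `angle_deg` is a stand-in introduced here that takes coordinates directly instead of MDAnalysis selections.

```python
import numpy as np

def angle_deg(A, B, C):
    """Angle at B (in degrees) between the vectors B->A and B->C."""
    BA, BC = A - B, C - B
    cos_theta = np.dot(BA, BC) / (np.linalg.norm(BA) * np.linalg.norm(BC))
    # clip guards against rounding pushing the cosine just outside [-1, 1]
    return np.rad2deg(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

origin = np.zeros(3)
right = angle_deg(np.array([1.0, 0.0, 0.0]), origin, np.array([0.0, 1.0, 0.0]))
straight = angle_deg(np.array([1.0, 0.0, 0.0]), origin, np.array([-2.0, 0.0, 0.0]))
print(right, straight)
```

Perpendicular vectors should give 90 degrees and antiparallel vectors 180 degrees, regardless of their lengths.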

In [13]:
grid = 823
backbone = ' and name CA' 
condition = 'BFRU'


if os.path.isfile(f'../confout_files/FES_grids_confouts/influx_{condition}_gate_CV/histogram_{grid}/FES_grid_all.xtc'):
    print(f'calculating angles for grid {grid}')
    u = mda.Universe(f'../confout_files/tpr_files/influx_{condition}_gate_CV.wholesys.tpr', \
                    f'../confout_files/FES_grids_confouts/influx_{condition}_gate_CV/histogram_{grid}/FES_grid_all.xtc')

    sels = ['284', '291', '299']
    ## make backbone '' or ' and name CA' to toggle between selections
    theta_u = []
    for ts in u.trajectory:
        theta = theta_of_angle(u, s1 = f'resid {sels[0]}{backbone}', 
                               s2 = f'resid {sels[1]}{backbone}', 
                               s3 = f'resid {sels[2]}{backbone}')

        theta_u.append(theta)

calculating angles for grid 823


In [14]:
theta_u

[101.34028752226638,
 100.28683805990498,
 98.16448044470675,
 95.67810697093607,
 99.25099039847609,
 96.30437680913654,
 98.76697698491994,
 84.85260699030874,
 98.64706733529579,
 99.18428624381492,
 96.37110558918486,
 95.36411894676881,
 98.80770474927381,
 97.81336709283637,
 93.15038553562823,
 100.55910767529431,
 82.38186422498659,
 99.39359045796546,
 100.9567683299133,
 96.40743618604593,
 98.88955913977927,
 98.36331599123025,
 97.96943004024355,
 98.24599088145362,
 94.65510114567759,
 95.18089489643026,
 98.08031386728366,
 91.20989440840589,
 99.03317230758199,
 95.66252926806702,
 97.45681674576639,
 102.83451585271565,
 96.09836222942434,
 80.06391826357078,
 79.40832490110701,
 98.6437058109003,
 96.31115122005261,
 97.58425198718226,
 96.37206751495584,
 94.9297286362049,
 98.93241735636215,
 99.04577121178934,
 94.82084020095685,
 96.55370225526076,
 78.95910725073695,
 105.29811142886498,
 103.31616068065179,
 97.68535645011674,
 93.22291751597491,
 100.44392270857
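With one angle and one weight per snapshot, the averaging itself could look like the sketch below. It uses `np.average` with made-up toy values, and `weighted_mean_std` is a helper introduced here for illustration. For real data, the weights would presumably be `weights_reshaped[get_original_CV(process_grid_txt(grid = 823))]`, under the assumption that the frames in `FES_grid_all.xtc` are in the same order as the lines of `cluster_logs.txt` for the same grid.

```python
import numpy as np

def weighted_mean_std(values, weights):
    """Weighted mean and weighted standard deviation of per-snapshot values."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.shape != weights.shape:
        raise ValueError("need exactly one weight per snapshot")
    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    return mean, np.sqrt(var)

# Toy values only; not taken from the simulation.
toy_theta = np.array([100.0, 80.0, 90.0])
toy_w = np.array([1.0, 1.0, 2.0])
mean, std = weighted_mean_std(toy_theta, toy_w)
print(mean, std)
```

`np.average` normalizes by the sum of the weights, so the weights do not need to sum to 1 within a grid.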