Goal of this program:

Input: a molecular data file, containing a system of two or more polymer chains, in an ASE-compatible format


How to identify backbone atoms?
Separate the chains and build a connectivity matrix for each?

Step: identify all backbone atoms
      select all viable backbone atoms/positions for cross-linking
      get positions of the selected atoms from above
      define a bonding radius based on length of crosslinker molecule
      crosslink loop until desired crosslinking density has been achieved (or until there are no more viable sites available - problem case)
          randomly select a chain
          randomly select a viable backbone atom from that chain
          find all viable backbone atoms within bonding radius of the selected atom
          randomly select one atom from list generated above
          save pointers to these two atoms, remove them from list of available crosslinking sites
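
The selection loop above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the final implementation: the function name `pick_crosslink_pairs` and its parameters are hypothetical, periodic boundaries are ignored, and a partner is assumed to sit on a different chain from the first atom (the notes do not say whether same-chain pairs are allowed).

```python
import numpy as np

def pick_crosslink_pairs(positions, chain_ids, radius, n_links, seed=None):
    """Randomly pair viable sites that lie within `radius` of each other.

    positions : (N, 3) coordinates of the candidate backbone atoms
    chain_ids : (N,) chain label of each candidate
    Assumptions (not from the notes): no periodic boundaries, and the
    partner must belong to a different chain.
    """
    rng = np.random.default_rng(seed)
    positions = np.asarray(positions, dtype=float)
    chain_ids = np.asarray(chain_ids)
    available = np.ones(len(positions), dtype=bool)
    pairs = []
    while len(pairs) < n_links:
        found = False
        free = np.flatnonzero(available)
        # Visit chains, then atoms within a chain, in random order.
        for chain in rng.permutation(np.unique(chain_ids[free])):
            members = free[chain_ids[free] == chain]
            for i in rng.permutation(members):
                dist = np.linalg.norm(positions[free] - positions[i], axis=1)
                partners = free[(dist <= radius) & (chain_ids[free] != chain)]
                if len(partners):
                    j = rng.choice(partners)
                    pairs.append((int(i), int(j)))
                    available[[i, j]] = False  # remove both sites from the pool
                    found = True
                    break
            if found:
                break
        if not found:
            break  # problem case: no viable sites remain
    return pairs

# Two chains with two candidate sites each; only 0-2 and 1-3 are in range.
pos = [[0, 0, 0], [10, 0, 0], [1, 0, 0], [10.5, 0, 0]]
pairs = pick_crosslink_pairs(pos, [0, 0, 1, 1], radius=1.5, n_links=5, seed=0)
print(len(pairs))  # 2, fewer than requested because the sites ran out
```

Returning fewer pairs than requested is how this sketch signals the "no more viable sites" case; raising an exception instead would be an equally reasonable choice.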

Step: for all of the crosslinking sites, 

Output: a .mol2 file, containing a randomly cross-linked version of the original polymer system

In [1]:
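import ase
import ase.io  # makes ase.io.read available
from ase import neighborlist
import numpy as np
import pandas as pd
from scipy import sparse
from scipy.sparse import csgraph  # ensures sparse.csgraph is loaded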

In [2]:
mol = ase.io.read('pnipaam.test.pdb')

In [3]:
cut = neighborlist.natural_cutoffs(mol)

In [4]:
nl = neighborlist.NeighborList(cut, self_interaction=False, bothways=True)

In [5]:
nl.update(mol)

True

In [6]:
matrix = nl.get_connectivity_matrix()

In [7]:
n_components, component_list = sparse.csgraph.connected_components(matrix)
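
To make the role of `connected_components` concrete, here is a small toy example that does not touch the real system. The five-atom bond list is invented for illustration; the point is that every atom in the same bonded network receives the same label, which is what lets the later cells treat each label as one molecule.

```python
import numpy as np
from scipy import sparse
from scipy.sparse import csgraph

# Toy bond graph: atoms 0-1-2 form one molecule, atoms 3-4 another.
bonds = [(0, 1), (1, 2), (3, 4)]
rows, cols = zip(*bonds)
toy = sparse.coo_matrix((np.ones(len(bonds)), (rows, cols)), shape=(5, 5))

# directed=False treats each stored bond as connecting both atoms.
n_mols, labels = csgraph.connected_components(toy, directed=False)
print(n_mols)  # 2
print(labels)  # atoms 0-2 share one label, atoms 3-4 share another
```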

In [8]:
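# Count the atoms in each connected component (molecule).
# Note: these counts are replaced by lists of atom indices in the next cell.
molDict = {}
for i in set(component_list):
    molDict[i] = 0
for label in component_list:
    molDict[label] += 1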

In [9]:
for molIdx in set(component_list):
    molDict[molIdx] = [ i for i in range(len(component_list)) if component_list[i] == molIdx ]
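
The cell above rescans `component_list` once per molecule. A possible single-pass alternative is sketched below; `group_by_label` is a hypothetical helper, not part of the notebook, and it produces the same kind of mapping from molecule label to atom indices.

```python
from collections import defaultdict

import numpy as np

def group_by_label(labels):
    """Map each label to the list of positions where it occurs, in one pass."""
    groups = defaultdict(list)
    for atom_idx, label in enumerate(labels):
        groups[int(label)].append(atom_idx)
    return dict(groups)

groups = group_by_label(np.array([0, 0, 1, 0, 2, 1]))
print(groups)  # {0: [0, 1, 3], 1: [2, 5], 2: [4]}
```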

In [10]:
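# Isolate the polymer chains from the solvent-polymer system:
# anything with more than 20 atoms is treated as a chain.
chainDict = {}
for molecule in molDict:
    if len(molDict[molecule]) > 20:
        # Copy the list so later edits to chainDict do not also change molDict.
        chainDict[molecule] = list(molDict[molecule])
len(chainDict)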

8

Get the connectivity-matrix row for each atom in the chains.

In [13]:
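# Replace each atom index with an (index, connectivity row) tuple, in place.
# Run this cell only once: a second run would index the matrix with tuples.
for i in chainDict:
    for j, idx in enumerate(chainDict[i]):
        chainDict[i][j] = (idx, matrix[idx])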
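
Once a connectivity row is available, the bonded neighbours of an atom are simply the column indices of its non-zero entries. The sketch below shows this on an invented four-atom chain using a CSR matrix; it assumes the notebook's `matrix` is a SciPy sparse matrix that could be converted with `.tocsr()`, and `bonded_neighbours` is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse

# Toy symmetric connectivity matrix for a linear chain 0-1-2-3.
dense = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
conn = sparse.csr_matrix(dense)

def bonded_neighbours(conn, atom_idx):
    """Return the sorted indices of atoms bonded to `atom_idx` (CSR input)."""
    start, stop = conn.indptr[atom_idx], conn.indptr[atom_idx + 1]
    return sorted(conn.indices[start:stop].tolist())

print(bonded_neighbours(conn, 1))  # [0, 2]
print(bonded_neighbours(conn, 3))  # [2]
```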

In [17]:
data = mol.todict()
data.pop('pbc')
data.pop('cell')
data.pop('bfactor')
data

{'numbers': array([1, 6, 6, ..., 8, 1, 1]),
 'positions': array([[34.844, 15.755, 17.225],
        [34.997, 16.74 , 16.783],
        [33.765, 17.683, 17.065],
        ...,
        [33.489, 34.81 , 35.648],
        [34.282, 34.315, 35.44 ],
        [33.783, 35.489, 36.255]]),
 'residuenames': array(['RES ', 'RES ', 'RES ', ..., 'WAT ', 'WAT ', 'WAT '], dtype='<U4'),
 'atomtypes': array(['H4', 'C1', 'C2', ..., 'O', 'H1', 'H2'], dtype='<U3'),
 'residuenumbers': array([ 445,  889,  889, ..., 1549, 1549, 1549])}

In [22]:
backboneDict = {}
atom, connections = chainDict[0][0]
if data['atomtypes'][atom] in ['C1', 'C2']:
    print('nice')

nice


Get all backbone atoms (identified here by atom type). Mark the ones that should be targeted for crosslinking (identifiable by their connections?). In practice the choice of target may be arbitrary, since I'll be inserting an entire vinyl unit...?

Check the configuration in https://pubs.rsc.org/en/content/articlelanding/2009/sm/b816443f/unauth#!divRelatedContent&articles
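
One way the atom-type selection could look is sketched below on an invented six-atom table. The backbone labels `C1` and `C2` are taken from the check in the cell above; the helper name `select_backbone` and the toy data are assumptions, and which backbone atoms are actually viable crosslinking sites is still an open question in these notes.

```python
import pandas as pd

def select_backbone(df, backbone_types=("C1", "C2")):
    """Return the rows whose atom_type marks them as backbone carbons."""
    return df[df["atom_type"].isin(backbone_types)].copy()

toy = pd.DataFrame({
    "atom_type": ["H4", "C1", "C2", "H3", "C6", "C1"],
    "molecule":  [0, 0, 0, 0, 0, 1],
})
backbone = select_backbone(toy)
print(backbone.index.tolist())                        # [1, 2, 5]
print(backbone.groupby("molecule").size().to_dict())  # {0: 2, 1: 1}
```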


In [10]:
component_list

array([   0,    0,    0, ..., 1075, 1075, 1075], dtype=int32)

In [11]:
molIdxs = [ i for i in range(len(component_list)) if component_list[i] == molIdx ]

In [12]:
len(molDict)

1076

In [14]:
for molecule in molDict:
    if len(molDict[molecule]) > 10:
        print(len(molDict[molecule]), molecule)

306 0
306 1
306 2
306 3
306 4
306 5
306 6
306 7


In [18]:
mol = ase.io.read('pnipaam.test.pdb')

In [19]:
# Select chain 0 by component label; chainDict may hold (index, row) tuples by now.
chain0 = mol[np.flatnonzero(component_list == 0)]

In [20]:
atoms = [i.index for i in mol]

In [22]:
len(data['atomtypes'])

5787

In [23]:
data['positions']

array([[34.844, 15.755, 17.225],
       [34.997, 16.74 , 16.783],
       [33.765, 17.683, 17.065],
       ...,
       [33.489, 34.81 , 35.648],
       [34.282, 34.315, 35.44 ],
       [33.783, 35.489, 36.255]])

In [24]:
mol_df = pd.DataFrame(data['positions'])
mol_df = mol_df.rename({0: "x_pos", 1: "y_pos", 2: "z_pos"}, axis=1)
mol_df['molecule'] = component_list
mol_df['res_name'] = data['residuenames']
mol_df['atomic_number'] = data['numbers']
mol_df['atom_type'] = data['atomtypes']
# Drop the water; .copy() avoids pandas' SettingWithCopyWarning below.
chain_df = mol_df[mol_df['res_name'] != "WAT "].copy()
# Number of bonds for each atom (row sum of the connectivity matrix).
chain_df['n_bonds'] = [matrix[i].sum() for i in chain_df.index]
chain_df


       x_pos   y_pos   z_pos  molecule  res_name  atomic_number  atom_type  n_bonds
   0  34.844  15.755  17.225         0       RES              1         H4        1
   1  34.997  16.740  16.783         0       RES              6         C1        4
   2  33.765  17.683  17.065         0       RES              6         C2        4
   3  35.127  16.630  15.707         0       RES              1         H3        1
   4  35.894  17.095  17.292         0       RES              1         H5        1
 ...     ...     ...     ...       ...       ...            ...        ...      ...
2443  23.777  37.110  -0.051         7       RES              1        H18        1
2444  24.339  35.981  -2.157         7       RES              1        H19        1
2445  24.602  34.831  -0.841         7       RES              1        H20        1
2446  25.974  35.651  -1.608         7       RES              1        H21        1


In [25]:
set(chain_df['atom_type'])

{'C1',
 'C11',
 'C13',
 'C17',
 'C2',
 'C6',
 'H12',
 'H14',
 'H15',
 'H16',
 'H18',
 'H19',
 'H20',
 'H21',
 'H3',
 'H4',
 'H5',
 'H7',
 'H8',
 'N10',
 'O9'}

In [26]:
chain_df[chain_df['atom_type'] == "C1"]

       x_pos   y_pos   z_pos  molecule  res_name  atomic_number  atom_type  n_bonds
   1  34.997  16.740  16.783         0       RES              6         C1        4
  20  34.226  19.135  17.020         0       RES              6         C1        4
  39  35.933  20.746  18.178         0       RES              6         C1        4
  58  38.277  21.750  18.299         0       RES              6         C1        4
  77  39.235  20.927  15.936         0       RES              6         C1        4
 ...     ...     ...     ...       ...       ...            ...        ...      ...
2352  20.968  34.424  10.386         7       RES              6         C1        4
2371  23.419  35.262  10.716         7       RES              6         C1        4
2390  23.673  33.980   8.366         7       RES              6         C1        4
2409  25.573  34.736   6.770         7       RES              6         C1        4


In [27]:
mol_df = pd.DataFrame(m)
mol_df

NameError: name 'm' is not defined

In [None]:
chainDict

In [None]:
matrix[0].count_nonzero()

In [None]:
matrix[0].keys()

In [None]:
matrix[1].keys()

In [None]:
matrix[310].keys()

In [None]:
mol

In [None]:
mol[0]