
Assigns the head and tail atom position in a monomeric unit or in a polymerization reaction, both represented in SMILES string format


IBM/HeadTailAssign


Head and Tail assignment


Software that assigns the head and tail of polymers.

Documentation

Installation

  • Download the code file to your desired directory and unzip it.

To create the working environment run:

conda env create -f environment.yml

GAMESS is also required. HTA was developed for use with GAMESS versions 2020-R2 and 2024-R2.

This version of HTA runs only on Windows. Running it on Linux requires some changes to the code; please refer to Observations for a detailed list of those changes.

Running the script

Input file

  • The input file should contain all the data in a single CSV file. An example input file containing the monomer precursors is provided in input.csv.

  • If the user wants to provide the reaction SMILES: the CSV file should have two columns, 'name' and 'reaction'. 'name' is the name of the polymer and 'reaction' is the polymerization reaction. One example of input for reactions:

  'name','reaction'
  'polyisobutylene succinic anhydride', '[C]1(=[O])[O][C](=[O])[CH]=[CH]1.[C]1([CH3])[C]([CH3])=[CH][CH]=[CH][CH]=1>>[CH3][C]([CH2][CH]1[C](=[O])[O][C](=[O])[CH2]1)=[CH2]'
  • If the user wants to provide the monomers: the CSV file should have two columns, 'name' and 'monomers_list'. 'name' is the name of the polymer and 'monomers_list' holds the monomer for a homopolymer, or the monomers separated by '.' for a copolymer. One example of input for monomers:
'name', 'monomers_list'
'Poly(vinyl formate)','O=COC=C'
'Poly(trimethylene succinate)','OC(=O)CCC(=O)O.OCCCO'
  • The main.py file has all the steps needed to run the code.
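As a rough check of the two input layouts described above, the following sketch parses input text and reports which layout it uses. It is not part of HTA: the function name `read_hta_input` is hypothetical, and the single-quote quoting and optional space after the comma are assumptions taken from the examples in this README, so adjust `quotechar` if your file uses double quotes.

```python
import csv
import io


def read_hta_input(text, quotechar="'"):
    """Parse HTA-style input text; return (mode, rows).

    Hypothetical helper, not part of HTA. The quoting style is an
    assumption based on the README examples.
    """
    reader = csv.reader(
        io.StringIO(text), quotechar=quotechar, skipinitialspace=True
    )
    header = [col.strip() for col in next(reader)]
    if header == ["name", "monomers_list"]:
        mode = "monomers"
    elif header == ["name", "reaction"]:
        mode = "reaction"
    else:
        raise ValueError(f"unexpected header: {header}")

    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        name, value = row
        # Copolymers list their monomers separated by '.'
        parts = value.split(".") if mode == "monomers" else [value]
        rows.append((name, parts))
    return mode, rows


example = (
    "'name', 'monomers_list'\n"
    "'Poly(vinyl formate)','O=COC=C'\n"
    "'Poly(trimethylene succinate)','OC(=O)CCC(=O)O.OCCCO'\n"
)
mode, rows = read_hta_input(example)
```

Running a check like this before calling main.py can catch a wrong header or a malformed row early, before any GAMESS job is started.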

To run the code, type at your terminal:

python main.py [data.csv]

More information about the functions can be found at the HeadTailAssign module

Output file

  • The output file is a CSV file with the following structure:
"name","monomers_list","monomers","polymer_id","classes","mechanism","head_tail"
"Nylon 3 - Poly(propiolactam)","C1CNC1=O","['C1CNC1=O']","polymer_1","polyamide","polycondensation","CN([*:1])C([*:2])=O"
"Poly(hexylene succinate)","OC(=O)CCC(=O)O.OCCCCCCO","['OC(=O)CCC(=O)O', 'OCCCCCCO']","polymer_36","polyester","polycondensation","O=C([*:1])CCC(=O)OCCCCCCO[*:2]"

in which,

"name": Name of the polymer

"monomers_list": Monomer precursors provided by the user

"monomers": Monomer precursors treated by HTA

"polymer_id": Identification number generated by HTA

"classes": The predicted class of the monomer

"mechanism": The predicted mechanism by which the monomers polymerize

"head_tail": The SMILES string with the head and tail assigned.
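The output columns described above can be read back with the Python standard library. The sketch below is an illustration rather than part of HTA: it parses the two example rows shown earlier, converts the "monomers" column (which is written as a Python list literal) back into a list, and checks that both attachment-point labels, `[*:1]` and `[*:2]`, appear in "head_tail". The `has_both_ends` key is a name introduced here for illustration only.

```python
import ast
import csv
import io

# The two example rows from this README, used as sample data.
sample = '''"name","monomers_list","monomers","polymer_id","classes","mechanism","head_tail"
"Nylon 3 - Poly(propiolactam)","C1CNC1=O","['C1CNC1=O']","polymer_1","polyamide","polycondensation","CN([*:1])C([*:2])=O"
"Poly(hexylene succinate)","OC(=O)CCC(=O)O.OCCCCCCO","['OC(=O)CCC(=O)O', 'OCCCCCCO']","polymer_36","polyester","polycondensation","O=C([*:1])CCC(=O)OCCCCCCO[*:2]"
'''

records = []
for row in csv.DictReader(io.StringIO(sample)):
    # "monomers" is stored as the text of a Python list; parse it safely.
    row["monomers"] = ast.literal_eval(row["monomers"])
    # Hypothetical sanity flag: both labelled attachment points are present.
    row["has_both_ends"] = (
        "[*:1]" in row["head_tail"] and "[*:2]" in row["head_tail"]
    )
    records.append(row)
```

For a real run, replace `io.StringIO(sample)` with an open file handle on the output CSV.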

Examples on how to run HTA

  • The examples folder shows how to run HTA. Start with the interactive example, which can be followed in a Jupyter notebook, then move on to the non-interactive example, which uses the terminal to run data in batch.

Observations

  • The Find Monomer method works better with the FingerprintSimilarity() metric set to AllBitSimilarity. Please change the metric in 'rdkit\DataStructs\__init__.py'.

  • For copolymerization reactions, add the products separated by '.'

  • Validation tests were performed at the HF/STO-3G level of theory, but other levels can be implemented. To implement a new one, add a new block of code to the method "_generate_gamess_input_file()" in gamess_helper.py. Modify all the lines shown in the following code so that the .inp file for GAMESS is generated with the desired level of theory:

        if (runtype == 'scf' and basis == 'sto3g'):
            output_file.write('!   File created by MacMolPlt 7.7.2 \n')
            output_file.write(' $CONTRL SCFTYP=RHF RUNTYP=ENERGY MAXIT=30 MULT=1 $END \n')
            output_file.write(' $SYSTEM TIMLIM=525600 MWORDS=50 MEMDDI=200 $END \n')
            output_file.write(' $BASIS GBASIS=STO NGAUSS=3 $END \n')
            output_file.write(' $SCF DIRSCF=.TRUE. $END')
            output_file.write('\n')
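One possible way to organize additional theory levels is a lookup table instead of one `if` block per level. The sketch below is only an illustration under stated assumptions, not HTA's implementation: `BASIS_LINES` and `header_lines` are hypothetical names, and the `'631g'` key is an invented example. The keyword pair `GBASIS=N31 NGAUSS=6` is the GAMESS specification for the 6-31G basis; all other lines are copied from the block above.

```python
# Hypothetical table mapping (runtype, basis) to the $BASIS line.
# Only the 'sto3g' entry comes from this README; '631g' is an assumed addition.
BASIS_LINES = {
    ("scf", "sto3g"): " $BASIS GBASIS=STO NGAUSS=3 $END \n",
    ("scf", "631g"): " $BASIS GBASIS=N31 NGAUSS=6 $END \n",
}


def header_lines(runtype, basis):
    """Return the GAMESS header lines for a supported (runtype, basis) pair."""
    try:
        basis_line = BASIS_LINES[(runtype, basis)]
    except KeyError:
        raise ValueError(f"unsupported theory level: {runtype}/{basis}")
    return [
        "!   File created by MacMolPlt 7.7.2 \n",
        " $CONTRL SCFTYP=RHF RUNTYP=ENERGY MAXIT=30 MULT=1 $END \n",
        " $SYSTEM TIMLIM=525600 MWORDS=50 MEMDDI=200 $END \n",
        basis_line,
        " $SCF DIRSCF=.TRUE. $END",
        "\n",
    ]


header_sto3g = "".join(header_lines("scf", "sto3g"))
header_631g = "".join(header_lines("scf", "631g"))
```

With this layout, `output_file.writelines(header_lines(runtype, basis))` would write the same text as the block above for the STO-3G case. Other keywords, such as SCFTYP or MULT, may also need to change for a given level of theory.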
  • To run on Linux, make the following changes:

    extractor.py

    Line 31: replace "\\" with "/"
    Line 32: replace "\\" with "/"
    Line 85: replace "\\" with "/"
    Line 86: replace "\\" with "/"
    

    assigner.py

    Line 236: replace "\\" with "/"
    Line 278: replace "\\" with "/"
    Line 310: replace "\\" with "/"
    

    gamess.py

    Line 132: replace "\\" with "/"
    Line 134: replace "os.system('python ././run_gamess_job.py '+path+' 2')" with "os.system('python ././run_gamess_job_linux.py '+path+' 2')"
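The manual edits listed above could, in principle, be avoided by building paths with `os.path.join` and selecting the runner script from the platform. The sketch below shows that idea only; it is not how HTA is currently written, and the helper names `join_portable` and `gamess_runner_script` are hypothetical. The two script names are the ones mentioned in the list above.

```python
import os
import sys


def join_portable(*parts):
    """Join path components with the separator of the current OS."""
    return os.path.join(*parts)


def gamess_runner_script():
    """Pick the runner script name for the current platform.

    Hypothetical helper: the file names come from this README, the
    platform switch is an assumption about how they could be selected.
    """
    if sys.platform.startswith("win"):
        return "run_gamess_job.py"
    return "run_gamess_job_linux.py"


# Example: a path that works on both Windows and Linux.
example_path = join_portable("output", "polymer_1", "job.inp")
runner = gamess_runner_script()
```

The directory and file names in `example_path` are placeholders chosen for the example, not paths used by HTA.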
    

Authorship
