# MolSSI Workshop
## 2) Parsing files

In [2]:
ls data

03_Prod.mdout              distance_data_headers.csv
Untitled.ipynb             outfiles/
benzene.xyz                sapt.out
buckminsterfullerene.xyz   water.xyz


In [3]:
pwd

'/Users/Sergi/Desktop/cms-workshop'

File paths are written differently on different operating systems. Windows uses a backslash (‘\’) as the separator, while Mac and Linux use a forward slash (‘/’).

When we write a script, we want it to work on any operating system, so we will use a Python module called os.path, which lets us define file paths in a general way.

To get the path to the ethanol.out file in a general way, type

In [16]:
import os

ethanol_file = os.path.join('data', 'outfiles', 'ethanol.out')
print(ethanol_file)

data/outfiles/ethanol.out
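
To see what os.path.join is abstracting away, the sketch below builds the same path with the two platform-specific modules that os.path maps to: posixpath (Mac and Linux) and ntpath (Windows). Both are in the standard library, so this runs on any system. Calling them directly is only for illustration; in a real script you would keep using os.path.

```python
import ntpath      # the Windows flavor of os.path
import posixpath   # the Mac/Linux flavor of os.path

parts = ('data', 'outfiles', 'ethanol.out')

# Same call, different separator depending on the platform module.
posix_style = posixpath.join(*parts)
windows_style = ntpath.join(*parts)

print(posix_style)    # data/outfiles/ethanol.out
print(windows_style)  # data\outfiles\ethanol.out
```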


(The code above probably works because this Jupyter Notebook is already inside the cms-workshop folder. If it were not, the full directory might need to be specified before 'data', e.g. Users/Desktop.)

File paths can be absolute, or relative.

A relative file path gives the location relative to the directory we are in. Thus, if we are in the cms-workshop directory, the relative file path for the ethanol.out file would be data/outfiles/ethanol.out.

An absolute file path gives the complete path to a file. This file path can be used from anywhere on a computer and will always access the same file. For example, the absolute file path to the ethanol.out file on a Mac might be /Users/YOUR_USER_NAME/Desktop/cms-workshop/data/outfiles/ethanol.out. You can get the absolute path of a file using os.path.abspath(path), where path is the relative path of the file.

In [15]:
import os

ethanol_file = os.path.abspath(os.path.join('data', 'outfiles', 'ethanol.out'))
print(ethanol_file)

/Users/Sergi/Desktop/cms-workshop/data/outfiles/ethanol.out


Python pathlib

We are working with the os.path module here, and this is how most existing Python code handles file paths. However, the standard library also includes the pathlib module (added in Python 3.4, and accepted by open() and the os.path functions since Python 3.6), which can be used to represent and manipulate file paths. os.path works with file paths as strings, while in pathlib, paths are objects. The official Python documentation gives a good overview of the pathlib module.
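
As a small illustration of paths as objects, here is a minimal pathlib sketch that builds the same ethanol.out path used above. The variable names are only for this example, and the file does not need to exist for any of these calls to work.

```python
from pathlib import Path

# The / operator joins path components, much like os.path.join.
ethanol_path = Path('data') / 'outfiles' / 'ethanol.out'

print(ethanol_path)          # separator depends on your operating system
print(ethanol_path.name)     # ethanol.out
print(ethanol_path.suffix)   # .out
print(ethanol_path.parent)   # the directory part of the path

# Prepend the current working directory to get an absolute path.
absolute_path = Path.cwd() / ethanol_path
print(absolute_path.is_absolute())  # True
```

A Path object can be passed directly to open(), so the rest of this lesson would work the same way with either approach.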

## Reading a file

In [20]:
outfile = open(ethanol_file,"r")
data = outfile.readlines()
outfile.close()


An alternative way to open a file.
You can also open a file using a context manager, which automatically closes the file for you. To use a context manager, write the keyword with, and put everything you want done while the file is open in an indented block.

In [21]:
with open(ethanol_file,"r") as outfile:
    data = outfile.readlines()
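
To check that the context manager really closes the file, the sketch below writes a small throwaway file to a temporary directory (so it does not depend on the workshop data) and inspects the file object's closed attribute inside and outside the with block. The file name and contents are made up for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    example_file = os.path.join(tmp_dir, 'example.out')

    # Create a two-line file to read back.
    with open(example_file, 'w') as outfile:
        outfile.write('first line\nsecond line\n')

    with open(example_file, 'r') as outfile:
        lines = outfile.readlines()
        print(outfile.closed)   # False: still inside the with block

    print(outfile.closed)       # True: closed on leaving the block
    print(lines)                # ['first line\n', 'second line\n']
```

The list returned by readlines() stays available after the file is closed, which is why data can be used in later cells.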

In [23]:
print(len(data))

270


## Searching for a pattern in your file


In [24]:
for line in data:
    print(line)



    -----------------------------------------------------------------------

          Psi4: An Open-Source Ab Initio Electronic Structure Package

                               Psi4 1.1 release



                         Git: Rev {HEAD} add49b9 





    R. M. Parrish, L. A. Burns, D. G. A. Smith, A. C. Simmonett,

    A. E. DePrince III, E. G. Hohenstein, U. Bozkaya, A. Yu. Sokolov,

    R. Di Remigio, R. M. Richard, J. F. Gonthier, A. M. James,

    H. R. McAlexander, A. Kumar, M. Saitow, X. Wang, B. P. Pritchard,

    P. Verma, H. F. Schaefer III, K. Patkowski, R. A. King, E. F. Valeev,

    F. A. Evangelista, J. M. Turney, T. D. Crawford, and C. D. Sherrill,

    J. Chem. Theory Comput. in press (2017).

    (doi: 10.1021/acs.jctc.7b00174)



    -----------------------------------------------------------------------





    Psi4 started on: Tuesday, 27 June 2017 12:10PM



    Process ID:  10591

    PSIDATADIR: /Users/armcdona/psi4conda/share/psi4

    Memory:     500.0 MiB

In [25]:
for line in data:
    if 'Final Energy' in line:
        energy_line = line
        print(energy_line)

  @DF-RHF Final Energy:  -154.09130176573018



In [27]:
energy_line.split()

['@DF-RHF', 'Final', 'Energy:', '-154.09130176573018']

In [28]:
energy_line.split(':')


['  @DF-RHF Final Energy', '  -154.09130176573018\n']

In [29]:
words = energy_line.split()
print(words)

['@DF-RHF', 'Final', 'Energy:', '-154.09130176573018']


In [31]:
energy = float(words[3])
print(energy)

-154.09130176573018
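
The search, split, and float steps above can be collected into one small function. This is only a sketch: the function name find_final_energy is hypothetical, it returns the first matching line rather than the last one, and it takes the last field of the line instead of index 3. The sample lines are copied from the output shown above so that the example runs without the data file.

```python
def find_final_energy(lines, pattern='Final Energy'):
    """Return the last field of the first line containing pattern, as a float.

    Returns None if no line matches.
    """
    for line in lines:
        if pattern in line:
            # split() with no argument also discards the trailing newline.
            return float(line.split()[-1])
    return None


sample_lines = [
    '    Memory:     500.0 MiB\n',
    '  @DF-RHF Final Energy:  -154.09130176573018\n',
]

energy = find_final_energy(sample_lines)
print(energy)  # -154.09130176573018
```

With the real file, you would call find_final_energy(data) on the list returned by readlines().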


## Exercise

In [4]:
import os

sapt_file = os.path.join('data','sapt.out')
print(sapt_file)

data/sapt.out


In [11]:
outfile = open(sapt_file,"r")
data_sapt = outfile.readlines()
outfile.close()


In [30]:
for line in data_sapt:
    # The two trailing spaces make each pattern stricter than the bare word.
    # Field 3 of the split line is the value reported here in kcal/mol.
    if "Electrostatics  " in line:
        line_split = line.split()
        Electrostatics = float(line_split[3])
        print("Electrostatics : ", Electrostatics, "kcal/mol")

    elif "Exchange  " in line:
        line_split = line.split()
        Exchange = float(line_split[3])
        print("Exchange : ", Exchange, "kcal/mol")

    elif "Induction  " in line:
        line_split = line.split()
        Induction = float(line_split[3])
        print("Induction : ", Induction, "kcal/mol")

    elif "Dispersion  " in line:
        line_split = line.split()
        Dispersion = float(line_split[3])
        print("Dispersion : ", Dispersion, "kcal/mol")

# Sum the four components to get the total energy.
Total_Energy = Electrostatics + Exchange + Induction + Dispersion
print("Total Energy : ", Total_Energy, "kcal/mol")
    

Electrostatics :  -2.25850118 kcal/mol
Exchange :  2.27730198 kcal/mol
Induction :  -0.5216933 kcal/mol
Dispersion :  -0.9446677 kcal/mol
Total Energy :  -1.4475602000000003 kcal/mol


In [37]:
important_lines = []
energies = []

# First pass: collect the line for each energy component.
for line in data_sapt:
    if "Electrostatics  " in line:
        important_lines.append(line)

    if "Exchange  " in line:
        important_lines.append(line)

    if "Induction  " in line:
        important_lines.append(line)

    if "Dispersion  " in line:
        important_lines.append(line)

# Second pass: pull the name and the kcal/mol value out of each collected line.
for line in important_lines:
    words = line.split()
    energy_type = words[0]
    energy_value = float(words[3])
    energies.append(energy_value)
    print('{} : {} kcal/mol'.format(energy_type, energy_value))

total_energy = sum(energies)
print('Total Energy : {} kcal/mol'.format(total_energy))


Electrostatics : -2.25850118 kcal/mol
Exchange : 2.27730198 kcal/mol
Induction : -0.5216933 kcal/mol
Dispersion : -0.9446677 kcal/mol
Total Energy : -1.4475602000000003 kcal/mol
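
Both solutions repeat the same steps for each energy term, so one possible variation is to loop over the term names and store the values in a dictionary. This is a sketch under stated assumptions: the sample lines below are illustrative only, their layout (name, value, unit, value, unit) is assumed from the use of index 3 above, and the [mEh] column holds placeholders rather than real data. The kcal/mol values are the ones printed above.

```python
# Illustrative lines only: assumed layout, placeholder values in the [mEh] column.
sample_sapt_lines = [
    '    Electrostatics      0.00000000 [mEh]      -2.25850118 [kcal/mol]\n',
    '    Exchange            0.00000000 [mEh]       2.27730198 [kcal/mol]\n',
    '    Induction           0.00000000 [mEh]      -0.5216933 [kcal/mol]\n',
    '    Dispersion          0.00000000 [mEh]      -0.9446677 [kcal/mol]\n',
]

terms = ('Electrostatics', 'Exchange', 'Induction', 'Dispersion')
energies = {}

for line in sample_sapt_lines:
    for name in terms:
        # Same search as above: the term name followed by two spaces.
        if name + '  ' in line:
            energies[name] = float(line.split()[3])

for name, value in energies.items():
    print('{} : {} kcal/mol'.format(name, value))

total_energy = sum(energies.values())
print('Total Energy : {} kcal/mol'.format(total_energy))
```

With the real file, you would loop over data_sapt instead of sample_sapt_lines. Adding another term then only requires adding its name to terms.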


## Searching for a particular line number in your file
