<a href="https://colab.research.google.com/github/vinayak2019/Parsing_Files/blob/main/Parsing_files.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We will use pymatgen, a Python package, to parse output files from VASP and Gaussian. pymatgen is well documented, which makes it easy to use.

In [None]:
# install pymatgen
!pip install pymatgen

In [None]:
# fetch the required files from GitHub
!git clone https://github.com/vinayak2019/Parsing_Files
!tar -xf  Parsing_Files/vasp/vasprun.xml.tar.gz -C Parsing_Files/vasp/

# **Parsing VASP files**

VASP, the Vienna Ab initio Simulation Package, is a code for atomic-scale materials modelling from first principles (https://www.vasp.at/).

In [None]:
from pymatgen.io.vasp import Vasprun

In [None]:
vasprun = Vasprun("/content/Parsing_Files/vasp/vasprun.xml")



In [None]:
dir(vasprun)

In [None]:
# check whether the electronic (SCF) steps converged;
# vasprun.converged additionally requires ionic convergence
vasprun.converged_electronic

True

In [None]:
# get final energy
vasprun.final_energy

  "Final e_wo_entrp differs from the final "


-612.04228636

In [None]:
# plotting density of states
from pymatgen.electronic_structure.plotter import DosPlotter

tdos = vasprun.tdos
plotter = DosPlotter(sigma=0.1)
plotter.add_dos("Total DOS", tdos)
plotter.show()

In [None]:
# parse the OUTCAR file, which holds the run statistics among other data
from pymatgen.io.vasp import Outcar

outcar = Outcar("/content/Parsing_Files/vasp/OUTCAR")

In [None]:
outcar.run_stats

{'Average memory used (kb)': 0.0,
 'Elapsed time (sec)': 537.543,
 'Maximum memory used (kb)': 411408.0,
 'System time (sec)': 30.046,
 'Total CPU time used (sec)': 534.446,
 'User time (sec)': 504.399,
 'cores': '16'}
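
The values in `run_stats` are plain numbers, except `cores`, which comes back as a string. The sketch below copies the dictionary printed above into a literal, so it runs without pymatgen, and derives a few summary figures from it. The variable names are illustrative, and reading "kb" as 1024-byte units is an assumption.

```python
# run_stats copied from the output above so the example is self-contained
run_stats = {
    "Average memory used (kb)": 0.0,
    "Elapsed time (sec)": 537.543,
    "Maximum memory used (kb)": 411408.0,
    "System time (sec)": 30.046,
    "Total CPU time used (sec)": 534.446,
    "User time (sec)": 504.399,
    "cores": "16",
}

n_cores = int(run_stats["cores"])  # stored as a string, so convert first
elapsed_min = run_stats["Elapsed time (sec)"] / 60
core_hours = run_stats["Elapsed time (sec)"] * n_cores / 3600
max_mem_mib = run_stats["Maximum memory used (kb)"] / 1024  # assumes kb = 1024 bytes

print(f"{elapsed_min:.1f} min wall time on {n_cores} cores")
print(f"{core_hours:.2f} core-hours, peak memory {max_mem_mib:.0f} MiB")
```

In the notebook, replacing the literal with `outcar.run_stats` should give the same figures for this run.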

# **Parsing Gaussian files**

Gaussian 16 is the latest in the Gaussian series of programs. It provides state-of-the-art capabilities for electronic structure modeling. Gaussian 16 is licensed for a wide variety of computer systems. All versions of Gaussian 16 contain every scientific/modeling feature, and none imposes any artificial limitations on calculations other than your computing resources and patience. https://gaussian.com/gaussian16/

## **pymatgen**

In [None]:
from pymatgen.io.gaussian import GaussianOutput

In [None]:
# reading the log file
gout = GaussianOutput("/content/Parsing_Files/gaussian/tddft.log")

In [None]:
# list all the attributes and methods available
dir(gout)

In [None]:
# getting the final energy
gout.final_energy

In [None]:
# final structure
gout.final_structure

In [None]:
# TDDFT excitations: one (energy in eV, wavelength in nm, oscillator strength) tuple per transition
gout.read_excitation_energies()
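
According to the pymatgen docstring, `read_excitation_energies()` returns one tuple per transition in the form (energy in eV, wavelength in nm, oscillator strength). The sketch below shows one way to post-process such a list. The three transitions are made-up placeholder values, not results from `tddft.log`, and the variable names are hypothetical.

```python
# Placeholder transitions in the (energy eV, wavelength nm, oscillator strength)
# layout; in the notebook, use gout.read_excitation_energies() instead.
excitations = [
    (3.10, 400.0, 0.002),
    (4.00, 310.0, 0.450),
    (4.96, 250.0, 0.120),
]

HC_EV_NM = 1239.84  # h*c in eV*nm, used to convert a wavelength to an energy

# The brightest transition is the one with the largest oscillator strength
brightest = max(excitations, key=lambda transition: transition[2])
energy_ev, wavelength_nm, osc_strength = brightest

print(f"Brightest: {energy_ev:.2f} eV, {wavelength_nm:.1f} nm, f = {osc_strength:.3f}")
# Cross-check the reported energy against the wavelength
print(f"Energy from wavelength: {HC_EV_NM / wavelength_nm:.2f} eV")
```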

## **Generic text parsing**

We will use regular expressions for parsing text files.

The process for parsing is as follows:
1.   Find a unique pattern that marks the start of the segment to parse
2.   Find a pattern that marks the end of the segment
3.   Read the file
4.   Look for the line that matches the start pattern
5.   Parse each following line until the end pattern is encountered
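
The steps above can be sketched on a small inline text instead of a real log file. The excerpt below is made up and only imitates the layout in which Gaussian 16 prints Mulliken charges. The patterns assume the header reads `Mulliken charges:` and the block ends with `Sum of Mulliken charges`, which may differ between Gaussian versions. `row_pattern` is a hypothetical helper.

```python
import re

# Made-up excerpt laid out like a Gaussian log (values are placeholders)
sample_log = """\
 some earlier output
 Mulliken charges:
               1
     1  O   -0.700000
     2  H    0.350000
     3  H    0.350000
 Sum of Mulliken charges =   0.00000
 some later output
"""

# Steps 1 and 2: patterns for the start and the end of the segment
start_pattern = re.compile(r"^\s*Mulliken charges:\s*$")
end_pattern = re.compile(r"^\s*Sum of Mulliken charges")
# One data row: atom index, element symbol, charge
row_pattern = re.compile(r"^\s*(\d+)\s+([A-Za-z]+)\s+(-?\d+\.\d+)\s*$")

charges = []
parsing = False
for line in sample_log.splitlines():  # step 3: go through the text line by line
    if not parsing:
        if start_pattern.match(line):  # step 4: found the start line
            parsing = True
        continue
    if end_pattern.match(line):  # step 5: stop at the end pattern
        break
    row = row_pattern.match(line)
    if row:  # skips the column-header line, which has a single field
        index, element, charge = row.groups()
        charges.append((int(index), element, float(charge)))

print(charges)
```

Ending the start pattern at the colon keeps it from also matching a longer header that begins with the same words.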

In [None]:
import re

In [None]:
# We will parse the Mulliken charges for all atoms

# Pattern for the start of the segment
# (left empty as a placeholder; an empty pattern matches every line)
start_pattern = re.compile(r'')

In [None]:
# Pattern for the end of the segment
# (left empty as a placeholder; an empty pattern matches every line)
end_pattern = re.compile(r'')

In [None]:
# read the file into a list of lines
with open("/content/Parsing_Files/gaussian/tddft.log") as f:
    lines = f.readlines()

In [None]:
# line with start pattern

In [None]:
# parse lines

In [None]:
# parsed data

### **Exercise**

Parse the Mulliken charges with hydrogens summed into heavy atoms

In [None]:
# YOUR CODE HERE