# Exploring Correlation Methods and DFT

Today we are going to build some intuition for the trade-off between accuracy and computational cost, which is often an issue when dealing with computational problems.

We will be looking at some precomputed calculations that were run with a professional quantum chemistry software package (**Orca**).

For correlation methods we will focus our attention on the energetics of methane:

![](files/methane.png)

For DFT we will focus our attention on the dipole moment of hydroquinone. Remember that absolute energies from DFT are not directly comparable with those from correlation methods, since they are obtained under a different Hamiltonian. Comparisons are normally made with relative energy values; in this case we will use dipole moments instead.

![](files/hydroquinone.png)



First, let's load some preliminaries.
## <i class="fa fa-book"></i>  Preliminaries

In [None]:
# our bread and butter
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# add all probable locations of the quantum world library
import sys
sys.path.append('../library')
sys.path.append('../../library')
sys.path.append('/home/student/chem160/library')
#This is how we'll import our own home-made modules
import quantumWorld as qworld
qworld.fancy_plotting()
# convenient units
hartree_to_kcal = 627.503

## Part 1: Loading the data

Our data is stored in Python **pickle** format; let's load it:

In [None]:
import pickle
# pickle files should be opened in binary mode ('rb');
# the with-block closes the file for us
with open('files/data_methane.pckl', 'rb') as afile:
    data = pickle.load(afile)

Once loaded, `data` is a dictionary of dictionaries covering the following correlation methods:

* HF
* MP2
* QCISD
* CCSD
* CCSD(T)
* DLPNO-CCSD

Each method can be accessed via:

```python
data[method]
```
For example, the 'MP2' calculation data is:

```python
data['MP2']
```

### Try loading one method. What type of data is inside?

Each method has information on various aspects of the calculations:

* **Basis_set**, basis set used.
* **Nbasis**, number of basis functions.
* **Energy**, final calculated energy.
* **Ctime**, computing time in seconds.
* **E_corr**, correlation energy recovered; remember $E_{corr} = E_{method}-E_{HF}$
* **E_corr_percent**, percent of correlation energy recovered, measured against the exact energy: $\frac{|E_{corr}|}{|E_{exact}-E_{HF}|} \times 100$
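
The two formulas above can be sketched as a short calculation. The energies below are placeholders chosen only to make the arithmetic easy to follow (they are not methane results), and the variable names are illustrative; the conversion factor is the one defined in the preliminaries.

```python
import numpy as np

# Placeholder energies in Hartree (made-up values, NOT results for methane).
e_hf = -40.0                                  # Hartree-Fock reference
e_exact = -40.5                               # stand-in for the exact energy
e_method = np.array([-40.3, -40.4, -40.45])   # one entry per basis set

# E_corr = E_method - E_HF
e_corr = e_method - e_hf

# Percent recovered = |E_corr| / |E_exact - E_HF| * 100
e_corr_percent = np.abs(e_corr) / abs(e_exact - e_hf) * 100.0

# Same conversion factor as in the preliminaries cell
hartree_to_kcal = 627.503
e_corr_kcal = e_corr * hartree_to_kcal

print(e_corr)
print(e_corr_percent)
print(e_corr_kcal)
```

Because $E_{corr}$ is negative for these methods, the percentage uses absolute values so that it comes out positive.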

Each of these elements is an array. For example, to access the computing times for the MP2 calculations, you would use:


```python
data['MP2']['Ctime']
```
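
If you want a quick overview before digging into individual entries, a small helper can list the fields of every method. This is only a sketch: `toy_data` is a stand-in with made-up numbers that mimics the layout described above, and `summarize` is a hypothetical helper, not part of the `quantumWorld` library.

```python
import numpy as np

# Stand-in for the loaded dictionary: the field names follow the list above,
# but every value is a made-up placeholder, not an Orca result.
toy_data = {
    'HF':  {'Basis_set': ['basis-A', 'basis-B'],
            'Nbasis': np.array([10, 20]),
            'Ctime': np.array([1.0, 4.0])},
    'MP2': {'Basis_set': ['basis-A', 'basis-B'],
            'Nbasis': np.array([10, 20]),
            'Ctime': np.array([2.0, 16.0])},
}

def summarize(nested):
    """Return {method: {field: number of entries}} for a dict of dicts."""
    return {method: {field: len(values) for field, values in fields.items()}
            for method, fields in nested.items()}

summary = summarize(toy_data)
for method in sorted(summary):
    print(method, summary[method])
```

Calling `summarize(data)` on the real dictionary should work the same way, provided every field is a list or an array.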

# Try it out!

## Part 2 : Finding trends

We will do plotting and curve fitting to figure out the trends for multiple methods.

For this we have the utility function **qworld.polynomial_fit(x,y)**, which receives as input the data arrays **x** and **y** and returns the arrays **x_fit** and **y_fit** together with **label_fit**, a label that represents the fitted polynomial.

### Fitting a trend example

In [None]:
# set the data
x = np.linspace(0,100,100)
y = 0.3 * np.power(x,2.5)*(1+0.3*np.sin(x))
# get the fit
x_fit,y_fit,label_fit = qworld.polynomial_fit(x,y)
#plotting stuff
plt.plot(x,y,'o',label='Data')
plt.plot(x_fit,y_fit,label=label_fit)
plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.title('Polynomial fit of trend')
plt.legend(loc='best')
plt.show()
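
If the `quantumWorld` library is not at hand, a similar fit can be sketched with NumPy alone. The function below is a hypothetical stand-in with the same return shape; it is an assumption about what such a helper could look like, not the actual implementation of `qworld.polynomial_fit`.

```python
import numpy as np

def simple_polynomial_fit(x, y, degree=2, n_points=200):
    """Least-squares polynomial fit.

    Returns a dense x grid, the fitted values on that grid, and a plot label.
    """
    coeffs = np.polyfit(x, y, degree)                  # highest power first
    x_fit = np.linspace(np.min(x), np.max(x), n_points)
    y_fit = np.polyval(coeffs, x_fit)
    label_fit = 'degree-%d polynomial fit' % degree
    return x_fit, y_fit, label_fit

# Check on data generated from a known quadratic, y = 2 x^2 + 1.
x = np.linspace(0.0, 10.0, 25)
y = 2.0 * x**2 + 1.0
x_fit, y_fit, label_fit = simple_polynomial_fit(x, y, degree=2)

# The fit should reproduce the generating polynomial almost exactly.
print(label_fit, np.max(np.abs(y_fit - (2.0 * x_fit**2 + 1.0))))
```

The returned triple can be passed to `plt.plot(x_fit, y_fit, label=label_fit)` exactly as in the cell above.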

## Your mission: <br> find out trends for multiple variables <i class="fa fa-line-chart"></i>
Using the number of basis functions as the x variable, investigate the following for multiple methods:

* Scaling factor for computing times
* Correlation energy retrieved.
* Percent of Correlation energy calculated.
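
One common way to estimate a scaling factor is a straight-line fit in log-log space, since a power law $t = c\,N^p$ becomes $\log t = p \log N + \log c$. The sketch below uses synthetic timings built to follow $N^4$ exactly (made-up numbers, not the methane data), so the recovered exponent is known in advance.

```python
import numpy as np

# Synthetic timings that follow t = c * N**4 exactly (illustration only).
n_basis = np.array([20.0, 40.0, 80.0, 160.0])
ctime = 1.0e-6 * n_basis**4

# Linear fit of log(t) against log(N): the slope estimates the exponent p,
# and exp(intercept) estimates the prefactor c.
slope, intercept = np.polyfit(np.log(n_basis), np.log(ctime), 1)
prefactor = np.exp(intercept)

print('estimated exponent p = %.2f, prefactor c = %.2e' % (slope, prefactor))
```

With real timings the points will scatter around the line, and small calculations are often dominated by overhead, so the fitted exponent is only a rough estimate.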


### <i class="fa fa-line-chart"></i> Effect on computing time


In [None]:
for method in data.keys():
    x = data[method]['Nbasis']
    ###fill the data to be plotted on y
    
    ###fit the data
    
    ##make plots


plt.xlabel('Number of basis functions')
plt.ylabel('Computing time $(s)$')
plt.title('Polynomial fit of trend')
plt.legend(loc='best',ncol=2,prop={'size':16})
plt.show()

### <i class="fa fa-line-chart"></i> Effect on correlation energy


In [None]:

plt.xlabel('Number of basis functions')
plt.ylabel('$E_{corr}$')
plt.title('Polynomial fit of trend')
plt.legend(loc='best',ncol=2,prop={'size':16})
plt.show()

### <i class="fa fa-line-chart"></i> Effect on percent of correlation energy

In [None]:

plt.xlabel('Number of basis functions')
plt.ylabel('$E_{corr}$%')
plt.title('Polynomial fit of trend')
plt.legend(loc='best',ncol=2,prop={'size':16})
plt.show()

### <i class="fa fa-question-circle"></i> Questions

* Any other ideas on possible interesting trends to look at?
* What would be the sweet spot between accurate and still not too expensive?

## Part 3: Loading the DFT data

The DFT data is also stored in Python **pickle** format; let's load it:

In [None]:
import pickle
# pickle files should be opened in binary mode ('rb');
# the with-block closes the file for us
with open('files/dft_methane.pckl', 'rb') as afile:
    dft = pickle.load(afile)

This dictionary will have the following functionals:

* LDA
* BP86
* VWN
* PBE
* B3LYP 
* PBE0 
* TPSS 
* TPSS0 
* M06-2X 
* M06L
* B2PLYP
* mPW2PLYP 
* PWPB95

Each method has information on various aspects of the calculations:

* **Basis_set**, basis set used.
* **Nbasis**, number of basis functions.
* **DipoleM**, final dipole moment.
* **Ctime**, computing time in seconds.
* **Error**, absolute difference between the calculated dipole moment and the experimental value, $Error = |\mu_{method}-\mu_{exp}|$


Each of these elements is an array. For example, to access the computing times for the B3LYP functional, you would use:


```python
dft['B3LYP']['Ctime']
```
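
To compare functionals at a glance, one option is to rank them by their smallest error over all basis sets. The sketch below uses a stand-in dictionary with made-up numbers and generic functional names; the field name `'Error'` and the helper `rank_by_smallest_error` are assumptions for illustration, so check the keys of the real dictionary before reusing it.

```python
import numpy as np

# Stand-in dictionary with made-up numbers (NOT real dipole-moment errors).
# The field name 'Error' is an assumption; inspect the real keys first.
toy_dft = {
    'functional-A': {'Nbasis': np.array([50, 100]),
                     'Error': np.array([0.30, 0.20])},
    'functional-B': {'Nbasis': np.array([50, 100]),
                     'Error': np.array([0.10, 0.05])},
    'functional-C': {'Nbasis': np.array([50, 100]),
                     'Error': np.array([0.25, 0.15])},
}

def rank_by_smallest_error(results, field='Error'):
    """Sort functional names by their smallest error over all basis sets."""
    best = {name: float(np.min(vals[field])) for name, vals in results.items()}
    return sorted(best, key=best.get)

ranking = rank_by_smallest_error(toy_dft)
print(ranking)
```

Ranking by the smallest error ignores cost; pairing each entry with its computing time gives a fairer picture of the accuracy versus expense trade-off.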

# Try it out!

## Part 4 : Finding trends

We will do plotting and curve fitting to figure out the trends for multiple methods.

## Your mission: <br> find out trends for multiple variables <i class="fa fa-line-chart"></i>
Using the number of basis functions as the x variable, investigate the following for multiple functionals:

* Scaling factor for computing times
* Error


### <i class="fa fa-line-chart"></i> Effect on computing time



### <i class="fa fa-line-chart"></i>  Effect on accuracy

### <i class="fa fa-question-circle"></i> Questions

* Any other ideas on possible interesting trends to look at?
* What would be the sweet spot between accurate and still not too expensive?
* How do DFT methods compare with CC methods in terms of computing time?

# DFT in Pyquante
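In the following part, we'll run three different DFT functionals: one that uses the local density approximation (SVWN, requested below with `functional='LDA'`), one that uses the generalized gradient approximation (AM05), and BLYP. Note that BLYP is itself a GGA functional; the hybrid functional of the same family is B3LYP, which mixes in a fraction of exact exchange.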

In [None]:
from PyQuante.Molecule import Molecule
from PyQuante.dft import *
from PyQuante import configure_output
import PyQuante.DFunctionals as dfun 
import time

configure_output()

###  Define a molecule here

In [None]:
mol=Molecule('mol',
             atomlist =  )


## LDA functional

In [None]:
start_time = time.time()
en,orbe,orbs = dft(mol,functional='LDA')
lda_time = time.time() - start_time
print(lda_time)

## AM05: The GGA one

In [None]:
start_time = time.time()
en,orbe,orbs = dft(mol,functional='AM05')
am05_time = time.time() - start_time
print(am05_time)

## BLYP: A second GGA (the hybrid relative is B3LYP)

In [None]:
start_time = time.time()
en,orbe,orbs = dft(mol,functional='BLYP')
blyp_time = time.time() - start_time
print(blyp_time)
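
The three timing cells repeat the same start/stop pattern, which could be wrapped in a small helper. The sketch below is a hypothetical convenience function demonstrated on a dummy workload rather than on a PyQuante calculation.

```python
import time

def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed seconds)."""
    start_time = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start_time

# Dummy workload standing in for an expensive calculation.
def slow_sum(n):
    return sum(i * i for i in range(n))

value, elapsed = timed(slow_sum, 100000)
print(value, elapsed)
```

With a molecule defined, the same helper could be used as `(en, orbe, orbs), lda_time = timed(dft, mol, functional='LDA')`, following the call pattern of the cells above.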