## Coulomb Matrix Representation

In [None]:
import numpy as np
from math import sqrt

## Useful Resources
 - [Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.108.058301)
 - [Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error](https://pubs.acs.org/doi/abs/10.1021/acs.jctc.7b00577)
 - [Understanding molecular representations in machine learning: The role of uniqueness and target similarity](https://aip.scitation.org/doi/10.1063/1.4964627)

## Introduction
Machine learning needs a way to represent data so that a model can learn from it and make predictions on new inputs. In chemistry, the data are molecules, and the targets we want the model to learn are property values for those molecules. The goal is a representation that describes the underlying physics of the molecule in enough detail to predict its properties accurately. A lot of work has therefore gone into finding good molecular representations. One of the simplest is what we are going to work on today: the Coulomb matrix.

## General Theory
The Coulomb matrix is one of the simpler representations of a molecule. It is a square matrix with one row and one column per atom. The diagonal elements approximate the electronic potential energy of each atom from its nuclear charge $Z_I$, and the off-diagonal elements are the Coulomb repulsion between the nuclei of atoms $I$ and $J$, located at positions $R_I$ and $R_J$.

$$M_{IJ} =\begin{cases}0.5Z_{I}^{2.4} &\text{for } I = J, \\ \frac{Z_I Z_J}{\left | R_I - R_J \right |} &\text{for } I \neq J.\end{cases} $$
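
To make the two cases concrete, here is a minimal sketch that evaluates a single matrix element directly from the definition. The function name `cm_element` and the sample positions are assumptions made for illustration only; the distance carries whatever unit the coordinates are given in.

```python
import math


def cm_element(z_i, z_j, r_i, r_j, same_atom):
    """Return one Coulomb matrix element M_IJ from the piecewise definition."""
    if same_atom:
        # Diagonal case (I = J): depends only on the nuclear charge.
        return 0.5 * z_i ** 2.4
    # Off-diagonal case (I != J): nuclear repulsion Z_I * Z_J / |R_I - R_J|.
    return z_i * z_j / math.dist(r_i, r_j)


# Diagonal elements for carbon (Z = 6) and hydrogen (Z = 1).
carbon_diag = cm_element(6, 6, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), True)
hydrogen_diag = cm_element(1, 1, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), True)

# Off-diagonal element for a carbon and a hydrogen placed one distance unit apart.
carbon_hydrogen = cm_element(6, 1, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), False)

print(carbon_diag, hydrogen_diag, carbon_hydrogen)
```

The diagonal value never depends on geometry, while the off-diagonal value shrinks as the two nuclei move apart.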
 
## Setup
1. Parse file for atoms and coordinates
2. Build Coulomb Matrix

In [None]:
# read the XYZ file into a list of lines; `with` closes the file when done
with open('methane.xyz', 'r') as file:
    doc = file.readlines()

In [None]:
# read number of atoms
natoms = int(doc[0].split()[0])

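# parse element symbol and x, y, z for each atom
# (atom lines start at index 2, after the atom count and the comment line)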
coords = []
for i in range(natoms):
    a_coords = doc[i + 2].split()[0:4]
    coords.append(a_coords)

coords
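
Each entry of `coords` is still a list of strings, so the symbols and the numeric positions need to be separated before any distances can be computed. A minimal sketch of that step, using a hypothetical two-atom list in the same shape as `coords` (the names `coords_example`, `symbols`, and `positions` are illustrative):

```python
import numpy as np

# Hypothetical parsed rows in the same [symbol, x, y, z] string layout as `coords`.
coords_example = [
    ['C', '0.000', '0.000', '0.000'],
    ['H', '0.629', '0.629', '0.629'],
]

# Element symbols stay as strings.
symbols = [row[0] for row in coords_example]

# Positions become a (natoms, 3) float array; NumPy converts the numeric strings.
positions = np.array([row[1:4] for row in coords_example], dtype=float)

print(symbols)
print(positions)
```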

## What do we need for CM?
1. Nuclear charges for each atom
2. Calculate the diagonal elements ($I = J$)
3. Calculate the off-diagonal elements ($I \neq J$)
4. Output the lower triangle of the matrix as a vector
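
The four steps above can be sketched with NumPy as follows. This is one possible way to fill them in, not necessarily the approach chemreps takes. The `NUCLEAR_CHARGE` lookup, the function names, and the approximate tetrahedral methane geometry are all assumptions made so the example runs on its own, without `methane.xyz`.

```python
import numpy as np

# Step 1: minimal nuclear charge lookup (assumed; extend for other elements).
NUCLEAR_CHARGE = {'H': 1, 'C': 6, 'N': 7, 'O': 8}


def coulomb_matrix(symbols, positions):
    """Build the full (natoms, natoms) Coulomb matrix."""
    z = np.array([NUCLEAR_CHARGE[s] for s in symbols], dtype=float)
    r = np.asarray(positions, dtype=float)
    n = len(z)

    # Pairwise distances |R_I - R_J| via broadcasting.
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)

    cm = np.empty((n, n))
    off_diag = ~np.eye(n, dtype=bool)

    # Step 3: off-diagonal elements Z_I * Z_J / |R_I - R_J|.
    cm[off_diag] = np.outer(z, z)[off_diag] / dist[off_diag]

    # Step 2: diagonal elements 0.5 * Z_I ** 2.4.
    cm[np.diag_indices(n)] = 0.5 * z ** 2.4
    return cm


def lower_triangle(cm):
    """Step 4: flatten the lower triangle (diagonal included) into a vector."""
    return cm[np.tril_indices(cm.shape[0])]


# Illustrative methane geometry with carbon at the origin (not read from a file).
symbols = ['C', 'H', 'H', 'H', 'H']
positions = [
    [0.000, 0.000, 0.000],
    [0.629, 0.629, 0.629],
    [-0.629, -0.629, 0.629],
    [-0.629, 0.629, -0.629],
    [0.629, -0.629, -0.629],
]

cm = coulomb_matrix(symbols, positions)
vec = lower_triangle(cm)
print(cm.shape, vec.shape)
```

Because the matrix is symmetric, the lower triangle keeps all of the information: for the five atoms of methane the vector has $5 \cdot 6 / 2 = 15$ entries.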

In [None]:
# nuclear charges

# build CM matrix (this step should define `mat` as a natoms x natoms array)

# return the lower triangle of the CM as a vector
mat = mat[np.tril_indices(natoms)]

In [None]:
mat

## If this interests you, feel free to help out with [chemreps](https://github.com/dlf57/chemreps)!