# Homework - Geometry Analysis Project - Extension 1

You can also see the homework assignment on the [Tabular Data](https://education.molssi.org/python_scripting_cms/04-tabular_data/index.html) lesson under "Geometry Analysis Project". Note that there are several project extensions for you to try if you finish the first assignment.

*Assignment*: In the lesson materials, there is a file in the `data` folder called “water.xyz”. This is a very simple, standard file format that is often used to distribute molecular coordinates. The first line of the file is the number of atoms in the molecule, the second line is a title line (or may be blank), and the coordinates begin on the third line. The format of the coordinates is

```Atom_Label  XCoor   YCoor   ZCoor```

and the default units (which are used in this example) are angstroms.

Write code to read the information from the xyz file and determine the bond lengths between all the atoms. NumPy provides a square-root function, `numpy.sqrt()`. To raise a number to a power, use \*\*, as in 3\*\*2 = 9. Your code output should look something like this:

    O to O : 0.0
    O to H1 : 0.969
    O to H2 : 0.969
    H1 to O : 0.969
    H1 to H1 : 0.0
    H1 to H2 : 1.527
    H2 to O : 0.969
    H2 to H1 : 1.527
    H2 to H2 : 0.0


Hint: You will need a double for loop to measure the distance between all the atoms. If you aren’t sure how to get started, print the variables inside your for loop.

## Extension 1

Your initial project calculated the distance between every pair of atoms. However, some of these atoms aren’t really bonded to each other. For example, H1 and H2 are not bonded, and the distance between an atom and itself is always zero. Use a distance cutoff of 1.5 angstroms to define a bond (that is, if the bond length is greater than 1.5 angstroms, consider the atoms not bonded). Modify your code to print only the atoms that are actually bonded to each other.

In [1]:
import os
import numpy as np

The coordinates start on the third line of the file, so skip the two header lines with the `skip_header` option.

In [2]:
water_file = os.path.join('data', 'water.xyz')
water = np.genfromtxt(fname=water_file, skip_header=2, dtype='unicode')
print(water)

[['O' '0.000000' '-0.007156' '0.965491']
 ['H1' '-0.000000' '0.001486' '-0.003471']
 ['H2' '0.000000' '0.931026' '1.207929']]
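
To try the same `genfromtxt` call without the lesson's `data` folder, the sketch below reads an in-memory copy of the file through `io.StringIO`. The coordinates are copied from the output above. The title line `water` is a placeholder (an assumption, since the real file's second line is not shown here), and `xyz_text` and `n_atoms` are names chosen for illustration.

```python
import io

import numpy as np

# In-memory stand-in for data/water.xyz.
# Line 1: atom count, line 2: title (placeholder), then one atom per line.
xyz_text = """3
water
O   0.000000  -0.007156   0.965491
H1 -0.000000   0.001486  -0.003471
H2  0.000000   0.931026   1.207929
"""

# skip_header=2 drops the atom count and the title line;
# dtype=str keeps the atom labels alongside the numeric columns.
water = np.genfromtxt(io.StringIO(xyz_text), skip_header=2, dtype=str)

# The first line of an xyz file should match the number of rows read.
n_atoms = int(xyz_text.splitlines()[0])
print(water.shape, n_atoms == len(water))
```

Comparing the declared atom count with the number of rows is a cheap way to catch a truncated or malformed file.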


In [3]:
print(water[:,1:])

[['0.000000' '-0.007156' '0.965491']
 ['-0.000000' '0.001486' '-0.003471']
 ['0.000000' '0.931026' '1.207929']]


Put the atom names into the `atoms` list. Extract the coordinates by slicing and convert them to floats before storing them in the `coordinates` NumPy array. Use the built-in `float` for the conversion, because the `np.float` alias was deprecated in NumPy 1.20 and removed in 1.24. `atoms` must be a Python list so that the loop below can look up each atom's position with `index`.

In [4]:
atoms = list(water[:,0])
coordinates = water[:,1:].astype(float)
print(F'This is list of atoms:\n{atoms}\n')
print(F'This is numpy array of coordinates:\n{coordinates}')

This is list of atoms:
['O', 'H1', 'H2']

This is numpy array of coordinates:
[[ 0.       -0.007156  0.965491]
 [-0.        0.001486 -0.003471]
 [ 0.        0.931026  1.207929]]


In [5]:
# Sanity check: confirm that the distance formula reproduces
# the values given in the homework instructions

OH1_distance = np.sqrt(np.sum((coordinates[0]-coordinates[1])**2))  # O - H1
OH2_distance = np.sqrt(np.sum((coordinates[0]-coordinates[2])**2))  # O - H2
H1H2_distance = np.sqrt(np.sum((coordinates[1]-coordinates[2])**2))  # H1 - H2
print(F'O-H1: {OH1_distance}\nO-H2: {OH2_distance}\nH1-H2: {H1H2_distance}')

O-H1: 0.9690005374652793
O-H2: 0.9690003348647513
H1-H2: 1.52693633514957


In [12]:
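for atom1 in atoms:
    for atom2 in atoms:
        # list.index returns the first match, so this lookup relies on
        # every atom label being unique (true for O, H1, H2)
        a1 = atoms.index(atom1)
        a2 = atoms.index(atom2)
        coor1 = coordinates[a1]
        coor2 = coordinates[a2]
        distance = np.sqrt(np.sum((coor1-coor2)**2))
        # 0 < distance skips self-pairs; <= 1.5 applies the bond cutoff
        if 0 < distance <= 1.5:
            print(F'{atom1:2} to {atom2:2} : {distance:.3f}')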

O  to H1 : 0.969
O  to H2 : 0.969
H1 to O  : 0.969
H2 to O  : 0.969
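
Each bond above is printed twice, once in each direction. If only unique bonds are wanted, one option is to build the full distance matrix with broadcasting and keep only pairs with `i < j`. This is a sketch that goes beyond what the assignment asks for: the coordinates are hardcoded from the earlier output, and `cutoff`, `distances`, and `bonds` are names chosen for illustration.

```python
import numpy as np

atoms = ['O', 'H1', 'H2']
coordinates = np.array([
    [0.0, -0.007156, 0.965491],   # O
    [-0.0, 0.001486, -0.003471],  # H1
    [0.0, 0.931026, 1.207929],    # H2
])
cutoff = 1.5  # angstroms, as specified in Extension 1

# Shape (3, 1, 3) minus shape (1, 3, 3) broadcasts to (3, 3, 3):
# diff[i, j] is the vector from atom j to atom i.
diff = coordinates[:, None, :] - coordinates[None, :, :]
distances = np.sqrt(np.sum(diff ** 2, axis=-1))

# Looping over j > i visits each pair once and skips self-pairs,
# so no separate "distance > 0" test is needed.
bonds = []
for i in range(len(atoms)):
    for j in range(i + 1, len(atoms)):
        if distances[i, j] <= cutoff:
            bonds.append((atoms[i], atoms[j]))
            print(f'{atoms[i]:2} to {atoms[j]:2} : {distances[i, j]:.3f}')
```

Working with integer indices also avoids the `atoms.index` lookup, so this variant would still behave correctly if two atoms shared the same label.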


In [7]:
import getpass
from time import strftime, localtime

lt = localtime()
uname = getpass.getuser()
print(F'Last Updated On: {strftime("%Y-%m-%d %H:%M")} {lt.tm_zone}\nBy: {uname}')

Last Updated On: 2020-06-08 11:08 WIB
By: radifar
