# Introduction to programming - Assignment 2017/2018

**Submit to Blackboard before Friday, 2nd of March at 18:00.**

**5 marks attainable, corresponding to 40% of the course grade (provided the attendance criteria have been met)**

Disulfide bonds between polymer chains determine their molecular structure, properties and function. In polypeptides they are responsible for proteins' tertiary and quaternary structure (as well as your [hair style](http://dx.doi.org/10.1039/B604537P)), and in artificial rubber they can improve the performance of tyres.

In this exercise you should write a standalone program that, when executed, prompts the user for the name of an XYZ file and then prints all disulfide bonds present in the structure, with no repetitions. The output should have the following format:

    : S(label1)-S(label2)
    : S(label3)-S(label4)
    : ...

where *labelX* is the position of the atom in the sequence of the XYZ file, starting with 1 for the first atom up to the total number of atoms in the structure (this is the label displayed by molecular visualisers such as Avogadro). For example:

    : S(32)-S(137)
    : S(437)-S(99)
    : ...
    
It is important that each result line starts with a colon (:) and that the labels are enclosed in round brackets, as this will be used to test the output of your program. Your program can output any other text you wish above or below your result (as long as it does *not* start with a colon).
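To illustrate why the exact format matters, a checker could pick out the result lines with a regular expression and ignore everything else. The sketch below is hypothetical: the pattern and the helper name `extract_bonds` are assumptions made for illustration, not the actual test used for marking.

```python
import re

# Hypothetical pattern for one result line, e.g. ": S(32)-S(137)"
BOND_LINE = re.compile(r"^: S\((\d+)\)-S\((\d+)\)$")

def extract_bonds(output_text):
    """Return the label pairs found on lines matching the result format."""
    bonds = []
    for line in output_text.splitlines():
        match = BOND_LINE.match(line.strip())
        if match:
            bonds.append((int(match.group(1)), int(match.group(2))))
    return bonds

# Extra text around the results is ignored because it does not match
sample = "Disulfide bonds found:\n: S(32)-S(137)\n: S(437)-S(99)\nDone."
print(extract_bonds(sample))
```

Under this assumed pattern, a line such as `:S(32)-S(137)` (no space after the colon) would not be recognised, which is why following the format exactly is the safe choice.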

<a href="http://openbabel.org/wiki/XYZ_(format)#Additional_Comments">XYZ files</a> contain information about atom positions but have no information about chemical bonding. Chemical bond information must be determined by comparing interatomic distances to the sum of the covalent radii of the atoms involved. We will take the covalent radius of sulfur to be 1.2&Aring; (this is bigger than the actual value, but we want to make sure we identify stretched bonds, which are likely to show up in proteins), so two sulfur atoms count as bonded when they are no more than 2.4&Aring; apart.
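For orientation, an XYZ file starts with a line giving the number of atoms, followed by a comment line, and then one line per atom holding the element symbol and its x, y, z coordinates. The sketch below reads a tiny made-up structure from a string and numbers the atoms from 1. The coordinates, the helper name `read_atoms` and the constant `S_S_CUTOFF` are assumptions introduced for illustration only.

```python
import io

# A tiny made-up XYZ file: atom count, comment line, then one atom per line
xyz_text = """3
hypothetical example, coordinates invented for illustration
S 0.000 0.000 0.000
C 1.800 0.000 0.000
S 0.000 2.050 0.000
"""

# Sum of the covalent radii of two sulfur atoms, in Angstrom
S_S_CUTOFF = 1.2 + 1.2

def read_atoms(handle):
    """Return (label, symbol, [x, y, z]) tuples, with labels starting at 1."""
    lines = handle.read().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    # Skip the two header lines; the first atom gets label 1
    for label, line in enumerate(lines[2:2 + n_atoms], start=1):
        symbol, x, y, z = line.split()[:4]
        atoms.append((label, symbol, [float(x), float(y), float(z)]))
    return atoms

atoms = read_atoms(io.StringIO(xyz_text))
for atom in atoms:
    print(atom)
```

Here atoms 1 and 3 are both sulfur and lie 2.05 Angstrom apart, which is below the 2.4 Angstrom cutoff, so they would count as a disulfide bond.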

Accompanying this notebook is the file <a href="neuraminidase.xyz">neuraminidase.xyz</a>, containing the structure of the neuraminidase protein found on the surface of the influenza virus, together with a non-sulfur-containing molecule and crystallization water molecules. Full marks will be awarded to any implementation that produces the correct result against the neuraminidase.xyz file and a blind test case.

If your implementation is functional (i.e. your program does not crash) but does not fully work for a generic case, partial marks will be awarded for correct implementation of the functions below.

## Implementation suggestion

A working solution can be obtained by appropriately combining the following functions.

### parse_line(*number*,*string*) - 1 mark

This function should receive one number and one string (a line of the XYZ file) as its arguments, and return a list of the form:

    [number,element_symbol,coordinate_list]

where number is the number given as argument to the function, element_symbol is a string with the chemical symbol letter(s), and coordinate_list is a list of numbers with the atom coordinates. For example:

    [372,'C',[4.36500, 10.95900, 46.49400]]

### distance(*list1*,*list2*) - 1 mark

This function receives two lists as arguments, each containing the coordinates of a point in 3D space. The function should return the distance between these two points.
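For two points (x1, y1, z1) and (x2, y2, z2), the distance is the square root of (x1-x2)² + (y1-y2)² + (z1-z2)². A minimal sketch of one way to write this follows; it is not the only valid implementation. In Python 3.8 or later, `math.dist` from the standard library computes the same quantity.

```python
import math

def distance(list1, list2):
    """Euclidean distance between two points given as [x, y, z] lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(list1, list2)))

# Worked check: differences of 1, 2 and 2 give sqrt(1 + 4 + 4) = 3
print(distance([0.0, 0.0, 0.0], [1.0, 2.0, 2.0]))  # 3.0
```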

### is_bridge(*list1*,*list2*) - 1 mark

This function receives two lists as arguments, each in the same form as the output of function *parse_line()*. It should return the boolean True if the two atoms represented by list1 and list2 correspond to bonded sulfur atoms. It should return False if that is not the case.
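The behaviour described above can be sketched as follows, using three sulfur atoms taken from the neuraminidase.xyz output in this notebook and the carbon atom from the *parse_line()* example. The sketch assumes Python 3.8 or later (for `math.dist`), that sulfur is written exactly as `'S'`, and that an atom compared with itself should not count as bonded. The constant name `S_S_CUTOFF` is introduced here for illustration.

```python
import math

S_S_CUTOFF = 2.4  # sum of two sulfur covalent radii (2 x 1.2 Angstrom)

def is_bridge(list1, list2):
    """True if both atoms are sulfur, distinct, and within the cutoff."""
    number1, symbol1, coords1 = list1
    number2, symbol2, coords2 = list2
    # Assumption: the same atom passed twice is not a bond
    if symbol1 != "S" or symbol2 != "S" or number1 == number2:
        return False
    return math.dist(coords1, coords2) <= S_S_CUTOFF

s343 = [343, 'S', [8.319, 17.844, 45.514]]
s381 = [381, 'S', [6.413, 17.091, 45.67]]
s763 = [763, 'S', [20.729, 6.04, 51.614]]
c372 = [372, 'C', [4.36500, 10.95900, 46.49400]]

print(is_bridge(s343, s381))  # True: about 2.06 Angstrom apart
print(is_bridge(s343, s763))  # False: both sulfur, but far apart
print(is_bridge(s343, c372))  # False: the second atom is not sulfur
print(is_bridge(s343, s343))  # False: same atom
```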

### formatter(*list1*,*list2*) - 1 mark

This function receives two lists as arguments, each in the same form as the output of function *parse_line()* (they can correspond to any atom, not necessarily sulfur), and returns a string of the form:

    ': element_symbol1(number1)-element_symbol2(number2)'
    
For example:

    ': Cl(2)-Ca(1)'

## Instructions

You should submit to Blackboard a single file called bridgeS.py, containing a standalone Python script with your program. Your program should not crash when given a well-formatted XYZ file. Any program that does not fulfil these criteria will not be considered.

In the same zip archive you will find a template for bridgeS.py. Keep the function definitions separate from the rest of your program, which you should write inside the *if* block provided.

If you choose to follow the suggested implementation, in order to qualify for partial marks, it is important that you define the function names and argument order *exactly* as listed above. We will test that parse_line(), distance(), is_bridge() and formatter() are present and behave appropriately. Functions that work correctly but have different names will not result in partial marks being awarded.

If you encounter difficulties, you are invited to ask questions in the forum set up for the course on Blackboard.

In [11]:
#Template for a program to identify disulfide bridges on an XYZ file

#Write first any import statements that you need in your program



#Define here any functions you will be using
def parse_line(number, string):
    "Return a list of number, element symbol and coordinates"
    l=string.split()
    sym=str(l[0])
    coordx=float(l[1])
    coordy=float(l[2])
    coordz=float(l[3])
    coordinates=[coordx, coordy, coordz]
    parse=[number,sym,coordinates]
    return parse

def distance(list1, list2):
    "Give the distance between 2 sets of coordinates"
    x1=float(list1[0])
    y1=float(list1[1])
    z1=float(list1[2])
    x2=float(list2[0])
    y2=float(list2[1])
    z2=float(list2[2])
    dist = ( (x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2 )**0.5
    return dist

def is_bridge(list1, list2):
    "Return True if the two atoms are sulfur atoms bonded to each other"
    #Both arguments have the form returned by parse_line()
    both_sulfur = list1[1]=="S" and list2[1]=="S"
    #An atom is never bonded to itself
    different_atoms = list1[0]!=list2[0]
    #Bonded if no further apart than the sum of the covalent radii (2 x 1.2)
    return both_sulfur and different_atoms and distance(list1[2], list2[2]) <= 2.4

def formatter(list1, list2):
    "Show bonded atoms"
    el_sym1= str(list1[1])
    num1=str(list1[0])
    el_sym2= str(list2[1])
    num2=str(list2[0])
    #The required format has a space after the colon
    lin=": "+el_sym1+"("+num1+")-"+el_sym2+"("+num2+")"
    return lin


#The following if statement does not affect the working of the program,
#but will allow to test your functions even if the rest of the program
#does not work

if __name__=="__main__":

    #Remove the pass statement below and write the rest of the
    #program inside this if block
    #Do not forget indentation

    pass

In [30]:
file_in=input("Filename?")
with open(file_in, 'r') as f:
    #The first two lines of an XYZ file hold the atom count and a comment
    atom_lines=list(f)[2:]

s_data=[]
#Labels start at 1 for the first atom in the file
for number, line in enumerate(atom_lines, start=1):
    fields=line.split()
    #Keep only sulfur atoms; the element symbol is the first field
    if fields and fields[0]=="S":
        s_data.append(parse_line(number,line))
s_data

Filename?neuraminidase.xyz


[[88, 'S', [6.558, 30.111, 40.407]],
 [343, 'S', [8.319, 17.844, 45.514]],
 [381, 'S', [6.413, 17.091, 45.67]],
 [763, 'S', [20.729, 6.04, 51.614]],
 [820, 'S', [19.757, 14.621, 43.733]],
 [876, 'S', [23.352, 10.688, 43.037]],
 [897, 'S', [21.333, 6.8, 53.433]],
 [1197, 'S', [20.771, 14.863, 45.482]],
 [1210, 'S', [19.432, 15.01, 40.16]],
 [1245, 'S', [21.113, 13.956, 39.727]],
 [1566, 'S', [26.882, 23.445, 50.88]],
 [1578, 'S', [22.802, 22.474, 47.311]],
 [1649, 'S', [24.07, 23.827, 46.386]],
 [1662, 'S', [26.671, 25.537, 51.099]],
 [1823, 'S', [29.995, 17.046, 35.201]],
 [1889, 'S', [28.154, 40.545, 48.999]],
 [2021, 'S', [29.901, 41.51, 49.461]],
 [2321, 'S', [12.079, 31.787, 55.138]],
 [2580, 'S', [12.2, 26.388, 43.431]],
 [2648, 'S', [6.277, 30.682, 38.457]],
 [2682, 'S', [7.762, 24.415, 47.562]],
 [2898, 'S', [4.655, 27.638, 56.151]],
 [2899, 'S', [5.358, 29.024, 56.189]],
 [2907, 'S', [5.966, 24.578, 48.6]]]

In [29]:
for index, atom1 in enumerate(s_data):
    #Compare each atom only with the atoms after it, so no pair is repeated
    for atom2 in s_data[index+1:]:
        if is_bridge(atom1, atom2):
            print(formatter(atom1, atom2))

: S(88)-S(2648)
: S(343)-S(381)
: S(763)-S(897)
: S(820)-S(1197)
: S(1210)-S(1245)
: S(1566)-S(1662)
: S(1578)-S(1649)
: S(1889)-S(2021)
: S(2682)-S(2907)
: S(2898)-S(2899)
