# Introduction to Molecular Modelling - Workshop 1

[Dr. Kim Jelfs](mailto:k.jelfs@imperial.ac.uk) and [Dr. João Pedro Malhado](mailto:malhado@imperial.ac.uk), Department of Chemistry, Imperial College London

If text is in <font color="#3399ff">blue</font>, then there is a question related to that text in the Blackboard Quiz for today's Workshop.

First we need to include libraries for plotting and numerical functions by executing

    %pylab inline

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


### Quick reminder on Python notebooks:

1. Remember to save your notebook every so often using the Save icon above.

2. To execute cells, use "Shift + Return".

For more, refer back to your Python for Data analysis workshops from last term.

## Exercise 1 - Opening and visualising files in Avogadro

Open up Avogadro: Start $\rightarrow$ All Programs $\rightarrow$  Avogadro. 

**Note**: Open "Avogadro", not "Avogadro2".

We are going to start by looking at some different systems using example structure files. You will find all these files in the folder "Structure_files" within the distribution for this Python notebook.

Here's a summary of what each of the main toolbar's icons do (and if you let the cursor hover over the icon in Avogadro, a short summary of how to use it will be displayed):

<img src="Workshop1_data/Image_files/Avogadro_toolbar_W1.png" style="max-width:100%; width: 60%"/>

To open the structures within Avogadro, go to File $\rightarrow$ Open, then navigate to each of the below files in turn. With each one, you should click on the "Display Settings" icon and then play around with different display types (recommendations for each file are given below). By clicking on the "Spanner" icon, you can edit the default options for these displays. If things go wrong, you can always use "Ctrl-Z" to undo, or "View $\rightarrow$ Reset Display Types", or close Avogadro and reopen to start from fresh.

The first time you open a file, work out how to rotate the view, zoom in and out of the view and translate the view (**Hint**: hover over the "Navigation Tool" for instructions on how to do this) - you'll need to do this for all the structures, to get a true feel of their 3D structure.

1. **1CRN.pdb**: This is the structure of a hydrophobic protein (for more information and the original literature reference, go to http://www.rcsb.org/pdb/explore.do?structureId=1crn). Try display types: "Cartoon" and "Ribbon".

2. **benzene.fchk**: This is the structure of benzene. The file contains more than just the atom positions, because it is the output of a quantum mechanical calculation (you will be running these calculations yourself in the Physical Computational Labs with Dr. Hunt later this term). This means we can visualise the molecular orbitals of the benzene molecule. When you open the file, an "Orbitals" sidebar should pop up. If you navigate to different orbitals (start with the HOMO and LUMO) and select them, those orbitals should be displayed (if not, try clicking "render").

3. **c60.fchk**: This is a molecular structure of fullerene, C$_{60}$. Again, it is a file output from a quantum mechanical calculation, so you can visualise the fullerene's molecular orbitals. Use the measurement tool to measure the C-C bond lengths. <font color="#3399ff">How many different C-C bond lengths are there in this fullerene structure, and what are these bond lengths? </font> What are the C-C-C bond angles: how many different ones are there, and are these what you would expect?

4. **caffeine.g03**: This is a molecular structure of caffeine. Try display types: "Dipole", "Ring", "Stick", "van der Waals Spheres", "Wireframe", "Label", also try varying the settings of some of these display types to test their effect.

5. **graphite.xyz**: This is a fragment of a graphite structure. <font color="#3399ff">Can you tell what the stacking type is? AAAA? ABCA? ABAB?</font> (if you're not sure what is meant by these stacking types, ask a demonstrator).

6. **morphine.cml**: This is a molecule of the painkiller, morphine. <font color="#3399ff">How many chiral atoms are there in morphine? </font>

7. **pop_n2_opt_freq.fchk**: This is an N$_2$ molecule with the data for visualising the molecular orbitals included. This term you will be learning how to build molecular orbital diagrams with Prof. Vilar; here you can see how the orbitals look in 3D. **Optional**: Dr. Hunt provides files for all the molecules whose molecular orbitals you will construct in Year 1 on her website (http://www.huntresearchgroup.org.uk/teaching/year2_calcs.html), so if you want to look at these in 3D while you are studying MO theory with Prof. Vilar, you can download the files and visualise them in Avogadro.

8. **porphyrin.cml**: This is porphine, from the family of porphyrin rings, which are macrocyclic organic molecules. Some of these molecules form complexes with metal ions, whereby the metals are typically in oxidation state 2$^{+}$ or 3$^{+}$, and 2H$^{+}$ are removed from the porphyrin for the complexation. Typical ions include Al$^{3+}$ or Fe$^{2+}$. <font color="#3399ff">What diameter is the central cavity, is there space for either of those ions to be complexed?</font> (**Hint**: the van der Waals radius of nitrogen is 155 pm, the ionic radii of Al$^{3+}$ and Fe$^{2+}$ are 34 pm and 46 pm respectively).

9. **tpy-Ru.sdf**: a ruthenium transition metal complex. Try display type: "Polygon".

10. **vitaminc.fchk**: a molecule of vitamin C (ascorbic acid). <font color="#3399ff">Identify any chiral carbon atoms and assign them as *R* or *S*.</font>
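The measurement tool in Avogadro reports distances and angles between selected atoms; the same quantities follow directly from Cartesian coordinates. Below is a minimal sketch, assuming an idealised planar hexagon of carbon atoms with 1.40 Å sides (made-up coordinates for illustration, not taken from any of the workshop files). The function names `distance` and `angle` are our own.

```python
import math

def distance(a, b):
    """Distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def angle(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_theta = dot / (distance(a, b) * distance(c, b))
    # Clamp to [-1, 1] to guard against rounding error before acos
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Assumed example: vertices of a regular hexagon with 1.40 Angstrom sides
side = 1.40
hexagon = [(side * math.cos(math.radians(60 * k)),
            side * math.sin(math.radians(60 * k)),
            0.0) for k in range(6)]

bond = distance(hexagon[0], hexagon[1])
bond_angle = angle(hexagon[0], hexagon[1], hexagon[2])
print(f"C-C distance: {bond:.2f} Angstrom, C-C-C angle: {bond_angle:.1f} degrees")
```

Comparing values like these with what the measurement tool shows is a useful check that you have selected the atoms in the intended order (the central atom of an angle must be the second one).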

### A note on file extensions

You will have noticed that these files had several different file extensions - they were different file formats. Avogadro can open ("read in") many different file types. It can also save (File $\rightarrow$ Save As or the "Save" icon) in many different formats. Different file types store the structural information in different layouts and, in some cases, also store features of those structures (e.g. molecular orbitals). Traditionally, different software packages have been able to read/save different formats, so the format you encounter will depend on where the file originated and what it is being used for. Avogadro can be a useful way to convert between these. The file types you've just been looking at include:

* Formats that are inputs (**.g03**) or outputs (**.fchk**) from Gaussian (a Quantum Mechanical software package that you will use in the Computational Labs with Dr. Hunt http://www.gaussian.com/)

* **.xyz**: this is arguably the simplest format, generally just a list of element types (column 1), followed by the coordinates of each atom: $\textit{x}$ (column 2), $\textit{y}$ (column 3) and $\textit{z}$ (column 4). There will often be a header (the lines at the top) that gives the total number of atoms in the file. Open up the **graphite.xyz** file to see this. You should do this first by navigating to the file and opening it in WordPad on your computer, but then use the Python command below to see the file's contents in your Notebook. Each line in the output ends with "\n", which represents the "new line" character that terminates each line of the file. You have not previously encountered this method of opening and printing files: 

    with open('Workshop1_data/Structure_files/graphite.xyz') as graphite:
        code=graphite.readlines()
    code

In [2]:
with open('Workshop1_data/Structure_files/graphite.xyz') as graphite:
    code=graphite.readlines()
code

['128\n',
 '\n',
 'C      3.695  -1.521   5.265\n',
 'C      3.505  -2.233   2.002\n',
 'C      2.294  -1.329   5.304\n',
 'C      2.963  -0.951   1.754\n',
 'C      3.316  -2.945  -1.262\n',
 'C      3.127  -3.658  -4.531\n',
 'C      1.916  -2.752  -1.222\n',
 'C      2.584  -2.376  -4.780\n',
 'C      4.010   0.850   4.730\n',
 'C      3.821   0.138   1.467\n',
 'C      2.606   1.046   4.769\n',
 'C      3.275   1.423   1.218\n',
 'C      3.632  -0.573  -1.797\n',
 'C      3.442  -1.286  -5.067\n',
 'C      2.228  -0.378  -1.758\n',
 'C      2.896  -0.001  -5.315\n',
 'C      1.436  -2.418   5.592\n',
 'C      1.247  -3.129   2.329\n',
 'C      0.032  -2.222   5.630\n',
 'C      0.701  -1.844   2.080\n',
 'C      1.058  -3.841  -0.935\n',
 'C      0.868  -4.554  -4.205\n',
 'C     -0.346  -3.645  -0.896\n',
 'C      0.322  -3.269  -4.454\n',
 'C      1.748  -0.043   5.056\n',
 'C      1.559  -0.755   1.793\n',
 'C      0.342   0.154   5.094\n',
 'C      1.010   0.532   1.544\n',
 'C

<font color="#3399ff">What is the *y* coordinate of the 7th atom?</font>
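The lines returned by `readlines()` are plain strings, so turning them into numbers takes one more step. The following is a minimal sketch, assuming the simple layout described above (atom count on the first line, a comment line, then one atom per line). The function name `parse_xyz` is our own, and the three-atom input is a shortened excerpt of the output above with the atom count adjusted to match.

```python
def parse_xyz(lines):
    """Return a list of (element, x, y, z) tuples from the lines of an .xyz file."""
    n_atoms = int(lines[0])              # first line: number of atoms
    atoms = []
    for line in lines[2:2 + n_atoms]:    # skip the count line and the comment line
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms

# Shortened excerpt: the first three atoms, with the header changed from 128 to 3
sample = ['3\n',
          '\n',
          'C      3.695  -1.521   5.265\n',
          'C      3.505  -2.233   2.002\n',
          'C      2.294  -1.329   5.304\n']

atoms = parse_xyz(sample)
print(atoms[0])
```

With the real file, the same function can be applied to the `code` list read in above, e.g. `parse_xyz(code)`. Remember that Python counts from zero, so the first atom is `atoms[0]`.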

* **.sdf**: "Structure Data Format", can include physical data such as melting points as well as the molecular structure coordinates.
* **.pdb**: "Protein Data Bank" format, created in 1976 to allow scientists to add protein structures to a world-wide database (http://www.wwpdb.org/). Open the **1CRN.pdb** file in the notebook below as you just did with the **.xyz** file, to look at the format of the file. Find the region where the $\textit{x,y,z}$ coordinates and element types are specified.

In [4]:
with open('Workshop1_data/Structure_files/1CRN.pdb') as CRN:
    code1=CRN.readlines()
code1

['HEADER    PLANT SEED PROTEIN                      30-APR-81   1CRN      1CRND  1\n',
 'COMPND    CRAMBIN                                                       1CRN   4\n',
 'SOURCE    ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED                   1CRN   5\n',
 'AUTHOR    W.A.HENDRICKSON,M.M.TEETER                                    1CRN   6\n',
 'REVDAT   5   16-APR-87 1CRND   1       HEADER                           1CRND  2\n',
 'REVDAT   4   04-MAR-85 1CRNC   1       REMARK                           1CRNC  1\n',
 'REVDAT   3   30-SEP-83 1CRNB   1       REVDAT                           1CRNB  1\n',
 'REVDAT   2   03-DEC-81 1CRNA   1       SHEET                            1CRNB  2\n',
 'REVDAT   1   28-JUL-81 1CRN    0                                        1CRNB  3\n',
 'REMARK   1                                                              1CRN   7\n',
 'REMARK   1 REFERENCE 1                                                  1CRNC  2\n',
 'REMARK   1  AUTH   M.M.TEETER            

* **.cml**: "Chemical markup language" is a language for Chemistry that is part of the broader XML project for providing formats for science data that are both human-readable and machine-readable (more info: http://www.xml-cml.org/). It was developed by [Peter Murray-Rust](http://www.ch.cam.ac.uk/person/pm286) and [Henry Rzepa](http://www.imperial.ac.uk/people/h.rzepa) (Imperial College London). 
For our purposes, the **.cml** file is very useful - it is readable and writable by both ChemDraw (where you construct 2D chemical structures) and Avogadro. For now, open the two **.cml** format files in ChemDraw to see how the files look when opened in 2D.
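Returning briefly to the **.pdb** format: unlike **.xyz**, it is column-based, meaning each field of an `ATOM` record occupies fixed character positions, so splitting on whitespace is not always reliable. A hedged sketch of reading one record by slicing follows, assuming the standard column layout (atom name in columns 13-16, residue name in columns 18-20, and *x*, *y*, *z* in columns 31-38, 39-46 and 47-54). The function name `parse_atom_record` is our own and the sample line is illustrative, not copied from the workshop file.

```python
def parse_atom_record(line):
    """Extract a few fields from one ATOM record of a .pdb file.

    Python slices are 0-based and exclude the end index, so
    columns 31-38 of the record become line[30:38].
    """
    return {
        'atom_name': line[12:16].strip(),
        'residue': line[17:20].strip(),
        'x': float(line[30:38]),
        'y': float(line[38:46]),
        'z': float(line[46:54]),
    }

# Illustrative record laid out in the standard columns (assumed example)
sample = 'ATOM      1  N   THR A   1      17.047  14.099   3.625'

record = parse_atom_record(sample)
print(record)
```

To try this on the list `code1` read in above, keep only the lines that start with `ATOM`, for example `[parse_atom_record(line) for line in code1 if line.startswith('ATOM')]`, and check the result against what you see in the file.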

## Exercise 2 - Converting structures between 2D and 3D file formats

Draw a structure of a molecule (any molecule, but include multiple atoms and bonds) in ChemDraw. Save this file as either a **.cml** format file or an **MDL molfile** (a **.mol** file) in the "Structures_built_in_lab" folder. Now open your file in Avogadro. 

**A word of warning**: With 3D structures, Avogadro is using algorithms to make a **guess** at the 3D structure (in fact, if your **.cml** file does not contain any coordinates, Avogadro will show you a warning message when you open the file, asking "Do you want Avogadro to build a rough geometry?"). We will see later how these guesses can fail, and of course, the conformation generated may not be the one we are interested in. So we must always remember to apply our chemical intuition as to whether the conformation is sensible and/or the conformation we are interested in.

## Exercise 3 - Drawing molecules in Avogadro

Open up Avogadro: Start $\rightarrow$  All Programs $\rightarrow$  Avogadro

Have a play around with drawing and visualising molecules with the software. Ask a demonstrator if you're stuck, or go to the link for Avogadro tutorials (http://avogadro.cc/wiki/Category:Tutorials). If things go wrong, you can always use "Ctrl-Z" to undo, or close Avogadro and reopen to start from fresh.

If you want to save any of the structures, use File $\rightarrow$ Save As, then select the file format you want and save in the "Structures_built_in_lab" folder.

This is a checklist of tasks to master for drawing and visualising molecules before you move on (in a sensible order to tackle):

* Draw a methane molecule
* Display a molecule in a variety of display settings e.g. "Ball and Stick" and change their settings (click on the spanner icon to open the setting menu for each display type).
* Edit your methane to an ethane molecule (you may find this easier to do with "Ball and Stick" display setting)
* Draw another ethane molecule next to your first ethane molecule
* Measure the carbon-carbon bond distances on each of your ethane molecules. Are they the same?
* Edit your ethane to a propane molecule, manually manipulate this to a sensible conformation.
* Open a new Avogadro window and draw another propane molecule. Now use the auto-optimisation tool to "optimise" the conformation of the molecule. Make sure you do this by clicking on the "E" button (with a green downwards arrow) on the toolbar, which then gives you the choice of forcefield: choose "MMFF94s". In Workshop 3, we will cover in more detail how a molecule's energy is calculated and what the software is doing when it is "optimising" the molecule! Have you ended up with the same conformation or a different one?
* Draw a benzene molecule, use "Auto-Opt" and make sure you end up with a C-C bond length of ~1.4 $\unicode{x212B}$ and a sensible conformation if you rotate the view. **Hint**: remember bond order.
* What happens when you drag an atom out of position during a minimisation? Can you do anything to prevent a molecule going back to a "sensible" conformation?
* Construct a 3D model of the octasilsesquioxane, Si$_{8}$O$_{12}$H$_{8}$:

<img src="Workshop1_data/Image_files/Si8O12H8.png" style="max-width:100%; width: 20%"/>

When you think you have a sensible structure for the molecule (**including having used Auto-Opt with the Forcefield 'MMFF94s'**), compare different display types for it and save it as a **.xyz** file.
Measure your structure, report your average measurements for the below:

### Saving images of your models

You can save images of your models using File $\rightarrow$ Export $\rightarrow$ Graphics. Save any images within your workshop data folder (it's a good idea to make a new folder for this).

## Exercise 4 - Looking at the stereoisomers of thalidomide

Get a reasonable 3D structural model of *R*-thalidomide in Avogadro (shown below). It's up to you how you prefer to do this: you could use ChemDraw first, or just construct the molecule manually in Avogadro. Make sure you have the *R*-enantiomer in 3D space. When you're happy with the conformation and you have an energy that is not higher than that of any of your neighbours (after using "Auto-Opt" with the MMFF94s forcefield), save your molecular conformation as a **.xyz** file in the "Structures_built_in_lab" folder.

<img src="Workshop1_data/Image_files/R-thalidomide_image.png" style="max-width:100%; width: 25%"/>

Now go to Build $\rightarrow$ Invert Chirality; this will give you the *S*-enantiomer. Save this molecular conformation as a **.xyz** file. How does its energy compare to that of the *R*-enantiomer? Does that make sense?
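One way to reason about the energy comparison is to look at what a mirror operation does to a geometry. The sketch below assumes that inverting chirality amounts to a reflection of the coordinates (here through the *yz* plane), and it uses four made-up points rather than a real thalidomide structure. It compares all interatomic distances before and after the reflection, together with a signed volume that tracks handedness. All function names are our own.

```python
import itertools
import math

def pair_distances(points):
    """All interatomic distances, always in the same pair order."""
    return [math.sqrt(sum((p[i] - q[i]) ** 2 for i in range(3)))
            for p, q in itertools.combinations(points, 2)]

def signed_volume(a, b, c, d):
    """Scalar triple product (a-d) . ((b-d) x (c-d)); its sign encodes handedness."""
    u = [a[i] - d[i] for i in range(3)]
    v = [b[i] - d[i] for i in range(3)]
    w = [c[i] - d[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(u[i] * cross[i] for i in range(3))

def reflect_x(points):
    """Mirror every point through the yz plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in points]

# Made-up coordinates standing in for four different substituents
points = [(1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 1.5), (-0.5, -0.6, -0.4)]
mirror = reflect_x(points)

same_distances = all(math.isclose(p, q)
                     for p, q in zip(pair_distances(points), pair_distances(mirror)))
print("All distances unchanged:", same_distances)
print("Signed volumes:", signed_volume(*points), signed_volume(*mirror))
```

Keep this picture in mind when you compare the two energies reported by Avogadro: every distance is preserved, while the sign of the volume, and hence the handedness, is reversed.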

Save an image of each of the enantiomers, oriented so that when they are placed side-by-side there is a mirror plane between them. Now insert the images below, using the syntax below (we have already provided the image file for the mirror):

<img src="Workshop1_data/Images in Lab/R-thalidomide(optimised).png" style="width: 30%;float:left" />
<img src="Workshop1_data/Image_files/mirror.png" style="width: 3%;float:left" />
<img src="Workshop1_data/Images in Lab/S-thalidomide(optimised).png" style="width: 30%;float:left" />

You will be familiar with the tragedy of thalidomide - the *S*-enantiomer caused malformation of the limbs of the children born to mothers who had used the drug for morning sickness. 
Originally the problem was discussed as if it would have been solved had only the "good" enantiomer, the *R*-enantiomer, been used. This is now thought to be incorrect for two reasons: (1) there is some evidence that both enantiomers may be teratogenic and (2) the enantiomers racemise in protonated media ([Nature Reviews Drug Discovery 1, 753-768 **2002**](http://www.nature.com/nrd/journal/v1/n10/box/nrd915_BX1.html)). Looking at your model, can you see how this might occur? 

## Optional - favourite image

Add below the image that you like best from today. It can be of any molecule you've looked at today or any other molecule.

    <img src="Workshop1_data/Favourite_image.png" style="max-width:100%; width: 65%"/>

## Optional - start looking at solid state crystal structures in Mercury ready for the next workshop:

### Introduction to visualising periodic solid state structures in Mercury

So far we have mostly been visualising individual molecules in Avogadro (or, in the case of graphite, a slab of that substance). When we want to visualise a solid state crystal structure, we will typically use other software that is better suited to this. The one we will use is called Mercury (https://www.ccdc.cam.ac.uk/solutions/csd-system/components/mercury/) and makes visualisations of crystal structures easy. You can also install Mercury on your own computer (Imperial College has a site license) - see accompanying handout for more information on how to do this.

Open up Mercury: Start $\rightarrow$ All Programs $\rightarrow$  Mercury.

Work out how to do the following:

* Using the "Structure Navigator" on the right-hand side of the screen, navigate through structures in the database (Mercury comes with a database of thousands of structures from the Cambridge Crystallographic Data Centre ([CCDC](http://www.ccdc.cam.ac.uk/)); we'll be investigating these more in future workshops). For now just take a look at a few of the structures.

* Work out how to zoom in and out and how to rotate the unit cell.

* Tick the "Packing" box in the Display Options menu (typically at the bottom of the window). This will show you the full unit cell of the system. Remember, with a periodic structure, anything leaving the cell on one side re-enters the cell on the opposite side (like a character running off one edge of the screen in a simple video game and reappearing on the opposite edge).
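The re-entry behaviour of a periodic structure can be written down compactly in fractional coordinates, where each cell edge runs from 0 to 1. The following is a minimal sketch of the idea only; the function name `wrap_into_cell` and the coordinate values are our own, and this is not a description of how Mercury is implemented.

```python
def wrap_into_cell(frac):
    """Map a fractional coordinate (a, b, c) back into the range [0, 1)."""
    # Python's % operator returns a non-negative result for a positive divisor,
    # so positions that left through either face are brought back inside.
    return tuple(f % 1.0 for f in frac)

# An atom that has left the cell through two of its faces (made-up values)
outside = (1.25, -0.10, 0.50)
inside = wrap_into_cell(outside)
print(inside)
```

The first coordinate has gone a quarter of a cell past the far face and reappears a quarter of the way in from the near face; the second has left through the near face and reappears close to the far face.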

### More information:

* Avogadro tutorials: http://avogadro.cc/wiki/Category:Tutorials

* Mercury tutorials/user guide: [https://www.ccdc.cam.ac.uk/support-and-resources](https://www.ccdc.cam.ac.uk/support-and-resources/CCDCResources/?rt=-1&mc=-1&p=0e7591ad-2201-e411-99f5-00505686f06e&so=0)

* Books with good information on Molecular Modelling for beginners:
    1. Molecular Modelling: Principles and Applications, A. R. Leach, 2nd Edition, Pearson.
    2. Understanding Molecular Simulation, D. Frenkel & B. Smit, 2nd Edition, Academic Press.
    3. Essentials of Computational Chemistry, C. J. Cramer, 2nd Edition, Wiley.

### Acknowledgments

Avogadro's user tutorials (as above) and the supplied structural files from Avogadro.

### <font color="red">IMPORTANT</font>

Now submit **ALL** the files you've saved and this completed and saved notebook onto Blackboard as a **single zip file**. 

To do this, navigate to the files in Windows Explorer, highlight the notebook and all the folders that contain the files you've saved. Then right-click, point to 'Send to', and then click 'Compressed (zipped) folder'. This should create a zip file in the same location. Upload this zip file to Blackboard. 

<font color="red">*Note*</font>: When marking last year, we found many people only uploaded the notebook and not all the files they'd created as well. Double-check you have uploaded all the files as a single zip file - you could check this by redownloading the zip file from Blackboard and opening it to check. If you're still not sure, check with a demonstrator. We will expect the files you've created *as well as* the notebook when we check your uploads.
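As an optional alternative to opening the archive by hand, the contents of a zip file can be listed from Python with the standard-library `zipfile` module. This is only a sketch: the function name `zip_contents` and the file name in the usage comment are placeholders, not names required for the submission.

```python
import zipfile

def zip_contents(zip_path):
    """Return the names of all files stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()

# Usage (placeholder file name, replace it with the name of your own archive):
# for name in zip_contents('my_workshop1_submission.zip'):
#     print(name)
```

If the notebook or any of the folders you saved files into is missing from the printed list, recreate the zip file before uploading.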