# Chemoinformatics and Chemical Descriptors

So far, we've had some practice with Python and with handling chemical data. We need to fill in one more big step before we can build our own models.

## SMILES

### Drawing Structures

If we want to train a model to predict some property (say, solubility), we need a way to represent a molecule unambiguously for our program. One of the more common tools is the SMILES string (Simplified Molecular-Input Line-Entry System), which translates a chemical line drawing into a computer-readable string. SMILES strings were invented before the days of ChemDraw, but they've maintained relevance in chemoinformatics. In this notebook assignment, you will learn how to write, read, and translate SMILES strings.

We will use some existing Python libraries to help. Execute the cell below, and see your instructor if it produces an error.

In [None]:
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor
from rdkit import Chem
from rdkit.Chem import Draw

IPythonConsole.ipython_useSVG = True
IPythonConsole.molSize = 300, 300
rdDepictor.SetPreferCoordGen(True)

SMILES strings are written using atomic symbols, punctuation, and numbers to specify the connectivity of a molecule. Atomic symbols for common elements (B, C, N, O, P, S, F, Cl, Br, I) are written as-is, while less common elements require square brackets, e.g. `[Au]` for gold.
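
As a minimal sketch of the bracket rule, assuming RDKit is installed, the snippet below parses one molecule built only from unbracketed symbols and one single-atom "molecule" that needs brackets. The variable names `chloromethane` and `gold` are chosen here for illustration only.

```python
from rdkit import Chem

# Common elements are written without brackets: C bonded to Cl
chloromethane = Chem.MolFromSmiles("CCl")

# Less common elements must be wrapped in square brackets
gold = Chem.MolFromSmiles("[Au]")

# List the element symbol of each (non-hydrogen) atom in the parsed molecule
symbols = [atom.GetSymbol() for atom in chloromethane.GetAtoms()]
print(symbols)                             # ['C', 'Cl']
print(gold.GetAtomWithIdx(0).GetSymbol())  # Au
```

Note that hydrogens are not written out in `"CCl"`; RDKit fills them in implicitly from normal valence rules.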

Single bonds are implied when atoms are adjacent, while double and triple bonds require explicit symbols. We can use `rdkit` to view the line structure generated from a SMILES string. The function we'll call, `Chem.MolFromSmiles`, returns a molecule (`Mol`) object, which we'll use for many things later on.
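
To make the bond notation concrete, here is a small sketch, assuming RDKit is installed, that parses three two-carbon molecules and reports the type of their only bond. In SMILES, `=` marks a double bond and `#` marks a triple bond; the `examples` and `bond_types` names are illustrative, not part of the assignment.

```python
from rdkit import Chem

# Two-carbon molecules that differ only in bond order
examples = {"ethane": "CC", "ethene": "C=C", "ethyne": "C#C"}

bond_types = {}
for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)  # returns an RDKit Mol object
    bond = mol.GetBondWithIdx(0)      # the single C-C bond in each molecule
    bond_types[name] = bond.GetBondType()
    print(name, smiles, bond_types[name])
```

In a notebook, ending a cell with the `Mol` object itself (rather than printing) displays the line drawing instead.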

Let's start by viewing propane by inputting its SMILES string below.

In [None]:
propane = Chem.MolFromSmiles()  # put propane's SMILES string, in quotes, inside the parentheses
propane

We can signify a double bond using `=`. In the cell below, create a drawing of 2-pentene using its SMILES string.

You won't be asked to write SMILES strings on your own, but it can be useful to be able to read them. In the cell below, plot the drawings of ethanol, benzene, caffeine, and acetate.

RDKit has a lot of useful functionality. One thing it can do is plot many molecules in a grid. To do this:

1. Put all molecules in a list.
2. Create a new list with the molecule names. It should be in the same order as your molecule list.
3. Create a grid using the RDKit function `Draw.MolsToGridImage`.

In the cell below, fill in the missing code to perform these tasks.

In [None]:
mols = []  # contains the molecule objects
names = [] # contains the names

# Now we create the grid image
grid = Draw.MolsToGridImage( , legends=names)  # pass the 'mols' list here and create the image

####END

grid #visualize your molecules!
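
For comparison, the three steps can be sketched on a different set of molecules. This is only an illustration, assuming RDKit is installed; the `smiles_by_name` dictionary and the choice of molecules are not part of the assignment, and `molsPerRow` and `subImgSize` are optional arguments set here just to control the layout.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Illustrative molecules (not the ones asked for in the exercise)
smiles_by_name = {
    "methanol": "CO",
    "acetone": "CC(C)=O",
    "acetic acid": "CC(=O)O",
}

# Steps 1 and 2: build the name list and the molecule list in the same order
names = list(smiles_by_name)
mols = [Chem.MolFromSmiles(smiles_by_name[name]) for name in names]

# Step 3: draw every molecule in one grid, labelled with its name
grid = Draw.MolsToGridImage(mols, molsPerRow=3, subImgSize=(200, 200), legends=names)
grid  # in a notebook, this displays the image
```

Building both lists from one dictionary is a design choice that keeps names and structures aligned; two hand-written lists work equally well as long as their order matches.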

### The Molecule Object

