# HOW TO: Easy Visualization of Molecules.

Greetings everyone!

I've seen many people ask for a simple yet elegant visualization tool for the [Predicting Molecular Properties](https://www.kaggle.com/c/champs-scalar-coupling/overview) challenge. In this Kernel, I'll explain how to install and use **ase** (the Atomic Simulation Environment), a Python module for working with atoms and molecules.

It is available on GitLab: [ase](https://gitlab.com/ase/ase).

The first thing we need to do is install **ase** in our Kernel. To do that, click on the *Settings* tab in the right panel, then click on *Install...*, right next to *Packages*. In the *pip package name* entry, type **ase**, then hit *Install Package*.

Kaggle is going to do its thing and then restart the Kernel. **ase** should then be installed! Let's check this:

In [1]:
import ase

It worked! 
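If you'd rather check for the package programmatically (say, before deciding whether to draw anything), something like the following should work. It's only a sketch: `is_installed` is a helper name I made up, and it relies solely on the standard library.

```python
import importlib.util

def is_installed(package_name):
    """Return True if `package_name` can be imported in this environment."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(package_name) is not None

print("ase installed:", is_installed("ase"))
```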

Now let's visualize one of the molecules from the structures.csv file:

In [2]:
import pandas as pd

struct_file = pd.read_csv('../input/structures.csv')

Now we select a random molecule from this file:

In [3]:
import random

# Select a molecule
random_molecule = random.choice(struct_file['molecule_name'].unique())
molecule = struct_file[struct_file['molecule_name'] == random_molecule]
display(molecule)

Unnamed: 0,molecule_name,atom_index,atom,x,y,z
1088656,dsgdb9nsd_065393,0,C,0.175811,1.32563,0.424972
1088657,dsgdb9nsd_065393,1,C,-0.069199,-0.144359,0.140879
1088658,dsgdb9nsd_065393,2,C,0.999329,-0.905732,-0.611448
1088659,dsgdb9nsd_065393,3,N,0.792647,-1.12663,0.799493
1088660,dsgdb9nsd_065393,4,C,1.740231,-0.882289,1.776418
1088661,dsgdb9nsd_065393,5,O,2.921469,-1.10419,1.665141
1088662,dsgdb9nsd_065393,6,C,-1.490277,-0.615805,-0.026059
1088663,dsgdb9nsd_065393,7,C,-2.657337,0.131192,0.508544
1088664,dsgdb9nsd_065393,8,N,-2.424946,0.053102,-0.944477
1088665,dsgdb9nsd_065393,9,H,1.246888,1.542942,0.434288
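A random pick changes on every run, which makes it harder to talk about one specific molecule with someone else. One way to make the selection repeatable is to seed a private generator, as sketched below. `pick_molecule` is a hypothetical helper, and the short list stands in for `struct_file['molecule_name'].unique()`.

```python
import random

# Stand-in for struct_file['molecule_name'].unique();
# both names appear elsewhere in this notebook
molecule_names = ["dsgdb9nsd_065393", "dsgdb9nsd_009291"]

def pick_molecule(names, seed=None):
    """Pick one name; the same seed always gives the same pick."""
    # A private generator leaves the global random state untouched
    rng = random.Random(seed)
    return rng.choice(list(names))

print(pick_molecule(molecule_names, seed=0))
```

With `seed=None` the helper behaves like the plain `random.choice` call above.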


Next we need to retrieve the atomic coordinates as a NumPy array:

In [4]:
# Get atomic coordinates: columns 3 onward are x, y, z
atoms = molecule.iloc[:, 3:].values
print(atoms)

[[ 0.17581067  1.32563015  0.42497243]
 [-0.0691986  -0.14435865  0.14087909]
 [ 0.99932871 -0.90573197 -0.61144798]
 [ 0.79264702 -1.12663036  0.79949343]
 [ 1.74023067 -0.88228886  1.77641842]
 [ 2.92146902 -1.10419001  1.66514116]
 [-1.49027734 -0.61580452 -0.02605917]
 [-2.65733723  0.13119212  0.508544  ]
 [-2.4249456   0.05310198 -0.94447741]
 [ 1.24688778  1.54294186  0.43428811]
 [-0.2809294   1.95825083 -0.34329392]
 [-0.2397779   1.62298927  1.39238752]
 [ 0.71130635 -1.72165701 -1.26880949]
 [ 1.90341421 -0.37475399 -0.89892681]
 [ 1.27702931 -0.52106485  2.7145009 ]
 [-1.55702391 -1.70076964 -0.06247784]
 [-3.52685753 -0.41453156  0.86323691]
 [-2.48399094  1.07662855  1.01396676]
 [-2.02901992  0.92808023 -1.27974128]]


The last thing we need is the atomic symbols:

In [5]:
# Get atomic symbols: column 2 is 'atom'
symbols = molecule.iloc[:, 2].values
print(symbols)

['C' 'C' 'C' 'N' 'C' 'O' 'C' 'C' 'N' 'H' 'H' 'H' 'H' 'H' 'H' 'H' 'H' 'H'
 'H']
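The positional `iloc` indices used above depend on the exact column order of structures.csv. A slightly more defensive variant, sketched here, selects the columns by name instead. `split_molecule` and the three-row `demo` frame are illustrative only; the values are the first three rows of the table shown earlier.

```python
import pandas as pd

# Tiny stand-in for structures.csv (first three atoms of dsgdb9nsd_065393)
demo = pd.DataFrame({
    "molecule_name": ["dsgdb9nsd_065393"] * 3,
    "atom_index": [0, 1, 2],
    "atom": ["C", "C", "C"],
    "x": [0.175811, -0.069199, 0.999329],
    "y": [1.32563, -0.144359, -0.905732],
    "z": [0.424972, 0.140879, -0.611448],
})

def split_molecule(frame):
    """Return (positions, symbols), selecting columns by name."""
    positions = frame[["x", "y", "z"]].to_numpy(dtype=float)
    symbols = frame["atom"].tolist()
    return positions, symbols

positions, symbols = split_molecule(demo)
print(positions.shape)
print(symbols)
```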


Finally, let's put everything into something that **ase** can process:

In [6]:
from ase import Atoms
import ase.visualize

system = Atoms(positions=atoms, symbols=symbols)

ase.visualize.view(system, viewer="x3d")

TADA!!!

You can rotate the molecule with a left click, translate it with a middle click, and zoom in or out with a right click.

All this can be summarized in a single function:

In [7]:
def view(molecule):
    """Display the molecule called `molecule` from the global struct_file."""
    # Select a molecule
    mol = struct_file[struct_file['molecule_name'] == molecule]

    # Get atomic coordinates
    xcart = mol.iloc[:, 3:].values

    # Get atomic symbols
    symbols = mol.iloc[:, 2].values

    # Display molecule
    system = Atoms(positions=xcart, symbols=symbols)
    print('Molecule Name: %s.' % molecule)
    return ase.visualize.view(system, viewer="x3d")

random_molecule = random.choice(struct_file['molecule_name'].unique())
view(random_molecule)

Molecule Name: dsgdb9nsd_009291.
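The function above reads the global `struct_file` and quietly builds an empty system if the name is misspelled. Here is a hedged variant that takes the table as an argument and fails loudly on an unknown name. `view_molecule` is a name I chose for illustration, and the one-row `demo` frame only exists to show the error path.

```python
import pandas as pd

def view_molecule(structures, molecule_name, viewer="x3d"):
    """Look up `molecule_name` in `structures` and display it with ase."""
    mol = structures[structures["molecule_name"] == molecule_name]
    if mol.empty:
        raise KeyError(f"No molecule named {molecule_name!r} in the table")

    # Imported lazily so the lookup above still works where ase is missing
    from ase import Atoms
    import ase.visualize

    system = Atoms(
        symbols=mol["atom"].tolist(),
        positions=mol[["x", "y", "z"]].to_numpy(dtype=float),
    )
    return ase.visualize.view(system, viewer=viewer)

# Tiny stand-in table to show the error path (no ase needed for this part)
demo = pd.DataFrame({
    "molecule_name": ["dsgdb9nsd_065393"],
    "atom": ["C"],
    "x": [0.175811],
    "y": [1.32563],
    "z": [0.424972],
})

try:
    view_molecule(demo, "not_a_molecule")
except KeyError as err:
    print(err)
```

In the Kernel itself the call would be `view_molecule(struct_file, random_molecule)`.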


I hope you enjoyed this little notebook!

Cheers

Boris D.