<a href="https://colab.research.google.com/github/squeeko/DeepChem_projects/blob/master/DC_1_BasicTools.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial 1: The Basic Tools of the Deep Life Sciences

Welcome to DeepChem's introductory tutorial for the deep life sciences. This series of notebooks is a step-by-step guide to the tools and techniques needed to apply deep learning to the life sciences. We'll start from the basics, assuming that you're new to both machine learning and the life sciences, and build up a repertoire that you can use to do meaningful work in the field.

Scope: This tutorial will encompass both the machine learning and data handling needed to build systems for the deep life sciences.


## Setup
The first step is to get DeepChem up and running. We recommend using Google Colab to work through this tutorial series. Run the following commands to install DeepChem in your Colab notebook. Note that the installation takes about 5 minutes on a Colab instance.

In [1]:
!curl -Lo conda_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py

import conda_installer
conda_installer.install()
# Use the absolute path: the installer places conda under /root/miniconda.
!/root/miniconda/bin/conda info -e

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3490  100  3490    0     0   9483      0 --:--:-- --:--:-- --:--:--  9483


add /root/miniconda/lib/python3.6/site-packages to PYTHONPATH
python version: 3.6.9
fetching installer from https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
done
installing miniconda to /root/miniconda
done
installing rdkit, openmm, pdbfixer
added omnia to channels
added conda-forge to channels
done
conda packages installation finished!


In [2]:
!pip install --pre deepchem

Collecting deepchem
  Downloading https://files.pythonhosted.org/packages/5a/23/51a96cba097428794e3864a4c969f2c4f27f450a9c074cd3f69aecd87169/deepchem-2.4.0rc1.dev20200921195626.tar.gz (390kB)

In [3]:
import deepchem as dc
print(dc.__version__)

2.4.0-rc1.dev


## Training a Model with DeepChem: A First Example

Deep learning can be used to solve many sorts of problems, but the basic workflow is usually the same. Here are the typical steps you follow.


1.   Select the data set you will train your model on (or create a new data set if there isn't an existing suitable one).

2.   Create the model.

3.   Train the model on the data.

4.   Evaluate the model on an independent test set to see how well it works.

5.   Use the model to make predictions about new data.
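The five steps above can be sketched in plain Python before any chemistry is involved. This is a toy illustration under stated assumptions, not DeepChem code: the data are synthetic (a noisy line `y = 2x + 1` invented for the example), and `LinearModel` and `mean_squared_error` are hypothetical names standing in for a real model and metric.

```python
import random

# Step 1: select (here, synthesize) a data set.
# Assumption: the "true" relationship y = 2x + 1 is made up for illustration.
random.seed(0)
data = [(i / 10, 2.0 * (i / 10) + 1.0 + random.gauss(0, 0.1)) for i in range(100)]
random.shuffle(data)
train, test = data[:80], data[80:]  # hold out 20% as an independent test set


# Step 2: create the model (a straight line with two parameters).
class LinearModel:
    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def fit(self, pairs):
        """Fit w and b by ordinary least squares (closed form)."""
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        var = sum((x - mean_x) ** 2 for x, _ in pairs)
        self.w = cov / var
        self.b = mean_y - self.w * mean_x

    def predict(self, x):
        return self.w * x + self.b


model = LinearModel()

# Step 3: train the model on the training data only.
model.fit(train)


# Step 4: evaluate on data the model has never seen.
def mean_squared_error(fitted, pairs):
    return sum((fitted.predict(x) - y) ** 2 for x, y in pairs) / len(pairs)


test_mse = mean_squared_error(model, test)

# Step 5: use the model to predict a value for a new input.
prediction = model.predict(12.5)

print("slope:", model.w, "intercept:", model.b)
print("test MSE:", test_mse)
print("prediction at x = 12.5:", prediction)
```

The key design point is that `test` is never passed to `fit`, so the error measured in step 4 reflects how the model behaves on unseen data.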

With DeepChem, each of these steps can take as little as one or two lines of Python code. In this tutorial we will walk through a basic example showing the complete workflow for solving a real-world scientific problem.

The problem we will solve is predicting the solubility of small molecules given their chemical formulas. Solubility is a very important property in drug development: if a proposed drug isn't soluble enough, you probably won't be able to get enough of it into the patient's bloodstream to have a therapeutic effect. The first thing we need is a data set of measured solubilities for real molecules. One of the core components of DeepChem is MoleculeNet, a diverse collection of chemical and molecular data sets. For this tutorial, we can use the Delaney solubility data set.

In [4]:
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = datasets

I won't say too much about this code right now. We will see many similar examples in later tutorials. There are two details I do want to draw your attention to. First, notice the featurizer argument passed to the load_delaney() function. Molecules can be represented in many ways, so we tell the loader which representation we want to use, or in more technical language, how to "featurize" the data. Second, notice that we actually get three different data sets: a training set, a validation set, and a test set. Each of these serves a different function in the standard deep learning workflow.
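To make "featurize" and the three-way split more concrete, here is a deliberately crude sketch in plain Python. Everything in it is an assumption for illustration: `toy_featurize` and `split_dataset` are hypothetical helpers, the character-counting featurizer is neither a real SMILES parser nor the GraphConv featurizer, and the random 80/10/10 split is just one common choice, not necessarily what load_delaney() does.

```python
import random


def toy_featurize(smiles):
    """Turn a SMILES string into a fixed-length vector [n_C, n_N, n_O].

    Hypothetical helper: it only counts characters, so it ignores bonds
    and miscounts two-letter elements such as Cl. Real featurizers are
    far richer.
    """
    upper = smiles.upper()  # aromatic atoms are written in lowercase (c, n, o)
    return [upper.count("C"), upper.count("N"), upper.count("O")]


def split_dataset(items, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle items and cut them into train / validation / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    n_valid = int(len(shuffled) * frac_valid)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Ten small molecules written as SMILES strings (chosen for illustration).
molecules = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "C",
             "CO", "CCC", "O", "N", "CC(C)O"]

features = {smiles: toy_featurize(smiles) for smiles in molecules}
train, valid, test = split_dataset(molecules)

print("features for CC(=O)O:", features["CC(=O)O"])
print("sizes:", len(train), len(valid), len(test))
```

In the usual workflow, the training set is used to fit the model's parameters, the validation set to compare settings while developing the model, and the test set only once at the end to estimate performance on unseen molecules.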

Now that we have our data, the next step is to create a model. We will use a particular kind of model called a "graph convolutional network", or "graphconv" for short.
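As rough intuition for what "graph convolution" means, the sketch below treats a molecule as a graph (atoms as nodes, bonds as edges) and performs one round of neighbor aggregation, in which each atom's feature vector absorbs the vectors of the atoms bonded to it. This is a simplified illustration under stated assumptions, not DeepChem's implementation: `graph_conv_step` is a hypothetical helper, and a real graph convolutional layer would also apply learned weights and a nonlinearity.

```python
def graph_conv_step(features, neighbors):
    """One round of neighbor aggregation: own vector plus neighbors' vectors."""
    updated = []
    for atom, own in enumerate(features):
        total = list(own)
        for nbr in neighbors[atom]:
            total = [t + f for t, f in zip(total, features[nbr])]
        updated.append(total)
    return updated


# Ethanol (SMILES: CCO) with hydrogens left implicit: atoms 0=C, 1=C, 2=O.
# Each atom starts with a one-hot vector [is_carbon, is_oxygen].
features = [[1, 0], [1, 0], [0, 1]]

# Bonds C0-C1 and C1-O2, stored as an adjacency list.
neighbors = {0: [1], 1: [0, 2], 2: [1]}

step1 = graph_conv_step(features, neighbors)

# Sum over atoms to get one fixed-length vector for the whole molecule.
molecule_vector = [sum(column) for column in zip(*step1)]

print("per-atom features after one step:", step1)
print("molecule-level vector:", molecule_vector)
```

After one step, each atom's vector describes its immediate chemical environment rather than just its own element; repeating the step lets information travel further along the bonds.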