<a href="https://colab.research.google.com/github/ravichas/AMPL-Tutorial/blob/master/01_make_AMPL_Google_Drive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p><img alt="Colaboratory logo" height="150px" src="https://avatars1.githubusercontent.com/u/56178629?s=400&u=8f149c29954fc5f92773f0c89ff52e7b39d13a3b&v=4" align="left" hspace="10px" vspace="0px"></p>

<h1>Install AMPL on Google Drive</h1>


The ATOM Modeling PipeLine (AMPL; https://github.com/ATOMconsortium/AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.

**Warning: This is an experimental notebook**



---

# Goals
- Create a reusable installation of AMPL in Colab and save it to the user's personal Google Drive.
- Caveat: AMPL uses Python 3.6.7, while Colab currently uses 3.6.9, so the two versions do not match exactly. Your mileage may vary.

# Requirements
- Datasets are required for testing AMPL. `delaney-processed_curated_fit.csv` and `delaney-processed_curated_external.csv` are downloaded to this runtime from the AMPL-Tutorial GitHub repository.

## Authenticate and mount your Google Drive

When you run the following cell, you will be asked to do the following things:

1. An empty input box and a link will appear
2. Click the link
3. Authenticate your Google account
4. Copy the authorization code that is displayed and paste it in the input box that appeared in step 1

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import io

import pandas as pd
import requests

url = 'https://raw.githubusercontent.com/ravichas/AMPL-Tutorial/master/datasets/delaney-processed_curated_external.csv'
url1 = 'https://raw.githubusercontent.com/ravichas/AMPL-Tutorial/master/datasets/delaney-processed_curated_fit.csv'

# Download the raw CSV bytes; raise_for_status() stops on an HTTP error
response = requests.get(url)
response.raise_for_status()
response1 = requests.get(url1)
response1.raise_for_status()

# Read the downloaded content into pandas DataFrames
df = pd.read_csv(io.StringIO(response.content.decode('utf-8')))
df1 = pd.read_csv(io.StringIO(response1.content.decode('utf-8')))

# Save each DataFrame under its own file name
df.to_csv('delaney-processed_curated_external.csv', index=False)
df1.to_csv('delaney-processed_curated_fit.csv', index=False)

## Check whether the input files are present

In [None]:
import os
assert(os.path.isfile('/content/delaney-processed_curated_fit.csv'))
assert(os.path.isfile('/content/delaney-processed_curated_external.csv'))
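
A bare `assert` fails without saying which file is missing. The following is a minimal sketch of a more informative check, assuming the same two paths as the cell above; `find_missing` is a hypothetical helper name, not part of AMPL or Colab.

```python
import os


def find_missing(paths):
    """Return the paths that are not existing regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Paths used by this notebook in the Colab runtime.
required = [
    "/content/delaney-processed_curated_fit.csv",
    "/content/delaney-processed_curated_external.csv",
]

missing = find_missing(required)
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files are present.")
```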

## Get the Python version

In [None]:
!python --version

Python 3.6.9


## Install Miniconda to /content/AMPL
Conda includes hard-coded paths, so this is the only location the installation can go (< 30 seconds).

**TODO: handle the case where the directory /content/AMPL already exists.**
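
One possible way to deal with an existing directory is sketched below. It is an illustration under assumptions, not the notebook's chosen policy: `prepare_prefix` is a hypothetical helper that either keeps the existing directory and signals that the install should be skipped, or deletes it when `overwrite=True`.

```python
import os
import shutil


def prepare_prefix(prefix, overwrite=False):
    """Return True if the installer should run for this prefix.

    An existing directory is kept (and False is returned) unless
    overwrite is True, in which case it is deleted first.
    """
    if os.path.isdir(prefix):
        if not overwrite:
            return False
        shutil.rmtree(prefix)
    return True


if prepare_prefix("/content/AMPL"):
    print("Prefix is free; run the Miniconda installer.")
else:
    print("/content/AMPL already exists; skipping the Miniconda install.")
```

Skipping is the conservative default here because deleting the prefix also discards every package already installed into it.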

In [None]:
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!time bash Miniconda3-latest-Linux-x86_64.sh -b -p /content/AMPL

--2020-09-23 19:45:01--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 93052469 (89M) [application/x-sh]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’


2020-09-23 19:45:02 (119 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [93052469/93052469]

PREFIX=/content/AMPL
Unpacking payload ...
Collecting package metadata (current_repodata.json): - \ | done
Solving environment: - \ done

## Package Plan ##

  environment location: /content/AMPL

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - ca-certificates==2020.1.1=0
    - certifi==2020.4.5.1=py38_0
    - cffi==1.14.0=py38he30daa8_1
    - chardet==3.0.4=py38_1003
    - conda-package-handling==1.6.1=py38h7b6447c_0
    - conda==4.8.3=py38_0
    - crypt

In [None]:
!ls 

AMPL					drive
delaney-processed_curated_external.csv	Miniconda3-latest-Linux-x86_64.sh
delaney-processed_curated_fit.csv	sample_data


## Download the AMPL dependency list and save it to AMPL.txt

In [None]:
url = 'https://raw.githubusercontent.com/ravichas/AMPL-Tutorial/master/datasets/AMPL.txt'

# Download the dependency list and stop on an HTTP error
downloaded_obj = requests.get(url)
downloaded_obj.raise_for_status()
with open("AMPL.txt", "wb") as f:
    f.write(downloaded_obj.content)
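
This notebook downloads several files in the same way. A small reusable helper could look like the sketch below. It swaps `requests` for the standard library's `urllib.request`, which raises an exception on an HTTP error status by itself. `fetch_to_file` and its `overwrite` parameter are hypothetical names introduced for illustration.

```python
import os
import urllib.request


def fetch_to_file(url, dest, overwrite=False):
    """Download url to dest.

    Returns True if a download happened, False if dest already
    existed and was left untouched.
    """
    if os.path.exists(dest) and not overwrite:
        return False
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        out.write(response.read())
    return True


# Usage in this notebook (requires network access):
# fetch_to_file(
#     "https://raw.githubusercontent.com/ravichas/AMPL-Tutorial/master/datasets/AMPL.txt",
#     "AMPL.txt",
# )
```

Skipping files that already exist makes it cheap to re-run the cell in the same runtime.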

## Install code dependencies using the AMPL.txt file downloaded in the previous step
* ~ 3 minutes

```
real	1m53.391s
user	1m12.443s
sys	0m9.643s
```

In [None]:
!time /content/AMPL/bin/conda install --file AMPL.txt -y


Downloading and Extracting Packages
_libgcc_mutex-0.1    | : 100% 1.0/1 [00:00<00:00,  5.29it/s]
ca-certificates-2020 | : 100% 1.0/1 [00:00<00:00, 11.34it/s]
fftw3f-3.3.4         | : 100% 1.0/1 [00:00<00:00,  2.22it/s]
libgfortran-3.0.0    | : 100% 1.0/1 [00:00<00:00, 10.25it/s]
libgfortran-ng-7.5.0 | : 100% 1.0/1 [00:00<00:00,  2.84it/s]
libstdcxx-ng-9.3.0   | : 100% 1.0/1 [00:00<00:00,  1.09it/s]
pandoc-2.10.1        | : 100% 1.0/1 [00:07<00:00,  6.77s/it]               
libgomp-9.3.0        | : 100% 1.0/1 [00:00<00:00,  8.30it/s]
openblas-0.2.20      | : 100% 1.0/1 [00:05<00:00,  4.73s/it]               
_openmp_mutex-4.5    | : 100% 1.0/1 [00:00<00:00, 22.25it/s]
blas-1.1             | : 100% 1.0/1 [00:00<00:00, 25.32it/s]
libgcc-ng-9.3.0      | : 100% 1.0/1 [00:01<00:00,  1.69s/it]
blosc-1.20.0         | : 100% 1.0/1 [00:00<00:00,  3.48it/s]
bzip2-1.0.8          | : 100% 1.0/1 [00:00<00:00,  7.00it/s]
c-ares-1.16.1        | : 100% 1.0/1 [00:00<00:00, 16.32it/s]
expat-2.2.9       

## Get the AMPL-related Python, pip, and conda versions

* Note the local drive paths

In [None]:
%%bash
/content/AMPL/bin/python -V
/content/AMPL/bin/pip -V
/content/AMPL/bin/conda -V

Python 3.6.7
pip 20.2.2 from /content/AMPL/lib/python3.6/site-packages/pip (python 3.6)
conda 4.8.4
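
The Colab interpreter (3.6.9) and the AMPL interpreter (3.6.7) differ only in the patch number. The sketch below shows one way to compare two `python --version` strings at the major.minor level. `parse_version` and `same_minor` are hypothetical helpers, and whether a patch-level difference matters for AMPL is not something this check can establish.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from output such as 'Python 3.6.7'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def same_minor(a, b):
    """True if both version strings share the same major.minor release."""
    return parse_version(a)[:2] == parse_version(b)[:2]


colab_version = "Python 3.6.9"  # output of `!python --version` above
ampl_version = "Python 3.6.7"   # output of `/content/AMPL/bin/python -V` above
print(same_minor(colab_version, ampl_version))  # True: both are 3.6
```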


## Clone AMPL source and apply patches (if any)

In [None]:
%%bash
mkdir github
cd github
git clone https://github.com/ATOMconsortium/AMPL.git

Cloning into 'AMPL'...


In [None]:
%%bash
# There is a problem with the UMAP package, so remove the umap import
cat << "EOF" > transformations_py.patch
--- transformations.py  2020-09-14 17:08:22.225747322 -0700
+++ transformations_patched.py  2020-09-14 17:08:07.869651225 -0700
@@ -9,7 +9,7 @@

 import numpy as np
 import pandas as pd
-import umap
+# import umap

 import deepchem as dc
 from deepchem.trans.transformers import Transformer, NormalizationTransformer
EOF

patch -N /content/github/AMPL/atomsci/ddm/pipeline/transformations.py transformations_py.patch

patching file /content/github/AMPL/atomsci/ddm/pipeline/transformations.py


In [None]:
%%bash
# There is a problem with dependency checking on import after install
cat << "EOF" > __init___py.patch
--- /content/AMPL/atomsci/ddm/__init__.py.backup	2020-09-19 18:10:05.264013977 +0000
+++ /content/AMPL/atomsci/ddm/__init__.py	2020-09-19 18:15:37.338771924 +0000
@@ -1,6 +1,6 @@
 import pkg_resources
 try:
     __version__ = pkg_resources.require("atomsci-ampl")[0].version
-except TypeError:
+except:
     pass
EOF

patch -N /content/github/AMPL/atomsci/ddm/__init__.py __init___py.patch

patching file /content/github/AMPL/atomsci/ddm/__init__.py


## Build and install AMPL

In [None]:
%%bash
# Move conda python to beginning of PATH in this cell
PATH=/content/AMPL/bin:$PATH
# Clear PYTHONPATH
PYTHONPATH=

cd /content/github/AMPL
time ./build.sh
time ./install.sh system

running build
running build_py
creating /content/github/AMPL.build/ampl/lib
creating /content/github/AMPL.build/ampl/lib/atomsci
copying atomsci/__init__.py -> /content/github/AMPL.build/ampl/lib/atomsci
creating /content/github/AMPL.build/ampl/lib/atomsci/ddm
copying atomsci/ddm/__init__.py -> /content/github/AMPL.build/ampl/lib/atomsci/ddm
creating /content/github/AMPL.build/ampl/lib/atomsci/ddm/pipeline
copying atomsci/ddm/pipeline/temporal_splitter.py -> /content/github/AMPL.build/ampl/lib/atomsci/ddm/pipeline
copying atomsci/ddm/pipeline/model_tracker.py -> /content/github/AMPL.build/ampl/lib/atomsci/ddm/pipeline
copying atomsci/ddm/pipeline/perf_data.py -> /content/github/AMPL.build/ampl/lib/atomsci/ddm/pipeline
copying atomsci/ddm/pipeline/chem_diversity.py -> /content/github/AMPL.build/ampl/lib/atomsci/ddm/pipeline
copying atomsci/ddm/pipeline/ave_splitter.py -> /content/github/AMPL.build/ampl/lib/atomsci/ddm/pipeline
copying atomsci/ddm/pipeline/diversity_plots.py -> /content/

Skipping installation of /content/github/AMPL.build/ampl/bdist.linux-x86_64/wheel/atomsci/__init__.py (namespace package)

real	0m0.572s
user	0m0.464s
sys	0m0.104s

real	0m0.893s
user	0m0.693s
sys	0m0.130s
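
After the build and install, a quick smoke test is to ask the conda interpreter to import the package. The sketch below assumes the installed package is importable as `atomsci.ddm`, which is the module path that appears in the build log above. `can_import` is a hypothetical helper.

```python
import subprocess
import sys


def can_import(python_exe, module):
    """Return True if `python_exe -c "import module"` exits with status 0."""
    result = subprocess.run(
        [python_exe, "-c", f"import {module}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


# In the Colab runtime, the call would be:
#     can_import("/content/AMPL/bin/python", "atomsci.ddm")
# sys.executable and a standard-library module are used here so that
# the sketch runs anywhere.
print(can_import(sys.executable, "json"))
```

Running the import in a subprocess matters because the notebook kernel uses Colab's own Python, not the interpreter under /content/AMPL.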


In [None]:
# Remove conda package downloads to decrease package size
# 1 min
!time /content/AMPL/bin/conda clean -a -y

Cache location: /content/AMPL/pkgs
Will remove the following tarballs:

/content/AMPL/pkgs
------------------
traitlets-4.3.3-py36h9f0ad1d_1.tar.bz2       133 KB
ca-certificates-2020.1.1-0.conda             125 KB
python_abi-3.6-1_cp36m.tar.bz2                 4 KB
jupyter_client-6.1.7-py_0.tar.bz2             76 KB
testpath-0.4.4-py_0.tar.bz2                   85 KB
xorg-libxdmcp-1.1.3-h516909a_0.tar.bz2        18 KB
simdna-0.4.2-py_0.tar.bz2                    627 KB
readline-8.0-h7b6447c_0.conda                356 KB
statsmodels-0.10.2-py36hc1659b7_0.tar.bz2     9.7 MB
sqlite-3.28.0-h8b20d00_0.tar.bz2             1.9 MB
xorg-xextproto-7.3.0-h14c3975_1002.tar.bz2      27 KB
_libgcc_mutex-0.1-conda_forge.tar.bz2          3 KB
six-1.15.0-pyh9f0ad1d_0.tar.bz2               14 KB
libedit-3.1.20181209-hc058e9b_0.conda        163 KB
pycosat-0.6.3-py36h8c4c3a4_1004.tar.bz2      107 KB
xz-5.2.5-h7b6447c_0.conda                    341 KB
ipython-7.16.1-py36h95af2a2_0.tar.bz2        1.1 MB
icu

## Compress and store the AMPL installation in Google Drive (~600 MB; takes < 6 minutes)

This step saves the installation so that it can be retrieved for later use.

In [None]:
!time tar -cjvf AMPL.tar.bz2 AMPL

Streaming output truncated to the last 5000 lines.
AMPL/include/boost/flyweight/simple_locking.hpp
AMPL/include/boost/flyweight/hashed_factory.hpp
AMPL/include/boost/flyweight/assoc_container_factory_fwd.hpp
AMPL/include/boost/flyweight/static_holder.hpp
AMPL/include/boost/flyweight/set_factory_fwd.hpp
AMPL/include/boost/flyweight/serialize.hpp
AMPL/include/boost/flyweight/no_tracking_fwd.hpp
AMPL/include/boost/flyweight/flyweight.hpp
AMPL/include/boost/flyweight/hashed_factory_fwd.hpp
AMPL/include/boost/flyweight/holder_tag.hpp
AMPL/include/boost/flyweight/no_locking_fwd.hpp
AMPL/include/boost/flyweight/assoc_container_factory.hpp
AMPL/include/boost/flyweight/tracking_tag.hpp
AMPL/include/boost/flyweight/tag.hpp
AMPL/include/boost/flyweight/no_tracking.hpp
AMPL/include/boost/flyweight/intermodule_holder.hpp
AMPL/include/boost/flyweight/key_value_fwd.hpp
AMPL/include/boost/flyweight/intermodule_holder_fwd.hpp
AMPL/include/boost/flyweight/detail/
AMPL/include/boost/flyweig
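
The verbose flag prints every archived file, which is why the output above is truncated. As a quieter alternative, the same kind of bzip2-compressed archive can be written with Python's `tarfile` module. This is a minimal sketch, assuming the installation lives in /content/AMPL; `make_archive` is a hypothetical helper.

```python
import os
import tarfile


def make_archive(src_dir, archive_path):
    """Write src_dir into a bzip2-compressed tar archive.

    Returns the size of the archive in bytes.
    """
    with tarfile.open(archive_path, "w:bz2") as tar:
        # arcname keeps only the top-level directory name in the archive.
        top = os.path.basename(os.path.normpath(src_dir))
        tar.add(src_dir, arcname=top)
    return os.path.getsize(archive_path)


if os.path.isdir("/content/AMPL"):
    size = make_archive("/content/AMPL", "/content/AMPL.tar.bz2")
    print(f"Archive size: {size / 1e6:.0f} MB")
```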

In [None]:
%%bash

if [ -d "/content/drive/My Drive/colab" ] 
then
    echo "Directory /content/drive/My Drive/colab exists." 
else
    echo "Directory '/content/drive/My Drive/colab' does not exist. Creating one now!"
    mkdir -p '/content/drive/My Drive/colab'
fi


Directory /content/drive/My Drive/colab exists.


In [None]:
# Copy to Google Drive
# 1 min
!time cp AMPL.tar.bz2 '/content/drive/My Drive/colab'


real	0m7.743s
user	0m0.012s
sys	0m0.681s


In [None]:
!ls '/content/drive/My Drive/colab'

AMPL.tar.bz2
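
Listing the folder shows that the file exists, but not that the copy is complete. One way to check is to compare sizes and SHA-256 digests of the local archive and the Drive copy, as in the sketch below. `sha256_of` and `copies_match` are hypothetical helpers, and the paths are the ones used in the cells above.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(a, b):
    """True if both files have the same size and the same SHA-256 digest."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    return sha256_of(a) == sha256_of(b)


local = "/content/AMPL.tar.bz2"
remote = "/content/drive/My Drive/colab/AMPL.tar.bz2"
if os.path.isfile(local) and os.path.isfile(remote):
    print("Copy verified:", copies_match(local, remote))
```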


In [None]:
# !tar -jtvf '/content/drive/My Drive/colab/AMPL.tar.bz2'

Streaming output truncated to the last 5000 lines.
-rw-rw-r-- root/root      1508 2020-07-14 10:45 AMPL/share/terminfo/n/ncr160vt100pp
lrwxrwxrwx root/root         0 2020-09-23 19:47 AMPL/share/terminfo/n/nec -> nec5520
-rw-rw-r-- root/root      1661 2020-07-14 10:45 AMPL/share/terminfo/n/nsterm-c
-rw-rw-r-- root/root      1488 2020-09-23 19:47 AMPL/share/terminfo/n/nwp513-a
-rw-rw-r-- root/root      1516 2020-07-14 10:45 AMPL/share/terminfo/n/ncr260vt100wpp
-rw-rw-r-- root/root       418 2020-07-14 10:45 AMPL/share/terminfo/n/nsterm+s
-rw-rw-r-- root/root      1876 2020-07-14 10:45 AMPL/share/terminfo/n/nansi.sysk
-rw-rw-r-- root/root      1284 2020-07-14 10:45 AMPL/share/terminfo/n/nsterm-m-s-acs
-rw-rw-r-- root/root      1132 2020-07-14 10:45 AMPL/share/terminfo/n/nsterm-m-7
drwxr-xr-x root/root         0 2020-09-23 19:48 AMPL/share/terminfo/P/
lrwxrwxrwx root/root         0 2020-09-23 19:47 AMPL/share/terminfo/P/P8 -> ../p/prism8
lrwxrwxrwx root/root         0 2020-09

# Note
- Check https://drive.google.com/ for AMPL.tar.bz2. It may take a few minutes to synchronize.
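
To reuse the saved installation in a later Colab session, the archive has to be unpacked into /content, because conda's hard-coded paths expect the tree at /content/AMPL. The following is a minimal sketch of that restore step, assuming Google Drive is already mounted; `restore_archive` is a hypothetical helper, and the archive is assumed to be the trusted one created by this notebook.

```python
import os
import tarfile


def restore_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir.

    Returns the sorted top-level names found in the archive.
    """
    with tarfile.open(archive_path, "r:*") as tar:
        top_level = sorted({m.name.split("/")[0] for m in tar.getmembers()})
        tar.extractall(dest_dir)
    return top_level


archive = "/content/drive/My Drive/colab/AMPL.tar.bz2"
if os.path.isfile(archive):
    # Extract into /content so the tree lands at /content/AMPL,
    # the location where it was originally installed.
    print(restore_archive(archive, "/content"))
```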