# DrugFlow with Docker for Local Development and Cloud Deployment

---

This guide covers DrugFlow, implemented in Python, and how to run it with Docker for both local development and cloud deployment. It covers the following topics:

1. **Introduction to DrugFlow**: Overview of DrugFlow and its applications.
2. **Setting Up the Development Environment**: Step-by-step instructions for setting up a local development environment using Docker.
3. **Building and Running the Docker Container**: Instructions for building the Docker image and running the container.
4. **Deploying to the Cloud**: Guidelines for deploying DrugFlow to a cloud platform using Docker.
5. **Best Practices**: Tips and best practices for working with DrugFlow and Docker.

## Introduction to DrugFlow and the Drug Discovery Landscape

---



## Notebook Roadmap

---

### Sections
- [Building and Running the Docker Container](#building-and-running-the-docker-container)
- [Using DrugFlow](#using-drugflow)
- [Small GSK3B-FRAT1 Study](#small-gsk3b-frat1-study)
- [Deploying to the Cloud](#deploying-to-the-cloud)


### Prerequisites

Before you begin, ensure you have the following installed on your local machine:

- Docker: [Install Docker](https://docs.docker.com/get-docker/)
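- A CUDA-compatible NVIDIA GPU (for running DrugFlow)
- NVIDIA drivers and the NVIDIA Container Toolkit (if using a GPU; the toolkit is what enables Docker's `--gpus` option)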
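
A quick way to confirm these prerequisites is to check that the relevant command-line tools are on the `PATH`. The sketch below is illustrative only: the helper name `check_prerequisites` is hypothetical, and finding `nvidia-smi` only suggests that NVIDIA drivers are installed; it does not prove that the GPU is usable from inside Docker.

```python
import shutil


def check_prerequisites(tools=("docker", "nvidia-smi")):
    """Return a mapping of tool name -> True if the tool is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

If `docker` is reported as missing, install it from the link above before continuing.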


## Building and Running the Docker Container

---

To build and run the Docker container for DrugFlow, follow these steps:

1. **Clone the Repository**: Clone the MLContainerLab repository, which contains the DrugFlow Dockerfile, to your local machine.

```bash
git clone https://github.com/gabenavarro/MLContainerLab.git
cd MLContainerLab
```

2. **Build the Docker Image**: Use the provided Dockerfile to build the Docker image.

```bash
# Any image tag works; the one used here mirrors the Dockerfile suffix (cu121, cp311).
# If you change the base image, make sure the host supports the same or a higher CUDA version.
docker build -f ./assets/build/Dockerfile.drugflow.cu121cp311 -t drugflow:121-311 .
```
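
The CUDA requirement mentioned in the build comments can be checked before building. The following is a minimal sketch, assuming the `nvidia-smi` header contains a `CUDA Version: X.Y` field; the helper names and the sample header string are made up for illustration, and `(12, 1)` is inferred from the `cu121` suffix of the Dockerfile name.

```python
import re


def parse_cuda_version(nvidia_smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' header field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_supports(required, nvidia_smi_output):
    """True if the host reports a CUDA version equal to or higher than `required`."""
    found = parse_cuda_version(nvidia_smi_output)
    return found is not None and found >= required


# Illustrative header line only; real output comes from running `nvidia-smi`.
sample = "| NVIDIA-SMI 550.54.15    Driver Version: 550.54.15    CUDA Version: 12.4 |"
print(host_supports((12, 1), sample))  # True
```

On a real host, the text could be obtained with `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` and passed to `host_supports`.
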
3. **Run the Docker Container**: Start the container with the required configuration. The example below runs the container locally with GPU support, which is the recommended setup during development. To scale up, see [Deploying to the Cloud](#deploying-to-the-cloud), which runs the container in the cloud.

```bash
# Run the container with GPU support
docker run -dt \
   --gpus all \
   --shm-size=64g \
   -v "$(pwd)/assets:/workspace/assets" \
   -v "$(pwd)/documentation:/workspace/documentation" \
   -v "$(pwd)/datasets:/workspace/datasets" \
   --name drugflow \
   --env NVIDIA_VISIBLE_DEVICES=all \
   --env GOOGLE_APPLICATION_CREDENTIALS=/workspace/assets/secrets/gcp-key.json \
   --entrypoint /bin/bash \
   drugflow:121-311
```
> Note: The three `-v` options mount the local `assets`, `documentation`, and `datasets` directories under `/workspace` in the container, so you can access those files from inside it. The `--env` options set GPU visibility and the path to the Google Cloud credentials file.<br>
> Note: The `--gpus all` option exposes all available GPUs to the container. <br>
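
When the same container needs to be started from a script or notebook, it can help to assemble the `docker run` arguments programmatically. The sketch below only builds and prints the command shown above; it does not run Docker. The function name `build_run_command` and the project path `/home/user/MLContainerLab` are hypothetical placeholders.

```python
import shlex
from pathlib import PurePosixPath


def build_run_command(project_dir, image="drugflow:121-311", name="drugflow",
                      mounts=("assets", "documentation", "datasets")):
    """Assemble the `docker run` argument list used in this guide."""
    cmd = ["docker", "run", "-dt", "--gpus", "all", "--shm-size=64g"]
    for sub in mounts:
        # Mount <project_dir>/<sub> at /workspace/<sub> inside the container.
        cmd += ["-v", f"{PurePosixPath(project_dir) / sub}:/workspace/{sub}"]
    cmd += [
        "--name", name,
        "--env", "NVIDIA_VISIBLE_DEVICES=all",
        "--env", "GOOGLE_APPLICATION_CREDENTIALS=/workspace/assets/secrets/gcp-key.json",
        "--entrypoint", "/bin/bash",
        image,
    ]
    return cmd


# Placeholder project location; replace with the path of your clone.
cmd = build_run_command("/home/user/MLContainerLab")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.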

4. **Access the Container with an IDE**: This example uses Visual Studio Code to access the container, but any IDE of your choice works.

```bash
# Open VS Code attached to the running container from the command line
CONTAINER_NAME=drugflow
FOLDER=/workspace
# Hex-encode the JSON config that identifies the container
HEX_CONFIG=$(printf '{"containerName":"/%s"}' "$CONTAINER_NAME" | od -A n -t x1 | tr -d ' \n\t')
code --folder-uri "vscode-remote://attached-container+$HEX_CONFIG$FOLDER"
```

> Note: The `code` command opens Visual Studio Code. To attach to the container directly, make sure the Dev Containers extension (formerly Remote - Containers) is installed in VS Code.<br>
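
The folder URI built by the shell snippet above is the hex encoding of a small JSON object followed by the folder path. A minimal Python sketch of the same encoding is shown here for readers who prefer to generate the URI from a notebook; the function name `attached_container_uri` is hypothetical.

```python
import json


def attached_container_uri(container_name, folder="/workspace"):
    """Build the vscode-remote URI for a running container, as in the shell snippet."""
    # Compact JSON (no spaces), matching the string that the shell version encodes.
    config = json.dumps({"containerName": f"/{container_name}"}, separators=(",", ":"))
    hex_config = config.encode("utf-8").hex()
    return f"vscode-remote://attached-container+{hex_config}{folder}"


uri = attached_container_uri("drugflow")
print(uri)
```

The printed value can be passed to `code --folder-uri`.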





```python
!python ../inference.py --config ../default_inference_args.yaml --protein_ligand_csv ../data/protein_ligand_example.csv --out_dir /workspace/datasets/user_predictions_small
```

```text
Generating ESM language model embeddings
Processing 1 of 1 batches (4 sequences)
  Y = indices.astype(int)
2it [01:36, 48.45s/it]
```

```python
from rdkit import Chem

mol = Chem.MolFromMolFile("/workspace/datasets/user_predictions_small/1a0q/rank1.sdf")
```
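
To locate the `rank1.sdf` file for every complex rather than a single one, the output directory can be scanned. This is a sketch that assumes each complex gets its own sub-directory containing a `rank1.sdf` file, as the `1a0q/rank1.sdf` path above suggests; the helper name `find_rank1_poses` is hypothetical.

```python
from pathlib import Path


def find_rank1_poses(out_dir):
    """Map each complex sub-directory name to its rank1.sdf path, if present."""
    poses = {}
    for complex_dir in sorted(Path(out_dir).iterdir()):
        candidate = complex_dir / "rank1.sdf"
        if complex_dir.is_dir() and candidate.is_file():
            poses[complex_dir.name] = candidate
    return poses


# Only scan the directory if it exists (it does inside the container after inference).
out_dir = Path("/workspace/datasets/user_predictions_small")
if out_dir.is_dir():
    for name, path in find_rank1_poses(out_dir).items():
        print(name, path)
```

Each returned path can be passed to `Chem.MolFromMolFile`, which returns `None` when a file cannot be parsed, so checking for `None` before using the molecule is worthwhile.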