# Globus AlphaFold Example

This notebook shows how to run AlphaFold on ThetaGPU via funcX. It uploads a FASTA file to ALCF's Eagle storage system and then invokes an AlphaFold job on that file. The job is dispatched to ThetaGPU using funcX, and the results are written to Eagle for the user to collect.

In [16]:
import requests
import time
import json
import sys
import os

from globus_automate_client import create_flows_client
from fair_research_login import NativeClient
from funcx.sdk.client import FuncXClient

To run this example you need to be a member of the `Globus AlphaFold Group`. The group controls access to both the Globus endpoint used to read and write data and the funcX endpoint deployed on ThetaGPU. You can request access here: https://app.globus.org/groups/2f76ac1f-3e68-11ec-976c-89c391007df5/about

In [17]:
group_id = '2f76ac1f-3e68-11ec-976c-89c391007df5'
eagle_endpoint = 'a3411a10-da2d-4b44-82f4-d6f5006d6da2'

AlphaFold accepts a FASTA file as input. Here we have an example FASTA file that can be used to test the system. You can replace this FASTA with your own and it will be uploaded to ALCF's Eagle storage system for processing.

In [18]:
fasta = 'GB98_DM_3.fasta'

# Use a context manager so the file handle is closed after reading.
with open(fasta, "r") as f:
    print(f.read())

>GB98_DM.3 GB98 Deletion Mutation Sequence
TTYKLILNKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE
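
Before uploading your own file, it can help to check that it parses as FASTA. The following is a minimal sketch, not part of the original notebook: `read_fasta` and `invalid_residues` are hypothetical helpers, and the accepted alphabet (the 20 standard amino acids plus `X`) is an assumption about what AlphaFold will take.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Assumption: 20 standard amino acids plus X for an unknown residue.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def invalid_residues(sequence):
    """Return the sorted list of characters that are not in VALID_RESIDUES."""
    return sorted(set(sequence.upper()) - VALID_RESIDUES)


example = (
    ">GB98_DM.3 GB98 Deletion Mutation Sequence\n"
    "TTYKLILNKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE\n"
)
records = read_fasta(example)
for header, sequence in records:
    print(header, len(sequence), invalid_residues(sequence))
```

An empty list from `invalid_residues` means every character is in the assumed alphabet.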




## Upload the file to Eagle

In [19]:
client = NativeClient(client_id='7414f0b4-7d05-4bb6-bb00-076fa3f17cf5')
tokens = client.login(requested_scopes=[f'https://auth.globus.org/scopes/{eagle_endpoint}/https'],
                     no_local_server=True, no_browser=True)
auth_token = tokens[eagle_endpoint]['access_token']
headers = {'Authorization': f'Bearer {auth_token}'}
print(headers)

{'Authorization': 'Bearer <redacted>'}
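
Printing the headers writes the full access token into the notebook output, which is easy to share by accident. A small sketch of one way to confirm the header is set without exposing the token follows; `mask_headers` is a hypothetical helper, not part of any Globus library.

```python
def mask_headers(headers, visible=4):
    """Return a copy of headers with the bearer token mostly hidden."""
    masked = dict(headers)  # copy so the real headers stay usable
    value = masked.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if token:
        masked["Authorization"] = f"{scheme} {token[:visible]}...({len(token)} chars)"
    return masked


# Made-up token used only for illustration.
print(mask_headers({"Authorization": "Bearer abcdefghij"}))
```

In the notebook you would call `print(mask_headers(headers))` in place of `print(headers)`.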


In [20]:
# Open the file in a context manager so it is closed once the upload finishes.
with open(fasta, 'rb') as fh:
    r = requests.put(f'https://g-719d9.fd635.8443.data.globus.org/fasta/{fasta}', data=fh,
                     headers=headers)
print(r)

<Response [200]>
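
The upload cell assumes `fasta` is a bare file name. If you point it at a file in another directory, the local path would end up in the URL. The sketch below builds the upload URL from the file name only and percent-encodes it; `upload_url` is a hypothetical helper, and the base URL is the one used in the cell above. Calling `r.raise_for_status()` after the `PUT` is also a cheap way to fail early on a rejected upload.

```python
from urllib.parse import quote

BASE_URL = "https://g-719d9.fd635.8443.data.globus.org"


def upload_url(filename, folder="fasta", base_url=BASE_URL):
    """Build the HTTPS upload URL for a local file."""
    # Keep only the final path component, so "data/my.fasta" uploads as "my.fasta".
    name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    # Percent-encode characters such as spaces that are not URL-safe.
    return f"{base_url}/{folder}/{quote(name)}"


print(upload_url("GB98_DM_3.fasta"))
```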


## Create an analysis function

Use funcX to register a simple analysis function.

In [21]:
fxc = FuncXClient()

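Specify the endpoint to use. Both endpoint IDs are listed below; the second assignment overrides the first, so as written the function runs on the Polaris endpoint. Comment out the second line to target ThetaGPU.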

In [22]:
endpoint_uuid = 'ab415fc0-6d3b-4d1a-b62e-392c97998ce0' # thetagpu
endpoint_uuid = 'c0a1693c-ca4c-4730-aa30-b306a1ecb742' # polaris

The function needs to run inside the AlphaFold container. Here we register the path to the container with funcX so that the function can be registered against it. The funcX endpoint will then spawn a worker inside the container to serve the function.

In [23]:
cont_path = '/eagle/APSDataAnalysis/AlphaFold/AlphaFoldImage/alphafold-fx.sif'
cont_uuid = fxc.register_container(container_type='singularity', location=cont_path)

Define a function to invoke AlphaFold. This function will be executed on a ThetaGPU node.

In [24]:
def alphafold(fasta='GB98_DM_3.fasta', num_gpus=2,
              models='model_1', p_val='full_dbs', t_val='2020-05-14'):
    # Imports live inside the function because it is serialized and run remotely.
    import os
    import uuid
    import subprocess
    from subprocess import PIPE

    os.chdir('/opt/alphafold')

    fasta_pathname = f'/eagle/APSDataAnalysis/AlphaFold/fasta/{fasta}'
    data_dir = '/projects/CVD-Mol-AI/hsyoo/AlphaFoldData'
    # Short random name so that each run writes to its own output directory.
    dirname = str(uuid.uuid4())[:8]

    output = f'/eagle/APSDataAnalysis/AlphaFold/output/{dirname}'
    os.mkdir(output)
    log_file = f'{output}/{fasta}.log'

    # The shell redirects stdout and stderr of run.sh into the log file.
    cmd = f'/opt/alphafold/run.sh -d {data_dir} -o {output} -f {fasta_pathname} -t {t_val} -p {p_val} -m {models} -a {num_gpus} > {log_file} 2>&1'

    subprocess.run(cmd, stdout=PIPE, stderr=PIPE, shell=True, executable='/bin/bash')

    # HTTPS URL of the log file on Eagle; the flow passes it to the notification email.
    result_path = f'https://g-719d9.fd635.8443.data.globus.org/output/{dirname}/{fasta}.log'
    return {'output_path': result_path}

func_uuid = fxc.register_function(alphafold, container_uuid=cont_uuid)
print(func_uuid)

6bab1d8b-3d32-4f08-b0e6-db2a7c41fcc9
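
The function interpolates `fasta` directly into a shell command, so a file name containing spaces or shell metacharacters would break the command or run something unintended. One possible hardening, sketched here under the assumption that the `run.sh` flags are exactly those used above, is to quote every argument with `shlex.quote`. `build_command` is a hypothetical helper, not part of the original notebook.

```python
import shlex


def build_command(fasta_pathname, output, log_file, data_dir,
                  models="model_1", p_val="full_dbs", t_val="2020-05-14", num_gpus=2):
    """Build the run.sh command line with every argument shell-quoted."""
    args = [
        "/opt/alphafold/run.sh",
        "-d", data_dir,
        "-o", output,
        "-f", fasta_pathname,
        "-t", t_val,
        "-p", p_val,
        "-m", models,
        "-a", str(num_gpus),
    ]
    quoted = " ".join(shlex.quote(arg) for arg in args)
    # Redirect stdout and stderr to the log file, as in the original command.
    return f"{quoted} > {shlex.quote(log_file)} 2>&1"


# Illustrative paths only; the directory name "ab12cd34" is made up.
cmd = build_command(
    fasta_pathname="/eagle/APSDataAnalysis/AlphaFold/fasta/my seq.fasta",
    output="/eagle/APSDataAnalysis/AlphaFold/output/ab12cd34",
    log_file="/eagle/APSDataAnalysis/AlphaFold/output/ab12cd34/my seq.fasta.log",
    data_dir="/projects/CVD-Mol-AI/hsyoo/AlphaFoldData",
)
print(cmd)
```

Because funcX serializes the registered function and runs it remotely, a helper like this would need to be defined inside `alphafold` itself rather than at notebook level.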


## Define a flow

It can take over an hour to run the AlphaFold function. Here we define a Globus Flow to run the function and email the result upon completion.

Note: you need to insert the password for the Gmail account in the `send_credentials` entry before registering the flow.

In [46]:
flow_definition = {
  "Comment": "Globus AlphaFold Flow",
  "StartAt": "AlphaFold",
  "States": {
    "AlphaFold": {
      "Comment": "run funcX",
      "Type": "Action",
      "ActionUrl": "https://automate.funcx.org",
      "ActionScope": "https://auth.globus.org/scopes/b3db7e59-a6f1-4947-95c2-59d6b7a70f8c/action_all",
      "Parameters": {
          "tasks": [{
            "endpoint.$": "$.input.fx_ep",
            "function.$": "$.input.fx_id",
            "payload": {
                "fasta.$": "$.input.fasta",
            }
        }]
      },
      "ResultPath": "$.output",
      "WaitTime": 36000,
      "Next": "Notify"
    },
    "Notify": {
      "Type": "Action",
      "ActionUrl": "https://actions.automate.globus.org/notification/notify",
      "Parameters": {
        "notification_method": "email",
        "sender": "globus.automate.notifications@gmail.com",
        "destination.$": "$.EmailNotificationInput.destination",
        "subject.$": "$.EmailNotificationInput.subject",
        "body_template.$": "$.EmailNotificationInput.body_template",
        "body_variables.$": "$.output.details.result[0]",
        "body_mimetype": "text/html",
        "send_credentials": [
          {
            "credential_type": "smtp",
            "credential_method": "email",
            "credential_value": {
              "hostname": "smtp.gmail.com",
              "username": "funcx.alerts@gmail.com",
              "password": "<insert-password-here>"
            }
          }
        ]
      },
      "ResultPath": "$.EmailNotificationResult",
      "End": True
    }
  }
}
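
A typo in `StartAt` or `Next` is easy to make in a hand-written definition. As a rough local check before registering the flow, the sketch below follows the `Next` links from `StartAt` until it reaches a state marked `End`. `walk_flow` is a hypothetical helper; it only handles linear flows like this one and is not a substitute for validation by the Flows service.

```python
def walk_flow(definition):
    """Follow Next links from StartAt and return the visited state names."""
    states = definition["States"]
    name = definition["StartAt"]
    visited = []
    while True:
        if name not in states:
            raise ValueError(f"state {name!r} is not defined")
        if name in visited:
            raise ValueError(f"state {name!r} is visited twice (cycle)")
        visited.append(name)
        state = states[name]
        if state.get("End"):
            return visited
        if "Next" not in state:
            raise ValueError(f"state {name!r} has neither Next nor End")
        name = state["Next"]


# Skeleton with the same shape as the flow above; parameters are omitted.
skeleton = {
    "StartAt": "AlphaFold",
    "States": {
        "AlphaFold": {"Type": "Action", "Next": "Notify"},
        "Notify": {"Type": "Action", "End": True},
    },
}
print(walk_flow(skeleton))
```

Calling `walk_flow(flow_definition)` on the definition above should visit `AlphaFold` and then `Notify`.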

Register the flow.

In [47]:
flows_client = create_flows_client()
flow = flows_client.deploy_flow(flow_definition, title="AlphaFold flow", input_schema={})
flow_id = flow['id']
flow_scope = flow['globus_auth_scope']
print(f'Newly created flow with id:\n{flow_id}')

Newly created flow with id:
effd7616-7b97-46ee-af3a-77d43f890b1d


Create input for the flow. Replace the destination email address below with your own.

In [48]:
# Set an email address to send to
email_address = "ryan.chard@gmail.com"

flow_input = {
    "input": {
        "fx_id": func_uuid,
        "fx_ep": endpoint_uuid,
        "fasta": fasta,
    },
    "EmailNotificationInput": {
        "destination": email_address,
        "subject": "AlphaFold Notification",
        "body_template": "<html><body><h1>Globus AlphaFold flow completed.</h1><p>You can collect the result here: $output_path</p></body></html>",
        "body_variables": {
            "output_path": "Hello"
        }
    }
}
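
The `$output_path` placeholder in `body_template` is filled from the function result when the flow runs. To preview the email locally, the sketch below assumes the notification action substitutes variables in the style of Python's `string.Template`; that is an assumption about the service, and `preview_body` is a hypothetical helper. The directory name in the sample URL is made up.

```python
from string import Template

body_template = (
    "<html><body><h1>Globus AlphaFold flow completed.</h1>"
    "<p>You can collect the result here: $output_path</p></body></html>"
)


def preview_body(template, variables):
    """Render the template locally; unknown placeholders are left untouched."""
    return Template(template).safe_substitute(variables)


# Shaped like the dictionary returned by the alphafold function.
sample_result = {
    "output_path": "https://g-719d9.fd635.8443.data.globus.org/output/ab12cd34/GB98_DM_3.fasta.log"
}
preview = preview_body(body_template, sample_result)
print(preview)
```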

Run the flow. You can use the resulting link to monitor the flow in the Globus WebApp.

In [55]:
flow_action = flows_client.run_flow(flow_id, flow_scope, flow_input)
flow_action_id = flow_action['action_id']
print(f"flow started: https://app.globus.org/runs/{flow_action_id}")
flow_status = flow_action['status']
print(f'Flow action started with id: {flow_action_id}')
# Poll every two minutes until the flow leaves the ACTIVE state.
while flow_status == 'ACTIVE':
    time.sleep(120)
    flow_action = flows_client.flow_action_status(flow_id, flow_scope, flow_action_id)
    flow_status = flow_action['status']
    print(f'Flow status: {flow_status}')

flow started: https://app.globus.org/runs/38b3c5df-08ce-4f70-9d48-0cb54a881394
Flow action started with id: 38b3c5df-08ce-4f70-9d48-0cb54a881394
Flow status: ACTIVE
Flow status: ACTIVE
[40 further identical `Flow status: ACTIVE` lines omitted]
Flow status: AC
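
The polling loop above runs until the status changes, however long that takes. A variant with an upper bound can be sketched as follows. `wait_for_flow` is a hypothetical helper; the status lookup is passed in as a function, so the loop itself does not depend on any Globus client. The default timeout mirrors the `WaitTime` of 36000 seconds in the flow definition.

```python
import time


def wait_for_flow(get_status, poll_seconds=120, timeout_seconds=36000,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it stops returning 'ACTIVE' or the timeout passes."""
    deadline = clock() + timeout_seconds
    status = get_status()
    while status == "ACTIVE":
        if clock() >= deadline:
            raise TimeoutError(f"flow still ACTIVE after {timeout_seconds} seconds")
        sleep(poll_seconds)
        status = get_status()
    return status


# Stand-in status source for illustration; the final status name is an assumption.
fake_statuses = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
final = wait_for_flow(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)
```

In the notebook, `get_status` could be `lambda: flows_client.flow_action_status(flow_id, flow_scope, flow_action_id)['status']`.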

In [None]:
# Inspect the full details; append ['output']['details']['result'][0] for the function result.
flow_action['details']
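
Indexing straight into the nested result raises an exception if the flow failed or has not finished. A small sketch of a more forgiving lookup follows; `dig` is a hypothetical helper, and the key path is the one from the cell above.

```python
def dig(data, *path, default=None):
    """Walk nested dicts and lists, returning default if any step is missing."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Made-up response with the same nesting as the path used above.
sample_action = {
    "details": {"output": {"details": {"result": [{"output_path": "https://example.org/result.log"}]}}}
}
print(dig(sample_action, "details", "output", "details", "result", 0, "output_path"))
print(dig({"details": {}}, "details", "output", "details", "result", 0))
```

In the notebook, `dig(flow_action, 'details', 'output', 'details', 'result', 0)` returns the function result, or `None` if it is not there yet.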