# Globus AlphaFold Example

This notebook demonstrates running AlphaFold on ThetaGPU via funcX. It uploads a FASTA file to ALCF's Eagle storage system and then invokes an AlphaFold job on that file. The job is dispatched to ThetaGPU using funcX, and the results are written to Eagle for the user to collect.

In [1]:
import requests
import time
import json
import sys
import os

from fair_research_login import NativeClient

from funcx.sdk.client import FuncXClient

To run this example, you need to be a member of the `Globus AlphaFold Group`. This group restricts access to both the Globus endpoint used to read and write data and the funcX endpoint deployed on ThetaGPU. You can request access here: https://app.globus.org/groups/2f76ac1f-3e68-11ec-976c-89c391007df5/about

In [2]:
group_id = '2f76ac1f-3e68-11ec-976c-89c391007df5'
eagle_endpoint = 'a3411a10-da2d-4b44-82f4-d6f5006d6da2'

AlphaFold accepts a FASTA file as input. Here we have an example FASTA file that can be used to test the system. You can replace this FASTA with your own and it will be uploaded to ALCF's Eagle storage system for processing.

In [23]:
fasta = 'GB98_DM_3.fasta'

# Use a context manager so the file handle is closed after reading.
with open(fasta, "r") as f:
    print(f.read())

>GB98_DM.3 GB98 Deletion Mutation Sequence
TTYKLILNKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE
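
Before uploading your own file, it can help to confirm that it parses as FASTA. The following is a minimal sketch, not part of the original workflow: `read_fasta` is a hypothetical helper that only checks the basic layout (a `>` header line followed by sequence lines) and does not validate the residues themselves.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Hypothetical helper for illustration; it checks layout only.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as trailing newlines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, ''.join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError('sequence data found before the first ">" header')
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, ''.join(chunks)))
    return records


# The example record shown in the cell output above.
example = """>GB98_DM.3 GB98 Deletion Mutation Sequence
TTYKLILNKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE
"""
records = read_fasta(example)
for header, seq in records:
    print(header, len(seq))
```

To check a file on disk, pass its contents, for example `read_fasta(open(fasta).read())`.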




## Upload the file to Eagle

In [25]:
client = NativeClient(client_id='7414f0b4-7d05-4bb6-bb00-076fa3f17cf5')
tokens = client.login(requested_scopes=[f'https://auth.globus.org/scopes/{eagle_endpoint}/https'],
                     no_local_server=True, no_browser=True)
auth_token = tokens[eagle_endpoint]['access_token']
headers = {'Authorization': f'Bearer {auth_token}'}
print(headers)

Please paste the following URL in a browser:
https://auth.globus.org/v2/oauth2/authorize?client_id=7414f0b4-7d05-4bb6-bb00-076fa3f17cf5&redirect_uri=https%3A%2F%2Fauth.globus.org%2Fv2%2Fweb%2Fauth-code&scope=https%3A%2F%2Fauth.globus.org%2Fscopes%2Fa3411a10-da2d-4b44-82f4-d6f5006d6da2%2Fhttps&state=_default&response_type=code&code_challenge=8Y77b6XPj-mN5d_Pp5rzCWIycD4sDrf89SSkm-lvMz4&code_challenge_method=S256&access_type=online&prefill_named_grant=My+App+Login
Please Paste your Auth Code Below: 
AsrLgIlm6CzOwOth8COVJsHXH2cQbQ
{'Authorization': 'Bearer AgpOrMQNglpyKMO4W1D4NaJazM91w3qN49d0JvpJk4yGpBzEr7hvCzpYrgjmG3PDw0xGYYbw8vkEqXiXVlykaSqN5V'}


In [16]:
# Open the file in a context manager so the handle is closed after the upload.
with open(fasta, 'rb') as f:
    r = requests.put(f'https://g-719d9.fd635.8443.data.globus.org/fasta/{fasta}', data=f,
                     headers=headers)
print(r)

<Response [200]>
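
The upload step can also be wrapped in a small helper that fails loudly when the server rejects the request, instead of relying on reading the printed status. This is a sketch under stated assumptions: `EAGLE_BASE_URL`, `eagle_url`, and `upload_fasta` are hypothetical names introduced here. The base URL is the one used in the cell above, and `headers` is the `Authorization` dictionary built after login.

```python
import os

# Base URL of the Eagle HTTPS endpoint, copied from the upload cell above.
EAGLE_BASE_URL = 'https://g-719d9.fd635.8443.data.globus.org'


def eagle_url(folder, filename):
    """Build the HTTPS URL of a file in a folder on the Eagle endpoint."""
    return f'{EAGLE_BASE_URL}/{folder}/{filename}'


def upload_fasta(path, headers):
    """PUT a local FASTA file into the fasta/ folder; raise on an HTTP error."""
    # Imported lazily so eagle_url() can be used without requests installed.
    import requests

    with open(path, 'rb') as fh:
        response = requests.put(eagle_url('fasta', os.path.basename(path)),
                                data=fh, headers=headers)
    # Turns 4xx/5xx responses (for example, an expired token) into exceptions.
    response.raise_for_status()
    return response


print(eagle_url('fasta', 'GB98_DM_3.fasta'))
```

With the variables from this notebook, the call would be `upload_fasta(fasta, headers)`.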


## Create an analysis function

Use funcX to register a simple analysis function.

In [7]:
fxc = FuncXClient()

In [8]:
import funcx
funcx.__version__

'0.3.4'

Specify the ThetaGPU endpoint to use.

In [9]:
endpoint_uuid = 'ab415fc0-6d3b-4d1a-b62e-392c97998ce0'

The function must execute inside the AlphaFold container. Here we register the path to the container with funcX so that we can register the function against the container. The funcX endpoint will then spawn a worker inside the container to serve the function.

In [10]:
cont_path = '/eagle/APSDataAnalysis/AlphaFold/AlphaFoldImage/alphafold-fx.sif'
cont_uuid = fxc.register_container(container_type='singularity', location=cont_path)

Define a function to invoke AlphaFold. This function will be executed on a ThetaGPU node.

In [17]:
def alphafold(fasta='GB98_DM_3.fasta', num_gpus=8,
              models='model_1', p_val='full_dbs', t_val='2020-05-14'):
    # Imports live inside the function because funcX serializes the
    # function and executes it on the remote endpoint.
    import os
    import time
    import subprocess

    os.chdir('/opt/alphafold')

    # Location of the input FASTA file on Eagle.
    fasta_pathname = f'/eagle/APSDataAnalysis/AlphaFold/fasta/{fasta}'

    data_dir = '/projects/CVD-Mol-AI/hsyoo/AlphaFoldData'

    # Each run writes to its own timestamped output directory.
    timestamp = int(time.time())

    output = f'/eagle/APSDataAnalysis/AlphaFold/output/{timestamp}'
    os.mkdir(output)
    log_file = f'{output}/{fasta}.log'

    # The shell redirects stdout and stderr into the log file.
    cmd = f'/opt/alphafold/run.sh -d {data_dir} -o {output} -f {fasta_pathname} -t {t_val} -p {p_val} -m {models} -a {num_gpus} > {log_file} 2>&1'

    subprocess.run(cmd, shell=True, executable='/bin/bash')

    # URL of the log file on Eagle; it is returned whether or not the run succeeded.
    result_path = f'https://g-719d9.fd635.8443.data.globus.org/output/{timestamp}/{fasta}.log'
    return result_path

func_uuid = fxc.register_function(alphafold, container_uuid=cont_uuid)
print(func_uuid)

a6141b46-6956-4e06-8996-f57b4fb6800c
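
Because the function above interpolates the FASTA file name directly into a shell command, a name containing spaces or shell metacharacters would break the command. One way to guard against that is to quote each argument with `shlex.quote`. The following is a sketch, not the registered function: `build_alphafold_command` is a hypothetical helper, the flags are the ones used in the command above, and the `output/0` directory is a placeholder for the timestamped directory.

```python
import shlex


def build_alphafold_command(data_dir, output, fasta_pathname, log_file,
                            t_val='2020-05-14', p_val='full_dbs',
                            models='model_1', num_gpus=8):
    """Return the run.sh command line with every argument shell-quoted."""
    args = ['/opt/alphafold/run.sh',
            '-d', data_dir, '-o', output, '-f', fasta_pathname,
            '-t', t_val, '-p', p_val, '-m', models, '-a', str(num_gpus)]
    quoted = ' '.join(shlex.quote(arg) for arg in args)
    # Redirect stdout and stderr to the log file, as in the original command.
    return f'{quoted} > {shlex.quote(log_file)} 2>&1'


cmd = build_alphafold_command(
    data_dir='/projects/CVD-Mol-AI/hsyoo/AlphaFoldData',
    output='/eagle/APSDataAnalysis/AlphaFold/output/0',  # placeholder directory
    fasta_pathname='/eagle/APSDataAnalysis/AlphaFold/fasta/GB98_DM_3.fasta',
    log_file='/eagle/APSDataAnalysis/AlphaFold/output/0/GB98_DM_3.fasta.log')
print(cmd)
```

If this helper were used inside `alphafold`, it would need to be defined within the function body, along with its `import shlex`, so that it is available on the remote worker.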


Run the function. We specify the name of the FASTA file as input.

In [20]:
res = fxc.run(fasta=fasta, endpoint_id=endpoint_uuid, function_id=func_uuid)
print(res)

bddf6dd6-202c-494e-b23f-0cedfbe13c27


In [21]:
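# Poll once a minute. While the task is pending, get_result raises an
# exception describing its state; print it and try again.
while True:
    time.sleep(60)
    try:
        r = fxc.get_result(res)
        print(r)
        break
    except Exception as eek:
        print(eek)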

Task is pending due to waiting-for-launch
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running
ConnectionError on request
Task is pending due to running
ConnectionError on request
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running
Task is pending due to running


KeyboardInterrupt: 
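
The polling loop above runs until it is interrupted by hand. A bounded variant gives up after a fixed number of attempts. This is a minimal sketch: `wait_for_result` is a hypothetical helper, and like the loop above it treats every exception as "not ready yet", because the output shows both pending states and connection errors surfacing as exceptions. A real task failure would therefore also be retried until the limit is reached.

```python
import time


def wait_for_result(get_result, task_id, interval=60, max_attempts=120,
                    sleep=time.sleep):
    """Call get_result(task_id) until it returns, or raise TimeoutError.

    `sleep` is injectable so the helper can be exercised without waiting.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return get_result(task_id)
        except Exception as err:  # pending task or transient connection error
            last_error = err
            print(err)
        sleep(interval)
    raise TimeoutError(
        f'task {task_id} did not finish after {max_attempts} attempts'
    ) from last_error
```

With the objects defined in this notebook, the call would be `r = wait_for_result(fxc.get_result, res)`.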