
Is there a way to find the max GPU memory watermark? How to run locally with minimal setup #17

avilella opened this issue Aug 5, 2021 · 7 comments

@avilella

avilella commented Aug 5, 2021

According to the README.md, the memory limits are as follows:

Maximum length limits depend on the free GPU provided by Google Colab (fingers crossed):

For GPU: Tesla T4 or Tesla P100 with ~16 GB, the max length is ~1400
For GPU: Tesla K80 with ~12 GB, the max length is ~1000
To check which GPU you got, open a new code cell and type !nvidia-smi

I am interested in structures of either (a) a single chain of 240–280 aa or (b) two different chains of ~120 + ~140 aa. What would be the minimal GPU that would allow us to run this locally?

I am thinking that, given our own custom MSAs, it wouldn't need to connect to MMseqs2 or download the ~2 TB of sequence data, and could instead go straight to running the prediction from the MSA of internal data inside the Docker container?

Or am I missing something obvious that would still require Colab or something else remote?

@milot-mirdita
Collaborator

The old K40 GPUs (12 GB RAM) we have locally ran all but one CASP FM target (a 900–1000 aa one) without issues using the official pipeline, so AF2 doesn't necessarily need very new GPUs.

You might still want to poke at the Python code in the Colab, as it is a lot easier to supply your own MSAs to than the official pipeline. Ideally, we want to make the Colabs runnable on the command line as well, but we haven't started working on that yet.
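
As a rough illustration of that route (a sketch of mine, not code from the notebook; the function names below match the alphafold package as of mid-2021 and may differ in later releases), a custom A3M could be turned into a model-ready feature dict like so:

# Sketch: build an AlphaFold feature dict from a custom A3M, skipping
# the MMseqs2/database search entirely. "custom.a3m" is a placeholder path.
from alphafold.data import parsers, pipeline

with open("custom.a3m") as f:
    a3m_string = f.read()

# parse_a3m returns the aligned sequences and a per-residue deletion matrix
msa, deletion_matrix = parsers.parse_a3m(a3m_string)
query_sequence = msa[0]

feature_dict = {
    **pipeline.make_sequence_features(
        sequence=query_sequence, description="query",
        num_res=len(query_sequence)),
    **pipeline.make_msa_features(
        msas=[msa], deletion_matrices=[deletion_matrix]),
}

The resulting feature_dict could then be passed to model_runner.process_features(...) and model_runner.predict(...) the same way the notebook does.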

@avilella
Author

avilella commented Aug 5, 2021 via email

@RodenLuo

Ideally we want to make the Colabs also runnable on the command line, but haven't started working on that yet.

This is also mentioned in #20. It would be great to have either a local command-line interface or a local notebook version, so that we can run inputs with >1000 amino acids and predict complexes (dimers/trimers) of them.

I'm not that familiar with all the steps involved in the code; I mostly use it as an end-to-end tool. I tried to run the AlphaFold2_advanced notebook locally. After solving several package issues, I am now stuck at No module named 'colabfold'. I also see that the database_path entries all point to googleapis, which works fine on Colab but, I guess, less smoothly locally. I have a local version of AlphaFold2 running fine. Any hints on how to run the AlphaFold2_advanced notebook locally would be much appreciated. Thanks.

@avilella
Author

avilella commented Aug 23, 2021 via email

@milot-mirdita
Collaborator

We now have an internal version that runs on a cluster. The main issue remains that the MMseqs2 API runs on a single server and will probably not scale to multiple research groups submitting jobs.

We are still preparing databases, scripts, etc., so that people can deploy their own server. However, to use MMseqs2 the way we use it for ColabFold, we require all databases to be fully in RAM (currently 535 GB of RAM, plus some RAM for each worker process).

We can change the local ColabFold version to work with MMseqs2's usual batch mode, where the memory requirements are not as high.

If you want to run a few thousand sequences, please contact me directly (email, Twitter, etc.); I can give you access to the local version. We still need to figure out how to scale the API better, though.
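
For reference, MMseqs2's regular disk-based batch mode looks roughly like the sketch below (my illustration, driven from Python; "uniref30_db" is a placeholder database name, and the exact flags we use for ColabFold may differ):

# Disk-based batch search with MMseqs2; no in-RAM server required.
import subprocess

def mmseqs(*args):
    subprocess.run(["mmseqs", *args], check=True)

mmseqs("createdb", "queries.fasta", "queryDB")            # index the query sequences
mmseqs("search", "queryDB", "uniref30_db", "resultDB",    # iterative profile search on disk
       "tmp", "--num-iterations", "3")
mmseqs("result2msa", "queryDB", "uniref30_db",            # export the hits as MSAs
       "resultDB", "msaDB")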

@RodenLuo

Thanks! I have a local version of AlphaFold2 installed with Docker on a server. (I ran into some problems during installation, and then tried to install the non-docker version on a cluster as well, but later dropped that, since the one on the server worked fine after changing the CUDA version.)

I have 4 NVIDIA RTX A6000 GPUs and 1.0 TB of RAM on that server, but I still have not gotten AlphaFold2_advanced.ipynb to run through. I would like to predict homotrimers of a protein of more than 1000 aa (more details in issue #93 in AlphaFold2's repo). With trimer settings, the total length is more than 3000 aa.

I am facing the errors below if I run them on Colab.

Exception: Input sequence is too long: 3867 amino acids, while the maximum is 2500. Please use the full AlphaFold system for long sequences.
Exception: Input sequence is too long: 3078 amino acids, while the maximum is 2500. Please use the full AlphaFold system for long sequences.

I'm now trying to run the notebooks locally on the server. The previously mentioned No module named 'colabfold' error occurred because I was launching the notebook from within AlphaFold2's folder, which bypassed the line if not os.path.isdir("alphafold"): in the notebook. I moved the notebook to another folder. After several pip and conda installs for the missing packages, it runs. I didn't change database_path, so I suppose it is still using googleapis. I changed the max length to MAX_SEQUENCE_LENGTH = 5000 (the only line I changed, to get past the aforementioned error).
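
(For reference, the length guard in question looks roughly like this; I am reconstructing it from the error message, so the exact code in the notebook may differ, and full_sequence is my placeholder for the concatenated input:)

MAX_SEQUENCE_LENGTH = 2500  # raised to 5000 in my local copy
if len(full_sequence) > MAX_SEQUENCE_LENGTH:
  raise Exception(
      f"Input sequence is too long: {len(full_sequence)} amino acids, "
      f"while the maximum is {MAX_SEQUENCE_LENGTH}. "
      "Please use the full AlphaFold system for long sequences.")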

And now the #@title Search against genetic databases cell runs fine and plots the sequence coverage figure. However, the #@title run alphafold cell gives the error below.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_40057/1661749859.py in <module>
    151       cfg.data.eval.num_ensemble = num_ensemble
    152 
--> 153       params = data.get_model_haiku_params(name,'./alphafold/data')
    154       model_runner = model.RunModel(cfg, params, is_training=is_training)
    155       COMPILED = compiled

~/.conda/envs/AF/lib/python3.9/site-packages/alphafold/model/data.py in get_model_haiku_params(model_name, data_dir)
     37     params = np.load(io.BytesIO(f.read()), allow_pickle=False)
     38 
---> 39   return utils.flat_params_to_haiku(params)

~/.conda/envs/AF/lib/python3.9/site-packages/alphafold/model/utils.py in flat_params_to_haiku(params)
     77     if scope not in hk_params:
     78       hk_params[scope] = {}
---> 79     hk_params[scope][name] = jnp.array(array)
     80 
     81   return hk_params

~/.conda/envs/AF/lib/python3.9/site-packages/jax/_src/numpy/lax_numpy.py in array(object, dtype, copy, order, ndmin)
   3085     _inferred_dtype = object.dtype and dtypes.canonicalize_dtype(object.dtype)
   3086     lax._check_user_dtype_supported(_inferred_dtype, "array")
-> 3087     out = _device_put_raw(object, weak_type=weak_type)
   3088     if dtype: assert _dtype(out) == dtype
   3089   elif isinstance(object, (DeviceArray, core.Tracer)):

~/.conda/envs/AF/lib/python3.9/site-packages/jax/_src/lax/lax.py in _device_put_raw(x, weak_type)
   1607   else:
   1608     aval = raise_to_shaped(core.get_aval(x), weak_type=weak_type)
-> 1609     return xla.array_result_handler(None, aval)(*xla.device_put(x))
   1610 
   1611 def zeros_like_shaped_array(aval):

~/.conda/envs/AF/lib/python3.9/site-packages/jax/interpreters/xla.py in device_put(x, device)
    156   x = canonicalize_dtype(x)
    157   try:
--> 158     return device_put_handlers[type(x)](x, device)
    159   except KeyError as err:
    160     raise TypeError(f"No device_put handler for type: {type(x)}") from err

~/.conda/envs/AF/lib/python3.9/site-packages/jax/interpreters/xla.py in _device_put_array(x, device)
    164   if x.dtype is dtypes.float0:
    165     x = np.zeros(x.shape, dtype=np.dtype(bool))
--> 166   return (backend.buffer_from_pyval(x, device),)
    167 
    168 def _device_put_scalar(x, device):

RuntimeError: Resource exhausted: Out of memory while trying to allocate 2097152 bytes.

However, all 4 GPUs and the RAM are available, as shown below.

~$ nvidia-smi
Mon Aug 23 14:04:14 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    Off  | 00000000:18:00.0 Off |                  Off |
| 30%   24C    P8     6W / 300W |    460MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000    Off  | 00000000:3B:00.0 Off |                  Off |
| 30%   28C    P8    14W / 300W |    550MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000    Off  | 00000000:86:00.0 Off |                  Off |
| 30%   26C    P8     7W / 300W |    456MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    Off  | 00000000:AF:00.0 Off |                  Off |
| 30%   24C    P8    17W / 300W |    452MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           1.0T        7.3G        808G         40M        191G        994G
Swap:          7.5G          0B        7.5G
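
One thing I am trying (an assumption on my part, based on general JAX behavior rather than anything confirmed in this thread) is to set the XLA/JAX memory environment variables before jax is first imported, e.g.:

# Assumption: these must be set before the first "import jax" anywhere in the notebook.
import os
os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"           # let XLA spill to host RAM via unified memory
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"  # allow addressing ~4x the GPU memory
os.environ["CUDA_VISIBLE_DEVICES"] = "0"              # pin the process to one known-free GPU

import jax
print(jax.devices())  # sanity check: should list the GPU, not the CPU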

Any help would be much appreciated. Thanks!

@sokrypton
Owner

The advanced notebook is under active development. I would avoid trying to deploy it locally (unless you are willing to track daily bug fixes and implement them yourself). For a more stable setup, see the alphafold2_mmseqs2 notebook.
