The predicted values are nan #120

Closed
nimijkrap opened this issue Aug 11, 2021 · 13 comments
Labels: error report (Something isn't working)

Comments

@nimijkrap

nimijkrap commented Aug 11, 2021

I set up AlphaFold without Docker on our server and ran it on an A100 GPU.
During relaxation, the error "simtk.openmm.OpenMMException: Particle coordinate is nan" occurred, as shown below:

I0811 17:57:58.746377 140681736556736 run_alphafold.py:141] Running model model_1
I0811 17:58:21.323568 140681736556736 model.py:132] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'template_aatype': (4, 4, 68), 'template_all_atom_masks': (4, 4, 68, 37), 'template_all_atom_positions': (4, 4, 68, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 508, 68), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 68, 3), 'template_pseudo_beta_mask': (4, 4, 68), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 68), 'true_msa': (4, 508, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 508, 68, 49), 'target_feat': (4, 68, 22)}
I0811 18:02:36.754542 140681736556736 model.py:140] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (508, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,)}
I0811 18:02:36.765380 140681736556736 run_alphafold.py:153] Total JAX model model_1 predict time (includes compilation time, see --benchmark): 255?
Traceback (most recent call last):
  File "/home/dearfold/alphafold/run_alphafold.py", line 302, in <module>
    app.run(main)
  File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/dearfold/alphafold/run_alphafold.py", line 284, in main
    random_seed=random_seed)
  File "/home/dearfold/alphafold/run_alphafold.py", line 177, in predict_structure
    relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
  File "/home/dearfold/alphafold/alphafold/relax/relax.py", line 62, in process
    max_outer_iterations=self._max_outer_iterations)
  File "/home/dearfold/alphafold/alphafold/relax/amber_minimize.py", line 461, in run_pipeline
    pdb_string = clean_protein(prot, checks=checks)
  File "/home/dearfold/alphafold/alphafold/relax/amber_minimize.py", line 171, in clean_protein
    fixed_pdb = cleanup.fix_pdb(pdb_file, alterations_info)
  File "/home/dearfold/alphafold/alphafold/relax/cleanup.py", line 55, in fix_pdb
    fixer.addMissingAtoms(seed=0)
  File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/pdbfixer/pdbfixer.py", line 954, in addMissingAtoms
    mm.LocalEnergyMinimizer.minimize(context)
  File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 4110, in minimize
    return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
simtk.openmm.OpenMMException: Particle coordinate is nan

These are files in the output directory.

features.pkl  msas  result_model_1.pkl  unrelaxed_model_1.pdb

I checked unrelaxed_model_1.pdb and found that the atom coordinates are written as nan. Below is part of unrelaxed_model_1.pdb.

MODEL     1
ATOM      1  N   GLY A   1         nan     nan     nan  1.00  0.00           N
ATOM      2  CA  GLY A   1         nan     nan     nan  1.00  0.00           C
ATOM      3  C   GLY A   1         nan     nan     nan  1.00  0.00           C
ATOM      4  O   GLY A   1         nan     nan     nan  1.00  0.00           O
ATOM      5  N   TRP A   2         nan     nan     nan  1.00  0.00           N
ATOM      6  CA  TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM      7  C   TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM      8  CB  TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM      9  O   TRP A   2         nan     nan     nan  1.00  0.00           O
ATOM     10  CG  TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     11  CD1 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     12  CD2 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     13  CE2 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     14  CE3 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     15  NE1 TRP A   2         nan     nan     nan  1.00  0.00           N
ATOM     16  CH2 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     17  CZ2 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     18  CZ3 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     19  N   SER A   3         nan     nan     nan  1.00  0.00           N
ATOM     20  CA  SER A   3         nan     nan     nan  1.00  0.00           C
ATOM     21  C   SER A   3         nan     nan     nan  1.00  0.00           C
ATOM     22  CB  SER A   3         nan     nan     nan  1.00  0.00           C
ATOM     23  O   SER A   3         nan     nan     nan  1.00  0.00           O
ATOM     24  OG  SER A   3         nan     nan     nan  1.00  0.00           O
ATOM     25  N   THR A   4         nan     nan     nan  1.00  0.00           N
ATOM     26  CA  THR A   4         nan     nan     nan  1.00  0.00           C
ATOM     27  C   THR A   4         nan     nan     nan  1.00  0.00           C
ATOM     28  CB  THR A   4         nan     nan     nan  1.00  0.00           C
ATOM     29  O   THR A   4         nan     nan     nan  1.00  0.00           O
ATOM     30  CG2 THR A   4         nan     nan     nan  1.00  0.00           C
ATOM     31  OG1 THR A   4         nan     nan     nan  1.00  0.00           O
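
A quick way to confirm how many atoms are affected, rather than reading the file by eye, is to parse the fixed-width coordinate columns of each ATOM record. This is only a minimal sketch: the helper name `count_nan_atoms` is made up for illustration, and it assumes the file follows the standard PDB column layout (x, y, z in columns 31-54).

```python
import math

def count_nan_atoms(pdb_text):
    """Return (total_atoms, atoms_with_bad_coords) for ATOM/HETATM records."""
    total = 0
    bad = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        total += 1
        # Fixed-width PDB columns (1-indexed): x = 31-38, y = 39-46, z = 47-54.
        fields = (line[30:38], line[38:46], line[46:54])
        try:
            coords = [float(f) for f in fields]
        except ValueError:
            bad += 1  # unparsable coordinates are counted as invalid too
            continue
        if any(math.isnan(c) or math.isinf(c) for c in coords):
            bad += 1
    return total, bad

# Usage (path taken from the output directory listed above):
# with open("unrelaxed_model_1.pdb") as f:
#     print(count_nan_atoms(f.read()))
```

If every atom is reported as bad, the problem is upstream of relaxation, in the prediction itself.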

So I loaded the result_model_1.pkl file as a dictionary and found that the predicted values are also nan.

Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle as pkl
>>> with open("result_model_1.pkl", "rb") as f:
...     d=pkl.load(f)
...
>>> d
{'distogram': {'bin_edges': array([ 2.3125   ,  2.625    ,  2.9375   ,  3.25     ,  3.5625   ,
        3.875    ,  4.1875   ,  4.5      ,  4.8125   ,  5.125    ,
        5.4375   ,  5.75     ,  6.0625   ,  6.375    ,  6.6875   ,
        6.9999995,  7.3125   ,  7.625    ,  7.9375   ,  8.25     ,
        8.5625   ,  8.875    ,  9.1875   ,  9.5      ,  9.812499 ,
       10.124999 , 10.4375   , 10.75     , 11.0625   , 11.375    ,
       11.687499 , 12.       , 12.3125   , 12.625    , 12.9375   ,
       13.25     , 13.5625   , 13.874999 , 14.187501 , 14.499999 ,
       14.812499 , 15.124999 , 15.437499 , 15.75     , 16.0625   ,
       16.375    , 16.687502 , 16.999998 , 17.312498 , 17.624998 ,
       17.937498 , 18.25     , 18.5625   , 18.875    , 19.1875   ,
       19.5      , 19.8125   , 20.125    , 20.437498 , 20.75     ,
       21.062498 , 21.374998 , 21.6875   ], dtype=float32), 'logits': array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       ...,

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)}, 'experimentally_resolved': {'logits': array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}, 'masked_msa': {'logits': array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       ...,

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)}, 'predicted_lddt': {'logits': array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}, 'structure_module': {'final_atom_mask': array([[1., 1., 1., ..., 0., 0., 0.],
       [1., 1., 1., ..., 1., 0., 0.],
       [1., 1., 1., ..., 0., 0., 0.],
       ...,
       [1., 1., 1., ..., 0., 1., 0.],
       [1., 1., 1., ..., 0., 0., 0.],
       [1., 1., 1., ..., 0., 0., 0.]], dtype=float32), 'final_atom_positions': array([[[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       ...,

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]]], dtype=float32)}, 'plddt': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan])}
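
Printing the whole dictionary is hard to read, so a compact per-array summary may be easier to share in a report. The sketch below walks a nested dictionary and counts NaN entries in each floating-point array. The function name `nan_report` and the demo dictionary are assumptions for illustration, not part of AlphaFold.

```python
import numpy as np

def nan_report(obj, prefix=""):
    """Yield (path, shape, nan_count) for every floating-point array in a nested dict."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from nan_report(value, f"{prefix}/{key}" if prefix else str(key))
    else:
        arr = np.asarray(obj)
        if np.issubdtype(arr.dtype, np.floating):
            yield prefix, arr.shape, int(np.isnan(arr).sum())

# Synthetic stand-in for the loaded result dictionary (not real AlphaFold output).
demo = {
    "plddt": np.array([np.nan, np.nan]),
    "structure_module": {"final_atom_mask": np.ones((2, 37), dtype=np.float32)},
}
for path, shape, n_nan in nan_report(demo):
    print(path, shape, n_nan)
```

On the real file, the same loop can be run on the dictionary `d` loaded with `pickle` in the session above. Running it on features.pkl as well would show whether the inputs are already affected.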

I checked that the features.pkl file is okay and that the model parameters load correctly.
I tested different sequences, but the predicted values were always nan.
I guess something went wrong during prediction, but I cannot figure out what it is or how to fix it.
Has anyone faced the same issue?
Can anyone help me fix it?

abridgland added the "error report" (Something isn't working) label Sep 1, 2021
@Augustin-Zidek
Collaborator

Is this still an issue with the latest version of AlphaFold? Also, does it help to run without relax (--run_relax=false)?

@giangpth

giangpth commented Mar 17, 2022

When I run with --run_relax=false, I no longer get the error.
However, I noticed another problem in that mode. For example, with the attached FASTA file, the rank_0 model I get is all nan (see attachment), while some other models (rank_1, rank_2, ...) may have valid atom coordinates. ranking_debug.json also contains some nan values.
Error.zip
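
To see which entries in a JSON file such as ranking_debug.json are nan, something like the following could be used. It is a rough sketch: Python's `json` module accepts the `NaN` literal by default, and the sample string with its key names is hypothetical, made up only to exercise the helper.

```python
import json
import math

def find_nan_entries(node, path=""):
    """Return the paths of all NaN numbers inside parsed JSON data."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_nan_entries(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits += find_nan_entries(value, f"{path}[{i}]")
    elif isinstance(node, float) and math.isnan(node):
        hits.append(path)
    return hits

# Hypothetical file contents; the key names are assumptions for illustration.
sample = '{"scores": {"model_1": NaN, "model_2": 71.5}, "order": ["model_1", "model_2"]}'
print(find_nan_entries(json.loads(sample)))
```

For the real file, replace `json.loads(sample)` with `json.load(f)` on the opened ranking_debug.json.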

@Augustin-Zidek
Collaborator

Hi, thanks for the additional information, we will investigate and let you know.

@mcbeaker

Any update on this topic? I am getting the same error using the multimer protocol.

@Augustin-Zidek
Collaborator

Is this an issue for all 5 predictions or just some of them?

@giangpth

Just some of them.

@mcbeaker

mcbeaker commented May 25, 2022 via email

@RodenLuo

Same issue for --model_preset=multimer, --use_gpu_relax=True with v2.2.0.

@RodenLuo

This is also the case when --use_gpu_relax=False.

Both stop at

simtk.openmm.OpenMMException: Particle coordinate is nan

@RodenLuo

RodenLuo commented Jun 6, 2022

I just got approval to share the following sequences for debugging purposes. In my case, this nan-related bug occurs with:

$ cat MPK4_MKK2_Docking.fasta 
>MPK4
MSAESCFGSSGDQSSSKGVATHGGSYVQYNVYGNLFEVSRKYVPPLRPIGRGAYGIVCAATNSETGEEVAIKKIGNAFDNIIDAKRTLREIKLLKHMDHENVIAVKDIIKPPQRENFNDVYIVYELMDTDLHQIIRSNQPLTDDHCRFFLYQLLRGLKYVHSANVLHRDLKPSNLLLNANCDLKLGDFGLARTKSETDFMTEYVVTRWYRAPELLLNCSEYTAAIDIWSVGCILGETMTREPLFPGKDYVHQLRLITELIGSPDDSSLGFLRSDNARRYVRQLPQYPRQNFAARFPNMSAGAVDLLEKMLVFDPSRRITVDEALCHPYLAPLHDINEEPVCVRPFNFDFEQPTLTEENIKELIYRETVKFNPQDSV
>MKK2-Docking
MKKGGFSNNLKLAIPVAGE
$ cat run_multimer.sh
#!/bin/bash
## https://sbgrid.org/wiki/examples/alphafold2
### Tips: https://wiki.hpcc.msu.edu/display/ITH/Alphafold 
#SBATCH -N 1
#SBATCH --partition=batch
#SBATCH -J AlphaFold.version2.2
#SBATCH -o AlphaFold.v2.2.%J.out
#SBATCH -e AlphaFold.v2.2.%J.err
#SBATCH --mail-user=deng.luo@kaust.edu.sa
#SBATCH --mail-type=ALL
#SBATCH --time=24:00:00
#SBATCH --mem=64G
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --constraint=[a100]


module load alphafold/2.2.0/python3_jupyter
export ALPHAFOLD_DATA=/reference/alphafold/2.1.1/all_alphafold_data
export CUDA_VISIBLE_DEVICES=0,1,2,3
export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.5
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
python3 $AlphaFold/run_alphafold.py \
 --data_dir=$ALPHAFOLD_DATA \
 --output_dir=/af2_multimer_run/MPK4_MKK2_Docking \
 --fasta_paths=/af2_multimer_run/MPK4_MKK2_Docking/MPK4_MKK2_Docking.fasta \
 --max_template_date=2022-05-25 \
 --db_preset=full_dbs \
 --bfd_database_path=$ALPHAFOLD_DATA/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
 --uniclust30_database_path=$ALPHAFOLD_DATA/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
 --uniref90_database_path=$ALPHAFOLD_DATA/uniref90/uniref90.fasta \
 --mgnify_database_path=$ALPHAFOLD_DATA/mgnify/mgy_clusters_2018_12.fa \
 --template_mmcif_dir=$ALPHAFOLD_DATA/pdb_mmcif/mmcif_files \
 --model_preset=multimer \
 --uniprot_database_path=$ALPHAFOLD_DATA/uniprot/uniprot.fasta \
 --pdb_seqres_database_path=$ALPHAFOLD_DATA/pdb_seqres/pdb_seqres.txt \
 --obsolete_pdbs_path=$ALPHAFOLD_DATA/pdb_mmcif/obsolete.dat \
 --use_gpu_relax=True

Running with --use_gpu_relax=False hits the same nan issue.

@boegel

boegel commented Aug 4, 2022

It looks like this problem can be fixed with a small change that is necessary when you're using jax 0.3.8 or newer; see #513.
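
To check whether an installation falls in the affected range, the installed jax version can be compared against 0.3.8. The helper names below are made up for illustration. The sketch only reports the version comparison and does not apply the change from #513.

```python
def version_tuple(version):
    """Turn a version string such as '0.3.8' into a tuple of ints for comparison."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '+cuda11' or 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)

def jax_at_least_0_3_8(version):
    """True if the given jax version string is 0.3.8 or newer."""
    return version_tuple(version) >= (0, 3, 8)

# Usage in the AlphaFold environment:
# import jax
# print(jax.__version__, jax_at_least_0_3_8(jax.__version__))
```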

@Augustin-Zidek
Collaborator

This has been fixed in https://github.com/deepmind/alphafold/releases/tag/v2.2.4. Closing this issue; feel free to reopen it or open a new one if this is still a problem.
