
issue when using nequip-deploy 🐛 [BUG] #346

Open · utkarshp1161 opened this issue Jun 6, 2023 · 9 comments
Labels: bug (Something isn't working)

utkarshp1161 commented Jun 6, 2023

Describe the bug
nequip-deploy build fails with a TorchScript NotSupportedError (full traceback below).

To Reproduce
nequip-deploy build --train-dir model_path/ model_path/deployed_model.pth

ERROR:

[W init.cpp:833] Warning: Use _jit_set_fusion_strategy, bailout depth is deprecated. Setting to (STATIC, 2) (function operator())
/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
  warnings.warn("The TorchScript type system doesn't support "
Traceback (most recent call last):
  File "/home/anaconda3/envs/bebam/bin/nequip-deploy", line 8, in <module>
    sys.exit(main())
  File "/home/nequip/nequip/nequip/scripts/deploy.py", line 225, in main
    model = _compile_for_deploy(model)
  File "/home/nequip/nequip/nequip/scripts/deploy.py", line 62, in _compile_for_deploy
    model = script(model)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 266, in script
    out = compile(mod, in_place=in_place)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 101, in compile
    compile(
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 113, in compile
    mod = torch.jit.script(mod, **script_options)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 1284, in script
    return torch.jit._recursive.create_script_module(
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 480, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 614, in _construct
    init_fn(script_module)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 520, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 546, in create_script_module_impl
    create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 397, in create_methods_and_properties_from_stubs
    concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 867, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 1338, in script
    ast = get_jit_def(obj, obj.__name__)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 297, in get_jit_def
    return build_def(parsed_def.ctx, fn_def, type_line, def_name, self_name=self_name, pdt_arg_types=pdt_arg_types)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 335, in build_def
    param_list = build_param_list(ctx, py_def.args, self_name, pdt_arg_types)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 359, in build_param_list
    raise NotSupportedError(ctx_range, _vararg_kwarg_err)
torch.jit.frontend.NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults:
  File "/home/anaconda3/envs/bebam/lib/python3.10/logging/__init__.py", line 2131
def debug(msg, *args, **kwargs):
                       ~~~~~~~ <--- HERE
    """
    Log a message with severity 'DEBUG' on the root logger. If the logger has
utkarshp1161 added the bug (Something isn't working) label on Jun 6, 2023
utkarshp1161 changed the title from "🐛 [BUG]" to "issue when using nequip-deploy 🐛 [BUG]" on Jun 6, 2023
Linux-cpp-lisp (Collaborator) commented

This looks like you've edited the code to include logging.debug calls in the model?

utkarshp1161 (Author) commented Jun 7, 2023

Not that, but I have set up my Python environment for nequip such that I can use PyTorch 2.0 (unlike the prescribed PyTorch >= 1.8, != 1.9, <= 1.11.*), due to hardware constraints and to try a few other things with torch_geometric. I am able to train NequIP models in this setup, but when I try to deploy a model I get this error. My goal is to run an MD simulation with a trained model, and I thought I could use NequIPCalculator.from_deployed_model(model, **kwargs). Is there a workaround so that I can run the MD simulation without having to deploy the model?

Linux-cpp-lisp (Collaborator) commented

I see; what hardware constraints? Please note the following upstream issue: #311. Whether or not you encounter that issue, please post in that thread so we can continue to try to understand and resolve the problem. Also note that on AMD GPUs, more recent versions of PyTorch appear to be fine.

Regarding torch_geometric, that is no longer a dependency of nequip, but maybe I am misinterpreting what you mean.

You could try 1.13? I've never seen this issue reported before... besides your PyTorch version, is there anything else custom or unusual about your setup? There should never be a call to logging.debug in the model. Maybe the rest of the stack trace, which isn't included here, says where in the model it is?

utkarshp1161 (Author) commented Jun 7, 2023

Thank you, will try and get back with more details.

Can you please answer this:
"My goal is to do a md simulation on trained model and I thought that I could use NequIPCalculator.from_deployed_model(model, **kwargs). Is there a workaround such that I can do the md sim without having to deploy the model?"

Actually, I have already trained quite a number of models, and since nequip-deploy is not working for them I am looking for a workaround to complete my study without having to set everything up again.

Linux-cpp-lisp (Collaborator) commented

You could do inefficient MD by manually constructing a NequIPCalculator from an uncompiled PyTorch model (built using model_from_config and .load_state_dict, then passed to the constructor rather than created with from_deployed_model). This will lose you performance in a lot of places, however.

It is not possible to do MD in LAMMPS, OpenMM, etc. without deploying.
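
For concreteness, here is a minimal sketch of that uncompiled-calculator route. It assumes the nequip 0.5.x Python API; the helper names (Config.from_file, model_from_config), the NequIPCalculator constructor arguments, and the filenames in the train directory (config.yaml, best_model.pth) are assumptions that may differ in your version, so check your train directory and nequip/ase/nequip_calculator.py before relying on it:

```python
import torch
from nequip.utils import Config
from nequip.model import model_from_config
from nequip.ase import NequIPCalculator

train_dir = "model_path/"

# Rebuild the architecture from the training config;
# initialize=False skips fresh random weight initialization.
config = Config.from_file(train_dir + "config.yaml")
model = model_from_config(config, initialize=False)

# Load the trained weights that nequip-train saved (filename assumed).
state = torch.load(train_dir + "best_model.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Pass the uncompiled model to the constructor instead of using
# NequIPCalculator.from_deployed_model(); r_max/device kwargs assumed.
calc = NequIPCalculator(
    model=model,
    r_max=float(config["r_max"]),
    device="cpu",
)
```

Once constructed, the calculator plugs into the usual ASE workflow (atoms.calc = calc, then run dynamics with e.g. ase.md.langevin.Langevin), just slower than a deployed model.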

Linux-cpp-lisp (Collaborator) commented Jun 7, 2023

> Thank you, will try and get back with more details.

Thanks. It's possible that there is a missing @torch.jit.unused, in which case a quick code change will make it possible for you to deploy everything without retraining. (In general most code and version changes will not require retraining.)
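
For reference, @torch.jit.unused marks a method that calls Python-only code (such as logging) so that torch.jit.script skips compiling its body; it only raises if scripted code actually calls the method at runtime. A small self-contained illustration (not nequip code) of the failure mode in the traceback above:

```python
import logging
import torch


class MyModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Guard so the Python-only helper is never hit in scripted code;
        # without the guard, a scripted call would raise at runtime.
        if not torch.jit.is_scripting():
            self._debug_stats(x)
        return x * 2.0

    @torch.jit.unused
    def _debug_stats(self, x: torch.Tensor) -> None:
        # logging.debug takes *args/**kwargs, which TorchScript cannot
        # compile (the NotSupportedError in the traceback above); the
        # decorator makes the compiler skip this method's body.
        logging.debug("mean: %s", x.mean().item())


scripted = torch.jit.script(MyModule())  # compiles without error
```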

utkarshp1161 (Author) commented Jun 7, 2023

> You could do inefficient MD by manually constructing a NequIPCalculator from an uncompiled PyTorch model (built using model_from_config and .load_state_dict, then passed to the constructor rather than created with from_deployed_model).

Do I need to modify the calculate function in "class NequIPCalculator(Calculator)" if I use an uncompiled PyTorch model?

Linux-cpp-lisp (Collaborator) commented

No, you shouldn't need to.

utkarshp1161 (Author) commented

> No, you shouldn't need to.

cool
