contraction error when running mps-rnn repo #1595

mschuylermoss · 2023-09-28T14:38:02Z

I downloaded the mps-rnn repo and am trying to run it using the default settings. The only things I specify are:
python vmc.py --net_dim 1 --dtype "float32" --show_progress --cuda "" --run_name "test" --out_dir "./out"

I am using a conda environment with python=3.9.17, but I installed everything with pip rather than conda (I have read that conda can raise weird errors with NetKet). Other relevant dependencies are: jax==0.4.14, jaxlib==0.4.14, numpy==1.24.4, netket (custom branch), plum-dispatch==2.2.0

When I run the code, everything gets traced and the VMC begins to run, then breaks immediately: the progress bar appears and shows 0%, so it is the first VMC step that fails. The final error reads:
jax._src.traceback_util.UnfilteredStackTrace: ValueError: Einstein sum subscript 'a' does not contain the correct number of indices for operand 0.
The full stack trace is here:

Traceback (most recent call last):
  File "/Users/mschuylerm/Documents/Github/mps-rnn/vmc.py", line 330, in <module>
    main()
  File "/Users/mschuylerm/Documents/Github/mps-rnn/vmc.py", line 320, in main
    vmc.run(n_iter=args.max_step, out=logger, show_progress=args.show_progress)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/driver/abstract_variational_driver.py", line 256, in run
    for step in self.iter(n_iter, step_size):
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/driver/abstract_variational_driver.py", line 168, in iter
    self._forward_and_backward()
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/driver/vmc.py", line 130, in _forward_and_backward
    self._loss_stats, self._loss_grad = self.state.expect_and_grad(self._ham)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/vqs/mc/mc_state/state.py", line 599, in expect_and_grad
    return expect_and_grad(
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/plum/function.py", line 393, in __call__
    return _convert(method(*args, **kw_args), return_type)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/vqs/base.py", line 395, in expect_and_grad
    return expect_and_grad(
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/plum/function.py", line 393, in __call__
    return _convert(method(*args, **kw_args), return_type)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/plum/function.py", line 56, in f_renamed
    return f(*args, **kw_args)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/vqs/mc/mc_state/expect_grad_chunked.py", line 42, in expect_and_grad_nochunking
    return expect_and_grad(vstate, operator, use_covariance, *args, **kwargs)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/plum/function.py", line 393, in __call__
    return _convert(method(*args, **kw_args), return_type)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/plum/function.py", line 56, in f_renamed
    return f(*args, **kw_args)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/vqs/mc/mc_state/expect_grad.py", line 56, in expect_and_grad_covariance
    Ō, Ō_grad = expect_and_forces(vstate, Ô, mutable=mutable)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/plum/function.py", line 393, in __call__
    return _convert(method(*args, **kw_args), return_type)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/vqs/mc/mc_state/expect_forces.py", line 51, in expect_and_forces
    Ō, Ō_grad, new_model_state = forces_expect_hermitian(
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/vqs/mc/mc_state/expect_forces.py", line 84, in forces_expect_hermitian
    O_loc = local_value_kernel(
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/vqs/mc/kernels.py", line 71, in local_value_kernel_jax
    logpsi_σp = logpsi(pars, σp)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/utils/jax.py", line 76, in maybe_scalar_fun
    res = apply_fun(pars, xb, *args, **kwargs)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/netket/vqs/mc/mc_state/state.py", line 197, in <lambda>
    lambda model, pars, x, **kwargs: model.apply(pars, x, **kwargs),
  File "/Users/mschuylerm/Documents/Github/mps-rnn/models/mps.py", line 153, in __call__
    return _call(self, inputs)
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/plum/function.py", line 393, in __call__
    return _convert(method(*args, **kw_args), return_type)
  File "/Users/mschuylerm/Documents/Github/mps-rnn/models/mps.py", line 251, in _call_single
    (h, log_psi, _), _ = lax.scan(scan_func, (h, log_psi, counts), model.reorder_idx)
  File "/Users/mschuylerm/Documents/Github/mps-rnn/models/mps.py", line 235, in scan_func
    p_i, h, counts = _update_h_p_single(model, inputs, i, h, counts)
  File "/Users/mschuylerm/Documents/Github/mps-rnn/models/mps.py", line 185, in _update_h_p_single
    h = _get_new_h(model, h, i)
  File "/Users/mschuylerm/Documents/Github/mps-rnn/models/mps.py", line 163, in _get_new_h
    h = jnp.einsum("a,iab->ib", h, model.M[i])
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/jax/_src/numpy/lax_numpy.py", line 3250, in einsum
    operands, contractions = contract_path(
  File "/Users/mschuylerm/miniconda3/envs/mpsrnn/lib/python3.9/site-packages/opt_einsum/contract.py", line 228, in contract_path
    raise ValueError("Einstein sum subscript '{}' does not contain the "
ValueError: Einstein sum subscript 'a' does not contain the correct number of indices for operand 0.

I looked at the dimensions of the objects being contracted and made some very naive adjustments to see whether this was really a bug in the contractions, but that created a breadcrumb trail where one fix led to another contraction error somewhere else, and so on. This is the very first step of the whole repo, so I suspect there is something deeper going on here (and that the issue is not actually with these contractions).
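For readers hitting the same message, here is a minimal sketch of the error class only, not of the mps-rnn code itself. It assumes that `h` arrives with one more axis than the subscript `a` expects; the array sizes and the names `M_i`, `h_single`, and `h_batched` are hypothetical stand-ins for `model.M[i]` and `h`. The `jax.vmap` call at the end only shows how the ranks line up again. It is not the fix that was applied in this thread.

```python
import jax
import jax.numpy as jnp

# Hypothetical sizes, chosen only for illustration.
bond_dim, phys_dim, batch = 4, 2, 3

# M_i stands in for model.M[i], with indices (i, a, b).
M_i = jnp.ones((phys_dim, bond_dim, bond_dim))
h_single = jnp.ones((bond_dim,))         # rank 1, matches subscript "a"
h_batched = jnp.ones((batch, bond_dim))  # rank 2, one axis too many

# Works: operand 0 has exactly one axis for the single subscript "a".
out_single = jnp.einsum("a,iab->ib", h_single, M_i)

# Fails: operand 0 has two axes but only one subscript.
try:
    jnp.einsum("a,iab->ib", h_batched, M_i)
    rank_error = None
except ValueError as err:
    rank_error = err

# Mapping over the leading axis gives each call the rank the subscripts expect.
out_batched = jax.vmap(lambda h: jnp.einsum("a,iab->ib", h, M_i))(h_batched)

print(out_single.shape, out_batched.shape, type(rank_error).__name__)
```

If a check like this reproduces the message, the contraction string is probably fine and the extra axis is coming from further up the call chain.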

Something else that might be noteworthy is that before the VMC begins running, I get two user warnings that read:
UserWarning: `plum.Val` is deprecated and will be removed in a future version. Please use `typing.Literal` instead.
and
UserWarning: Explicitly requested dtype <class 'numpy.int64'> requested in astype is not available, and will be truncated to dtype int32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable.

PhilipVinc · 2023-09-28T16:18:14Z

Maintainer

NetKet has changed in a few ways since that time, notably in the ordering of the dimensions of the samples.

You should probably install the old version of NetKet from Dian's branch: pip install git+https://github.com/wdphy16/netket.git@jax_operators

(The plum warning is not important. The other warning appears because you did not load netket first, so you'll be running simulations in single precision rather than double.)
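As a minimal sketch of the second point: the warning text itself names the `jax_enable_x64` option, and the snippet below sets it explicitly at program start. This is an illustration, not what `vmc.py` does. According to the comment above, importing netket first is expected to have the same effect. The array names are hypothetical.

```python
import jax

# Assumption: this runs at program start, before any JAX arrays are created.
# Setting the JAX_ENABLE_X64 environment variable is the alternative named in the warning.
jax.config.update("jax_enable_x64", True)

import jax.numpy as jnp
import numpy as np

# With x64 enabled, 64-bit dtypes are kept instead of being truncated to 32 bits.
idx = jnp.arange(4).astype(np.int64)
x = jnp.ones(3, dtype=jnp.float64)

print(idx.dtype, x.dtype)
```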

wdphy16 · 2023-09-28T18:45:48Z

Collaborator

Hi @mschuylermoss, there have indeed been some changes in package versions over the past few months. I've updated the pinned versions in that repo, so you can now pull the repo and reinstall the dependencies using pip install -r requirements.txt.

If there are still problems, you can open an issue in that repo.
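After reinstalling, it can help to confirm which versions actually ended up in the environment before reporting anything. This is a small sketch using only the standard library. The package list is an assumption based on the dependencies mentioned in this thread, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages relevant to this thread; adjust as needed.
PACKAGES = ("jax", "jaxlib", "numpy", "netket", "plum-dispatch", "flax")


def installed_versions(names=PACKAGES):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing this output with requirements.txt shows quickly whether the pinned versions were picked up.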

@PhilipVinc The problem is that there is an unexpected dimension inside a very deep vmap + scan, and I guess it's related to the transpose of the samples. I think it's time to finish my RNN PR and do some thorough testing. Last year we were waiting for Flax's RNN API to stabilize. It is stable now, so I can refactor my code on top of it. I'll spend some time on this next week.

mschuylermoss · 2023-09-28T19:13:31Z

Author

Thank you both for the quick responses. Everything seems to be working for me now! If anything else comes up, I will address it in the original repo.

PhilipVinc · 2023-09-28T19:28:57Z

Maintainer

@wdphy16 yes, I had thought about that a while back but forgot to ping you about it :)
Ping me when you've updated your PR. If for any reason you need to bump the Flax version, feel free to do so.
