Errors #8

Closed

lorrp1 opened this issue Aug 20, 2020 · 14 comments


lorrp1 commented Aug 20, 2020

Hello, I'm running stallion.py but I'm getting a few problems:

Using gpus=1 in the trainer returns:

models/temporal_fusion_transformer/__init__.py", line 793, in _log_interpretation dim=0

Using only the CPU it works, but it returns this error:

AttributeError: module 'tensorflow._api.v1.io.gfile' has no attribute 'get_filesystem'

Apparently this is due to a pytorch/tensorboard incompatibility, but I managed to install pytorch-forecasting without problems, so could you tell me which versions of pytorch/tensorboard you are using?

Thanks


jdb78 commented Aug 20, 2020

This is a known tensorboard issue. Uninstall tensorflow and then install tensorboard 2.2 (I think 2.3 has an unfixed bug).
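
A minimal sketch of the suggested fix, run from a shell (the exact version pin below is an assumption; adjust as needed):

    pip uninstall -y tensorflow          # remove tensorflow so tensorboard stops reaching for its gfile API
    pip install "tensorboard>=2.2,<2.3"  # stay on the 2.2 series, as suggested above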

On GPU: I have not tested that recently. If you know of any free CI with GPUs, I would be very keen to learn about it.


lorrp1 commented Aug 22, 2020

I managed to get it working by reinstalling everything.

> On GPU: I have not tested that recently. If you know of any free CI with GPUs, I would be very keen to learn about it.

If by CI you mean cloud, then Colab should be fine.

I'm having another "problem": I can't find a way to increase the number of training batches per epoch, which is stuck at 5 (even though the error says 4):

ValueError: val_check_interval (200) must be less than or equal to the number of the training batches (4).

I have tried reading the pytorch_lightning documentation, but there is only min_epochs or a way to limit batches using limit_train_batches.

The problem is that the loss stops getting lower after a few epochs (too early compared to other models). I think it may be caused by the low number of training batches, and the results seem almost random every time because of this.


jdb78 commented Aug 22, 2020

I guess there are 4 training batches because the training data loader drops the last, incomplete one.

Probably, your data is just super tiny or your batch size is huge. You should be able to fix your early stopping problem by increasing the patience on the pytorch lightning early stopping callback. Anyway, deep learning normally excels with larger data.
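
A minimal sketch of both knobs, assuming the usual pytorch-forecasting setup from the tutorials (training/validation are placeholder TimeSeriesDataSet objects, and the exact Trainer arguments depend on your pytorch-lightning version):

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import EarlyStopping

    # smaller batches -> more training batches per epoch
    # (the training dataloader drops the last, incomplete batch)
    train_dataloader = training.to_dataloader(train=True, batch_size=32, num_workers=0)
    val_dataloader = validation.to_dataloader(train=False, batch_size=32, num_workers=0)

    # more patience so early stopping does not end training after a few flat epochs
    early_stop = EarlyStopping(monitor="val_loss", patience=10, mode="min")
    trainer = Trainer(max_epochs=100, callbacks=[early_stop])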

I updated the docs, and there are now some better tutorials available.

BTW: I will run stuff on the GPU tomorrow and will fix the remaining issues. By CI, I mean a continuous integration system such as CircleCI to test PRs automatically. I might eventually settle for self-hosted GPUs and just fork out the money, but I would prefer a free method. Colab is great but not made for that.


lorrp1 commented Aug 22, 2020

It's a very simple periodic function, so it shouldn't have problems, but the length is definitely very small. I'll try increasing the length of the data frame or reducing the batch_size itself.


lorrp1 commented Aug 23, 2020

Using a much smaller batch size returns a much better result.
I have seen the "fix gpu" change; I'm going to check soon whether it works now.


jdb78 commented Aug 25, 2020

Have you checked that the training works on GPU for you? I am always grateful for feedback!


lorrp1 commented Aug 26, 2020

I haven't had time yet, but I'll try as soon as possible.


lorrp1 commented Aug 27, 2020

Not working with the latest update from pip:

Epoch 16: 100%|██████████| 27/Traceback (most recent call last):8.498, v_num=55]
  File "/home//Documents//TimeSeries/Multivariate/pytorchForecasting/randomTIme/stallion2.py", line 113, in <module>
    preds, index = tft.predict(val_dataloader, return_index=True, fast_dev_run=True)
  File "/home//.local/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py", line 505, in predict
    out = self(x)  # raw output is dictionary
  File "/home//.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//.local/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py", line 447, in forward
    input_vectors[name] = emb(x_cat[..., self.hparams.x_categoricals.index(name)])
  File "/home//.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//.local/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home//.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1814, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select


jdb78 commented Aug 27, 2020

Thanks for the bug report - I have not run into this yet. Interesting that this seems to happen after 16 epochs. It's definitely time to employ GPUs for testing - I will give this a shot next week.


jdb78 commented Aug 27, 2020

@lorrp1 Are you sure you are on the newest version? Line 447 in the traceback seems to be line 449 on master (https://github.com/jdb78/pytorch-forecasting/blob/master/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py#L449)


lorrp1 commented Aug 27, 2020

by "last from pip" i meant to write pypi.
with the last from the master (0.2.4) same:


Epoch 48: 100%|███████████| 27Traceback (most recent call last):1.221, v_num=56]
  File "/home//Documents//TimeSeries/Multivariate/pytorchForecasting/randomTIme/stallion2.py", line 113, in <module>
    preds, index = tft.predict(val_dataloader, return_index=True, fast_dev_run=True)
  File "/home//.local/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py", line 541, in predict
    out = self(x)  # raw output is dictionary
  File "/home//.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//.local/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py", line 449, in forward
    input_vectors[name] = emb(x_cat[..., self.hparams.x_categoricals.index(name)])
  File "/home//.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//.local/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home//.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1814, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select

Also, it's pretty strange that the gpus=0 version ended at epoch 32 (without errors).


jdb78 commented Aug 27, 2020

I think finishing without errors is due to early stopping, which is expected behaviour.

On the other error, I believe the root cause is this:

  1. There is no issue with training, because you fail at predicting the validation dataset after training (this explains why the epoch is greater than 0).
  2. The model lives on the GPU and you try to predict with a dataloader that outputs data on the CPU. There is a very easy manual fix: just call tft.to("cpu") before calling its predict function (see the sketch after this list).
  3. There is a chance you can fix the error by installing the latest version of pytorch-lightning. I had to manually move the model to the GPU to reproduce your error.
  4. This is actually a good catch, because there might be use cases where you want to also predict on the GPU. I will add this in the next version of the package so data is moved to the GPU before being passed on to the model in case it lives on the GPU.
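
A minimal sketch of the manual workaround from point 2 (tft and val_dataloader are whatever model and dataloader you already use; the predict arguments match the traceback above):

    # after training on the GPU, move the model back to the CPU so it matches the
    # CPU tensors coming out of the dataloader, then predict as usual
    tft.to("cpu")
    preds, index = tft.predict(val_dataloader, return_index=True, fast_dev_run=True)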

Let me know if this fixes it for you.


jdb78 commented Aug 30, 2020

Fixed in #27


lorrp1 commented Sep 3, 2020

It works after adding tft.to("cpu") before .predict.

jdb78 closed this as completed Sep 4, 2020