Errors #8
Hello, I'm running stallion.py but I'm getting a few problems:

Using gpus=1 in the trainer returns:

[traceback not captured]

Using only the CPU, it works but returns this error:

[traceback not captured]

Thanks
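For reference, a minimal sketch of the trainer flag being toggled here (PyTorch Lightning argument names as of this thread's era; `model` and the dataloaders are placeholder names, not taken from the thread):

```python
import pytorch_lightning as pl

# gpus=1 trains on one GPU; gpus=0 (or omitting the argument) stays
# on the CPU. Recent Lightning versions spell this
# accelerator="gpu", devices=1 instead.
trainer = pl.Trainer(max_epochs=30, gpus=1)
trainer.fit(model, train_dataloader, val_dataloader)  # placeholder names
```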
This is a known tensorboard issue. Uninstall tensorflow and then install tensorboard 2.2 (I think 2.3 has an unfixed bug). On GPU: I have not tested that recently. If you know of any free CI with GPUs, I would be very keen to learn about it.
I managed to get it working by reinstalling everything.
If by CI you mean cloud, then Colab should be fine. I'm having another "problem": I can't find a way to increase the number of training batches per epoch, which is stuck at 5 (even though the error says 4).
I have tried reading the pytorch_lightning documentation, but there is only min_epochs, or a way to limit batches via limit_train_batches. The problem is that the loss stops decreasing after a few epochs (too early compared to other models), and I think this may be caused by the number of training batches being too low; the results seem almost random every run because of this.
I guess there are 4 training batches because the training data loader drops the last, incomplete one. Probably your data is just super tiny or your batch size is huge. You should be able to fix your early stopping problem by increasing the patience on the PyTorch Lightning early stopping callback. Anyway, DL normally excels at larger data. I updated the docs, and there are now some better tutorials available. BTW: I will run stuff on the GPU tomorrow and fix the remaining issues. By CI, I mean a continuous integration system such as CircleCI to test PRs automatically. I might eventually settle for self-hosted GPUs and just fork out the money, but I would prefer a free option. Colab is great but not made for that.
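A minimal sketch of raising that patience (the `val_loss` monitor key and the numbers are assumptions, not taken from this thread; older Lightning versions pass the callback via `early_stop_callback=` instead of `callbacks=`):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# Wait for 10 validation runs without improvement before stopping
# (the default patience of 3 can trigger far too early on noisy losses).
early_stop = EarlyStopping(monitor="val_loss", patience=10, mode="min")

trainer = pl.Trainer(
    max_epochs=100,
    callbacks=[early_stop],  # older Lightning: early_stop_callback=early_stop
)
```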
It's a very simple periodic function, so it shouldn't be a problem, but the length is definitely very small. I'll try increasing the length of the data frame or reducing the batch_size itself.
Using a much smaller batch size returns a much better result.
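As a minimal sketch of the batch arithmetic discussed above (the 18-sample dataset is an invented stand-in): with `drop_last=True` the loader yields `len(dataset) // batch_size` batches, so shrinking the batch size multiplies the number of gradient steps per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(18, 1))  # 18 samples (assumed size)

# The training loader drops the last incomplete batch,
# so 18 samples at batch_size=4 yield only 4 batches per epoch.
train_loader = DataLoader(dataset, batch_size=4, drop_last=True)
print(len(train_loader))  # 4

# A smaller batch size gives more (and more varied) gradient steps.
small_loader = DataLoader(dataset, batch_size=2, drop_last=True)
print(len(small_loader))  # 9
```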
Have you checked that the training works on GPU for you? I am always grateful for feedback!
I haven't had time yet.
Not working with the last update from pip:

[traceback not captured]
Thanks for the bug report - I have not run into this yet. Interesting that this seems to happen after 16 epochs. It's definitely time to employ GPUs for testing - I will give this a shot next week.
@lorrp1 Are you sure you are on the newest version? Line 447, which the traceback cites, seems to be line 449 on master (https://github.com/jdb78/pytorch-forecasting/blob/master/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py#L449)
By "last from pip" I meant PyPI.
Also, it's pretty strange that the gpus=0 run ended at epoch 32 (without errors).
I think the finishing without errors is due to early stopping, which is expected behaviour. On the other error, I believe the root cause is this:

[referenced code not captured]

Let me know if this fixes it for you.
Fixed in #27 |
It works after adding `tft.to("cpu")` before `.predict`.
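A minimal sketch of that workaround (`tft` and `val_dataloader` are placeholder names for your fitted TemporalFusionTransformer and validation dataloader; that this sidesteps a device mismatch is my reading of the thread, not confirmed):

```python
# Workaround from this thread: move the fitted model to the CPU
# before predicting, so the model weights and the prediction inputs
# end up on the same device. `tft` and `val_dataloader` are placeholders.
tft = tft.to("cpu")
predictions = tft.predict(val_dataloader)
```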