This repository has been archived by the owner on Dec 29, 2022. It is now read-only.
I'm running the nmt_small.yml model on a cluster with 3 Titan Black GPUs (TF version 1.0.1), and I consistently run into issues around step 900. The problem actually seems to stem from an evaluation that occurs between steps 900 and 1000.
INFO:tensorflow:Starting evaluation at 2017-03-26-16:21:57
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN Black, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN Black, pci bus id: 0000:84:00.0)
W tensorflow/core/framework/op_kernel.cc:993] Out of range: Reached limit of 1
[[Node: parallel_read/filenames/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@parallel_read/filenames/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/filenames/limit_epochs/epochs)]]
W tensorflow/core/framework/op_kernel.cc:993] Out of range: Reached limit of 1
[[Node: parallel_read_1/filenames/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@parallel_read_1/filenames/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read_1/filenames/limit_epochs/epochs)]]
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: Tried to read from index 46 but array size is: 46
I am debugging this myself, but I wanted to report it in case it's an easy fix for someone more familiar with the code. I have attached a more complete error log in case it's helpful.
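The last warning in the log hints at an off-by-one read: index 46 is requested from an array of size 46, whose valid indices are 0 through 45. The sketch below is plain Python (not TensorFlow internals) and only illustrates that indexing invariant; the function name and error message mirror the log but are otherwise hypothetical.

```python
# Hypothetical sketch of the failure mode in the log: a reader asks for
# index 46 while the backing array holds exactly 46 elements (indices 0-45).

def read_record(records, index):
    """Return records[index], mirroring the bounds check in the log message."""
    if index >= len(records):
        raise IndexError(
            f"Tried to read from index {index} but array size is: {len(records)}"
        )
    return records[index]

records = list(range(46))          # an array of size 46, indices 0..45
assert read_record(records, 45) == 45  # last valid index succeeds

try:
    read_record(records, 46)       # one past the end, as in the log
except IndexError as exc:
    print(exc)  # -> Tried to read from index 46 but array size is: 46
```

If this is what's happening inside the evaluation input pipeline, the fix would be clamping or modulo-wrapping the read index rather than letting it run one past the final element.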
Attached log: train_stops_step_901.txt