copy_task.py sample fails. #45

chrispugmire · 2019-07-25T02:21:54Z

testing with command line:

python copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -nlayer 1 -nhlayer 2 -dropout 0 -mem_slot 32 -batch_size 1000 -optim adam -sequence_max_length 8 -iterations 100

I get multiple errors when it finishes, first on the generate_data call which has undefined parameters:

input_data, target_output, loss_weights = generate_data(random_length, input_size)

NameError: name 'input_size' is not defined

And then after fixing that I get:
output = output[:, -1, :].sum().data.cpu().numpy()[0]
IndexError: too many indices for array

It looks like that bit of code has never been exercised. I have tried to fix it, but I'm unclear on the solution for the second issue as I'm new to PyTorch. Thanks in advance for any fixes.

ChrisP.
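For the second error, a likely cause is that `.sum()` reduces the output to a single value, so `.numpy()` yields a 0-dimensional array, and indexing it with `[0]` fails. A minimal sketch that reproduces this with NumPy alone; the array shape and values are made up for illustration and are not taken from copy_task.py:

```python
import numpy as np

# Stand-in for the model output: (batch, time, features), made-up values.
output = np.ones((2, 5, 3))

# Summing everything leaves a single value; wrap it as a 0-d array,
# which is what .numpy() returns for a fully summed tensor.
total = np.asarray(output[:, -1, :].sum())
assert total.shape == ()

try:
    total[0]  # the same mistake as in the traceback above
except IndexError as err:
    print("IndexError:", err)

# .item() extracts the plain Python number from a 0-d array.
value = total.item()
print(value)
```

Dropping the trailing `[0]` and calling `.item()` instead should be enough for that line, assuming the sum is really what the test code wants.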

ixaxaar · 2019-07-25T06:01:14Z

Okay so if you're getting errors like:

WARNING:root:Setting up a new session...
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/usr/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 355, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/usr/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/usr/lib/python3.7/site-packages/urllib3/connection.py", line 183, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.7/site-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ff8f661e630>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff8f661e630>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/visdom/__init__.py", line 548, in _send
    data=json.dumps(msg),
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff8f661e630>: Failed to establish a new connection: [Errno 111] Connection refused'))
ERROR:visdom:[Errno 111] Connection refused

These occur because the visdom server has not been started.

Other than that, there is an exception at:

loss_value = loss.data[0]

  File "./copy_task.py", line 217, in <module>
    loss_value = loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

This is apparently something that changed in the PyTorch transition from 0.3 to 0.4. The fix is simply to use loss.item() instead of loss.data[0], as stated in the error.
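A minimal sketch of that change, assuming PyTorch 0.4 or later. The tensors here are made-up values for illustration, not the ones copy_task.py computes:

```python
import torch

# A reduced loss is a 0-dim tensor (made-up values for illustration).
prediction = torch.tensor([0.5, 1.5, 2.5])
target = torch.tensor([1.0, 1.0, 1.0])
loss = torch.nn.functional.mse_loss(prediction, target)

# The loss has no dimensions, so indexing it with [0] is what fails
# in the traceback above.
assert loss.dim() == 0

# .item() converts a one-element tensor to a plain Python number.
loss_value = loss.item()
print(type(loss_value).__name__, loss_value)
```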

ixaxaar · 2019-07-25T06:03:25Z

BTW if you're not interested in plotting, you can ignore visdom errors.
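If you do want plotting, one way to tell in advance whether the visdom errors will appear is to check that something is listening on the port from the traceback (localhost:8097). A small sketch using only the standard library; `visdom_reachable` is a hypothetical helper, not part of visdom or of this repository:

```python
import socket


def visdom_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts.
        return False


if __name__ == "__main__":
    if visdom_reachable():
        print("visdom server found; plotting should work")
    else:
        print("no server on localhost:8097; "
              "start one with `python -m visdom.server` or skip plotting")
```

This only checks that the port is open, not that the process behind it is actually visdom.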

fix #45

chrispugmire · 2019-07-25T23:40:45Z

Oh wow, I so appreciate that you took a look at this. Sorry, I was probably not clear enough. The problem only occurs when you add -iterations 100 so that the task finishes. It then runs the code that tests the model, and that code is faulty: input_size isn't defined, and the function doesn't return loss_weights.

python copy_task.py -cuda 0 -optim rmsprop -batch_size 32 -mem_slot 64 -iterations 100

Iteration 0/100
Traceback (most recent call last):
  File "copy_task.py", line 366, in <module>
    input_data, target_output, loss_weights = generate_data(random_length, input_size)
NameError: name 'input_size' is not defined

So I changed it as follows (I'm assuming the batch size should be 1 for a test):
input_data, target_output = generate_data(1, random_length, args.input_size)

But that fails thusly:

File "copy_task.py", line 371, in <module>
  output, (chx, mhx, rv) = rnn(input_data, (None, mhx, None), reset_experience=True, pass_through_memory=True)
File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
  result = self.forward(*input, **kwargs)
File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\dnc\dnc.py", line 235, in forward
  inputs = [T.cat([input[:, x, :], last_read], 1) for x in range(max_length)]
File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\dnc\dnc.py", line 235, in <listcomp>
  inputs = [T.cat([input[:, x, :], last_read], 1) for x in range(max_length)]
RuntimeError: Expected object of backend CPU but got backend CUDA for sequence element 1 in sequence argument at position #1 'tensors'

So I changed it like this (making batch_size the same as in training):
input_data, target_output = generate_data(args.batch_size, random_length, args.input_size)

Then it fails thusly:

Iteration 0/100
Traceback (most recent call last):
  File "copy_task.py", line 371, in <module>
    output, (chx, mhx, rv) = rnn(input_data, (None, mhx, None), reset_experience=True, pass_through_memory=True)
  File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\dnc\dnc.py", line 235, in forward
    inputs = [T.cat([input[:, x, :], last_read], 1) for x in range(max_length)]
  File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\dnc\dnc.py", line 235, in <listcomp>
    inputs = [T.cat([input[:, x, :], last_read], 1) for x in range(max_length)]
RuntimeError: Expected object of backend CPU but got backend CUDA for sequence element 1 in sequence argument at position #1 'tensors'

At which point I'm completely lost :-)

Actually, after a bit more poking: is that code just copied from the adding_task source and completely unrelated to copy_task? It just looks very wrong.

Thanks in advance for your time! I realize I don't know what I'm doing here :-)
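The RuntimeError above says that the first tensor passed to T.cat (a slice of the input) is on the CPU while the second (last_read) is on CUDA. That suggests the freshly generated input_data was never moved to the GPU the model lives on, though this is an inference from the message rather than something confirmed in the thread. A sketch of the usual remedy, using a stand-in `nn.LSTM` and made-up shapes instead of the real DNC:

```python
import torch
import torch.nn as nn

# Stand-in model (hypothetical sizes); the real script builds a DNC.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
rnn = nn.LSTM(input_size=6, hidden_size=8, batch_first=True).to(device)

# Tensors are created on the CPU unless told otherwise.
input_data = torch.zeros(4, 10, 6)

# Move the batch to wherever the model's parameters live
# before running the forward pass.
model_device = next(rnn.parameters()).device
input_data = input_data.to(model_device)

output, _ = rnn(input_data)
print(output.shape, output.device)
```

Reading the device from the model's parameters keeps the snippet working on both CPU-only and GPU machines.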

ixaxaar closed this as completed in 79dc405 Jul 25, 2019

ixaxaar added a commit that referenced this issue Jul 25, 2019

Merge pull request #46 from ixaxaar/fix_tasks

016b541

fix #45

ixaxaar mentioned this issue Aug 16, 2019

#45 fix copy task generalization code #47

Closed
