copy_task.py sample fails. #45

Closed
chrispugmire opened this issue Jul 25, 2019 · 3 comments

@chrispugmire

testing with command line:

python copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -nlayer 1 -nhlayer 2 -dropout 0 -mem_slot 32 -batch_size 1000 -optim adam -sequence_max_length 8 -iterations 100

I get multiple errors when it finishes, first on the generate_data call which has undefined parameters:

input_data, target_output, loss_weights = generate_data(random_length, input_size)

NameError: name 'input_size' is not defined

And then after fixing that I get:
output = output[:, -1, :].sum().data.cpu().numpy()[0]
IndexError: too many indices for array

It looks like that bit of code has never been run. I have tried to fix it, but I'm unclear on the solution for the second issue as I'm new to PyTorch. Thanks in advance for any fixes.

ChrisP.

@ixaxaar
Owner

ixaxaar commented Jul 25, 2019

Okay so if you're getting errors like:

WARNING:root:Setting up a new session...
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/usr/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 355, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/usr/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/usr/lib/python3.7/site-packages/urllib3/connection.py", line 183, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.7/site-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ff8f661e630>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3.7/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.7/site-packages/urllib3/util/retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff8f661e630>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/visdom/__init__.py", line 548, in _send
    data=json.dumps(msg),
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff8f661e630>: Failed to establish a new connection: [Errno 111] Connection refused'))
ERROR:visdom:[Errno 111] Connection refused

These are caused by the visdom server not having been started.

Other than that, there is an exception at:

loss_value = loss.data[0]
  File "./copy_task.py", line 217, in <module>
    loss_value = loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

This is apparently something that changed in PyTorch's transition from 0.3 to 0.4. The fix is simply to use loss.item() instead of loss.data[0], as stated in the error.
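
For scripts that have to run on both sides of that change, a small wrapper is one option. This is only a sketch: `to_python_number` is a hypothetical helper, not something in copy_task.py or PyTorch, and the fallback branch assumes the old 0.3 behaviour where `loss.data` was a 1-element tensor.

```python
def to_python_number(value):
    """Return a plain Python number from a scalar tensor-like object.

    Hypothetical helper for illustration; not part of copy_task.py or PyTorch.
    """
    if hasattr(value, "item"):
        # PyTorch >= 0.4 tensors (and NumPy arrays) expose .item(), which
        # works on 0-dim values where indexing with [0] raises IndexError.
        return value.item()
    # Fallback for PyTorch <= 0.3, where loss.data held a 1-element tensor.
    return value.data[0]
```

In copy_task.py this would read `loss_value = to_python_number(loss)`. The `IndexError: too many indices for array` reported at the top of the thread has the same shape: `.sum()` yields a 0-dim value, so `.numpy()[0]` cannot index it, and `output[:, -1, :].sum().item()` should sidestep that.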

@ixaxaar
Owner

ixaxaar commented Jul 25, 2019

BTW if you're not interested in plotting, you can ignore visdom errors.
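
For anyone who does want the plots, a quick way to tell whether anything is listening on the port from the traceback (localhost:8097) is a plain TCP check before the run. This is a minimal sketch using only the standard library; `visdom_reachable` is a made-up name, and the check only shows that some server accepts connections on that port, not that it is actually visdom.

```python
import socket


def visdom_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (nothing listening) and timeouts.
        return False


if not visdom_reachable():
    print("Nothing is listening on localhost:8097; "
          "start the server with `python -m visdom.server` "
          "or ignore the plotting errors.")
```

Starting the server in a separate terminal with `python -m visdom.server` before launching copy_task.py should make the connection errors go away.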

ixaxaar added a commit that referenced this issue Jul 25, 2019
@chrispugmire
Author

chrispugmire commented Jul 25, 2019

Oh wow, I really appreciate that you took a look at this. Sorry, I was probably not clear enough: the problem only occurs when you add -iterations 100 so that the task finishes. It then runs the code that tests the model, and that code is faulty: input_size isn't defined, and the function doesn't return loss_weights.

python copy_task.py -cuda 0 -optim rmsprop -batch_size 32 -mem_slot 64 -iterations 100

Iteration 0/100
Traceback (most recent call last):
  File "copy_task.py", line 366, in <module>
    input_data, target_output, loss_weights = generate_data(random_length, input_size)
NameError: name 'input_size' is not defined

So I changed as follows: (I'm assuming batch size should be 1 for a test)
input_data, target_output = generate_data(1,random_length, args.input_size)

But that fails thusly:

File "copy_task.py", line 371, in <module>
  output, (chx, mhx, rv) = rnn(input_data, (None, mhx, None), reset_experience=True, pass_through_memory=True)
File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
  result = self.forward(*input, **kwargs)
File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\dnc\dnc.py", line 235, in forward
  inputs = [T.cat([input[:, x, :], last_read], 1) for x in range(max_length)]
File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\dnc\dnc.py", line 235, in <listcomp>
  inputs = [T.cat([input[:, x, :], last_read], 1) for x in range(max_length)]
RuntimeError: Expected object of backend CPU but got backend CUDA for sequence element 1 in sequence argument at position #1 'tensors'

So changed like this: (making batch_size the same as the training)
input_data, target_output = generate_data(args.batch_size,random_length, args.input_size)

Then it fails thusly:

Iteration 0/100
Traceback (most recent call last):
  File "copy_task.py", line 371, in <module>
    output, (chx, mhx, rv) = rnn(input_data, (None, mhx, None), reset_experience=True, pass_through_memory=True)
  File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\dnc\dnc.py", line 235, in forward
    inputs = [T.cat([input[:, x, :], last_read], 1) for x in range(max_length)]
  File "C:\Users\chrisp\Anaconda3\envs\fastai\lib\site-packages\dnc\dnc.py", line 235, in <listcomp>
    inputs = [T.cat([input[:, x, :], last_read], 1) for x in range(max_length)]
RuntimeError: Expected object of backend CPU but got backend CUDA for sequence element 1 in sequence argument at position #1 'tensors'

At which point I'm completely lost :-)

Actually, after a bit more poking: is that code just copied from the adding_task source and completely unrelated to copy_task? It just looks very wrong.

Thanks in advance for your time! I realize I don't know what I'm doing here :-)
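
The `Expected object of backend CPU but got backend CUDA for sequence element 1` message above suggests that the first tensor passed to `T.cat` (the slice of `input_data`) is on the CPU while the second (`last_read`) is on the GPU, which is what you would see if the test data is generated on the CPU and the model was moved to CUDA with `-cuda 0`. That is an assumption about the cause, not a confirmed diagnosis. Under that assumption, moving the input to the same device before calling the model is the usual remedy. The tensors below are stand-ins with made-up shapes, not the real DNC state:

```python
import torch

# Use the GPU only if one is available, so the sketch also runs on CPU-only machines.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in for the model-side tensor from the traceback; it lives with the model.
last_read = torch.zeros(4, 8, device=device)

# Stand-in for freshly generated test data: (batch, time, features), created on the CPU.
input_data = torch.rand(4, 5, 6)

# Move the input to the device of the tensor it will be concatenated with.
input_data = input_data.to(last_read.device)

# The same kind of concatenation that failed in dnc.py now sees a single device.
step = torch.cat([input_data[:, 0, :], last_read], 1)
print(step.shape, step.device)
```

In copy_task.py the equivalent would be to move `input_data` and `target_output` to the GPU right after `generate_data(...)` returns and before calling `rnn(...)`.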
