This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

seg fault when running nose tests #43

Closed
orsonadams opened this issue Mar 14, 2017 · 14 comments · Fixed by #61

Comments

@orsonadams

Installed according to the guide on the contribution page.

I'm running:

Ubuntu 14.04
tensorflow 1.0.0
Python 3.5.1 (default, Mar 14 2017, 15:32:51)

@dennybritz
Contributor

Can you post the complete log?

Are other TensorFlow models running correctly? In my experience, segfaults are typically caused by GPU issues and/or insufficient memory.

@dennybritz
Contributor

dennybritz commented Mar 14, 2017

It could also be this TensorFlow bug that I ran into before: tensorflow/tensorflow#6968 (comment)

@orsonadams
Author

orsonadams commented Mar 14, 2017

Hey Denny, thanks. I am strongly inclined to believe that it's a memory error.

tensorflow: INFO: Creating AttentionLayerBahdanau in mode=eval
tensorflow: INFO: 
AttentionLayerBahdanau: {num_units: 10}

tensorflow: INFO: Creating AttentionDecoder in mode=eval
tensorflow: INFO: 
AttentionDecoder:
  max_decode_length: 100
  rnn_cell:
    cell_class: GRUCell
    cell_params: {num_units: 8}
    dropout_input_keep_prob: 1.0
    dropout_output_keep_prob: 1.0
    num_layers: 1
    residual_combiner: add
    residual_connections: false
    residual_dense: false

tensorflow: INFO: Creating ZeroBridge in mode=eval
tensorflow: INFO: 
ZeroBridge: {}

--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 135 tests in 109.019s

FAILED (SKIP=6, errors=17)
Segmentation fault (core dumped)

@dennybritz
Contributor

This may be a TensorFlow bug unrelated to this code. Can you try the suggestion here: tensorflow/tensorflow#6968 (comment)

@orsonadams
Author

Thanks! I'm not sure I needed that. Now I don't get the seg fault, but some of the tests are failing. I'm going to close this if I should expect failing tests.

--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 135 tests in 62.252s

FAILED (SKIP=6, errors=17)

@dennybritz
Contributor

You should not expect failing tests, since they are all passing on the CI server. Can you post an error message?

@orsonadams
Author

Dropped into the failing tests with nosetests -v --pdb

First error:

/seq2seq/seq2seq/test/models_test.py(154)test_infer()
pred_len = predictions_["predicted_ids"].shape[1]

(Pdb) print(predictions_.keys())
>>dict_keys(['features.source_ids', 'features.source_len', 'labels.target_ids', 'features.source_tokens', 'labels.target_len', 'labels.target_tokens'])

So "predicted_ids" isn't a key in the predictions_ dict.

The next error is similar to the first and the third error is below:

/seq2seq/seq2seq/test/models_test.py(141)test_train()
-> np.testing.assert_array_equal(predictions_["logits"].shape, [
(Pdb) print(predictions_["logits"])
>>KeyError: 'logits'
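
The manual pdb check above can be sketched as a small helper that reports which expected keys are absent. This is only an illustration: `missing_keys` is a hypothetical name that is not part of the seq2seq code base, and the dict below just mirrors the keys printed in the pdb session, with placeholder values.

```python
def missing_keys(predictions, expected):
    """Return the expected keys that are absent from the predictions dict."""
    return [key for key in expected if key not in predictions]


# Keys copied from the pdb output above; the values are placeholders.
predictions_ = dict.fromkeys([
    "features.source_ids", "features.source_len", "features.source_tokens",
    "labels.target_ids", "labels.target_len", "labels.target_tokens",
])

print(missing_keys(predictions_, ["predicted_ids", "logits"]))
# ['predicted_ids', 'logits']
```

Only the `features.*` and `labels.*` entries are present, so both keys that the tests index into are reported as missing.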

@dennybritz
Contributor

Hm, interesting. The tests are passing for me locally as well as on the CI server and on my Linux machine. Can you upload/paste the full log somewhere?

@orsonadams
Author

Here is a link to the logs on Dropbox.

@dennybritz
Contributor

Are you using anaconda by any chance?

@dennybritz
Contributor

It seems like multiple people are having this same issue, but I still haven't been able to reproduce this on any of my machines. I tried Python 3.5, 2.7, CPU and GPU. It must be related to the code here:

https://github.com/google/seq2seq/blob/master/seq2seq/models/seq2seq_model.py#L47

@dennybritz
Contributor

dennybritz commented Mar 15, 2017

Okay, I think I found it. It actually seems to be a Python bug in some versions: http://bugs.python.org/issue24931
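
As far as I understand the linked report, it concerns `_asdict()` misbehaving on classes that inherit from a namedtuple, which would be consistent with keys such as "predicted_ids" and "logits" going missing. A minimal sketch of one way to sidestep `_asdict()`, assuming the affected object is a namedtuple subclass: `ExampleOutput` and `namedtuple_to_dict` are hypothetical names for illustration, and this is not necessarily what the eventual fix does.

```python
from collections import OrderedDict, namedtuple


class ExampleOutput(namedtuple("ExampleOutput", ["logits", "predicted_ids"])):
    """Hypothetical namedtuple subclass, used only for illustration."""


def namedtuple_to_dict(value):
    """Build the field dict from _fields rather than calling _asdict()."""
    return OrderedDict(zip(value._fields, value))


output = ExampleOutput(logits=[0.1, 0.9], predicted_ids=[1])

# Pairing _fields with the tuple's values does not depend on _asdict().
print(list(namedtuple_to_dict(output).keys()))  # ['logits', 'predicted_ids']

# For comparison: on an affected interpreter this may come back without the fields.
print(list(output._asdict().keys()))
```

Comparing the two printed lists on a given interpreter is a quick way to check whether it shows the behavior described in the report.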

@dennybritz
Contributor

This should fix it, I hope: #61

Please reopen this issue if it doesn't.

@orsonadams
Author

orsonadams commented Mar 16, 2017

This helped! Thanks, now on to helping with those docs!
