Failed copying input tensor #20

Closed
csgomezg0 opened this issue Sep 13, 2021 · 4 comments

csgomezg0 commented Sep 13, 2021

Hi, I have Spanish-language data in *.conll files. I converted this data from *.conll to *.ann format and, after changing the rules for my language, tried to train coreferee with it. There are approximately 3,000 files, but when I try to train the model I get an error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/creangel/info/image/coreferee/__main__.py", line 51, in <module>
    TrainingManager(
  File "/home/creangel/info/image/coreferee/training/train.py", line 409, in train_models
    self.train_model(config_entry_name, config_entry, temp_log_file)
  File "/home/creangel/info/image/coreferee/training/train.py", line 378, in train_model
    keras_ensemble = self.generate_keras_ensemble(
  File "/home/creangel/info/image/coreferee/training/train.py", line 219, in generate_keras_ensemble
    keras_history = model_generator.train_keras_model(training_docs, tendencies_analyzer,
  File "/home/creangel/info/image/coreferee/training/model.py", line 288, in train_keras_model
    #print('keras_inputs: ',keras_inputs)
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1134, in fit
    data_handler = data_adapter.get_data_handler(
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/data_adapter.py", line 1383, in get_data_handler
    return DataHandler(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/data_adapter.py", line 1138, in __init__
    self._adapter = adapter_cls(
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/data_adapter.py", line 230, in __init__
    x, y, sample_weights = _process_tensorlike((x, y, sample_weights))
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/data_adapter.py", line 1031, in _process_tensorlike
    inputs = tf.nest.map_structure(_convert_numpy_and_scipy, inputs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/nest.py", line 869, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/nest.py", line 869, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/data_adapter.py", line 1026, in _convert_numpy_and_scipy
    return tf.convert_to_tensor(x, dtype=dtype)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 1430, in convert_to_tensor_v2_with_dispatch
    return convert_to_tensor_v2(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 1436, in convert_to_tensor_v2
    return convert_to_tensor(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 1566, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/constant_op.py", line 271, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/constant_op.py", line 283, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/constant_op.py", line 308, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/constant_op.py", line 106, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)

tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

I think the problem is memory. I have an 8 GB GPU.

Note: when I try with 100 files from my data, coreferee training works very well, but with 200 or 3,000 files I get the error.

Does anyone know about this error and what the solution is? Thanks.
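
The "Dst tensor is not initialized" message is commonly reported when the GPU runs out of memory while a large array is copied over in one piece. If that is the cause here, one generic workaround is to hand the data to Keras in batches so that only one batch at a time is copied to the GPU. This is a minimal sketch under that assumption, not something Coreferee's training code is known to do; `iter_batches` and the toy data are hypothetical names used only for illustration.

```python
from typing import Iterator, Sequence, Tuple


def iter_batches(inputs: Sequence, targets: Sequence,
                 batch_size: int) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield (inputs, targets) slices of at most batch_size samples each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size
        # Slicing works for Python lists and for NumPy arrays alike.
        yield inputs[start:end], targets[start:end]


# Toy data standing in for the real training arrays (hypothetical values).
toy_inputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
toy_targets = [0, 1, 0, 1, 1]

batches = list(iter_batches(toy_inputs, toy_targets, batch_size=2))
print(len(batches))  # 3 batches: sizes 2, 2 and 1
```

Keras `Model.fit` accepts a generator that yields `(inputs, targets)` tuples, so a generator like this could in principle replace the full arrays. Training for several epochs would need the generator to be recreated or made to loop.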

@richardpaulhudson
Collaborator

Hi Carlos,

from the output it looks as though you are not using the latest version of Coreferee. Please update your Python version to 3.9 and your spaCy and coreferee packages as well as the spaCy and coreferee models, then try again and let me know if the problem still occurs.

Best wishes,

Richard

@csgomezg0
Author

I upgraded spaCy to version 3.1.0 and upgraded coreferee as well. I have Python 3.8.1 and can't upgrade Python in the container because I don't have admin access. The error persists. I would like to know the specifications (GPU) of the machine you use to train coreferee, and how many files you use for training.

@richardpaulhudson
Collaborator

Hi Carlos, unfortunately only Python 3.9 is supported and there is no way the latest version of Coreferee will work with Python 3.8. I believe the problem you are having with tensorflow is specific to the previous version of Coreferee. Perhaps you can download Python 3.9 and install it as a local user?

There are no specific hardware requirements for training: the more hardware you have, the quicker training will be. Equally, the more training examples you have the better, although there is no specific minimum.
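
Before retrying, it can help to confirm which interpreter and package versions the training environment actually uses. The sketch below assumes the requirement stated above (Python 3.9 only, for the Coreferee release current at the time of this thread). The helper names `python_is_supported` and `installed_versions` are hypothetical; the package names are the PyPI distribution names.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter is Python 3.9 (assumption from this thread)."""
    return tuple(version_info[:2]) == (3, 9)


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python", sys.version.split()[0], "supported:", python_is_supported())
print(installed_versions(["spacy", "coreferee", "tensorflow"]))
```

Running this with the same interpreter that launches the training (for example inside the container) shows whether the upgrade actually took effect there.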

@csgomezg0
Author

Thanks, I upgraded Python (to 3.9) and TensorFlow (to 2.5.0), and training now works. How many sentences were used to train the English model? I know that your English model was trained on 393,564 words, but how many sentences does that represent?
