Persistent Class TypeError #8

Closed
arnicas opened this issue Jun 18, 2021 · 8 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@arnicas

arnicas commented Jun 18, 2021

With most documents longer than a few sentences (news articles, for example), I get a recurring error:

text = """
France retains its centuries-long status as a global centre of art, science, and philosophy, says Ljubomir Geric. He also notes it
hosts the world's fifth-largest number of UNESCO World Heritage Sites and is the leading tourist destination, 
receiving over 89 million foreign visitors in 2018. France is a developed country with the world's 
seventh-largest economy by nominal GDP, and the ninth-largest by PPP. In terms of aggregate household wealth, 
it ranks fourth in the world. France performs well in international rankings of education, health care, 
life expectancy, and human development. It remains a great power in global affairs, being one of the five 
permanent members of the United Nations Security Council (UNSC) and an official nuclear-weapon state. France is a 
founding and leading member of the European Union (EU) and the Eurozone, and a member of the Group of 7, North 
Atlantic Treaty Organization (NATO), Organisation for Economic Co-operation and Development (OECD), and the 
World Trade Organization (TWO).
"""
doc = nlp(text)

Unexpected error annotating document, skipping ....
<class 'TypeError'>
Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
  File "/opt/conda/envs/nlp/lib/python3.8/site-packages/coreferee/manager.py", line 110, in __call__
    self.annotator.annotate(doc)
  File "/opt/conda/envs/nlp/lib/python3.8/site-packages/coreferee/annotation.py", line 270, in annotate
    self.tendencies_analyzer.score(doc, self.keras_ensemble)
  File "/opt/conda/envs/nlp/lib/python3.8/site-packages/coreferee/tendencies.py", line 390, in score
    keras_inputs, scoring_necessary = self.prepare_keras_data([doc])
  File "/opt/conda/envs/nlp/lib/python3.8/site-packages/coreferee/tendencies.py", line 326, in prepare_keras_data
    self.get_vectors(potential_referred, doc)
  File "/opt/conda/envs/nlp/lib/python3.8/site-packages/coreferee/tendencies.py", line 263, in get_vectors
    this_object_vector = np.mean( np.array([t.vector for t in tokens]), axis=0)
  File "cupy/core/core.pyx", line 1188, in cupy.core.core.ndarray.__array__

My numpy is 1.19.5 and my tensorflow is 2.4.2. I wonder if this is what your issue about version constraints being too permissive was about? I'll try to replicate those more restricted installs.

@richardpaulhudson
Collaborator

richardpaulhudson commented Jun 18, 2021

Hi Lynn, thanks for your message. TensorFlow 2.4.2 was indeed released only a few days ago and seemed a likely culprit, so I upgraded to it, but unfortunately I was unable to reproduce the problem:


>>> import spacy
>>> nlp = spacy.load('en_core_web_trf')
>>> nlp.add_pipe('coreferee')
>>> doc = nlp("""
... France retains its centuries-long status as a global centre of art, science, and philosophy, says Ljubomir Geric. He also notes it
... hosts the world's fifth-largest number of UNESCO World Heritage Sites and is the leading tourist destination,
... receiving over 89 million foreign visitors in 2018. France is a developed country with the world's
... seventh-largest economy by nominal GDP, and the ninth-largest by PPP. In terms of aggregate household wealth,
... it ranks fourth in the world. France performs well in international rankings of education, health care,
... life expectancy, and human development. It remains a great power in global affairs, being one of the five
... permanent members of the United Nations Security Council (UNSC) and an official nuclear-weapon state. France is a
... founding and leading member of the European Union (EU) and the Eurozone, and a member of the Group of 7, North
... Atlantic Treaty Organization (NATO), Organisation for Economic Co-operation and Development (OECD), and the
... World Trade Organization (TWO).
... """)
>>> doc._.coref_chains.print()
0: France(1), its(3), it(27), France(59), France(100), It(120), France(154)
1: Geric(22), He(24)
2: world(66), world(98)
>>> import numpy
>>> print(numpy.__version__)
1.19.5
>>> import tensorflow
>>> print(tensorflow.__version__)
2.4.2

If you or anyone else can get to the bottom of the problem, I would be most grateful!

Best wishes

Richard

@arnicas arnicas changed the title from "Persisten Class TypeError" to "Persistent Class TypeError" Jun 18, 2021
@arnicas
Author

arnicas commented Jun 18, 2021

Hmm. I reproduced the error even with my environment pinned to the versions you used, so it must be something deeper in my environment. If I have time to look further, I will!

@arnicas
Author

arnicas commented Jun 18, 2021

Ah, can you tell me what version of cupy you are using?

@richardpaulhudson
Collaborator

I'm not using CuPy. I'm working on Windows with CUDA v11.3.109. There is also a nightly test build that runs on a Linux cloud server and works fine, but that server does not have any GPUs. So the problem seems to be specific to certain GPU architectures or drivers.
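For anyone comparing environments the way this exchange does, here is a minimal sketch that reports which of the relevant packages are installed and in which version. The helper names `installed_version` and `report_versions` are made up for illustration, and the package list is an assumption: CuPy wheels are usually published under CUDA-specific names (for example `cupy-cuda112`), so the plain name `cupy` may need adjusting for a given install.

```python
from importlib import metadata  # available in the standard library from Python 3.8


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(names):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    # Assumed list of packages relevant to this thread; adjust the CuPy name
    # to match the wheel actually installed (e.g. a CUDA-specific variant).
    packages = ["numpy", "tensorflow", "spacy", "coreferee", "cupy"]
    for name, version in report_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output from both a working and a failing machine makes it easier to see whether CuPy is present on one side only.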

@arnicas
Author

arnicas commented Jun 18, 2021

Ok, well, the internet is full of people asking about this: CuPy will not implicitly convert its GPU arrays to NumPy arrays, so the conversion needs explicit handling. See https://docs.cupy.dev/en/stable/user_guide/basic.html. I doubt I'll be the only one hitting this one... I can try to modify your code to get it to run locally if I have time :)
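A minimal sketch of the kind of explicit handling described above, applied to the averaging step that fails in the traceback. This is not Coreferee's code or its actual fix: `to_numpy` and `mean_vector` are hypothetical helpers, and the sketch assumes that a GPU-backed vector exposes a `.get()` method returning a NumPy array, which is what the error message suggests for CuPy arrays.

```python
import numpy as np


def to_numpy(array_like):
    """Return a NumPy array, copying from the GPU first when needed.

    Assumption: GPU arrays (such as CuPy's) expose a callable `.get()` that
    returns a host-side NumPy array; plain NumPy arrays have no such method.
    """
    get = getattr(array_like, "get", None)
    if callable(get):
        return np.asarray(get())
    return np.asarray(array_like)


def mean_vector(vectors):
    """Average equally shaped vectors after converting each one to NumPy."""
    return np.mean(np.stack([to_numpy(v) for v in vectors]), axis=0)
```

With helpers like these, the failing expression could be written along the lines of `mean_vector([t.vector for t in tokens])`. Copying every token vector off the GPU one at a time is simple but not fast, so this is better suited to a local workaround than to a production path.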

@richardpaulhudson
Collaborator

It would be great if you or anyone else could add CuPy compatibility as a feature. :-)

@richardpaulhudson richardpaulhudson added the enhancement and help wanted labels Jun 28, 2021
@richardpaulhudson
Collaborator

I will be releasing a new version of Coreferee within the next few weeks that uses Thinc instead of TensorFlow and which is fully tested to run on GPU as well as CPU.

@richardpaulhudson
Collaborator

This should all be resolved now with v1.2.0.
