Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix vectors data on GPU #7626

Merged
merged 4 commits into from
Apr 19, 2021
Merged

Fix vectors data on GPU #7626

merged 4 commits into from
Apr 19, 2021

Conversation

svlandeg
Copy link
Member

Description

vectors.data can be of type numpy.ndarray or cupy.ndarray, but recent GPU testing highlighted some trouble when using cupy. This PR makes the code more general by relying on current ops, and also avoids using the implicit conversion to scalar by numpy (which cupy doesn't do).

Tested locally, this seemed to fix the trouble from PR #7293 🤞

Types of change

bug fix

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@svlandeg svlandeg added bug Bugs and behaviour differing from documentation feat / vectors Feature: Word vectors and similarity gpu Using spaCy on GPU labels Mar 31, 2021
@svlandeg svlandeg mentioned this pull request Mar 31, 2021
3 tasks
@adrianeboyd
Copy link
Contributor

I can't seem to add this as a suggestion because it's too far from the existing edits, but after running the GPU tests with all the suggested changes I still run into problems with Vectors.add, I think this PR also needs something like this:

         if vector is not None:
+            xp = get_array_module(self.data)
+            vector = xp.asarray(vector)
             self.data[row] = vector

@svlandeg
Copy link
Member Author

svlandeg commented Apr 1, 2021

Hm, maybe? I tried to ensure that that wouldn't be necessary in the edits, but we can add it for robustness anyway.

@adrianeboyd
Copy link
Contributor

Maybe I did something wrong, but there are a number of tests that add numpy arrays as the new vector row, which leads to ValueError: non-scalar numpy.ndarray cannot be used for fill, like this:

def test_doc_api_has_vector():
vocab = Vocab()
vocab.reset_vectors(width=2)
vocab.set_vector("kitten", vector=numpy.asarray([0.0, 2.0], dtype="f"))
doc = Doc(vocab, words=["kitten"])
assert doc.has_vector

I don't think we want to require people using Vector.add to have to provide the new vector with the right ops?

@svlandeg
Copy link
Member Author

svlandeg commented Apr 1, 2021

Oh shoot, I thought I tracked down most of those tests and changed them to use current ops.
Anyway I already made the change you suggested - does that fix it?

spacy/vocab.pyx Outdated Show resolved Hide resolved
@honnibal honnibal merged commit 05bdbe2 into explosion:master Apr 19, 2021
@svlandeg svlandeg deleted the fix/vectors_gpu branch April 19, 2021 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / vectors Feature: Word vectors and similarity gpu Using spaCy on GPU
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants