Unexpected (almost random) parses and no sentence segmentation when using GPU on Windows #4724
Comments
Hey,

So I'm not 100% sure what the problem is here, or actually which part you're identifying as an issue. You've raised 2x2x2 factors in an unhelpful way: parse quality vs. speed, spaCy vs. StanfordNLP, …

Running those sentences with the …

Finally, it's hard not to bristle at the title of your issue. Like, c'mon!
@honnibal Ho there, no need to take offence. I use "abysmal" here to highlight how unexpected this is, and to show that something seems off. Perhaps that was not the right word to choose. My apologies.

Have you taken a look at the examples that I give? They are abysmal; there is no other word for it. They are not "just a selection" either; the whole output looks like this. I'm sure that if you look at it objectively you cannot but agree. And I am saying that because I have never seen spaCy behave in this way. That being said, this is the first time I'm using …

I am raising this issue not because I think spaCy is bad, of course. I have shown time and time again that I love it and use it almost daily. I raise this issue because in this case something seems wrong. Just look at the examples: those are not reasonable at all, and I think something must be wrong, but I can't pinpoint what.

Again, please don't feel offended, as that was not at all the goal of my post. I appreciate all the work you guys have done (as I've said already numerous times). I call it abysmal because this is not the output that spaCy would normally give, and something seems wrong, especially if you also take the failure of segmentation into consideration.

I hope we're good. Please re-open. I'll improve my OP.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Following a discussion on Twitter about spaCy vs. stanfordnlp vs. spacy-stanfordnlp, I figured I'd do a speed comparison. The task is putting 1K lines through the whole pipeline. The test includes `nlp()` as well as `nlp.pipe()`. The testing scripts and data are available in this test repo: https://github.com/BramVanroy/parsers-test
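The repo's actual scripts aren't reproduced here, but the shape of such a timing comparison can be sketched as follows. `compare_throughput`, `process_one`, and `process_batch` are hypothetical names: stand-ins for calling `nlp()` once per line versus making a single `nlp.pipe()` pass.

```python
import time

def compare_throughput(process_one, process_batch, texts):
    """Time per-line calls against a single batched pass over the same texts."""
    t0 = time.perf_counter()
    for text in texts:
        process_one(text)           # analogous to: doc = nlp(text)
    per_line = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in process_batch(texts):  # analogous to: for doc in nlp.pipe(texts)
        pass
    batched = time.perf_counter() - t0
    return per_line, batched

# Toy stand-ins just to show the harness running; with a real pipeline,
# the batched variant is usually the faster one.
per_line, batched = compare_throughput(
    lambda t: t.upper(),
    lambda ts: (t.upper() for t in ts),
    ["a line of text"] * 1000,
)
```

The point of batching is amortizing per-call overhead (and, on GPU, filling the device with larger matrix multiplications), which is why `nlp.pipe()` is the recommended way to process many short texts.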
I was extremely surprised by the big differences in speed (spaCy being much faster than stanfordnlp), so I went digging. I found that for the whole input passed to `nlp()`, only one `doc.sents` sentence was created. In other words, no sentence segmentation was done. In addition, I saw extremely bad parsing results from spaCy. An example from using `readlines()` + `nlp.pipe()`:

This is not how I've seen spaCy behave in the past (non-GPU version). The parses seem almost random, as if the model is not initialized correctly. Also note how basically all dependencies are `dep`. Again, the posted repo above may provide some insight.

After more digging:

- Tried `cupy-cuda92` and `spacy[cuda92]`. Unfortunately, this leads to the same result.
- Tried `spacy[cuda92]==2.1.8` to check whether the issue also arises in spaCy 2.1. Even though installation works, I get an error propagating from thinc (dependency issue?). Possibly related: module 'thinc_gpu_ops' has no attribute 'hash' (thinc#79)

Info about spaCy
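One quick sanity check when debugging GPU setups like this (a sketch, assuming spaCy 2.x with a `cupy` build matching the local CUDA version) is whether spaCy actually managed to allocate the GPU at all; `spacy.prefer_gpu()` returns `True` only if it did.

```python
# Guarded so it still runs on machines without spacy installed;
# gpu_active stays None in that case.
try:
    import spacy
    gpu_active = spacy.prefer_gpu()  # True if a GPU was allocated, else False
except ImportError:
    gpu_active = None

print("GPU active:", gpu_active)
```

A `False` here with `spacy[cuda92]` installed would point at a cupy/CUDA mismatch rather than at the model itself.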
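The two symptoms reported in this issue (a single giant "sentence" and nearly every arc labelled `dep`) are easy to quantify. A minimal sketch, assuming `sent_starts` and `deps` were collected from a parsed `Doc` as `[t.is_sent_start for t in doc]` and `[t.dep_ for t in doc]`; `parse_diagnostics` is a hypothetical helper, not part of the test repo.

```python
def parse_diagnostics(sent_starts, deps):
    """Return (sentence count, fraction of tokens with the fallback 'dep' label)."""
    n_sents = sum(1 for s in sent_starts if s)
    frac_dep = sum(1 for d in deps if d == "dep") / len(deps)
    return n_sents, frac_dep

# A healthy two-sentence parse: two sentence starts, no 'dep' fallbacks.
n_sents, frac_dep = parse_diagnostics(
    [True, False, False, True, False],
    ["nsubj", "ROOT", "punct", "nsubj", "ROOT"],
)
# A broken parse like the one described would instead give n_sents == 1
# and frac_dep close to 1.0.
```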