Finishing up Stanford Deprecation #2812
Comments
Hi, this is an impressive analysis! I'll share some context. At the time NLTK's CoreNLP REST bindings (…

However, to my surprise, NLTK's CoreNLP client has been used and has even received occasional PRs, which suggests that there is value in having a CoreNLP client that is deeply integrated with NLTK. In my opinion, it is this integration that could bring real benefit. For example, once you are familiar with NLTK's dependency graph API, it's easy to use CoreNLP to get dependencies.

Maybe, instead of having completely custom client code, it would be worth using Stanza to perform the API calls and handle customization. I'm not entirely happy with how customization is currently handled in NLTK's client, e.g. how default properties are handled. Ideally, Stanza should be used as much as possible, and NLTK's CoreNLP code would be a thin layer on top that provides a unified API and integration with other NLTK modules.

I would also consider extending NLTK's documentation, so there is no need to refer to PRs as documentation. Tests are another big topic, especially after the adoption of pytest. I should be able to find some time to help if needed.
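To make the "thin layer" idea a bit more concrete, here is a minimal sketch, assuming a pluggable backend (for example a Stanza client) that performs the actual request and returns CoreNLP-style JSON. All names (`AnnotationBackend`, `FakeBackend`, `ThinCoreNLPClient`) are hypothetical and not part of NLTK or Stanza; the fake backend just splits on whitespace so the example runs without a server. What it illustrates is one possible way to handle default properties: class defaults, then per-instance settings, then per-call overrides, with later layers winning.

```python
from typing import Any, Dict, List, Optional, Protocol


class AnnotationBackend(Protocol):
    """Hypothetical interface for whatever performs the HTTP call (e.g. a Stanza client)."""

    def annotate(self, text: str, properties: Dict[str, str]) -> Dict[str, Any]:
        ...


class FakeBackend:
    """Stand-in backend: splits on whitespace and mimics the shape of CoreNLP JSON output."""

    def __init__(self) -> None:
        self.last_properties: Dict[str, str] = {}

    def annotate(self, text: str, properties: Dict[str, str]) -> Dict[str, Any]:
        # Remember what was sent so the merge order can be inspected.
        self.last_properties = dict(properties)
        tokens = [{"word": word} for word in text.split()]
        return {"sentences": [{"tokens": tokens}]}


class ThinCoreNLPClient:
    """Hypothetical thin wrapper: owns default properties and NLTK-style return types only."""

    DEFAULT_PROPERTIES = {"annotators": "tokenize,ssplit", "outputFormat": "json"}

    def __init__(self, backend: AnnotationBackend,
                 properties: Optional[Dict[str, str]] = None) -> None:
        self.backend = backend
        # Instance-level settings are layered on top of the class defaults.
        self.properties = {**self.DEFAULT_PROPERTIES, **(properties or {})}

    def tokenize(self, text: str,
                 properties: Optional[Dict[str, str]] = None) -> List[str]:
        # Per-call overrides win over instance settings, which win over defaults.
        merged = {**self.properties, **(properties or {})}
        result = self.backend.annotate(text, merged)
        return [tok["word"] for sent in result["sentences"] for tok in sent["tokens"]]


if __name__ == "__main__":
    backend = FakeBackend()
    client = ThinCoreNLPClient(backend, {"tokenize.language": "en"})
    print(client.tokenize("This is a test", {"annotators": "tokenize"}))
    print(backend.last_properties)
```

Keeping the merge order in one place means a caller can override a default for a single request without mutating the client's own settings.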
Thank you for the context! Beyond that, I am interested in updating some documentation to add the table above; however, I'm not quite sure where it fits best. As for tests, I was looking into automatically downloading some of the third-party tools before running the CI tests. In fact, that is how I rediscovered all of these deprecated classes.
That would be super useful.
With the code above, I get the error … I executed the code under the folder … How can I launch the StanfordCoreNLPServer?
@ehsong the Stanford Segmenter is deprecated. Instead, you can use the CoreNLP classes:

```python
from nltk.parse.corenlp import CoreNLPServer, CoreNLPParser

# This context manager syntax starts the server when the scope is entered,
# and stops the server when the scope is exited
with CoreNLPServer() as server:
    # The parser must be used inside the block, while the server is running
    parser = CoreNLPParser(server.url)
    sentence = "This is my sentence, which I'd like to get parsed."
    tokenized = list(parser.tokenize(sentence))
    print(tokenized)
```

which outputs:
Note that this requires a … Alternatively, you can use another tokenizer if you wish (e.g. …). Hope that helps.
Hi, thank you so much for the quick response. I set the
It still throws the following error:
I would recommend adding the environment variable in Windows itself, not in Python. You can google the steps, but here's a short overview:
Good luck!
Hello!
As some of you might be aware, several Stanford-related classes were deprecated back in 2017. They are the following:
nltk.tag.StanfordTagger
nltk.tag.StanfordPOSTagger
nltk.tag.StanfordNERTagger
nltk.parse.GenericStanfordParser
nltk.parse.StanfordParser
nltk.parse.StanfordDependencyParser
nltk.parse.StanfordNeuralDependencyParser
nltk.tokenize.StanfordTokenizer
nltk.tokenize.StanfordSegmenter
These have been replaced by the following newer classes (see footnote 1):
nltk.parse.GenericCoreNLPParser
nltk.parse.CoreNLPParser
nltk.parse.CoreNLPDependencyParser
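The comparison table from this issue did not survive in this copy, so the mapping below is only a rough approximation built from the two lists above, the notes on CoreNLPDependencyParser, and the footnote. It is a hypothetical helper, not part of NLTK; the entries for the taggers, the tokenizer, and the segmenter are assumptions (based on CoreNLPParser's `tagtype` argument and its `tokenize` method).

```python
from typing import Dict, Optional

# Hypothetical migration map (not part of NLTK): deprecated Stanford class ->
# suggested CoreNLP-based replacement. None means "no equivalent exists".
REPLACEMENTS: Dict[str, Optional[str]] = {
    # Tagger entries are assumptions: CoreNLPParser can tag once given a tagtype.
    "nltk.tag.StanfordTagger": "nltk.parse.CoreNLPParser(tagtype=...).tag",
    "nltk.tag.StanfordPOSTagger": "nltk.parse.CoreNLPParser(tagtype='pos').tag",
    "nltk.tag.StanfordNERTagger": "nltk.parse.CoreNLPParser(tagtype='ner').tag",
    "nltk.parse.GenericStanfordParser": "nltk.parse.GenericCoreNLPParser",
    "nltk.parse.StanfordParser": "nltk.parse.CoreNLPParser",
    "nltk.parse.StanfordDependencyParser": "nltk.parse.CoreNLPDependencyParser",
    # Per the footnote, this one was never fully implemented.
    "nltk.parse.StanfordNeuralDependencyParser": None,
    # Tokenizer/segmenter entries are assumptions, following the tokenize() example.
    "nltk.tokenize.StanfordTokenizer": "nltk.parse.CoreNLPParser.tokenize",
    "nltk.tokenize.StanfordSegmenter": "nltk.parse.CoreNLPParser.tokenize",
}


def suggest_replacement(old_name: str) -> str:
    """Describe the suggested replacement for a deprecated class name."""
    if old_name not in REPLACEMENTS:
        raise KeyError(f"{old_name} is not one of the deprecated Stanford classes")
    new_name = REPLACEMENTS[old_name]
    if new_name is None:
        return f"{old_name}: no CoreNLP equivalent"
    return f"{old_name} -> {new_name}"


if __name__ == "__main__":
    for name in REPLACEMENTS:
        print(suggest_replacement(name))
```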
Note that each of these new classes relies on a running CoreNLPServer. One way to get it running is directly from the source using Java, as mentioned in #1735 (comment) by the author of most of these changes, @alvations. He used:

Note that newer versions of the stanford-corenlp package are available nowadays.
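However the server is started, the client classes only work once something is listening at their URL (NLTK's CoreNLPParser defaults to http://localhost:9000). Below is a small stdlib-only sketch for checking that before parsing; the function name is hypothetical, and it only confirms that the port answers HTTP, not that the process is CoreNLP or that its models are loaded.

```python
import urllib.error
import urllib.request


def corenlp_server_is_up(url: str = "http://localhost:9000", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url`, False if nothing is reachable."""
    # Bypass any configured HTTP proxy, since the server is expected to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a non-2xx status, so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False


if __name__ == "__main__":
    if not corenlp_server_is_up():
        print("No server answering at http://localhost:9000; start CoreNLP first.")
```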
Alternatively, the CoreNLPServer class can also be used to run the server in Python, though I haven't gotten that to work on Windows.

What now?
All of these Stanford classes have contained DeprecationWarnings since 2017, such as this one:
nltk/nltk/tokenize/stanford_segmenter.py
Lines 71 to 82 in d21646d
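The referenced snippet is not reproduced in this copy. As a generic illustration of the pattern (hypothetical class names, not NLTK's actual code), this is how a deprecated constructor typically emits a DeprecationWarning, and how a test can confirm the warning fires while the class still exists.

```python
import warnings


class OldStanfordThing:
    """Hypothetical stand-in for a deprecated class; not NLTK's actual code."""

    def __init__(self) -> None:
        warnings.warn(
            "OldStanfordThing is deprecated; use NewCoreNLPThing instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )


def construct_and_capture():
    """Instantiate the deprecated class and return the warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is hidden by default outside __main__, so opt in.
        warnings.simplefilter("always", DeprecationWarning)
        OldStanfordThing()
    return caught


if __name__ == "__main__":
    for w in construct_and_capture():
        print(w.category.__name__, w.message)
```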
Clearly, we need to make some changes here. We're on v3.6.3 now.
With this issue, I'd like to invite discussion on the following options (among others):
Personally, I'm leaning towards either 1 or 2.
However, before simply removing potentially frequently used code, I went over each of the deprecated classes to check whether they indeed have newer equivalents, and to collect this information for the documentation.
Stanford updating reference
The following table lists the deprecated classes with their main methods, alongside the equivalent newer classes and methods. Each line in the left column corresponds to the same line in the right column.
Notes
StanfordDependencyParser used to have the same methods as StanfordParser. Nowadays, you should use CoreNLPDependencyParser instead, which has the same methods as CoreNLPParser.

My goal with this issue is to reach a consensus on how to move forward, and then create a PR with the agreed-upon changes, so feel free to share your opinion.
Footnotes
1: StanfordNeuralDependencyParser was never fully implemented, and as a result does not exist in the newer CoreNLP... format.