
Adding StopWordsRemover #59

Closed
clayms opened this issue Dec 11, 2017 · 4 comments


clayms commented Dec 11, 2017

I want to add the pyspark.ml.feature StopWordsRemover as a class in the annotator.py file so I can use that function in the same pipeline as the other sparknlp functions.

I have tried the code below, but I get the error `TypeError: 'JavaPackage' object is not callable`.

What am I doing wrong?

```python
from pyspark.ml.feature import StopWordsRemover as sparkml_StopWordsRemover

stopwordList = sparkml_StopWordsRemover.loadDefaultStopWords("english")


class StopWordsRemover(AnnotatorTransformer):

    caseSensitive = Param(Params._dummy(),
                          "caseSensitive",
                          "whether to do a case sensitive comparison over the stop words",
                          typeConverter=TypeConverters.toBoolean)

    stopWords = Param(Params._dummy(),
                      "stopWords",
                      "The words to be filtered out",
                      typeConverter=TypeConverters.toListString)

    @keyword_only
    def __init__(self):
        super(StopWordsRemover, self).__init__()
        self._java_obj = self._new_java_obj("com.johnsnowlabs.nlp.annotators.StopWordsRemover", self.uid)
        self._setDefault(caseSensitive=False, stopWords=stopwordList)
        kwargs = self._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, caseSensitive=False, stopWords=stopwordList):
        kwargs = self._input_kwargs
        return self._set(**kwargs)

    def setCaseSensitive(self, value):
        return self._set(caseSensitive=value)

    def setStopWords(self, value):
        return self._set(stopWords=value)
```

clayms commented Dec 12, 2017

How do I add the pyspark.ml.feature StopWordsRemover to the spark-nlp_2.11-1.2.3.jar file?

aleksei-ai (Contributor) commented

@clayms As I can see, you have the line `self._java_obj = self._new_java_obj("com.johnsnowlabs.nlp.annotators.StopWordsRemover", self.uid)`, but there is no StopWordsRemover class in the package com.johnsnowlabs.nlp.annotators. I think that is why you get this exception.

I suggest adding the pyspark StopWordsRemover as the first stage of your pipeline.


clayms commented Dec 17, 2017

Thank you @aleksei-ai, I see that now. I had previously gotten the Spark ML StopWordsRemover working together with the Spark ML RegexTokenizer, but was having trouble getting them to work in the same pipeline as the John Snow Labs annotators. I ended up putting those Spark ML stages after the John Snow Labs annotator stages, and the end results are what I am after. Thank you.

maziyarpanahi (Member) commented

In the upcoming release of Spark NLP 2.3.0, we will have a native StopWordsCleaner annotator.
