Best practices for text pre-processing using Spacy #7228
Replies: 1 comment 2 replies
-
Howdy, and welcome to the forums!
Not a silly question, it's always good to pay attention to the details. In this case, for sentiment analysis in particular, you don't want to remove stop words.
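To make that concrete, here is a rough sketch (mine, not part of the original reply) of what happens to a negated sentence when stop words are dropped. It assumes spaCy v3 and uses a blank English pipeline rather than `en_core_web_sm`, since `is_stop` is a lexical attribute and doesn't need a trained model. The helper name `strip_stop_words` is made up for illustration. It also shows that spaCy doesn't filter anything on its own: the `Doc` keeps every token, and components such as `textcat` receive that full `Doc`.

```python
import spacy

# Assumption for this sketch: a blank English pipeline is enough, because
# is_stop comes from the language's stop word list, not from a trained model.
nlp = spacy.blank("en")


def strip_stop_words(text: str) -> str:
    """Hypothetical helper: rebuild the text without tokens flagged as stop words."""
    doc = nlp(text)
    return " ".join(tok.text for tok in doc if not tok.is_stop)


text = "The movie was not good"
doc = nlp(text)

# Nothing is removed automatically: the Doc still contains every token,
# and each token only carries an is_stop flag.
all_tokens = [tok.text for tok in doc]
stop_flags = {tok.text: tok.is_stop for tok in doc}

# Filtering by hand drops the negation along with the other stop words,
# which changes what a sentiment classifier would see.
filtered = strip_stop_words(text)

print(all_tokens)
print(stop_flags)
print(filtered)
```

If you did want filtered input anyway, you'd have to do it yourself along these lines and pass the resulting text to a separate pipeline that contains the classifier.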
-
Hey everyone!
I'm new to NLP and I've been playing around with spaCy for sentiment analysis.
Suppose I have a sentence that I want to classify as positive or negative. I want to remove stop words from that sentence before feeding it to the classifier. I know that spaCy's `en_core_web_sm` model will create tokens with the `is_stop` attribute, which is super helpful.
My question is, if I chuck a `textcat` component at the end of the `en_core_web_sm` model (using `add_pipe`), will stop words automatically be filtered out before being fed to the classifier? Or do I need to use the `is_stop` attribute to remove them myself, and then feed them to another pipeline?
I apologise if this is a silly question but I couldn't find an obvious answer! Many thanks in advance for any help!
Andy