1 parent 6b3f8d6 commit 232f10e
nlp_class/sentiment.py
@@ -41,8 +41,14 @@

# there are more positive reviews than negative reviews
# so let's take a random sample so we have balanced classes
-np.random.shuffle(positive_reviews)
-positive_reviews = positive_reviews[:len(negative_reviews)]
+# np.random.shuffle(positive_reviews)
+# positive_reviews = positive_reviews[:len(negative_reviews)]
+
+# we can also oversample the negative reviews
+diff = len(positive_reviews) - len(negative_reviews)
+idxs = np.random.choice(len(negative_reviews), size=diff)
+extra = [negative_reviews[i] for i in idxs]
+negative_reviews += extra

# first let's just try to tokenize the text using nltk's tokenizer
# let's take the first review for example:
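The change above swaps undersampling (shuffle the larger positive class and truncate it) for random oversampling (draw extra negative reviews with replacement until the classes match). The sketch below contrasts the two strategies on toy data. It is illustrative only: the function names `undersample` and `oversample`, the placeholder review strings, and the use of `np.random.default_rng` instead of the global `np.random` functions in the commit are all assumptions, not part of `sentiment.py`.

```python
import numpy as np


def undersample(majority, minority, rng):
    """Hypothetical helper: shuffle the majority class and truncate it to
    the minority size (the approach the commit comments out).
    Assumes len(majority) >= len(minority)."""
    majority = list(majority)
    rng.shuffle(majority)  # in-place shuffle of the copied list
    return majority[:len(minority)], list(minority)


def oversample(majority, minority, rng):
    """Hypothetical helper: append randomly chosen minority items until both
    classes have the same size (the approach the commit adds).
    Assumes len(majority) >= len(minority)."""
    diff = len(majority) - len(minority)
    # choice() samples with replacement by default, so an item can repeat
    idxs = rng.choice(len(minority), size=diff)
    extra = [minority[i] for i in idxs]
    return list(majority), list(minority) + extra


# Toy stand-ins for the review lists (made-up data)
rng = np.random.default_rng(0)
positive = [f"pos{i}" for i in range(10)]
negative = [f"neg{i}" for i in range(4)]

pos_u, neg_u = undersample(positive, negative, rng)
pos_o, neg_o = oversample(positive, negative, rng)
print(len(pos_u), len(neg_u))  # 4 4
print(len(pos_o), len(neg_o))  # 10 10
```

Undersampling throws away majority-class examples, while oversampling keeps all of them but adds exact duplicates of minority examples. If the data are split into train and test sets after oversampling, copies of the same review can land on both sides of the split, which tends to make test accuracy look better than it really is.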