Commit 232f10e

oversample
1 parent 6b3f8d6 commit 232f10e

1 file changed: +8 -2 lines changed

nlp_class/sentiment.py

Lines changed: 8 additions & 2 deletions
@@ -41,8 +41,14 @@
 
 # there are more positive reviews than negative reviews
 # so let's take a random sample so we have balanced classes
-np.random.shuffle(positive_reviews)
-positive_reviews = positive_reviews[:len(negative_reviews)]
+# np.random.shuffle(positive_reviews)
+# positive_reviews = positive_reviews[:len(negative_reviews)]
+
+# we can also oversample the negative reviews
+diff = len(positive_reviews) - len(negative_reviews)
+idxs = np.random.choice(len(negative_reviews), size=diff)
+extra = [negative_reviews[i] for i in idxs]
+negative_reviews += extra
 
 # first let's just try to tokenize the text using nltk's tokenizer
 # let's take the first review for example:
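
Note on the hunk above: the added lines balance the classes by drawing random indices into negative_reviews (np.random.choice samples with replacement by default) and appending the selected duplicates in place. The following is a minimal standalone sketch of that same idea, not code from this commit. It assumes plain Python lists as stand-ins for the parsed reviews and swaps in the standard library's random.choices for np.random.choice. The helper name oversample and its seed parameter are hypothetical.

```python
import random


def oversample(minority, target_size, seed=None):
    """Return a copy of `minority` padded with random duplicates up to `target_size`.

    Duplicates are drawn with replacement, so the same item may be
    picked more than once. The input list is left unmodified.
    """
    diff = target_size - len(minority)
    if diff <= 0:
        # already at least as large as the target, nothing to add
        return list(minority)
    if not minority:
        raise ValueError("cannot oversample an empty list")
    rng = random.Random(seed)  # seeded generator for reproducible draws
    extra = rng.choices(minority, k=diff)  # sample with replacement
    return list(minority) + extra


# hypothetical stand-in data; the real script parses reviews from files
positive_reviews = [f"positive review {i}" for i in range(10)]
negative_reviews = [f"negative review {i}" for i in range(4)]

balanced_negative = oversample(negative_reviews, len(positive_reviews), seed=0)
print(len(positive_reviews), len(balanced_negative))  # 10 10
```

Unlike the in-place `negative_reviews += extra` in the diff, this sketch returns a new list, so the original reviews stay available if the undersampling path is restored later.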
