practical-nlp/Ch4/03_Word2Vec_Example.ipynb #5

sai-teja-ponugoti · 2020-07-10T21:36:11Z

In the above Jupiter notebook, in function:

# Creating a feature vector by averaging all embeddings for all sentences
def embedding_feats(list_of_lists):
    DIMENSION = 300
    zero_vector = np.zeros(DIMENSION)
    feats = []
    for tokens in list_of_lists:
        feat_for_this =  np.zeros(DIMENSION)
        count_for_this = 0
        for token in tokens:
            if token in w2v_model:
                **feat_for_this += w2v_model[token]
                count_for_this +=1
        feats.append(feat_for_this)**        
    return feats

The feature vectors are not averaged, directly sum of the word embeddings is appended as the feature, whereas the function description indicates that the vectors will be averaged.

The text was updated successfully, but these errors were encountered:

varunp2k · 2020-07-11T15:29:31Z

Thank you @sai-teja-ponugoti for bringing this to our notice.
The fix to this most likely would be just having
np.mean(w2v_model[token],axis=0)
instead of the current
w2v_model[token]

Would you like to send a PR to fix this issue?

brianalexander · 2020-07-13T06:12:19Z

Hi @varunp2k,

I believe the fix would be a modification of

feats.append(feat_for_this)
to
feats.append(feat_for_this/count_for_this if count_for_this > 0 else feat_for_this)

In the example, we're categorizing the tweet by taking the average of all of the word vectors for each word in the sentence. This also makes sense because previously count_for_this is included in the code, but never ends up being used.

count_for_this if count_for_this > 0 else feat_for_this is needed to handle cases where there are no token matches and avoid dividing by zero.

Here is the updated cell in its entirety:

def embedding_feats(list_of_lists, dims=300):
    zero_vector = np.zeros(dims)
    feats = []
    for tokens in list_of_lists:
        feat_for_this = np.zeros(dims)
        count_for_this = 0
        for token in tokens:
            if token in w2v_model:
                feat_for_this += w2v_model[token]
                count_for_this += 1
        feats.append(feat_for_this/count_for_this if count_for_this > 0 else feat_for_this)
    return feats

train_vectors = embedding_feats(texts_processed)
print(len(train_vectors))

varunp2k · 2020-07-13T13:42:49Z

Looks good @brianalexander
Please send a PR

sai-teja-ponugoti · 2020-07-13T14:08:23Z

Thank you @sai-teja-ponugoti for bringing this to our notice.
The fix to this most likely would be just having
np.mean(w2v_model[token],axis=0)
instead of the current
w2v_model[token]

Would you like to send a PR to fix this issue?

@brainalexander have you sent one already?

varunp2k · 2020-07-13T14:42:05Z

@sai-teja-ponugoti
A PR stands for Pull Request. This article will explain what it is in a detail.
https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests
Brainalexander is yet to send a PR.
You can check the active PR's for this repo under the pull requests section right next to the issues. The link to that is here.

varunp2k added a commit that referenced this issue Jul 18, 2020

Fixed issue #5. Havent yet re-run entire notebook to confirm results

87c592a

varunp2k closed this as completed Jul 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

practical-nlp/Ch4/03_Word2Vec_Example.ipynb #5

practical-nlp/Ch4/03_Word2Vec_Example.ipynb #5

sai-teja-ponugoti commented Jul 10, 2020

varunp2k commented Jul 11, 2020 •

edited

Loading

brianalexander commented Jul 13, 2020

varunp2k commented Jul 13, 2020

sai-teja-ponugoti commented Jul 13, 2020 •

edited

Loading

varunp2k commented Jul 13, 2020

practical-nlp/Ch4/03_Word2Vec_Example.ipynb #5

practical-nlp/Ch4/03_Word2Vec_Example.ipynb #5

Comments

sai-teja-ponugoti commented Jul 10, 2020

varunp2k commented Jul 11, 2020 • edited Loading

brianalexander commented Jul 13, 2020

varunp2k commented Jul 13, 2020

sai-teja-ponugoti commented Jul 13, 2020 • edited Loading

varunp2k commented Jul 13, 2020

varunp2k commented Jul 11, 2020 •

edited

Loading

sai-teja-ponugoti commented Jul 13, 2020 •

edited

Loading