10K-word-embeddings

Word embeddings learned from 10-K documents, as described in the paper:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3480902

Getting Started

These instructions will help you use the word embeddings in your own projects.

Prerequisites

Install PyTorch and NumPy (for example, pip install torch numpy).

Details

10k_word_embeddings.tar - Tar file containing the word embeddings. Each embedding has 300 dimensions.
vocab_to_int.tar - Tar file containing the vocabulary-to-integer mapping. The vocabulary contains 159,647 words.

Usage

import torch

# Load the embedding matrix and the vocabulary-to-integer mapping
embed = torch.load('10k_word_embeddings.tar')
vocab_to_int = torch.load('vocab_to_int.tar')
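As a quick sanity check, the loaded objects should match the details above. This is a minimal sketch, assuming embed loads as an array with a .shape attribute (a NumPy array, as the torch.from_numpy call below suggests) and vocab_to_int as a dict:

    # Expect a 159,647 x 300 matrix and a word -> row-index dict
    print(embed.shape)             # (159647, 300)
    print(len(vocab_to_int))       # 159647
    print(vocab_to_int['profit'])  # integer row index of 'profit'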

  • Use the learned embeddings as pre-trained weights in a neural network:
    from torch import nn
    # Create an embedding layer matching the vocabulary size and dimension
    embeddings = nn.Embedding(embed.shape[0], embed.shape[1])
    # Copy in the pre-trained weights and freeze them so training does not update them
    embeddings.weight.data.copy_(torch.from_numpy(embed))
    embeddings.weight.requires_grad = False

  • Get the learned embedding of a word from the vocabulary:
    embeddings_profit = embed[vocab_to_int['profit']]
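For example, two word embeddings can be compared with cosine similarity. This is a minimal sketch, assuming embed is a NumPy array as above; 'profit' appears in the example, while 'loss' is an assumed vocabulary entry:

    import numpy as np

    def cosine_similarity(u, v):
        # Cosine of the angle between two embedding vectors
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    vec_profit = embed[vocab_to_int['profit']]
    vec_loss = embed[vocab_to_int['loss']]  # 'loss' assumed to be in the vocabulary
    print(cosine_similarity(vec_profit, vec_loss))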
