10K-word-embeddings

Word embeddings learned from 10-K documents, as described in the paper:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3480902

Getting Started

These instructions will help you use the word embeddings in your own projects.

Prerequisites

Install PyTorch and NumPy (for example, pip install torch numpy).

Details

10k_word_embeddings.tar - Tar file containing the word embeddings. Each embedding has 300 dimensions.
vocab_to_int.tar - Tar file containing the vocabulary-to-integer mapping. The vocabulary contains 159,647 words.

Usage

import torch

# Load the embedding matrix and the vocabulary-to-integer mapping
embed = torch.load('10k_word_embeddings.tar')
vocab_to_int = torch.load('vocab_to_int.tar')
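As a quick sanity check, the loaded objects should match the details above. This is a minimal sketch, assuming embed loads as an array with a .shape attribute (a NumPy array, as the torch.from_numpy call below suggests) and vocab_to_int as a dict:

    # Expect a 159,647 x 300 matrix and a word -> row-index dict
    print(embed.shape)             # (159647, 300)
    print(len(vocab_to_int))       # 159647
    print(vocab_to_int['profit'])  # integer row index of 'profit'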

  • Use the learned embeddings as pre-trained weights in a neural network:
    from torch import nn
    # Create an embedding layer matching the vocabulary size and dimension
    embeddings = nn.Embedding(embed.shape[0], embed.shape[1])
    # Copy in the pre-trained weights and freeze them so training does not update them
    embeddings.weight.data.copy_(torch.from_numpy(embed))
    embeddings.weight.requires_grad = False

  • Get the learned embedding of a word from the vocabulary:
    embeddings_profit = embed[vocab_to_int['profit']]
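For example, two word embeddings can be compared with cosine similarity. This is a minimal sketch, assuming embed is a NumPy array as above; 'profit' appears in the example, while 'loss' is an assumed vocabulary entry:

    import numpy as np

    def cosine_similarity(u, v):
        # Cosine of the angle between two embedding vectors
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    vec_profit = embed[vocab_to_int['profit']]
    vec_loss = embed[vocab_to_int['loss']]  # 'loss' assumed to be in the vocabulary
    print(cosine_similarity(vec_profit, vec_loss))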
