wsd-training

Create training data for Word Sense Disambiguation (WSD) deep learning

A Java-based project that uses my own Semantic Lexicon to create word sense disambiguation training data for deep learning. WordNet does not provide a clear enough set of semantic nouns, so I've created my own clear and practical training set.

parser

This uses Apache OpenNLP for sentence boundary detection and for Part of Speech (POS) tagging with Penn Treebank tags.
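Since the project labels ambiguous nouns, the POS tags are presumably what pick out the candidate words. A minimal sketch of that filtering step follows. It is written in Python for brevity and does not use the OpenNLP Java API; the `(token, tag)` input format and the `noun_candidates` helper are assumptions made for illustration, not part of this project.

```python
# Penn Treebank tags that mark nouns:
# singular, plural, proper singular, proper plural.
PENN_NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_candidates(tagged_tokens):
    """Return (index, token) pairs for every token tagged as a noun.

    `tagged_tokens` is a list of (token, penn_tag) tuples; this input
    format is an assumption made for the example.
    """
    return [
        (i, token)
        for i, (token, tag) in enumerate(tagged_tokens)
        if tag in PENN_NOUN_TAGS
    ]


sentence = [
    ("The", "DT"), ("bank", "NN"), ("approved", "VBD"),
    ("the", "DT"), ("loans", "NNS"), (".", "."),
]
print(noun_candidates(sentence))
```

Keeping the token index alongside the word makes it possible to look up the surrounding context later.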

gradle build

Build the project and create the Java setup in ./dist/ by running:

gradle clean build setup

supported training files

Supports reading .txt, .gz, and Peter's .parsed file formats for setting up unlabelled training data.
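As a rough illustration of handling the plain and gzipped inputs uniformly, here is a minimal Python sketch. The `read_lines` helper, the UTF-8 encoding, and the one-sentence-or-passage-per-line layout are assumptions; the .parsed format is not described here, so the sketch deliberately rejects it rather than guessing at its structure.

```python
import gzip
from pathlib import Path


def read_lines(path):
    """Yield non-empty, stripped lines from a .txt or .gz text file.

    UTF-8 encoding is an assumption made for this example.
    """
    path = Path(path)
    if path.suffix == ".gz":
        opener = gzip.open
    elif path.suffix == ".txt":
        opener = open
    else:
        raise ValueError(f"unsupported training file: {path.name}")
    # Both openers accept text mode ("rt") and an encoding argument.
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield line
```

Because `read_lines` is a generator, large unlabelled corpora can be streamed without loading a whole file into memory.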

python Keras DNN

The processed data is then used to train an LSTM using Keras/TensorFlow. The trained network can be loaded to assign the correct Synset ID (according to the lexicon) to an ambiguous noun.

With the right lexicon and training data, this seems to reach around 75% accuracy.
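To make the labelling and the accuracy figure concrete, the following sketch shows one way a network's output could be turned into a Synset ID and scored. It assumes the network emits one probability per candidate sense and that accuracy means the fraction of nouns given the expected Synset ID; neither detail is confirmed here. The helpers `pick_synset` and `accuracy`, and the Synset IDs 101 and 102, are hypothetical.

```python
def pick_synset(probabilities, synset_ids):
    """Return the synset ID whose predicted probability is highest."""
    if len(probabilities) != len(synset_ids):
        raise ValueError("need one probability per candidate synset")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return synset_ids[best]


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected synset IDs."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical synset IDs for two senses of the same noun.
candidates = [101, 102]
predicted = [
    pick_synset([0.9, 0.1], candidates),
    pick_synset([0.3, 0.7], candidates),
    pick_synset([0.6, 0.4], candidates),
    pick_synset([0.2, 0.8], candidates),
]
expected = [101, 102, 102, 102]
print(accuracy(predicted, expected))  # 3 of 4 predictions match
```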

rock3125/wsd-training
