NLP 2018/2019 (more)

Homework 1

Chinese word segmentation is an instance of word segmentation, the task of splitting a string of written natural language into its component words. Most languages rely on a space character to delimit words in written text, but languages such as Chinese, Japanese, Thai, Lao, and Vietnamese delimit sentences, phrases, or syllables instead of words, which makes the task non-trivial.

To tackle this, a variety of neural network approaches have been used, such as the one presented in "State-of-the-art Chinese Word Segmentation with Bi-LSTMs", which is the basis for this homework assignment. Rather than being treated as a sequence-to-sequence task, word segmentation is reduced here to sequence tagging: each character of a given sequence is tagged using the BIES format (Begin, Inside, End, Single).
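
As an illustration of the BIES reduction described above, here is a minimal sketch in plain Python. The helper names `words_to_bies` and `bies_to_words` are hypothetical and are not taken from this repository's code; the sample sentence is only a toy example.

```python
def words_to_bies(words):
    """Map a list of words to one BIES tag per character."""
    tags = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, everything between I
            tags.append("B" + "I" * (len(word) - 2) + "E")
    return "".join(tags)


def bies_to_words(chars, tags):
    """Rebuild words from an unsegmented string and its BIES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


words = ["我", "喜欢", "自然语言"]
tags = words_to_bies(words)
print(tags)
print(bies_to_words("".join(words), tags))
```

Because every character receives exactly one tag, the tag sequence always has the same length as the unsegmented input, which is what allows segmentation to be handled as sequence tagging.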

The authors describe a fairly simple model: it takes pretrained embeddings of character 1-grams and 2-grams as input and is composed of two layers, a Bi-LSTM and a dense one.

The embeddings of the 1-grams and 2-grams are concatenated and fed into the Bi-LSTM layer, whose output is passed to the dense layer to obtain a probability distribution over the BIES tags for each character in the sequence.
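
The input side of this pipeline can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the bigram at each position is taken to be the current character plus the next one, the last position is padded with a hypothetical `PAD` symbol, and the embedding tables are tiny made-up dictionaries rather than the pretrained embeddings used in the homework. The Bi-LSTM and dense layers themselves are not shown.

```python
PAD = "</s>"  # hypothetical padding symbol for the last bigram


def char_ngrams(sentence):
    """Return one 1-gram and one 2-gram per character position."""
    unigrams = list(sentence)
    bigrams = [
        sentence[i:i + 2] if i + 1 < len(sentence) else sentence[i] + PAD
        for i in range(len(sentence))
    ]
    return unigrams, bigrams


def embed_sentence(sentence, uni_table, bi_table, uni_unk, bi_unk):
    """Concatenate the 1-gram and 2-gram vectors at each position."""
    unigrams, bigrams = char_ngrams(sentence)
    return [
        uni_table.get(u, uni_unk) + bi_table.get(b, bi_unk)  # list concatenation
        for u, b in zip(unigrams, bigrams)
    ]


# Toy lookup tables (made-up values); unseen n-grams map to a zero vector.
uni_table = {"a": [0.1, 0.2], "b": [0.3, 0.4]}
bi_table = {"ab": [1.0, 1.1, 1.2]}
UNI_UNK = [0.0, 0.0]
BI_UNK = [0.0, 0.0, 0.0]

vectors = embed_sentence("abc", uni_table, bi_table, UNI_UNK, BI_UNK)
for vector in vectors:
    print(vector)
```

Each position yields one vector whose size is the sum of the two embedding sizes; a sequence of such vectors is what a Bi-LSTM layer would consume, producing one output per character for the dense layer to map to the four BIES tags.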

emanuelegiona/NLP2019_HW1

Folders and files

Latest commit

History

Repository files navigation

NLP 2018/2019 (more)

Homework 1

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages