Portland State University LING 576: CORPUS LINGUISTICS code repo for winter term 2020.

steve3p0/LING576

LING 576: CORPUS LINGUISTICS

Source code repository for the corpus linguistics course taught at Portland State University.

Labs (work-in-progress)

The labs for this course are designed for non-programmers. One of my goals is to create programming versions of the assignments, so that students interested in NLP can choose that option instead. Lab 1 is complete and Lab 2 is started. Stay tuned!

Final Project: Feature Identification in Corpus Linguistics

"A Case Study in Improving Machine Translation"

Abstract:
One of the challenges in adopting machine translation has been how to implement post-editor (translator) feedback. Machine translation engineers are given access to corpora to train translation models, but have struggled to make the improvements suggested by the translators who use those models. This study explores using NLP to identify types of grammatical features in corpora that are poorly translated. We set up a simple scenario in which a specific grammatical feature, the passive voice, is poorly translated. We then show how this feature can be identified in corpora, and how augmenting the training corpus with passive voice phrases improves machine translation quality for this feature.

You can read the full paper here.

The source code for the project is located in the corpus_project folder. I will add a Jupyter Notebook and some more information on requirements, etc., at a later date.
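
To give a rough idea of the feature-identification step described in the abstract, here is a minimal sketch that flags likely passive voice sentences with a regular expression. This is not the code in corpus_project. The names `is_passive` and `filter_passive`, the pattern, and the short list of irregular participles are all assumptions made for illustration. The heuristic will misfire on adjectives such as "was tired", so a POS tagger or dependency parser would likely be more reliable on real corpora.

```python
import re

# Forms of "to be" that can introduce a passive construction.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

# A small, deliberately incomplete list of irregular past participles
# (hypothetical; extend it for real use).
IRREGULAR_PARTICIPLES = (
    r"(?:written|given|taken|made|done|seen|known|built|sent|found|told|held)"
)

# be-form, an optional -ly adverb, then a participle:
# either a regular "-ed" form or one of the listed irregular forms.
PASSIVE_PATTERN = re.compile(
    rf"\b{BE_FORMS}\s+(?:\w+ly\s+)?(?:\w+ed|{IRREGULAR_PARTICIPLES})\b",
    re.IGNORECASE,
)


def is_passive(sentence: str) -> bool:
    """Return True if the sentence contains a likely passive construction."""
    return PASSIVE_PATTERN.search(sentence) is not None


def filter_passive(sentences):
    """Keep only the sentences that the heuristic flags as passive."""
    return [s for s in sentences if is_passive(s)]


if __name__ == "__main__":
    # Toy corpus, invented for this example.
    corpus = [
        "The report was written by the committee.",
        "The committee wrote the report.",
        "The feature is poorly translated by the model.",
    ]
    for sentence in filter_passive(corpus):
        print(sentence)
```

Sentences selected this way could then be collected as candidates for augmenting a training corpus, which is the general idea the study explores.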
