LING 576: CORPUS LINGUISTICS

Source code repository for the corpus linguistics course taught at Portland State University.

Labs (work-in-progress)

The labs for this course are designed for non-programmers. One of my goals was to create programming versions of the assignments, so that students interested in NLP could choose that option instead. I have completed Lab 1 and started on Lab 2. Stay tuned!

Final Project: Feature Identification in Corpus Linguistics

"A Case Study in Improving Machine Translation"

Abstract:
One of the challenges in adopting machine translation has been how to implement post-editor (translator) feedback. Machine translation engineers are given access to corpora to train translation models, but have struggled to make the improvements suggested by the translators who use them. This study explores using NLP to identify types of grammatical features in corpora that are poorly translated. We set up a simple scenario in which a specific grammatical feature, the passive voice, is poorly translated, show how this feature can be identified in corpora, and show how augmenting the training corpus with passive voice phrases improves machine translation quality for this feature.

You can read the full paper here.

The source code for the project is located in the corpus_project folder. I will add a Jupyter Notebook and more information on requirements, etc., at a later date.
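
To give a rough idea of what "identifying a grammatical feature in a corpus" can look like, here is a minimal sketch that flags likely passive voice sentences with a regular expression. This is not the code from the corpus_project folder, and it is not the method used in the paper; the names `is_passive` and `filter_passive`, the word lists, and the pattern itself are all assumptions made for illustration. The heuristic (a form of "be" followed by a past participle) will miss some passives and will also flag adjectives such as "is tired", so a parser-based approach would likely be more accurate.

```python
import re

# Forms of "to be" that can introduce a passive construction.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

# A small, illustrative set of irregular past participles (assumption: not
# exhaustive, chosen only for this sketch).
IRREGULAR_PARTICIPLES = (
    r"(?:written|given|taken|made|done|seen|known|found|built|sent|told|held|kept)"
)

# Heuristic: a form of "be", an optional adverb ending in -ly, then either a
# word ending in -ed or one of the irregular participles listed above.
PASSIVE_PATTERN = re.compile(
    rf"\b{BE_FORMS}\s+(?:\w+ly\s+)?(?:\w+ed|{IRREGULAR_PARTICIPLES})\b",
    re.IGNORECASE,
)


def is_passive(sentence: str) -> bool:
    """Return True if the sentence matches the passive voice heuristic."""
    return PASSIVE_PATTERN.search(sentence) is not None


def filter_passive(sentences):
    """Return only the sentences flagged as passive by the heuristic."""
    return [s for s in sentences if is_passive(s)]


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from the project corpus.
    sample = [
        "The report was written by the committee.",
        "The committee wrote the report.",
        "The bridge was quickly repaired.",
    ]
    for sentence in filter_passive(sample):
        print(sentence)
```

Sentences selected this way could then be added to a training corpus, which is the augmentation idea the abstract describes.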