CANTONMT: Cantonese to English NMT Platform with Fine-Tuned Models using Real and Synthetic Back-Translation Data

Third Year Project for BSc Computer Science at the University of Manchester

Introduction

This project develops models for translating Cantonese sentences into English. The trained models achieve results comparable to state-of-the-art commercial translators (Bing, Baidu).

A user interface is provided for trying out the models; setup instructions are given below.

Training

The datasets used to train the models can be found on Google Drive.

Training files can also be found in the Notebooks folder.

User Interface

To run the user interface for demonstration purposes, first download the models from Google Drive.

Place the downloaded models under the Backend folder of the GitHub repo, keeping the same folder structure as on Google Drive.
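Before starting the backend, it can help to confirm that the downloaded folders ended up in the right place. The following is a minimal sketch, not part of the repository: the helper `missing_model_dirs` and the folder name `models` are hypothetical placeholders, since the actual folder names come from the Google Drive download.

```python
from pathlib import Path


def missing_model_dirs(backend_dir, expected_subdirs):
    """Return the expected sub-folders that are absent under backend_dir."""
    root = Path(backend_dir)
    return [name for name in expected_subdirs if not (root / name).is_dir()]


if __name__ == "__main__":
    # "models" is a placeholder (an assumption); replace it with the folder
    # names actually present in the Google Drive download.
    missing = missing_model_dirs("Backend", ["models"])
    if missing:
        print("Missing under Backend:", ", ".join(missing))
    else:
        print("All expected model folders are present.")
```

Run it from the repository root; an empty result means every folder listed was found under Backend.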

Run the following commands in a terminal to start the backend:

cd Backend
pip install -r requirement.txt
python app.py

To run the frontend user interface, run the following commands in a terminal:

cd Frontend
npm i
npm run dev

The user interface should then be available at http://localhost:3000/.
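If the page does not load, a quick reachability check can tell whether the frontend dev server is actually listening. This is a minimal sketch using only the Python standard library; the helper name `is_reachable` is hypothetical, and the only value taken from this README is the URL http://localhost:3000/.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to url yields any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (e.g. 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return False


if __name__ == "__main__":
    url = "http://localhost:3000/"
    print(url, "is reachable" if is_reachable(url) else "is not reachable")
```

A "not reachable" result usually means `npm run dev` is not running or is serving on a different port than the one assumed here.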

Manchester NLP club talk

Recording on YouTube, PPT, demo (1 min)

Reference: Please cite our paper if you use the toolkit or data from this repository

Pre-print: Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic. 2024. 'CANTONMT: Cantonese to English NMT Platform with Fine-Tuned Models using Synthetic Back-Translation Data'. arXiv:2403.11346.

@misc{hong2024cantonmt,
      title={CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data}, 
      author={Kung Yin Hong and Lifeng Han and Riza Batista-Navarro and Goran Nenadic},
      year={2024},
      eprint={2403.11346},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
