CANTONMT: Cantonese to English NMT Platform with Fine-Tuned Models using Real and Synthetic Back-Translation Data
Third Year Project for BSc Computer Science at the University of Manchester
This project develops models that translate Cantonese sentences into English. The trained models achieve results comparable to state-of-the-art commercial translators (Bing, Baidu).
A user interface is provided for testing the models; setup guides are given below.
The datasets used to train the models can be found on Google Drive.
The training files can also be found in the Notebooks folder.
To run the user interface for demonstration purposes, first download the models from Google Drive.
Place them under the Backend folder of the GitHub repo, keeping the same folder structure as on Google Drive.
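Before starting the backend, it can help to confirm that the downloaded folders ended up where expected. The snippet below is a minimal sketch, not part of the repository: `missing_paths` is a hypothetical helper, and the folder name in `expected` is a placeholder that should be replaced with the names actually shown on Google Drive.

```python
from pathlib import Path


def missing_paths(backend_dir, expected):
    """Return the entries of `expected` that do not exist under `backend_dir`."""
    root = Path(backend_dir)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Placeholder folder name (an assumption): replace with the folders
    # listed on Google Drive.
    expected = ["models/example_model"]
    missing = missing_paths("Backend", expected)
    if missing:
        print("Missing under Backend:", ", ".join(missing))
    else:
        print("All expected model folders are present.")
```

Run it from the root of the repo so that the relative path `Backend` resolves correctly.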
Run the following commands in the terminal to start the backend:
cd Backend
pip install -r requirement.txt
python app.py
To start the frontend user interface, run the following commands in the terminal:
cd Frontend
npm i
npm run dev
The user interface should then be available at http://localhost:3000/.
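If the page does not load, a quick way to tell whether the frontend dev server is listening at all is to attempt a TCP connection to port 3000. The following is a small sketch using only the Python standard library; `is_port_open` is a hypothetical helper name, and the check only confirms that something accepts connections on the port, not that it is this project's interface.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    if is_port_open("localhost", 3000):
        print("Something is listening on http://localhost:3000/")
    else:
        print("Nothing is listening on port 3000; check that `npm run dev` is still running.")
```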
Demo materials: recording on YouTube, PPT, 1-minute demo.
Pre-print: Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic. 2024. 'CANTONMT: Cantonese to English NMT Platform with Fine-Tuned Models using Synthetic Back-Translation Data'. arXiv.
@misc{hong2024cantonmt,
title={CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data},
author={Kung Yin Hong and Lifeng Han and Riza Batista-Navarro and Goran Nenadic},
year={2024},
eprint={2403.11346},
archivePrefix={arXiv},
primaryClass={cs.CL}
}