This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

An ensemble system with a search engine for relevant document retrieval and a deep learning model (BERT) for machine comprehension in Vietnamese, applied to answer questions related to regulations of University of Information Technology (VNUHCM-UIT)

namnv1113/UITHelper_QAS

Question Answering System for Regulations of University of Information Technology

General

The purpose of this project is to develop a Question Answering system with reading comprehension ability for Vietnamese, a language for which tools and resources are scarce, and to apply it to answering questions about the rules and regulations of the University of Information Technology.

The system combines traditional Information Retrieval techniques (mostly based on the Extended Boolean Model) with Deep Learning (BERT, which achieves state-of-the-art performance on 11 different NLP tasks in English) and transfer learning to Vietnamese, an approach that shows promising potential for Vietnamese Question Answering research.
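
For readers unfamiliar with the Extended Boolean Model, the sketch below shows the standard p-norm scoring it is usually built on. It is not taken from this repository: the function names `pnorm_or` and `pnorm_and`, the default `p = 2`, and the example term weights are assumptions made for illustration only. Each weight is a normalized term weight in [0, 1] for one query term in one document.

```python
from typing import Sequence


def _check(weights: Sequence[float]) -> None:
    # Hypothetical helper: the p-norm formulas need at least one term weight.
    if not weights:
        raise ValueError("at least one term weight is required")


def pnorm_or(weights: Sequence[float], p: float = 2.0) -> float:
    """Score a document against an OR query: high if any term weight is high."""
    _check(weights)
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)


def pnorm_and(weights: Sequence[float], p: float = 2.0) -> float:
    """Score a document against an AND query: penalized by every weak term."""
    _check(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)


# Illustrative weights for two query terms in one document (made-up values).
doc_weights = [1.0, 0.0]
or_score = pnorm_or(doc_weights)
and_score = pnorm_and(doc_weights)
```

With `p = 1` both scores reduce to the average term weight, and as `p` grows they approach strict Boolean OR and AND, which is why the model sits between vector-space ranking and Boolean retrieval.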

Information Retrieval approaches are very common, but deep learning approaches are almost never used in Vietnamese QA systems. This project uses a naive transfer learning technique: the SQuAD dataset is translated from English to Vietnamese and bad translations are removed (link). Removing them gives an additional 10% boost in F1 score, resulting in an F1 score of 66% on the original task (QA on Wikipedia) and 56% on the UIT regulations task.
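
This README does not describe how bad translations are detected. One common heuristic, assumed here purely for illustration, is to keep a translated example only if its translated answer still appears verbatim in the translated context, since SQuAD-style training needs an answer span inside the passage. The function name `filter_translated`, the dictionary keys, and the sample sentences are all hypothetical.

```python
from typing import Dict, List


def filter_translated(examples: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Keep examples whose translated answer is a substring of the translated context.

    Assumption: each example has "context", "question" and "answer" keys holding
    already-translated Vietnamese text. This is not the repository's actual format.
    """
    kept: List[Dict[str, object]] = []
    for ex in examples:
        answer = ex["answer"].strip()
        if not answer:
            continue  # an empty answer cannot be used as a span
        start = ex["context"].find(answer)
        if start < 0:
            continue  # answer lost in translation: treat as a bad translation
        # Record the re-aligned character offset, as SQuAD-style data requires.
        kept.append({**ex, "answer": answer, "answer_start": start})
    return kept


# Made-up sample data: the second answer no longer matches its context.
samples = [
    {
        "context": "Sinh viên phải đăng ký học phần trước mỗi học kỳ.",
        "question": "Sinh viên phải làm gì trước mỗi học kỳ?",
        "answer": "đăng ký học phần",
    },
    {
        "context": "Sinh viên phải đăng ký học phần trước mỗi học kỳ.",
        "question": "Sinh viên phải làm gì trước mỗi học kỳ?",
        "answer": "ghi danh môn học",
    },
]
clean = filter_translated(samples)
```

Exact substring matching is deliberately strict: it discards some usable examples, but every example that survives has a valid answer span.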

More information about this project can be found in ./Report/Summary.pdf or ./Report/Thesis.pdf.

Structure

  • QASystem and Ultilities contain the source code, the base model, the fine-tuned models, and the dataset used in this project. A guide on how to set up the system and reproduce the results is also provided.
  • Report contains the thesis documents as well as slides and related files.
  • Dataset contains the dataset used in this project.

Information

  • By Nguyễn Việt Nam - 14520560
  • Advisors: Dr. Ngô Đức Thành & M.Sc. Nguyễn Vinh Tiệp
  • Advanced Education Program 2014 - VNU-UIT

If you run into any problems, please contact me by email at 14520560@gm.uit.edu.vn or namnv1113@gmail.com.
