LIFA: A Multi-source Transfer Learning Framework for Vietnamese Sentiment Classification

This is the official code for the paper Learning for Amalgamation: A Multi-Source Transfer Learning Framework for Sentiment Classification (Information Sciences 2022).

Authors: Cuong V. Nguyen and Khiem H. Le and Anh M. Tran and Quang H. Pham and Binh T. Nguyen

Overview

Our proposed LIFA framework using a gating network. (a) The overall framework demonstrated on N embeddings. (b) Our Gating Network architecture. It is worth noting that from given input data, LIFA can select N different embedding models (which can have distinct embedding dimensions) for extracting feature vectors. These feature vectors then go through linear transformation layers to project these feature vectors into the same feature space. Finally, a gating network can be employed for combining these newly computed features to learn an optimal classifier for the sentiment classification problem.

Requirement

python 3.7.3
pytorch 1.4.0
pytorch-transformers 1.2.0
tensorflow 2.0.0
torchtext 0.4.0
torchvision 0.4.0
scikit-image 0.15.0
scikit-learn 0.20.3
nltk 3.4.5
fairseq 0.9.0
vncorenlp 1.0.3

Data preparation

In this work, we use two datasets:
- AIVIVN: this is the publish dataset from AIVIVN 2019 Sentiment Challenge, including approximately 160K training reviews with the available labels and 11K testing reviews without the available labels. We manually did labelling for the testing dataset.
- Our dataset: this is our new dataset which was crawled from the Vietnamese e-commerce websites, the reviews are started from Jan 2019 and includes all product categories. We trained all the methods with 10K, 15K, 20K training reviews respectively and tested on about 170K reviews.
- The validation dataset is randomly selected from the training dataset, with 20%.
- The two datasets are placed at the folders dataset/aivivn/ and dataset/tiki/.

Pre-trained Models preparation

BERT: which is the pre-trained BERT model with the version of bert-base-multilingual-uncased and automatically downloaded from Huggingface Transformers.
PhoBERT: which is the state-of-the-art pre-trained BERT model for the Vietnamese language. To run with the pre-trained PhoBert models, we need to do:
- Download https://public.vinai.io/PhoBERT_base_transformers.tar.gz, extract and place at */phobert
- Download vncorenlp from https://github.com/VinAIResearch/PhoBERT#vncorenlp, extract and place at */vncorenlp

Training

We created lifa/run.sh to train the LIFA model on the our datasets. You can directly run this file by configuring the following parameters:

Parameters	Description	Example Values
`dataset`	The dataset that will be used for training	"tiki"
`model_name`	Type of LIFA model	"Moe_Gating"
`pretrained_model`	The pre-trained model that will be used for training	"bert"

For more fine-grained parameters, please refer to the source code.

Citation

If you find the paper or the source code useful to your projects, please cite the following bibtex:

@article{lifa,
  author = {Cuong V. Nguyen and Khiem H. Le and Anh M. Tran and Quang H. Pham and Binh T. Nguyen},
  title = {Learning for amalgamation: A multi-source transfer learning framework for sentiment classification},
  journal = {Information Sciences},
  volume = {590},
  pages = {1-14},
  year = {2022},
  issn = {0020-0255},
  doi = {https://doi.org/10.1016/j.ins.2021.12.059},
}

Contact

For any requests to further use these codes and our proposed algorithms for the Vietnamese Sentiment Classification problem, please kindly contact authors to avoid any misused action or violation to the copyright of all authors and creators of this repository.

Contact Email: ngtbinh@hcmus.edu.vn (Dr. Binh Nguyen)

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
dataset		dataset
lifa		lifa
maximal_correlation_weighting		maximal_correlation_weighting
transfer_learning		transfer_learning
.gitignore		.gitignore
README.md		README.md
arch.png		arch.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dataset

dataset

lifa

lifa

maximal_correlation_weighting

maximal_correlation_weighting

transfer_learning

transfer_learning

.gitignore

.gitignore

README.md

README.md

arch.png

arch.png

Repository files navigation

LIFA: A Multi-source Transfer Learning Framework for Vietnamese Sentiment Classification

Contents:

Overview

Requirement

Data preparation

Pre-trained Models preparation

Training

Citation

Contact

About

Releases

Packages

Languages

lhkhiem28/Learning-for-Amalgamation

Folders and files

Latest commit

History

Repository files navigation

LIFA: A Multi-source Transfer Learning Framework for Vietnamese Sentiment Classification

Contents:

Overview

Requirement

Data preparation

Pre-trained Models preparation

Training

Citation

Contact

About

Topics

Resources

Stars

Watchers

Forks

Languages