Open-Domain Aspect-Opinion Co-Mining with Double-Layer Span Extraction

Supervised extraction methods achieve state-of-the-art performance but require large-scale human-annotated training data, which restricts their use in open-domain tasks where such data is lacking. We propose an Open-Domain Aspect-Opinion Co-Mining (ODAO) method with a Double-Layer span extraction framework to overcome this issue and simultaneously mine aspect terms, opinion terms, and their correspondence in a joint model.

Dataset

The experiments are conducted on the SemEval 14, 15, and 16 Restaurant datasets and the SemEval 14 Laptop dataset. The original datasets can be found in the data/original_data folder.

Pre-processing

Data pre-processing includes four steps:

Format Data: The original file is processed and stored in a dictionary. Run pre-processing/format_data.py to perform this step.
Weak Label Generator: This step runs the CoreNLP dependency parser to generate the weak labels. Download the CoreNLP jar files from https://stanfordnlp.github.io/CoreNLP/download.html, place stanford-corenlp-4.0.0.jar and stanford-corenlp-4.0.0-models.jar in the dependency_parser folder, and run the following command from that folder:
java -Xmx8g -XX:-UseGCOverheadLimit -XX:MaxPermSize=1024m -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -preload tokenize,ssplit,pos,lemma,ner,parse,depparse -status_port 9015 -port 9015 -timeout 1500000

Once the CoreNLP server is running on port 9015, run pre-processing/pseudo_labels.py to generate the weak labels.
Split Data: Once the weak labels have been generated for the original train set, this step splits the train set into pseudo train (reviews for which the weak label generator has identified both an aspect term and an opinion term) and pseudo test (all other reviews). The pseudo test set is used for prediction as part of self-training. Run pre-processing/split_data.py to execute this step.
Get Pairs: This step processes the pseudo train set to format it for training. Run pre-processing/get_pairs.py to execute this step.
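
The weak-label and split steps above can be illustrated with a small sketch. It is not the code in pre-processing/pseudo_labels.py or pre-processing/split_data.py: the two dependency rules (amod and nsubj), the function names weak_pairs and split_reviews, and the helper _sentence are all assumptions made for illustration, and each review is treated as a single sentence. The input dictionaries are hand-written to resemble the per-sentence JSON (tokens and basicDependencies) that a CoreNLP server returns when JSON output is requested.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (aspect term, opinion term)


def weak_pairs(sentence: Dict) -> List[Pair]:
    """Apply two illustrative dependency rules to one parsed sentence."""
    pos = {tok["index"]: tok["pos"] for tok in sentence["tokens"]}
    pairs: List[Pair] = []
    for edge in sentence["basicDependencies"]:
        gov_pos = pos.get(edge["governor"], "")   # "" for the ROOT pseudo-token
        dep_pos = pos.get(edge["dependent"], "")
        # Assumed rule 1: amod(noun, adjective), e.g. "great food".
        if edge["dep"] == "amod" and gov_pos.startswith("NN") and dep_pos.startswith("JJ"):
            pairs.append((edge["governorGloss"], edge["dependentGloss"]))
        # Assumed rule 2: nsubj(adjective, noun), e.g. "the food is great".
        elif edge["dep"] == "nsubj" and gov_pos.startswith("JJ") and dep_pos.startswith("NN"):
            pairs.append((edge["dependentGloss"], edge["governorGloss"]))
    return pairs


def split_reviews(parsed: Dict[str, Dict]):
    """Reviews with at least one aspect-opinion pair go to pseudo train, the rest to pseudo test."""
    pseudo_train, pseudo_test = {}, []
    for review_id, sentence in parsed.items():
        pairs = weak_pairs(sentence)
        if pairs:
            pseudo_train[review_id] = pairs
        else:
            pseudo_test.append(review_id)
    return pseudo_train, pseudo_test


def _sentence(tokens, edges):
    """Build a dict shaped like one parsed sentence (hand-written, not real parser output)."""
    return {
        "tokens": [{"index": i, "word": w, "pos": p}
                   for i, (w, p) in enumerate(tokens, start=1)],
        "basicDependencies": [
            {"dep": rel,
             "governor": g, "governorGloss": tokens[g - 1][0] if g else "ROOT",
             "dependent": d, "dependentGloss": tokens[d - 1][0]}
            for rel, g, d in edges
        ],
    }


EXAMPLE = {
    "r1": _sentence([("The", "DT"), ("food", "NN"), ("is", "VBZ"), ("great", "JJ")],
                    [("ROOT", 0, 4), ("det", 2, 1), ("nsubj", 4, 2), ("cop", 4, 3)]),
    "r2": _sentence([("We", "PRP"), ("went", "VBD"), ("there", "RB"), ("yesterday", "NN")],
                    [("ROOT", 0, 2), ("nsubj", 2, 1), ("advmod", 2, 3), ("obl:tmod", 2, 4)]),
}

pseudo_train, pseudo_test = split_reviews(EXAMPLE)
print(pseudo_train)  # {'r1': [('food', 'great')]}
print(pseudo_test)   # ['r2']
```

Here review r1 lands in pseudo train because a rule fires, while r2 has no aspect-opinion pair and is held out as pseudo test for later self-training.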

Training

Run training/train.py to train the model.
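
For convenience, the pre-processing and training scripts named in this README can be chained from a single driver. The sketch below is an assumption-laden convenience wrapper, not part of the repository: it assumes the scripts take no command-line arguments, that the paths are valid relative to the current working directory, and that the CoreNLP server is on localhost. The names server_is_up, build_commands, and run_all are hypothetical.

```python
import socket
import subprocess
import sys

# Script paths as written in this README; assumed relative to the working directory.
PREPROCESSING = [
    "pre-processing/format_data.py",
    "pre-processing/pseudo_labels.py",
    "pre-processing/split_data.py",
    "pre-processing/get_pairs.py",
]
TRAINING = "training/train.py"
CORENLP_PORT = 9015  # port used in the java command above


def server_is_up(host: str = "localhost", port: int = CORENLP_PORT,
                 timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_commands():
    """One command per script, run with the current Python interpreter."""
    return [[sys.executable, script] for script in PREPROCESSING + [TRAINING]]


def run_all() -> None:
    """Run every step in order, stopping at the first failure."""
    if not server_is_up():
        raise RuntimeError(f"No server reachable on port {CORENLP_PORT}; "
                           "start CoreNLP before generating weak labels.")
    for cmd in build_commands():
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
```

Calling run_all() executes the four pre-processing steps followed by training; the port check only guards against forgetting to start the dependency parser.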

Citation

Kindly cite our paper:

@inproceedings{10.1145/3534678.3539386,
author = {Chakraborty, Mohna and Kulkarni, Adithya and Li, Qi},
title = {Open-Domain Aspect-Opinion Co-Mining with Double-Layer Span Extraction},
year = {2022},
isbn = {9781450393850},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3534678.3539386},
doi = {10.1145/3534678.3539386},
abstract = {The aspect-opinion extraction tasks extract aspect terms and opinion terms from reviews. The supervised extraction methods achieve state-of-the-art performance but require large-scale human-annotated training data. Thus, they are restricted for open-domain tasks due to the lack of training data. This work addresses this challenge and simultaneously mines aspect terms, opinion terms, and their correspondence in a joint model. We propose an Open-Domain Aspect-Opinion Co-Mining (ODAO) method with a Double-Layer span extraction framework. Instead of acquiring human annotations, ODAO first generates weak labels for unannotated corpus by employing rules-based on universal dependency parsing. Then, ODAO utilizes this weak supervision to train a double-layer span extraction framework to extract aspect terms (ATE), opinion terms (OTE), and aspect-opinion pairs (AOPE). ODAO applies canonical correlation analysis as an early stopping indicator to avoid the model over-fitting to the noise to tackle the noisy weak supervision. ODAO applies a self-training process to gradually enrich the training data to tackle the weak supervision bias issue. We conduct extensive experiments and demonstrate the power of the proposed ODAO. The results on four benchmark datasets for aspect-opinion co-extraction and pair extraction tasks show that ODAO can achieve competitive or even better performance compared with the state-of-the-art fully supervised methods.},
booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages = {66–75},
numpages = {10},
keywords = {review analysis, natural language processing, data mining},
location = {Washington DC, USA},
series = {KDD '22}
}

kulkarniadithya/ODAO
