NUSTM/GCDDA
Generative Cross-Domain Data Augmentation for Aspect and Opinion Co-Extraction

This repository contains the code for our NAACL 2022 paper:
Generative Cross-Domain Data Augmentation for Aspect and Opinion Co-Extraction

Datasets

The training data come from three domains: Restaurant (R), Laptop (L), and Device (D).
Following previous work, we remove sentences that contain no aspects or opinions when Device is the source domain.

The in-domain corpora (used for training BERT-E) come from Yelp and Amazon reviews.

Click here to get BERT-E (BERT-Extended); the extraction code is by0i. (Please specify the directory where BERT is stored in modelconfig.py.)
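The repository does not document the exact setting inside modelconfig.py, so the following is only a hypothetical sketch of what the path configuration might look like; the variable names `BERT_E_DIR` and `BERT_MODEL_MAP` are assumptions, not the repository's actual identifiers:

```python
# modelconfig.py -- hypothetical sketch; the real variable names in the
# repository's modelconfig.py may differ.

# Assumed: directory where the downloaded BERT-E checkpoint was extracted.
BERT_E_DIR = "/path/to/bert-e"

# Assumed: a name -> local-directory map, so training scripts can resolve
# a model name to either a Hub identifier or a local checkpoint directory.
BERT_MODEL_MAP = {
    "bert-base-uncased": "bert-base-uncased",  # falls back to the Hub name
    "bert-e": BERT_E_DIR,                      # local BERT-E checkpoint
}
```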

Environment

transformers==4.2.2
pytorch==1.10.0
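The repository does not specify how to install these pins; assuming a plain pip-based setup, one way to create the environment is:

```shell
# Hypothetical setup; the repository does not ship a requirements file.
# Note: the PyPI package for PyTorch 1.10.0 is named "torch", not "pytorch".
python -m venv gcdda-env
source gcdda-env/bin/activate
pip install transformers==4.2.2 torch==1.10.0
```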

Code

1. First, run the following commands to obtain the pseudo-labeled target-domain data:
cd aeoe
cd ae_oe_bert_crf
bash ./run_bert_e_sdl.sh
2. Then, run the following commands to obtain the masked data:
cd ..
bash ./process_data.sh
3. After that, train BART for data generation:
cd ..
cd da
bash ./test.sh
bash ./post_process.sh
4. Finally, filter the generated data and use it to train the downstream task:
cd ..
cd aeoe
cd ae_oe_bert_crf
bash ./run_bert_e_da_filter.sh
bash ./run_co_guess.sh
bash ./run_bert_e_da_train.sh
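The four steps above can be chained into one convenience script. This is a sketch, not part of the repository: the script name and the repository-root guard are assumptions, and it must be run from the repository root.

```shell
#!/usr/bin/env bash
# run_all.sh -- hypothetical wrapper chaining the four documented steps.
set -e

run_pipeline() {
  (cd aeoe/ae_oe_bert_crf && bash ./run_bert_e_sdl.sh)   # 1. pseudo-label target data
  (cd aeoe && bash ./process_data.sh)                    # 2. build masked data
  (cd da && bash ./test.sh && bash ./post_process.sh)    # 3. generate data with BART
  (cd aeoe/ae_oe_bert_crf \
     && bash ./run_bert_e_da_filter.sh \
     && bash ./run_co_guess.sh \
     && bash ./run_bert_e_da_train.sh)                   # 4. filter and train downstream
}

# Guard: only run when executed from the repository root.
[ -d aeoe ] && run_pipeline || echo "run from the GCDDA repository root"
```

Each step runs in a subshell so the `cd` calls cannot leave the script in an unexpected directory, and `set -e` aborts the chain if any stage fails.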
