2021 IJCNLP A Survey of Data Augmentation Approaches for NLP #25

DelaramRajaei · 2023-06-01T21:02:31Z

This issue is dedicated to summaries of the papers I found related to adding a back translation expander.

DelaramRajaei · 2023-06-05T18:53:29Z

Title	A Survey of Data Augmentation Approaches for NLP
Year	2021
Venue	IJCNLP
Paper's Link	https://arxiv.org/pdf/2105.03075.pdf
My summary's Link	https://docs.google.com/document/d/1K5zPymfH-PfDlJBxqdSHsvMBY7Fb_Dw7WxgBoKFNUIM/edit#

This paper is a survey of data augmentation. I chose it as the starting point of my literature review to first understand the hierarchy of data augmentation methods and to figure out which category back translation belongs to.

The paper begins with fundamental definitions of data augmentation and explains why it is needed in various NLP tasks and projects. It then presents a range of proposed methods and solutions for different tasks and applications.

Data augmentation (DA), as defined in the paper, refers to strategies for increasing the diversity of training examples without explicitly collecting new data.

An ideal data augmentation method should be both easy to implement and effective at improving model performance. In practice there is a trade-off between these two aspects.

Below is an overview of the demonstrated hierarchy of data augmentation:

Rule-Based Techniques: Apply predefined rules or transformations to existing data samples to generate new synthetic samples. Some proposed methods of this kind are:
- Synonym Replacement
- Random Insertion
- Random Deletion
- Random Swap
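
To make the rule-based operations above concrete, here is a minimal sketch of token-level replacement, insertion, deletion, and a swap of two positions. It is not the paper's implementation: the function names, the `synonyms` lookup table, and the parameters `p`, `n`, and `n_swaps` are assumptions made for illustration. A real setup would typically draw synonyms from a lexical resource rather than a hand-written dictionary.

```python
import random

def synonym_replacement(tokens, synonyms, n, rng):
    """Replace up to n tokens that have an entry in the (hypothetical) synonyms table."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in synonyms]
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(synonyms[out[i]])
    return out

def random_insertion(tokens, synonyms, n, rng):
    """Insert a synonym of a random eligible token at a random position, n times."""
    out = list(tokens)
    for _ in range(n):
        candidates = [t for t in out if t in synonyms]
        if not candidates:
            break
        word = rng.choice(synonyms[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), word)
    return out

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps, rng):
    """Swap the tokens at two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    tokens = "the quick brown fox jumps".split()
    toy_synonyms = {"quick": ["fast"], "jumps": ["leaps"]}  # toy table, assumption
    print(synonym_replacement(tokens, toy_synonyms, 1, rng))
    print(random_insertion(tokens, toy_synonyms, 1, rng))
    print(random_deletion(tokens, 0.3, rng))
    print(random_swap(tokens, 1, rng))
```

Each function returns a new list and leaves the input untouched, so several augmented variants can be generated from the same sentence.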
Example Interpolation Techniques: Interpolate the inputs and labels of two or more real examples.
Model-Based Techniques: Leverage pre-trained models to generate augmented examples.
- Back translation falls into this category.
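
As an illustration of interpolating inputs and labels, the sketch below follows the general mixup idea: a new example is a convex combination of two real ones. The function name `mixup`, the `alpha` parameter, and the use of plain Python lists are assumptions for this example. Since text tokens are discrete, in NLP the inputs would usually be embeddings or hidden representations rather than raw text.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Return a convex combination of two (feature vector, one-hot label) pairs.

    lam is drawn from Beta(alpha, alpha); the same lam mixes inputs and labels.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

if __name__ == "__main__":
    # Two toy "embedded" examples from different classes (values are made up).
    x_a, y_a = [1.0, 0.0, 2.0], [1.0, 0.0]
    x_b, y_b = [0.0, 1.0, 0.0], [0.0, 1.0]
    x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b, rng=random.Random(0))
    print(lam, x_mix, y_mix)
```

The mixed label stays a valid probability distribution because the two weights `lam` and `1 - lam` sum to one.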
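
Back translation can be sketched as a round trip through a pivot language: translate the sentence out, translate it back, and keep the result if it differs from the original. The code below only shows that control flow. The two translator callables are hypothetical stand-ins (toy word tables, not a real translation model or API), and the names `back_translate`, `augment`, and `toy_translator` are assumptions for illustration.

```python
from typing import Callable, Dict, List

Translator = Callable[[str], str]

def back_translate(sentence: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into the pivot language and back into the source language."""
    return from_pivot(to_pivot(sentence))

def augment(sentences: List[str], to_pivot: Translator, from_pivot: Translator) -> List[str]:
    """Return the originals plus any back-translated variants that differ from them."""
    out = list(sentences)
    for s in sentences:
        bt = back_translate(s, to_pivot, from_pivot)
        if bt.strip().lower() != s.strip().lower():
            out.append(bt)
    return out

def toy_translator(table: Dict[str, str]) -> Translator:
    """Word-by-word lookup standing in for a real translation model (assumption)."""
    return lambda s: " ".join(table.get(w, w) for w in s.split())

if __name__ == "__main__":
    en_to_fr = toy_translator({"movie": "film", "great": "super"})
    fr_to_en = toy_translator({"film": "film", "super": "great"})
    print(augment(["the movie was great"], en_to_fr, fr_to_en))
```

Passing the translators in as plain callables keeps the augmentation logic independent of whichever translation model is eventually plugged in.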

Applications
The following are some NLP applications where DA methods can help.

Low-Resource Language
One solution is to use back translation and self-learning to generate augmented training data.
Mitigating Bias
Fixing Class Imbalance
Few-Shot Learning
Adversarial Examples

Tasks
The following are some NLP tasks to which data augmentation has been applied.

Summarization
One paper uses back translation for few-shot abstractive summarization, together with a consistency loss inspired by UDA (Unsupervised Data Augmentation).
Question Answering
One paper proposes DA and sampling techniques for domain-agnostic QA, including paraphrasing with back translation.
Another paper introduces QANet, which shows improved performance on SQuAD when combined with back-translated data.
Sequence Tagging Task
Parsing Tasks
Grammatical Error Correction
Neural Machine Translation
Data-to-Text NLG
Open-Ended & Conditional Generation
Dialogue
Multimodal Tasks

Challenges & Future Directions
The paper concludes by mentioning current challenges and discussing potential areas for future research in data augmentation within the field of NLP.

Dissonance between empirical novelties and theoretical narrative
Minimal benefit for pretrained models on in-domain data
Multimodal challenges
Span-based tasks
Working in specialized domains
Working with low-resource languages
More vision-inspired techniques
Self-supervised learning
Offline versus online data augmentation
Lack of unification
Good data augmentation practices

DelaramRajaei added the literature-review label Jun 1, 2023

DelaramRajaei changed the title ~~Summary of papers - Adding Back Translation expander~~ A Survey of Data Augmentation Approaches for NLP Jun 19, 2023

DelaramRajaei changed the title ~~A Survey of Data Augmentation Approaches for NLP~~ 2021 IJCNLP A Survey of Data Augmentation Approaches for NLP Jun 19, 2023

DelaramRajaei self-assigned this Jun 19, 2023
