
2021 IJCNLP A Survey of Data Augmentation Approaches for NLP #25

Open
DelaramRajaei opened this issue Jun 1, 2023 · 1 comment

@DelaramRajaei

This issue is dedicated to summaries of the papers I found related to adding a back translation expander.

@DelaramRajaei DelaramRajaei added the literature-review Summary of the paper related to the work label Jun 1, 2023
@DelaramRajaei

DelaramRajaei commented Jun 5, 2023

Title: A Survey of Data Augmentation Approaches for NLP
Year: 2021
Venue: IJCNLP
Paper's Link: https://arxiv.org/pdf/2105.03075.pdf
My summary's Link: https://docs.google.com/document/d/1K5zPymfH-PfDlJBxqdSHsvMBY7Fb_Dw7WxgBoKFNUIM/edit#

This paper is a survey of data augmentation (DA). I chose it as the starting point of my literature review to first understand the hierarchy of data augmentation methods and to figure out which category back translation belongs to.

The paper begins with fundamental definitions of data augmentation and explains why it is needed in various NLP tasks and projects. It then presents a range of proposed methods and solutions for different tasks and applications.

Data augmentation, as defined in the paper, refers to strategies for increasing the diversity of training examples without explicitly collecting new data.

An ideal data augmentation method should be both easy to implement and effective at improving model performance. In practice, there is a trade-off between these two aspects.

Below is an overview of the demonstrated hierarchy of data augmentation:

  • Rule-Based Techniques: Apply predefined rules or transformations to existing data samples to generate new synthetic samples. Some proposed methods in this category:

    • Synonym Replacement
    • Random Insertion
    • Random Deletion
    • Sentence Swap
  • Example Interpolation Techniques: Interpolate the inputs and labels of two or more real examples.

  • Model-Based Techniques: Leverage pre-trained models to generate augmented examples.

    • Back translation belongs to this category.
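
To make the three categories above concrete, here is a minimal sketch with one toy function per category. It is only an illustration, not code from the paper: the function names, the hand-written synonym table, and the dictionary-based "translators" are all assumptions. In a real expander, `to_pivot` and `from_pivot` would be calls to actual translation models.

```python
import random

def random_deletion(tokens, p, rng):
    """Rule-based: drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def synonym_replacement(tokens, synonyms):
    """Rule-based: replace tokens found in a (toy, hand-written) synonym table."""
    return [synonyms.get(t, t) for t in tokens]

def interpolate(x1, y1, x2, y2, lam):
    """Example interpolation (mixup-style): convex combination of two
    feature vectors and of their one-hot labels, with weight lam in [0, 1]."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

def back_translate(text, to_pivot, from_pivot):
    """Model-based: translate to a pivot language and back again.
    to_pivot / from_pivot are placeholders for real translation models."""
    return from_pivot(to_pivot(text))

# Hypothetical word-level "translators", used only so the sketch runs offline.
EN_TO_FR = {"the": "la", "big": "grande", "large": "grande", "house": "maison"}
FR_TO_EN = {"la": "the", "grande": "large", "maison": "house"}

def word_translator(table):
    """Build a toy translator that maps each word through a lookup table."""
    return lambda text: " ".join(table.get(w, w) for w in text.split())

if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the big house on the hill".split()
    print(random_deletion(tokens, 0.3, rng))
    print(synonym_replacement(tokens, {"big": "huge"}))
    print(interpolate([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1], 0.5))
    print(back_translate("the big house",
                         word_translator(EN_TO_FR),
                         word_translator(FR_TO_EN)))
```

The round trip in `back_translate` is what produces a paraphrase: the pivot language collapses "big" and "large" into one word, so the text that comes back can differ from the input while keeping its meaning.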

Applications
The following are some NLP applications where DA methods can help.

  • Low-Resource Languages
    • One solution is to use back translation and self-learning to generate augmented training data.
  • Mitigating Bias
  • Fixing Class Imbalance
  • Few-Shot Learning
  • Adversarial Examples
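
As an illustration of the class imbalance application listed above, the sketch below oversamples minority classes with augmented copies until every class is as large as the largest one. This is an assumed, simplified setup rather than a method taken from the paper: `balance_with_augmentation` and `swap_two_words` are hypothetical names, and the word swap stands in for any augmentation function, including back translation.

```python
import random
from collections import Counter

def swap_two_words(text, rng):
    """Toy augmentation: swap two randomly chosen words in the text."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def balance_with_augmentation(examples, augment, rng):
    """examples: list of (text, label) pairs.
    Appends augmented copies of minority-class texts until every class
    has as many examples as the largest class."""
    counts = Counter(label for _, label in examples)
    target = max(counts.values())
    balanced = list(examples)
    for label, n in counts.items():
        pool = [text for text, lab in examples if lab == label]
        for _ in range(target - n):
            balanced.append((augment(rng.choice(pool), rng), label))
    return balanced

if __name__ == "__main__":
    data = [
        ("great movie overall", "pos"),
        ("really enjoyed this film", "pos"),
        ("a fun watch", "pos"),
        ("boring and far too long", "neg"),
    ]
    balanced = balance_with_augmentation(data, swap_two_words, random.Random(0))
    print(Counter(label for _, label in balanced))
```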

Tasks
The following are some NLP tasks that can benefit from data augmentation.

  • Summarization
    • One paper uses back translation for few-shot abstractive summarization, together with a consistency loss inspired by UDA.
  • Question Answering
    • One paper suggests DA and sampling techniques for domain-agnostic QA, including paraphrasing with back translation.
    • Another paper introduces QANet, which improves performance on SQuAD when combined with back translation.
  • Sequence Tagging Task
  • Parsing Tasks
  • Grammatical Error Correction
  • Neural Machine Translation
  • Data-to-Text NLG
  • Open-Ended & Conditional Generation
  • Dialogue
  • Multimodal Tasks
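
The UDA-inspired consistency loss mentioned under Summarization can be sketched as follows. This is a generic illustration of the idea, assuming a classifier-style output, and not the exact formulation of the cited summarization paper: the model's prediction on the original input is treated as a target, and the prediction on the augmented (for example, back-translated) input is penalized for diverging from it via KL divergence. All function names here are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(logits_original, logits_augmented):
    """Penalize disagreement between predictions on an input and on its
    augmented version. In UDA-style training the original-side prediction
    is held fixed (no gradient flows through it)."""
    p = softmax(logits_original)
    q = softmax(logits_augmented)
    return kl_divergence(p, q)

if __name__ == "__main__":
    # Identical predictions give zero loss; diverging ones give a positive loss.
    print(consistency_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
    print(consistency_loss([2.0, 0.5, -1.0], [0.1, 1.5, 0.3]))
```

Because this term needs no labels, it can be computed on unlabeled text, which is why it pairs naturally with back translation in few-shot settings.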

Challenges & Future Directions
The paper concludes by mentioning current challenges and discussing potential areas for future research in data augmentation within the field of NLP.

  • Dissonance between empirical novelties and theoretical narrative
  • Minimal benefit for pretrained models on in-domain data
  • Multimodal challenges
  • Span-based tasks
  • Working in specialized domains
  • Working with low-resource languages
  • More vision-inspired techniques
  • Self-supervised learning
  • Offline versus online data augmentation
  • Lack of unification
  • Good data augmentation practices

@DelaramRajaei DelaramRajaei changed the title Summary of papers - Adding Back Translation expander A Survey of Data Augmentation Approaches for NLP Jun 19, 2023
@DelaramRajaei DelaramRajaei changed the title A Survey of Data Augmentation Approaches for NLP 2021 IJCNLP A Survey of Data Augmentation Approaches for NLP Jun 19, 2023
@DelaramRajaei DelaramRajaei self-assigned this Jun 19, 2023