Generation Improving Fine Tuning

This repository shares code implementing various fine-tuning methodologies aimed at enhancing the generation capabilities of natural language generation models. There are four methodologies in total: Standard, Auxiliary, Recurrent, and Generative. Detailed descriptions of each methodology, along with their basic setups and performance evaluations on machine translation, are provided below.

Fine-Tuning Strategies

Standard

Standard fine-tuning is the most common method of fine-tuning. It takes the parameters of a pre-trained model and continues the same training process as pre-training, but with a reduced learning rate so that updates make only fine adjustments to the pre-trained weights.
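
The following is a minimal sketch of one such training step in PyTorch. The model(src, trg_input) call convention, the function name, and pad_id are illustrative assumptions, not this repository's actual API; the reduced learning rate would be set on the optimizer passed in.

import torch.nn.functional as F

def standard_step(model, optimizer, src, trg, pad_id=0):
    """One standard fine-tuning step: the same MLE objective as
    pre-training, run with a reduced learning rate, e.g.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)."""
    logits = model(src, trg[:, :-1])              # teacher-forced forward pass
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),      # (batch * len, vocab)
        trg[:, 1:].reshape(-1),                   # gold targets, shifted by one
        ignore_index=pad_id,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()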


Auxiliary

The Auxiliary strategy reduces the risk of exposure bias by using First Token Prediction as an auxiliary training objective alongside Maximum Likelihood Estimation (MLE), which remains the main training objective.
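
A sketch of one plausible implementation follows; First Token Prediction is read here as predicting the first target token from the BOS input alone, and aux_weight is an assumed hyperparameter, not a value taken from this repository.

import torch.nn.functional as F

def auxiliary_step(model, optimizer, src, trg, aux_weight=0.5, pad_id=0):
    # Main objective: ordinary teacher-forced MLE.
    logits = model(src, trg[:, :-1])
    mle_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        trg[:, 1:].reshape(-1),
        ignore_index=pad_id,
    )
    # Auxiliary objective: predict the first target token from BOS alone,
    # penalizing the early-step errors that exposure bias amplifies.
    first_logits = model(src, trg[:, :1])         # decoder input is BOS only
    first_loss = F.cross_entropy(first_logits[:, 0], trg[:, 1])
    loss = mle_loss + aux_weight * first_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()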


Recurrent

The Recurrent approach is a fine-tuning method inspired by Scheduled Sampling for Transformers, in which the decoder's outputs are fed back in as decoder inputs. Unlike the scheduled sampling typically used in RNN Seq2Seq models, which mixes gold and predicted tokens, the key difference is that all decoder inputs are replaced with the decoder's own outputs.
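
The two-pass sketch below illustrates the idea under the same assumed model(src, trg_input) interface: a first pass collects the decoder's predictions without gradients, and a second pass trains with those predictions substituted for every gold decoder input.

import torch
import torch.nn.functional as F

def recurrent_step(model, optimizer, src, trg, pad_id=0):
    # First pass: obtain the decoder's own predictions (no gradients).
    with torch.no_grad():
        preds = model(src, trg[:, :-1]).argmax(dim=-1)
    # Second pass: keep BOS, then replace every decoder input with a
    # prediction (all inputs, not a sampled mix as in RNN-style
    # scheduled sampling).
    dec_input = torch.cat([trg[:, :1], preds[:, :-1]], dim=1)
    logits = model(src, dec_input)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        trg[:, 1:].reshape(-1),
        ignore_index=pad_id,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()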


Generative

The Generative method is a fine-tuning approach that incorporates generation into the training process itself: for a certain proportion of training steps, the decoder input is produced with the same autoregressive generation procedure used at inference time rather than by teacher forcing.
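
A sketch of this mixing follows; gen_ratio, bos_id, and the use of greedy decoding (standing in for whatever search the repository applies at inference) are all assumptions for illustration.

import random
import torch
import torch.nn.functional as F

def generative_step(model, optimizer, src, trg, gen_ratio=0.3,
                    bos_id=1, pad_id=0):
    if random.random() < gen_ratio:
        # Build the decoder input autoregressively, as at inference time.
        dec_input = torch.full_like(trg[:, :1], bos_id)
        with torch.no_grad():
            while dec_input.size(1) < trg.size(1) - 1:
                next_tok = model(src, dec_input)[:, -1].argmax(-1, keepdim=True)
                dec_input = torch.cat([dec_input, next_tok], dim=1)
    else:
        dec_input = trg[:, :-1]                   # ordinary teacher forcing
    logits = model(src, dec_input)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        trg[:, 1:].reshape(-1),
        ignore_index=pad_id,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()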



Setups

Dataset     | Model               | Training
------------|---------------------|---------------------
WMT14 En-De | Transformer Seq2Seq | Number of Epochs: 10



Results

Strategy   | Score | Epoch Time | Avg GPU | Max GPU
-----------|-------|------------|---------|--------
Baseline   |   -   |     -      |    -    |    -
Standard   |   -   |     -      |    -    |    -
Auxiliary  |   -   |     -      |    -    |    -
Recurrent  |   -   |     -      |    -    |    -
Generative |   -   |     -      |    -    |    -



How to Use

Clone the repository to your local environment

git clone https://github.com/moon23k/GIFT.git

Set up the dataset and tokenizer

python3 setup.py

Run the desired process via the run.py file

python3 run.py -mode     [train, finetune, test, inference]
               -strategy [standard (default), auxiliary, recurrent, generative]
               -search   [beam, greedy]
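
For example, to fine-tune the model with the auxiliary strategy and decode with beam search:

python3 run.py -mode finetune -strategy auxiliary -search beam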



Reference

  • Attention Is All You Need (Vaswani et al., 2017)
  • Scheduled Sampling for Transformers (Mihaylova & Martins, 2019)
