General Fine-Tuning: A little language for Deep Nets (ACL-2022 Tutorial)

gft (general fine-tuning): A Little Language for Deepnets

1-line programs for fine-tuning, inference and more

Quick Links

  1. Papers
    1. ACL-2022 Tutorial
    2. JNLE
  2. Videos 📽️
    1. 📽️ 10 minute TEASER
    2. 🆕📽️ First half of ACL-2022 Tutorial (1 hour 16 minutes) UNABRIDGED
  3. Installation
  4. Documentation
  5. ACL-2022 Tutorial

Four Functions and Four Arguments

gft contains 4 main functions:

  1. gft_fit: fit a pretrained model to data (aka fine-tuning)
  2. gft_predict: apply a model to inputs (aka inference)
  3. gft_eval: score a model on a split of a dataset
  4. gft_summary: find popular models and datasets, and explain what they contain
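The examples in this README invoke the four functions as command-line programs, so a quick check after installation is to confirm that they are on the PATH. The sketch below is not part of gft: `check_gft_tools` is a hypothetical helper name, and the only thing it assumes is the four command names listed above.

```shell
# Hypothetical helper (not part of gft): report which of the four
# gft commands can be found on the current PATH.
check_gft_tools() {
    for cmd in gft_fit gft_predict gft_eval gft_summary; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: found"
        else
            echo "$cmd: missing"
        fi
    done
}

check_gft_tools
```

Any line ending in "missing" suggests revisiting the installation instructions before trying the examples.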

These gft functions make use of 4 main arguments (though most arguments in most hubs are also supported):

  1. data: standard datasets hosted on hubs such as HuggingFace and PaddleNLP, or custom datasets hosted on the local filesystem
  2. model: standard models hosted on hubs such as HuggingFace and PaddleNLP, or custom models hosted on the local filesystem
  3. equation: string such as "classify: label ~ text", where classify is a task, and label and text refer to columns in a dataset
  4. task: classify, classify_tokens, classify_spans, classify_audio, classify_images, regress, text-generation, translation, ASR, fill-mask
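To make the shape of an equation concrete, here is a minimal sketch that splits a string such as "classify: label ~ text" into its task, left-hand side, and right-hand side. It is an illustration only, not gft's own parser: `parse_eqn` and `trim` are hypothetical names, and the sketch splits at the first colon and the first tilde, so it assumes the task name contains no colon.

```shell
# Hypothetical helper: strip leading and trailing whitespace.
trim() {
    printf '%s' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Hypothetical helper (not gft's own parser): split an equation of the
# form "task: lhs ~ rhs" and print one part per line.
parse_eqn() {
    eqn=$1
    task=${eqn%%:*}        # text before the first ":"
    rest=${eqn#*:}         # text after the first ":"
    lhs=${rest%%"~"*}      # text before the first "~"
    rhs=${rest#*"~"}       # text after the first "~"
    echo "task=$(trim "$task")"
    echo "lhs=$(trim "$lhs")"
    echo "rhs=$(trim "$rhs")"
}

parse_eqn 'classify: label ~ text'
# prints:
# task=classify
# lhs=label
# rhs=text
```

Read this way, the left-hand side names the column to predict and the right-hand side names the input column, much like a regression formula.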

A Few Simple Examples

Here are some simple examples:

emodel=H:bhadresh-savani/roberta-base-emotion

# Summarize a dataset and/or model
gft_summary --data H:dair-ai/emotion
gft_summary --model $emodel
gft_summary --data H:dair-ai/emotion --model $emodel

# find some popular datasets and models that contain "emotion"
gft_summary --data H:__contains__emotion --topn 5
gft_summary --model H:__contains__emotion --topn 5

# make predictions on inputs from stdin
echo 'I love you.' | gft_predict --task classify

# The default model (for the classification task) performs sentiment analysis
# The model, $emodel, outputs emotion classes (as opposed to POSITIVE/NEGATIVE)
echo 'I love you.' | gft_predict --task classify --model $emodel

# some other tasks (beyond classification)
echo 'I love New York.' | gft_predict --task H:token-classification
echo 'I <mask> you.' | gft_predict --task H:fill-mask

# make predictions on inputs from a split of a standard dataset
gft_predict --eqn 'classify: label ~ text' --model $emodel --data H:dair-ai/emotion --split test

# return a single score (as opposed to a prediction for each input)
gft_eval --eqn 'classify: label ~ text' --model $emodel --data H:dair-ai/emotion --split test

# Input a pre-trained model (bert) and output a post-trained model
# ($outdir must already be set to the desired output directory)
gft_fit --eqn 'classify: label ~ text' \
	--model H:bert-base-cased \
	--data H:dair-ai/emotion \
	--output_dir $outdir
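The gft_fit example reads `$outdir`, which the examples above never assign. A minimal sketch of one way to set it before running the command follows. The default path is a hypothetical choice, not a gft convention, and this README does not say whether gft_fit needs the directory to exist beforehand, so creating it is only a cautious assumption.

```shell
# Hypothetical default location (an assumption, not a gft convention);
# a value of outdir already set in the environment takes precedence.
outdir="${outdir:-${TMPDIR:-/tmp}/gft_emotion_model}"

# Create the directory up front so a bad path fails early.
mkdir -p "$outdir"
echo "gft_fit output directory: $outdir"
```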

Pre-Training, Fine-Tuning and Inference

The table below shows a 3-step recipe, which has become standard in the literature on deep nets.

Step  gft Support  Description   Time             Hardware
1     -            Pre-Training  Days/Weeks       Large GPU Cluster
2     gft_fit      Fine-Tuning   Hours/Days       1+ GPUs
3     gft_predict  Inference     Seconds/Minutes  0+ GPUs

This repo provides support for step 2 (gft_fit) and step 3 (gft_predict). Most gft_fit and gft_predict programs are short (1-line), much shorter than conventional example scripts for these steps, which are typically a few hundred lines of Python. With gft, users should not need to read or modify any Python code for steps 2 and 3 in the table above.

Step 1, pre-training, is beyond the scope of this work. We recommend starting with models from HuggingFace and PaddleHub/PaddleNLP hubs, as illustrated in the examples above.

Citations, Documentation, etc.

The papers are cited below: the ACL-2022 tutorial abstract and the Natural Language Engineering (JNLE) article.

@inproceedings{church-etal-2022-gentle,
    title = "A Gentle Introduction to Deep Nets and Opportunities for the Future",
    author = "Church, Kenneth  and
      Kordoni, Valia  and
      Marcus, Gary  and
      Davis, Ernest  and
      Ma, Yanjun  and
      Chen, Zeyu",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-tutorials.1",
    pages = "1--6",
    abstract = "The first half of this tutorial will make deep nets more accessible to a broader audience, following {``}Deep Nets for Poets{''} and {``}A Gentle Introduction to Fine-Tuning.{''} We will also introduce GFT (general fine tuning), a little language for fine tuning deep nets with short (one line) programs that are as easy to code as regression in statistics packages such as R using glm (general linear models). Based on the success of these methods on a number of benchmarks, one might come away with the impression that deep nets are all we need. However, we believe the glass is half-full: while there is much that can be done with deep nets, there is always more to do. The second half of this tutorial will discuss some of these opportunities.",
}

@article{church-etal-2022-gft,
    title = {Emerging trends: General fine-tuning (gft)},
    author = {Church, Kenneth and Cai, Xingyu and Ying, Yibiao and Chen, Zeyu and Xun, Guangxu and Bian, Yuchen},
    journal = {Natural Language Engineering},
    publisher = {Cambridge University Press},
    year = {2022},
    pages = {1--17},
    doi = {10.1017/S1351324922000237},
}
