This introduction notebook is featured in Abhishek Thakur's Talks #3 webinar on YouTube.
You can find the original T5 paper here.
- Introduce T5 and how it works
- Explain T5's significance for the future of NLP
- Illustrate how to use T5 for Sentiment Span Extraction
T5 is a recently released encoder-decoder model that reaches SOTA results by solving NLP problems with a text-to-text approach, where text is used as both the input and the output for all types of tasks. It was introduced in the recent paper, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (paper). I've been deeply interested in this model since the moment I read about it.
I believe that the combination of text-to-text as a universal interface for NLP tasks, paired with multi-task learning (a single model learning multiple tasks), will have a huge impact on how NLP deep learning is applied in practice.
In this presentation I aim to give a brief overview of T5, explain some of its implications for NLP in industry, and demonstrate how it can be used for sentiment span extraction on tweets. I hope this material helps you guys use T5 for your own purposes!
- Treats each NLP problem as a “text-to-text” problem - input: text, output: text
- Unified approach for NLP deep learning - since the task is reflected purely in the text input and output, you can use the same model, objective, training procedure, and decoding process for ANY task. The same framework covers Q&A, summarization, etc. (see the code sketch after this list)
- Multiple NLP tasks can live in the same model - e.g. Q&A, semantic similarity, etc. However, there is a problem called task interference, where good results on one task can come at the cost of worse results on another. E.g., a good summarizer may be bad at Q&A and vice versa. All of the tasks above live in the same model, which is how the released T5 checkpoints (t5-small, t5-base, etc.) work.
- New dataset: “Colossal Clean Crawled Corpus” (C4) - a dataset consisting of ~750GB of clean English text scraped from the web. It was created from a month of data from the Common Crawl corpus, cleaned with a set of heuristics to filter out "unhelpful" text (e.g. offensive language, placeholder text, source code). This is a lot larger than the 13GB of data used for BERT and the 126GB of data used for XLNet.
- A simple denoising objective was used for pre-training - essentially masked language modelling, but contiguous masked tokens are treated as a single “span” to predict, and the final prediction is an actual text sequence containing the answer for each span (marked by “sentinel tokens”). This was compared to a language modeling pre-training objective and results consistently improved. (The input/target format is illustrated in the sketch after this list.)
- A full encoder-decoder Transformer architecture is used - in contrast to previous models that were either encoder-only (e.g. BERT) or decoder-only (e.g. GPT-2). This was found to be effective for both generation and classification tasks.
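To make the text-to-text interface concrete, here is a minimal sketch using the Hugging Face transformers library (my own illustration, not code from the original notebook): the same pre-trained t5-base checkpoint handles different tasks purely through a task prefix in the input text, and the comments at the end show the span-corruption input/target format from the paper.

```python
# Minimal sketch - assumes the Hugging Face `transformers` library is installed.
# The same pre-trained T5 model handles different tasks purely via the text prefix.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def t5_generate(text, max_length=64):
    # Text in -> text out: encode the prompt, generate, decode the output tokens.
    input_ids = tokenizer.encode(text, return_tensors="pt")
    output_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Different tasks, same model, same decoding procedure - only the prefix changes.
print(t5_generate("translate English to German: The house is wonderful."))
print(t5_generate("summarize: " + "studies have shown that owning a dog is good for you. " * 5))

# The span-corruption pre-training objective uses the same text-to-text format
# (example from the T5 paper):
#   original: "Thank you for inviting me to your party last week."
#   input:    "Thank you <X> me to your party <Y> week."
#   target:   "<X> for inviting <Y> last <Z>"
```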
Multiple NLP tasks can be learned by a single model since every NLP problem can be represented in a unified way - as a controllable text generation problem.
Increased adoption of multi-task models like T5 due to SOTA accuracy paired with lower time, compute, & storage costs for both deployments and experiments in NLP.
- This is a dataset from an existing Kaggle competition - Tweet Sentiment Extraction
- Most of the existing model implementations use some sort of token classification approach
- The indices of the beginning and ending tokens are predicted and used to extract the span
- With T5, the approach is purely generative, like a classic language modelling task
- This is similar to abstractive summarization, translation, and text generation in general
- For our data, the span is not extracted by predicting indices, but by generating the span from scratch (see the sketch below)
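For example, a tweet and its sentiment can be packed into a single input string, with the selected span as the target text. This is only an illustrative sketch - the prompt template, field names, and example text here are my own assumptions, not the notebook's actual formatting:

```python
# Illustrative sketch of framing sentiment span extraction as text-to-text.
# The exact prompt template used in the full notebook may differ.
def make_example(tweet, sentiment, selected_text=None):
    # Input: task description + sentiment + tweet, all as plain text.
    source = f"extract sentiment span: sentiment: {sentiment} tweet: {tweet}"
    # Target: the span itself, generated from scratch rather than via start/end indices.
    target = selected_text
    return source, target

source, target = make_example(
    tweet="my boss is bullying me...",
    sentiment="negative",
    selected_text="bullying me",
)
print(source)  # extract sentiment span: sentiment: negative tweet: my boss is bullying me...
print(target)  # bullying me
```

At inference time the source string is tokenized and passed to model.generate, and the decoded output is taken directly as the predicted span - the same decoding loop as in the earlier sketch.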
The rest of the tutorial (including the code) can be found in the T5 introduction notebook.