# PromptSource 

PromptSource provides an IDE, hosted on Hugging Face, for developing and reviewing prompts for Hugging Face datasets, and a toolkit (on GitHub) for using the prompts that have been developed.


* [PromptSource Github](https://github.com/bigscience-workshop/promptsource)

> PromptSource is a toolkit for creating, sharing and using natural language prompts.
> 
> Recent work has shown that large language models exhibit the ability to perform reasonable zero-shot generalization to new tasks. For instance, GPT-3 demonstrated that large language models have strong zero- and few-shot abilities. FLAN and T0 then demonstrated that pre-trained language models fine-tuned in a massively multitask fashion yield even stronger zero-shot performance. A common denominator in these works is the use of prompts, which has gained interest among NLP researchers and engineers. This emphasizes the need for new tools to create, share and use natural language prompts.
> 
> Prompts are functions that map an example from a dataset to a natural language input and target output. PromptSource contains a growing collection of prompts (which we call P3: Public Pool of Prompts). As of January 20, 2022, there are ~2'000 English prompts for 170+ English datasets in P3.
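
To make "prompts are functions" concrete, here is a minimal sketch in plain Python that does not use PromptSource at all. The function name `sentiment_prompt`, the wording of the question, and the label order in `ANSWER_CHOICES` are assumptions made for illustration only.

```python
from typing import Any, Dict, Tuple

# Assumed label order for illustration: 0 -> negative, 1 -> positive.
ANSWER_CHOICES = ["negative", "positive"]


def sentiment_prompt(example: Dict[str, Any]) -> Tuple[str, str]:
    """Map one dataset example to a (natural language input, target) pair."""
    prompt_input = f"{example['text']}\nIs this review positive or negative?"
    target = ANSWER_CHOICES[example["label"]]
    return prompt_input, target


# A hand-written example row with the same fields as a sentiment dataset.
example = {"text": "a charming film", "label": 1}
prompt_input, target = sentiment_prompt(example)
print(prompt_input)
print(target)
```

The same function applies to every row of the dataset, which is why a prompt is attached to a dataset rather than to a single example.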

* [API_DOCUMENTATION](https://github.com/bigscience-workshop/promptsource/blob/main/API_DOCUMENTATION.md)

> PromptSource implements 4 classes to store, manipulate and use prompts and their metadata: ```Template```, ```Metadata```, ```DatasetTemplates``` and ```TemplateCollection```. All of them are implemented in templates.py
> ### Class DatasetTemplates
> DatasetTemplates is a class that wraps all the prompts (each of them are instances of Template) for a specific dataset/subset and implements all the helper functions necessary to read/write to the YAML file in which the prompts are saved.

* [PromptSource - an IDE and repository for natural language prompts](https://www.youtube.com/watch?v=gIthK9J52IM)

> The **Public Pool of Prompts (P3)** gathered with PromptSource (as of September 2022) includes more than 2000 prompts spanning 180 datasets.  
> 
> It's useful to note that creating prompts is quite different to traditional NLP annotation in several ways: prompts are functions (not labels), they apply to datasets (not examples) and variation between prompts is desirable (rather than a nuisance). PromptSource proposes a simple workflow to meet these challenges. It also works well with 🤗 Datasets so it can be applied to a wide range of existing datasets.
> 
> To support flexible prompt editing, the Jinja2 template engine is used. This is a bit more flexible than a rule-based approach but easier to analyse than pure Python code.
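
The trade-off described above can be illustrated with a toy example. The sketch below is not Jinja2: it is a deliberately tiny stand-in renderer that only substitutes `{{ name }}` placeholders, written with the standard library so it runs anywhere. The template text, the variable names and the helper functions `render` and `placeholders` are hypothetical; the `|||` separator between input and target follows the convention PromptSource templates use.

```python
import re
from typing import Any, Dict, List

# Hypothetical template text: Jinja-style placeholders, with "|||"
# separating the input from the target.
TEMPLATE = "{{ text }}\nIs this review positive or negative? ||| {{ answer }}"

# Matches placeholders such as {{ text }} or {{answer}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: Dict[str, Any]) -> List[str]:
    """Toy renderer: substitute {{ name }} placeholders, then split on '|||'."""
    filled = PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)
    return [part.strip() for part in filled.split("|||")]


def placeholders(template: str) -> List[str]:
    """Static inspection: list the variables a template depends on."""
    return PLACEHOLDER.findall(template)


rendered = render(TEMPLATE, {"text": "a charming film", "answer": "positive"})
print(rendered)
print(placeholders(TEMPLATE))
```

Because the prompt is data rather than arbitrary code, a tool can inspect it (as `placeholders` does here) without executing anything, which is the sense in which templates are easier to analyse than pure Python. Real Jinja2 templates additionally support expressions, filters and conditionals.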

* [2022-09-promptsource.pdf](./2022-09-promptsource_ide_for_nlp.pdf)

<img src="./image/prompt_source_usage.png" align="left" width=500/>

# PromptSource UI @ HuggingFace 

Prompts developed for the Hugging Face [Amazon US Reviews dataset](https://huggingface.co/datasets/amazon_us_reviews) (wireless category) for multi-label classification.

* [bigscience/promptsource](https://huggingface.co/spaces/bigscience/promptsource)

PromptSource provides multiple templates for a dataset (select one in the **prompt name** box), covering different ML tasks, e.g. multi-label classification and summarization.

<img src="./image/huggngface_promptsource.png" align="left"/>

### Example prompt

<img src="./image/huggingface_promptsource_example.png" align="left" width=750/>

# Installation

In [1]:
!pip install promptsource --quiet

# Huggingface Dataset for Prompts

In [1]:
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

In [2]:
DATASET_NAME: str = "rotten_tomatoes"

In [3]:
train = load_dataset(DATASET_NAME, split="train")
example = train[0]

Found cached dataset rotten_tomatoes (/Users/oonisim/.cache/huggingface/datasets/rotten_tomatoes/default/1.0.0/40d411e45a6ce3484deed7cc15b82a53dad9a72aafd9f86f8f227134bec5ca46)


# Prompt Templates for the Huggingface Dataset

In [4]:
templates = DatasetTemplates(
    # dataset_name must be a Hugging Face dataset name known to PromptSource.
    # Open question: how can PromptSource be used with a new dataset?
    dataset_name=DATASET_NAME   
)  
templates.all_template_names

['Movie Expressed Sentiment',
 'Movie Expressed Sentiment 2',
 'Reviewer Enjoyment',
 'Reviewer Enjoyment Yes No',
 'Reviewer Expressed Sentiment',
 'Reviewer Opinion bad good choices',
 'Reviewer Sentiment Feeling',
 'Sentiment with choices ',
 'Text Expressed Sentiment',
 'Writer Expressed Sentiment']

In [5]:
template = templates['Sentiment with choices ']
template.get_name()

'Sentiment with choices '

In [6]:
type(template)

promptsource.templates.Template

# Prompts

In [9]:
prompts = template.apply(example)

In [10]:
print(prompts[0])

the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal . 
Is this review positive or negative?


In [11]:
prompts

['the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal . \nIs this review positive or negative?',
 'positive']

In [12]:
template.apply(train[:2])

['[\'the rock is destined to be the 21st century\\\'s new " conan " and that he\\\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .\', \'the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson\\\'s expanded vision of j . r . r . tolkien\\\'s middle-earth .\'] \nIs this review positive or negative?',
 '']
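
The last call does not produce one prompt per review. Slicing a 🤗 Datasets `Dataset` (`train[:2]`) returns a dictionary of columns rather than a list of rows, so the template renders the whole list of reviews as a single string and the target comes out empty. A minimal sketch of one way around this is to turn the batch back into rows and apply the template to each row. The helpers `rows_from_batch` and `apply_per_example` and the stand-in `fake_apply` are hypothetical names introduced for illustration; `fake_apply` only imitates the `[input, target]` return value seen above so that the sketch runs without PromptSource installed.

```python
from typing import Any, Callable, Dict, List


def rows_from_batch(batch: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a column-oriented batch {"col": [v0, v1]} into a list of row dicts."""
    columns = list(batch.keys())
    return [dict(zip(columns, values)) for values in zip(*batch.values())]


def apply_per_example(
    apply_fn: Callable[[Dict[str, Any]], List[str]],
    batch: Dict[str, List[Any]],
) -> List[List[str]]:
    """Call apply_fn once per row instead of once on the whole batch."""
    return [apply_fn(row) for row in rows_from_batch(batch)]


def fake_apply(example: Dict[str, Any]) -> List[str]:
    """Hypothetical stand-in for template.apply, for illustration only."""
    choices = ["negative", "positive"]  # assumed label order: 0, 1
    return [
        f"{example['text']} \nIs this review positive or negative?",
        choices[example["label"]],
    ]


# Same shape as the result of slicing a dataset: a dict of columns.
batch = {"text": ["good film", "dull plot"], "label": [1, 0]}
results = apply_per_example(fake_apply, batch)
for prompt_input, target in results:
    print(repr(prompt_input), "->", target)
```

With the objects from this notebook, the equivalent call would be `apply_per_example(template.apply, train[:2])`. Iterating over rows directly, for example `[template.apply(ex) for ex in train.select(range(2))]`, should give the same result.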