# 🤗 Huggingface x Promptify 🚀

In [1]:
import json

from promptify import HubModel, Prompter

### Table of contents:
- **A.** First example: binary classification
- **B.** Another example: multiclass classification
- **C.** Play with options and parameters
- **D.** Production-ready API using Inference Endpoints

# A. First example: binary classification

## 1. Initialize HubModel and Prompter

`HubModel` checks that the model exists on the 🤗 Hub and is a text2text-generation model.

You can browse https://huggingface.co/models to find a model that suits your needs. The default is [`google/flan-t5-xl`](https://huggingface.co/google/flan-t5-xl), a popular text2text-generation model fine-tuned on more than 1000 additional tasks covering 60 languages.

In [2]:
model = HubModel(model_id_or_url="google/flan-t5-xl")
prompter = Prompter(model)

## 2. Load samples from a binary classification dataset

In [3]:
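# Each sample is a dict with (at least) a "text" and a "labels" field
with open("data/binary.json", "r") as f:
    binary_examples = json.load(f)
print("Got", len(binary_examples), "binary classification examples.")

# Use the first two samples as few-shot examples, as (text, label) tuples.
# Note: these two samples are also evaluated in section 4.2.
prompt_examples = [(sample["text"], sample["labels"]) for sample in binary_examples[:2]]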

Got 9 binary classification examples.



Here is the first sample. It contains a `"text"` field and a `"labels"` field, plus `"score"` and `"complexity"` fields that are empty here.

In [4]:
binary_examples[0]

{'text': 'Eight years the republicans denied obama’s picks. Breitbarters outrage is as phony as their fake president.',
 'labels': 'negative',
 'score': '',
 'complexity': ''}

## 3. (optional) Generate a prompt for binary classification

In [5]:
print(prompter.generate_prompt(
    "binary_classification.jinja",
    label_0="positive",
    label_1="negative",
    examples=prompt_examples,
    text_input="Amazing customer service.",
    description="Binary Classification System",
))

Binary Classification System
You are a highly intelligent and accurate Binary Classification system. You take Passage as input and classify that as either positive or negative Category. Your output format is only [{'C':Category}] form, no other form.

Examples:

Input: Eight years the republicans denied obama’s picks. Breitbarters outrage is as phony as their fake president.
Output: [{'C': 'negative' }]

Input: Except he’s the most successful president in our lifetimes. He’s undone most of the damage Obummer did and set America on the right path again.
Output: [{'C': 'positive' }]

Input: Amazing customer service.
Output:
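
The prompt printed above is a few-shot prompt: an instruction, the labelled examples, then the new passage followed by an empty `Output:` for the model to complete. A minimal plain-Python sketch that reproduces this layout, assuming the structure shown in that output; `build_binary_prompt` is a hypothetical helper and is not Promptify's Jinja template.

```python
from typing import List, Tuple


def build_binary_prompt(
    description: str,
    label_0: str,
    label_1: str,
    examples: List[Tuple[str, str]],
    text_input: str,
) -> str:
    """Assemble a few-shot binary classification prompt (illustrative layout only)."""
    lines = [
        description,
        "You are a highly intelligent and accurate Binary Classification system. "
        f"You take Passage as input and classify that as either {label_0} or {label_1} Category. "
        "Your output format is only [{'C':Category}] form, no other form.",
        "",
        "Examples:",
        "",
    ]
    # Each labelled example becomes an Input/Output pair in the requested format
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: [{{'C': '{label}' }}]")
        lines.append("")
    # The new passage comes last, with an empty "Output:" for the model to complete
    lines.append(f"Input: {text_input}")
    lines.append("Output:")
    return "\n".join(lines)
```

Keeping the examples in exactly the output format you later parse makes post-processing easier, since the model tends to imitate that format.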


## 4. Make predictions!

### 4.1 Define your `predict` method

- Takes a text as input
- Generates a prompt (using `"binary_classification.jinja"` template and some examples)
- Sends a request to the HF Inference API
- Searches for a label in the output

Note that post-processing is left for you to define. For classification tasks, looking for the label in the output can be enough. For more complex tasks (like NER), the output needs more careful post-processing.
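For outputs that are less tidy than a bare label, a minimal sketch of sturdier post-processing, assuming the model follows (or partially follows) the `[{'C': ...}]` format requested by the prompt. `extract_label` is a hypothetical helper, not part of Promptify: it first tries to parse that structure with `ast.literal_eval`, then falls back to whole-word matching so that, for example, `joy` is not detected inside `enjoy`.

```python
import ast
import re
from typing import Dict, Iterable


def extract_label(output: str, labels: Iterable[str], default: str = "unknown") -> str:
    """Pick a label from raw model output (hypothetical helper, not part of Promptify)."""
    candidates = list(labels)
    lookup: Dict[str, str] = {label.lower(): label for label in candidates}

    # 1) Try the structured format the prompt asks for, e.g. [{'C': 'negative'}]
    match = re.search(r"\[\s*\{.*?\}\s*\]", output, flags=re.DOTALL)
    if match:
        try:
            parsed = ast.literal_eval(match.group(0))
            value = str(parsed[0].get("C", "")).strip().lower()
            if value in lookup:
                return lookup[value]
        except (ValueError, SyntaxError, IndexError, KeyError, AttributeError, TypeError):
            pass  # malformed structure: fall back to a plain-text search

    # 2) Fall back to whole-word, case-insensitive matching; the earliest match wins
    positions: Dict[str, int] = {}
    for label in candidates:
        found = re.search(rf"\b{re.escape(label)}\b", output, flags=re.IGNORECASE)
        if found:
            positions[label] = found.start()
    if positions:
        return min(positions, key=positions.get)
    return default
```

Assuming the model output is a string, `predict` could return `extract_label(output, ["positive", "negative"])` instead of the two `in` checks.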

In [6]:
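def predict(text: str) -> str:
    # Render the few-shot prompt and send it to the model in a single call
    output = prompter.fit(
        "binary_classification.jinja",
        label_0="positive",
        label_1="negative",
        examples=prompt_examples,
        text_input=text,
    )
    # Naive post-processing: check whether either label appears in the output
    if "positive" in output:
        return "positive"
    elif "negative" in output:
        return "negative"
    return "unknown"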

### 4.2 Test it!

In [7]:
for item in binary_examples:
    prediction = predict(item["text"])
    print("\n")
    print(item["text"])
    print("Expected:   ", item["labels"])
    print("Prediction: ", prediction, "✅" if prediction == item["labels"] else "❌")



Eight years the republicans denied obama’s picks. Breitbarters outrage is as phony as their fake president.
Expected:    negative
Prediction:  negative ‚úÖ


Except he’s the most successful president in our lifetimes. He’s undone most of the damage Obummer did and set America on the right path again.
Expected:    positive
Prediction:  positive ‚úÖ


So disappointed in wwe summerslam! I want to see john cena wins his 16th title
Expected:    negative
Prediction:  negative ‚úÖ


Looking forward to going to Carrow Rd tonight. Last time we were there, Bale scored 2 and we were 3rd. Do not want extra time though
Expected:    positive
Prediction:  positive ‚úÖ


It's a good day at work when you get to shake Jim Lehrer's hand. Thanks, @user Still kicking myself for being to shy to hug
Expected:    positive
Prediction:  positive ‚úÖ


Trumpism likewise rests on a bed of racial resentment that was made knowingly and intentionally, long before Trump got into politics.
Expected:    ne
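
Reading the per-sample checks line by line gets tedious as the dataset grows. A small sketch that reduces a run to a single score; the `accuracy` helper is hypothetical and not part of Promptify.

```python
from typing import Sequence


def accuracy(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the expected label."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)
```

In this notebook you could pass `[item["labels"] for item in binary_examples]` and the matching `predict(item["text"])` results. Keep in mind that the first two samples are also the few-shot examples, so they inflate the score.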

# B. Another example: multiclass classification

## 1. Initialize HubModel and Prompter

Already done in part A.

## 2. Load samples from a multiclass classification dataset

In [8]:
with open("data/multiclass.json", "r") as f:
    multiclass_examples = json.load(f)
print("Got", len(multiclass_examples), "multiclass classification examples.")
# Collect the distinct categories present in the dataset
labels = {sample["category"] for sample in multiclass_examples}
print("Labels are:", labels)

Got 10 multiclass classification examples.
Labels are: {'surprise', 'worry', 'joy', 'hate', 'neutral', 'sadness'}
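
`labels` is a Python `set`. The iteration order of a set of strings depends on string hashing, which Python randomizes per interpreter session (see `PYTHONHASHSEED`), so the category order printed above, and therefore the order inside the prompt, can differ between runs. A hedged sketch that sorts the labels once so the prompt stays reproducible; `stable_labels` is hypothetical, and whether the Promptify template renders a list the same way it renders a set is an assumption worth checking.

```python
from typing import Iterable, List


def stable_labels(samples: Iterable[dict], key: str = "category") -> List[str]:
    """Collect the distinct labels from `samples` in alphabetical order."""
    return sorted({sample[key] for sample in samples})
```

For example, `labels = stable_labels(multiclass_examples)` would replace the set built above.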


## 3. (optional) Generate a prompt for multiclass classification

In [9]:
print(
    prompter.generate_prompt(
        "multiclass_classification.jinja",
        labels=labels,
        text_input=multiclass_examples[0]["text"],
    )
)

You are a highly intelligent and accurate Multiclass Classification system. You take Passage as input and classify that as one of the following appropriate Categories:
{'surprise', 'worry', 'joy', 'hate', 'neutral', 'sadness'}
Your output format is only [{{'C': Appropriate Category from the list of provided Categories}}] form, no other form.

Input: I ate Something I don't know what it is... Why do I keep Telling things about food
Output:


## 4. Make predictions!

### 4.1 Define your `predict` method

- Takes a text as input
- Generates a prompt (using `"multiclass_classification.jinja"` template and your labels)
- Sends a request to the HF Inference API
- Searches for a label in the output

Note that post-processing is left for you to define. For classification tasks, looking for the label in the output can be enough. For more complex tasks (like NER), the output needs more careful post-processing.

In [10]:
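def predict(text: str) -> str:
    # Render the prompt with the candidate labels and query the model
    output = prompter.fit("multiclass_classification.jinja", labels=labels, text_input=text)
    # Return the first label found in the output; sets are unordered, so if several
    # labels appear, which one wins is arbitrary
    for label in labels:
        if label in output:
            return label
    return "unknown"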

### 4.2 Test it!

In [11]:
for item in multiclass_examples:
    prediction = predict(item["text"])
    print("\n")
    print(item["text"])
    print("Expected:   ", item["category"])
    print("Prediction: ", prediction, "✅" if prediction == item["category"] else "❌")



I ate Something I don't know what it is... Why do I keep Telling things about food
Expected:    worry
Prediction:  neutral ‚ùå


Here's to the start of a great adventure. Niners today, Alaska tomorrow.
Expected:    joy
Prediction:  neutral ‚ùå


It is so annoying when she starts typing on her computer in the middle of the night!
Expected:    hate
Prediction:  hate ‚úÖ


Chocolate milk is so much better through a straw. I lack said straw
Expected:    neutral
Prediction:  neutral ‚úÖ


I want to buy this great album but unfortunately i dont hav enuff funds  its &quot;long time noisy&quot;
Expected:    sadness
Prediction:  neutral ‚ùå


dont wanna work 11-830 tomorrow  but i get paid
Expected:    sadness
Prediction:  neutral ‚ùå


Oh no one minute too late! Oh well
Expected:    worry
Prediction:  neutral ‚ùå


2 days of this month left, and I only have 400MB left on my onpeak downloads.
Expected:    surprise
Prediction:  neutral ‚ùå


my last tweet didn't send  bad phone
Expected:    ne
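
In the run above, most errors are predictions of `neutral`. A small hypothetical helper (not part of Promptify) that tallies `(expected, predicted)` pairs for the wrong answers makes this kind of systematic bias easier to spot than scanning individual results.

```python
from collections import Counter
from typing import Sequence


def confusion_counts(expected: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count (expected, predicted) pairs for the misclassified samples only."""
    return Counter((e, p) for e, p in zip(expected, predicted) if e != p)
```

Calling `.most_common()` on the result lists the most frequent confusions first.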

# C. Play with options and parameters

## Model parameters

Your results might not be satisfying on the first try. You can adjust the model parameters to try to get better output. Text generation models rely on the `transformers` library, in particular its "text-generation" pipeline. A detailed list of all parameters supported by this pipeline can be found on the [🤗 text generation doc page](https://huggingface.co/docs/api-inference/detailed_parameters#text-generation-task).

Here is the list of parameters you can use, with their types and default values:
```py
top_k: Optional[int] = None
top_p: Optional[float] = None
temperature: float = 1.0
repetition_penalty: Optional[float] = None
max_new_tokens: Optional[int] = None
max_time: Optional[float] = None
num_return_sequences: int = 1
do_sample: bool = True
```

For classification tasks, these parameters may not matter much. However, for tasks like summarization, you might want to tune parameters such as `max_new_tokens`.
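If you tweak these values often, it can help to keep them in one place. A minimal sketch, assuming you want a plain dictionary with only the values you actually set; the `GenerationParameters` class is hypothetical, and how you then hand the values to `model.run` depends on Promptify's API, which is not shown here.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class GenerationParameters:
    """Mirror of the text-generation parameters listed above (illustrative only)."""

    top_k: Optional[int] = None
    top_p: Optional[float] = None
    temperature: float = 1.0
    repetition_penalty: Optional[float] = None
    max_new_tokens: Optional[int] = None
    max_time: Optional[float] = None
    num_return_sequences: int = 1
    do_sample: bool = True

    def to_dict(self) -> Dict[str, Any]:
        # Drop unset (None) values so the server-side defaults apply
        return {key: value for key, value in asdict(self).items() if value is not None}
```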

## Options

In addition to the pipeline parameters, you can set Inference API options. In particular:
- `wait_for_model` (bool, defaults to `True`): whether to wait for the model to be loaded in the Inference API. Popular models are often already loaded, but less common models have to be warmed up before you can use them.
- `use_cache` (bool, defaults to `True`): the Inference API has a cache layer that speeds up requests it has already seen. Most models can reuse those cached results as-is because they are deterministic (the results would be the same anyway). However, if you use a non-deterministic model, set this parameter to `False` to bypass the cache and get a genuinely new result.
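Parameters and options travel in the same request. The Inference API documentation describes a JSON body with `inputs`, `parameters`, and `options` keys. A minimal sketch of assembling that body, assuming you call the API directly rather than through `HubModel`; `build_payload` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_payload(
    inputs: str,
    parameters: Optional[Dict[str, Any]] = None,
    wait_for_model: bool = True,
    use_cache: bool = True,
) -> Dict[str, Any]:
    """Build a request body in the {inputs, parameters, options} shape of the Inference API."""
    payload: Dict[str, Any] = {
        "inputs": inputs,
        "options": {"wait_for_model": wait_for_model, "use_cache": use_cache},
    }
    # Only send parameters when some were given, so server defaults apply otherwise
    if parameters:
        payload["parameters"] = parameters
    return payload
```

Serialize the result with `json.dumps` before sending it as the request body.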

In [12]:
# In this example, if the model had not already been loaded, an HTTPError (503 Service Unavailable)
# would have been raised. Also, the output is not deterministic, so the result might change between reruns.

model.run(
    prompts="What is the sum of four and five?",
    wait_for_model=False,
    use_cache=False,
)

['9']
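
With `wait_for_model=False`, a model that is not loaded yet answers with a 503 instead of blocking. If you keep that setting, you may want to retry on the client side. A minimal sketch with exponential backoff; `call_with_retry` is hypothetical, and it assumes the raised error exposes the status code either as `.code` (as `urllib.error.HTTPError` does) or as `.response.status_code` (as `requests.HTTPError` does). Which exception class Promptify actually raises is not confirmed here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def _status_code(error: Exception) -> Optional[int]:
    # urllib.error.HTTPError exposes `.code`; requests.HTTPError exposes `.response.status_code`
    code = getattr(error, "code", None)
    if code is None:
        response = getattr(error, "response", None)
        code = getattr(response, "status_code", None)
    return code


def call_with_retry(call: Callable[[], T], attempts: int = 5, base_delay: float = 2.0) -> T:
    """Retry `call` while it fails with HTTP 503, doubling the wait each time."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception as error:
            # Re-raise anything that is not a 503, and the 503 of the last attempt
            if _status_code(error) != 503 or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

Usage would look like `call_with_retry(lambda: model.run(prompts="...", wait_for_model=False))`.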

# D. Production-ready API using Inference Endpoints

The ü§ó Inference API is a free-to-use tool to quickly try a large panel of open-source models. It contains a large free-tier plan to play with it but is not suitable for production purposes. The solution for that is ü§ó Inference Endpoints which is a secure production-ready product to easily deploy any Transformers, Sentence-Transformers and Diffusion models hosted on the Hub on a dedicated infrastructure managed by Hugging Face. To get to know more about it and start your first Endpoint, check out the [documentation](https://huggingface.co/docs/inference-endpoints/index).

Once your Inference Endpoint is deployed, you get a URL exposing an API for your model. This URL can be passed directly to `HubModel`, so switching from the free Inference API to a production-ready solution takes a single line of code!
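Because the same argument accepts either a Hub model ID or a full URL, switching to an endpoint only changes that string. A hedged sketch of how such an argument could be resolved and turned into an authenticated request; `resolve_url` and `build_request` are hypothetical, the Bearer-token header is the standard Hugging Face authentication scheme, and the serverless URL prefix is taken from the classic Inference API and may have changed since.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed prefix of the classic serverless Inference API
INFERENCE_API = "https://api-inference.huggingface.co/models/"


def resolve_url(model_id_or_url: str) -> str:
    """Pass endpoint URLs through unchanged; map Hub model IDs to the Inference API."""
    if model_id_or_url.startswith(("http://", "https://")):
        return model_id_or_url
    return INFERENCE_API + model_id_or_url


def build_request(model_id_or_url: str, api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        resolve_url(model_id_or_url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` is left out on purpose, so the sketch runs without network access or a real token.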

In [13]:
model = HubModel("https://endpoint-id.region.vendor.endpoints.huggingface.cloud", api_key="hf_***")

# model.run("My text input", ...)
# Prompter(model).fit(...)