# Quickstart

## Installation

In [None]:
%pip install "fastrepl==0.0.6"

You can find all releases [here](https://pypi.org/project/fastrepl).

## Setup FastREPL

In [2]:
# When using it in a script
import fastrepl

# When using it in a notebook
import fastrepl.repl as fastrepl

## Prepare Dataset

We will use [daily_dialog](https://huggingface.co/datasets/daily_dialog) from Huggingface.

In [None]:
import re
from datasets import load_dataset

ds = load_dataset("daily_dialog", split="test")
ds = ds.shuffle(4)
ds = ds.select(range(10))


def clean(text):
    return re.sub(r"\s+([,.'!?])", r"\1", text.strip())


def get_input(row):
    msgs = [clean(msg) for msg in row["dialog"]]
    row["sample"] = "\n".join(msgs)
    row["context"] = ""

    return row


ds = ds.map(get_input, remove_columns=["dialog", "act", "emotion"])
ds

Map:   0%|          | 0/30 [00:00<?, ? examples/s]

Dataset({
    features: ['input'],
    num_rows: 30
})

## Define Evaluator

Here, we are doing simple classifiction, but there are two interesting points.

1. You can pass nearly any model for evaluation. (Thanks to [LiteLLM](https://github.com/BerriAI/litellm)).
2. **`fastrepl` enhances accuracy by reducing [bias](/guides/dealing_with_bias.md)**. `position_debias_strategy` is one example, which ensures that the order of labels doesn't affect the outcome.

In [None]:
evaluator = fastrepl.SimpleEvaluator.from_node(
    fastrepl.LLMClassificationHead(
        model="gpt-3.5-turbo",
        context="You will receive casual conversation between two people.",
        labels={
            "FUN": "at least one of the two people try to be funny and entertain.",
            "NOT_FUN": "given conversation lacks humor or entertainment value.",
        },
        position_debias_strategy="consensus",
    ),
)

## Run Evaluator

Here are some notes about running the evaluator:

1. `ThreadPool` is used to make it faster (controlled by the `NUM_THREADS` [environment variable](/getting_started/env.md)).
2. Any errors from different LLM providers are properly handled and retried with backoff if necessary.
3. Since we passed `num=2` to `run()`, it will execute same evaluation twice, and return two results. If there are high inconsistency between the two, you'll see [InconsistentPredictionWarning](/miscellaneous/warnings_and_errors).

In [10]:
result = fastrepl.LocalRunner(evaluator=evaluator, dataset=ds).run(num=2)
result

Output()

Dataset({
    features: ['input', 'prediction'],
    num_rows: 30
})

In [5]:
result.to_pandas()

Unnamed: 0,sample,context,prediction
0,"Would you like to take a look at the menu, sir...",,"[FUN, FUN]"
1,Help! Help!\nWhat's the matter?,,"[NOT_FUN, NOT_FUN]"
2,"Whatever we do, we should do it above board.\n...",,"[FUN, FUN]"
3,"May I see your passport, please?\nCertainly. H...",,"[NOT_FUN, NOT_FUN]"
4,We're thinking about going to America.\nHave y...,,"[NOT_FUN, NOT_FUN]"
5,"Do you believe in UFOs?\nOf course, they are o...",,"[FUN, FUN]"
6,What do you think about the equipment in our c...,,"[NOT_FUN, NOT_FUN]"
7,How was your business trip?\nGreat - they wine...,,"[FUN, FUN]"
8,"Hello, Parker. How ’ s everything?\nCan ’ t co...",,"[FUN, FUN]"
9,Our toner cartridges are already out of ink......,,"[NOT_FUN, NOT_FUN]"


One interesting point to note is that, due to the `position_debias_strategy="consensus"`, if the order of the labels affects the result, `fastrepl` will return `None`. We'll be returning more meaningful value in the later version of `fastrepl`.

In [11]:
first_result = [row[0] for row in result["prediction"]]
second_result = [row[1] for row in result["prediction"]]

print(first_result.count(None))
print(second_result.count(None))

1
1


Now we got some numbers.

In [12]:
def metric(result):
    f = result.count("FUN")
    nf = result.count("NOT_FUN")
    return f / (f + nf)


print(metric(first_result))
print(metric(second_result))

0.7241379310344828
0.7241379310344828
