In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import open_scoring as ocs

## Load a GPT-3 Scorer

GPT-3 is a large language model provided by OpenAI. Our fine-tuned models are hosted on their systems and accessed through an API. The benefit of this is accessibility: you don't need fancy systems to run the large language model, because *they* have the model on a fancy system that you can talk to. The downside is that there are costs involved in using it, and you need an account with OpenAI to do so.

### Creating an API Key

1. Sign up for an [account at openai.com](https://beta.openai.com/signup)
2. Make an API Key here: https://beta.openai.com/account/api-keys
3. Run the code below, and paste your API key when prompted.

In [6]:
# point to a text file with your openai key
# create one here: 
scorer = ocs.scoring.GPT_Scorer()
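If you would rather not paste the key each time, one option is to keep it in a text file or an environment variable and read it from there. The sketch below uses only the standard library; the helper name `read_api_key` is hypothetical and is not part of `open_scoring`. How the key is then handed to `GPT_Scorer` is not shown in this notebook, so that step is left out.

```python
import os
from pathlib import Path


def read_api_key(path=None, env_var="OPENAI_API_KEY"):
    """Return an API key from a text file if a path is given,
    otherwise from an environment variable.

    Hypothetical helper for illustration; not part of open_scoring.
    """
    if path is not None:
        # Strip the trailing newline that most editors add to the file.
        return Path(path).read_text(encoding="utf-8").strip()
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No key file given and {env_var} is not set")
    return key
```

Keeping the key in a file outside the notebook also makes it less likely to be shared by accident along with the notebook.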

Here's how you use the scorer (it defaults to the *ada* model):

In [7]:
scorer.originality(target='brick', response='paperweight')

1.3

Or simply:

In [8]:
scorer.originality('brick', 'use for a clock pendulum')

3.0

## Using a different model

A number of models are included and pre-trained:

In [9]:
scorer.models

['ada', 'babbage', 'curie', 'davinci']

To use a different model, supply a `model` argument:

In [125]:
scorer.originality('brick', 'use for a clock pendulum', model='babbage')

3.0

Bigger models are costlier. Davinci works best, but can score only about 450 responses for a dollar, whereas ada can score about 34,000 and babbage about 23,000. Babbage or curie offer a good trade-off between cost and performance.

Prices are listed at https://openai.com/api/pricing/, under "Fine-tuned models" > "Usage". Here is how many responses could be scored for a dollar, based on past studies performed in Summer 2022:

| model        |   responses/dollar |
|:-------------|-------------------:|
| gpt3-ada     |              33966 |
| gpt3-babbage |              22644 |
| gpt3-curie   |               4529 |
| gpt3-davinci |                453 |
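To budget a study, the table above can be turned into a rough cost estimate. This is a minimal sketch: the function name `estimate_cost` is hypothetical, the figures are copied from the table, and actual costs depend on response length and current pricing.

```python
# Responses scored per dollar, copied from the table above (Summer 2022 figures).
RESPONSES_PER_DOLLAR = {
    "ada": 33966,
    "babbage": 22644,
    "curie": 4529,
    "davinci": 453,
}


def estimate_cost(n_responses, model="ada"):
    """Return the approximate cost in dollars of scoring n_responses.

    Hypothetical helper for illustration; not part of open_scoring.
    """
    return n_responses / RESPONSES_PER_DOLLAR[model]


# Rough cost of scoring 10,000 responses with each model.
for name in RESPONSES_PER_DOLLAR:
    print(f"{name}: ${estimate_cost(10_000, name):.2f}")
```

By these figures, 10,000 responses cost well under a dollar with ada or babbage, a little over two dollars with curie, and roughly 22 dollars with davinci.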

## Scoring Many Responses

Here's how to score many responses at once:

In [127]:
scorer.originality_batch(['brick', 'rope'], ['use as a paperweight', 'dip the end in sugar and use to lure a raccoon closer to you'])

100%|██████████| 1/1 [00:00<00:00,  8.02it/s]


[1.5, 3.3]
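For a large dataset, you may prefer to send responses in smaller chunks, so that an interruption partway through does not lose everything scored so far. The sketch below is one way to do that under stated assumptions: `score_in_chunks` is a hypothetical helper, not part of `open_scoring`, and it works with any function that takes two parallel lists and returns a list of scores, which is the shape `originality_batch` shows above. A stand-in scorer is used here so the example runs without an API key.

```python
def score_in_chunks(score_fn, prompts, responses, chunk_size=100):
    """Score parallel lists of prompts and responses in fixed-size chunks.

    Hypothetical helper for illustration. score_fn is any callable that
    accepts (prompts, responses) and returns one score per response.
    """
    prompts, responses = list(prompts), list(responses)
    if len(prompts) != len(responses):
        raise ValueError("prompts and responses must have the same length")
    scores = []
    for start in range(0, len(prompts), chunk_size):
        stop = start + chunk_size
        scores.extend(score_fn(prompts[start:stop], responses[start:stop]))
    return scores


def fake_batch(prompts, responses):
    """Stand-in scorer: returns the word count of each response."""
    return [float(len(r.split())) for r in responses]


scores = score_in_chunks(
    fake_batch,
    ["brick", "rope", "brick"],
    ["paperweight", "jump rope", "use as doorstop"],
    chunk_size=2,
)
print(scores)
```

With a real scorer, the first argument would presumably be `scorer.originality_batch` in place of `fake_batch`.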

If you're working from a DataFrame, a popular data science structure in Python, here's how you might score as a batch. First, I'll create a sample DataFrame. In a real-world setting you might load this data with a function like [`pd.read_excel`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html) or [`pd.read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).

In [12]:
# This is just sample data
import pandas as pd
df = pd.DataFrame([['brick', 'use as a paperweight'], ['rope', 'dip the end in sugar and use to lure a raccoon closer to you']], columns=['prompt', 'response'])
df

|    | prompt | response                                          |
|---:|:-------|:--------------------------------------------------|
|  0 | brick  | use as a paperweight                              |
|  1 | rope   | dip the end in sugar and use to lure a raccoon... |


Here's how that DataFrame may be scored.

In [13]:
df['scores'] = scorer.originality_batch(df.prompt, df.response, model='davinci')
df

100%|██████████| 1/1 [00:02<00:00,  2.41s/it]


[1.5, 4.1]