# Bulk Completions

Authors: Arjun Guha

This notebook demonstrates how to use `completions.py` to generate code completions in bulk.

In [1]:
from completions import AutoModel, completions
import pandas as pd

The cell below loads a model. See `completions.py` for other supported model types.

In [2]:
model = AutoModel(batch_size=50, path="bigcode/gpt_bigcode-santacoder")

In [3]:
problems = pd.DataFrame([
    { "prompt": "def factorial(n):" },
    { "prompt": "def fib(n):" }
])

settings = {
    "stop_sequences": ["\nprint", "\ndef", "\nif"],
    "temperature": 0.2,
    "top_p": 0.9,
    "max_tokens": 100,
    "prompt_prefix": "",
    "num_completions": 2,
    "model": model,
}
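Stop sequences conventionally cut a completion off at the first occurrence of any listed string, so here generation should end before the model starts a new top-level `print`, `def`, or `if`. The sketch below shows one common way to apply them after generation. `truncate_at_stop` is a hypothetical helper, the sample completion is made up, and `completions.py` may instead stop during decoding.

```python
def truncate_at_stop(completion: str, stop_sequences: list[str]) -> str:
    """Cut `completion` at the earliest occurrence of any stop sequence.

    Hypothetical helper for illustration; not taken from completions.py.
    """
    # Position of every stop sequence that actually occurs in the text.
    positions = [completion.find(s) for s in stop_sequences]
    positions = [p for p in positions if p != -1]
    # No stop sequence found: keep the whole completion.
    if not positions:
        return completion
    # Keep only the text before the earliest stop sequence.
    return completion[: min(positions)]


# A made-up raw completion that runs past the end of the function.
raw = "\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)\n\nprint(factorial(5))"
print(repr(truncate_at_stop(raw, ["\nprint", "\ndef", "\nif"])))
```

Because each stop sequence begins with `\n`, the indented `if` inside the function body does not trigger a stop; only an `if`, `def`, or `print` at the start of a line does.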
    


Completions are saved to the JSONL file given by `completions_path`, so completions that already exist in that file are not generated again.
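As a rough illustration of this kind of cache (not the actual code in `completions.py`), the sketch below appends each completion to a JSONL file, one JSON object per line, and on a later run counts how many completions already exist for each prompt and settings pair. Keying on both the prompt and the generation settings is an assumption suggested by the `completion_settings` column in the results. `settings_key`, `append_completion`, and `load_counts` are hypothetical names, and only JSON-serializable settings (not the `model` object) are stored.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def settings_key(gen_settings: dict) -> str:
    # Serialize settings deterministically so equal dicts give equal keys.
    return json.dumps(gen_settings, sort_keys=True)


def append_completion(path: Path, prompt: str, gen_settings: dict, completion: str) -> None:
    # One JSON object per line (JSONL); appending never rewrites earlier results.
    record = {"prompt": prompt, "completion_settings": gen_settings, "completion": completion}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_counts(path: Path) -> Counter:
    # Count existing completions per (prompt, settings) pair; a missing file means none.
    counts = Counter()
    if not path.exists():
        return counts
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                r = json.loads(line)
                counts[(r["prompt"], settings_key(r["completion_settings"]))] += 1
    return counts


# Usage: write two completions to a temporary file, then count them.
gen_settings = {"temperature": 0.2, "top_p": 0.9, "max_tokens": 100}
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "output.jsonl"
    append_completion(cache, "def factorial(n):", gen_settings, "\n    return 1")
    append_completion(cache, "def factorial(n):", gen_settings, "\n    return n")
    counts = load_counts(cache)

print(counts[("def factorial(n):", settings_key(gen_settings))])  # 2
```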

In [4]:
results = completions(**settings, problems=problems, completions_path="output.jsonl")
results

Found 4 existing completions.


  0%|          | 0/4 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:49152 for open-end generation.
100%|██████████| 4/4 [00:13<00:00,  3.38s/it]


|   | prompt | completion_settings | completion |
|---|---|---|---|
| 0 | `def factorial(n):` | `{'temperature': 0.2, 'top_p': 0.9, 'max_tokens...` | `\n if n == 0:\n return 1\n else:\...` |
| 1 | `def factorial(n):` | `{'temperature': 0.2, 'top_p': 0.9, 'max_tokens...` | `\n if n == 0:\n return 1\n else:\...` |
| 2 | `def fib(n):` | `{'temperature': 0.2, 'top_p': 0.9, 'max_tokens...` | `\n if n <= 1:\n return n\n return...` |
| 3 | `def fib(n):` | `{'temperature': 0.2, 'top_p': 0.9, 'max_tokens...` | `\n if n == 0:\n return 0\n elif n...` |


Let's run the same request again. Notice that we don't get new results: the completions are already saved, so nothing is regenerated.
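A simple way to decide what still needs generating is to subtract, for each prompt, the number of cached completions from `num_completions`. The hypothetical `needed_per_prompt` below does this, ignoring settings for brevity; it is a sketch, not the logic in `completions.py`. With two completions already saved for each prompt, nothing remains to generate.

```python
from collections import Counter


def needed_per_prompt(prompts: list[str], existing: Counter, num_completions: int) -> dict[str, int]:
    # Hypothetical helper: how many more completions each prompt needs,
    # given how many are already cached for it (a Counter returns 0 for unseen prompts).
    return {p: max(num_completions - existing[p], 0) for p in prompts}


# After the first run, the cache holds two completions for each prompt.
existing = Counter({"def factorial(n):": 2, "def fib(n):": 2})
remaining = needed_per_prompt(["def factorial(n):", "def fib(n):"], existing, num_completions=2)
print(remaining)  # {'def factorial(n):': 0, 'def fib(n):': 0}
```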

In [5]:
results = completions(**settings, problems=problems, completions_path="output.jsonl")

  0%|          | 0/4 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:49152 for open-end generation.
100%|██████████| 4/4 [00:00<00:00,  4.40it/s]


Now let's change the problem set: keep `def factorial(n):` and replace `def fib(n):` with a new prompt, `def mack(n):`.

In [6]:
problems = pd.DataFrame([
    { "prompt": "def factorial(n):" },
    { "prompt": "def mack(n):" }
])
results = completions(**settings, problems=problems, completions_path="output.jsonl")
results

Found 2 existing completions.


  0%|          | 0/4 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:49152 for open-end generation.
100%|██████████| 4/4 [00:01<00:00,  2.09it/s]


|   | prompt | completion_settings | completion |
|---|---|---|---|
| 0 | `def factorial(n):` | `{'temperature': 0.2, 'top_p': 0.9, 'max_tokens...` | `\n if n == 0:\n return 1\n else:\...` |
| 1 | `def factorial(n):` | `{'temperature': 0.2, 'top_p': 0.9, 'max_tokens...` | `\n if n == 0:\n return 1\n else:\...` |
| 2 | `def mack(n):` | `{'temperature': 0.2, 'top_p': 0.9, 'max_tokens...` | `\n """\n mack(n) returns the mack number...` |
| 3 | `def mack(n):` | `{'temperature': 0.2, 'top_p': 0.9, 'max_tokens...` | `\n if n == 1:\n return 1\n else:\...` |


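Notice that only two new completions were generated, both for the new `def mack(n):` prompt; the two `factorial` completions were reused from the file.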
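The same outcome can be checked with pandas by reading the cache and filtering the new problem set for prompts it has not seen. This sketch assumes the cache is a JSONL file with a `prompt` field and uses made-up cache contents in place of the real `output.jsonl`. Unlike a per-prompt count, it treats any prompt that appears in the cache as fully done.

```python
import io

import pandas as pd

# Made-up stand-in for output.jsonl after the first run: two completions
# each for factorial and fib (completion text elided).
cached_jsonl = io.StringIO(
    '{"prompt": "def factorial(n):", "completion": "..."}\n'
    '{"prompt": "def factorial(n):", "completion": "..."}\n'
    '{"prompt": "def fib(n):", "completion": "..."}\n'
    '{"prompt": "def fib(n):", "completion": "..."}\n'
)
cached = pd.read_json(cached_jsonl, lines=True)

new_problems = pd.DataFrame([
    {"prompt": "def factorial(n):"},
    {"prompt": "def mack(n):"},
])

# Prompts with no cached completions are the ones that still need generating.
uncached = new_problems[~new_problems["prompt"].isin(cached["prompt"])]
num_completions = 2
print(uncached["prompt"].tolist())     # ['def mack(n):']
print(len(uncached) * num_completions)  # 2
```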