In [1]:
import logging
logging.basicConfig(filename='/tmp/openai_demo.log') # avoid info message flood on stderr

# Using the OpenAI API with Pixeltable

Pixeltable unifies data and computation behind a table interface. This notebook shows how to call the OpenAI API through Pixeltable.
All artifacts are kept in the `demos` directory.

In [2]:
import pixeltable as pxt
cl = pxt.Client()
cl.create_dir('demos', ignore_errors=True)
path = 'demos.chat_completion_demo'

2024-01-25 15:22:55,554 INFO env env.py:183: found database postgresql://postgres:@/pixeltable?host=/Users/orm/Library/Caches/TemporaryItems/python_PostgresServer/dc4677b93f


2024-01-25 15:22:55,679 INFO env env.py:194: connecting to NOS
2024-01-25 15:22:55.716 | INFO     | nos.server:init:131 - Inference server already running (name=nos-inference-service-cpu, image=<Image: 'autonomi/nos:0.0.9-cpu'>, id=be5acf593f3a).
2024-01-25 15:22:55,717 INFO env env.py:197: waiting for NOS
2024-01-25 15:22:55,749 INFO env env.py:218: connecting to OpenAI


In [3]:
if path in cl.list_tables():
    cl.drop_table(path)

t = cl.create_table(path, {'input': pxt.StringType()})


In [4]:
# text from https://en.wikipedia.org/wiki/Global_financial_crisis_in_September_2008
wikipedia_text = '''On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.
The significance of the Lehman Brothers bankruptcy is disputed with some assigning it a pivotal role in the unfolding of subsequent events.
The principals involved, Ben Bernanke and Henry Paulson, dispute this view, citing a volume of toxic assets at Lehman which made a rescue impossible.[16][17] Immediately following the bankruptcy, JPMorgan Chase provided the broker dealer unit of Lehman Brothers with $138 billion to "settle securities transactions with customers of Lehman and its clearance parties" according to a statement made in a New York City Bankruptcy court filing.[18]
The same day, the sale of Merrill Lynch to Bank of America was announced.[19] The beginning of the week was marked by extreme instability in global stock markets, with dramatic drops in market values on Monday, September 15, and Wednesday, September 17.
On September 16, the large insurer American International Group (AIG), a significant participant in the credit default swaps markets, suffered a liquidity crisis following the downgrade of its credit rating.
The Federal Reserve, at AIG's request, and after AIG had shown that it could not find lenders willing to save it from insolvency, created a credit facility for up to US$85 billion in exchange for a 79.9% equity interest, and the right to suspend dividends to previously issued common and preferred stock.[20]'''

sample_inputs = wikipedia_text.split('\n')

In [5]:
# Insert one example row; it is persisted in the table.
t.insert([{'input': sample_inputs[0]}])
t.show()

Inserting rows into table: 1rows [00:00, 497.60rows/s]

inserted 1 row with 0 errors 





input
"On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers."


Calling OpenAI models involves constructing a list of messages. In Pixeltable we express this by adding a new _computed_ column, i.e., a column whose values are derived from other columns by a computation.

In [6]:
prompt = "For the following sentence, extract all company names from the text."

msgs = [
    { "role": "system", "content": prompt },
    { "role": "user", "content": t.input }
]

t.add_column(input_msgs=msgs)

Computing cells: 100%|██████████| 1/1 [00:00<00:00, 221.83cells/s]

added 1 column values with 0 errors





UpdateStatus(num_rows=1, num_computed_values=1, num_excs=0, updated_cols=[], cols_with_excs=[])

Unlike the values of the `input` column, which users provide, the values of the `t.input_msgs` column are computed automatically from the `t.input` column:

In [7]:
t.show()

input,input_msgs
"On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.","[{'role': 'system', 'content': 'For the following sentence, extract all company names from the text.'}, {'role': 'user', 'content': 'On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.'}]"
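For intuition, the value stored in `input_msgs` for each row is what a plain Python function would return when applied to that row's `input`. The sketch below is only an illustration; `build_messages` is a hypothetical helper and not part of Pixeltable.

```python
def build_messages(system_prompt, user_text):
    """Build the chat message list for a single input row."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

prompt = "For the following sentence, extract all company names from the text."
messages = build_messages(prompt, "The same day, the sale of Merrill Lynch to Bank of America was announced.")
```

The computed column applies the same construction to every row, so the message list never has to be built by hand.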


OpenAI API endpoints can also be thought of as functions. Pixeltable offers first-class support for the chat completion API, so it can be used
directly as a function to define new computed columns.

(For the following command to work, you must have set `os.environ['OPENAI_API_KEY']`)
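A small guard can make a missing key fail early with a readable message. This is a minimal sketch; `require_env` is a hypothetical helper introduced here for illustration.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; set it before running the next cell")
    return value

# Usage (commented out so the sketch runs without a key):
# require_env("OPENAI_API_KEY")
```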

In [8]:
from pixeltable.functions.openai import chat_completion
t.add_column(chat_output = chat_completion(model='gpt-3.5-turbo', messages=t.input_msgs))
t.show()

Computing cells: 100%|██████████| 1/1 [00:00<00:00,  2.56cells/s]

added 1 column values with 0 errors





input,input_msgs,chat_output
"On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.","[{'role': 'system', 'content': 'For the following sentence, extract all company names from the text.'}, {'role': 'user', 'content': 'On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.'}]","{'id': 'chatcmpl-8l0XQADZzSvE8JqSmN02udJFu523M', 'model': 'gpt-3.5-turbo-0613', 'usage': {'total_tokens': 65, 'prompt_tokens': 61, 'completion_tokens': 4}, 'object': 'chat.completion', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'Lehman Brothers', 'tool_calls': None, 'function_call': None}, 'logprobs': None, 'finish_reason': 'stop'}], 'created': 1706214176, 'system_fingerprint': None}"


The result objects returned by the OpenAI API are deeply nested, so we extract just the response text. This extraction can be expressed as another computed column:

In [9]:
t.add_column(response=t.chat_output.choices[0].message.content)
t.show()

Computing cells: 100%|██████████| 1/1 [00:00<00:00, 200.80cells/s]

added 1 column values with 0 errors





input,input_msgs,chat_output,response
"On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.","[{'role': 'system', 'content': 'For the following sentence, extract all company names from the text.'}, {'role': 'user', 'content': 'On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.'}]","{'id': 'chatcmpl-8l0XQADZzSvE8JqSmN02udJFu523M', 'model': 'gpt-3.5-turbo-0613', 'usage': {'total_tokens': 65, 'prompt_tokens': 61, 'completion_tokens': 4}, 'object': 'chat.completion', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'Lehman Brothers', 'tool_calls': None, 'function_call': None}, 'logprobs': None, 'finish_reason': 'stop'}], 'created': 1706214176, 'system_fingerprint': None}",Lehman Brothers
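The path `choices[0].message.content` mirrors ordinary indexing into the result. As a sketch in plain Python, using an abbreviated copy of the `chat_output` value shown above (`extract_content` is a hypothetical helper):

```python
# Abbreviated copy of the chat_output value from the table above.
chat_output = {
    "model": "gpt-3.5-turbo-0613",
    "usage": {"total_tokens": 65, "prompt_tokens": 61, "completion_tokens": 4},
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Lehman Brothers"},
            "finish_reason": "stop",
        }
    ],
}

def extract_content(result):
    """Return the text of the first choice in a chat completion result."""
    return result["choices"][0]["message"]["content"]

response = extract_content(chat_output)
```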


We can use DataFrame-style (e.g., pandas) or SQL-like operations to view only certain columns, in this case only the plain inputs and responses:

In [10]:
t.select(t.input, t.response).show()

input,response
"On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.",Lehman Brothers


Once we have defined these computed columns, much like with a spreadsheet, newly inserted `t.input` values trigger computation of all derived columns, such as `t.response`:

In [11]:
t.insert([{'input': sample_inputs[i]} for i in range(1, len(sample_inputs))])

Inserting rows into table: 5rows [00:00, 978.70rows/s]
Computing cells: 100%|██████████| 15/15 [00:03<00:00,  4.38cells/s]

inserted 5 rows with 0 errors 





UpdateStatus(num_rows=5, num_computed_values=15, num_excs=0, updated_cols=[], cols_with_excs=[])

In [12]:
t.select(t.input, t.response).show()

input,response
"On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.",Lehman Brothers
The significance of the Lehman Brothers bankruptcy is disputed with some assigning it a pivotal role in the unfolding of subsequent events.,Lehman Brothers
"The principals involved, Ben Bernanke and Henry Paulson, dispute this view, citing a volume of toxic assets at Lehman which made a rescue impossible.[16][17] Immediately following the bankruptcy, JPMorgan Chase provided the broker dealer unit of Lehman Brothers with $138 billion to ""settle securities transactions with customers of Lehman and its clearance parties"" according to a statement made in a New York City Bankruptcy court filing.[18]","Lehman Brothers, JPMorgan Chase"
"The same day, the sale of Merrill Lynch to Bank of America was announced.[19] The beginning of the week was marked by extreme instability in global stock markets, with dramatic drops in market values on Monday, September 15, and Wednesday, September 17.","Merrill Lynch, Bank of America"
"On September 16, the large insurer American International Group (AIG), a significant participant in the credit default swaps markets, suffered a liquidity crisis following the downgrade of its credit rating.",American International Group (AIG)
"The Federal Reserve, at AIG's request, and after AIG had shown that it could not find lenders willing to save it from insolvency, created a credit facility for up to US$85 billion in exchange for a 79.9% equity interest, and the right to suspend dividends to previously issued common and preferred stock.[20]",- The Federal Reserve\n- AIG
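The spreadsheet-like behavior can be pictured with a toy in-memory table. This is a sketch of the idea only, not how Pixeltable is implemented; `MiniTable` and its columns are hypothetical names.

```python
class MiniTable:
    """Toy in-memory table with computed columns (illustration only)."""

    def __init__(self):
        self.rows = []
        self.computed = {}  # column name -> function(row) -> value, in definition order

    def add_column(self, name, fn):
        """Register a computed column and backfill it for existing rows."""
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, new_rows):
        """Insert rows and compute every derived column for them."""
        for values in new_rows:
            row = dict(values)
            for name, fn in self.computed.items():  # later columns may read earlier ones
                row[name] = fn(row)
            self.rows.append(row)

table = MiniTable()
table.insert([{"input": "Lehman Brothers filed for bankruptcy."}])
table.add_column("n_words", lambda row: len(row["input"].split()))
table.add_column("is_long", lambda row: row["n_words"] > 5)
table.insert([{"input": "The sale of Merrill Lynch to Bank of America was announced."}])
```

The second `insert` supplies only `input`, yet both derived columns are filled in, which is the behavior seen above when five new rows produced fifteen computed cells.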


Once computed, the values of computed columns are stored and persisted, which avoids repeated computation.
Pixeltable can also be used to conduct experiments, for example to compare prompt variations. Here we add a nullable `ground_truth` column so that the responses can be evaluated:


In [13]:
t.add_column(ground_truth = pxt.StringType(nullable=True))

UpdateStatus(num_rows=6, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])

In [14]:
sample_inputs

['On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.',
 'The significance of the Lehman Brothers bankruptcy is disputed with some assigning it a pivotal role in the unfolding of subsequent events.',
 'The principals involved, Ben Bernanke and Henry Paulson, dispute this view, citing a volume of toxic assets at Lehman which made a rescue impossible.[16][17] Immediately following the bankruptcy, JPMorgan Chase provided the broker dealer unit of Lehman Brothers with $138 billion to "settle securities transactions with customers of Lehman and its clearance parties" according to a statement made in a New York City Bankruptcy court filing.[18]',
 'The same day, the sale of Merrill Lynch to Bank of America was announced.[19] The beginning of the week was marked by extreme instability in global stock markets, with dramatic drops in market va

In [15]:
ground_truth = {
'On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.' : 'Lehman Brothers',
 'The significance of the Lehman Brothers bankruptcy is disputed with some assigning it a pivotal role in the unfolding of subsequent events.': 'Lehman Brothers',
 'The principals involved, Ben Bernanke and Henry Paulson, dispute this view, citing a volume of toxic assets at Lehman which made a rescue impossible.[16][17] Immediately following the bankruptcy, JPMorgan Chase provided the broker dealer unit of Lehman Brothers with $138 billion to "settle securities transactions with customers of Lehman and its clearance parties" according to a statement made in a New York City Bankruptcy court filing.[18]': 'JP Morgan Chase, Lehman Brothers',
 'The same day, the sale of Merrill Lynch to Bank of America was announced.[19] The beginning of the week was marked by extreme instability in global stock markets, with dramatic drops in market values on Monday, September 15, and Wednesday, September 17.': 'Merrill Lynch, Bank of America',
 'On September 16, the large insurer American International Group (AIG), a significant participant in the credit default swaps markets, suffered a liquidity crisis following the downgrade of its credit rating.': 'American International Group',
 "The Federal Reserve, at AIG's request, and after AIG had shown that it could not find lenders willing to save it from insolvency, created a credit facility for up to US$85 billion in exchange for a 79.9% equity interest, and the right to suspend dividends to previously issued common and preferred stock.[20]": 'American International Group',
}

In [16]:
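# Fill in the ground truth for each row by matching on the input text.
# The loop variable is named input_text to avoid shadowing the built-in input().
for input_text, gt in ground_truth.items():
    t.update({'ground_truth': gt}, where=t.input == input_text)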

Inserting rows into table: 1rows [00:00, 786.19rows/s]
Inserting rows into table: 1rows [00:00, 1402.78rows/s]
Inserting rows into table: 1rows [00:00, 1182.16rows/s]
Inserting rows into table: 1rows [00:00, 1259.17rows/s]
Inserting rows into table: 1rows [00:00, 1319.38rows/s]
Inserting rows into table: 1rows [00:00, 1377.44rows/s]


In [17]:
t.select(t.input, t.response, t.ground_truth).show()

input,response,ground_truth
"On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.",Lehman Brothers,Lehman Brothers
The significance of the Lehman Brothers bankruptcy is disputed with some assigning it a pivotal role in the unfolding of subsequent events.,Lehman Brothers,Lehman Brothers
"The principals involved, Ben Bernanke and Henry Paulson, dispute this view, citing a volume of toxic assets at Lehman which made a rescue impossible.[16][17] Immediately following the bankruptcy, JPMorgan Chase provided the broker dealer unit of Lehman Brothers with $138 billion to ""settle securities transactions with customers of Lehman and its clearance parties"" according to a statement made in a New York City Bankruptcy court filing.[18]","Lehman Brothers, JPMorgan Chase","JP Morgan Chase, Lehman Brothers"
"The same day, the sale of Merrill Lynch to Bank of America was announced.[19] The beginning of the week was marked by extreme instability in global stock markets, with dramatic drops in market values on Monday, September 15, and Wednesday, September 17.","Merrill Lynch, Bank of America","Merrill Lynch, Bank of America"
"On September 16, the large insurer American International Group (AIG), a significant participant in the credit default swaps markets, suffered a liquidity crisis following the downgrade of its credit rating.",American International Group (AIG),American International Group
"The Federal Reserve, at AIG's request, and after AIG had shown that it could not find lenders willing to save it from insolvency, created a credit facility for up to US$85 billion in exchange for a 79.9% equity interest, and the right to suspend dividends to previously issued common and preferred stock.[20]",- The Federal Reserve\n- AIG,American International Group


Now that we have some ground truth available, we can carry out a basic evaluation of the model outputs, for example by
asking `gpt-3.5-turbo` to decide whether each response and its ground truth list the same entities.

In [18]:

eval_prompt = '''Compare the following listA and listB of entities and check whether they contain the same entities.
Return a JSON object with the following format: {"reasoning": an explanation of your reasoning, "decision": 1 if the lists match, 0 otherwise}
'''

t.add_column(eval_prompt=[{ "role": "system", "content": eval_prompt },
                         { "role": "user", "content": pxt.functions.str_format('listA: "{output0}" \n listB: "{output1}"',
                                                                               output0=t.response, output1=t.ground_truth)}])
t.add_column(evaluation = chat_completion(model='gpt-3.5-turbo', messages=t.eval_prompt).choices[0].message.content)

Computing cells: 100%|██████████| 6/6 [00:00<00:00, 976.29cells/s]


added 6 column values with 0 errors


Computing cells: 100%|██████████| 6/6 [00:15<00:00,  2.62s/cells]

added 6 column values with 0 errors





UpdateStatus(num_rows=6, num_computed_values=6, num_excs=0, updated_cols=[], cols_with_excs=[])
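To see what the user message of the evaluation prompt looks like for one row, the template can be filled in with plain `str.format`. This sketch assumes that `pxt.functions.str_format` substitutes named placeholders the same way, applied per row; that is an assumption, not something confirmed here.

```python
template = 'listA: "{output0}" \n listB: "{output1}"'

# Values taken from the third row of the table (response and ground truth).
rendered = template.format(
    output0="Lehman Brothers, JPMorgan Chase",
    output1="JP Morgan Chase, Lehman Brothers",
)
```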

In [19]:
import json
@pxt.udf(param_types=[pxt.StringType()], return_type=pxt.JsonType())
def parse_json(json_str):
    return json.loads(json_str)
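Model output is not guaranteed to be valid JSON, and `json.loads` raises on malformed input. A more defensive parser could return `None` instead. The function below is a plain-Python sketch with a hypothetical name; it could be wrapped as a UDF in the same way as `parse_json`.

```python
import json

def parse_json_or_none(json_str):
    """Parse a JSON string, returning None if the input is missing or malformed."""
    try:
        return json.loads(json_str)
    except (json.JSONDecodeError, TypeError):
        return None

ok = parse_json_or_none('{"reasoning": "same entities", "decision": 1}')
bad = parse_json_or_none('The lists match.')
```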

In [20]:
t.add_column(correctness=parse_json(t.evaluation).decision)

Computing cells: 100%|██████████| 6/6 [00:00<00:00, 896.63cells/s]

added 6 column values with 0 errors





UpdateStatus(num_rows=6, num_computed_values=6, num_excs=0, updated_cols=[], cols_with_excs=[])

In [21]:
t.select(t.input, t.response, t.ground_truth, t.correctness).show()

input,response,ground_truth,correctness
"On Sunday, September 14, it was announced that Lehman Brothers would file for bankruptcy after the Federal Reserve Bank declined to participate in creating a financial support facility for Lehman Brothers.",Lehman Brothers,Lehman Brothers,1
"The principals involved, Ben Bernanke and Henry Paulson, dispute this view, citing a volume of toxic assets at Lehman which made a rescue impossible.[16][17] Immediately following the bankruptcy, JPMorgan Chase provided the broker dealer unit of Lehman Brothers with $138 billion to ""settle securities transactions with customers of Lehman and its clearance parties"" according to a statement made in a New York City Bankruptcy court filing.[18]","Lehman Brothers, JPMorgan Chase","JP Morgan Chase, Lehman Brothers",1
"The same day, the sale of Merrill Lynch to Bank of America was announced.[19] The beginning of the week was marked by extreme instability in global stock markets, with dramatic drops in market values on Monday, September 15, and Wednesday, September 17.","Merrill Lynch, Bank of America","Merrill Lynch, Bank of America",1
The significance of the Lehman Brothers bankruptcy is disputed with some assigning it a pivotal role in the unfolding of subsequent events.,Lehman Brothers,Lehman Brothers,1
"On September 16, the large insurer American International Group (AIG), a significant participant in the credit default swaps markets, suffered a liquidity crisis following the downgrade of its credit rating.",American International Group (AIG),American International Group,1
"The Federal Reserve, at AIG's request, and after AIG had shown that it could not find lenders willing to save it from insolvency, created a credit facility for up to US$85 billion in exchange for a 79.9% equity interest, and the right to suspend dividends to previously issued common and preferred stock.[20]",- The Federal Reserve\n- AIG,American International Group,1
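A deterministic comparison can serve as a cross-check on the model-based decisions. The sketch below compares normalized sets of names; the normalization rules (ignore case and spacing, drop bullet markers and parenthetical aliases) are assumptions chosen for this sample, and `same_entities` is a hypothetical helper.

```python
import re

def normalize_entities(text):
    """Turn a comma- or newline-separated entity list into a set of normalized names."""
    names = set()
    for part in re.split(r"[,\n]", text):
        name = re.sub(r"\([^)]*\)", "", part)     # drop parenthetical aliases such as "(AIG)"
        name = name.strip().lstrip("-").strip()   # drop bullet markers
        name = re.sub(r"\s+", "", name).lower()   # ignore spacing and case
        if name:
            names.add(name)
    return names

def same_entities(list_a, list_b):
    """Return True if both lists contain the same normalized names."""
    return normalize_entities(list_a) == normalize_entities(list_b)

pairs = [
    ("Lehman Brothers, JPMorgan Chase", "JP Morgan Chase, Lehman Brothers"),
    ("American International Group (AIG)", "American International Group"),
    ("- The Federal Reserve\n- AIG", "American International Group"),
]
matches = [same_entities(a, b) for a, b in pairs]
```

A purely string-based rule like this cannot resolve abbreviations or decide whether an organization counts as a company, so it may disagree with the model-based `correctness` column on some rows.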
