## API key

### Get one

To run this code, you need an API key from OpenAI. This involves giving them your credit card and setting up spending limits.

### Using it

I run this file locally via JupyterLab, so it's in a folder with `gpt_api.txt`, which contains my API key.

To run this file in Google Colab, you _could_ directly type your API key into the notebook below, **but this is a bad idea.** 

Instead, one common way is to store the API key in a file on your Google Drive and then access it from the Colab notebook. Here's how you can do it:

1.    Create a new text file on your Google Drive and store your API key in it. Name the file something like `gpt_api.txt`.
1.    Mount your Google Drive to the Google Colab notebook by running the following code block.
    ```python
    import openai
    from google.colab import drive
    drive.mount('/content/drive')
    with open('/content/drive/MyDrive/gpt_api.txt', 'r') as f:  # Drive files live under MyDrive
        openai.api_key = f.read().strip()
    ```
1.     This will prompt you to click on a link to authorize the connection. Follow the instructions, and copy the authorization code into the input box that appears in the Colab notebook. You can now continue on. 

In [1]:
# !pip install openai 

In [4]:
import openai

# Don't type the key into this file! Read it from a file that is listed in .gitignore, from GitHub secrets, or from your Google Drive.

with open('gpt_api.txt', 'r') as f:
    openai.api_key = f.read().strip()
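
If the same notebook runs in more than one place, a small helper can try an environment variable first and fall back to the key file. This is only a sketch: the function name `load_api_key` and the default variable name `OPENAI_API_KEY` are assumptions, not something the notebook defines.

```python
import os
from pathlib import Path


def load_api_key(path="gpt_api.txt", env_var="OPENAI_API_KEY"):
    """Return the API key from an environment variable, else from a local file."""
    # Prefer the environment variable, so no key file is needed on shared machines.
    key = os.environ.get(env_var, "").strip()
    if key:
        return key

    # Fall back to the key file used elsewhere in this notebook.
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"No key in ${env_var} and no file at {key_file}")

    key = key_file.read_text().strip()
    if not key:
        raise ValueError(f"{key_file} is empty")
    return key
```

With that in place, the cell above reduces to `openai.api_key = load_api_key()`.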

## Define key functions to do the lift

Read OpenAI's chat guide first:

https://platform.openai.com/docs/guides/chat/introduction


In [None]:
# The ask_openai function in the next cell uses text-davinci-002, not the super cheap gpt-3.5-turbo.

# The cheaper option is something like this:
openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

In [48]:
# GPT-4 wrote most of this

import os
import glob

import numpy as np
import pandas as pd
from IPython.display import (  # used during dev - display(Markdown(markdown_table)) prints nice
    Markdown,
    display,
)
from tqdm import tqdm
from bs4 import BeautifulSoup

# Set Pandas display options to show full string
pd.set_option("display.max_colwidth", None)

def ask_openai(question, data):
    prompt = f"{data}\n---\n{question}"
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=prompt,
        max_tokens=70,
        n=1,
        stop=None,
        temperature=0.5,
    )
    return response.choices[0].text.strip()

def parse_file(filename):

    # Define the question to ask about the contract
    question = "Output a tab separated list containing two items: the name of the buyer, and the name of the seller."

    # remove the html
    with open(filename, "r") as fp:
        raw = BeautifulSoup(fp.read(), 'html.parser').get_text()

    return ask_openai(question, raw[:1850])
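
For comparison, here is roughly what `ask_openai` could look like on the cheaper chat endpoint shown earlier. It is a sketch, not a tested replacement: it assumes the pre-1.0 `openai` package used throughout this notebook, and the names `build_messages` and `ask_openai_chat` and the system prompt are placeholders of mine.

```python
def build_messages(question, data):
    """Package the document text and the question as chat messages."""
    return [
        # Placeholder system prompt; adjust to taste.
        {"role": "system", "content": "You extract facts from contracts."},
        # Same prompt layout as ask_openai: document, separator, question.
        {"role": "user", "content": f"{data}\n---\n{question}"},
    ]


def ask_openai_chat(question, data, model="gpt-3.5-turbo"):
    """Chat-endpoint variant of ask_openai (assumes openai < 1.0)."""
    # Imported here so build_messages can be tried without the package installed.
    import openai

    response = openai.ChatCompletion.create(
        model=model,
        messages=build_messages(question, data),
        max_tokens=70,
        temperature=0.5,
    )
    # The reply text sits in the message of the first choice.
    return response["choices"][0]["message"]["content"].strip()
```

Swapping it in would mean changing the last line of `parse_file` to call `ask_openai_chat` instead of `ask_openai`.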

In [49]:
file_sentence_dict = {}
files = glob.glob("inputs/*")  # get all the files in the inputs folder

for file in tqdm(files):
    file_sentence_dict[file] = parse_file(file)  # store the reply for each file

100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.54it/s]


In [50]:
file_sentence_dict

{'inputs\\ex10-11.txt': 'Baxter Healthcare Corporation\tCFC International, Inc.',
 'inputs\\ex10.txt': 'Baxter Healthcare Corporation\tCFC International'}

## Examine output

In [51]:
df = pd.DataFrame(file_sentence_dict.items(), columns=['document', 'buyer_seller'])
df[['buyer', 'seller']] = df['buyer_seller'].str.split('\t', expand=True)
df = df.drop('buyer_seller', axis=1)
df


|   | document | buyer | seller |
|---|----------|-------|--------|
| 0 | inputs\ex10-11.txt | Baxter Healthcare Corporation | CFC International, Inc. |
| 1 | inputs\ex10.txt | Baxter Healthcare Corporation | CFC International |
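
The tab split works here because both replies came back as exactly two fields. Over many more documents the model will sometimes return something else, and assigning the split result to two columns would then fail. A defensive version might look like this (the helper name `split_buyer_seller` is hypothetical):

```python
def split_buyer_seller(reply):
    """Return (buyer, seller), or (None, None) if the reply is not two tab-separated fields."""
    parts = [part.strip() for part in reply.split("\t")]
    # Reject replies with the wrong number of fields or with an empty field.
    if len(parts) != 2 or not all(parts):
        return (None, None)
    return (parts[0], parts[1])


print(split_buyer_seller("Baxter Healthcare Corporation\tCFC International, Inc."))
print(split_buyer_seller("Sorry, I could not find the parties."))
```

Rows that come back as `(None, None)` can then be flagged for a manual look instead of crashing the run.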


## Fermi estimate of the project cost

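Price: $0.002 per 1k tokens for gpt-3.5-turbo. This estimate counts only the tokens **in the reply**, so it is a lower bound: prompt tokens are billed too.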

So

Cost = # docs * # tokens in reply per doc * 0.002/1000

The reply above was 10 tokens:

In [22]:
# !pip install --upgrade tiktoken

In [23]:
# OpenAI's tokenizer

import tiktoken
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
sent = 'Baxter Healthcare Corporation\tCFC International, Inc.'
len(encoding.encode(sent))


10

We have 0.25 million docs. 

## THE ESTIMATE, IN DOLLARS

In [25]:
files = 250000
toks_per = 10
cost_per_tok = 0.002/1000

files*toks_per*cost_per_tok

5.0

I can't even believe that.
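
The $5 figure only covers reply tokens, and OpenAI also bills the prompt. A rough sketch of the fuller estimate follows. It assumes about 400 prompt tokens per document (the conservative guess used in the rate-limit section) and the same $0.002 per 1k price on both sides. The function name `estimate_cost` is mine.

```python
def estimate_cost(n_docs, prompt_tokens, reply_tokens, price_per_1k=0.002):
    """Dollar cost when prompt and reply tokens are billed at the same rate."""
    total_tokens = n_docs * (prompt_tokens + reply_tokens)
    return total_tokens / 1000 * price_per_1k


reply_only = estimate_cost(250_000, prompt_tokens=0, reply_tokens=10)
with_prompts = estimate_cost(250_000, prompt_tokens=400, reply_tokens=10)  # 400 is a guess
print(f"reply only: ${reply_only:,.2f}; with prompts: ${with_prompts:,.2f}")
```

That comes to roughly $205, which is still cheap for 250,000 contracts. Note that `ask_openai` as written calls `text-davinci-002`, which has its own price, so this number assumes switching to gpt-3.5-turbo.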

## Speed and rate limits

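I sent 1850 characters, which OpenAI says is 376 tokens. tiktoken's gpt-3.5-turbo encoding counts the same text as 306 tokens (below); the gap is probably down to different tokenizers for different models.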

In [28]:
len(encoding.encode('''<FILENAME>ex10.txt
<DESCRIPTION>CFC INTERNATIONAL, INC.-BAXTER PURCHASE AGREEMENT
<TEXT>
Exhibit 10.9


                               PURCHASE AGREEMENT

         This Agreement, effective March 1, 2001 is between CFC International, a
Delaware corporation, with offices at 500 State Street, Chicago Heights,
Illinois 60411 ("Seller") and Baxter Healthcare Corporation, a Delaware
corporation, with offices at One Baxter Parkway, Deerfield, Illinois 60015 on
behalf or its self and its affiliates (entities controlling, controlled by, or
under common control with Baxter)("Buyer").

                                 1.0 Background


         1.1 Seller produces hot stamping foil which conforms and meets the
Specification Requirements submitted, accepted and in Seller's possession for
the Specification numbers listed attached in the Exhibit A., hereafter referred
to as "Products". Product Specifications may be revised from time to time and
new Specifications and numbers added by mutual agreement between parties. Buyer
requires foil for use in printing flexible packaging.


                                2.0 Distribution


         2.1 Subject to the terms and conditions of this Agreement, Seller shall
manufacture and sell Products to Buyer, and Buyer shall purchase Products for
manufacture into goods for use or resale in any country in the world. Buyer
agrees to purchase all their global foiling requirements from seller, or as
stated in Section 13.2.


                            3.0 Shipment of Products


         3.1 Seller will ship Products, F.O.B. Seller's facility, freight
collect, to locations specified by Buyer and via carriers specified by Buyer.

         3.2 Seller agrees to maintain negotiated consignment inventory at
Baxter's locations per specific plant consignment agreements.
'''))

306

Input token rate limit is 60000 per minute:

In [37]:
# can do this many contracts per minute 
(
    60000 # token limit per minute
    /
    400   # conservative guess tokens per contract 
)


150.0

Only allowed to do 60 requests a minute. But a single request can "batch" multiple prompts.

So, 50 times a minute, send 3 contracts (150 contracts a minute). `sleep(1.2)` between calls.
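
The batching plan can be sketched as a small loop. The actual API call is left as a `send_batch` callable that takes a list of prompts and returns a list of replies. That callable, and the names `chunked` and `run_throttled`, are hypothetical and stand in for whatever wrapper ends up around the API.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_throttled(prompts, send_batch, batch_size=3, pause_seconds=1.2):
    """Send prompts in batches, sleeping between requests to stay under the rate limit."""
    results = []
    for i, batch in enumerate(chunked(prompts, batch_size)):
        if i > 0:
            time.sleep(pause_seconds)  # 1.2 s between calls is about 50 requests a minute
        results.extend(send_batch(batch))
    return results


# Dry run with a fake sender, so nothing is billed.
print(run_throttled(list("abcdefg"), lambda batch: [s.upper() for s in batch], pause_seconds=0))
```

Keeping the sender separate makes it easy to dry-run the loop like this before pointing it at the real endpoint.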

In [53]:
# contracts per day
(
    # num contracts per minute
    (
        60000 # tokens per minute
        /
        400   # tokens per contract (if the length above is kept)
    )
    *
    60*24 # minutes in a day
) 

216000.0
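
Putting that throughput next to the corpus size gives a rough wall-clock estimate. It assumes the 400-token guess holds and that the token limit is the only bottleneck, ignoring retries and network latency.

```python
total_docs = 250_000
docs_per_day = 60_000 / 400 * 60 * 24  # tokens per minute / tokens per contract * minutes per day

days_needed = total_docs / docs_per_day
print(f"{days_needed:.2f} days, or about {days_needed * 24:.0f} hours")
```

So the whole job should fit in a bit over a day of continuous running.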