## Talking to our documents with `OpenAI`

- File: WARRANT TO PURCHASE SHARES OF CAPITAL STOCK
- Model used for processing: GPT-4 (`gpt-4-0314`)

Examples
1. QA (we ask questions about the contract)
2. Instruction (we rewrite and add passages)
3. Information extraction (we identify keywords, dates, and the contracting parties)
4. Summarization (we briefly summarize the content of the contract)

In [1]:
import credentials
import time
import re
import os
os.environ["OPENAI_API_KEY"] = credentials.openai_api

import openai
import tiktoken

from langchain.document_loaders import PyPDFLoader

### 1. QA (let's ask questions about the text!)

In [2]:
loader = PyPDFLoader("../docs/Law_Insider_eidos-therapeutics-inc_contract_Filed_23-03-2018_Contract.pdf")
docs = loader.load_and_split()

In [6]:
len(docs)

9

`load_and_split()` automatically split the PDF into 9 parts, because many models cannot take in long texts in one piece. The context window of `GPT-4` is about 8,000 tokens, which means that `prompt` + `completion` together cannot exceed ~8,000 tokens (tokens, not words; a token is usually shorter than a word). The contract comfortably fits within this, so we can pass the whole document to the model at once.
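The context budget can be checked with simple arithmetic before any API call is made. The sketch below assumes a window of 8,192 tokens for the 8k `gpt-4` variant; the constant and the helper names `fits_in_context` and `remaining_completion_budget` are illustrative and not part of the notebook. Token counts are passed in as plain integers, for example as measured with `tiktoken`.

```python
# Assumed context window (in tokens) for the 8k gpt-4 variant.
GPT4_CONTEXT_WINDOW = 8192


def fits_in_context(prompt_tokens, max_completion_tokens,
                    context_window=GPT4_CONTEXT_WINDOW):
    """Return True if prompt plus reserved completion budget fits in the window."""
    return prompt_tokens + max_completion_tokens <= context_window


def remaining_completion_budget(prompt_tokens,
                                context_window=GPT4_CONTEXT_WINDOW):
    """Return how many tokens are left for the completion (never negative)."""
    return max(context_window - prompt_tokens, 0)
```

With a prompt of 4,571 tokens, for instance, such a check would leave a few thousand tokens for the answer, far more than a short reply needs.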

In [10]:
teljes_szoveg = ' '.join([i.page_content for i in docs])

print('Word count:', len(teljes_szoveg.split()))
print('Character count:', len(teljes_szoveg))

Word count: 3156
Character count: 19158


What pricing should we expect?

GPT-4
- prompt:       $0.03 / 1K tokens
- completion:   $0.06 / 1K tokens

In [7]:
gpt_4_token_encoding = tiktoken.get_encoding("cl100k_base")
teljes_szoveg_tokenek = gpt_4_token_encoding.encode(teljes_szoveg)

prompt_template = """Use the following pieces of context to answer the users question. 
                     If you don't know the answer, just say that you don't know, don't try to make up an answer"""
prompt_tokenek = gpt_4_token_encoding.encode(prompt_template)

print(len(teljes_szoveg_tokenek))
print(len(prompt_tokenek))

4533
38


- Number of input tokens: 4533 + 38 = 4571
- Estimated output (completion / answer) tokens: ~100

- Total cost = (4571 * 0.03 / 1000) + (100 * 0.06 / 1000) = $0.14313 ≈ 14.3 cents per answered question
- Of which
    - prompt: 13.7 cents
    - completion: 0.6 cents
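The same arithmetic can be wrapped in a small helper so it can be reused for other token counts. This is a minimal sketch: the per-1K prices are the GPT-4 prices quoted above, while the function name `estimate_cost_usd` is only illustrative.

```python
# GPT-4 prices quoted above, in USD per 1,000 tokens.
PROMPT_USD_PER_1K = 0.03
COMPLETION_USD_PER_1K = 0.06


def estimate_cost_usd(prompt_tokens, completion_tokens,
                      prompt_price=PROMPT_USD_PER_1K,
                      completion_price=COMPLETION_USD_PER_1K):
    """Return (prompt cost, completion cost, total cost) in USD."""
    prompt_cost = prompt_tokens * prompt_price / 1000
    completion_cost = completion_tokens * completion_price / 1000
    return prompt_cost, completion_cost, prompt_cost + completion_cost


# 4571 input tokens and an estimated 100 output tokens, as in the list above.
prompt_cost, completion_cost, total = estimate_cost_usd(4571, 100)
print(f"prompt: ${prompt_cost:.5f}, completion: ${completion_cost:.5f}, total: ${total:.5f}")
```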


Answering one question with the `GPT-4` model costs roughly 50 HUF. This is because we currently pass the entire document as context. A more efficient approach is to select the most relevant passages and put only those into the prompt.
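Selecting the relevant passages could look roughly like the sketch below. It ranks chunks by plain keyword overlap with the question, which is a deliberately simple stand-in for embedding-based retrieval and not the method used in this notebook. The helper names, the stop-word list, and the toy chunks are all made up for illustration; the toy chunks are not quotes from the contract.

```python
import re

# Small, hand-picked stop-word list (an assumption; extend as needed).
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "what", "which", "who",
              "and", "to", "in", "at", "be", "this"}


def keywords(text):
    """Lower-case the text and return its set of non-stop-word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def top_k_chunks(chunks, question, k=2):
    """Return the k chunks sharing the most keywords with the question,
    kept in their original document order."""
    question_words = keywords(question)
    scored = [(len(question_words & keywords(chunk)), index)
              for index, chunk in enumerate(chunks)]
    # Highest overlap first; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    best_indices = sorted(index for _, index in scored[:k])
    return [chunks[i] for i in best_indices]


# Toy chunks for illustration only.
toy_chunks = [
    "This Warrant shall be governed by and construed in accordance with Delaware law.",
    "The Exercise Price shall be $1.3248 per share.",
    "Notices shall be delivered in writing.",
]
selected = top_k_chunks(toy_chunks, "Which law governs the contract?", k=1)
print(selected)
```

In the notebook, the chunk list would be `[d.page_content for d in docs]`, and the selected chunks would be joined and passed as the context in place of `teljes_szoveg`.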

In [11]:
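def direct_OpenAI_QA(context, question, model = 'gpt-4'):
    """Ask one question about `context`; return (answer text, token usage dict).

    Uses `openai.ChatCompletion`, the interface of the pre-1.0 `openai` package.
    """
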
    prompt = f"""Use the following pieces of context to answer the users question. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer

    {context}

    Question: 
    {question}

    Answer:"""

    message = [{"role": "user", "content": prompt}]

    result = openai.ChatCompletion.create(model=model,  
                                          messages = message, 
                                          temperature=0, 
                                          top_p=1)
    
    completion = result['choices'][0]['message']['content']
    usage_info = result['usage'].to_dict()

    return completion, usage_info

In [12]:
queries = ['What is this document about?',
           'What is the title of the contract?',
           'Who are the agreeing parties?',
           'Which law governs the contract?',
           'How many shares and at what price can be purchased?',
           'Upon net exercising, what is the formula used?',
           'Who are the representatives who signed the contract?',
           'Are there any attachments, amendments or any supplementary material included?',
           'Can you identify any risk factors?',
           'When was Michael Jordan born?']

In [13]:
usage_info = []

for query in queries:

    print('Q:', query)
    print('\n')

    _c, _u = direct_OpenAI_QA(teljes_szoveg, query)
    usage_info.append(_u)

    print(_c)
    print('\n', '-------' * 10)

Q: What is this document about?


This document is a Warrant to Purchase Shares of Capital Stock issued by Eidos Therapeutics, Inc. to BridgeBio Pharma LLC. It outlines the terms and conditions for the holder (BridgeBio Pharma LLC) to purchase a specified number of shares of the company's capital stock at a predetermined exercise price, subject to adjustments and other provisions.

 ----------------------------------------------------------------------
Q: What is the title of the contract?


The title of the contract is "WARRANT TO PURCHASE SHARES OF CAPITAL STOCK".

 ----------------------------------------------------------------------
Q: Who are the agreeing parties?


The agreeing parties are Eidos Therapeutics, Inc. and BridgeBio Pharma LLC.

 ----------------------------------------------------------------------
Q: Which law governs the contract?


This Warrant shall be governed by and construed in accordance with the Delaware General Corporation Law as to matters within the scop

All answers turn out to be correct, and the model did not stumble on the "odd one out" question either: from the context alone it should not know when Michael Jordan was born, and indeed it does not answer. The model certainly knows the answer to this particular question from its training, but for a question it could not answer it might have started to 'hallucinate'. We avoided that by instructing it in the prompt to say so whenever the information cannot be found in the context.
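Since this guard lives entirely in the prompt wording, it can help to build the prompt separately from the API call, so it can be inspected or token-counted first. The following is a minimal sketch; `QA_TEMPLATE` and `build_qa_messages` are hypothetical names. It dedents the template once, so that indentation from the source code does not end up inside the prompt, and only then fills in the context and the question.

```python
import textwrap

# Template is dedented before formatting, so multi-line context text
# cannot interfere with the common-indentation detection.
QA_TEMPLATE = textwrap.dedent("""\
    Use the following pieces of context to answer the user's question.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    {context}

    Question:
    {question}

    Answer:""")


def build_qa_messages(context, question):
    """Return the chat message list for a single context-grounded question."""
    prompt = QA_TEMPLATE.format(context=context, question=question)
    return [{"role": "user", "content": prompt}]


messages = build_qa_messages("<contract text goes here>",
                             "When was Michael Jordan born?")
print(messages[0]["content"])
```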

How much did all this cost?

In [21]:
prompt_koltseg = sum([i['prompt_tokens'] for i in usage_info]) * 0.03 / 1000
completion_koltseg = sum([i['completion_tokens'] for i in usage_info]) * 0.06 / 1000

print('Total prompt cost:', prompt_koltseg)
print('Total completion cost:', completion_koltseg)
print('Total cost:', prompt_koltseg + completion_koltseg)

Total prompt cost: 1.37856
Total completion cost: 0.030899999999999997
Total cost: 1.40946


In total we spent about $1.41 to answer all ten questions.

Do the questions have to be asked one at a time? What if we list several questions in a single prompt and instruct the model how to answer them?

In [22]:
queries_as_a_whole = '\n'.join(f'{i}. {q}' for i, q in enumerate(queries, start=1))
print(queries_as_a_whole)

1. What is this document about?
2. What is the title of the contract?
3. Who are the agreeing parties?
4. Which law governs the contract?
5. How many shares and at what price can be purchased?
6. Upon net exercising, what is the formula used?
7. Who are the representatives who signed the contract?
8. Are there any attachments, amendments or any supplementary material included?
9. Can you identify any risk factors?
10. When was Michael Jordan born?


In [27]:
def direct_OpenAI_multi_QA(context, question, model = 'gpt-4'):

    prompt = f"""Use the following pieces of context to answer the questions at the end. 
    If you don't know the answer based on the context, just say that you don't know, don't try to make up an answer.

    {context}

    List of questions: 
    {question}

    Answers to each question:"""

    message = [{"role": "user", "content": prompt}]

    result = openai.ChatCompletion.create(model=model,  
                                          messages = message, 
                                          temperature=0, 
                                          top_p=1)
    
    completion = result['choices'][0]['message']['content']
    usage_info = result['usage'].to_dict()

    return completion, usage_info

In [28]:
_c, _u = direct_OpenAI_multi_QA(teljes_szoveg, queries_as_a_whole)

In [29]:
print(_c)

1. This document is about a warrant to purchase shares of capital stock in Eidos Therapeutics, Inc.
2. The title of the contract is "Warrant to Purchase Shares of Capital Stock."
3. The agreeing parties are Eidos Therapeutics, Inc. and BridgeBio Pharma LLC.
4. The contract is governed by the Delaware General Corporation Law and the internal laws of the State of California.
5. The number of shares that can be purchased is determined by dividing $4,000,000 by the applicable Exercise Price, rounded down to the nearest whole share. The Exercise Price depends on whether the Warrant is exercisable for shares of New Preferred Stock or Series Seed Preferred Stock, as specified in Section 1(b) of the document.
6. Upon net exercising, the formula used is: X = Y(A - B) / A, where X is the number of Warrant Shares to be issued, Y is the number of Warrant Shares purchasable under the Warrant, A is the fair market value of one Warrant Share, and B is the Exercise Price.
7. The representatives who si

In [30]:
prompt_koltseg = _u['prompt_tokens'] * 0.03 / 1000
completion_koltseg = _u['completion_tokens'] * 0.06 / 1000

print('Total prompt cost:', prompt_koltseg)
print('Total completion cost:', completion_koltseg)
print('Total cost:', prompt_koltseg + completion_koltseg)

Total prompt cost: 0.14103
Total completion cost: 0.018779999999999998
Total cost: 0.15980999999999998


Instead of the previous $1.41, the API call now cost only 16 cents (about 11% of that). The accuracy and quality of the answers did deteriorate somewhat, though only slightly. For the best answers it is therefore worth asking one question at a time.
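A possible middle ground between one question per call and all questions in one call is to send them in small batches. The sketch below only prepares the numbered question blocks; the name `batch_questions` is illustrative. Whether smaller batches preserve answer quality was not measured in this notebook, so treat this as an idea to test.

```python
def batch_questions(questions, batch_size):
    """Split questions into groups of `batch_size`, each rendered as a
    numbered list that restarts at 1."""
    batches = []
    for start in range(0, len(questions), batch_size):
        group = questions[start:start + batch_size]
        numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(group, start=1))
        batches.append(numbered)
    return batches


example = batch_questions(
    ["What is the title of the contract?",
     "Who are the agreeing parties?",
     "Which law governs the contract?"],
    batch_size=2,
)
for block in example:
    print(block)
    print("---")
```

Each block could then be passed as the `question` argument of `direct_OpenAI_multi_QA`, at the price of sending the context once per batch.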

### 2. Modifying / extending the text

In [34]:
def direct_OpenAI_Instruct(context, instruction, model = 'gpt-4'):

    prompt = f"""Use the following pieces of context to complete the given task. 
    You do not need to modify the original text, just come up with the suggested modifications.

    {context}

    Instruction: 
    {instruction}

    Completion: """

    message = [{"role": "user", "content": prompt}]

    result = openai.ChatCompletion.create(model=model,  
                                          messages = message, 
                                          temperature=0, 
                                          top_p=1)
    
    completion = result['choices'][0]['message']['content']
    usage_info = result['usage'].to_dict()

    return completion, usage_info

In [39]:
text_gen_instructions = [
    'Write up a clause about market volatility: upon a 20% drop in S&P500 contract shall be terminated',
    'Suggest a draft for a third option to the number of shares: upon an earthquake, price shall be only $1.1',
    'Rewrite the governing law section: Texas law is the only correct one']

In [40]:
for instruction in text_gen_instructions:

    print('I:', instruction)
    print('\n')

    _c, _ = direct_OpenAI_Instruct(teljes_szoveg, instruction)

    print(_c)
    print('\n', '-------' * 10)

I: Write up a clause about market volatility: upon a 20% drop in S&P500 contract shall be terminated


Market Volatility Termination Clause: In the event that the S&P 500 index experiences a decline of 20% or more from its highest closing value within the previous 12-month period, this Warrant shall automatically terminate and be of no further force or effect. The termination shall be effective as of the close of business on the date of such decline in the S&P 500 index.

 ----------------------------------------------------------------------
I: Suggest a draft for a third option to the number of shares: upon an earthquake, price shall be only $1.1


(iii) if the Warrant is exercisable for shares upon the occurrence of an earthquake, as set forth in clause (c) of Section 2 below, the Exercise Price shall be $1.1 per share.

 ----------------------------------------------------------------------
I: Rewrite the governing law section: Texas law is the only correct one


10. Governing Law.

The model produced usable, human-sounding modification proposals.

### 3. Information extraction

In [41]:
def direct_OpenAI_InfExtr(context, model = 'gpt-4'):

    prompt = f"""Use the following pieces of context to complete the task at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    {context}

    Task:
    Extract all keywords, numbers, dates, organizations, people, any other important parts.
    List them as key-value pairs, organize them into categories.

    Extracted information: """

    message = [{"role": "user", "content": prompt}]

    result = openai.ChatCompletion.create(model=model,  
                                          messages = message, 
                                          temperature=0, 
                                          top_p=1)
    
    completion = result['choices'][0]['message']['content']
    usage_info = result['usage'].to_dict()

    return completion, usage_info

In [42]:
_a, _u = direct_OpenAI_InfExtr(teljes_szoveg)

print(_a)

Dates:
- Date of Issuance: February 22, 2018
- Void after: February 21, 2021

Organizations:
- Eidos Therapeutics, Inc.
- BridgeBio Pharma LLC

People:
- Christine Siu (Chief Financial Officer of Eidos Therapeutics, Inc.)
- Neil Kumar (Chief Executive Officer of BridgeBio Pharma LLC)

Keywords:
- Warrant
- Securities Act of 1933
- Purchase Agreement
- Holder
- Exercise Price
- Warrant Shares
- New Preferred Stock
- Series Seed Preferred Stock
- Qualified Financing
- Deemed Liquidation Event
- IPO (Initial Public Offering)
- Net Exercise
- Delaware General Corporation Law
- State of California

Numbers:
- Warrant Shares: $4,000,000
- Exercise Price for Series Seed Preferred Stock: $1.3248 per share
- Notice period for Record Date, Deemed Liquidation Event, or IPO: 10 days

Important Parts:
- Warrant and securities not registered under the Securities Act of 1933
- Holder entitled to purchase Warrant Shares upon surrender of the Warrant
- Exercise Period commences upon Qualified Financing
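To use the extracted items downstream, the text output can be parsed into a Python structure. The sketch below assumes the layout seen above (a category line ending in a colon, followed by `- ` bullet items, some of them `key: value` pairs). The model is not guaranteed to keep this layout on every run, and `parse_extraction` is a hypothetical helper, not part of the notebook.

```python
def parse_extraction(text):
    """Parse 'Category:' headers and '- key: value' bullets into a dict that
    maps each category to a list of (key, value) tuples; value is None when
    the bullet has no 'key: value' form."""
    result = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("- "):
            if current is None:
                continue  # bullet before any category header: skip it
            item = line[2:].strip()
            key, separator, value = item.partition(": ")
            result[current].append((key, value) if separator else (item, None))
        elif line.endswith(":"):
            current = line[:-1]
            result[current] = []
    return result


sample = """Dates:
- Date of Issuance: February 22, 2018
- Void after: February 21, 2021

Organizations:
- Eidos Therapeutics, Inc.
- BridgeBio Pharma LLC"""

parsed = parse_extraction(sample)
print(parsed["Dates"])
```

Asking the model for JSON output directly would likely be more robust than parsing free text, but that would require changing the prompt.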

### 4. Text summarization

In [43]:
def direct_OpenAI_Summ(context, model = 'gpt-4'):

    prompt = f"""Write a concise summary of the following. 
    Include all financial data, such as the number and price of stocks, agreeing parties, important dates.

    {context}

    CONCISE SUMMARY:"""

    message = [{"role": "user", "content": prompt}]

    result = openai.ChatCompletion.create(model=model,  
                                          messages = message, 
                                          temperature=0, 
                                          top_p=1)
    
    completion = result['choices'][0]['message']['content']
    usage_info = result['usage'].to_dict()

    return completion, usage_info

In [45]:
_s, _u = direct_OpenAI_Summ(teljes_szoveg)

print(_s)

Eidos Therapeutics has issued a warrant to BridgeBio Pharma, allowing the purchase of up to $4,000,000 worth of Eidos' capital stock. The warrant is valid from February 22, 2018, to February 21, 2021. The purchase price for the warrant shares is dependent on whether the warrant is exercisable for shares of New Preferred Stock or Series Seed Preferred Stock.


Let's see what the model can do when we ask for the output in another language.

In [47]:
def direct_OpenAI_Summ_output_language(context, model = 'gpt-4', language = 'Hungarian'):

    prompt = f"""Write a concise summary of the following. 
    Include all financial data, such as the number and price of stocks, agreeing parties, important dates.

    {context}

    CONCISE SUMMARY IN {language}:"""

    message = [{"role": "user", "content": prompt}]

    result = openai.ChatCompletion.create(model=model,  
                                          messages = message, 
                                          temperature=0, 
                                          top_p=1)
    
    completion = result['choices'][0]['message']['content']
    usage_info = result['usage'].to_dict()

    return completion, usage_info

In [48]:
_h, _u = direct_OpenAI_Summ_output_language(teljes_szoveg, language='HUNGARIAN')

print(_h)

Az Eidos Therapeutics, Inc. kibocsát egy részvényvásárlási jogot a BridgeBio Pharma LLC részére, amely 2018. február 22-én lép életbe és 2021. február 21-én jár le. A jogosultság értéke 4 000 000 dollár, amelyet a megfelelő gyakorlási ár osztva határoz meg. A részvények vásárlásának gyakorlási ára a következőképpen alakul: (i) ha a jogosultság új preferált részvényekre vonatkozik, akkor a minősített finanszírozás során értékesített új preferált részvények legalacsonyabb ára, (ii) ha a jogosultság a sorozat vetőmag preferált részvényekre vonatkozik, akkor 1,3248 dollár részvényenként. A jogosultság gyakorlási időszaka a minősített finanszírozás végrehajtásától vagy a tőkébe fektetési eseménytől számított három évig tart.


In [49]:
_r, _u = direct_OpenAI_Summ_output_language(teljes_szoveg, language='RUSSIAN')

print(_r)

Выставка 4.4
Дата выпуска: 22 февраля 2018 года. Срок действия: до 21 февраля 2021 года. Eidos Therapeutics, Inc. выпускает гарантию на покупку акций капитала для BridgeBio Pharma LLC. Согласно соглашению о покупке заметок от 22 февраля 2018 года, держатель гарантии имеет право приобрести у компании акции на сумму до 4 000 000 долларов США по соответствующей цене акций. Гарантия может быть исполнена в течение срока, начинающегося с даты заключения квалифицированного финансирования или непосредственно перед завершением события ликвидации, и заканчивается через три года после начала исполнения или при завершении события ликвидации или IPO.
