## Chatting with our documents using `LangChain` & `OpenAI`

- File: WARRANT TO PURCHASE SHARES OF CAPITAL STOCK
- Processing model: GPT-4 (`gpt-4-0314`)

In [5]:
import credentials
import time
import re
import os
os.environ["OPENAI_API_KEY"] = credentials.openai_api

import openai
import tiktoken

from langchain.document_loaders import PyPDFLoader

from langchain.chains.question_answering import load_qa_chain
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

### 1. QA (asking questions about the text)

In [2]:
loader = PyPDFLoader("../docs/Law_Insider_eidos-therapeutics-inc_contract_Filed_23-03-2018_Contract.pdf")
docs = loader.load_and_split()

In [49]:
chain = load_qa_chain(ChatOpenAI(model_name = 'gpt-4', temperature=0.0), 
                      chain_type="stuff") #map_reduce

What pricing should we expect?

GPT-4
- prompt:       $0.03 / 1K tokens
- completion:   $0.06 / 1K tokens

In [4]:
gpt_4_token_encoding = tiktoken.get_encoding("cl100k_base")
teljes_szoveg = ' '.join([i.page_content for i in docs])
teljes_szoveg_tokenek = gpt_4_token_encoding.encode(teljes_szoveg)

prompt_template = """Use the following pieces of context to answer the users question. 
                     If you don't know the answer, just say that you don't know, don't try to make up an answer"""
prompt_tokenek = gpt_4_token_encoding.encode(prompt_template)

print(len(teljes_szoveg_tokenek))
print(len(prompt_tokenek))

4533
38


- Input tokens: 4533 + 38 = 4571
- Output (completion) tokens, estimated: ~100

- Total cost = (4571 * 0.03 / 1000) + (100 * 0.06 / 1000) = $0.14313 ≈ 14.3 cents per answered question
- Of which
    - prompt: 13.7 cents
    - completion: 0.6 cents
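The arithmetic above can be reproduced with a small helper. This is a minimal sketch: the function and constant names are made up for illustration, and the prices are the GPT-4 figures quoted earlier in this section.

```python
# Prices per 1K tokens, as quoted above for GPT-4.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06


def estimate_cost_usd(prompt_tokens, completion_tokens):
    """Return (prompt_cost, completion_cost, total_cost) in USD."""
    prompt_cost = prompt_tokens * PROMPT_PRICE_PER_1K / 1000
    completion_cost = completion_tokens * COMPLETION_PRICE_PER_1K / 1000
    return prompt_cost, completion_cost, prompt_cost + completion_cost


# 4533 document tokens + 38 instruction tokens, ~100 completion tokens (estimate).
prompt_cost, completion_cost, total_cost = estimate_cost_usd(4533 + 38, 100)
print(f"prompt: {prompt_cost * 100:.1f} cents")
print(f"completion: {completion_cost * 100:.1f} cents")
print(f"total: {total_cost * 100:.1f} cents per question")
```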


Answering a single question with the `GPT-4` model costs roughly 50 HUF. This is because we currently pass the entire document as context. A more efficient approach is to select the most relevant passages and put only those into the prompt.
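One way to pick the most relevant passages can be sketched as follows. Note that this is not what the notebook does: it swaps in a plain bag-of-words cosine similarity instead of embeddings so that it runs without any API call, and the function names and toy chunks are hypothetical. In practice an embedding model would usually rank passages better, since word counts miss synonyms and inflected forms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(chunks, question, k=2):
    """Return the k chunks most similar to the question, best first."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(Counter(tokenize(c)), q_vec), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


# Hypothetical toy chunks standing in for the page contents of `docs`.
chunks = [
    "Governing law: California law governs this Warrant.",
    "The Holder may exercise this Warrant by delivering a notice of exercise.",
    "Notices shall be sent to the address on the signature page.",
]
selected = top_k_chunks(chunks, "Which law governs the contract?", k=1)
print(selected[0])
```

Only the selected chunks would then be joined and sent as context, which shrinks the prompt-token part of the bill.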

In [6]:
queries = ['What is this input file?',
           'What is this document about?',
           'What is the title of the contract?',
           'Who are the agreeing parties?',
           'Which law governs the contract?',
           'How many shares and at what price can be purchased?',
           'Upon net exercising, what is the formula used?',
           'Who are the representatives who signed the contract?',
           'Are there any attachments, amendments or any supplementary material included?',
           'Can you identify any risk factors?',
           'When was Michael Jordan born?']

In [7]:
for query in queries:
    print('Q:', query)
    print('\n')
    print(chain.run(input_documents = docs, question = query))
    print('\n', '-------' * 10)
    #time.sleep(20)

Q: What is this input file?


This input file is a legal document representing a Warrant issued by Eidos Therapeutics, Inc. to BridgeBio Pharma LLC. A warrant is a financial instrument that gives the holder the right to purchase a company's stock at a specific price within a certain time frame. The document outlines the terms and conditions of the warrant, including the number of shares, exercise price, exercise period, and various provisions related to adjustments, stockholder rights, and governing laws.

 ----------------------------------------------------------------------
Q: What is this document about?


This document is a Warrant to Purchase Shares of Capital Stock issued by Eidos Therapeutics, Inc. to BridgeBio Pharma LLC. It outlines the terms and conditions for the holder (BridgeBio Pharma LLC) to purchase a specified number of shares of the company's capital stock at a predetermined exercise price. The document also includes provisions for adjustments to the exercise price a

Every answer turns out to be correct, and the model did not stumble on the "odd one out" question either: from the context alone it should not know when Michael Jordan was born, and it does not answer. Of course it certainly knows the answer to this particular question anyway, but for a question it did not know it would start to "hallucinate". We avoided that by instructing it in the prompt to say so whenever the information cannot be found in the context.

Do we have to ask the questions one at a time? What if we list several questions in the prompt and instruct the model how to answer them?

In [77]:
prompt_template = """Use the following pieces of context to answer the questions at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

List of questions: 
{question}

Answers to each question:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [78]:
queries_as_a_whole = '\n'.join(f'{i}. {q}' for i, q in enumerate(queries, start=1))
print(queries_as_a_whole)

1. What is this input file?
2. What is this document about?
3. What is the title of the contract?
4. Who are the agreeing parties?
5. Which law governs the contract?
6. How many shares and at what price can be purchased?
7. Upon net exercising, what is the formula used?
8. Who are the representatives who signed the contract?
9. Are there any attachments, amendments or any supplementary material included?
10. Can you identify any risk factors?
11. When was Michael Jordan born?


In [None]:
# time out errors

#multi_Q_output = chain.run(input_documents = docs, question = queries_as_a_whole)
#print(multi_Q_output)

Directly with the `OpenAI` package

In [128]:
def direct_OpenAI_QA(context, question, model = 'gpt-4'):

    prompt = f"""Use the following pieces of context to answer the questions at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    {context}

    List of questions: 
    {question}

    Answers to each question:"""

    message = [{"role": "user", "content": prompt}]

    result = openai.ChatCompletion.create(model=model,  
                                          messages = message, 
                                          temperature=0, 
                                          top_p=1)
    
    completion = result['choices'][0]['message']['content']
    usage_info = result['usage'].to_dict()

    return completion, usage_info

In [108]:
_c, _u = direct_OpenAI_QA(teljes_szoveg,
                          queries_as_a_whole)

In [110]:
print(_c)

1. This input file is a Warrant to Purchase Shares of Capital Stock issued by Eidos Therapeutics, Inc. to BridgeBio Pharma LLC.

2. This document is about a warrant that grants BridgeBio Pharma LLC the right to purchase shares of capital stock from Eidos Therapeutics, Inc. under certain terms and conditions.

3. The title of the contract is "Warrant to Purchase Shares of Capital Stock."

4. The agreeing parties are Eidos Therapeutics, Inc. and BridgeBio Pharma LLC.

5. The contract is governed by the Delaware General Corporation Law for matters within its scope and the internal laws of the State of California for all other matters.

6. The number of shares that can be purchased is determined by dividing $4,000,000 by the applicable Exercise Price, rounded down to the nearest whole share. The Exercise Price depends on whether the Warrant is exercisable for shares of New Preferred Stock or Series Seed Preferred Stock, as specified in Section 1(b) of the document.

7. Upon net exercising,
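To use the batched answers programmatically, the numbered completion has to be split back into one answer per question. A minimal sketch, assuming the model keeps the `N. answer` layout shown above (it is not guaranteed to); the function name is hypothetical and the sample text is shortened from the output above:

```python
import re


def split_numbered_answers(completion):
    """Map each answer number to its text, for output shaped like '1. ...'."""
    # Split in front of every line that starts with digits, a dot and whitespace.
    parts = re.split(r"^(?=\d+\.\s)", completion, flags=re.MULTILINE)
    answers = {}
    for part in parts:
        match = re.match(r"(\d+)\.\s+(.*)", part.strip(), re.DOTALL)
        if match:
            answers[int(match.group(1))] = match.group(2).strip()
    return answers


sample = (
    "1. This input file is a Warrant to Purchase Shares of Capital Stock.\n\n"
    "4. The agreeing parties are Eidos Therapeutics, Inc. and BridgeBio Pharma LLC.\n"
)
answers = split_numbered_answers(sample)
print(answers[4])
```

An answer that itself contains a line beginning with a number and a dot would be split incorrectly, so this is only suitable for simple outputs.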

In [111]:
_u

{'prompt_tokens': 4705, 'completion_tokens': 357, 'total_tokens': 5062}

This way we used ~5k tokens in total, whereas asking one at a time took at least 4,000 tokens per question, i.e. at least ~45k tokens for the 11 questions, so asking them individually cost roughly 8x as much. The answer quality is fully satisfactory in the batched case as well.
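The comparison can be checked from the `usage` dictionary returned by the call. A rough sketch: the helper name is hypothetical, the prices are the GPT-4 figures quoted earlier, and the one-by-one figure is a lower bound that assumes 4,000 prompt tokens per call and ignores completion tokens.

```python
PROMPT_PRICE_PER_1K = 0.03      # USD, GPT-4 prompt tokens (from the pricing above)
COMPLETION_PRICE_PER_1K = 0.06  # USD, GPT-4 completion tokens


def usage_cost_usd(usage):
    """Cost in USD of one call, given its `usage` dictionary."""
    return (usage["prompt_tokens"] * PROMPT_PRICE_PER_1K
            + usage["completion_tokens"] * COMPLETION_PRICE_PER_1K) / 1000


# Usage reported for the single batched call.
batched = usage_cost_usd({"prompt_tokens": 4705, "completion_tokens": 357})

# Lower bound for 11 separate calls: at least 4000 prompt tokens each,
# completion tokens ignored (an assumption that favours the one-by-one variant).
one_by_one = 11 * usage_cost_usd({"prompt_tokens": 4000, "completion_tokens": 0})

print(f"batched: ${batched:.3f}, one by one: at least ${one_by_one:.2f}")
print(f"ratio: at least {one_by_one / batched:.1f}x")
```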

### 2. Modifying / extending the text

In [41]:
text_gen_instructions = [
    'Write up a clause about market volatility: upon a 20% drop in S&P500 contract shall be terminated.',
    'Draft a third option to number of shares: upon an earthquake price shall be only $1.1',
    'Rewrite the governing law section: Texas law is the correct one.']

In [42]:
for instruction in text_gen_instructions:
    print('I:', instruction)
    print('\n')
    print(chain.run(input_documents = docs, question = instruction))
    print('\n', '-------' * 10)
    #time.sleep(20)

I: Write up a clause about market volatility: upon a 20% drop in S&P500 contract shall be terminated.


Market Volatility Termination Clause:

Notwithstanding any other provision in this Agreement, in the event that the S&P 500 Index experiences a decline of twenty percent (20%) or more from its highest closing value within the preceding twelve (12) month period, either party may, upon written notice to the other party, elect to terminate this Agreement immediately. Such termination shall be effective as of the date of the written notice, and neither party shall have any further obligations or liabilities to the other party under this Agreement, except for any obligations or liabilities that have accrued prior to the effective date of termination.

 ----------------------------------------------------------------------
I: Draft a third option to number of shares: upon an earthquake price shall be only $1.1


I cannot modify the original document, but I can provide you with a suggested 

The model produced usable, human-sounding amendment proposals. For the 2nd instruction it pointed out that it cannot modify the document itself but would write up its suggestions, and it encouraged us to seek legal advice.

Just as with QA, let's see whether the task can be solved with a single API call: we give it all 3 instructions at once.

In [112]:
text_gen_instructions_all = '\n'.join(f'{i}. {t}' for i, t in enumerate(text_gen_instructions, start=1))
print(text_gen_instructions_all)

1. Write up a clause about market volatility: upon a 20% drop in S&P500 contract shall be terminated.
2. Draft a third option to number of shares: upon an earthquake price shall be only $1.1
3. Rewrite the governing law section: Texas law is the correct one.


In [119]:
_c, _u = direct_OpenAI_QA(teljes_szoveg,
                          text_gen_instructions_all)

In [120]:
print(_c)

1. Market Volatility Termination Clause: In the event that the S&P 500 index experiences a decline of 20% or more from its highest point within the previous 30 days, this Warrant shall automatically terminate and be of no further force or effect. The parties acknowledge and agree that such market volatility may have a material adverse effect on the value of the Warrant Shares and the ability of the Holder to realize any potential gains from the exercise of this Warrant.

2. Additional Option for Number of Shares: In the event of an earthquake with a magnitude of 6.0 or greater on the Richter scale occurring within a 100-mile radius of the Company's principal place of business, the Holder shall have the option to purchase Warrant Shares at a reduced Exercise Price of $1.1 per share. This option shall be exercisable for a period of 30 days following the occurrence of such earthquake, and the Holder must provide written notice to the Company of its intention to exercise this option within

In [121]:
_u

{'prompt_tokens': 4658, 'completion_tokens': 303, 'total_tokens': 4961}

Again we see a large cost reduction: the same task was completed with ~3x fewer tokens.

Before using the dedicated summarization prompt, let's check that the QA chain can also extract the required information well.

In [8]:
q = 'Please summarize the entire text as consicely as possible.'
a = chain.run(input_documents = docs, question = q)

In [9]:
print('Q:', q)
print('\n')
#print(re.sub("(.{89})", "\\1\n", a, 0, re.DOTALL))
print(a)

Q: Please summarize the entire text as consicely as possible.


This document is a Warrant issued by Eidos Therapeutics, Inc. to BridgeBio Pharma LLC, allowing the holder to purchase shares of the company's capital stock. The Warrant is issued under the Securities Act of 1933 and is subject to certain restrictions on transfer. The Warrant was issued on February 22, 2018, and is void after February 21, 2021. The holder can exercise the Warrant to purchase shares of either New Preferred Stock or Series Seed Preferred Stock, depending on certain conditions. The Exercise Price for the shares is subject to adjustments in case of stock splits, dividends, or other corporate actions. The Warrant does not grant the holder any stockholder rights until it is exercised. The document also includes provisions for notice, governing law, successors and assigns, amendment, and severability.


Quite a good summary of a PDF that is originally 9 pages long!

### 3. Information extraction

In [136]:
def direct_OpenAI_InfExtr(context, model = 'gpt-4'):

    prompt = f"""Use the following pieces of context to complete the task at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    {context}

    Task:
    Extract all keywords, numbers, dates, organizations, people, any other important parts.
    List them as key-value pairs, organize them into categories.

    Extracted information: """

    message = [{"role": "user", "content": prompt}]

    result = openai.ChatCompletion.create(model=model,  
                                          messages = message, 
                                          temperature=0, 
                                          top_p=1)
    
    completion = result['choices'][0]['message']['content']
    usage_info = result['usage'].to_dict()

    return completion, usage_info

In [137]:
_a, _u = direct_OpenAI_InfExtr(teljes_szoveg)

print(_a)

Dates:
- Date of Issuance: February 22, 2018
- Void after: February 21, 2021

Organizations:
- Eidos Therapeutics, Inc.
- BridgeBio Pharma LLC

People:
- Christine Siu (Chief Financial Officer of Eidos Therapeutics, Inc.)
- Neil Kumar (Chief Executive Officer of BridgeBio Pharma LLC)

Keywords:
- Warrant
- Securities Act of 1933
- Purchase Agreement
- Holder
- Exercise Price
- Warrant Shares
- New Preferred Stock
- Series Seed Preferred Stock
- Qualified Financing
- Deemed Liquidation Event
- IPO (Initial Public Offering)
- Net Exercise
- Delaware General Corporation Law
- State of California

Numbers:
- Warrant Shares: $4,000,000 (divided by the applicable Exercise Price)
- Exercise Price for Series Seed Preferred Stock: $1.3248 per share
- Notice period: 10 days
- Domestic United States delivery: 4 business days
- International delivery: 4 business days (with 2-day delivery service)

Other important parts:
- Exercise Period: Commencing on the earlier to occur of (a) the consummation 

In [138]:
_u

{'prompt_tokens': 4621, 'completion_tokens': 419, 'total_tokens': 5040}
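The extracted text can be turned into a Python dictionary for further processing. A minimal sketch, assuming the model sticks to the `Category:` header plus `- item` bullet layout seen in the output above; the function name is hypothetical and the sample is a shortened copy of that output.

```python
def parse_categories(text):
    """Group '- item' bullet lines under the preceding 'Category:' header line."""
    categories = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("- "):
            # Bullet line: attach it to the most recent category, if any.
            if current is not None:
                categories[current].append(line[2:])
        elif line.endswith(":"):
            # Header line such as "Dates:" opens a new category.
            current = line[:-1]
            categories[current] = []
    return categories


sample = """Dates:
- Date of Issuance: February 22, 2018
- Void after: February 21, 2021

Organizations:
- Eidos Therapeutics, Inc.
- BridgeBio Pharma LLC
"""
parsed = parse_categories(sample)
print(parsed["Dates"])
```

Free-form output like this can change shape between runs, so a stricter alternative is to ask the model for JSON in the prompt and parse that instead.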

### 4. Summarization

In [10]:
chain = load_summarize_chain(ChatOpenAI(model_name = 'gpt-4', temperature=0.0), chain_type="stuff")
summary = chain.run(docs)

In [12]:
print(re.sub("(.{90})", "\\1\n", summary, 0, re.DOTALL))
#print(summary)

This document is a warrant issued by Eidos Therapeutics, Inc. to BridgeBio Pharma LLC, all
owing BridgeBio to purchase shares of Eidos' capital stock. The warrant is exercisable und
er certain conditions and subject to adjustments. The warrant is not registered under the 
Securities Act of 1933 and cannot be sold or transferred without proper registration or ex
emption. The exercise period, method of exercise, and adjustments to the exercise price an
d number of warrant shares are detailed within the document.
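The `re.sub` call above inserts a newline after every 90 characters, which is why words such as "allowing" are cut in half in the printed output. As an alternative sketch, the standard library's `textwrap.fill` wraps on whitespace instead; the sample string is the first sentence of the summary above.

```python
import textwrap

text = (
    "This document is a warrant issued by Eidos Therapeutics, Inc. to BridgeBio "
    "Pharma LLC, allowing BridgeBio to purchase shares of Eidos' capital stock."
)

# Break lines at whitespace so that no line exceeds 90 characters.
wrapped = textwrap.fill(text, width=90)
print(wrapped)
```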


With a custom `prompt` we can tailor the instruction!

In [13]:
prompt_template = """Write a concise summary of the following. 
Include all financial data, such as the number and price of stocks.

{text}

CONCISE SUMMARY:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(ChatOpenAI(model_name = 'gpt-4', temperature=0.0), 
                             chain_type="stuff", prompt=PROMPT)
summary = chain.run(docs)

In [14]:
print(re.sub("(.{90})", "\\1\n", summary, 0, re.DOTALL))

Eidos Therapeutics has issued a warrant to BridgeBio Pharma LLC, allowing the purchase of 
up to $4,000,000 worth of Eidos' capital stock. The warrant is exercisable for shares of N
ew Preferred Stock or Series Seed Preferred Stock, with exercise prices of the lowest per 
share purchase price in a Qualified Financing or $1.3248 per share, respectively. The warr
ant is valid from February 22, 2018, to February 21, 2021.


Let's see what it can do when we ask for Hungarian output

In [15]:
prompt_template = """Write a concise summary of the following. 
Include all financial data, such as the number and price of stocks.

{text}

CONCISE SUMMARY IN HUNGARIAN:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(ChatOpenAI(model_name = 'gpt-4', temperature=0.0), 
                             chain_type="stuff", prompt=PROMPT)
summary = chain.run(docs)

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=60).


In [61]:
print(re.sub("(.{90})", "\\1\n", summary, 0, re.DOTALL))

Az Eidos Therapeutics, Inc. kibocsát egy részvényvásárlási jogot a BridgeBio Pharma LLC ré
szére, amelynek értéke 4 000 000 dollár. A részvények vételi ára a következőképpen alakul:
 ha az új preferált részvényekre vonatkozik, akkor a minősített finanszírozás során értéke
sített új preferált részvények legalacsonyabb ára; ha a sorozat Seed Preferred részvényekr
e vonatkozik, akkor 1,3248 dollár részvényenként. A részvényvásárlási jog a kibocsátás dát
umától számított három évig érvényes.


Now let's see what it can do with a Hungarian prompt, English context, and Hungarian output requested

In [17]:
prompt_template = """Írj egy rövid, lényegretörő összefoglalót az alábbi szövegről. 
Említs meg minden pénzügyi információt, köztük a részvények mennyiségét és árát.

{text}

Összefoglaló magyar nyelven:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(ChatOpenAI(model_name = 'gpt-4', temperature=0.0), 
                             chain_type="stuff", prompt=PROMPT)
summary = chain.run(docs)

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=60).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=60).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=60).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=60).
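The log lines above show the wait time doubling after each timeout (1, 2, 4, 8 seconds), which is the exponential backoff pattern. The following is only an illustration of that pattern, not LangChain's actual implementation: the helper name is hypothetical, the built-in `TimeoutError` stands in for the library's own timeout exception, and the flaky function simulates a request.

```python
import time


def call_with_backoff(func, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on TimeoutError wait base_delay, 2x, 4x, ... and try again."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TimeoutError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Hypothetical flaky call that times out twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []
# Record the delays instead of actually sleeping, to keep the demo fast.
result = call_with_backoff(flaky_request, sleep=delays.append)
print(result, delays)
```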


In [18]:
print(re.sub("(.{90})", "\\1\n", summary, 0, re.DOTALL))

A dokumentum egy részvényvásárlási opciót tartalmaz, amelyet az Eidos Therapeutics, Inc. a
dott ki a BridgeBio Pharma LLC számára. Az opció lehetővé teszi a tulajdonos számára, hogy
 legfeljebb 4 000 000 dollár értékben vásároljon részvényeket a kibocsátó társaságban. Az 
opció érvényességi ideje 2018. február 22-től 2021. február 21-ig tart. Az opció gyakorlás
ának ára a részvények osztályától és időpontjától függően változik, de a dokumentum 1,3248
 dolláros árat említ a Series Seed Preferred Stock részvények esetében.


In [48]:
r = openai.api_requestor.APIRequestor()
resp = r.request("GET", '/dashboard/billing/usage?start_date=2023-05-29&end_date=2023-05-30')
resp_object = resp[0]
cost_USD = resp_object.data['total_usage'] / 100

print('Daily cost:', cost_USD)

Daily cost: 6.259726000000001
