## In this notebook, we call GPT 3.5 again to generate summaries of the 10-K reports as a baseline for comparison. The general process is:
- Load the processed Item 7 data, in which all tables and irrelevant content were removed and the remaining content was reordered and keyed by major theme (e.g. revenue, gross profit margin)
- For each report, load the content under each theme and split it into one or more chunks, since the API has a maximum token limit per call
- Ask GPT 3.5 to generate a summary for each theme, calling it once per chunk
- Concatenate the summaries theme by theme to form the summary of Item 7 as a whole
- Calculate ROUGE scores and BERTScore for the GPT 3.5-generated summaries against the ground-truth labels
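
As a rough illustration of what the final step measures, the sketch below computes a simplified ROUGE-1 F1 score (unigram overlap) in plain Python. It is not the `rouge` package installed in this notebook, which also reports ROUGE-2 and ROUGE-L and may tokenize differently. The function name `rouge1_f1` and the example strings are made up for illustration.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # clipped overlap: a word counts at most as often as it appears in both texts
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# hypothetical generated summary and ground-truth label
generated = "revenue increased 14% in fiscal 2021"
label = "total revenue increased by 14% in 2021"
score = rouge1_f1(generated, label)
print(round(score, 3))
```

Five of the six generated words appear in the seven-word label, so precision is 5/6 and recall is 5/7.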

In [None]:
# mount Google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!pip install openai
!pip install rouge
!pip install bert_score

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# import packages
import pandas as pd
import numpy as np
import glob
import os.path
import re
import openai
from tqdm import tqdm
import math
from rouge import Rouge
from bert_score import score
import torch
import time
from openai.error import InvalidRequestError
import pickle

In [None]:
# load processed data
file_path = "/content/drive/MyDrive/w210_capstone_project/data/SEC_Edgar_Annual_Financial_Filings_2021/working3/"

df = pd.read_pickle(os.path.join(file_path,'item7_text_v5.pkl'))

df['label'] = df['label'].apply(lambda x: re.sub('\n', ' ', x))

df[df['kept_report_length'] <= df['label_length']]

Unnamed: 0,id,label_length,label,kept_report_length,report,has_label
20,789019,1094,Microsoft is a technology company whose missio...,13,"{'Operating Income': '', 'Revenues': '', 'Liqu...",True
31,829323,1855,Inuvo is a technology company that develops an...,1792,"{'Operating Income': '', 'Revenues': '', 'Liqu...",True
32,843006,628,"Total revenue increased by $2,231,000, or 14%,...",13,"{'Operating Income': '', 'Revenues': '', 'Liqu...",True
36,860731,2230,The company provides integrated information ma...,13,"{'Operating Income': '', 'Revenues': '', 'Liqu...",True


In [None]:
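# drop the reports listed above, whose kept text is no longer than their label
labels_to_remove = ['789019','829323','843006','860731']
# compare ids as strings so the filter works whether 'id' is stored as int or str
df = df[~df['id'].astype(str).isin(labels_to_remove)]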

In [None]:
# check unique themes
themes = []
for index, row in df.iterrows():
  for key in row['report'].keys():
    if key not in themes:
      themes.append(key)
themes

['Operating Income',
 'Revenues',
 'Liquidity',
 'Results of Operations',
 'Business Overview',
 'Not Important',
 'Debt',
 'Gross Profit Margin',
 'Interest expense',
 'Results of operations',
 'Interest Expense',
 'Business overview',
 'Not important']

In [None]:
# input data cleaning: merge theme keys that differ only in capitalization
def combine_interest_expense(row):
    # Concatenate the text stored under both spellings of the key
    expense_1 = row.get('Interest expense', '')
    expense_2 = row.get('Interest Expense', '')
    combined_expense = expense_1 + expense_2

    # Remove the lowercase 'Interest expense' key
    row.pop('Interest expense', None)

    # Store the combined text under 'Interest Expense'
    row['Interest Expense'] = str(combined_expense)

    # Remove both spellings of the 'Not Important' key
    row.pop('Not Important', None)
    row.pop('Not important', None)

    return row

def combine_business_overview(row):
    # Concatenate the text stored under both spellings of the key
    business_1 = row.get('Business Overview', '')
    business_2 = row.get('Business overview', '')
    combined_business = business_1 + business_2

    # Remove the lowercase 'Business overview' key
    row.pop('Business overview', None)

    # Store the combined text under 'Business Overview'
    row['Business Overview'] = str(combined_business)

    return row

def combine_operations(row):
    # Concatenate the text stored under both spellings of the key
    operation_1 = row.get('Results of Operations', '')
    operation_2 = row.get('Results of operations', '')
    combined_operation = operation_1 + operation_2

    # Remove the lowercase 'Results of operations' key
    row.pop('Results of operations', None)

    # Store the combined text under 'Results of Operations'
    row['Results of Operations'] = str(combined_operation)

    return row

# Update the 'report' column by applying each merge function in turn
df['report'] = df['report'].apply(combine_interest_expense)
df['report'] = df['report'].apply(combine_business_overview)
df['report'] = df['report'].apply(combine_operations)
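
The three functions above follow the same pattern, so the merge could also be written once. The sketch below is an alternative, not the code that was run in this notebook: `merge_case_variants` and the `canonical` list are hypothetical names, and it assumes every value in the report dict is a string.

```python
def merge_case_variants(report: dict, canonical: list) -> dict:
    """Merge keys that differ only in capitalization into their canonical spelling."""
    lookup = {name.lower(): name for name in canonical}
    merged = {name: '' for name in canonical}
    for key, text in report.items():
        target = lookup.get(key.lower())
        # keys with no canonical form (e.g. 'Not Important') are dropped
        if target is not None:
            merged[target] += text
    return merged

# hypothetical report dict with duplicated themes
canonical = ['Business Overview', 'Interest Expense']
report = {
    'Business Overview': 'a saas provider. ',
    'Business overview': 'founded in 1998.',
    'Interest expense': 'interest rose.',
    'Not Important': 'boilerplate',
}
cleaned = merge_case_variants(report, canonical)
print(cleaned)
```

Text is concatenated in the dict's insertion order, which may differ from the order used by the hand-written functions.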

In [None]:
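def remove_underscore_sentences(text):
    # split on '. ' as a rough sentence boundary
    sentences = text.split('. ')
    # drop every sentence that contains at least one underscore
    cleaned_sentences = [sentence for sentence in sentences if not re.search(r'_+', sentence)]
    return '. '.join(cleaned_sentences)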

keys = [
    'Business Overview',
    'Results of Operations',
    'Revenues',
    'Gross Profit Margin',
    'Operating Income',
    'Interest Expense',
    'Liquidity',
    'Debt'
]

# Clean every theme in each report and keep only the themes listed in `keys`
df['report'] = df['report'].apply(lambda report: {key: remove_underscore_sentences(report[key]) for key in keys if key in report})
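
To make the effect of the filter concrete, the snippet below repeats the function on a made-up string. Because the text is split on `'. '` rather than with a real sentence segmenter, any fragment containing an underscore is dropped whole.

```python
import re

def remove_underscore_sentences(text):
    sentences = text.split('. ')
    cleaned_sentences = [s for s in sentences if not re.search(r'_+', s)]
    return '. '.join(cleaned_sentences)

# made-up input: the middle fragment contains underscores
sample = "revenue grew 14%. ______ total ______. margins were stable."
result = remove_underscore_sentences(sample)
print(result)  # revenue grew 14%. margins were stable.
```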

In [None]:
df.head()

Unnamed: 0,id,label_length,label,kept_report_length,report,has_label
0,8670,1142,"AUTOMATIC DATA PROCESSING, INC. (“ADPI”) Auto...",3770,{'Business Overview': 'highlights from the yea...,True
1,50471,907,"Park City Group, Inc. (“PCGI”) The Company is ...",1599,{'Business Overview': 'the company is a saas p...,True
2,78749,756,"AGILYSYS, Inc. (“AI”) Agilysys has been a lead...",3348,{'Business Overview': ' agilysys has been a le...,True
3,317788,927,"Digital Turbine, Inc. (“DTI”) Digital Turbine,...",6580,{'Business Overview': 'we allocate the fair va...,True
4,320340,933,Intelligent Systems Corporation (“ISC”) ISC’s...,1757,{'Business Overview': 'our consolidated operat...,True


In [None]:
newdf = df.copy()
for key in keys:
  newdf[key] = newdf['report'].apply(lambda x : x[key])
newdf.head()

Unnamed: 0,id,label_length,label,kept_report_length,report,has_label,Business Overview,Results of Operations,Revenues,Gross Profit Margin,Operating Income,Interest Expense,Liquidity,Debt
0,8670,1142,"AUTOMATIC DATA PROCESSING, INC. (“ADPI”) Auto...",3770,{'Business Overview': 'highlights from the yea...,True,"highlights from the year ended june 30, 2021 i...",,"for the year ended june 30, respectively: reve...",,"in addition to our u.s. gaap results, we use a...",,"as of june 30, 2021, cash and cash equivalent ...",the following table provides a summary of our ...
1,50471,907,"Park City Group, Inc. (“PCGI”) The Company is ...",1599,{'Business Overview': 'the company is a saas p...,True,"the company is a saas provider, and the parent...",,"during the fiscal year ended june 30, 2021, th...","cost of services and product support was $6,88...","to supplement our financial statements, histor...",,net cash provided by operating activities is s...,the company does not have any off-balance shee...
2,78749,756,"AGILYSYS, Inc. (“AI”) Agilysys has been a lead...",3348,{'Business Overview': ' agilysys has been a le...,True,agilysys has been a leader in hospitality sof...,fiscal 2021 compared with fiscal 2020 net reve...,"as required by the sec, we separately present ...",,,,overview our cash requirements consist primari...,we have not entered into any off-balance sheet...
3,317788,927,"Digital Turbine, Inc. (“DTI”) Digital Turbine,...",6580,{'Business Overview': 'we allocate the fair va...,True,we allocate the fair value of purchase conside...,"all discussions in this item 7, management’s d...",52,,"during the years ended march 31, 2021 and 2020...",fiscal 2021 compared to fiscal 2020 total inte...,our primary sources of liquidity have historic...,we do not have any relationships with unconsol...
4,320340,933,Intelligent Systems Corporation (“ISC”) ISC’s...,1757,{'Business Overview': 'our consolidated operat...,True,our consolidated operations consist of our cor...,the following discussion should be read in con...,– total revenue for the year ended december 31...,– total cost of revenue was 43 percent of tota...,"– for the twelve months ended december 31, 202...",,"our cash balance at december 31, 2020 was $37,...",


In [None]:
# import private OpenAI API key
with open('/content/drive/MyDrive/openai.txt') as f:
    # strip the trailing newline so it is not sent as part of the key
    openai.api_key = f.readline().strip()

In [None]:
def generate_summary_with_gpt3point5(text, max_num_tokens, token_ratio):
    # estimate the prompt size from the word count (token_ratio = tokens per word)
    num_input_tokens = math.ceil(len(text.split(' ')) * token_ratio)
    # give the completion whatever is left of the per-call token budget
    num_output_tokens = int(max_num_tokens - num_input_tokens)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f'Summarize the following text in no more than {num_output_tokens} tokens: "{text}"'}],
        max_tokens=num_output_tokens,
        n=1,
        stop=None,
        temperature=0.7,
    )
    output = response['choices'][0]['message']['content']
    # re-estimate the output size from the generated text, for logging only
    num_output_tokens = math.ceil(len(output.split(' ')) * token_ratio)
    print(f'num_input_tokens: {num_input_tokens}, '
          f'num_output_tokens: {num_output_tokens}, '
          f'total: {num_input_tokens + num_output_tokens}')
    return output
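
The function above never calls a tokenizer; it estimates token counts from word counts. A minimal worked example of that arithmetic follows, using the 1000/750 tokens-per-word ratio and the 4000-token budget that this notebook uses. `estimate_tokens` and `completion_budget` are illustrative names, not part of the notebook.

```python
import math

TOKEN_RATIO = 1000.0 / 750.0  # assumed tokens per word, as in the notebook

def estimate_tokens(text: str, token_ratio: float = TOKEN_RATIO) -> int:
    """Approximate the token count from the number of space-separated words."""
    return math.ceil(len(text.split(' ')) * token_ratio)

def completion_budget(text: str, max_num_tokens: int = 4000) -> int:
    """Tokens left for the model's reply after the estimated prompt size."""
    return int(max_num_tokens - estimate_tokens(text))

# a made-up 100-word prompt
prompt = ' '.join(['word'] * 100)
print(estimate_tokens(prompt))    # 134
print(completion_budget(prompt))  # 3866
```

Because the estimate is only approximate, the real prompt can be larger than predicted, which is presumably why the calling code retries with a smaller budget when the API rejects a request.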

In [None]:
def truncate_text_for_gpt3point5(row, col, output_length):
  text = row[col].strip()
  words = text.split(' ')
  # number of words in this theme's text
  input_length = len(words)
  # approximate tokens per word (about 1000 tokens per 750 words)
  token_ratio = 1000.0 / 750.0
  input_token_length = math.ceil(input_length * token_ratio)
  # skip themes with too little text to summarize
  if input_token_length < 32:
    return ''
  # output_length is the target summary length for this theme, in words
  output_token_length = math.ceil(output_length * token_ratio)

  summaries = []
  max_num_tokens = 4000
  done = False
  while not done:
    try:
      num_api_calls = math.ceil((input_token_length + output_token_length) / max_num_tokens)
      # split the original text evenly into one chunk per API call
      split_word_indices = np.asarray(np.linspace(0, len(words), num_api_calls + 1), dtype = "int")
      print(f"key {col}, input token length: {input_token_length}, output token length: {output_token_length}")
      # discard partial results left over from a failed attempt
      summaries = []
      # call API chunk by chunk
      for i in range(len(split_word_indices) - 1):
        chunk_text = " ".join(words[split_word_indices[i]:split_word_indices[i+1]])
        chunk_summary = generate_summary_with_gpt3point5(chunk_text, max_num_tokens, token_ratio)
        summaries.append(chunk_summary)
        # wait due to OpenAI rate limit
        time.sleep(15)
      done = True
    except InvalidRequestError as e:
      if "This model's maximum context length" not in e.user_message:
        # not a context-length error, so let it propagate
        raise
      # the word-based token estimate was too low: shrink the budget and retry
      max_num_tokens -= 100
      print(f'reduce max_num_tokens by 100: {max_num_tokens}')

  # return generated summaries of all chunks
  return " ".join(summaries)
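
The chunking rule in the function above can be checked without calling the API. The sketch below reproduces it in plain Python, with integer truncation of evenly spaced boundaries standing in for `np.linspace(...)` cast to `int`. `plan_chunks` is a hypothetical helper, and the word counts in the second call are back-computed assumptions chosen to give roughly 4499 input tokens and 1111 output tokens.

```python
import math

def plan_chunks(num_words, output_words, max_num_tokens=4000, token_ratio=1000.0 / 750.0):
    """Return the (start, end) word indices of the chunk sent in each API call."""
    input_tokens = math.ceil(num_words * token_ratio)
    output_tokens = math.ceil(output_words * token_ratio)
    num_api_calls = math.ceil((input_tokens + output_tokens) / max_num_tokens)
    # evenly spaced boundaries, truncated to integers
    bounds = [int(num_words * i / num_api_calls) for i in range(num_api_calls + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

print(plan_chunks(600, 100))    # [(0, 600)]
print(plan_chunks(3374, 833))   # [(0, 1687), (1687, 3374)]
```

Each 1687-word chunk in the second case is estimated at 2250 tokens, which leaves 1750 tokens of a 4000-token budget for the reply.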

In [None]:
def toDoc(r):
  doc = ''
  for key in keys:
    text = r[key].strip()
    if text != "":
      doc += key + '\n'
      doc += text + '\n\n'
  return doc.strip()

# ensure that the output directory exists
summary_path = "/content/drive/MyDrive/w210_capstone_project/data/SEC_Edgar_Annual_Financial_Filings_2021/working3/gpt"
if not os.path.exists(summary_path):
  os.makedirs(summary_path)

# loop through all reports
for i, row in tqdm(newdf.iterrows()):
  report_id = row['id']

  # skip reports whose summary file already exists, so an interrupted
  # run can resume without repeating API calls
  output_name = f'{summary_path}/{report_id}.txt'
  if os.path.exists(output_name): continue

  # split the target summary length across themes in proportion to
  # each theme's input length
  output_length = row['label_length']
  output_lengths = {}
  total = 0
  for key in keys:
    input_length = len(row[key].split(' '))
    # numerator of this theme's share; divided by `total` below
    output_lengths[key] = input_length * output_length
    total += input_length
  total = max(total, 1.0)

  # summarize each theme with GPT 3.5
  out = { 'id': report_id }
  for key in keys:
    print(f'working on {report_id}: {key} ...')
    out[key] = truncate_text_for_gpt3point5(row, key, math.floor(output_lengths[key] / total))

  # compose a summary document and write it out
  summary = toDoc(out)
  with open(output_name, 'w') as f:
    f.write(summary)
  # break
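
The length bookkeeping in the loop above gives each theme a share of the target summary length proportional to its share of the input words. A small sketch with made-up word counts follows; the helper name `allocate_output_lengths` is hypothetical.

```python
import math

def allocate_output_lengths(input_lengths: dict, target_length: int) -> dict:
    """Split target_length across themes in proportion to their input lengths."""
    total = max(sum(input_lengths.values()), 1.0)
    return {
        theme: math.floor(n_words * target_length / total)
        for theme, n_words in input_lengths.items()
    }

# hypothetical word counts per theme and a 900-word target summary
input_lengths = {'Revenues': 600, 'Liquidity': 300, 'Debt': 100}
allocation = allocate_output_lengths(input_lengths, 900)
print(allocation)  # {'Revenues': 540, 'Liquidity': 270, 'Debt': 90}
```

Because each share is rounded down, the allocated lengths can add up to slightly less than the target.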

0it [00:00, ?it/s]

working on 50471: Business Overview ...
key Business Overview, input token length: 99, output token length: 56
num_input_tokens: 99, num_output_tokens: 90, total: 189
working on 50471: Results of Operations ...
working on 50471: Revenues ...
key Revenues, input token length: 770, output token length: 438
num_input_tokens: 770, num_output_tokens: 254, total: 1024
working on 50471: Gross Profit Margin ...
key Gross Profit Margin, input token length: 152, output token length: 86
num_input_tokens: 152, num_output_tokens: 132, total: 284
working on 50471: Operating Income ...
key Operating Income, input token length: 231, output token length: 131
num_input_tokens: 231, num_output_tokens: 103, total: 334
working on 50471: Interest Expense ...
working on 50471: Liquidity ...
key Liquidity, input token length: 804, output token length: 458
num_input_tokens: 804, num_output_tokens: 275, total: 1079
working on 50471: Debt ...
key Debt, input token length: 68, output token length: 39
num_input_to

2it [02:05, 62.77s/it]

working on 78749: Business Overview ...
key Business Overview, input token length: 358, output token length: 80
num_input_tokens: 358, num_output_tokens: 242, total: 600
working on 78749: Results of Operations ...
key Results of Operations, input token length: 2987, output token length: 675
num_input_tokens: 2987, num_output_tokens: 170, total: 3157
working on 78749: Revenues ...
key Revenues, input token length: 123, output token length: 27
num_input_tokens: 123, num_output_tokens: 91, total: 214
working on 78749: Gross Profit Margin ...
working on 78749: Operating Income ...
working on 78749: Interest Expense ...
working on 78749: Liquidity ...
key Liquidity, input token length: 815, output token length: 184
num_input_tokens: 815, num_output_tokens: 324, total: 1139
working on 78749: Debt ...
key Debt, input token length: 171, output token length: 38
num_input_tokens: 171, num_output_tokens: 108, total: 279


3it [03:58, 83.78s/it]

working on 317788: Business Overview ...
key Business Overview, input token length: 902, output token length: 127
num_input_tokens: 902, num_output_tokens: 299, total: 1201
working on 317788: Results of Operations ...
key Results of Operations, input token length: 1210, output token length: 170
num_input_tokens: 1210, num_output_tokens: 179, total: 1389
working on 317788: Revenues ...
working on 317788: Gross Profit Margin ...
working on 317788: Operating Income ...
key Operating Income, input token length: 2478, output token length: 348
reduce max_num_tokens by 100: 3900
key Operating Income, input token length: 2478, output token length: 348
num_input_tokens: 2478, num_output_tokens: 236, total: 2714
working on 317788: Interest Expense ...
key Interest Expense, input token length: 815, output token length: 115
num_input_tokens: 815, num_output_tokens: 288, total: 1103
working on 317788: Liquidity ...
key Liquidity, input token length: 979, output token length: 138
num_input_tokens: 9

4it [06:34, 110.47s/it]

working on 320340: Business Overview ...
key Business Overview, input token length: 1274, output token length: 678
num_input_tokens: 1274, num_output_tokens: 322, total: 1596
working on 320340: Results of Operations ...
key Results of Operations, input token length: 34, output token length: 18
num_input_tokens: 34, num_output_tokens: 26, total: 60
working on 320340: Revenues ...
working on 320340: Gross Profit Margin ...
key Gross Profit Margin, input token length: 344, output token length: 183
num_input_tokens: 344, num_output_tokens: 196, total: 540
working on 320340: Operating Income ...
key Operating Income, input token length: 332, output token length: 176
num_input_tokens: 332, num_output_tokens: 180, total: 512
working on 320340: Interest Expense ...
working on 320340: Liquidity ...
key Liquidity, input token length: 324, output token length: 172
num_input_tokens: 324, num_output_tokens: 144, total: 468


5it [08:21, 109.59s/it]

working on 320340: Debt ...
working on 713425: Business Overview ...
key Business Overview, input token length: 340, output token length: 87
num_input_tokens: 340, num_output_tokens: 208, total: 548
working on 713425: Results of Operations ...
key Results of Operations, input token length: 66, output token length: 16
num_input_tokens: 66, num_output_tokens: 64, total: 130
working on 713425: Revenues ...
key Revenues, input token length: 1804, output token length: 463
num_input_tokens: 1804, num_output_tokens: 186, total: 1990
working on 713425: Gross Profit Margin ...
key Gross Profit Margin, input token length: 888, output token length: 228
num_input_tokens: 888, num_output_tokens: 320, total: 1208
working on 713425: Operating Income ...
key Operating Income, input token length: 244, output token length: 63
num_input_tokens: 244, num_output_tokens: 235, total: 479
working on 713425: Interest Expense ...
working on 713425: Liquidity ...
key Liquidity, input token length: 1138, output t

6it [10:54, 123.84s/it]

working on 723531: Business Overview ...
key Business Overview, input token length: 1110, output token length: 192
num_input_tokens: 1110, num_output_tokens: 212, total: 1322
working on 723531: Results of Operations ...
key Results of Operations, input token length: 224, output token length: 39
num_input_tokens: 224, num_output_tokens: 143, total: 367
working on 723531: Revenues ...
key Revenues, input token length: 739, output token length: 128
num_input_tokens: 739, num_output_tokens: 199, total: 938
working on 723531: Gross Profit Margin ...
key Gross Profit Margin, input token length: 46, output token length: 7
num_input_tokens: 46, num_output_tokens: 55, total: 101
working on 723531: Operating Income ...
key Operating Income, input token length: 912, output token length: 159
num_input_tokens: 912, num_output_tokens: 235, total: 1147
working on 723531: Interest Expense ...
working on 723531: Liquidity ...
key Liquidity, input token length: 1022, output token length: 178
num_input_t

7it [13:33, 135.26s/it]

working on 1810806: Business Overview ...
key Business Overview, input token length: 838, output token length: 244
num_input_tokens: 838, num_output_tokens: 250, total: 1088
working on 1810806: Results of Operations ...
key Results of Operations, input token length: 55, output token length: 15
num_input_tokens: 55, num_output_tokens: 67, total: 122
working on 1810806: Revenues ...
key Revenues, input token length: 1207, output token length: 351
num_input_tokens: 1207, num_output_tokens: 224, total: 1431
working on 1810806: Gross Profit Margin ...
key Gross Profit Margin, input token length: 336, output token length: 98
num_input_tokens: 336, num_output_tokens: 126, total: 462
working on 1810806: Operating Income ...
key Operating Income, input token length: 2208, output token length: 643
num_input_tokens: 2208, num_output_tokens: 258, total: 2466
working on 1810806: Interest Expense ...
key Interest Expense, input token length: 234, output token length: 68
num_input_tokens: 234, num_ou

8it [16:33, 149.18s/it]

working on 1806837: Business Overview ...
key Business Overview, input token length: 1871, output token length: 194
num_input_tokens: 1871, num_output_tokens: 211, total: 2082
working on 1806837: Results of Operations ...
key Results of Operations, input token length: 1744, output token length: 180
reduce max_num_tokens by 100: 3900
key Results of Operations, input token length: 1744, output token length: 180
num_input_tokens: 1744, num_output_tokens: 284, total: 2028
working on 1806837: Revenues ...
key Revenues, input token length: 3224, output token length: 335
num_input_tokens: 3224, num_output_tokens: 298, total: 3522
working on 1806837: Gross Profit Margin ...
key Gross Profit Margin, input token length: 323, output token length: 34
num_input_tokens: 323, num_output_tokens: 195, total: 518
working on 1806837: Operating Income ...
key Operating Income, input token length: 1227, output token length: 127
num_input_tokens: 1227, num_output_tokens: 259, total: 1486
working on 1806837:

9it [19:50, 163.92s/it]

working on 1794515: Business Overview ...
key Business Overview, input token length: 1487, output token length: 204
num_input_tokens: 1487, num_output_tokens: 203, total: 1690
working on 1794515: Results of Operations ...
working on 1794515: Revenues ...
key Revenues, input token length: 2926, output token length: 403
reduce max_num_tokens by 100: 3900
key Revenues, input token length: 2926, output token length: 403
reduce max_num_tokens by 100: 3800
key Revenues, input token length: 2926, output token length: 403
num_input_tokens: 2926, num_output_tokens: 144, total: 3070
working on 1794515: Gross Profit Margin ...
key Gross Profit Margin, input token length: 116, output token length: 16
num_input_tokens: 116, num_output_tokens: 92, total: 208
working on 1794515: Operating Income ...
key Operating Income, input token length: 3354, output token length: 463
num_input_tokens: 3354, num_output_tokens: 431, total: 3785
working on 1794515: Interest Expense ...
key Interest Expense, input to

10it [22:34, 163.96s/it]

working on 1803696: Business Overview ...
key Business Overview, input token length: 696, output token length: 171
num_input_tokens: 696, num_output_tokens: 392, total: 1088
working on 1803696: Results of Operations ...
key Results of Operations, input token length: 4499, output token length: 1111
num_input_tokens: 2250, num_output_tokens: 199, total: 2449
num_input_tokens: 2250, num_output_tokens: 215, total: 2465
working on 1803696: Revenues ...
key Revenues, input token length: 580, output token length: 143
num_input_tokens: 580, num_output_tokens: 288, total: 868
working on 1803696: Gross Profit Margin ...
working on 1803696: Operating Income ...
working on 1803696: Interest Expense ...
working on 1803696: Liquidity ...
key Liquidity, input token length: 2783, output token length: 687
reduce max_num_tokens by 100: 3900
key Liquidity, input token length: 2783, output token length: 687
num_input_tokens: 2783, num_output_tokens: 214, total: 2997
working on 1803696: Debt ...
key Debt, 

11it [25:06, 160.48s/it]

working on 1786352: Business Overview ...
key Business Overview, input token length: 2160, output token length: 362
num_input_tokens: 2160, num_output_tokens: 236, total: 2396
working on 1786352: Results of Operations ...
key Results of Operations, input token length: 1672, output token length: 279
num_input_tokens: 1672, num_output_tokens: 258, total: 1930
working on 1786352: Revenues ...
key Revenues, input token length: 1211, output token length: 203
num_input_tokens: 1211, num_output_tokens: 184, total: 1395
working on 1786352: Gross Profit Margin ...
key Gross Profit Margin, input token length: 1010, output token length: 168
num_input_tokens: 1010, num_output_tokens: 227, total: 1237
working on 1786352: Operating Income ...
key Operating Income, input token length: 330, output token length: 55
num_input_tokens: 330, num_output_tokens: 138, total: 468
working on 1786352: Interest Expense ...
working on 1786352: Liquidity ...
key Liquidity, input token length: 2212, output token len

12it [27:52, 162.08s/it]

working on 1773383: Business Overview ...
key Business Overview, input token length: 623, output token length: 178
num_input_tokens: 623, num_output_tokens: 198, total: 821
working on 1773383: Results of Operations ...
key Results of Operations, input token length: 2655, output token length: 758
reduce max_num_tokens by 100: 3900
key Results of Operations, input token length: 2655, output token length: 758
num_input_tokens: 2655, num_output_tokens: 172, total: 2827
working on 1773383: Revenues ...
key Revenues, input token length: 600, output token length: 171
num_input_tokens: 600, num_output_tokens: 206, total: 806
working on 1773383: Gross Profit Margin ...
key Gross Profit Margin, input token length: 291, output token length: 83
num_input_tokens: 291, num_output_tokens: 199, total: 490
working on 1773383: Operating Income ...
key Operating Income, input token length: 1176, output token length: 335
num_input_tokens: 1176, num_output_tokens: 483, total: 1659
working on 1773383: Inter

13it [30:52, 167.50s/it]

working on 1768267: Business Overview ...
key Business Overview, input token length: 1312, output token length: 191
num_input_tokens: 1312, num_output_tokens: 256, total: 1568
working on 1768267: Results of Operations ...
key Results of Operations, input token length: 4479, output token length: 654
reduce max_num_tokens by 100: 3900
key Results of Operations, input token length: 4479, output token length: 654
reduce max_num_tokens by 100: 3800
key Results of Operations, input token length: 4479, output token length: 654
num_input_tokens: 2239, num_output_tokens: 134, total: 2373
reduce max_num_tokens by 100: 3700
key Results of Operations, input token length: 4479, output token length: 654
num_input_tokens: 2239, num_output_tokens: 199, total: 2438
num_input_tokens: 2240, num_output_tokens: 219, total: 2459
working on 1768267: Revenues ...
working on 1768267: Gross Profit Margin ...
working on 1768267: Operating Income ...
key Operating Income, input token length: 522, output token len

14it [33:50, 170.75s/it]

working on 1764925: Business Overview ...
key Business Overview, input token length: 866, output token length: 267
num_input_tokens: 866, num_output_tokens: 266, total: 1132
working on 1764925: Results of Operations ...
key Results of Operations, input token length: 1102, output token length: 340
num_input_tokens: 1102, num_output_tokens: 175, total: 1277
working on 1764925: Revenues ...
key Revenues, input token length: 1438, output token length: 443
num_input_tokens: 1438, num_output_tokens: 219, total: 1657
working on 1764925: Gross Profit Margin ...
key Gross Profit Margin, input token length: 110, output token length: 34
num_input_tokens: 110, num_output_tokens: 72, total: 182
working on 1764925: Operating Income ...
key Operating Income, input token length: 1094, output token length: 338
num_input_tokens: 1094, num_output_tokens: 178, total: 1272
working on 1764925: Interest Expense ...
key Interest Expense, input token length: 118, output token length: 36
num_input_tokens: 118, 

15it [36:51, 173.86s/it]

working on 1739942: Business Overview ...
key Business Overview, input token length: 194, output token length: 43
num_input_tokens: 194, num_output_tokens: 132, total: 326
working on 1739942: Results of Operations ...
key Results of Operations, input token length: 879, output token length: 196
num_input_tokens: 879, num_output_tokens: 246, total: 1125
working on 1739942: Revenues ...
key Revenues, input token length: 2514, output token length: 560
num_input_tokens: 2514, num_output_tokens: 228, total: 2742
working on 1739942: Gross Profit Margin ...
key Gross Profit Margin, input token length: 292, output token length: 64
num_input_tokens: 292, num_output_tokens: 175, total: 467
working on 1739942: Operating Income ...
key Operating Income, input token length: 2734, output token length: 610
num_input_tokens: 2734, num_output_tokens: 346, total: 3080
working on 1739942: Interest Expense ...
key Interest Expense, input token length: 132, output token length: 30
num_input_tokens: 132, num

16it [40:07, 180.40s/it]

working on 1739936: Business Overview ...
key Business Overview, input token length: 1022, output token length: 303
num_input_tokens: 1022, num_output_tokens: 318, total: 1340
working on 1739936: Results of Operations ...
key Results of Operations, input token length: 2110, output token length: 627
num_input_tokens: 2110, num_output_tokens: 242, total: 2352
working on 1739936: Revenues ...
key Revenues, input token length: 379, output token length: 112
num_input_tokens: 379, num_output_tokens: 167, total: 546
working on 1739936: Gross Profit Margin ...
key Gross Profit Margin, input token length: 962, output token length: 286
num_input_tokens: 962, num_output_tokens: 188, total: 1150
working on 1739936: Operating Income ...
key Operating Income, input token length: 88, output token length: 26
num_input_tokens: 88, num_output_tokens: 78, total: 166
working on 1739936: Interest Expense ...
key Interest Expense, input token length: 91, output token length: 27
num_input_tokens: 91, num_out

17it [43:02, 178.71s/it]

working on 736012: Business Overview ...
key Business Overview, input token length: 92, output token length: 68
num_input_tokens: 92, num_output_tokens: 70, total: 162
working on 736012: Results of Operations ...
working on 736012: Revenues ...
key Revenues, input token length: 390, output token length: 291
num_input_tokens: 390, num_output_tokens: 214, total: 604
working on 736012: Gross Profit Margin ...
key Gross Profit Margin, input token length: 126, output token length: 94
num_input_tokens: 126, num_output_tokens: 122, total: 248
working on 736012: Operating Income ...
working on 736012: Interest Expense ...
key Interest Expense, input token length: 102, output token length: 75
num_input_tokens: 102, num_output_tokens: 66, total: 168
working on 736012: Liquidity ...
key Liquidity, input token length: 1052, output token length: 787
num_input_tokens: 1052, num_output_tokens: 212, total: 1264
working on 736012: Debt ...
key Debt, input token length: 127, output token length: 95
num_

18it [45:09, 163.39s/it]

working on 746210: Business Overview ...
key Business Overview, input token length: 178, output token length: 118
num_input_tokens: 178, num_output_tokens: 100, total: 278
working on 746210: Results of Operations ...
working on 746210: Revenues ...
working on 746210: Gross Profit Margin ...
working on 746210: Operating Income ...
working on 746210: Interest Expense ...
working on 746210: Liquidity ...
key Liquidity, input token length: 1628, output token length: 1086
reduce max_num_tokens by 100: 3900
key Liquidity, input token length: 1628, output token length: 1086
num_input_tokens: 1628, num_output_tokens: 243, total: 1871
working on 746210: Debt ...
key Debt, input token length: 44, output token length: 30
num_input_tokens: 44, num_output_tokens: 48, total: 92


19it [46:11, 132.82s/it]

working on 769397: Business Overview ...
working on 769397: Results of Operations ...
key Results of Operations, input token length: 3155, output token length: 700
num_input_tokens: 3155, num_output_tokens: 330, total: 3485
working on 769397: Revenues ...
key Revenues, input token length: 1210, output token length: 268
num_input_tokens: 1210, num_output_tokens: 442, total: 1652
working on 769397: Gross Profit Margin ...
working on 769397: Operating Income ...
working on 769397: Interest Expense ...
working on 769397: Liquidity ...
key Liquidity, input token length: 1667, output token length: 370
num_input_tokens: 1667, num_output_tokens: 254, total: 1921
working on 769397: Debt ...
key Debt, input token length: 356, output token length: 79
num_input_tokens: 356, num_output_tokens: 191, total: 547


20it [47:58, 125.14s/it]

working on 796343: Business Overview ...
key Business Overview, input token length: 356, output token length: 91
num_input_tokens: 356, num_output_tokens: 143, total: 499
working on 796343: Results of Operations ...
working on 796343: Revenues ...
key Revenues, input token length: 2871, output token length: 734
num_input_tokens: 2871, num_output_tokens: 260, total: 3131
working on 796343: Gross Profit Margin ...
key Gross Profit Margin, input token length: 164, output token length: 42
num_input_tokens: 164, num_output_tokens: 115, total: 279
working on 796343: Operating Income ...
key Operating Income, input token length: 912, output token length: 232
num_input_tokens: 912, num_output_tokens: 214, total: 1126
working on 796343: Interest Expense ...
working on 796343: Liquidity ...
key Liquidity, input token length: 1238, output token length: 316
num_input_tokens: 1238, num_output_tokens: 370, total: 1608
working on 796343: Debt ...
key Debt, input token length: 691, output token length

21it [50:17, 129.13s/it]

working on 807863: Business Overview ...
key Business Overview, input token length: 774, output token length: 120
num_input_tokens: 774, num_output_tokens: 190, total: 964
working on 807863: Results of Operations ...
key Results of Operations, input token length: 2647, output token length: 411
num_input_tokens: 2647, num_output_tokens: 156, total: 2803
working on 807863: Revenues ...
key Revenues, input token length: 2846, output token length: 442
reduce max_num_tokens by 100: 3900
key Revenues, input token length: 2846, output token length: 442
reduce max_num_tokens by 100: 3800
key Revenues, input token length: 2846, output token length: 442
num_input_tokens: 2846, num_output_tokens: 250, total: 3096
working on 807863: Gross Profit Margin ...
working on 807863: Operating Income ...
working on 807863: Interest Expense ...
working on 807863: Liquidity ...
key Liquidity, input token length: 3624, output token length: 563
reduce max_num_tokens by 100: 3900
key Liquidity, input token leng

22it [52:41, 133.61s/it]

working on 807882: Business Overview ...
key Business Overview, input token length: 1103, output token length: 632
num_input_tokens: 1103, num_output_tokens: 328, total: 1431
working on 807882: Results of Operations ...
key Results of Operations, input token length: 224, output token length: 128
num_input_tokens: 224, num_output_tokens: 167, total: 391
working on 807882: Revenues ...
key Revenues, input token length: 102, output token length: 58
num_input_tokens: 102, num_output_tokens: 86, total: 188
working on 807882: Gross Profit Margin ...
working on 807882: Operating Income ...
key Operating Income, input token length: 238, output token length: 136
num_input_tokens: 238, num_output_tokens: 243, total: 481
working on 807882: Interest Expense ...
key Interest Expense, input token length: 112, output token length: 64
num_input_tokens: 112, num_output_tokens: 88, total: 200
working on 807882: Liquidity ...
key Liquidity, input token length: 456, output token length: 262
num_input_toke

23it [55:18, 140.60s/it]

working on 813672: Business Overview ...
key Business Overview, input token length: 1426, output token length: 580
num_input_tokens: 1426, num_output_tokens: 270, total: 1696
working on 813672: Results of Operations ...
key Results of Operations, input token length: 386, output token length: 156
num_input_tokens: 386, num_output_tokens: 222, total: 608
working on 813672: Revenues ...
key Revenues, input token length: 1030, output token length: 419
num_input_tokens: 1030, num_output_tokens: 223, total: 1253
working on 813672: Gross Profit Margin ...
key Gross Profit Margin, input token length: 568, output token length: 231
num_input_tokens: 568, num_output_tokens: 240, total: 808
working on 813672: Operating Income ...
key Operating Income, input token length: 531, output token length: 216
num_input_tokens: 531, num_output_tokens: 174, total: 705
working on 813672: Interest Expense ...
key Interest Expense, input token length: 128, output token length: 52
num_input_tokens: 128, num_outp

24it [58:20, 153.20s/it]

working on 814547: Business Overview ...
key Business Overview, input token length: 2546, output token length: 1074
num_input_tokens: 2546, num_output_tokens: 283, total: 2829
working on 814547: Results of Operations ...
key Results of Operations, input token length: 812, output token length: 343
num_input_tokens: 812, num_output_tokens: 163, total: 975
working on 814547: Revenues ...
key Revenues, input token length: 768, output token length: 324
num_input_tokens: 768, num_output_tokens: 422, total: 1190
working on 814547: Gross Profit Margin ...
key Gross Profit Margin, input token length: 512, output token length: 216
num_input_tokens: 512, num_output_tokens: 376, total: 888
working on 814547: Operating Income ...
key Operating Income, input token length: 582, output token length: 246
num_input_tokens: 582, num_output_tokens: 246, total: 828
working on 814547: Interest Expense ...
key Interest Expense, input token length: 174, output token length: 72
num_input_tokens: 174, num_outpu

25it [1:01:56, 172.10s/it]

working on 814549: Business Overview ...
key Business Overview, input token length: 1856, output token length: 842
num_input_tokens: 1856, num_output_tokens: 207, total: 2063
working on 814549: Results of Operations ...
working on 814549: Revenues ...
key Revenues, input token length: 46, output token length: 20
num_input_tokens: 46, num_output_tokens: 58, total: 104
working on 814549: Gross Profit Margin ...
working on 814549: Operating Income ...
key Operating Income, input token length: 1426, output token length: 647
reduce max_num_tokens by 100: 3900
key Operating Income, input token length: 1426, output token length: 647
num_input_tokens: 1426, num_output_tokens: 196, total: 1622
working on 814549: Interest Expense ...
key Interest Expense, input token length: 122, output token length: 55
num_input_tokens: 122, num_output_tokens: 111, total: 233
working on 814549: Liquidity ...
key Liquidity, input token length: 1280, output token length: 580
num_input_tokens: 1280, num_output_tok

26it [1:04:16, 162.34s/it]

working on 816761: Business Overview ...
key Business Overview, input token length: 631, output token length: 214
num_input_tokens: 631, num_output_tokens: 179, total: 810
working on 816761: Results of Operations ...
key Results of Operations, input token length: 174, output token length: 59
num_input_tokens: 174, num_output_tokens: 114, total: 288
working on 816761: Revenues ...
key Revenues, input token length: 1103, output token length: 374
num_input_tokens: 1103, num_output_tokens: 208, total: 1311
working on 816761: Gross Profit Margin ...
key Gross Profit Margin, input token length: 34, output token length: 11
num_input_tokens: 34, num_output_tokens: 36, total: 70
working on 816761: Operating Income ...
key Operating Income, input token length: 91, output token length: 31
num_input_tokens: 91, num_output_tokens: 86, total: 177
working on 816761: Interest Expense ...
working on 816761: Liquidity ...
key Liquidity, input token length: 1392, output token length: 471
num_input_tokens

27it [1:06:46, 158.55s/it]

working on 823546: Business Overview ...
key Business Overview, input token length: 2004, output token length: 1158
num_input_tokens: 2004, num_output_tokens: 323, total: 2327
working on 823546: Results of Operations ...
key Results of Operations, input token length: 554, output token length: 319
num_input_tokens: 554, num_output_tokens: 228, total: 782
working on 823546: Revenues ...
working on 823546: Gross Profit Margin ...
working on 823546: Operating Income ...
working on 823546: Interest Expense ...
working on 823546: Liquidity ...
key Liquidity, input token length: 747, output token length: 431
num_input_tokens: 747, num_output_tokens: 347, total: 1094


28it [1:08:03, 134.35s/it]

working on 823546: Debt ...
working on 827876: Business Overview ...
working on 827876: Results of Operations ...
key Results of Operations, input token length: 342, output token length: 212
num_input_tokens: 342, num_output_tokens: 110, total: 452
working on 827876: Revenues ...
key Revenues, input token length: 243, output token length: 151
num_input_tokens: 243, num_output_tokens: 228, total: 471
working on 827876: Gross Profit Margin ...
working on 827876: Operating Income ...
working on 827876: Interest Expense ...
working on 827876: Liquidity ...
key Liquidity, input token length: 1076, output token length: 671
reduce max_num_tokens by 100: 3900
key Liquidity, input token length: 1076, output token length: 671
num_input_tokens: 1076, num_output_tokens: 468, total: 1544
working on 827876: Debt ...
key Debt, input token length: 315, output token length: 196
num_input_tokens: 315, num_output_tokens: 171, total: 486


29it [1:09:40, 123.10s/it]

working on 1725579: Business Overview ...
key Business Overview, input token length: 1504, output token length: 474
num_input_tokens: 1504, num_output_tokens: 259, total: 1763
working on 1725579: Results of Operations ...
working on 1725579: Revenues ...
key Revenues, input token length: 331, output token length: 104
num_input_tokens: 331, num_output_tokens: 132, total: 463
working on 1725579: Gross Profit Margin ...
key Gross Profit Margin, input token length: 1063, output token length: 335
num_input_tokens: 1063, num_output_tokens: 236, total: 1299
working on 1725579: Operating Income ...
key Operating Income, input token length: 1511, output token length: 476
num_input_tokens: 1511, num_output_tokens: 355, total: 1866
working on 1725579: Interest Expense ...
working on 1725579: Liquidity ...
key Liquidity, input token length: 1259, output token length: 396
num_input_tokens: 1259, num_output_tokens: 390, total: 1649
working on 1725579: Debt ...
key Debt, input token length: 658, outp

30it [1:12:07, 130.07s/it]

working on 849399: Business Overview ...
key Business Overview, input token length: 344, output token length: 184
num_input_tokens: 344, num_output_tokens: 184, total: 528
working on 849399: Results of Operations ...
key Results of Operations, input token length: 1231, output token length: 663
num_input_tokens: 1231, num_output_tokens: 148, total: 1379
working on 849399: Revenues ...
working on 849399: Gross Profit Margin ...
key Gross Profit Margin, input token length: 126, output token length: 67
num_input_tokens: 126, num_output_tokens: 134, total: 260
working on 849399: Operating Income ...
key Operating Income, input token length: 844, output token length: 454
num_input_tokens: 844, num_output_tokens: 223, total: 1067
working on 849399: Interest Expense ...
working on 849399: Liquidity ...
key Liquidity, input token length: 1807, output token length: 972
num_input_tokens: 1807, num_output_tokens: 347, total: 2154
working on 849399: Debt ...
key Debt, input token length: 932, outpu

31it [1:14:22, 131.73s/it]

working on 727634: Business Overview ...
key Business Overview, input token length: 4438, output token length: 707
num_input_tokens: 2219, num_output_tokens: 367, total: 2586
num_input_tokens: 2219, num_output_tokens: 318, total: 2537
working on 727634: Results of Operations ...
working on 727634: Revenues ...
key Revenues, input token length: 154, output token length: 24
num_input_tokens: 154, num_output_tokens: 139, total: 293
working on 727634: Gross Profit Margin ...
key Gross Profit Margin, input token length: 86, output token length: 14
num_input_tokens: 86, num_output_tokens: 63, total: 149
working on 727634: Operating Income ...
key Operating Income, input token length: 1887, output token length: 300
num_input_tokens: 1887, num_output_tokens: 215, total: 2102


32it [1:16:17, 126.55s/it]

working on 727634: Interest Expense ...
working on 727634: Liquidity ...
working on 727634: Debt ...
working on 855612: Business Overview ...
key Business Overview, input token length: 632, output token length: 284
num_input_tokens: 632, num_output_tokens: 258, total: 890
working on 855612: Results of Operations ...
key Results of Operations, input token length: 714, output token length: 322
num_input_tokens: 714, num_output_tokens: 230, total: 944
working on 855612: Revenues ...
key Revenues, input token length: 578, output token length: 260
num_input_tokens: 578, num_output_tokens: 256, total: 834
working on 855612: Gross Profit Margin ...
key Gross Profit Margin, input token length: 480, output token length: 216
num_input_tokens: 480, num_output_tokens: 302, total: 782
working on 855612: Operating Income ...
working on 855612: Interest Expense ...
working on 855612: Liquidity ...
key Liquidity, input token length: 1040, output token length: 468
num_input_tokens: 1040, num_output_tok

33it [1:18:44, 132.72s/it]

working on 876167: Business Overview ...
key Business Overview, input token length: 334, output token length: 108
num_input_tokens: 334, num_output_tokens: 171, total: 505
working on 876167: Results of Operations ...
key Results of Operations, input token length: 4139, output token length: 1347
num_input_tokens: 2070, num_output_tokens: 178, total: 2248
num_input_tokens: 2070, num_output_tokens: 324, total: 2394
working on 876167: Revenues ...
working on 876167: Gross Profit Margin ...
working on 876167: Operating Income ...
working on 876167: Interest Expense ...
working on 876167: Liquidity ...
key Liquidity, input token length: 3479, output token length: 1132
num_input_tokens: 1739, num_output_tokens: 236, total: 1975
num_input_tokens: 1740, num_output_tokens: 303, total: 2043
working on 876167: Debt ...
key Debt, input token length: 284, output token length: 92
num_input_tokens: 284, num_output_tokens: 180, total: 464


34it [1:21:03, 134.57s/it]

working on 877890: Business Overview ...
key Business Overview, input token length: 3040, output token length: 642
num_input_tokens: 3040, num_output_tokens: 450, total: 3490
working on 877890: Results of Operations ...
key Results of Operations, input token length: 4030, output token length: 850
num_input_tokens: 2015, num_output_tokens: 200, total: 2215
num_input_tokens: 2015, num_output_tokens: 488, total: 2503
working on 877890: Revenues ...
working on 877890: Gross Profit Margin ...
working on 877890: Operating Income ...
working on 877890: Interest Expense ...
working on 877890: Liquidity ...
key Liquidity, input token length: 2472, output token length: 522
num_input_tokens: 2472, num_output_tokens: 218, total: 2690
working on 877890: Debt ...
key Debt, input token length: 476, output token length: 100
num_input_tokens: 476, num_output_tokens: 163, total: 639


35it [1:23:11, 132.57s/it]

working on 883241: Business Overview ...
key Business Overview, input token length: 1416, output token length: 423
num_input_tokens: 1416, num_output_tokens: 178, total: 1594
working on 883241: Results of Operations ...
key Results of Operations, input token length: 1136, output token length: 339
num_input_tokens: 1136, num_output_tokens: 228, total: 1364
working on 883241: Revenues ...
key Revenues, input token length: 1558, output token length: 464
num_input_tokens: 1558, num_output_tokens: 198, total: 1756
working on 883241: Gross Profit Margin ...
key Gross Profit Margin, input token length: 394, output token length: 118
num_input_tokens: 394, num_output_tokens: 251, total: 645
working on 883241: Operating Income ...
working on 883241: Interest Expense ...
key Interest Expense, input token length: 40, output token length: 11
num_input_tokens: 40, num_output_tokens: 42, total: 82
working on 883241: Liquidity ...
key Liquidity, input token length: 650, output token length: 194
num_in

36it [1:25:49, 140.29s/it]

working on 884144: Business Overview ...
key Business Overview, input token length: 1048, output token length: 438
num_input_tokens: 1048, num_output_tokens: 158, total: 1206
working on 884144: Results of Operations ...
key Results of Operations, input token length: 87, output token length: 36
num_input_tokens: 87, num_output_tokens: 88, total: 175
working on 884144: Revenues ...
key Revenues, input token length: 998, output token length: 416
num_input_tokens: 998, num_output_tokens: 283, total: 1281
working on 884144: Gross Profit Margin ...
key Gross Profit Margin, input token length: 167, output token length: 71
num_input_tokens: 167, num_output_tokens: 128, total: 295
working on 884144: Operating Income ...
working on 884144: Interest Expense ...
key Interest Expense, input token length: 71, output token length: 30
num_input_tokens: 71, num_output_tokens: 59, total: 130
working on 884144: Liquidity ...
key Liquidity, input token length: 2206, output token length: 920
num_input_toke

37it [1:28:01, 137.77s/it]

working on 884144: Debt ...
working on 892482: Business Overview ...
key Business Overview, input token length: 128, output token length: 43
num_input_tokens: 128, num_output_tokens: 111, total: 239
working on 892482: Results of Operations ...
key Results of Operations, input token length: 52, output token length: 18
num_input_tokens: 52, num_output_tokens: 34, total: 86
working on 892482: Revenues ...
key Revenues, input token length: 631, output token length: 215
num_input_tokens: 631, num_output_tokens: 278, total: 909
working on 892482: Gross Profit Margin ...
key Gross Profit Margin, input token length: 371, output token length: 126
num_input_tokens: 371, num_output_tokens: 219, total: 590
working on 892482: Operating Income ...
key Operating Income, input token length: 970, output token length: 331
reduce max_num_tokens by 100: 3900
key Operating Income, input token length: 970, output token length: 331
num_input_tokens: 970, num_output_tokens: 294, total: 1264
working on 892482:

38it [1:30:49, 147.00s/it]

working on 922521: Business Overview ...
key Business Overview, input token length: 1666, output token length: 636
num_input_tokens: 1666, num_output_tokens: 320, total: 1986
working on 922521: Results of Operations ...
working on 922521: Revenues ...
key Revenues, input token length: 684, output token length: 262
num_input_tokens: 684, num_output_tokens: 326, total: 1010
working on 922521: Gross Profit Margin ...
key Gross Profit Margin, input token length: 475, output token length: 182
num_input_tokens: 475, num_output_tokens: 224, total: 699
working on 922521: Operating Income ...
key Operating Income, input token length: 1003, output token length: 383
num_input_tokens: 1003, num_output_tokens: 235, total: 1238
working on 922521: Interest Expense ...
key Interest Expense, input token length: 74, output token length: 28
num_input_tokens: 74, num_output_tokens: 68, total: 142
working on 922521: Liquidity ...
key Liquidity, input token length: 2187, output token length: 836
num_input_t

39it [1:33:17, 147.12s/it]

working on 922521: Debt ...
working on 929940: Business Overview ...
key Business Overview, input token length: 872, output token length: 268
num_input_tokens: 872, num_output_tokens: 311, total: 1183
working on 929940: Results of Operations ...
key Results of Operations, input token length: 1050, output token length: 323
reduce max_num_tokens by 100: 3900
key Results of Operations, input token length: 1050, output token length: 323
num_input_tokens: 1050, num_output_tokens: 239, total: 1289
working on 929940: Revenues ...
key Revenues, input token length: 1284, output token length: 396
num_input_tokens: 1284, num_output_tokens: 228, total: 1512
working on 929940: Gross Profit Margin ...
key Gross Profit Margin, input token length: 276, output token length: 84
num_input_tokens: 276, num_output_tokens: 183, total: 459
working on 929940: Operating Income ...
key Operating Income, input token length: 396, output token length: 122
num_input_tokens: 396, num_output_tokens: 211, total: 607
w

40it [1:36:25, 159.57s/it]

working on 935036: Business Overview ...
key Business Overview, input token length: 2026, output token length: 694
num_input_tokens: 2026, num_output_tokens: 250, total: 2276
working on 935036: Results of Operations ...
key Results of Operations, input token length: 39, output token length: 12
num_input_tokens: 39, num_output_tokens: 40, total: 79
working on 935036: Revenues ...
key Revenues, input token length: 1864, output token length: 639
num_input_tokens: 1864, num_output_tokens: 207, total: 2071
working on 935036: Gross Profit Margin ...
working on 935036: Operating Income ...
key Operating Income, input token length: 1892, output token length: 648
reduce max_num_tokens by 100: 3900
key Operating Income, input token length: 1892, output token length: 648
num_input_tokens: 1892, num_output_tokens: 454, total: 2346
working on 935036: Interest Expense ...
working on 935036: Liquidity ...
key Liquidity, input token length: 1023, output token length: 351
num_input_tokens: 1023, num_ou

41it [1:38:55, 156.71s/it]

working on 941685: Business Overview ...
key Business Overview, input token length: 1252, output token length: 404
num_input_tokens: 1252, num_output_tokens: 312, total: 1564
working on 941685: Results of Operations ...
working on 941685: Revenues ...
key Revenues, input token length: 2347, output token length: 759
num_input_tokens: 2347, num_output_tokens: 179, total: 2526
working on 941685: Gross Profit Margin ...
key Gross Profit Margin, input token length: 651, output token length: 210
num_input_tokens: 651, num_output_tokens: 388, total: 1039
working on 941685: Operating Income ...
key Operating Income, input token length: 403, output token length: 130
num_input_tokens: 403, num_output_tokens: 262, total: 665
working on 941685: Interest Expense ...
key Interest Expense, input token length: 90, output token length: 28
num_input_tokens: 90, num_output_tokens: 60, total: 150
working on 941685: Liquidity ...
key Liquidity, input token length: 3200, output token length: 1035
num_input_

42it [1:42:15, 169.50s/it]

working on 948708: Business Overview ...
key Business Overview, input token length: 311, output token length: 84
num_input_tokens: 311, num_output_tokens: 136, total: 447
working on 948708: Results of Operations ...
key Results of Operations, input token length: 868, output token length: 236
reduce max_num_tokens by 100: 3900
key Results of Operations, input token length: 868, output token length: 236
num_input_tokens: 868, num_output_tokens: 236, total: 1104
working on 948708: Revenues ...
key Revenues, input token length: 607, output token length: 164
num_input_tokens: 607, num_output_tokens: 271, total: 878
working on 948708: Gross Profit Margin ...
working on 948708: Operating Income ...
key Operating Income, input token length: 91, output token length: 24
num_input_tokens: 91, num_output_tokens: 103, total: 194
working on 948708: Interest Expense ...
working on 948708: Liquidity ...
key Liquidity, input token length: 250, output token length: 67
num_input_tokens: 250, num_output_t

43it [1:44:29, 158.93s/it]

working on 1001601: Business Overview ...
key Business Overview, input token length: 466, output token length: 286
num_input_tokens: 466, num_output_tokens: 156, total: 622
working on 1001601: Results of Operations ...
key Results of Operations, input token length: 674, output token length: 412
num_input_tokens: 674, num_output_tokens: 222, total: 896
working on 1001601: Revenues ...
working on 1001601: Gross Profit Margin ...
working on 1001601: Operating Income ...
working on 1001601: Interest Expense ...
working on 1001601: Liquidity ...
key Liquidity, input token length: 3028, output token length: 1858
num_input_tokens: 1514, num_output_tokens: 196, total: 1710
num_input_tokens: 1515, num_output_tokens: 214, total: 1729


44it [1:45:57, 137.64s/it]

working on 1001601: Debt ...
working on 1002517: Business Overview ...
key Business Overview, input token length: 1043, output token length: 368
num_input_tokens: 1043, num_output_tokens: 228, total: 1271
working on 1002517: Results of Operations ...
key Results of Operations, input token length: 223, output token length: 79
num_input_tokens: 223, num_output_tokens: 179, total: 402
working on 1002517: Revenues ...
key Revenues, input token length: 1722, output token length: 607
reduce max_num_tokens by 100: 3900
key Revenues, input token length: 1722, output token length: 607
num_input_tokens: 1722, num_output_tokens: 270, total: 1992
working on 1002517: Gross Profit Margin ...
working on 1002517: Operating Income ...
key Operating Income, input token length: 550, output token length: 194
num_input_tokens: 550, num_output_tokens: 198, total: 748
working on 1002517: Interest Expense ...
working on 1002517: Liquidity ...
key Liquidity, input token length: 2388, output token length: 843
r

45it [1:48:37, 144.45s/it]

working on 1013462: Business Overview ...
key Business Overview, input token length: 2547, output token length: 492
num_input_tokens: 2547, num_output_tokens: 260, total: 2807
working on 1013462: Results of Operations ...
key Results of Operations, input token length: 2099, output token length: 406
num_input_tokens: 2099, num_output_tokens: 278, total: 2377
working on 1013462: Revenues ...
key Revenues, input token length: 1387, output token length: 268
num_input_tokens: 1387, num_output_tokens: 212, total: 1599
working on 1013462: Gross Profit Margin ...
working on 1013462: Operating Income ...
key Operating Income, input token length: 484, output token length: 94
num_input_tokens: 484, num_output_tokens: 152, total: 636
working on 1013462: Interest Expense ...
working on 1013462: Liquidity ...
key Liquidity, input token length: 1539, output token length: 298
num_input_tokens: 1539, num_output_tokens: 266, total: 1805
working on 1013462: Debt ...
key Debt, input token length: 604, out

46it [1:50:55, 144.69s/it]
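
The log above shows two recurring patterns. A theme whose input is too long is handled in roughly equal pieces (for example, 4438 input tokens processed as two calls of 2219), and a failed call prints `reduce max_num_tokens by 100` before it is retried. The sketch below illustrates both under stated assumptions: the starting budget of 4000 is inferred from the first logged value of 3900, the names `split_evenly`, `call_with_shrinking_budget` and the `summarize` callable are hypothetical rather than the notebook's actual helpers, and a plain `ValueError` stands in for the API's `InvalidRequestError`. The exact rule the notebook uses to decide when to split is not visible in this output, so splitting on input length alone is also an assumption.

```python
import math

def split_evenly(tokens, max_tokens):
    """Split a token list into the fewest near-equal chunks that each fit max_tokens."""
    n_chunks = max(1, math.ceil(len(tokens) / max_tokens))
    size = max(1, math.ceil(len(tokens) / n_chunks))
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def call_with_shrinking_budget(summarize, tokens, max_tokens=4000, step=100, floor=500):
    """Call summarize(chunk, budget) per chunk; on failure lower the budget and retry."""
    budget = max_tokens
    while budget >= floor:
        try:
            return [summarize(chunk, budget) for chunk in split_evenly(tokens, budget)]
        except ValueError:  # stand-in for the request error raised by the API
            budget -= step
            print(f"reduce max_num_tokens by {step}: {budget}")
    raise RuntimeError("token budget exhausted without a successful call")

# Toy usage: 4438 tokens do not fit a 4000-token budget, so they are split in two.
print([len(chunk) for chunk in split_evenly(list(range(4438)), 4000)])
```

Lowering the budget in fixed steps keeps the retry loop simple, at the cost of several wasted calls when the first estimate is far off.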


In [None]:
# load the GPT-generated summary files created above
summary_list = glob.glob(os.path.join(summary_path,'*.txt'))
len(summary_list)

46

In [None]:
summary_rows = []

def get_id(path):
  # the report id is the file name without its directory or extension, e.g. ".../gpt/8670.txt" -> "8670"
  return os.path.splitext(os.path.basename(path))[0]

for s in summary_list:
  with open(s, 'r') as f:
    # readlines() keeps each trailing newline, so joining with "\n" leaves a blank line between lines
    tmp_summary = "\n".join(f.readlines())
    summary_rows.append({"id": get_id(s), "gpt_summary": tmp_summary})

summary_df = pd.DataFrame(summary_rows)
summary_df.head()

Unnamed: 0,id,gpt_summary
0,8670,Business Overview\n\nADP is a global provider ...
1,50471,Business Overview\n\nThe company is a SaaS pro...
2,78749,Business Overview\n\nAgilysys is a hospitality...
3,317788,"Business Overview\n\nDigital Turbine, Inc. all..."
4,320340,Business Overview\n\nThe company provides tech...


In [None]:
df = pd.merge(df, summary_df, how = "left", on = "id")
df.head()

Unnamed: 0,id,label_length,label,kept_report_length,report,has_label,gpt_summary
0,8670,1142,"AUTOMATIC DATA PROCESSING, INC. (“ADPI”) Auto...",3770,{'Business Overview': 'highlights from the yea...,True,Business Overview\n\nADP is a global provider ...
1,50471,907,"Park City Group, Inc. (“PCGI”) The Company is ...",1599,{'Business Overview': 'the company is a saas p...,True,Business Overview\n\nThe company is a SaaS pro...
2,78749,756,"AGILYSYS, Inc. (“AI”) Agilysys has been a lead...",3348,{'Business Overview': ' agilysys has been a le...,True,Business Overview\n\nAgilysys is a hospitality...
3,317788,927,"Digital Turbine, Inc. (“DTI”) Digital Turbine,...",6580,{'Business Overview': 'we allocate the fair va...,True,"Business Overview\n\nDigital Turbine, Inc. all..."
4,320340,933,Intelligent Systems Corporation (“ISC”) ISC’s...,1757,{'Business Overview': 'our consolidated operat...,True,Business Overview\n\nThe company provides tech...
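
A left merge keeps every report row and leaves `gpt_summary` empty wherever no summary file matches, including the case where the ids differ only in type (integer versus string). A minimal sketch of a check that could be run before merging is shown below. The helper names `summary_id` and `ids_without_summary` and the example paths are made up for illustration, and comparing ids as strings is an assumption about how the ids are stored.

```python
import os

def summary_id(path):
    """Return the file name without directory or extension, e.g. '.../gpt/8670.txt' -> '8670'."""
    return os.path.splitext(os.path.basename(path))[0]

def ids_without_summary(report_ids, summary_paths):
    """List the report ids (compared as strings) that have no matching summary file."""
    available = {summary_id(p) for p in summary_paths}
    return [rid for rid in report_ids if str(rid) not in available]

# Made-up paths: two summaries exist, the third report has none.
paths = ["/tmp/gpt/8670.txt", "/tmp/gpt/50471.txt"]
print(ids_without_summary([8670, "50471", 78749], paths))
```

An empty result means every report has a summary, so the later metric calculation will not meet missing values.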


In [None]:
def calculate_metrics(ref_sentences, cand_sentences):
    # Both arguments are pandas Series of equal length, paired row by row.

    # Calculate the average ROUGE scores; Rouge.get_scores takes (hypotheses, references)
    rouge = Rouge()
    rouge_scores = rouge.get_scores(cand_sentences, ref_sentences, avg=True)

    # Calculate the mean BERTScore F1
    bertscore = get_bert_score(cand_sentences.tolist(), ref_sentences.tolist())

    # Return the ROUGE-1, ROUGE-2 and ROUGE-L F1 scores followed by the BERTScore F1
    return rouge_scores['rouge-1']['f'], rouge_scores['rouge-2']['f'], rouge_scores['rouge-l']['f'], bertscore

def get_bert_score(cands, refs):
    assert len(cands) == len(refs)
    # score() returns per-pair precision, recall and F1 tensors; only the mean F1 is reported
    _, _, F1 = score(cands, refs, lang='en')
    return torch.mean(F1, dim=0).item()
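
To make the reported ROUGE-1 F1 concrete, the sketch below computes a simplified version by hand: unigram overlap between a candidate and a reference, turned into precision, recall and their harmonic mean. It is only an illustration. The function name `rouge1_f1` is hypothetical, whitespace tokenization and clipped counts are assumptions, and the `rouge` package may tokenize and count n-grams differently, so the numbers will not necessarily match it.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1 from clipped unigram overlap (whitespace tokens, lowercased)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared unigrams, clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of candidate words found in the reference
    recall = overlap / sum(ref.values())      # share of reference words found in the candidate
    return 2 * precision * recall / (precision + recall)

# Three of six words match on each side, so precision = recall = F1 = 0.5.
print(rouge1_f1("revenue increased due to cloud growth",
                "revenue increased on strong cloud demand"))
```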

In [None]:
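# calculate_metrics takes (ref_sentences, cand_sentences), so this call scores the labels as candidates against the GPT summaries as references
calculate_metrics(df['gpt_summary'], df['label'])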

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.dense.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


(0.4905263727798257,
 0.27665308574061515,
 0.46643910380809533,
 0.85709148645401)

In [None]:
df.to_pickle(os.path.join(summary_path, "gpt_summaries_df.pkl"))