In [1]:
from datasets import load_dataset
import pandas as pd
import json 

dataset = load_dataset("PatronusAI/financebench")
df = dataset['train'].to_pandas()

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments, pipeline

In [3]:
base_model_name = "meta-llama/Llama-2-7b-chat-hf"
refined_model = "finetuned_10labels"
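# Tokenizer shared by every fine-tuned checkpoint evaluated below
llama_tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
# The Llama 2 tokenizer ships without a pad token, so reuse EOS for padding
llama_tokenizer.pad_token = llama_tokenizer.eos_token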

In [15]:
def get_model_answer(model_id, df, row_idx=19):
    """Generate an answer for one FinanceBench row (row 19 by default)."""
    system_prompt = "You are a financial chatbot trained to answer questions based on the information provided in 10-K documents. Your responses should be directly sourced from the content of these documents. When asked a question, ensure that your answer is explicitly supported by the text in the 10-K filing, and do not include any external information, interpretations, or assumptions not clearly stated in the document. If a question pertains to financial data or analysis that is not explicitly covered in the 10-K filing provided, respond by stating that the information is not available in the document. Your primary focus should be on accuracy, specificity, and adherence to the information in 10-K documents, particularly regarding financial statements, company performance, and market positions.\n"
    user_prompt = df['question'][row_idx]
    input_section = df['evidence_text'][row_idx]
    text_gen = pipeline(task="text-generation", model=model_id, tokenizer=llama_tokenizer)
    output = text_gen(f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} {input_section} [/INST]")
    generated_text = output[0]['generated_text']
    # Keep only the text after [/INST]; split() returns the full text if the
    # marker is missing, whereas find() would return -1 and cut at a wrong offset
    model_response = generated_text.split("[/INST]", 1)[-1].strip()
    print(model_response)
    return model_response
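
The prompt assembly and response extraction in the function above do not need a model, so they can be pulled into small helpers and checked on their own. This is a minimal sketch: `build_llama2_prompt` and `extract_response` are hypothetical names that do not appear in the notebook, and the template string is copied from the cell above.

```python
def build_llama2_prompt(system_prompt: str, question: str, evidence: str) -> str:
    """Assemble a prompt in the same layout the notebook passes to the pipeline."""
    return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{question} {evidence} [/INST]"


def extract_response(generated_text: str, marker: str = "[/INST]") -> str:
    """Return the text after the first marker, or the whole text if it is absent."""
    return generated_text.split(marker, 1)[-1].strip()


# Quick check with placeholder strings (no model required)
prompt = build_llama2_prompt("SYSTEM", "QUESTION", "EVIDENCE")
print(extract_response(prompt + " some generated answer"))
```

Keeping the template in one place would also keep the single-row and whole-dataset functions from drifting apart.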

In [16]:
get_model_answer("finetuned_10labels", df)


According to Amazon's consolidated statement of operations for FY2017, the company's year-over-year change in revenue was a 27.7% increase from $79.268 billion in FY2016 to $118.573 billion in FY2017.

To calculate this change, we can use the line items shown in the statement of income:

* Net product sales: $79,268 million in FY2016 and $118,573 million in FY2017, for a 48.5% increase.
* Net service sales: $27,738 million in FY2016 and $59,293 million in FY2017, for a 115.5% increase.
* Total net sales: $107,006 million in FY2016 and $177,866 million in FY2017, for a 64.5% increase.

Therefore, Amazon's year-over-year change in revenue from FY2016 to FY2017 was a 27.7% increase.


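
The percentages in the response above can be checked against the dollar figures the response itself quotes. The sketch below is only an internal-consistency check: `pct_change` is a hypothetical helper, and the figures are copied from the model output, not verified against the 10-K.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Figures in $ millions, exactly as quoted in the model response (not from the filing)
quoted = {
    "Net product sales": (79_268, 118_573),
    "Net service sales": (27_738, 59_293),
    "Total net sales": (107_006, 177_866),
}

recomputed = {name: round(pct_change(old, new), 1) for name, (old, new) in quoted.items()}
for name, pct in recomputed.items():
    print(f"{name}: {pct}%")
```

The recomputed values come out near 49.6%, 113.8%, and 66.2%, so neither the per-line percentages nor the 27.7% headline in the response follow from the figures it cites.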

In [4]:
def get_all_model_answers(model_id, df):
    # Initialize the text generation pipeline
    text_gen = pipeline(task="text-generation", model=model_id, tokenizer=llama_tokenizer)

    # Lists to store the data
    questions, answers, model_responses = [], [], []

    for index, row in df.iterrows():
        # Append the original question and answer
        questions.append(row['question'])
        answers.append(row['answer'])

        system_prompt = "You are a financial chatbot trained to answer questions based on the information provided in 10-K documents. Your responses should be directly sourced from the content of these documents. When asked a question, ensure that your answer is explicitly supported by the text in the 10-K filing, and do not include any external information, interpretations, or assumptions not clearly stated in the document. If a question pertains to financial data or analysis that is not explicitly covered in the 10-K filing provided, respond by stating that the information is not available in the document. Your primary focus should be on accuracy, specificity, and adherence to the information in 10-K documents, particularly regarding financial statements, company performance, and market positions.\n" 
        user_prompt = row['question']
        input_section = row['evidence_text']
        prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} {input_section} [/INST]"
        
        output = text_gen(prompt)
        generated_text = output[0]['generated_text']
        
        # Keep only the text after [/INST]; split() returns the full text if the
        # marker is missing, whereas find() would return -1 and cut at a wrong offset
        model_response = generated_text.split("[/INST]", 1)[-1].strip()
        print(model_response)
        model_responses.append(model_response)

    # Create a new DataFrame with the responses and original columns
    response_df = pd.DataFrame({
        'Question': questions,
        'Reference_Answer': answers,
        'Model_Response': model_responses
    })

    return response_df



In [None]:
response_df_10labels = get_all_model_answers("finetuned_10labels", df)


Based on the information provided in the 2018 10-K document, the FY2018 capital expenditure amount for 3M is $1,577 million. This amount can be found on the Consolidated Statement of Cash Flows under the "Purchases of property, plant and equipment (PP&E)" section.
Based on the information provided in the balance sheet of 3M Company as of December 31, 2018, the year-end net PPNE for 3M is $26,652 billion.
Based on the information provided in the 10-K filing for 3M Company and Subsidiaries for FY2022, the answer to the question is no, 3M is not a capital-intensive business.

The consolidated statement of income for FY2022 shows that the company's net sales were $34.2 billion, while its property, plant, and equipment (PP&E) net of accumulated depreciation was $9.2 billion. This means that the company's PP&E assets were approximately 27% of its net sales, which is not considered a high level of capital intensity.

Additionally, the consolidated balance sheet shows that the company's total 




According to MGM Resorts' 2022 Form 10-K filed with the SEC, the answer to your question is:

"Yes, MGM Resorts paid dividends to common shareholders in FY2022. The company maintained an annual dividend of $0.01 per share throughout 2022."

This information is directly sourced from the "Dividend Policy" section of the 10-K filing, which states:

"The company has a long-standing policy of paying a dividend on its common stock. The board of directors reviews the dividend policy on a regular basis and may adjust the dividend payment based on the company's financial performance and other factors. In 2022, the company paid a dividend of $0.01 per share on its common stock."

Therefore, based on the information provided in the 10-K document, MGM Resorts did pay dividends to common shareholders in FY2022.
According to MGM's 10-K filing for the year ended December 31, 2022, the region with the highest EBITDAR contribution was Las Vegas Strip Resorts, with EBITDAR of $877,052 for the three mon

In [None]:
response_df_20labels = get_all_model_answers("finetuned_20labels", df)

In [None]:
response_df_30labels = get_all_model_answers("finetuned_30labels", df)

In [None]:
response_df_40labels = get_all_model_answers("finetuned_40labels", df)

In [None]:
response_df_50labels = get_all_model_answers("finetuned_50labels", df)
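
The five cells above differ only in the label count, so the runs could be driven from one loop. This is a rough sketch under that assumption: `collect_responses` is a hypothetical helper, and it takes the answer function as an argument so the loop can be tried with a stub before loading any checkpoint.

```python
from typing import Any, Callable, Dict, Iterable


def collect_responses(
    label_counts: Iterable[int],
    answer_fn: Callable[[str, Any], Any],
    df: Any,
) -> Dict[str, Any]:
    """Run answer_fn once per checkpoint and key the results by model id."""
    results = {}
    for n in label_counts:
        model_id = f"finetuned_{n}labels"  # naming pattern used in the cells above
        results[model_id] = answer_fn(model_id, df)
    return results


# Dry run with a stub in place of the real model call
stub_results = collect_responses(range(10, 60, 10), lambda model_id, data: model_id.upper(), None)
print(list(stub_results))
```

In the notebook, the call would presumably be `collect_responses(range(10, 60, 10), get_all_model_answers, df)`, giving one DataFrame per checkpoint.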