Insert the file path to your .txt file below


The code below reads the file and stores its full text in the variable `file_text`.


In [10]:
from pathlib import Path

file_path = Path(r"/Users/user/CS461/NLP/Natural-Language-Processing-of-Financial-Disclosures/data/0000320187-25-000053.txt")
with file_path.open("r", encoding="utf-8") as f:
    file_text = f.read()


Below is the code for the prompt

In [11]:
prompt = f"""
You are an expert financial analyst. Analyze the following financial document.

It is a BERTopic analysis of a financial document.
The format is as follows:

Topic 0 | chunks=101
Label: 0_report_communications_number_commission
Top words: report, communications, number, commission, commencement, date, address, former, code, name

Topic 1 | chunks=81
Label: 1_gaap_operating_fiscal_loss
Top words: gaap, operating, fiscal, loss, quarter, excluded, income, performance, amortization, assets

...

Below this section you will also see a summary for each topic, in the form:
(0, '0.047*"report" + 0.044*"communications" + 0.043*"number" + ...')

For each topic:
- The topic ID is the first element inside the parentheses.
- The quoted string that follows lists weight*"word" pairs, giving the weight of each word for that topic.

Your tasks:

===============================================================================
1. Calculate topic proportions
===============================================================================
For each topic ID, calculate the **topic proportion** by summing ALL the word weights inside the parenthesis block for that topic.

Only include topics whose total proportion exceeds **0.01**.

===============================================================================
2. Create the main topic table
===============================================================================
Add each selected topic to a table in this format:

| Topic ID | FileName | Date | Topic Number | Topic Proportion |

Use these rules:
- FileName → If no filename is provided, write "Unknown".
- Date → If no date is provided, write "Unknown".
- Topic Number → Same as Topic ID.

===============================================================================
3. Create the topic-name key table
===============================================================================
Using the word lists associated with each chosen topic, infer a concise financial topic name (e.g.,  
“GAAP Operating Losses”,  
“Credit Facilities & Liquidity”,  
“Regulatory Filings & Reporting”, etc.)

Format the table like this:

| Topic Number | Topic Name | Topic Description |

The Topic Description should be 1–2 sentences explaining the theme in plain English.

===============================================================================
4. Output
===============================================================================
Print **in this order**:
1. The first table (topics above threshold)  
2. The second table (topic names + descriptions)

===============================================================================
Here is the document to analyze:
---
{file_text}
---
"""
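
The arithmetic in task 1 of the prompt can also be checked locally instead of relying on the model. The sketch below assumes each summary line looks like the example in the prompt, `(0, '0.047*"report" + 0.044*"communications" + ...')`. The helper names `topic_proportion` and `select_topics` are hypothetical and not part of the notebook, and the sample line uses only the three weights visible in the prompt.

```python
import re


def topic_proportion(summary_line):
    """Return (topic_id, summed_weight) for one topic summary line."""
    # The topic ID is the integer right after the opening parenthesis.
    id_match = re.match(r"\s*\((\d+),", summary_line)
    if id_match is None:
        raise ValueError(f"Not a topic summary line: {summary_line!r}")
    # Each weight is the decimal number immediately before *"word".
    weights = [float(w) for w in re.findall(r'(\d*\.?\d+)\*"', summary_line)]
    return int(id_match.group(1)), sum(weights)


def select_topics(summary_lines, threshold=0.01):
    """Keep only the topics whose summed weight exceeds the threshold."""
    totals = dict(topic_proportion(line) for line in summary_lines)
    return {tid: total for tid, total in totals.items() if total > threshold}


# Sample input: the three weights shown in the prompt's example line.
sample = '(0, \'0.047*"report" + 0.044*"communications" + 0.043*"number"\')'
topic_id, proportion = topic_proportion(sample)
print(topic_id, round(proportion, 3))
```

Computing the sums in code gives a reference value to compare against whatever proportions the model reports.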


Below is the code for the API call that sends the prompt to Gemini

In [18]:
from dotenv import load_dotenv
import os
from google import genai


def summarize_disclosure(file_path, prompt):
    """
    Sends a prompt about a text file to the Gemini API and returns the reply.

    Args:
        file_path (str or Path): The path to the text file. Used only to
            confirm the file exists; its text is already embedded in `prompt`.
        prompt (str): The full instruction, including the document text.

    Returns:
        str: The generated response or an error message.
    """
    try:
        # --- 1. Configure the Gemini API key ---
        load_dotenv()
        api_key = os.getenv("GEMINI_My_API_KEY2")

        if not api_key:
            return "Error: GEMINI_My_API_KEY2 not found. Make sure it's set in your .env file."

        client = genai.Client(api_key=api_key)

        # --- 2. Confirm the source file exists ---
        if not os.path.isfile(file_path):
            raise FileNotFoundError(file_path)

        # --- 3. Generate the summary ---
        # The prompt already embeds file_text, so it is sent as-is;
        # appending file_text again would send the document twice.
        print("Generating summary with Gemini...")
        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=prompt
        )

        return response.text

    except FileNotFoundError:
        return f"Error: The file '{file_path}' was not found."
    except Exception as e:
        return f"An error occurred: {e}"


if __name__ == "__main__":
    print("--- Text File Summarization with Gemini API ---")

    # Get the summary
    summary = summarize_disclosure(file_path, prompt)

    # Print the result
    print("\n--- Summary ---")
    print(summary)


--- Text File Summarization with Gemini API ---
Generating summary with Gemini...

--- Summary ---
NIKE, Inc. filed a Form 8-K on July 28, 2025, to correct an error in its Annual Report on Form 10-K for the year ended May 31, 2025.

The correction relates to the company's **product purchase obligations**, which represent agreements and open purchase orders for products in the ordinary course of business. Nike previously **overstated** these obligations in the Annual Report.

As corrected, the product purchase obligations as of May 31, 2025, are approximately **$5 billion**, all of which are payable within the next 12 months.
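
The prompt asks for two pipe-delimited tables. When a response does come back in that form, the rows can be pulled into Python for further processing. This is a minimal sketch. The helper name `parse_pipe_table` is hypothetical, and the sample text is made up to match the header requested in the prompt; it is not real model output.

```python
def parse_pipe_table(text):
    """Parse the first pipe-delimited table in text into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            if rows:
                break  # the first table has ended
            continue
        # Drop the outer pipes, then split the remaining text into cells.
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip separator rows such as |---|:---:|.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample in the format requested by the prompt.
sample_response = """
| Topic ID | FileName | Date | Topic Number | Topic Proportion |
|---|---|---|---|---|
| 0 | Unknown | Unknown | 0 | 0.134 |
"""
for row in parse_pipe_table(sample_response):
    print(row["Topic ID"], row["Topic Proportion"])
```

The function returns an empty list when no table is present, so a caller can tell when the model has ignored the requested format.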
