# 10-Q Financial Summary

This notebook reads JPMorgan Chase 10-Q filings (HTML format) and uses GPT-4o to summarize the most important financial information for each quarter.

## 1. Setup and Configuration

In [None]:
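# Imports used throughout the notebook
import json
import re
from pathlib import Path

import pandas as pd

# Configuration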
DATA_DIR = Path("../Data/2025 10Q")
OUTPUT_DIR = Path("../Processed Data")
OUTPUT_FILE = OUTPUT_DIR / "10Q_summaries.json"

# Create output directory if it doesn't exist
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Find all 10-Q files
tenq_files = sorted(DATA_DIR.glob("10Q_2025Q*.html"))
print(f"Found {len(tenq_files)} 10-Q files:")
for f in tenq_files:
    print(f"  - {f.name}")
print(f"\nOutput will be saved to: {OUTPUT_FILE}")

Found 3 10-Q files:
  - 10Q_2025Q1.html
  - 10Q_2025Q2.html
  - 10Q_2025Q3.html

Output will be saved to: ../Processed Data/10Q_summaries.json


In [None]:
# Initialize OpenAI client
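The body of this cell is not shown in the export. The following is a minimal sketch of what it might contain, assuming the official `openai` Python package (v1 or later) and an API key stored in the `OPENAI_API_KEY` environment variable. `get_api_key` and `make_client` are hypothetical helper names, not part of the original notebook.

```python
import os


def get_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key


def make_client():
    """Create the OpenAI client (assumes the `openai` package, v1+, is installed)."""
    # Imported lazily so the key check above works even without the package.
    from openai import OpenAI

    return OpenAI(api_key=get_api_key())


# In the notebook this would be followed by:
# client = make_client()
```

Failing early on a missing key keeps the error close to its cause instead of surfacing later inside the summarization loop.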


## 2. Extract Text from HTML 10-Q Files
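
The cell for this step is not included in the export, but `summarize_10q` later calls `extract_text_from_10q(file_path)`. The sketch below shows one way such a function could work, using only the standard library's `html.parser`. The original notebook may well have used a different parser, so treat the helper names `_TextExtractor` and `html_to_text` and all the details here as assumptions.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Convert an HTML string to plain text, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    # Collapse runs of spaces, tabs, and non-breaking spaces
    return re.sub(r"[ \t\xa0]+", " ", text)


def extract_text_from_10q(file_path):
    """Read a 10-Q HTML file and return its visible text."""
    html = Path(file_path).read_text(encoding="utf-8", errors="ignore")
    return html_to_text(html)


# Hypothetical usage, matching the test_text variable used in the next section:
# test_text = extract_text_from_10q(tenq_files[0])
```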

## 3. Extract Key Financial Sections


In [31]:
def extract_key_sections(text):
    """
    Extract key financial sections from 10-Q text
    
    Targets the most information-dense sections needed for comprehensive analysis:
    - Part I, Item 1: Financial Statements (consolidated income, balance sheet)
    - Part I, Item 2: Management's Discussion and Analysis (MD&A)
    - Part II, Item 1A: Risk Factors (if material changes)
    
    Args:
        text: Full 10-Q text
    
    Returns:
        str: Extracted key sections
    """
    # Define sections to extract with their patterns and char limits
    # Structure: (pattern, max_chars, description)
    sections_config = [
        # Financial Statements - Core numbers
        (r'CONDENSED CONSOLIDATED STATEMENTS OF INCOME', 15000, 'Income Statement'),
        (r'CONSOLIDATED BALANCE SHEETS?', 15000, 'Balance Sheet'),
        (r'CONSOLIDATED STATEMENTS OF CASH FLOWS', 10000, 'Cash Flow'),
        
        # MD&A - The most detailed narrative section
        (r'ITEM 2[.\s]*MANAGEMENT.?S DISCUSSION AND ANALYSIS', 50000, 'MD&A Full'),
        (r'EXECUTIVE OVERVIEW', 10000, 'Executive Summary'),
        (r'CONSOLIDATED RESULTS OF OPERATIONS', 20000, 'Results of Operations'),
        (r'FIRM-WIDE RESULTS', 15000, 'Firm-wide Results'),
        
        # Business Segments
        (r'(?:CONSUMER|CCB).{0,20}COMMUNITY BANKING', 12000, 'CCB Segment'),
        (r'CORPORATE.{0,20}INVESTMENT BANK', 12000, 'CIB Segment'),
        (r'COMMERCIAL BANKING', 10000, 'CB Segment'),
        (r'ASSET.{0,20}WEALTH MANAGEMENT', 10000, 'AWM Segment'),
        
        # Credit and Risk
        (r'CREDIT PORTFOLIO', 15000, 'Credit Portfolio'),
        (r'CREDIT RISK', 12000, 'Credit Risk'),
        (r'ALLOWANCE FOR (?:CREDIT )?LOSSES', 10000, 'Credit Reserves'),
        (r'NET CHARGE-OFFS?', 8000, 'Charge-offs'),
        
        # Capital and Liquidity
        (r'CAPITAL RISK MANAGEMENT', 12000, 'Capital Management'),
        (r'REGULATORY CAPITAL', 10000, 'Capital Ratios'),
        (r'LIQUIDITY RISK MANAGEMENT', 10000, 'Liquidity'),
        
        # Risk Factors
        (r'ITEM 1A[.\s]*RISK FACTORS', 15000, 'Risk Factors'),
    ]
    
    extracted = []
    found_sections = []
    
    for pattern, max_chars, description in sections_config:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            start = match.start()
            # Take up to max_chars characters starting at the first match
            section_text = text[start:start + max_chars]
            extracted.append(section_text)
            found_sections.append(description)
    
    print(f"\nFound {len(found_sections)} sections:")
    for section in found_sections:
        print(section)
    
    # If we found sections, join them with clear separators
    if extracted:
        separator = '\n\n' + '='*80 + '\n=== SECTION BREAK ===\n' + '='*80 + '\n\n'
        return separator.join(extracted)
    
    # Fallback: if no sections found, return first portion
    print("No structured sections found, using first 80,000 characters")
    return text[:80000]

# Test section extraction
key_sections = extract_key_sections(test_text)
print("\nExtraction Summary:")
print(f"  Total extracted: {len(key_sections):,} characters")
print(f"  Original length: {len(test_text):,} characters")
print(f"  Reduction: {100 - (len(key_sections)/len(test_text)*100):.1f}%")


Found 17 sections:
Balance Sheet
Cash Flow
MD&A Full
Executive Summary
Results of Operations
CCB Segment
CIB Segment
CB Segment
AWM Segment
Credit Portfolio
Credit Risk
Credit Reserves
Charge-offs
Capital Management
Capital Ratios
Liquidity
Risk Factors

Extraction Summary:
  Total extracted: 243,992 characters
  Original length: 724,710 characters
  Reduction: 66.3%
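
The total of 243,992 characters is exactly what results when every one of the 17 matched sections reaches its character cap. The check below reproduces that figure from the caps in `sections_config` and the separator string. The token estimate at the end uses a rough four-characters-per-token rule of thumb, which is an assumption and not a measurement.

```python
# Character caps of the 17 sections reported as found, in sections_config order
caps = [
    15000, 10000,                 # Balance Sheet, Cash Flow
    50000, 10000, 20000,          # MD&A Full, Executive Summary, Results of Operations
    12000, 12000, 10000, 10000,   # CCB, CIB, CB, AWM segments
    15000, 12000, 10000, 8000,    # Credit Portfolio, Credit Risk, Reserves, Charge-offs
    12000, 10000, 10000,          # Capital Management, Capital Ratios, Liquidity
    15000,                        # Risk Factors
]

# Same separator string that extract_key_sections uses between sections
separator = '\n\n' + '=' * 80 + '\n=== SECTION BREAK ===\n' + '=' * 80 + '\n\n'

total_chars = sum(caps) + len(separator) * (len(caps) - 1)
print(f"Separator length: {len(separator)}")   # 187
print(f"Maximum extracted: {total_chars:,}")   # 243,992

# Rough heuristic only: about 4 characters per token for English text
approx_tokens = total_chars // 4
print(f"Approximate tokens: {approx_tokens:,}")
```

Because each section is sliced to a fixed length from its first match, any filing long enough to fill every cap produces the same extracted size regardless of its full length.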


## 4. Build Summarization Prompt

In [32]:
def build_summarization_prompt(text, quarter):
    """
    Build prompt for GPT to summarize 10-Q filing
    
    Args:
        text: Extracted 10-Q text
        quarter: Quarter identifier (e.g., "2025Q1")
    
    Returns:
        str: Prompt for GPT
    """
    prompt = f"""You are a senior financial analyst reviewing JPMorgan Chase's {quarter} 10-Q filing.

Provide a detailed and comprehensive financial analysis covering these key areas:

1. Executive summary of quarterly performance
2. Core financials: revenue, net income, EPS, ROE/ROTCE (with comparisons to prior quarter and year-ago)
3. Net interest income trends and margin analysis
4. Balance sheet highlights: assets, deposits, loans, capital ratios
5. Business segment performance: CCB, CIB, Commercial Banking, Asset & Wealth Management
6. Credit quality: allowances, charge-offs, NPAs, delinquency trends
7. Operating expenses and efficiency metrics
8. Liquidity and funding position
9. Material risk factors or regulatory developments
10. Strategic initiatives and management outlook

Use precise, professional language. Include key figures and percentages where available, and base all statements strictly on the filing content without adding outside interpretation.

---

10-Q Filing Content:

{text}

---

Analysis:"""
    
    return prompt

## 5. Generate Summaries for All 10-Q Filings

In [33]:
def summarize_10q(file_path, client):
    """
    Generate summary for a single 10-Q file
    
    Args:
        file_path: Path to 10-Q HTML file
        client: OpenAI client
    
    Returns:
        dict: Summary results
    """
    # Extract quarter from filename (e.g., "10Q_2025Q1.html" -> "2025Q1")
    quarter = file_path.stem.split('_')[1]
    
    print(f"\nProcessing {quarter}...")
    
    # Extract text
    full_text = extract_text_from_10q(file_path)
    key_sections = extract_key_sections(full_text)
    
    print(f"  Full text: {len(full_text):,} characters")
    print(f"  Key sections: {len(key_sections):,} characters")
    
    # Build prompt
    prompt = build_summarization_prompt(key_sections, quarter)
    
    # Call GPT-4o
    print("  Calling GPT-4o...")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a financial analyst specializing in banking sector analysis."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3,
        max_tokens=2000
    )
    
    summary = response.choices[0].message.content
    
    print(f"Summary generated ({len(summary)} characters)")
    
    return {
        'quarter': quarter,
        'file': file_path.name,
        'summary': summary,
        'input_chars': len(key_sections),
        'output_chars': len(summary)
    }
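
Since the request caps the reply at `max_tokens=2000` while asking for a ten-part analysis, it may be worth checking whether a reply was cut off. In the Chat Completions API, a choice's `finish_reason` is `"length"` when generation stopped at the token limit. The helper `hit_token_limit` below is a hypothetical addition, and the stand-in response object exists only so the sketch runs without an API call.

```python
from types import SimpleNamespace


def hit_token_limit(response):
    """Return True if the first choice stopped because it reached max_tokens."""
    return response.choices[0].finish_reason == "length"


# Stand-in object shaped like a chat completion response, for illustration only
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="..."),
        )
    ]
)

if hit_token_limit(fake_response):
    print("Summary was truncated; consider raising max_tokens.")
```

Inside `summarize_10q`, the same check could be applied to `response` before the summary is stored.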

In [34]:
# Generate summaries for all quarters
summaries = []

for file_path in tenq_files:
    try:
        result = summarize_10q(file_path, client)
        summaries.append(result)
    except Exception as e:
        print(f"Error processing {file_path.name}: {e}")

print(f"\n{'='*80}")
print(f"Completed {len(summaries)} out of {len(tenq_files)} summaries")
print(f"{'='*80}")


Processing 2025Q1...

Found 17 sections:
Balance Sheet
Cash Flow
MD&A Full
Executive Summary
Results of Operations
CCB Segment
CIB Segment
CB Segment
AWM Segment
Credit Portfolio
Credit Risk
Credit Reserves
Charge-offs
Capital Management
Capital Ratios
Liquidity
Risk Factors
  Full text: 724,710 characters
  Key sections: 243,992 characters
  Calling GPT-4o...
Summary generated (3329 characters)

Processing 2025Q2...

Found 17 sections:
Balance Sheet
Cash Flow
MD&A Full
Executive Summary
Results of Operations
CCB Segment
CIB Segment
CB Segment
AWM Segment
Credit Portfolio
Credit Risk
Credit Reserves
Charge-offs
Capital Management
Capital Ratios
Liquidity
Risk Factors
  Full text: 822,213 characters
  Key sections: 243,992 characters
  Calling GPT-4o...
Summary generated (3569 characters)

Processing 2025Q3...

Found 17 sections:
Balance Sheet
Cash Flow
MD&A Full
Executive Summary
Results of Operations
CCB Segment
CIB Segment
CB Segment
AWM Segment
Credit Portfolio
Credit Risk
Credit R

## 6. Display Summaries

In [37]:
# Display summaries
for summary in summaries:
    print(f"\n{'='*80}")
    print(f"QUARTER: {summary['quarter']}")
    print(f"{'='*80}")
    print(summary['summary'])
    print(f"\n{'-'*80}")


QUARTER: 2025Q1
### JPMorgan Chase 2025 Q1 Financial Analysis

#### 1. Executive Summary of Quarterly Performance
JPMorgan Chase reported a robust performance for the first quarter of 2025, with net income rising by 9% year-over-year to $14.6 billion. The firm achieved an EPS of $5.07, reflecting a 14% increase, and maintained a strong ROE of 18%. Total net revenue increased by 8% to $45.3 billion, driven by both net interest income and noninterest revenue growth. The results were bolstered by a $588 million gain related to the First Republic acquisition.

#### 2. Core Financials
- **Revenue**: $45.3 billion, up 8% from $41.9 billion in Q1 2024.
- **Net Income**: $14.6 billion, up 9% from $13.4 billion in Q1 2024.
- **Earnings Per Share (EPS)**: $5.07, up 14% from $4.44 in Q1 2024.
- **Return on Equity (ROE)**: 18%, up from 17% in Q1 2024.
- **Return on Tangible Common Equity (ROTCE)**: 21%, consistent with Q1 2024.

#### 3. Net Interest Income Trends and Margin Analysis
Net interest 

## 7. Save Summaries to JSON

In [38]:
# Save to JSON file
output_data = {
    'generated_date': str(pd.Timestamp.now()),
    'summaries': summaries
}

with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
    json.dump(output_data, f, indent=2, ensure_ascii=False)

print(f"Summaries saved to: {OUTPUT_FILE}")

Summaries saved to: ../Processed Data/10Q_summaries.json
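
To use the saved file in a later step, it can be read back and indexed by quarter. This is a minimal sketch that assumes the JSON layout written above (a `summaries` list whose items each carry a `quarter` key). `load_summaries` is a hypothetical helper name.

```python
import json


def load_summaries(path):
    """Read the summaries file and return a dict keyed by quarter."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {item["quarter"]: item for item in data["summaries"]}


# Hypothetical usage in the notebook:
# by_quarter = load_summaries(OUTPUT_FILE)
# print(by_quarter["2025Q1"]["summary"][:500])
```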
