# User SEC Filing Upload and Comparative Analysis

This notebook demonstrates how to:
1. Upload your own SEC filing (PDF or TXT)
2. Index it alongside existing company filings
3. Ask comparative questions
4. Compare metrics across companies

## Prerequisites
- Run `GENAI_PROJECT_CHROMADB.ipynb` first to set up the RAG system
- Have your SEC filing ready (PDF or TXT format)

## Setup and Imports

In [None]:
# Install required packages if needed
!pip install PyPDF2 scikit-learn -q

In [None]:
import sys
import os
from pathlib import Path

# Import the user filing module
from user_filing_upload import UserFilingManager, ComparativeAnalyzer

# Import existing RAG implementation
# Note: Make sure you've run GENAI_PROJECT_CHROMADB.ipynb first
# or import the FinBERTFinancialRAG class from there

## Load Existing RAG Instance

If the `rag` variable from the main notebook is already available in this session, reuse it here.
Otherwise, create a new instance using Option 2 below.

In [None]:
# Option 1: If you have the RAG instance from the main notebook, use it
# Assuming 'rag' variable exists from GENAI_PROJECT_CHROMADB.ipynb

# Option 2: Create a new instance (if starting fresh)
# Remove the triple quotes around the block below if you need to create a new instance:

"""
from sentence_transformers import SentenceTransformer
import chromadb
import torch

class FinBERTFinancialRAG:
    def __init__(self, persist_directory="~/FinancialAI/chromadb"):
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        print(f"Using device: {self.device}")
        
        # Load FinBERT embedder
        self.embedder = SentenceTransformer('yiyanghkust/finbert-tone')
        self.embedder.to(self.device)
        
        # Initialize ChromaDB
        persist_directory = os.path.expanduser(persist_directory)
        self.chroma_client = chromadb.PersistentClient(path=persist_directory)
        
        # Get or create collection
        try:
            self.collection = self.chroma_client.get_collection(name="financial_filings")
            print(f"Loaded existing collection with {self.collection.count()} documents")
        except Exception:  # most likely the collection does not exist yet
            self.collection = self.chroma_client.create_collection(
                name="financial_filings",
                metadata={"description": "Financial SEC filings with user uploads"}
            )
            print("Created new collection")

# Create RAG instance
rag = FinBERTFinancialRAG()
"""

# For this demo, we'll assume 'rag' exists from the main notebook
print(f"RAG instance ready with {rag.collection.count()} documents in database")
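
The `print` call above raises a bare `NameError` if `rag` was never created in this kernel. A minimal sketch of a friendlier guard follows, assuming the instance lives in the notebook's global namespace. `require_name` is a hypothetical helper and not part of `user_filing_upload`.

```python
def require_name(namespace, name, hint):
    """Return namespace[name], or raise a readable error if it is missing."""
    if name not in namespace:
        raise RuntimeError(f"'{name}' is not defined. {hint}")
    return namespace[name]

# In the notebook this could be called as (left commented so the cell is inert):
# rag = require_name(globals(), "rag",
#                    "Run GENAI_PROJECT_CHROMADB.ipynb or enable Option 2 above.")
```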

## Initialize User Filing Manager and Comparative Analyzer

In [None]:
# Create manager instances
uploader = UserFilingManager(rag)
analyzer = ComparativeAnalyzer(rag)

print("✅ Managers initialized successfully!")

## Step 1: Upload Your SEC Filing

Upload your own company's SEC filing. Supported formats:
- PDF files (.pdf)
- Text files (.txt)
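
How `UserFilingManager` reads these two formats is not shown in this notebook. The sketch below illustrates one plausible approach, assuming PyPDF2 2.x or later (which provides `PdfReader`). The function name `read_filing_text` and the constant `SUPPORTED_SUFFIXES` are hypothetical.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".txt"}  # the two formats listed above


def read_filing_text(file_path):
    """Return the plain text of a .txt or .pdf filing (illustrative only)."""
    path = Path(file_path).expanduser()
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type '{suffix}'; expected .pdf or .txt")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")
    # Imported lazily so that TXT-only use does not require PyPDF2.
    from PyPDF2 import PdfReader
    reader = PdfReader(str(path))
    # extract_text() may return None or "" for image-only (scanned) pages.
    return "\n".join((page.extract_text() or "") for page in reader.pages)
```

Scanned PDFs without a text layer would yield little or no text with this approach, so a TXT export may be the safer input for such filings.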

In [None]:
# Example: Upload a 10-K filing
# Replace these values with your actual filing information

upload_result = uploader.upload_user_filing(
    file_path="/path/to/your/filing.pdf",  # ⬅️ CHANGE THIS to your file path
    company_name="My Company Inc",          # ⬅️ CHANGE THIS to your company name
    filing_type="10-K",                     # Filing type (10-K, 10-Q, 8-K, etc.)
    fiscal_year="2023",                     # Fiscal year
    section_name="Full Filing",             # Section name (or "MD&A", "Business", etc.)
    cik="USER_MYCOMPANY"                    # Optional: Custom identifier
)

# Display upload summary
print("\n" + "="*60)
print("📊 UPLOAD SUMMARY")
print("="*60)
for key, value in upload_result.items():
    print(f"{key:20s}: {value}")

### Upload Multiple Sections (Optional)

If you have different sections of your filing in separate files:

In [None]:
# Example: Upload different sections separately
sections = [
    {
        "file_path": "/path/to/business_section.pdf",
        "section_name": "Business Description"
    },
    {
        "file_path": "/path/to/mda_section.pdf",
        "section_name": "MD&A"
    },
    {
        "file_path": "/path/to/financial_statements.pdf",
        "section_name": "Financial Statements"
    }
]

company_info = {
    "company_name": "My Company Inc",
    "filing_type": "10-K",
    "fiscal_year": "2023",
    "cik": "USER_MYCOMPANY"
}

# Upload each section
for section in sections:
    result = uploader.upload_user_filing(
        file_path=section["file_path"],
        section_name=section["section_name"],
        **company_info
    )
    print(f"✅ Uploaded {section['section_name']}: {result['chunks_created']} chunks")
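
The `chunks_created` count reflects how the filing text is split before it is embedded. The actual splitting logic lives in `user_filing_upload` and is not shown here. As a rough illustration of one common approach, the sketch below uses fixed-size word windows with overlap; the function name and default sizes are assumptions, not the module's values.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows of chunk_size that overlap by `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of storing some words twice.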

## Step 2: View All Uploaded Companies

In [None]:
# Get list of user-uploaded companies
uploaded_companies = uploader.get_user_uploaded_companies()

print("\n" + "="*60)
print(f"📁 USER-UPLOADED COMPANIES ({len(uploaded_companies)} total)")
print("="*60)

for company in uploaded_companies:
    print(f"\n🏢 {company['company_name']}")
    print(f"   CIK: {company['cik']}")
    print(f"   Filing: {company['filing_type']} - {company['fiscal_year']}")
    print(f"   Uploaded: {company['upload_timestamp']}")

## Step 3: Ask Comparative Questions

Now you can ask questions that compare your filing with other companies in the database.

### Example 1: Revenue Growth Comparison

In [None]:
result = analyzer.ask_comparative_question(
    question="How does the revenue growth rate compare across these companies? What are the key revenue drivers?",
    user_company="My Company Inc",  # ⬅️ Your uploaded company
    comparison_companies=[          # ⬅️ Companies to compare with
        "APPLE INC",
        "MICROSOFT CORP",
        "ALPHABET INC"
    ],
    top_k=5,
    use_hybrid=True
)

print("\n" + "="*60)
print("💡 COMPARATIVE ANALYSIS: Revenue Growth")
print("="*60)
print(f"\nQuestion: {result['question']}")
print(f"\n{result['answer']}")
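
The `use_hybrid=True` flag suggests that retrieval blends embedding similarity with some form of keyword matching, but this notebook does not show how `ComparativeAnalyzer` does that. Purely as an illustration, reciprocal rank fusion (RRF) is one common way to merge two ranked result lists. It is swapped in here as an example and is not taken from the module; all names below are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Merge several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_k]
```

A document that ranks reasonably well in both lists tends to beat one that ranks first in only a single list, which is the usual motivation for fusing dense and keyword results.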

### Example 2: Risk Factor Comparison

In [None]:
result = analyzer.ask_comparative_question(
    question="What are the main risk factors disclosed by each company? How do they differ?",
    user_company="My Company Inc",
    comparison_companies=["APPLE INC", "MICROSOFT CORP"],
    top_k=5
)

print("\n" + "="*60)
print("⚠️ COMPARATIVE ANALYSIS: Risk Factors")
print("="*60)
print(f"\n{result['answer']}")

### Example 3: Business Strategy Comparison

In [None]:
result = analyzer.ask_comparative_question(
    question="What are the key business strategies and competitive advantages mentioned by each company?",
    user_company="My Company Inc",
    comparison_companies=["APPLE INC", "ALPHABET INC"],
    top_k=5
)

print("\n" + "="*60)
print("🎯 COMPARATIVE ANALYSIS: Business Strategy")
print("="*60)
print(f"\n{result['answer']}")

### Example 4: Compare with All Companies (Open Comparison)

In [None]:
# Pass comparison_companies=None to compare against all relevant companies in the database
result = analyzer.ask_comparative_question(
    question="How does the R&D spending as a percentage of revenue compare across companies?",
    user_company="My Company Inc",
    comparison_companies=None,  # Will retrieve from all companies
    top_k=10
)

print("\n" + "="*60)
print("🔬 COMPARATIVE ANALYSIS: R&D Spending (vs All Companies)")
print("="*60)
print(f"\n{result['answer']}")

## Step 4: Compare Specific Metrics

Use built-in metric comparisons for common financial metrics.

In [None]:
# Available metrics: 'revenue', 'profit', 'risk', 'assets', 'debt', 'cashflow'
metrics_to_compare = ['revenue', 'profit', 'risk', 'debt']

comparison_companies = ["APPLE INC", "MICROSOFT CORP"]

for metric in metrics_to_compare:
    print("\n" + "="*60)
    print(f"📊 METRIC COMPARISON: {metric.upper()}")
    print("="*60)
    
    result = analyzer.compare_metrics(
        user_company="My Company Inc",
        comparison_companies=comparison_companies,
        metric_type=metric
    )
    
    print(f"\n{result['answer']}")
    print("\n" + "-"*60)
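
How `compare_metrics` reacts to an unknown `metric_type` is not documented here, so it can help to check the list before starting the loop. The sketch below validates against the six metric names from the cell's comment; `validate_metrics` is a hypothetical helper, and lowercasing the input is an assumption about what the analyzer expects.

```python
AVAILABLE_METRICS = ("revenue", "profit", "risk", "assets", "debt", "cashflow")


def validate_metrics(requested):
    """Return normalized metric names, or raise if any is not in AVAILABLE_METRICS."""
    normalized = [metric.strip().lower() for metric in requested]
    unknown = [metric for metric in normalized if metric not in AVAILABLE_METRICS]
    if unknown:
        raise ValueError(
            f"Unknown metric(s): {unknown}. Choose from {list(AVAILABLE_METRICS)}"
        )
    return normalized
```

Calling `validate_metrics(metrics_to_compare)` before the loop surfaces a typo immediately instead of partway through a series of model calls.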

## Step 5: Custom Comparative Queries

Ask your own custom questions for comparison.

In [None]:
# Define your custom questions
custom_questions = [
    "How do the companies describe their market position and competitive landscape?",
    "What are the key investments and capital expenditures mentioned?",
    "How do the companies discuss their environmental and sustainability initiatives?",
    "What technology trends and innovations are highlighted by each company?",
    "How do the companies describe their customer base and market segments?"
]

user_company = "My Company Inc"  # ⬅️ Your company
comparison_companies = ["APPLE INC", "MICROSOFT CORP"]  # ⬅️ Companies to compare

for question in custom_questions:
    print("\n" + "="*60)
    print(f"❓ {question}")
    print("="*60)
    
    result = analyzer.ask_comparative_question(
        question=question,
        user_company=user_company,
        comparison_companies=comparison_companies,
        top_k=5
    )
    
    print(f"\n{result['answer']}")
    print("\n")
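
Printed answers are lost when the kernel restarts. A minimal sketch for collecting the answers and saving them as JSON follows. It relies only on the `answer` key used above; the helper names and the callable-based design are assumptions made so the example does not depend on a live `analyzer`.

```python
import json


def collect_answers(ask, questions, **kwargs):
    """Call ask(question=..., **kwargs) for each question and keep the answers."""
    records = []
    for question in questions:
        result = ask(question=question, **kwargs)
        records.append({"question": question, "answer": result["answer"]})
    return records


def save_answers(records, path):
    """Write the collected question/answer pairs to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2, ensure_ascii=False)
```

In this notebook, `ask` would be `analyzer.ask_comparative_question`, with `user_company`, `comparison_companies`, and `top_k` passed as keyword arguments.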

## Step 6: View Supporting Evidence

Examine the specific chunks retrieved for comparison.

In [None]:
# Ask a question and examine the evidence
result = analyzer.ask_comparative_question(
    question="What are the main sources of revenue for each company?",
    user_company="My Company Inc",
    comparison_companies=["APPLE INC"],
    top_k=3
)

print("\n" + "="*60)
print("📚 SUPPORTING EVIDENCE")
print("="*60)

# Show user company evidence
print(f"\n🏢 Evidence from {result['user_company']}:")
print("-"*60)
for i, (chunk, metadata) in enumerate(zip(
    result['user_context']['chunks'][:3],
    result['user_context']['metadatas'][:3]
)):
    print(f"\nChunk {i+1}:")
    print(f"Section: {metadata.get('section')}")
    print(f"Year: {metadata.get('year')}")
    print(f"Text: {chunk[:200]}...")

# Show comparison company evidence
for company, context in result['comparison_contexts'].items():
    print(f"\n\n🏢 Evidence from {company}:")
    print("-"*60)
    for i, (chunk, metadata) in enumerate(zip(
        context['chunks'][:3],
        context['metadatas'][:3]
    )):
        print(f"\nChunk {i+1}:")
        print(f"Section: {metadata.get('section')}")
        print(f"Year: {metadata.get('year')}")
        print(f"Text: {chunk[:200]}...")
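
The two loops above repeat the same formatting logic. One way to factor it out is sketched below, assuming the `chunks` and `metadatas` lists and the `section` and `year` metadata keys shown in the cell; `format_evidence` is a hypothetical helper.

```python
def format_evidence(chunks, metadatas, limit=3, preview_chars=200):
    """Format up to `limit` retrieved chunks as a printable block of text."""
    lines = []
    pairs = zip(chunks[:limit], metadatas[:limit])
    for number, (chunk, metadata) in enumerate(pairs, start=1):
        lines.append(f"Chunk {number}:")
        lines.append(f"Section: {metadata.get('section')}")
        lines.append(f"Year: {metadata.get('year')}")
        # Only add an ellipsis when the chunk was actually truncated.
        ellipsis = "..." if len(chunk) > preview_chars else ""
        lines.append(f"Text: {chunk[:preview_chars]}{ellipsis}")
    return "\n".join(lines)
```

With this helper, each evidence section reduces to a single call such as `print(format_evidence(context['chunks'], context['metadatas']))`.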

## Step 7: Delete User Upload (If Needed)

Remove a user-uploaded filing from the database.

In [None]:
# Delete a specific user upload by CIK
# Uncomment to execute:

# uploader.delete_user_filing(cik="USER_MYCOMPANY")
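
To confirm that a deletion took effect, the number of stored chunks for the identifier can be counted before and after the call. The sketch below uses ChromaDB's `Collection.get(where=...)` metadata filter and assumes that each chunk stores the identifier under a metadata key named `cik`, which this notebook does not confirm. `count_chunks_for_cik` is a hypothetical helper.

```python
def count_chunks_for_cik(collection, cik):
    """Count stored chunks whose metadata field 'cik' equals the given value."""
    # Assumption: the upload step writes the identifier to metadata["cik"].
    found = collection.get(where={"cik": cik})
    return len(found["ids"])
```

Calling `count_chunks_for_cik(rag.collection, "USER_MYCOMPANY")` should return 0 after a successful delete if that metadata assumption holds.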

## Advanced: Batch Upload Multiple Filings

In [None]:
# Upload multiple years of filings for the same company
filings = [
    {
        "file_path": "/path/to/2023_10k.pdf",
        "fiscal_year": "2023"
    },
    {
        "file_path": "/path/to/2022_10k.pdf",
        "fiscal_year": "2022"
    },
    {
        "file_path": "/path/to/2021_10k.pdf",
        "fiscal_year": "2021"
    }
]

company_info = {
    "company_name": "My Company Inc",
    "filing_type": "10-K",
    "section_name": "Full Filing",
    "cik": "USER_MYCOMPANY"
}

for filing in filings:
    try:
        result = uploader.upload_user_filing(
            file_path=filing["file_path"],
            fiscal_year=filing["fiscal_year"],
            **company_info
        )
        print(f"✅ Uploaded {filing['fiscal_year']}: {result['chunks_created']} chunks")
    except Exception as e:
        print(f"❌ Error uploading {filing['fiscal_year']}: {e}")
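
Because every entry in the batch points to a local file, a quick pre-flight check can separate missing paths from real upload errors. A minimal sketch, using the `file_path` key from the `filings` list above; `split_by_existence` is a hypothetical helper.

```python
from pathlib import Path


def split_by_existence(filings):
    """Partition filing dicts into (found, missing) by whether file_path exists."""
    found, missing = [], []
    for filing in filings:
        exists = Path(filing["file_path"]).expanduser().is_file()
        (found if exists else missing).append(filing)
    return found, missing
```

Running the upload loop over `found` only, and reporting `missing` separately, keeps path typos out of the error output.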

## Summary Statistics

In [None]:
# Get overall database statistics
total_docs = rag.collection.count()
user_uploads = uploader.get_user_uploaded_companies()

print("\n" + "="*60)
print("📊 DATABASE STATISTICS")
print("="*60)
print(f"\nTotal documents in database: {total_docs}")
print(f"User-uploaded companies: {len(user_uploads)}")
print("\nUser uploads:")
for company in user_uploads:
    print(f"  - {company['company_name']} ({company['fiscal_year']})")
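
If `get_user_uploaded_companies()` returns one entry per filing rather than one per company (the notebook does not say which), the same company appears several times after a batch upload. The sketch below groups the entries by name, using the `company_name` and `fiscal_year` keys printed above; `years_by_company` is a hypothetical helper.

```python
from collections import defaultdict


def years_by_company(uploads):
    """Map each company name to the sorted list of its distinct fiscal years."""
    grouped = defaultdict(set)
    for entry in uploads:
        grouped[entry["company_name"]].add(entry["fiscal_year"])
    return {name: sorted(years) for name, years in grouped.items()}
```

The length of the returned dictionary then gives the number of distinct companies regardless of how many filings each one has.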