# Financial Datasets Exploration
## Hugging Face Public Datasets for Credit Analysis

This notebook explores public financial datasets on the Hugging Face Hub that can support:
- Training and validation of the credit memo generator
- Testing document extraction capabilities
- Benchmarking financial analysis accuracy

In [None]:
# Install required packages into the active kernel's environment
%pip install datasets pandas matplotlib

In [None]:
from datasets import load_dataset
import pandas as pd
import matplotlib.pyplot as plt

## 1. AdaptLLM/finance-tasks
Comprehensive financial tasks dataset

In [None]:
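# Load AdaptLLM finance tasks dataset
from datasets import get_dataset_config_names

try:
    # The repository may expose several task configs, so list them and load
    # the first one instead of relying on a default
    configs = get_dataset_config_names("AdaptLLM/finance-tasks")
    print(f"Available configs: {configs}")
    finance_tasks = load_dataset("AdaptLLM/finance-tasks", configs[0])
    print("Dataset loaded successfully!")
    print(f"Available splits: {list(finance_tasks.keys())}")
    # Split names vary between datasets, so take the first one rather than assuming 'train'
    first_split = next(iter(finance_tasks))
    print(f"\nSample record from '{first_split}':")
    print(finance_tasks[first_split][0])
except Exception as e:
    print(f"Error loading dataset: {e}")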

## 2. gbharti/finance-alpaca
Financial instruction-response pairs

In [None]:
# Load finance-alpaca dataset
try:
    finance_alpaca = load_dataset("gbharti/finance-alpaca")
    print("Dataset loaded successfully!")
    print(f"Available splits: {list(finance_alpaca.keys())}")
    print(f"Total records: {len(finance_alpaca['train'])}")
    print("\nSample instruction-output pair:")
    sample = finance_alpaca['train'][0]
    print(f"Instruction: {sample.get('instruction', 'N/A')}")
    print(f"Output: {sample.get('output', 'N/A')[:200]}...")
except Exception as e:
    print(f"Error loading dataset: {e}")
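For fine-tuning or validation, instruction-response records are usually flattened into a single prompt string. The sketch below shows one way to do that. `format_record` and the prompt template are hypothetical, and the optional `input` field is an assumption about the record schema (the cell above only reads `instruction` and `output`).

```python
def format_record(record: dict) -> str:
    """Flatten one instruction-response record into a single training string.

    Hypothetical helper: the template is an illustration, not a format
    required by the dataset or by any particular model.
    """
    instruction = (record.get("instruction") or "").strip()
    context = (record.get("input") or "").strip()  # assumed optional field
    output = (record.get("output") or "").strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:
        # Only emit the input section when the record actually has context
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{output}")
    return "\n\n".join(parts)


# Made-up record for illustration only
example = {
    "instruction": "Define EBITDA.",
    "input": "",
    "output": "Earnings before interest, taxes, depreciation, and amortization.",
}
formatted_example = format_record(example)
print(formatted_example)
```

The same function could be applied to each record of the loaded split, for example with a list comprehension over `finance_alpaca['train']`.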

## 3. PatronusAI/financebench
Financial question answering benchmark

In [None]:
# Load FinanceBench dataset
try:
    financebench = load_dataset("PatronusAI/financebench")
    print("Dataset loaded successfully!")
    print(f"Available splits: {list(financebench.keys())}")
    print("\nSample financial question:")
    sample = financebench['train'][0]
    print(f"Question: {sample.get('question', 'N/A')}")
    print(f"Answer: {sample.get('answer', 'N/A')}")
except Exception as e:
    print(f"Error loading dataset: {e}")
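Question-answer records like these can serve as a benchmark by comparing generated answers against the reference answers. The following is a minimal sketch of an exact-match scorer. The function names, the normalization rules, and the sample answers are all assumptions made for illustration, not part of FinanceBench itself.

```python
import re


def normalize_answer(text: str) -> str:
    """Lowercase, drop '$' and ',' characters, and collapse whitespace."""
    text = text.lower().replace("$", "").replace(",", "")
    return re.sub(r"\s+", " ", text).strip()


def exact_match_accuracy(predictions: list, references: list) -> float:
    """Return the share of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_answer(pred) == normalize_answer(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


# Made-up answers for illustration only
predictions = ["$1,577.00", "Yes", "8.7%"]
references = ["1577.00", "yes", "9.1%"]
accuracy = exact_match_accuracy(predictions, references)
print(f"Exact-match accuracy: {accuracy:.2f}")
```

Exact match is strict for free-text answers, so it is best read as a lower bound. Long narrative answers would likely need a more tolerant comparison.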

## 4. JanosAudran/financial-reports-sec
SEC financial reports dataset

In [None]:
# Load SEC financial reports
try:
    sec_reports = load_dataset("JanosAudran/financial-reports-sec")
    print("Dataset loaded successfully!")
    print(f"Available splits: {list(sec_reports.keys())}")
    print("\nSample SEC report:")
    sample = sec_reports['train'][0]
    for key, value in sample.items():
        print(f"{key}: {str(value)[:100]}...")
except Exception as e:
    print(f"Error loading dataset: {e}")

## Data Analysis Example
### Compute and plot credit ratios on synthetic sample data

In [None]:
# Example: Create sample financial data for testing
sample_financial_data = {
    'company': ['Company A', 'Company B', 'Company C', 'Company D', 'Company E'],
    'revenue': [5000000, 8000000, 3000000, 12000000, 6500000],
    'net_income': [400000, 800000, 150000, 1200000, 500000],
    'total_debt': [2000000, 4000000, 1500000, 6000000, 2500000],
    'total_assets': [5000000, 10000000, 4000000, 15000000, 7000000],
    'ebitda': [600000, 1000000, 300000, 1500000, 700000]
}

df = pd.DataFrame(sample_financial_data)

# Calculate key credit ratios
df['net_income_margin'] = (df['net_income'] / df['revenue'] * 100).round(2)  # percent of revenue
df['debt_to_ebitda'] = (df['total_debt'] / df['ebitda']).round(2)  # multiple of EBITDA
df['leverage_ratio'] = (df['total_debt'] / df['total_assets']).round(2)  # debt-to-assets, as a fraction

print("Sample Financial Analysis:")
print(df)

# Visualize ratios
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

df.plot(x='company', y='net_income_margin', kind='bar', ax=axes[0], legend=False, color='#3498db')
axes[0].set_title('Net Income Margin (%)')
axes[0].set_ylabel('Percentage')

df.plot(x='company', y='debt_to_ebitda', kind='bar', ax=axes[1], legend=False, color='#e74c3c')
axes[1].set_title('Debt to EBITDA')
axes[1].set_ylabel('Ratio')

df.plot(x='company', y='leverage_ratio', kind='bar', ax=axes[2], legend=False, color='#27ae60')
axes[2].set_title('Leverage Ratio (Debt / Assets)')
axes[2].set_ylabel('Ratio')

plt.tight_layout()
plt.show()

## Using Datasets for Testing

These datasets can be used to:
1. **Test document extraction**: Use actual financial documents to test the LandingAI ADE API
2. **Validate calculations**: Compare calculated ratios against known values
3. **Train/fine-tune**: Use the instruction-response and Q&A records for additional training if needed
4. **Benchmark performance**: Measure accuracy of extraction and analysis
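Validating calculations against known values can be sketched as a tolerance check. The example below recomputes the three ratios for Company A from the sample data and compares them with hand-computed reference values. `compute_ratios`, `find_mismatches`, and the 0.01 tolerance are hypothetical choices for illustration.

```python
import math


def compute_ratios(revenue, net_income, total_debt, total_assets, ebitda):
    """Compute the same three ratios as the analysis cell, without rounding."""
    return {
        "net_income_margin": net_income / revenue * 100,  # percent
        "debt_to_ebitda": total_debt / ebitda,
        "leverage_ratio": total_debt / total_assets,
    }


def find_mismatches(calculated, expected, abs_tol=0.01):
    """Return {ratio: (calculated, expected)} for missing or out-of-tolerance ratios."""
    return {
        name: (calculated.get(name), value)
        for name, value in expected.items()
        if name not in calculated
        or not math.isclose(calculated[name], value, abs_tol=abs_tol)
    }


# Company A figures from the sample data
calculated = compute_ratios(
    revenue=5_000_000,
    net_income=400_000,
    total_debt=2_000_000,
    total_assets=5_000_000,
    ebitda=600_000,
)
# Reference values worked out by hand for this example
expected = {"net_income_margin": 8.0, "debt_to_ebitda": 3.33, "leverage_ratio": 0.40}

mismatches = find_mismatches(calculated, expected)
print("All ratios within tolerance" if not mismatches else f"Mismatches: {mismatches}")
```

An absolute tolerance suits ratios rounded to two decimals. For raw currency amounts a relative tolerance is probably the better fit.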

## Next Steps

1. Download sample documents from these datasets
2. Process them through the credit memo generator
3. Validate the accuracy of:
   - Document extraction
   - Financial ratio calculations
   - Credit memo narratives
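The extraction check in step 3 could start as a field-by-field comparison against hand-labeled ground truth. This is only a sketch: `field_accuracy` is a hypothetical helper, and the `extracted` values are made up rather than real output from the extraction API.

```python
import math


def field_accuracy(extracted: dict, ground_truth: dict, rel_tol: float = 0.001) -> dict:
    """Return a pass/fail flag per ground-truth field.

    Numeric fields match within a relative tolerance; other fields must be equal.
    A field missing from `extracted` counts as a failure.
    """
    results = {}
    for field, truth in ground_truth.items():
        value = extracted.get(field)
        if isinstance(truth, (int, float)) and isinstance(value, (int, float)):
            results[field] = math.isclose(value, truth, rel_tol=rel_tol)
        else:
            results[field] = value == truth
    return results


# Hand-labeled values for one document (taken from the sample data for Company A)
ground_truth = {
    "company": "Company A",
    "revenue": 5_000_000,
    "net_income": 400_000,
    "total_debt": 2_000_000,
}
# Made-up extraction result: one wrong digit count and one missing field
extracted = {"company": "Company A", "revenue": 5_000_000, "net_income": 40_000}

results = field_accuracy(extracted, ground_truth)
accuracy = sum(results.values()) / len(results)
print(results)
print(f"Field-level accuracy: {accuracy:.2f}")
```

Reporting the per-field flags alongside the overall score makes it easier to see whether errors cluster in particular fields.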