# FIRE Data Transformation and Ingestion Patterns Analysis

This notebook analyzes the example files from the FIRE (Financial Regulatory) data model. We'll explore:

1. Example JSON files and their structure
2. Common data structures and patterns
3. Data transformations and validations
4. Ingestion patterns and best practices

First, let's import the necessary libraries and locate the example files.

## Setup

In [None]:
import json
import os
import glob
from pathlib import Path
import pandas as pd
import numpy as np
from pprint import pprint

# Configure pandas display
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Define paths
FIRE_EXAMPLES_PATH = '/tmp/fire-repo/examples'
examples = glob.glob(os.path.join(FIRE_EXAMPLES_PATH, '*.json'))

## Analysis of FIRE Examples

FIRE provides example JSON files demonstrating various financial instruments and their representation in the data model. Let's analyze these examples to understand:

1. Common data structures
2. Field patterns and relationships
3. Data validation rules
4. Best practices for ingestion

First, let's load and examine some examples:

In [None]:
import glob
import json
import os
from pprint import pprint

# List all example JSON files (sorted for a reproducible order)
example_files = sorted(glob.glob(os.path.join(FIRE_EXAMPLES_PATH, '*.json')))
print(f"Found {len(example_files)} example files")

# Create a dictionary to store example data by type
examples_by_type = {}

# Load each example file and categorize by type
for file_path in example_files:
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    # Top-level JSON may be a list; only objects can carry a 'type' key
    type_name = data.get('type', 'unknown') if isinstance(data, dict) else 'unknown'
    examples_by_type.setdefault(type_name, []).append({
        'file': os.path.basename(file_path),
        'data': data
    })

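# Print summary of types
# The loop variable is named type_examples so it does not overwrite `examples`,
# the list of file paths defined in Setup and used again during ingestion
print("\nTypes of financial instruments found:")
for type_name, type_examples in examples_by_type.items():
    print(f"{type_name}: {len(type_examples)} examples")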

# Load all example files, keyed by file name without the extension
example_data = {}
for example_file in example_files:
    with open(example_file, 'r') as f:
        name = os.path.splitext(os.path.basename(example_file))[0]
        example_data[name] = json.load(f)

# Analyze common fields
def extract_fields(data, prefix=''):
    """Return the dotted paths of all scalar leaves in a nested structure."""
    fields = []
    if isinstance(data, dict):
        for k, v in data.items():
            full_key = f"{prefix}.{k}" if prefix else k
            fields.extend(extract_fields(v, full_key))
    elif isinstance(data, list):
        for item in data:
            fields.extend(extract_fields(item, prefix))
    else:
        # Scalar leaf, including items of a list of scalars
        fields.append(prefix)
    return fields

# Get common fields across all examples
all_fields = {}
for name, data in example_data.items():
    all_fields[name] = set(extract_fields(data))

# Find fields that appear in every example (strict intersection)
common_fields = set.intersection(*all_fields.values()) if all_fields else set()
print("Common fields across all examples:")
pprint(sorted(common_fields))
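
The intersection above is strict: a field missing from a single example drops out of the result. To see which fields appear in *most* examples, a coverage count may be more informative. The following is a minimal sketch that uses a small hypothetical `sample_fields` mapping in place of `all_fields` (the example names and field names are invented for illustration) and an arbitrary 60% threshold:

```python
from collections import Counter

# Hypothetical stand-in for all_fields: example name -> set of dotted field paths
sample_fields = {
    "loan_a": {"id", "date", "balance", "currency_code"},
    "loan_b": {"id", "date", "balance"},
    "account_a": {"id", "date", "currency_code"},
}

def field_coverage(fields_by_example):
    """Return {field: share of examples containing it}, as a float in [0, 1]."""
    total = len(fields_by_example)
    if total == 0:
        return {}
    counts = Counter(f for fields in fields_by_example.values() for f in fields)
    return {field: n / total for field, n in counts.items()}

coverage = field_coverage(sample_fields)
# Keep fields present in at least 60% of examples (the threshold is an arbitrary choice)
frequent = sorted(f for f, share in coverage.items() if share >= 0.6)
print(frequent)
```

Passing `all_fields` instead of `sample_fields` would apply the same count to the loaded examples.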

## Data Transformation Patterns

From analyzing the examples, we can identify several key data transformation patterns:

1. Nested structure flattening
2. Date/time handling
3. Currency conversions
4. Calculated fields
5. Reference data lookups

Let's implement some common transformations:

In [None]:
# Example transformations for FIRE data

def flatten_fire_record(record):
    """Flatten nested FIRE JSON structures"""
    flat_data = {}
    
    def flatten(data, prefix=''):
        if isinstance(data, dict):
            for k, v in data.items():
                flatten(v, f"{prefix}.{k}" if prefix else k)
        elif isinstance(data, list):
            for i, item in enumerate(data):
                flatten(item, f"{prefix}[{i}]")
        else:
            # Scalar leaf, including items of a list of scalars
            flat_data[prefix] = data
    
    flatten(record)
    return flat_data

# Example transformation pipeline
def transform_fire_record(record):
    # 1. Flatten structure
    flat = flatten_fire_record(record)
    
    # 2. Convert dates to datetime
    for k, v in list(flat.items()):
        if isinstance(v, str) and ('date' in k.lower() or 'time' in k.lower()):
            try:
                flat[k] = pd.to_datetime(v)
            except (ValueError, TypeError):
                # Leave values that do not parse as dates unchanged
                pass
    
    # 3. Currency normalization (example)
    amount_fields = [k for k in flat if 'amount' in k.lower()]
    for field in amount_fields:
        if isinstance(flat.get(field), (int, float)):
            # Sibling currency key: 'a.amount' -> 'a.currency', 'amount' -> 'currency'
            parent, sep, _ = field.rpartition('.')
            currency = flat.get(f"{parent}{sep}currency", 'USD')
            # Here you would apply currency conversion as needed
            
    return flat

# Apply transformations to an example
example_name = next(iter(example_data.keys()))
transformed = transform_fire_record(example_data[example_name])
print(f"Transformed fields from {example_name}:")
pprint({k: v for k, v in transformed.items() if pd.notnull(v)})
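
Step 3 of the pipeline leaves the conversion itself as a placeholder. One way it might look is sketched below, assuming a static rate table (`RATES_TO_USD`, with made-up rates) and assuming each amount has a sibling `currency` key under the same prefix. Both the helper name `normalize_amounts` and the `_usd` suffix are hypothetical choices for illustration, not part of FIRE.

```python
from decimal import Decimal

# Hypothetical rates for illustration only; real rates would come from a reference data source
RATES_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.10"), "GBP": Decimal("1.25")}

def normalize_amounts(flat, rates=RATES_TO_USD, default_currency="USD"):
    """Add '<field>_usd' keys next to every numeric '*amount*' field of a flat record."""
    out = dict(flat)  # leave the input untouched
    for key, value in flat.items():
        if "amount" not in key.lower():
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Sibling currency lives under the same prefix: 'leg.amount' -> 'leg.currency'
        prefix, sep, _ = key.rpartition(".")
        currency = flat.get(f"{prefix}{sep}currency", default_currency)
        rate = rates.get(currency)
        if rate is None:
            continue  # unknown currency: skip rather than guess
        # Go through str() so floats do not carry binary rounding noise into Decimal
        out[f"{key}_usd"] = Decimal(str(value)) * rate
    return out

flat_example = {"leg.amount": 100, "leg.currency": "EUR", "fee.amount": 5.5}
converted = normalize_amounts(flat_example)
print(converted)
```

Using `Decimal` rather than `float` keeps monetary arithmetic exact, and the original amount and currency stay in the record alongside the converted value.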

## Data Ingestion Patterns

FIRE data can be ingested in several ways:

1. Direct JSON ingestion
2. CSV/Tabular format (after flattening)
3. Database loading
4. Streaming ingestion

Let's implement some common ingestion patterns:

In [None]:
# Example ingestion patterns

# 1. Batch JSON ingestion
def ingest_fire_json_batch(file_paths):
    records = []
    for path in file_paths:
        with open(path, 'r') as f:
            record = json.load(f)
            transformed = transform_fire_record(record)
            records.append(transformed)
    return pd.DataFrame.from_records(records)

# Create DataFrame from all examples
# (pass the list of file paths, not a name that may have been rebound earlier)
df = ingest_fire_json_batch(example_files)
print("\nDataFrame shape:", df.shape)
print("\nColumns:", sorted(df.columns))

# 2. Example CSV export (for systems that prefer tabular format)
csv_path = 'fire_data.csv'
df.to_csv(csv_path, index=False)
print(f"\nExported to {csv_path}")

# 3. Database ingestion example (using SQLite for demonstration)
import hashlib
import sqlite3
from datetime import datetime

def create_fire_table(conn):
    """Create a table with common FIRE fields"""
    conn.execute('''
    CREATE TABLE IF NOT EXISTS fire_data (
        id TEXT PRIMARY KEY,
        type TEXT,
        date_created TIMESTAMP,
        currency TEXT,
        amount REAL,
        raw_data JSON
    )
    ''')

def ingest_to_db(records, db_path=':memory:'):
    """Ingest FIRE records to SQLite database"""
    conn = sqlite3.connect(db_path)
    create_fire_table(conn)

    # OR REPLACE keeps re-ingestion of the same id from raising IntegrityError
    sql = '''
    INSERT OR REPLACE INTO fire_data
        (id, type, date_created, currency, amount, raw_data)
    VALUES (?, ?, ?, ?, ?, ?)
    '''
    for record in records:
        # Fall back to a content hash when no id is present; unlike the
        # built-in hash(), SHA-256 is stable across interpreter runs
        canonical = json.dumps(record, sort_keys=True)
        record_id = record.get('id') or hashlib.sha256(canonical.encode('utf-8')).hexdigest()
        conn.execute(sql, (
            str(record_id),
            record.get('type'),
            datetime.now().isoformat(),  # ingestion time, not a field of the record
            record.get('currency'),
            record.get('amount'),
            json.dumps(record),
        ))

    conn.commit()
    return conn

# Example database ingestion
conn = ingest_to_db(example_data.values())
cursor = conn.execute('SELECT * FROM fire_data LIMIT 5')
print("\nSample database records:")
for row in cursor:
    print(row)
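
Streaming ingestion is listed above but not covered by this cell. A minimal sketch follows, assuming records arrive as JSON Lines (one JSON object per line). That format is an assumption made for illustration, not something FIRE requires, and `iter_json_lines` and `batched` are hypothetical helper names.

```python
import io
import json
from itertools import islice

def iter_json_lines(stream):
    """Yield one parsed record per non-empty line; skip lines that are not valid JSON."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"line {line_no}: skipped ({exc.msg})")

def batched(records, size):
    """Group an iterator of records into lists of at most `size` items."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# In-memory stand-in for a file or socket; the records are invented for illustration
stream = io.StringIO('{"id": "1"}\n{"id": "2"}\nnot json\n\n{"id": "3"}\n')
batches = list(batched(iter_json_lines(stream), size=2))
print(batches)
```

Because both helpers are generators, only one batch is held in memory at a time. Each batch could then be handed to `ingest_to_db`, which accepts any iterable of records.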

## Summary and Best Practices

From our analysis of FIRE examples, we can conclude:

1. Data Structure Patterns:
   - Hierarchical JSON structure with consistent top-level fields
   - Fields shared by every record are listed in `common_fields`; the SQLite example assumes `id`, `type`, `currency`, and `amount`
   - Nested objects for complex financial instruments
   - Arrays for multiple related items

2. Data Transformation Best Practices:
   - Flatten nested structures for easier processing
   - Convert dates to proper datetime objects
   - Normalize currencies and amounts
   - Preserve raw data alongside transformed data
   - Validate against JSON schemas

3. Data Ingestion Recommendations:
   - Use batch processing for large datasets
   - Maintain both raw and transformed data
   - Index key fields for efficient querying
   - Implement proper error handling and validation
   - Consider both SQL and NoSQL storage options

4. Implementation Tips:
   - Use pandas for data manipulation
   - Implement proper data validation
   - Handle currency conversions carefully
   - Maintain data lineage
   - Log all transformations
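
The validation, error-handling, and logging points can be sketched without extra dependencies. The example below checks a few hand-written rules and is not a substitute for validating against JSON schemas as recommended above. `REQUIRED_FIELDS`, `validate_record`, and `split_valid` are hypothetical names, and the rules themselves are assumptions for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("fire_ingest")

# Hypothetical rules for illustration; not taken from the FIRE schemas
REQUIRED_FIELDS = {"id": str, "type": str}

def validate_record(record):
    """Return a list of human-readable problems; an empty list means the record passed."""
    if not isinstance(record, dict):
        return [f"expected an object, got {type(record).__name__}"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing required field '{field}'")
        elif not isinstance(record[field], expected_type):
            problems.append(f"field '{field}' should be {expected_type.__name__}")
    return problems

def split_valid(records):
    """Separate records into (valid, rejected), logging each rejection."""
    valid, rejected = [], []
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            logger.warning("record %d rejected: %s", index, "; ".join(problems))
            rejected.append((record, problems))
        else:
            valid.append(record)
    return valid, rejected

valid, rejected = split_valid([{"id": "a1", "type": "loan"}, {"id": 7}, ["not", "a", "dict"]])
print(len(valid), len(rejected))
```

Keeping rejected records together with their reasons, rather than silently dropping them, supports the data lineage point: every record that enters the pipeline is either ingested or accounted for.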