# Extract Structured Insider Trades (Directors' Dealings) with AI

This notebook demonstrates how to convert unstructured insider trade filings (Directors' Dealings, or `DIRS`) into clean, structured JSON using the FinancialReports API and Google's Gemini Flash AI model.

### The Value Proposition

Clients often need structured data on insider transactions: who bought or sold, which instrument, how many units, and at what price. Parsing these fields by hand from raw text filings is slow and error-prone because filing formats vary.

This workflow automates that step. It uses the FinancialReports API to find the filings, then passes each filing's markdown to Gemini Flash together with a response schema, so the model returns a JSON object with a predictable structure. This saves clients significant development time.

In [None]:
import os
import json
import pandas as pd
import google.generativeai as genai
from google.generativeai import types
from financial_reports_generated_client import ApiClient, Configuration
from financial_reports_generated_client.api.filings_api import FilingsApi
from dotenv import load_dotenv
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Load environment variables from a .env file (if it exists)
load_dotenv()

# Load API keys from environment
FR_API_KEY = os.environ.get("FR_API_KEY")
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")

if not FR_API_KEY:
    raise ValueError("FR_API_KEY not found. Please set it as an environment variable.")

if not GEMINI_API_KEY:
    raise ValueError("GEMINI_API_KEY not found. Please set it as an environment variable.")

logger.info("API keys loaded successfully.")

In [None]:
# 1. Configure FinancialReports API Client
config = Configuration(host="https://api.financialreports.eu")
config.api_key['X-API-Key'] = FR_API_KEY
fr_api_client = ApiClient(config)
filings_api = FilingsApi(fr_api_client)

# 2. Configure Google Gemini Client
genai.configure(api_key=GEMINI_API_KEY)

logger.info("API clients configured.")

In [None]:
# --- User-Configurable Parameters ---

# Define the company you want to search for
COMPANY_ISIN = "DE000A1EWWW0"  # Example: adidas AG

# Define the start of the release-date range for the search
RELEASE_DATE_FROM = "2024-01-01T00:00:00Z"

# Define the Gemini model to use
MODEL_NAME = "models/gemini-flash-latest"

# --------------------------------------

logger.info(f"Parameters set: ISIN={COMPANY_ISIN}, DateFrom={RELEASE_DATE_FROM}")
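
A mistyped ISIN simply returns no filings, which is easy to confuse with "no insider trades". As an optional guard, the sketch below checks the ISIN's shape and its check digit (the Luhn algorithm applied after expanding letters to numbers) before any API call is made. The helper name `is_valid_isin` is our own and is not part of the FinancialReports client.

```python
import re

# Two letters (country code), nine alphanumerics, one numeric check digit.
ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def is_valid_isin(isin: str) -> bool:
    """Return True if `isin` has the expected shape and a correct check digit."""
    if not ISIN_PATTERN.fullmatch(isin):
        return False
    # Expand letters to numbers (A=10 ... Z=35); digits are kept as they are.
    expanded = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Luhn: walking from the right, double every second digit.
    for index, ch in enumerate(reversed(expanded)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(is_valid_isin("DE000A1EWWW0"))  # the ISIN configured in this notebook
```

In the notebook this could be called as `is_valid_isin(COMPANY_ISIN)` right after the parameters are set, raising an error if it returns `False`.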

## Step 1: Find DIRS Filings

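First, we use the `/filings/` endpoint via `FilingsApi` to find filings of type `DIRS` for our target company released on or after `RELEASE_DATE_FROM`. The request sets `page_size=10`, so only the first page of up to 10 filings is processed, even if the response reports a larger total `count`.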

In [None]:
logger.info(f"Searching for 'DIRS' filings for ISIN {COMPANY_ISIN}...")

try:
    filings_response = filings_api.filings_list(
        company_isin=COMPANY_ISIN,
        type="DIRS",
        release_datetime_from=RELEASE_DATE_FROM,
        page_size=10
    )

    if filings_response and filings_response.results:
        filings_to_process = filings_response.results
        logger.info(f"Found {filings_response.count} total filings. Processing first {len(filings_to_process)}.")
        
        print("--- Example Filing Found ---")
        print(f"ID: {filings_to_process[0].id}")
        print(f"Title: {filings_to_process[0].title}")
        print(f"Release Date: {filings_to_process[0].release_datetime}")
        print(f"Company: {filings_to_process[0].company.name}")
        print("----------------------------")
    else:
        filings_to_process = []
        logger.warning("No 'DIRS' filings found matching the criteria.")

except Exception as e:
    logger.error(f"Error fetching filings: {e}")
    filings_to_process = []
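
The cell above only looks at one page of results. If every matching filing is needed, the pages can be walked until `count` items have been seen. The sketch below shows that loop against a stand-in fetch function so it runs on its own; `iter_all_results` and `fake_fetch_page` are hypothetical names, and only the `count` and `results` attributes are taken from the response used above.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator


def iter_all_results(fetch_page: Callable[[int], Any], max_pages: int = 50) -> Iterator[Any]:
    """Yield results page by page until `count` items were seen or a page is empty."""
    seen = 0
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        results = list(getattr(response, "results", None) or [])
        if not results:
            return
        yield from results
        seen += len(results)
        total = getattr(response, "count", None)
        if total is not None and seen >= total:
            return


# Stand-in for the real API call: five fake filings served two per page.
def fake_fetch_page(page: int, page_size: int = 2) -> SimpleNamespace:
    items = [SimpleNamespace(id=i) for i in range(1, 6)]
    start = (page - 1) * page_size
    return SimpleNamespace(count=len(items), results=items[start:start + page_size])


all_ids = [filing.id for filing in iter_all_results(fake_fetch_page)]
print(all_ids)
```

With the real client, `fetch_page` would wrap `filings_api.filings_list(...)` and pass the page number along. That assumes the generated client accepts a `page` argument, which this notebook does not confirm, so check the client's signature first.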

## Step 2: Define the Structured Output Schema

We define a `Schema` object that tells the Gemini model exactly what data to extract and what the final JSON structure must look like.

In [None]:
# In the google-generativeai SDK, Schema and Type are exposed under genai.protos
# (google.generativeai.types does not export them).
Schema = genai.protos.Schema
Type = genai.protos.Type

transaction_schema = Schema(
    type=Type.OBJECT,
    properties={
        "transaction_date": Schema(type=Type.STRING, description="The date of the transaction (YYYY-MM-DD)"),
        "financial_instrument": Schema(type=Type.STRING, description="The financial instrument, e.g., 'Shares' or 'Stock Options'"),
        "nature_of_transaction": Schema(type=Type.STRING, description="The nature of the transaction, e.g., 'Acquisition', 'Disposal' or 'Purchase'"),
        "price": Schema(type=Type.NUMBER, description="The price per unit of the instrument"),
        "currency": Schema(type=Type.STRING, description="The currency of the transaction (e.g., 'EUR', 'GBP')"),
        "volume": Schema(type=Type.NUMBER, description="The number of units transacted"),
        "total_value": Schema(type=Type.NUMBER, description="The total value of the transaction (Price * Volume)"),
        "venue": Schema(type=Type.STRING, description="The trading venue, e.g., 'XETRA', 'Outside a trading venue'")
    }
)

reporting_person_schema = Schema(
    type=Type.OBJECT,
    properties={
        "name": Schema(type=Type.STRING, description="The name of the reporting person (e.g., 'John Doe')"),
        "position": Schema(type=Type.STRING, description="The position of the person, e.g., 'CEO', 'Member of the Supervisory Board'")
    }
)

dirs_schema = Schema(
    type=Type.OBJECT,
    properties={
        "issuer_name": Schema(type=Type.STRING, description="The name of the company issuing the securities"),
        "issuer_isin": Schema(type=Type.STRING, description="The ISIN of the issuing company"),
        "reporting_person_details": reporting_person_schema,
        "transactions": Schema(
            type=Type.ARRAY,
            description="A list of all transactions reported in this filing",
            items=transaction_schema
        )
    }
)

logger.info("Structured output schema defined successfully.")
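
A response schema constrains the shape of the output, not its correctness: the model can still return a malformed date or a `total_value` that does not equal price times volume. The sketch below is one possible sanity check on a single extracted transaction. The function name `check_transaction` and the 1% tolerance are our own choices, and the sample values are made up for illustration.

```python
import math
from datetime import date


def check_transaction(tx: dict, rel_tol: float = 0.01) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []

    # The schema asks for an ISO date string such as '2024-03-15'.
    try:
        date.fromisoformat(str(tx.get("transaction_date")))
    except ValueError:
        problems.append("transaction_date is not a valid ISO date")

    price, volume, total = tx.get("price"), tx.get("volume"), tx.get("total_value")
    numeric = all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in (price, volume, total)
    )
    if not numeric:
        problems.append("price, volume and total_value must all be numbers")
    elif not math.isclose(price * volume, total, rel_tol=rel_tol):
        problems.append("total_value does not match price * volume")

    return problems


# Made-up example values, for illustration only.
good_tx = {"transaction_date": "2024-03-15", "price": 185.5, "volume": 100, "total_value": 18550.0}
bad_tx = {"transaction_date": "15.03.2024", "price": 185.5, "volume": 100, "total_value": 1855.0}

print(check_transaction(good_tx))
print(check_transaction(bad_tx))
```

Transactions that fail the check are better flagged for review than dropped, since some filings report aggregated or rounded totals.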

## Step 3: Create AI Extraction Function

This helper function fetches the markdown content for a given filing ID and uses Gemini to extract structured JSON data.

In [None]:
def extract_structured_data(filing_id: int) -> dict | None:
    """
    Fetches markdown for a filing and uses Gemini to extract structured data.
    """
    logger.info(f"Processing filing_id: {filing_id}...")
    
    try:
        markdown_content = filings_api.filings_markdown_retrieve(id=filing_id)
        if not markdown_content:
            logger.warning(f"No markdown content found for filing_id: {filing_id}")
            return None
    except Exception as e:
        logger.error(f"Error fetching markdown for filing_id {filing_id}: {e}")
        return None

    prompt = f"""
    Extract the directors' dealing (insider trade) information from the following financial filing.
    Provide all monetary values as numbers, not strings.
    Ensure the 'transaction_date' is in 'YYYY-MM-DD' format.
    If multiple transactions are listed, include all of them in the 'transactions' list.

    Filing Content:
    ---
    {markdown_content}
    ---
    """
    
    # generate_content accepts a plain string prompt in the google-generativeai SDK
    contents = prompt

    model = genai.GenerativeModel(
        model_name=MODEL_NAME,
        generation_config=types.GenerationConfig(
            response_mime_type="application/json",
            response_schema=dirs_schema,
        )
    )

    try:
        response = model.generate_content(contents)
        json_data = json.loads(response.text)
        logger.info(f"Successfully extracted data for filing_id: {filing_id}")
        return json_data
        
    except Exception as e:
        logger.error(f"Error generating content for filing_id {filing_id}: {e}")
        return None
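
Because `extract_structured_data` returns `None` on any failure, a transient error such as a rate limit or timeout silently drops a filing. One way to soften that is to retry a few times with exponential backoff. The wrapper below is a minimal sketch with a hypothetical name, `retry_on_none`. It is demonstrated on a fake flaky function, with a recording stand-in for `time.sleep` so the example runs instantly.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_on_none(
    func: Callable[[], Optional[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `func` until it returns something other than None, or attempts run out."""
    for attempt in range(attempts):
        result = func()
        if result is not None:
            return result
        if attempt < attempts - 1:
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** attempt)
    return None


# Demo: a fake extractor that fails twice, then succeeds.
calls = {"n": 0}


def flaky_extract() -> Optional[dict]:
    calls["n"] += 1
    return {"issuer_name": "Example AG"} if calls["n"] >= 3 else None


delays = []  # records the waits instead of actually sleeping
result = retry_on_none(flaky_extract, attempts=4, sleep=delays.append)
print(result, delays)
```

In the notebook, the call would look like `retry_on_none(lambda: extract_structured_data(filing.id))`. Note that this also retries permanent failures, such as a filing with no markdown, so keep `attempts` small.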

## Step 4: Execute Workflow & Aggregate Results

We loop through the filings found in Step 1, call the extraction function, and aggregate the results.

In [None]:
all_structured_data = []
processed_count = 0

if not filings_to_process:
    logger.warning("No filings to process. Skipping extraction loop.")
else:
    for filing in filings_to_process:
        if filing.id is None:
            continue
            
        data = extract_structured_data(filing.id)
        
        if data:
            data['filing_id'] = filing.id
            data['filing_title'] = filing.title
            data['filing_release_datetime'] = str(filing.release_datetime)
            all_structured_data.append(data)
            processed_count += 1

logger.info("--- Workflow Complete ---")
logger.info(f"Successfully processed {processed_count} out of {len(filings_to_process)} filings.")

if all_structured_data:
    print("\n--- Raw JSON Output (First Result) ---")
    print(json.dumps(all_structured_data[0], indent=2))
else:
    print("\nNo structured data was extracted.")
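
Each extraction costs an API call and a model call, so it can be worth saving the aggregated results before analysing them. The sketch below writes records to a JSON Lines file and reads them back. The helper names and the file name are hypothetical, and the sample records are made up; in the notebook, `all_structured_data` would take their place.

```python
import json
import tempfile
from pathlib import Path


def save_jsonl(records: list[dict], path: Path) -> None:
    """Write one JSON object per line (overwrites any existing file)."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path: Path) -> list[dict]:
    """Read the records back, skipping blank lines."""
    with path.open("r", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Made-up records standing in for all_structured_data.
sample_records = [
    {"filing_id": 1, "issuer_name": "Example AG", "transactions": [{"price": 10.0, "volume": 5}]},
    {"filing_id": 2, "issuer_name": "Beispiel SE", "transactions": []},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = Path(tmp_dir) / "dirs_extractions.jsonl"
    save_jsonl(sample_records, cache_path)
    restored = load_jsonl(cache_path)

print(restored == sample_records)
```

One record per line keeps the file easy to inspect and lets a later run skip filing IDs that are already present.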

## Step 5: Analyze and Flatten Data with Pandas

We use `pandas.json_normalize` to flatten the nested JSON into a clean, analysis-ready table.

In [None]:
if not all_structured_data:
    logger.warning("No data to flatten. Skipping pandas DataFrame creation.")
else:
    try:
        meta_keys = [
            'filing_id',
            'filing_title',
            'filing_release_datetime',
            'issuer_name',
            'issuer_isin',
            ['reporting_person_details', 'name'],
            ['reporting_person_details', 'position']
        ]

        df_transactions = pd.json_normalize(
            all_structured_data, 
            record_path=['transactions'], 
            meta=meta_keys,
            errors='ignore'
        )

        df_transactions.rename(columns={
            'reporting_person_details.name': 'person_name',
            'reporting_person_details.position': 'person_position'
        }, inplace=True)

        print("\n--- Flattened DataFrame of All Transactions ---")
        display(df_transactions)

    except Exception as e:
        logger.error(f"Error flattening data with pandas: {e}")
        print(all_structured_data)
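
To make the flattening step less of a black box, the sketch below does by hand roughly what `json_normalize` does with `record_path` and `meta` above: every transaction becomes one row, and the filing-level fields are copied onto each row. The function name `flatten_filing` is hypothetical and the sample record is made up for illustration.

```python
def flatten_filing(record: dict) -> list[dict]:
    """Return one flat dict per transaction, with filing-level fields repeated on each."""
    person = record.get("reporting_person_details") or {}
    shared = {
        "filing_id": record.get("filing_id"),
        "issuer_name": record.get("issuer_name"),
        "issuer_isin": record.get("issuer_isin"),
        "person_name": person.get("name"),
        "person_position": person.get("position"),
    }
    # A filing without transactions contributes no rows.
    return [{**tx, **shared} for tx in record.get("transactions") or []]


# Made-up record in the shape produced by the schema in Step 2.
sample_record = {
    "filing_id": 42,
    "issuer_name": "Example AG",
    "issuer_isin": "DE000A1EWWW0",
    "reporting_person_details": {"name": "Jane Doe", "position": "CFO"},
    "transactions": [
        {"transaction_date": "2024-03-15", "price": 185.5, "volume": 100},
        {"transaction_date": "2024-03-18", "price": 187.0, "volume": 50},
    ],
}

rows = flatten_filing(sample_record)
for row in rows:
    print(row)
```

Unlike this sketch, `json_normalize` raises a `KeyError` when a record has no `transactions` key at all. The `try`/`except` in the cell above catches that error, but the whole DataFrame is then skipped.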

## Conclusion

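This pipeline turns unstructured Directors' Dealings filings into structured, analysis-ready records. The FinancialReports API locates the filings and supplies their markdown, Gemini extracts the fields defined by the schema, and pandas flattens the result into one row per transaction.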