# Deeptech M&A Momentum: Time-Series Aggregation

## Phase 3, Step 3.1: Sector-level deal flow time series

This notebook takes the deeptech M&A deals classified in Phase 2 and aggregates total transaction value (USD) and deal counts into continuous time series for each sector at each tested frequency: monthly, quarterly, and semi-annual (`1mo`, `3mo`, `6mo`). The aggregated series form the base for our momentum signal.

---

In [18]:
# Imports
from pathlib import Path

import polars as pl

In [19]:
# File paths
CLASSIFIED_DATA_PATH = Path("../../data/processed/2.2_classified_deals.parquet")

# List of frequencies to test
TEST_FREQUENCIES = ["1mo", "3mo", "6mo"]
print(f"Frequencies to Aggregate: {TEST_FREQUENCIES}")

# --- Utility Function: Ensure Continuity ---
def ensure_continuity(df_volume: pl.DataFrame, freq: str) -> pl.DataFrame:
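    """
    Turn the aggregated Polars DataFrame into a continuous panel: every sector
    gets a row for every period present in `df_volume`, with missing
    (zero-volume) periods filled with 0.

    Note: a period in which no sector recorded a deal does not appear in
    `df_volume` at all, so this function does not add it.
    """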
    
    # Identify all unique sectors
    all_sectors = df_volume.get_column("deeptech_sector").unique().to_list()
    
    # 1. Pivot to a date x sector grid so missing combinations become nulls,
    #    fill them with 0, then unpivot back to long format
    df_continuous = df_volume.pivot(
        index="announced_date", 
        on="deeptech_sector", 
        values="total_deal_volume_usd"
    ).fill_null(0).unpivot(
        index="announced_date", 
        on=all_sectors,
        variable_name="deeptech_sector", 
        value_name="total_deal_volume_usd"
    ).sort("deeptech_sector", "announced_date")

    # 2. Merge the transaction counts back in; periods added in step 1 have no
    #    match in df_volume, so their count comes back null and is set to 0
    df_final_series = df_continuous.join(
        df_volume.select("announced_date", "deeptech_sector", "transaction_count"),
        on=["announced_date", "deeptech_sector"],
        how="left"
    ).fill_null(0) 

    # 3. Final type casting
    df_final_series = df_final_series.with_columns(
        pl.col("transaction_count").cast(pl.Int64)
    )
    
    print(f"  ✓ Continuity ensured for {freq}. Total time-series points: {len(df_final_series):,}")
    return df_final_series


Frequencies to Aggregate: ['1mo', '3mo', '6mo']


### 3.1: Load data and filter for deeptech

In [20]:
# Read the Parquet file and keep only deeptech sectors.
# Rows with a null sector are dropped too, because `!=` evaluates to null for them.
df_classified = pl.read_parquet(CLASSIFIED_DATA_PATH)
initial_count = len(df_classified)

df_deeptech = df_classified.filter(
    (pl.col("deeptech_sector") != "NOISE") & 
    (pl.col("deeptech_sector") != "NON_DEEPTECH")
).select(
    "announced_date", 
    "deal_value_usd", 
    "deeptech_sector"
).with_columns(
    pl.col("announced_date").str.to_date()
)

final_count = len(df_deeptech)
print(f"✓ Deeptech deals retained: {final_count:,} ({(final_count/initial_count)*100:.1f}%)")

✓ Deeptech deals retained: 3,801 (9.8%)


### 3.2 & 3.3: Aggregate, ensure continuity and save (looping frequencies)

In [25]:
for freq in TEST_FREQUENCIES:
    print("\n" + "=" * 50)
    print(f"Aggregating data for frequency: {freq}")
    print("="*50)
    
    # 1. Resample (group) by Sector and Time, summing deal value
    df_volume = df_deeptech.sort("deeptech_sector", "announced_date").group_by_dynamic(
        "announced_date", 
        every=freq,                  
        group_by="deeptech_sector",  
        closed="left",               # Half-open window [start, end): a deal on a period's first day belongs to that period
        label="left"                 # Use 'left' to label with the start of the interval
    ).agg(
        pl.sum("deal_value_usd").alias("total_deal_volume_usd"),
        pl.len().alias("transaction_count")
    ).sort("deeptech_sector", "announced_date")
    
    print(f"Intermediate aggregated rows: {len(df_volume):,}")

    # 2. Ensure Continuity (Step 3.3)
    df_final_series = ensure_continuity(df_volume, freq)
    
    # 3. Export the dataset with the frequency in the filename
    OUTPUT_PATH = Path(f"../../data/processed/3.0_sector_volume_{freq}.csv")
    OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    df_final_series.write_csv(OUTPUT_PATH)
    
    print(f"✓ Final series for {freq} saved to: {OUTPUT_PATH}")
    print(df_final_series.head(3))


Aggregating data for frequency: 1mo
Intermediate aggregated rows: 688
  ✓ Continuity ensured for 1mo. Total time-series points: 728
✓ Final series for 1mo saved to: ..\..\data\processed\3.0_sector_volume_1mo.csv
shape: (3, 4)
┌────────────────┬─────────────────────────────────┬───────────────────────┬───────────────────┐
│ announced_date ┆ deeptech_sector                 ┆ total_deal_volume_usd ┆ transaction_count │
│ ---            ┆ ---                             ┆ ---                   ┆ ---               │
│ date           ┆ str                             ┆ f64                   ┆ i64               │
╞════════════════╪═════════════════════════════════╪═══════════════════════╪═══════════════════╡
│ 2018-01-01     ┆ Advanced Battery Chemistry / S… ┆ 6.3282e8              ┆ 3                 │
│ 2018-02-01     ┆ Advanced Battery Chemistry / S… ┆ 7.2e7                 ┆ 1                 │
│ 2018-03-01     ┆ Advanced Battery Chemistry / S… ┆ 4.9543e9              ┆ 8                