
# DAIOE SSYK2012 Workflow

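End-to-end workflow for preparing DAIOE and SCB SSYK 2012 data: load the raw tables, validate them, clean the SCB monthly series, aggregate DAIOE to SSYK level 1, merge, and export.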


## 1) Setup Paths and Source URLs

Define local paths and the remote parquet sources used in this notebook.


In [87]:
from pathlib import Path

import polars as pl

from fcts import inspect_lazy  # helper from the local `fcts` module (not shown here)

# Keep all outputs in ./data relative to the working directory.
root = Path.cwd().resolve()
data_dir = root / "data"
data_dir.mkdir(parents=True, exist_ok=True)

# Remote parquet inputs (raw GitHub URLs).
daioe_source: str = (
    "https://raw.githubusercontent.com/joseph-data/AI_Econ_daioe_years/development/"
    "data/daioe_scb_years_all_levels.parquet"
)

scb_source: str = (
    "https://raw.githubusercontent.com/joseph-data/AI_Econ_daioe_months/daioe_pull/"
    "data/scb_months.parquet"
)

## 2) Obtain Raw DAIOE and SCB Tables

Load both inputs as Polars `LazyFrame`s. Nothing is read into memory until a result is collected, so the later transformation steps stay lazy and Polars can optimize the whole query.


In [88]:
daioe_lf = pl.scan_parquet(daioe_source)
scb_lf = pl.scan_parquet(scb_source)


## 3) Quick Validation of Raw Inputs

Preview the first rows from each source before applying cleaning rules.


In [90]:
print(daioe_lf.head(5).collect())

shape: (5, 65)
┌───────┬──────────┬───────┬───────┬───┬─────────┬─────────┬─────────┬─────────┐
│ level ┆ ssyk_cod ┆ age   ┆ sex   ┆ … ┆ daioe_l ┆ daioe_t ┆ daioe_s ┆ daioe_g │
│ ---   ┆ e        ┆ ---   ┆ ---   ┆   ┆ ngmod_L ┆ ranslat ┆ peechre ┆ enai_Le │
│ str   ┆ ---      ┆ str   ┆ str   ┆   ┆ evel_Ex ┆ _Level_ ┆ c_Level ┆ vel_Exp │
│       ┆ str      ┆       ┆       ┆   ┆ posure  ┆ Exposur ┆ _Exposu ┆ osure   │
│       ┆          ┆       ┆       ┆   ┆ ---     ┆ e       ┆ re      ┆ ---     │
│       ┆          ┆       ┆       ┆   ┆ i8      ┆ ---     ┆ ---     ┆ i8      │
│       ┆          ┆       ┆       ┆   ┆         ┆ i8      ┆ i8      ┆         │
╞═══════╪══════════╪═══════╪═══════╪═══╪═════════╪═════════╪═════════╪═════════╡
│ SSYK3 ┆ 333      ┆ 16-24 ┆ women ┆ … ┆ 5       ┆ 5       ┆ 5       ┆ 4       │
│ SSYK3 ┆ 442      ┆ 16-24 ┆ men   ┆ … ┆ 2       ┆ 2       ┆ 2       ┆ 1       │
│ SSYK3 ┆ 131      ┆ 40-44 ┆ women ┆ … ┆ 5       ┆ 5       ┆ 4       ┆ 5       │
│ SSYK3 ┆ 262

In [91]:
print(scb_lf.head(5).collect())

shape: (5, 5)
┌────────┬─────┬──────────┬───────┬────────────┐
│ code_1 ┆ sex ┆ month    ┆ value ┆ occupation │
│ ---    ┆ --- ┆ ---      ┆ ---   ┆ ---        │
│ str    ┆ str ┆ str      ┆ f64   ┆ str        │
╞════════╪═════╪══════════╪═══════╪════════════╡
│ 1      ┆ men ┆ 2015-Jan ┆ 169.8 ┆ Managers   │
│ 1      ┆ men ┆ 2015-Feb ┆ 164.8 ┆ Managers   │
│ 1      ┆ men ┆ 2015-Mar ┆ 156.2 ┆ Managers   │
│ 1      ┆ men ┆ 2015-Apr ┆ 171.5 ┆ Managers   │
│ 1      ┆ men ┆ 2015-May ┆ 177.8 ┆ Managers   │
└────────┴─────┴──────────┴───────┴────────────┘


## 4) Clean SCB Monthly Data

Remove military records (codes in `code_1` that start with `0`) and extract `year` from `month` so the monthly rows can later be joined to yearly data.


In [92]:
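scb_lf_clean = (
    scb_lf
    # Drop military occupations, whose codes start with "0".
    .filter(~pl.col("code_1").str.starts_with("0"))
    # "2015-Jan" -> 2015, used as a join key later.
    .with_columns(
        pl.col("month").str.extract(r"^(\d{4})", 1).cast(pl.Int64).alias("year")
    )
)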


print(scb_lf_clean.limit(10).collect())

shape: (10, 6)
┌────────┬─────┬──────────┬───────┬────────────┬──────┐
│ code_1 ┆ sex ┆ month    ┆ value ┆ occupation ┆ year │
│ ---    ┆ --- ┆ ---      ┆ ---   ┆ ---        ┆ ---  │
│ str    ┆ str ┆ str      ┆ f64   ┆ str        ┆ i64  │
╞════════╪═════╪══════════╪═══════╪════════════╪══════╡
│ 1      ┆ men ┆ 2015-Jan ┆ 169.8 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-Feb ┆ 164.8 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-Mar ┆ 156.2 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-Apr ┆ 171.5 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-May ┆ 177.8 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-Jun ┆ 151.0 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-Jul ┆ 174.6 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-Aug ┆ 174.4 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-Sep ┆ 157.0 ┆ Managers   ┆ 2015 │
│ 1      ┆ men ┆ 2015-Oct ┆ 189.5 ┆ Managers   ┆ 2015 │
└────────┴─────┴──────────┴───────┴────────────┴──────┘


## 5) Prepare DAIOE Level-1 Aggregates

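Filter DAIOE to SSYK level 1, then take the mean of `weight_sum` and of every DAIOE metric per `ssyk_code` and `year`. Note that `.mean()` is a plain average over the rows in each group (for example across `age` and `sex`). The result has one row per occupation code and year, matching the merge keys used in the next step.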


In [93]:
weighted_daioe = (
    daioe_lf
    .filter(pl.col("level") == "SSYK1")
    .select(
        pl.col(["level", "ssyk_code", "year", "weight_sum"]),
        pl.col("^daioe_.*$"),
        pl.col("^pctl_daioe_.*$"),
    )
    # Collapse to one row per occupation code and year.
    .group_by(["level", "ssyk_code", "year"])
    .agg([
        pl.col("weight_sum").mean().cast(pl.Int64),
        pl.col("^daioe_.*$").mean(),
        pl.col("^pctl_daioe_.*$").mean(),
    ])
)

weighted_daioe.limit(10).collect()

level,ssyk_code,year,weight_sum,daioe_allapps_avg,daioe_stratgames_avg,daioe_videogames_avg,daioe_imgrec_avg,daioe_imgcompr_avg,daioe_imggen_avg,daioe_readcompr_avg,daioe_lngmod_avg,daioe_translat_avg,daioe_speechrec_avg,daioe_genai_avg,daioe_allapps_wavg,daioe_stratgames_wavg,daioe_videogames_wavg,daioe_imgrec_wavg,daioe_imgcompr_wavg,daioe_imggen_wavg,daioe_readcompr_wavg,daioe_lngmod_wavg,daioe_translat_wavg,daioe_speechrec_wavg,daioe_genai_wavg,daioe_allapps_Level_Exposure,daioe_stratgames_Level_Exposure,daioe_videogames_Level_Exposure,daioe_imgrec_Level_Exposure,daioe_imgcompr_Level_Exposure,daioe_imggen_Level_Exposure,daioe_readcompr_Level_Exposure,daioe_lngmod_Level_Exposure,daioe_translat_Level_Exposure,daioe_speechrec_Level_Exposure,daioe_genai_Level_Exposure,pctl_daioe_allapps_avg,pctl_daioe_stratgames_avg,pctl_daioe_videogames_avg,pctl_daioe_imgrec_avg,pctl_daioe_imgcompr_avg,pctl_daioe_imggen_avg,pctl_daioe_readcompr_avg,pctl_daioe_lngmod_avg,pctl_daioe_translat_avg,pctl_daioe_speechrec_avg,pctl_daioe_genai_avg,pctl_daioe_allapps_wavg,pctl_daioe_stratgames_wavg,pctl_daioe_videogames_wavg,pctl_daioe_imgrec_wavg,pctl_daioe_imgcompr_wavg,pctl_daioe_imggen_wavg,pctl_daioe_readcompr_wavg,pctl_daioe_lngmod_wavg,pctl_daioe_translat_wavg,pctl_daioe_speechrec_wavg,pctl_daioe_genai_wavg
str,str,i64,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""SSYK1""","""9""",2019,252625,19.367231,0.221288,5.173897,0.238948,0.053375,0.297457,0.105496,0.074861,0.019999,0.228594,0.596149,19.063804,0.217868,5.340586,0.235084,0.05129,0.277531,0.094958,0.067381,0.017449,0.204716,0.550909,1.0,1.0,4.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,75.0,0.0,0.0,0.0,25.0,37.5,37.5,37.5,0.0,0.0,0.0,75.0,0.0,0.0,0.0,0.0,0.0,0.0,12.5,0.0
"""SSYK1""","""5""",2019,991909,19.78637,0.22749,4.440538,0.239493,0.058038,0.336468,0.141836,0.100927,0.027176,0.300289,0.714753,19.414305,0.225751,4.337983,0.235569,0.057113,0.335116,0.140069,0.09952,0.02619,0.28826,0.710157,1.0,1.0,2.0,1.0,2.0,2.0,3.0,3.0,3.0,3.0,3.0,25.0,12.5,25.0,12.5,50.0,37.5,50.0,50.0,50.0,50.0,50.0,12.5,12.5,25.0,12.5,37.5,37.5,50.0,50.0,50.0,50.0,50.0
"""SSYK1""","""9""",2021,239382,22.765384,0.22139,5.211372,0.345287,0.112233,0.459916,0.112948,0.220988,0.023036,0.271673,1.186927,22.277239,0.218067,5.379357,0.339883,0.10794,0.429242,0.101903,0.199146,0.020099,0.243012,1.094453,1.0,1.0,4.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,75.0,0.0,0.0,0.0,25.0,37.5,37.5,37.5,0.0,0.0,0.0,75.0,0.0,0.0,0.0,0.0,0.0,0.0,12.5,0.0
"""SSYK1""","""1""",2018,301386,16.846869,0.271451,3.076886,0.221797,0.068483,0.415952,0.159002,0.101476,0.019201,0.304804,0.818419,17.22406,0.278692,3.122184,0.228654,0.070475,0.42992,0.163588,0.104068,0.019581,0.309932,0.843843,4.0,4.0,1.0,4.0,4.0,5.0,4.0,4.0,4.0,4.0,4.0,62.5,62.5,0.0,62.5,62.5,87.5,75.0,75.0,75.0,75.0,75.0,62.5,62.5,0.0,62.5,62.5,87.5,75.0,75.0,75.0,75.0,75.0
"""SSYK1""","""2""",2016,1023618,15.143314,0.298189,3.001082,0.205392,0.053601,0.381273,0.16149,0.043165,0.010177,0.280166,0.616269,15.26716,0.301984,2.965258,0.205971,0.054054,0.388155,0.164597,0.044397,0.010454,0.290144,0.629137,5.0,5.0,1.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,87.5,100.0,12.5,87.5,87.5,100.0,100.0,87.5,87.5,87.5,100.0,87.5,87.5,12.5,87.5,87.5,100.0,100.0,87.5,87.5,87.5,100.0
"""SSYK1""","""8""",2019,306370,20.609594,0.246965,5.790529,0.26018,0.055627,0.320058,0.099023,0.066972,0.017682,0.207439,0.60773,20.696794,0.244047,5.704984,0.270729,0.057975,0.332547,0.101457,0.068553,0.018357,0.211753,0.628504,3.0,2.0,5.0,3.0,3.0,2.0,1.0,1.0,2.0,2.0,2.0,37.5,37.5,100.0,50.0,25.0,25.0,0.0,0.0,0.0,12.5,12.5,50.0,37.5,100.0,50.0,50.0,25.0,12.5,12.5,25.0,25.0,25.0
"""SSYK1""","""5""",2016,963164,12.509228,0.225797,3.062187,0.154527,0.038962,0.255087,0.097529,0.027708,0.007027,0.219953,0.408247,12.258316,0.22381,2.988328,0.151432,0.038244,0.253154,0.096333,0.027304,0.006777,0.210959,0.404805,1.0,1.0,2.0,1.0,2.0,2.0,3.0,3.0,3.0,3.0,3.0,12.5,12.5,25.0,12.5,50.0,37.5,50.0,50.0,50.0,50.0,50.0,12.5,12.5,25.0,0.0,37.5,37.5,50.0,50.0,50.0,50.0,50.0
"""SSYK1""","""3""",2021,599846,28.364397,0.285361,4.780394,0.445791,0.156424,0.699146,0.206439,0.387446,0.038774,0.412113,1.916286,28.744854,0.289345,4.657259,0.454136,0.160967,0.712325,0.217183,0.40925,0.041123,0.430847,1.982925,4.0,4.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,75.0,75.0,50.0,75.0,75.0,62.5,62.5,62.5,62.5,62.5,62.5,75.0,75.0,37.5,75.0,75.0,62.5,62.5,62.5,62.5,62.5,62.5
"""SSYK1""","""6""",2019,34161,19.577233,0.231927,5.15114,0.251148,0.055439,0.316091,0.106882,0.073424,0.019383,0.220745,0.618947,19.814406,0.23235,5.231998,0.251659,0.05599,0.31934,0.108886,0.074444,0.019868,0.224309,0.626073,2.0,2.0,4.0,2.0,2.0,1.0,2.0,2.0,2.0,2.0,1.0,12.5,25.0,62.5,25.0,12.5,12.5,37.5,25.0,25.0,25.0,25.0,25.0,25.0,62.5,25.0,25.0,12.5,37.5,37.5,37.5,37.5,12.5
"""SSYK1""","""5""",2015,939548,8.299752,0.181875,2.84279,0.151194,,0.019695,0.003724,0.01667,0.00248,0.19421,0.060471,8.106642,0.179994,2.774004,0.147831,0.0,0.019486,0.003666,0.016365,0.002383,0.185621,0.059662,1.0,1.0,2.0,1.0,3.0,2.0,3.0,3.0,3.0,3.0,3.0,0.0,12.5,25.0,12.5,,37.5,50.0,50.0,50.0,50.0,50.0,0.0,12.5,25.0,0.0,50.0,25.0,50.0,50.0,50.0,50.0,50.0


In [94]:
inspect_lazy(scb_lf_clean)

Rows: 2,376
Columns: 6


In [95]:
inspect_lazy(weighted_daioe)

Rows: 99
Columns: 59


## 6) Merge Cleaned SCB Data with DAIOE Weights

Join the cleaned SCB monthly panel with the DAIOE level-1 yearly metrics.


In [96]:
scb_months_lf = (
    scb_lf_clean
    .join(
        weighted_daioe,
        left_on=["code_1", "year"],
        right_on=["ssyk_code", "year"],
        how="left",
    )
    .drop("level")
)

scb_months_lf.limit(10).collect()

code_1,sex,month,value,occupation,year,weight_sum,daioe_allapps_avg,daioe_stratgames_avg,daioe_videogames_avg,daioe_imgrec_avg,daioe_imgcompr_avg,daioe_imggen_avg,daioe_readcompr_avg,daioe_lngmod_avg,daioe_translat_avg,daioe_speechrec_avg,daioe_genai_avg,daioe_allapps_wavg,daioe_stratgames_wavg,daioe_videogames_wavg,daioe_imgrec_wavg,daioe_imgcompr_wavg,daioe_imggen_wavg,daioe_readcompr_wavg,daioe_lngmod_wavg,daioe_translat_wavg,daioe_speechrec_wavg,daioe_genai_wavg,daioe_allapps_Level_Exposure,daioe_stratgames_Level_Exposure,daioe_videogames_Level_Exposure,daioe_imgrec_Level_Exposure,daioe_imgcompr_Level_Exposure,daioe_imggen_Level_Exposure,daioe_readcompr_Level_Exposure,daioe_lngmod_Level_Exposure,daioe_translat_Level_Exposure,daioe_speechrec_Level_Exposure,daioe_genai_Level_Exposure,pctl_daioe_allapps_avg,pctl_daioe_stratgames_avg,pctl_daioe_videogames_avg,pctl_daioe_imgrec_avg,pctl_daioe_imgcompr_avg,pctl_daioe_imggen_avg,pctl_daioe_readcompr_avg,pctl_daioe_lngmod_avg,pctl_daioe_translat_avg,pctl_daioe_speechrec_avg,pctl_daioe_genai_avg,pctl_daioe_allapps_wavg,pctl_daioe_stratgames_wavg,pctl_daioe_videogames_wavg,pctl_daioe_imgrec_wavg,pctl_daioe_imgcompr_wavg,pctl_daioe_imggen_wavg,pctl_daioe_readcompr_wavg,pctl_daioe_lngmod_wavg,pctl_daioe_translat_wavg,pctl_daioe_speechrec_wavg,pctl_daioe_genai_wavg
str,str,str,f64,str,i64,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""1""","""men""","""2015-Jan""",169.8,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-Feb""",164.8,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-Mar""",156.2,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-Apr""",171.5,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-May""",177.8,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-Jun""",151.0,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-Jul""",174.6,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-Aug""",174.4,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-Sep""",157.0,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0
"""1""","""men""","""2015-Oct""",189.5,"""Managers""",2015,276948,8.615596,0.217084,2.477278,0.18398,,0.027156,0.005623,0.024023,0.003389,0.234964,0.085313,8.786716,0.223022,2.511899,0.189463,0.0,0.02809,0.005792,0.024659,0.003458,0.23886,0.087917,2.0,4.0,1.0,4.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,37.5,62.5,0.0,62.5,,87.5,75.0,75.0,75.0,75.0,75.0,37.5,62.5,0.0,62.5,50.0,87.5,75.0,75.0,75.0,75.0,75.0


In [97]:
inspect_lazy(scb_months_lf)

Rows: 2,376
Columns: 62


## 7) Export the Cleaned Monthly Output

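Write the final cleaned and merged monthly dataset to parquet. `sink_parquet` executes the lazy query and writes the result directly to `data/scb_months_lvl1.parquet`.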


In [98]:
scb_months_lf.sink_parquet(data_dir / "scb_months_lvl1.parquet")