# 03: Build Reporting Tables

This notebook transforms raw DCA data (parquet files and gzipped text from `dca_update_dec_2024`) into the `reporting.system_*` schema that the 24 benchmark SQL queries expect.

Why this step is needed: The benchmark queries reference pre-aggregated "reporting" tables (e.g., `reporting.system_userwait`, `reporting.system_network_consumption`) that Intel built internally. We have only the raw event-level data, so we must aggregate it into the same schema ourselves.

Reference documents:
- `docs/queries/Reporting Schema table definition.md`: official column definitions for every reporting table
- `docs/queries/scratch reporting analytics queries.sql`: Intel's actual ETL SQL (CREATE TABLE + INSERT statements)
- `docs/dca-dictionary.txt`: raw table column descriptions

Approach: For each reporting table, we:
1. Explain what it is and what raw data it comes from
2. Show the transformation SQL (adapted from Intel's ETL for DuckDB)
3. Verify the output: row counts, guid counts, column names, sample rows
4. Save the result as a parquet file in `data/raw/reporting/`

Output: 19 parquet files in `data/raw/reporting/`, one per reporting table, ready for the benchmark queries.

In [3]:
from pathlib import Path

import duckdb
from IPython.display import display, Markdown

DATA = Path("../data/raw")
REPORTING = DATA / "reporting"
REPORTING.mkdir(exist_ok=True)

con = duckdb.connect()

SYSINFO = str(DATA / "system_sysinfo_unique_normalized" / "*.parquet")
HW_METRIC = str(DATA / "hw_metric_stats" / "*.parquet")
NET_CONSUMPTION = str(DATA / "os_network_consumption_v2" / "*.parquet")
MEM_AVAIL = str(DATA / "os_memsam_avail_percent" / "*.parquet")
WEB_CAT_USAGE = str(DATA / "web_cat_usage_v2" / "*.parquet")
WEB_CAT_PIVOT = str(DATA / "web_cat_pivot" / "*.parquet")
USERWAIT = str(DATA / "userwait_v2" / "*.parquet")


def save_and_verify(table_name: str, query: str) -> None:
    """Run `query`, write the result to parquet, and display verification stats.

    Shows row count, distinct guid count, column names, and three sample rows.
    Assumes the query result has a `guid` column.
    """
    out = REPORTING / f"{table_name}.parquet"
    con.execute(f"COPY ({query}) TO '{out}' (FORMAT PARQUET)")

    # Re-read the file just written so the stats describe what is on disk.
    stats = con.execute(f"SELECT COUNT(*) AS rows, COUNT(DISTINCT guid) AS guids FROM read_parquet('{out}')").fetchone()
    schema = con.execute(f"DESCRIBE SELECT * FROM read_parquet('{out}')").df()

    cols = ", ".join(schema["column_name"].tolist())
    display(Markdown(f"✓ `{table_name}`: {stats[0]:,} rows, {stats[1]:,} guids, {len(schema)} columns\n\nColumns: `{cols}`\n\nSaved to: `{out}`"))

    sample = con.execute(f"SELECT * FROM read_parquet('{out}') LIMIT 3").df()
    display(sample)
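
The column-name check in the verification step could also be automated rather than read off the printed list. A minimal sketch, assuming a hypothetical `missing_columns` helper and an expected-column list transcribed by hand from the schema definition document (neither is part of this notebook):

```python
def missing_columns(expected: list[str], actual: list[str]) -> list[str]:
    """Return expected column names that are absent from the actual schema.

    The comparison is case-insensitive; this is a convenience choice for the
    sketch, not something the reporting schema prescribes.
    """
    actual_lower = {name.lower() for name in actual}
    return [name for name in expected if name.lower() not in actual_lower]


# Hypothetical expectation for one table; in practice it would be copied
# from the reporting schema definition.
expected = ["guid", "dt", "nrs", "min_pkg_c0", "avg_pkg_c0", "max_pkg_c0"]
actual = ["guid", "dt", "nrs", "min_pkg_c0", "avg_pkg_c0"]  # pretend one column is missing

gaps = missing_columns(expected, actual)
print(gaps)
```

An empty list would mean every expected column is present in the saved file.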

---
## Group 1: Direct copy / minimal rename

These tables either match the reporting schema directly or need only trivial changes (column selection, type casting). No aggregation required.

### 1. `system_sysinfo_unique_normalized`

The anchor table: client metadata (chassis type, country, CPU, RAM, persona, etc.). Used by nearly every join query.

Source: `data/system_sysinfo_unique_normalized/*.parquet` (8 files, 1M rows)  
Transformation: None (direct copy). This table is identical in the raw and reporting schemas.

In [4]:
save_and_verify("system_sysinfo_unique_normalized", f"""
    SELECT * FROM read_parquet('{SYSINFO}')
""")

✓ `system_sysinfo_unique_normalized`: 1,000,000 rows, 1,000,000 guids, 32 columns

Columns: `load_ts, guid, chassistype, chassistype_2in1_category, countryname, countryname_normalized, modelvendor, modelvendor_normalized, model, model_normalized, ram, os, #ofcores, age_category, graphicsmanuf, gfxcard, graphicscardclass, processornumber, cpuvendor, cpuname, cpucode, cpu_family, cpu_suffix, screensize_category, persona, processor_line, vpro_enabled, firstreportdate, lastreportdate, discretegraphics, cpu_stepping, engagement_id`

Saved to: `../data/raw/reporting/system_sysinfo_unique_normalized.parquet`

|  | load_ts | guid | chassistype | chassistype_2in1_category | countryname | countryname_normalized | modelvendor | modelvendor_normalized | model | model_normalized | ... | cpu_suffix | screensize_category | persona | processor_line | vpro_enabled | firstreportdate | lastreportdate | discretegraphics | cpu_stepping | engagement_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-09-14 15:13:35 | 000091c0adc149389235ed2c5f15a59e | Desktop | Unknown | Australia | Australia | Unknown | Unknown | Unknown | Unknown | ... | Core-U | 23x | Casual User | U-Processor | N | 2021-06-10 13:45:04 | 2021-06-22 07:20:00 | N | Intel64 Family 6 Model 78 Stepping 3 | Consumer - IDSA |
| 1 | 2022-09-14 15:13:35 | 0000af8fab2d4669bad5917875158ab9 | Desktop | Unknown | India | India | Gigabyte | Gigabyte | H410M H V2 | H410M H V2 | ... | Other | 21x | Casual User | Unknown | N | 2021-06-20 13:27:21 | 2022-09-12 22:23:15 | N | Intel64 Family 6 Model 165 Stepping 5 | Consumer - IDSA |
| 2 | 2022-09-14 15:13:35 | 0000cc165aa744638ec3ba6d7f1ab538 | Desktop | Unknown | Korea, Republic of | Korea, Republic of | Asus | Asus | System Product Name | System Product Name | ... | Other | 23x | Casual User | Unknown | N | 2021-01-18 09:25:20 | 2021-01-20 09:35:38 | Y | Intel64 Family 6 Model 165 Stepping 3 | Consumer - IDSA |


### 2. `system_cpu_metadata`

Processor specifications per client: CPU code, generation, lithography, market segment, market codename, TDP.

Source: `data/system_cpu_metadata.txt000.gz` (42.5 MiB, 1M rows)  
Transformation: None (direct copy from the pre-built update table).

In [5]:
save_and_verify("system_cpu_metadata", f"""
    SELECT * FROM read_csv('{DATA / "system_cpu_metadata.txt000.gz"}', auto_detect=true)
""")

✓ `system_cpu_metadata`: 1,000,000 rows, 1,000,000 guids, 12 columns

Columns: `guid, cpu, cpucode, processtechnology, lithography, marketsegment, cpugen, launchdate, estfirstusedt, marketcodename, #ofcores, spec_tdp`

Saved to: `../data/raw/reporting/system_cpu_metadata.parquet`

|  | guid | cpu | cpucode | processtechnology | lithography | marketsegment | cpugen | launchdate | estfirstusedt | marketcodename | #ofcores | spec_tdp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000626d0e02147d99180021bd03306c5 | Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz | i5-8250U | 1272 | 14 nm | MBL | 8th Gen i5 | 2017-08-21 00:00:00 | 2017-08-21 00:00:00 | Kaby Lake R | 4 | 15 |
| 1 | 00145fc17dda4ca6ac386ef0fe7d5f30 | Intel(R) Celeron(R) CPU J1900 @ 1.99GHz | J1900 | 1271 | 22 nm | DT | Pentium/Celeron-Bay Trail | 2013-11-04 00:00:00 | 2013-11-04 00:00:00 | Bay Trail | 4 | 10 |
| 2 | 0018ff5159db4fc698b846779c6e73ed | Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz | i7-3820 | 1268 | 32 nm | DT | 3rd Gen i7 | 2012-02-13 00:00:00 | 2012-02-13 00:00:00 | Sandy Bridge E | 4 | 130 |


### 3. `system_os_codename_history`

History of Windows OS versions per client, with time windows (`min_ts` to `max_ts`) for when each version was active.
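
As an illustration of how these windows might be used, the sketch below looks up which OS version was active at a given timestamp. The `os_at` helper is hypothetical, and treating both ends of a window as inclusive is an assumption; the two windows are taken from the sample rows in this section's output.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (min_ts, max_ts, os_name, os_codename) for a single guid.
windows = [
    ("2021-04-28 13:13:47", "2022-06-15 10:08:58", "Win10", "20H2"),
    ("2022-06-15 10:08:59", "2024-11-14 22:14:29", "Win10", "21H2"),
]


def os_at(ts: str, rows=windows):
    """Return (os_name, os_codename) for the window containing ts, or None.

    Assumption: min_ts and max_ts are both inclusive bounds.
    """
    when = datetime.strptime(ts, FMT)
    for min_ts, max_ts, os_name, codename in rows:
        if datetime.strptime(min_ts, FMT) <= when <= datetime.strptime(max_ts, FMT):
            return os_name, codename
    return None


print(os_at("2022-01-01 00:00:00"))
print(os_at("2023-01-01 00:00:00"))
```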

Source: `data/system_os_codename_history.txt000.gz` (17.6 MiB, 639K rows)  
Transformation: None (direct copy).

In [6]:
save_and_verify("system_os_codename_history", f"""
    SELECT * FROM read_csv('{DATA / "system_os_codename_history.txt000.gz"}', auto_detect=true)
""")

✓ `system_os_codename_history`: 639,223 rows, 299,099 guids, 6 columns

Columns: `load_ts, guid, min_ts, max_ts, os_name, os_codename`

Saved to: `../data/raw/reporting/system_os_codename_history.parquet`

|  | load_ts | guid | min_ts | max_ts | os_name | os_codename |
|---|---|---|---|---|---|---|
| 0 | 2024-11-14 22:14:45 | 001d97ba9db74b5baf65f14235919222 | 2021-04-28 13:13:47 | 2022-06-15 10:08:58 | Win10 | 20H2 |
| 1 | 2024-11-14 22:14:45 | 001d97ba9db74b5baf65f14235919222 | 2022-06-15 10:08:59 | 2024-11-14 22:14:29 | Win10 | 21H2 |
| 2 | 2024-11-14 22:14:45 | 00222b9888fa44508b1ceff502de4e72 | 2021-09-27 15:08:58 | 2022-10-14 08:18:03 | Win10 | 21H2 |


### 4. `system_on_off_suspend_time_day`

Daily summary of client on/off/modern-sleep/sleep time in seconds.

Source: `data/guids_on_off_suspend_time_day.txt000.gz` (16.8 MiB, 1.58M rows)  
Transformation: None. The data is already at the `(guid, dt)` granularity the reporting schema expects.

In [7]:
save_and_verify("system_on_off_suspend_time_day", f"""
    SELECT * FROM read_csv('{DATA / "guids_on_off_suspend_time_day.txt000.gz"}', auto_detect=true)
""")

✓ `system_on_off_suspend_time_day`: 1,582,017 rows, 36,958 guids, 7 columns

Columns: `load_ts, guid, dt, on_time, off_time, mods_time, sleep_time`

Saved to: `../data/raw/reporting/system_on_off_suspend_time_day.parquet`

|  | load_ts | guid | dt | on_time | off_time | mods_time | sleep_time |
|---|---|---|---|---|---|---|---|
| 0 | 2024-10-10 20:39:35 | 007452469ed4438da7acf59c04e2eea4 | 2024-07-10 | 2984 | 0 | 0 | 0 |
| 1 | 2024-10-10 20:39:35 | 007452469ed4438da7acf59c04e2eea4 | 2024-07-11 | 16236 | 70163 | 0 | 0 |
| 2 | 2024-10-10 20:39:35 | 007452469ed4438da7acf59c04e2eea4 | 2024-07-12 | 0 | 86399 | 0 | 0 |


### 5. `system_display_devices`

Display device usage: connection type, resolution, refresh rate, duration on AC/DC power.

Source: `data/display_devices.txt000.gz` (6.16 GiB, 221M rows, 209K guids)  
Transformation: Direct copy. Intel's ETL SQL confirms this is a straight `INSERT ... SELECT` from `university_prod.display_devices` with only type casts.

In [8]:
save_and_verify("system_display_devices", f"""
    SELECT * FROM read_csv('{DATA / "display_devices.txt000.gz"}', auto_detect=true)
""")

✓ `system_display_devices`: 220,997,262 rows, 209,239 guids, 23 columns

Columns: `load_ts, batch_id, audit_zip, audit_internal_path, guid, interval_start_utc, interval_end_utc, interval_local_start, interval_local_end, dt, ts, display_id, adapter_id, port, sink_index, connection_type, vendor_name, status, resolution_width, resolution_heigth, refresh_rate, duration_ac, duration_dc`

Saved to: `../data/raw/reporting/system_display_devices.parquet`

|  | load_ts | batch_id | audit_zip | audit_internal_path | guid | interval_start_utc | interval_end_utc | interval_local_start | interval_local_end | dt | ... | port | sink_index | connection_type | vendor_name | status | resolution_width | resolution_heigth | refresh_rate | duration_ac | duration_dc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-08-24 06:30:11 | 20210823-200234 | 2021082323-i-04a9a906e1cd5620c-zJYsXoQZlZ7QJSw... | V8_1_DISPLAY_DEVICES_20210823194114.V8 | 000d453ce1514126a12c1cf7737ddfbd | 2021-08-22 12:44:33 | 2021-08-23 23:40:25 | 2021-08-22 08:44:33 | 2021-08-23 19:40:25 | 2021-08-22 | ... | PORT_D | 6 | HDMI | Other | 1 | 1920 | 1080 | 59.94 | 16090 | 0 |
| 1 | 2021-08-24 06:30:11 | 20210823-200234 | 2021082323-i-04a9a906e1cd5620c-zJYsXoQZlZ7QJSw... | V8_1_DISPLAY_DEVICES_20210823194114.V8 | 000d453ce1514126a12c1cf7737ddfbd | 2021-08-22 12:44:33 | 2021-08-23 23:40:25 | 2021-08-22 08:44:33 | 2021-08-23 19:40:25 | 2021-08-22 | ... | PORT_D | 6 | HDMI | Other | 1 | 1920 | 1080 | 59.94 | 11031 | 0 |
| 2 | 2021-09-14 03:24:02 | 20210913-200234 | 2021091305-i-03f3ce9f3e5d1108e-mmoT5jR6G0gevvd... | V8_1_DISPLAY_DEVICES_20210913014551.V8 | 000d453ce1514126a12c1cf7737ddfbd | 2021-09-11 23:25:57 | 2021-09-13 05:45:03 | 2021-09-11 19:25:57 | 2021-09-13 01:45:03 | 2021-09-11 | ... | PORT_D | 6 | HDMI | Other | 1 | 1920 | 1080 | 59.94 | 109046 | 0 |


### 6. `system_frgnd_apps_types`

Daily foreground application usage: app type classification, focal screen time, detection count.

Source: `data/__tmp_fgnd_apps_date.txt003.gz` (1.53 GiB, 56.8M rows, 55.8K guids; partial, 1 of 4 split files)  
Transformation: Direct copy. Note: some rows have malformed `company_short` values (embedded tabs in addresses), so we use `ignore_errors=true`.  
Caveat: This is a partial sample (file 3 of 4). Rankings will be based on this subset.
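
To see why an embedded tab breaks a row, consider what happens when a line is split on its delimiter. The sketch below assumes the file is tab-delimited and uses made-up rows; `is_well_formed` is a hypothetical helper, not something the notebook calls. Rows like `bad` are the kind that `ignore_errors=true` causes the reader to skip.

```python
EXPECTED_FIELDS = 9  # the reporting table has 9 columns

# Made-up rows for illustration only.
good = "g1\t2019-11-18\tOther\tapp.exe\tAcme Corp\t1.5\t0.9\tAcme App\t3"
# A stray tab inside the company_short value yields one field too many.
bad = "g1\t2019-11-18\tOther\tapp.exe\tAcme Corp\t1 Main St\t1.5\t0.9\tAcme App\t3"


def is_well_formed(line: str, expected: int = EXPECTED_FIELDS) -> bool:
    """True when a tab-split line yields exactly the expected number of fields."""
    return len(line.split("\t")) == expected


print(is_well_formed(good), is_well_formed(bad))
```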

In [9]:
save_and_verify("system_frgnd_apps_types", f"""
    SELECT * FROM read_csv('{DATA / "__tmp_fgnd_apps_date.txt003.gz"}', auto_detect=true, ignore_errors=true)
""")

✓ `system_frgnd_apps_types`: 56,755,998 rows, 55,830 guids, 9 columns

Columns: `guid, date, app_type, exe_name, company_short, totalsecfocal_day, avg_fract_desktop, process_desc, lines_per_day`

Saved to: `../data/raw/reporting/system_frgnd_apps_types.parquet`

|  | guid | date | app_type | exe_name | company_short | totalsecfocal_day | avg_fract_desktop | process_desc | lines_per_day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6a3062437f6c4c66a6bd363cab6a64ee | 2019-11-18 | Other | restricted process |  | 107.503 | 0.187484 |  | 12 |
| 1 | 6a3062437f6c4c66a6bd363cab6a64ee | 2019-11-18 | Other | startmenuexperiencehost.exe | Microsoft Corporation | 22.122 | 0.948 | Windows Start Experience Host | 7 |
| 2 | 6a3062437f6c4c66a6bd363cab6a64ee | 2019-11-18 | Productivity | kutoolsforexcelsetup.exe |  | 0.509 | 0.03 |  | 1 |


### 7. `system_mods_top_blocker_hist`

Modern standby blocker events: blocker name, type, activity level, active time.

Source: `data/mods_sleepstudy_top_blocker_hist.txt000.gz` (1.89 GiB, 92.5M rows)  
Transformation: Minimal. The reporting schema expects a `dt` column derived from `dt_utc`, so we add it as an alias.

In [10]:
save_and_verify("system_mods_top_blocker_hist", f"""
    SELECT 
        load_ts, guid, ts_utc, dt_utc,
        ts_local,
        dt_utc AS dt,
        blocker_name, active_time_ms, activity_level, blocker_type, blocker_id
    FROM read_csv('{DATA / "mods_sleepstudy_top_blocker_hist.txt000.gz"}', auto_detect=true)
""")

✓ `system_mods_top_blocker_hist`: 92,460,980 rows, 65,034 guids, 11 columns

Columns: `load_ts, guid, ts_utc, dt_utc, ts_local, dt, blocker_name, active_time_ms, activity_level, blocker_type, blocker_id`

Saved to: `../data/raw/reporting/system_mods_top_blocker_hist.parquet`

|  | load_ts | guid | ts_utc | dt_utc | ts_local | dt | blocker_name | active_time_ms | activity_level | blocker_type | blocker_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-06 03:24:35 | 000626d0e02147d99180021bd03306c5 | 2023-01-04 18:18:53 | 2023-01-04 | 2023-01-04 13:18:53 | 2023-01-04 | Connection Phase | 0.0 | low | PDC Phase | 1-3 |
| 1 | 2023-01-06 03:24:35 | 000626d0e02147d99180021bd03306c5 | 2023-01-04 18:18:53 | 2023-01-04 | 2023-01-04 13:18:53 | 2023-01-04 | No CS Phase | 877156.0 | high | PDC Phase | 1-1 |
| 2 | 2023-01-06 03:24:35 | 000626d0e02147d99180021bd03306c5 | 2023-01-04 18:18:53 | 2023-01-04 | 2023-01-04 13:18:53 | 2023-01-04 | PLM Phase | 0.0 | low | PDC Phase | 1-5 |


### 8. `system_mods_power_consumption`

Per-process power consumption estimates from Windows Modern Standby sleep study reports.

Source: `data/mods_sleepstudy_power_estimation_data_13wks.txt000.gz` (218 KB, 10K rows, 1 guid)  
Transformation: Direct copy.  
Limitation: This is stub/test data with a single client. The 3 queries that use this table (`ranked_process_classifications`, `top_10_processes_per_user_id`, `top_20_most_power_consuming`) aggregate by `user_id`/`app_id` only (no guid reference), so they still produce meaningful rankings. See notebook 02 for the full investigation.

In [11]:
save_and_verify("system_mods_power_consumption", f"""
    SELECT * FROM read_csv('{DATA / "mods_sleepstudy_power_estimation_data_13wks.txt000.gz"}', auto_detect=true)
""")

✓ `system_mods_power_consumption`: 10,000 rows, 1 guids, 21 columns

Columns: `load_ts, batch_id, audit_zip, audit_internal_path, guid, ts_utc, dt_utc, ts_local, app_id, user_id, cpu_power_consumption, display_power_consumption, disk_power_consumption, mbb_power_consumption, network_power_consumption, soc_power_consumption, loss_power_consumption, other_power_consumption, total_power_consumption, recent_usage_hash, scenario_instance_hash`

Saved to: `../data/raw/reporting/system_mods_power_consumption.parquet`

|  | load_ts | batch_id | audit_zip | audit_internal_path | guid | ts_utc | dt_utc | ts_local | app_id | user_id | ... | display_power_consumption | disk_power_consumption | mbb_power_consumption | network_power_consumption | soc_power_consumption | loss_power_consumption | other_power_consumption | total_power_consumption | recent_usage_hash | scenario_instance_hash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-08-11 06:17:21 | 20240810-200516 | 2024081015-i-052847c7850604415-lOKmBjHXrhUSsBr... | V8_1_SLEEPSTUDY_REPORT_XML_20240810100122.V8 | 00126088522545b781ba30ab4a35972e | 2024-08-10 02:24:08 | 2024-08-10 | 2024-08-09 21:24:08 | `\Device\HarddiskVolume3\Program Files (x86)\Mc...` | UserIdMask | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | da68998d0f852b4c68af19690462a256 |  |
| 1 | 2024-08-12 04:50:07 | 20240811-200357 | 2024081115-i-0e440b2078f7aa315-cHF4HQKiejDj5pP... | V8_1_SLEEPSTUDY_REPORT_XML_20240811100614.V8 | 00126088522545b781ba30ab4a35972e | 2024-08-10 16:26:41 | 2024-08-10 | 2024-08-10 11:26:41 | `\Device\HarddiskVolume3\Program Files (x86)\Mc...` | SYSTEM | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 985c2772237e68b88116c9454a83fa91 |  |
| 2 | 2024-08-11 06:17:21 | 20240810-200516 | 2024081015-i-052847c7850604415-lOKmBjHXrhUSsBr... | V8_1_SLEEPSTUDY_REPORT_XML_20240810100122.V8 | 00126088522545b781ba30ab4a35972e | 2024-08-10 02:24:08 | 2024-08-10 | 2024-08-09 21:24:08 | `\Device\HarddiskVolume3\Program Files (x86)\Mc...` | SYSTEM | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | da68998d0f852b4c68af19690462a256 |  |


---
## Group 2: Simple aggregation

These tables require GROUP BY aggregation to transform event-level raw data into the per-`(guid, dt)` or per-`(guid, dt, key)` granularity the reporting schema expects.

### 9. `system_batt_dc_events`

Summarized battery utilization per client per day: number of DC power-on events and total duration on battery.

Source: `data/__tmp_batt_dc_events.txt000.gz` (12 MiB, ~49K event rows, ~20K guids)  
Transformation: The raw data has one row per DC power-on event. We aggregate per `(guid, dt)`:  
- `num_power_ons = COUNT(*)`
- `duration_mins = SUM(duration_mins)`
- Battery percentage stats via MIN/MAX/AVG

Reference: Reporting Schema table definition, lines 57-73.
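
The aggregation can be traced by hand on a few rows. The sketch below is plain Python over hypothetical events, not the notebook's SQL. It assumes that negative battery percentages are sentinels for a missing reading, which is a guess about why a `>= 0` guard would be applied to the MIN and AVG columns.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events: (guid, power_on_dc_ts, duration_mins, power_on_battery_percent).
events = [
    ("g1", "2024-08-25 08:00:00", 5.0, 47),
    ("g1", "2024-08-25 12:30:00", 4.0, 17),
    ("g1", "2024-08-25 19:10:00", 3.0, -1),  # assumed sentinel: no reading
    ("g1", "2024-08-26 09:00:00", 10.0, 80),
]

# Group events by (guid, calendar date); ts[:10] is the YYYY-MM-DD prefix.
groups = defaultdict(list)
for guid, ts, duration, pct in events:
    groups[(guid, ts[:10])].append((duration, pct))

daily = {}
for key, rows in groups.items():
    valid = [pct for _, pct in rows if pct >= 0]  # drop negative readings
    daily[key] = {
        "num_power_ons": len(rows),                   # COUNT(*)
        "duration_mins": sum(d for d, _ in rows),     # SUM(duration_mins)
        "min_power_on_battery_percent": min(valid) if valid else None,
        "avg_power_on_battery_percent": round(mean(valid)) if valid else None,
    }

print(daily[("g1", "2024-08-25")])
```

The event with the negative reading still counts toward `num_power_ons` and `duration_mins`; it is excluded only from the minimum and average.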

In [12]:
batt_file = str(DATA / "__tmp_batt_dc_events.txt000.gz")

save_and_verify("system_batt_dc_events", f"""
    SELECT 
        MAX(load_ts) AS load_ts,
        guid,
        CAST(power_on_dc_ts AS DATE) AS dt,
        SUM(duration_mins) AS duration_mins,
        MAX(power_on_battery_percent) AS max_power_on_battery_percent,
        MIN(CASE WHEN power_on_battery_percent >= 0 THEN power_on_battery_percent END) AS min_power_on_battery_percent,
        ROUND(AVG(CASE WHEN power_on_battery_percent >= 0 THEN power_on_battery_percent END)) AS avg_power_on_battery_percent,
        MAX(power_off_battery_percent) AS max_power_off_battery_percent,
        MIN(CASE WHEN power_off_battery_percent >= 0 THEN power_off_battery_percent END) AS min_power_off_battery_percent,
        ROUND(AVG(CASE WHEN power_off_battery_percent >= 0 THEN power_off_battery_percent END)) AS avg_power_off_battery_percent,
        COUNT(*) AS num_power_ons
    FROM read_csv('{batt_file}', auto_detect=true)
    GROUP BY guid, CAST(power_on_dc_ts AS DATE)
""")

✓ `system_batt_dc_events`: 372,673 rows, 19,780 guids, 11 columns

Columns: `load_ts, guid, dt, duration_mins, max_power_on_battery_percent, min_power_on_battery_percent, avg_power_on_battery_percent, max_power_off_battery_percent, min_power_off_battery_percent, avg_power_off_battery_percent, num_power_ons`

Saved to: `../data/raw/reporting/system_batt_dc_events.parquet`

|  | load_ts | guid | dt | duration_mins | max_power_on_battery_percent | min_power_on_battery_percent | avg_power_on_battery_percent | max_power_off_battery_percent | min_power_off_battery_percent | avg_power_off_battery_percent | num_power_ons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-10-08 19:34:13 | 75eef8866c704a7eb166e5ce5c0daaae | 2024-08-28 | 99.0 | 100 | 100 | 100.0 | 100 | 100 | 100.0 | 1 |
| 1 | 2024-10-08 19:34:13 | 7c4e64b8d157415daaec7af2bd9daece | 2024-08-25 | 12.0 | 47 | 17 | 27.0 | 18 | 15 | 17.0 | 3 |
| 2 | 2024-10-08 19:34:13 | 86cda85558ed4953b759fdb08b9055b8 | 2024-08-21 | 4.0 | 100 | 100 | 100.0 | 100 | 99 | 100.0 | 2 |


### 10. `system_network_consumption`

Daily network consumption per client: bytes sent/received per second, aggregated across all network interfaces.

Source: `data/os_network_consumption_v2/*.parquet` (1.81 GiB, 121.8M hourly rows, 37.2K guids)  
Transformation (from Intel's ETL, lines 928-941 of scratch SQL):
- GROUP BY `guid, dt, input_description`
- `nrs = SUM(nr_samples)`
- `avg_bytes_sec = SUM(nr_samples * avg_bytes_sec) / SUM(nr_samples)` (weighted average)
- Rename: `input_description` → `input_desc`, `nr_samples` → `nrs`
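
The weighting matters because hourly rows can represent very different sample counts. A small worked example with made-up numbers shows how far an unweighted mean of the hourly averages can drift from the sample-weighted value:

```python
# Hypothetical hourly rows for one (guid, dt, input_description):
# (nr_samples, avg_bytes_sec).
hourly = [(3600, 100.0), (60, 10_000.0)]

total_samples = sum(n for n, _ in hourly)

# SUM(nr_samples * avg_bytes_sec) / SUM(nr_samples)
weighted_avg = sum(n * avg for n, avg in hourly) / total_samples

# Naive alternative: treats the 60-sample hour like the 3600-sample hour.
naive_avg = sum(avg for _, avg in hourly) / len(hourly)

print(total_samples, round(weighted_avg, 3), naive_avg)
```

Here the short burst of 60 samples dominates the naive mean, while the weighted average stays close to the rate observed for most of the day.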

In [13]:
save_and_verify("system_network_consumption", f"""
    SELECT
        guid,
        dt,
        input_description AS input_desc,
        SUM(nr_samples) AS nrs,
        MIN(min_bytes_sec) AS min_bytes_sec,
        SUM(CAST(nr_samples AS DOUBLE) * avg_bytes_sec) / SUM(nr_samples) AS avg_bytes_sec,
        MAX(max_bytes_sec) AS max_bytes_sec
    FROM read_parquet('{NET_CONSUMPTION}')
    GROUP BY guid, dt, input_description
""")

✓ `system_network_consumption`: 5,721,356 rows, 37,224 guids, 7 columns

Columns: `guid, dt, input_desc, nrs, min_bytes_sec, avg_bytes_sec, max_bytes_sec`

Saved to: `../data/raw/reporting/system_network_consumption.parquet`

|  | guid | dt | input_desc | nrs | min_bytes_sec | avg_bytes_sec | max_bytes_sec |
|---|---|---|---|---|---|---|---|
| 0 | 06e4ff5d51bf4d58b5cba9a8a1ba0695 | 2022-04-21 | OS:NETWORK INTERFACE::BYTES SENT/SEC:: | 3677.0 | 0 | 4361.741 | 737502 |
| 1 | 06e4ff5d51bf4d58b5cba9a8a1ba0695 | 2022-06-26 | OS:NETWORK INTERFACE::BYTES SENT/SEC:: | 2437.0 | 0 | 1462.706 | 107464 |
| 2 | 06e81aa4c0634c409a9d319487e10d93 | 2022-04-16 | OS:NETWORK INTERFACE::BYTES RECEIVED/SEC:: | 3565.0 | 0 | 6180430.0 | 2279038060 |


### 11. `system_userwait`

Aggregated user wait incidents per client per day, by event type and process.

Source: `data/userwait_v2/0000_part_00.parquet` (4.89 GiB, 175M event rows, 38.1K guids)  
Transformation (from Intel's ETL, lines 338-352 of scratch SQL):
- GROUP BY `guid, dt, event_name, ac_dc_event_name, proc_name_current`
- `acdc = UPPER(SUBSTRING(ac_dc_event_name, 1, 2))` (e.g., "AC_DISPLAY_ON" → "AC")
- `proc_name = proc_name_current`
- `number_of_instances = COUNT(*)`
- `total_duration_ms = SUM(duration_ms)`
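
The `acdc` derivation is simply the first two characters, upper-cased. A Python equivalent makes the behavior on non-standard values visible; the `acdc` function here is illustrative only, and the event names are ones that appear in this notebook:

```python
def acdc(ac_dc_event_name: str) -> str:
    """Python equivalent of UPPER(SUBSTRING(ac_dc_event_name, 1, 2))."""
    return ac_dc_event_name[:2].upper()


for name in ["AC_DISPLAY_ON", "DC_DISPLAY_OFF", "UN_DISPLAY_UN", "unknown"]:
    print(name, "->", acdc(name))
```

Note that the lowercase value `unknown` also maps to `UN`, so it lands in the same `acdc` bucket as `UN_DISPLAY_UN`.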

In [14]:
save_and_verify("system_userwait", f"""
    SELECT
        guid,
        dt,
        event_name,
        ac_dc_event_name,
        UPPER(SUBSTRING(ac_dc_event_name, 1, 2)) AS acdc,
        proc_name_current AS proc_name,
        COUNT(*) AS number_of_instances,
        SUM(duration_ms) AS total_duration_ms
    FROM read_parquet('{USERWAIT}')
    GROUP BY guid, dt, event_name, ac_dc_event_name, proc_name_current
""")

✓ `system_userwait`: 34,655,557 rows, 38,142 guids, 8 columns

Columns: `guid, dt, event_name, ac_dc_event_name, acdc, proc_name, number_of_instances, total_duration_ms`

Saved to: `../data/raw/reporting/system_userwait.parquet`

|  | guid | dt | event_name | ac_dc_event_name | acdc | proc_name | number_of_instances | total_duration_ms |
|---|---|---|---|---|---|---|---|---|
| 0 | 5a119d95983045e0bcdbe3f9104bb647 | 2021-06-12 | WAIT | unknown | UN | UNKNOWN | 1 | 2261300.0 |
| 1 | 5a119d95983045e0bcdbe3f9104bb647 | 2021-06-24 | WAIT | AC_DISPLAY_OFF | AC | Mastercam.exe | 6 | 9399.0 |
| 2 | 8acfe933dd644b82be665f6a194448ff | 2022-10-03 | WAIT | UN_DISPLAY_UN | UN | SIGMA_PhotoPro6.exe | 1 | 3392.0 |


### 12. `system_web_cat_usage`

Daily web browsing statistics per client, browser, and category.

Source: `data/web_cat_usage_v2/*.parquet` (864 MiB, 21.4M rows, 64.3K guids)  
Transformation: None. The raw table already has the right granularity (`guid, dt, browser, parent_category, sub_category`) and the reporting schema matches the raw columns, so we pass it through unchanged.

In [15]:
save_and_verify("system_web_cat_usage", f"""
    SELECT * FROM read_parquet('{WEB_CAT_USAGE}')
""")

✓ `system_web_cat_usage`: 21,354,922 rows, 64,276 guids, 18 columns

Columns: `load_ts, batch_id, audit_zip, audit_internal_path, guid, interval_start_utc, interval_end_utc, interval_local_start, interval_local_end, dt, browser, parent_category, sub_category, duration_ms, page_load_count, site_count, domain_count, page_visit_count`

Saved to: `../data/raw/reporting/system_web_cat_usage.parquet`

|  | load_ts | batch_id | audit_zip | audit_internal_path | guid | interval_start_utc | interval_end_utc | interval_local_start | interval_local_end | dt | browser | parent_category | sub_category | duration_ms | page_load_count | site_count | domain_count | page_visit_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-06-11 03:12:58 | 20210610-200228 | 2021061003-i-04a9a906e1cd5620c-Nza8Wx4VDJjXb7j... | V8_2_WEB_CAT_USAGE_20210610134711.V8 | 000091c0adc149389235ed2c5f15a59e | 2021-06-09 03:43:09 | 2021-06-10 03:45:19 | 2021-06-09 13:43:09 | 2021-06-10 13:45:19 | 2021-06-09 | chrome | reference | reference | 36087 | 1 | 1 | 1 | 4 |
| 1 | 2021-06-11 03:12:58 | 20210610-200228 | 2021061003-i-04a9a906e1cd5620c-Nza8Wx4VDJjXb7j... | V8_2_WEB_CAT_USAGE_20210610134711.V8 | 000091c0adc149389235ed2c5f15a59e | 2021-06-09 03:43:09 | 2021-06-10 03:45:19 | 2021-06-09 13:43:09 | 2021-06-10 13:45:19 | 2021-06-09 | edge | productivity | other | 21667 | 2 | 2 | 1 | 3 |
| 2 | 2021-06-11 03:12:58 | 20210610-200228 | 2021061003-i-04a9a906e1cd5620c-Nza8Wx4VDJjXb7j... | V8_2_WEB_CAT_USAGE_20210610134711.V8 | 000091c0adc149389235ed2c5f15a59e | 2021-06-09 03:43:09 | 2021-06-10 03:45:19 | 2021-06-09 13:43:09 | 2021-06-10 13:45:19 | 2021-06-09 | chrome | other | unclassified | 296422 | 33 | 24 | 2 | 89 |


---
## Group 3: hw_metric_stats-derived tables

Intel's reporting tables for power, C0 residency, frequency, and temperature were originally built from `power_acdc_usage_v4_hist` (342 GiB, not feasible to download). We substitute with `hw_metric_stats`, which has the same metrics at daily granularity but without the AC/DC `event_name` breakdown.

Key difference: Intel's reporting tables have `event_name` (AC_DISPLAY_ON, DC_DISPLAY_OFF, etc.). Ours won't. This doesn't break any of the benchmark queries because they aggregate across all events anyway (weighted averages).

Common pattern: Filter `hw_metric_stats` by `name`, then aggregate per `(guid, dt)` with weighted stats using `nrs` as the weight.
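
Because only the metric filter and the output column suffix vary, the common pattern could be captured in one parameterized query builder. This is a sketch of that idea, assuming a hypothetical `hw_metric_query` helper and an illustrative source path; the notebook itself writes each query out in full.

```python
def hw_metric_query(source: str, names: list[str], suffix: str) -> str:
    """Build per-(guid, dt) weighted-stats SQL for one hw_metric_stats-derived table.

    source: parquet glob to read from
    names:  metric names to keep (matched against the `name` column)
    suffix: output column suffix, e.g. "pkg_c0" gives min_pkg_c0/avg_pkg_c0/max_pkg_c0
    """
    name_list = ", ".join(f"'{n}'" for n in names)
    return f"""
    SELECT
        guid,
        dt,
        SUM(nrs) AS nrs,
        SUM(nrs * min) / SUM(nrs) AS min_{suffix},
        SUM(nrs * mean) / SUM(nrs) AS avg_{suffix},
        SUM(nrs * max) / SUM(nrs) AS max_{suffix}
    FROM read_parquet('{source}')
    WHERE name IN ({name_list})
    GROUP BY guid, dt
    """


# Illustrative call; the path is a placeholder, not the notebook's HW_METRIC value.
sql = hw_metric_query("hw_metric_stats/*.parquet", ["HW::PACKAGE:C0_RESIDENCY:PERCENT:"], "pkg_c0")
print(sql)
```

The returned string could then be passed to `save_and_verify` in place of a hand-written query. The names are interpolated directly into the SQL, which is acceptable only because they are fixed constants chosen by the notebook author.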

### 13. `system_psys_rap_watts`

Total system power consumption estimates in Watts.

Metric filter: `name IN ('HW::PACKAGE:RAP:WATTS:', 'HW:::PSYS_RAP:WATTS:')`  
Note: PSYS_RAP has low coverage (816 guids in our 1/8 sample). The 5-way chassis join query will have limited overlap.

In [16]:
save_and_verify("system_psys_rap_watts", f"""
    SELECT
        guid,
        dt,
        SUM(nrs) AS nrs,
        SUM(nrs * min) / SUM(nrs) AS min_psys_rap_watts,
        SUM(nrs * mean) / SUM(nrs) AS avg_psys_rap_watts,
        SUM(nrs * max) / SUM(nrs) AS max_psys_rap_watts
    FROM read_parquet('{HW_METRIC}')
    WHERE name IN ('HW::PACKAGE:RAP:WATTS:', 'HW:::PSYS_RAP:WATTS:')
    GROUP BY guid, dt
""")

✓ `system_psys_rap_watts`: 2,609 rows, 611 guids, 6 columns

Columns: `guid, dt, nrs, min_psys_rap_watts, avg_psys_rap_watts, max_psys_rap_watts`

Saved to: `../data/raw/reporting/system_psys_rap_watts.parquet`

|  | guid | dt | nrs | min_psys_rap_watts | avg_psys_rap_watts | max_psys_rap_watts |
|---|---|---|---|---|---|---|
| 0 | 322805427cd24fc780be319e8dbbb3f3 | 2020-07-25 | 14647.0 | 2.226148 | 8.166502 | 29.573238 |
| 1 | 322805427cd24fc780be319e8dbbb3f3 | 2020-07-28 | 10834.0 | 2.547422 | 6.226409 | 30.835917 |
| 2 | 322805427cd24fc780be319e8dbbb3f3 | 2020-08-03 | 7774.0 | 2.181254 | 5.839431 | 23.854181 |


### 14. `system_pkg_C0`

Processor C0 state residency: the percentage of time the processor is fully active.

Metric filter: `name = 'HW::PACKAGE:C0_RESIDENCY:PERCENT:'`  
Coverage: 8,943 guids (strongest coverage among the hw_metric tables).

In [17]:
save_and_verify("system_pkg_C0", f"""
    SELECT
        guid,
        dt,
        SUM(nrs) AS nrs,
        SUM(nrs * min) / SUM(nrs) AS min_pkg_c0,
        SUM(nrs * mean) / SUM(nrs) AS avg_pkg_c0,
        SUM(nrs * max) / SUM(nrs) AS max_pkg_c0
    FROM read_parquet('{HW_METRIC}')
    WHERE name = 'HW::PACKAGE:C0_RESIDENCY:PERCENT:'
    GROUP BY guid, dt
""")

✓ `system_pkg_C0`: 513,353 rows, 8,943 guids, 6 columns

Columns: `guid, dt, nrs, min_pkg_c0, avg_pkg_c0, max_pkg_c0`

Saved to: `../data/raw/reporting/system_pkg_C0.parquet`

|  | guid | dt | nrs | min_pkg_c0 | avg_pkg_c0 | max_pkg_c0 |
|---|---|---|---|---|---|---|
| 0 | 065a5ba3572c4bb688ca874ea9dff393 | 2019-10-06 | 13633.0 | 13.27 | 43.86 | 100.0 |
| 1 | 065c277bab1c411ca08b918a352ec01b | 2020-01-30 | 7207.0 | 5.43049 | 14.28327 | 99.958094 |
| 2 | 06783bd509084e7284e019dde9fa8713 | 2019-11-06 | 3880.0 | 4.39 | 81.89 | 99.98 |


### 15. `system_pkg_avg_freq_mhz`

Average processor clock frequency in MHz.

Metric filter: `name = 'HW::CORE:AVG_FREQ:MHZ:'`  
Coverage: 613 guids.

In [18]:
save_and_verify("system_pkg_avg_freq_mhz", f"""
    SELECT
        guid,
        dt,
        SUM(nrs) AS nrs,
        SUM(nrs * min) / SUM(nrs) AS min_avg_freq_mhz,
        SUM(nrs * mean) / SUM(nrs) AS avg_avg_freq_mhz,
        SUM(nrs * max) / SUM(nrs) AS max_avg_freq_mhz
    FROM read_parquet('{HW_METRIC}')
    WHERE name = 'HW::CORE:AVG_FREQ:MHZ:'
    GROUP BY guid, dt
""")

✓ `system_pkg_avg_freq_mhz`: 2,563 rows, 613 guids, 6 columns

Columns: `guid, dt, nrs, min_avg_freq_mhz, avg_avg_freq_mhz, max_avg_freq_mhz`

Saved to: `../data/raw/reporting/system_pkg_avg_freq_mhz.parquet`

|  | guid | dt | nrs | min_avg_freq_mhz | avg_avg_freq_mhz | max_avg_freq_mhz |
|---|---|---|---|---|---|---|
| 0 | 0036e50bb4e641ffa2d5f562089355c7 | 2020-06-26 | 2573.0 | 834.279689 | 1432.895181 | 2363.641003 |
| 1 | 03bde64316504b348bb39bebe0e63662 | 2020-12-30 | 25736.0 | 2411.730942 | 3789.346331 | 4391.045477 |
| 2 | 043117a3725a4afdbf8d2c575dbb995d | 2020-09-21 | 19782.0 | 2799.555 | 2950.285 | 3060.725 |


### 16. `system_pkg_temp_centigrade`

Processor temperature in degrees centigrade.

Metric filter: `name = 'HW::CORE:TEMPERATURE:CENTIGRADE:'`  
Coverage: 622 guids.

In [19]:
save_and_verify("system_pkg_temp_centigrade", f"""
    SELECT
        guid,
        dt,
        SUM(nrs) AS nrs,
        SUM(nrs * min) / SUM(nrs) AS min_temp_centigrade,
        SUM(nrs * mean) / SUM(nrs) AS avg_temp_centigrade,
        SUM(nrs * max) / SUM(nrs) AS max_temp_centigrade
    FROM read_parquet('{HW_METRIC}')
    WHERE name = 'HW::CORE:TEMPERATURE:CENTIGRADE:'
    GROUP BY guid, dt
""")

✓ `system_pkg_temp_centigrade`: 2,639 rows, 622 guids, 6 columns

Columns: `guid, dt, nrs, min_temp_centigrade, avg_temp_centigrade, max_temp_centigrade`

Saved to: `../data/raw/reporting/system_pkg_temp_centigrade.parquet`

|  | guid | dt | nrs | min_temp_centigrade | avg_temp_centigrade | max_temp_centigrade |
|---|---|---|---|---|---|---|
| 0 | 2afeabab61ab4744a15652f8140d43cc | 2020-07-22 | 2056.0 | 44.5 | 50.015 | 68.5 |
| 1 | 2dd8d7ae26054e18b310b00af1e73e90 | 2020-10-18 | 69096.0 | 54.05433 | 63.228714 | 92.402165 |
| 2 | 2dd8d7ae26054e18b310b00af1e73e90 | 2020-10-23 | 36400.0 | 47.5 | 56.6275 | 92.5 |


### 17. `system_hw_pkg_power`

Processor package power consumption in Watts (IA component).

Original source: Intel built this from `hw_pack_run_avg_pwr` (not `hw_metric_stats`), using `rap_22` as the max value. We don't have that table, so we substitute with `hw_metric_stats` filtered to `IA_POWER`.  
Metric filter: `name = 'HW::PACKAGE:IA_POWER:WATTS:'`  
Coverage: ~800 guids.

In [20]:
save_and_verify("system_hw_pkg_power", f"""
    SELECT
        guid,
        dt,
        instance,
        SUM(nrs) AS nrs,
        SUM(nrs * mean) / SUM(nrs) AS mean,
        MAX(max) AS max
    FROM read_parquet('{HW_METRIC}')
    WHERE name = 'HW::PACKAGE:IA_POWER:WATTS:'
    GROUP BY guid, dt, instance
""")

✓ `system_hw_pkg_power`: 45,133 rows, 800 guids, 6 columns

Columns: `guid, dt, instance, nrs, mean, max`

Saved to: `../data/raw/reporting/system_hw_pkg_power.parquet`

|  | guid | dt | instance | nrs | mean | max |
|---|---|---|---|---|---|---|
| 0 | 125ef70d367a4c67b709d725d0d41982 | 2021-07-04 | 1 | 2698.0 | 3.131968 | 18.15 |
| 1 | 125ef70d367a4c67b709d725d0d41982 | 2021-08-03 | 1 | 3035.0 | 6.53 | 15.96 |
| 2 | 125ef70d367a4c67b709d725d0d41982 | 2021-08-06 | 1 | 6651.0 | 4.350298 | 17.5 |


---
## Group 4: Memory utilization (requires JOIN)

This table requires joining raw memory data with sysinfo to get the RAM capacity.

### 18. `system_memory_utilization`

Daily RAM utilization statistics per client.

Source: `data/os_memsam_avail_percent/*.parquet` (2.03 GiB, 21.7M rows, 69.6K guids)  
Transformation (from Intel's ETL, lines 1429-1458 of scratch SQL):
- JOIN with sysinfo to get `ram` (in GB) → convert to MB: `sysinfo_ram = ram * 1024`
- `avg_free_ram = SUM(sample_count * average) / SUM(sample_count)` (weighted average free memory in MB)
- `utilized_ram = sysinfo_ram - avg_free_ram`
- `avg_percentage_used = ROUND((sysinfo_ram - avg_free_ram) * 100 / sysinfo_ram)`
- `nrs = SUM(sample_count)`
- Filter: `ram != 0` (avoid division by zero)

Critical note: The raw `average` column is free memory in MB, not a percentage. This was confirmed by both the DCA dictionary and Intel's ETL SQL.
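
The formulas above can be checked by hand. The sketch below uses values chosen to reproduce the first sample row in this section's output (a 16 GB client); the variable names mirror the bullets and are otherwise illustrative.

```python
ram_gb = 16                 # sysinfo `ram`, in GB
avg_free_ram = 6715.954354  # weighted average free memory, in MB

sysinfo_ram = ram_gb * 1024                # GB -> MB
utilized_ram = sysinfo_ram - avg_free_ram  # MB in use
avg_percentage_used = round(utilized_ram * 100 / sysinfo_ram)

print(sysinfo_ram, round(utilized_ram, 6), avg_percentage_used)
```

If `average` were misread as a percentage instead of MB, `utilized_ram` would come out as roughly the full RAM size for every client, which is why the unit matters here.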

In [21]:
save_and_verify("system_memory_utilization", f"""
    SELECT
        guid,
        dt,
        nrs,
        avg_free_memory AS avg_free_ram,
        sysinfo_ram,
        sysinfo_ram - avg_free_memory AS utilized_ram,
        ROUND((sysinfo_ram - avg_free_memory) * 100.0 / sysinfo_ram) AS avg_percentage_used
    FROM (
        SELECT
            a.guid,
            a.dt,
            SUM(a.sample_count) AS nrs,
            SUM(a.sample_count * a.average) / SUM(a.sample_count) AS avg_free_memory,
            CAST(b.ram * 1024 AS BIGINT) AS sysinfo_ram
        FROM read_parquet('{MEM_AVAIL}') a
        INNER JOIN read_parquet('{SYSINFO}') b ON a.guid = b.guid
        WHERE b.ram != 0 AND b.ram IS NOT NULL
        GROUP BY a.guid, a.dt, b.ram
    ) c
    WHERE sysinfo_ram > 0
""")

✓ `system_memory_utilization`: 11,671,422 rows, 69,514 guids, 7 columns

Columns: `guid, dt, nrs, avg_free_ram, sysinfo_ram, utilized_ram, avg_percentage_used`

Saved to: `../data/raw/reporting/system_memory_utilization.parquet`

| | guid | dt | nrs | avg_free_ram | sysinfo_ram | utilized_ram | avg_percentage_used |
|---|---|---|---|---|---|---|---|
| 0 | 16aed7801fd64e1cbe09fd7251fa9ba3 | 2022-05-27 | 17307.0 | 6715.954354 | 16384 | 9668.045646 | 59.0 |
| 1 | 16aed7801fd64e1cbe09fd7251fa9ba3 | 2022-09-04 | 1296.0 | 6943.145062 | 16384 | 9440.854938 | 58.0 |
| 2 | 16af6375361a4218af03dbe8a4e585ed | 2019-11-09 | 4086.0 | 7053.116985 | 12288 | 5234.883015 | 43.0 |


---
## Group 5: Web category pivot (requires CASE WHEN pivot)

The persona query needs web browsing duration pivoted by category into 28 columns. We have two options:
1. Use the pre-built `web_cat_pivot` table (already pivoted, but uses long column names)
2. Build it from `web_cat_usage_v2` using Intel's ETL SQL (CASE WHEN pivot)

We'll use option 2 because the query SQL uses short column names (e.g., `education`, `finance`) that match Intel's ETL output, not the long names in our raw pivot table (e.g., `education_education`, `finance_banking_and_accounting`).

### 19. `system_web_cat_pivot_duration`

Web browsing duration pivoted by category per `(guid, dt)`. 28 category columns, values in milliseconds.

Source: `data/web_cat_usage_v2/*.parquet` (864 MiB, 21.4M rows)  
Transformation (from Intel's ETL, lines 1087-1121 of scratch SQL): CASE WHEN pivot on `parent_category` and `sub_category`, summing `duration_ms` into 28 named columns with the SHORT names the query SQL expects.
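To make the CASE WHEN pivot concrete, here is a small self-contained sketch on toy data. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without the raw parquet files. The table name `web_cat_usage`, the connection name `demo_con`, the guids, and every row value are hypothetical, and only four of the 28 output columns are shown.

```python
import sqlite3

# In-memory stand-in for the raw web_cat_usage_v2 data (hypothetical rows).
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE web_cat_usage "
    "(guid TEXT, dt TEXT, parent_category TEXT, sub_category TEXT, duration_ms REAL)"
)
demo_con.executemany(
    "INSERT INTO web_cat_usage VALUES (?, ?, ?, ?, ?)",
    [
        ("g1", "2022-07-14", "education", "education", 1000.0),
        ("g1", "2022-07-14", "education", "education", 500.0),
        ("g1", "2022-07-14", "social", "communication", 200.0),
        ("g1", "2022-07-14", "other", "other", 50.0),
        ("g2", "2022-07-14", "social", "social network", 300.0),
    ],
)

# Each output column sums duration_ms only for rows matching its category;
# non-matching rows contribute 0, so every (guid, dt) gets a value per column.
pivot_rows = demo_con.execute(
    """
    SELECT
        guid,
        dt,
        SUM(CASE WHEN parent_category = 'education' THEN duration_ms ELSE 0 END) AS education,
        SUM(CASE WHEN parent_category = 'other' THEN duration_ms ELSE 0 END) AS unclassified,
        SUM(CASE WHEN parent_category = 'social' AND sub_category = 'social network' THEN duration_ms ELSE 0 END) AS social_social_network,
        SUM(CASE WHEN parent_category = 'social' AND sub_category = 'communication' THEN duration_ms ELSE 0 END) AS social_communication
    FROM web_cat_usage
    GROUP BY guid, dt
    ORDER BY guid, dt
    """
).fetchall()

for row in pivot_rows:
    print(row)
```

The five long-format rows collapse into one wide row per `(guid, dt)`. Note that `parent_category = 'other'` lands in the `unclassified` column, mirroring the mapping in the full query.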

In [22]:
save_and_verify("system_web_cat_pivot_duration", f"""
    SELECT
        guid,
        dt,
        SUM(CASE WHEN parent_category = 'content creation' AND sub_category = 'photo edit/creation' THEN duration_ms ELSE 0 END) AS content_creation_photo_edit_creation,
        SUM(CASE WHEN parent_category = 'content creation' AND sub_category = 'video/audio edit/creation' THEN duration_ms ELSE 0 END) AS content_creation_video_audio_edit_creation,
        SUM(CASE WHEN parent_category = 'content creation' AND sub_category = 'web design / development' THEN duration_ms ELSE 0 END) AS content_creation_web_design_development,
        SUM(CASE WHEN parent_category = 'education' THEN duration_ms ELSE 0 END) AS education,
        SUM(CASE WHEN parent_category = 'entertainment' AND sub_category = 'music / audio streaming' THEN duration_ms ELSE 0 END) AS entertainment_music_audio_streaming,
        SUM(CASE WHEN parent_category = 'entertainment' AND sub_category = 'other' THEN duration_ms ELSE 0 END) AS entertainment_other,
        SUM(CASE WHEN parent_category = 'entertainment' AND sub_category = 'video streaming' THEN duration_ms ELSE 0 END) AS entertainment_video_streaming,
        SUM(CASE WHEN parent_category = 'finance' THEN duration_ms ELSE 0 END) AS finance,
        SUM(CASE WHEN parent_category = 'games' AND sub_category = 'other' THEN duration_ms ELSE 0 END) AS games_other,
        SUM(CASE WHEN parent_category = 'games' AND sub_category = 'video games' THEN duration_ms ELSE 0 END) AS games_video_games,
        SUM(CASE WHEN parent_category = 'mail' THEN duration_ms ELSE 0 END) AS mail,
        SUM(CASE WHEN parent_category = 'news' THEN duration_ms ELSE 0 END) AS news,
        SUM(CASE WHEN parent_category = 'other' THEN duration_ms ELSE 0 END) AS unclassified,
        SUM(CASE WHEN parent_category = 'private' THEN duration_ms ELSE 0 END) AS private,
        SUM(CASE WHEN parent_category = 'productivity' AND sub_category = 'crm' THEN duration_ms ELSE 0 END) AS productivity_crm,
        SUM(CASE WHEN parent_category = 'productivity' AND sub_category = 'other' THEN duration_ms ELSE 0 END) AS productivity_other,
        SUM(CASE WHEN parent_category = 'productivity' AND sub_category = 'presentations' THEN duration_ms ELSE 0 END) AS productivity_presentations,
        SUM(CASE WHEN parent_category = 'productivity' AND sub_category = 'programming' THEN duration_ms ELSE 0 END) AS productivity_programming,
        SUM(CASE WHEN parent_category = 'productivity' AND sub_category = 'project management' THEN duration_ms ELSE 0 END) AS productivity_project_management,
        SUM(CASE WHEN parent_category = 'productivity' AND sub_category = 'spreadsheets' THEN duration_ms ELSE 0 END) AS productivity_spreadsheets,
        SUM(CASE WHEN parent_category = 'productivity' AND sub_category = 'word processing' THEN duration_ms ELSE 0 END) AS productivity_word_processing,
        SUM(CASE WHEN parent_category = 'recreation' AND sub_category = 'travel' THEN duration_ms ELSE 0 END) AS recreation_travel,
        SUM(CASE WHEN parent_category = 'reference' THEN duration_ms ELSE 0 END) AS reference,
        SUM(CASE WHEN parent_category = 'search' THEN duration_ms ELSE 0 END) AS search,
        SUM(CASE WHEN parent_category = 'shopping' THEN duration_ms ELSE 0 END) AS shopping,
        SUM(CASE WHEN parent_category = 'social' AND sub_category = 'social network' THEN duration_ms ELSE 0 END) AS social_social_network,
        SUM(CASE WHEN parent_category = 'social' AND sub_category = 'communication' THEN duration_ms ELSE 0 END) AS social_communication,
        SUM(CASE WHEN parent_category = 'social' AND sub_category = 'communication - live' THEN duration_ms ELSE 0 END) AS social_communication_live
    FROM read_parquet('{WEB_CAT_USAGE}')
    GROUP BY guid, dt
""")

✓ `system_web_cat_pivot_duration`: 4,537,100 rows, 64,276 guids, 30 columns

Columns: `guid, dt, content_creation_photo_edit_creation, content_creation_video_audio_edit_creation, content_creation_web_design_development, education, entertainment_music_audio_streaming, entertainment_other, entertainment_video_streaming, finance, games_other, games_video_games, mail, news, unclassified, private, productivity_crm, productivity_other, productivity_presentations, productivity_programming, productivity_project_management, productivity_spreadsheets, productivity_word_processing, recreation_travel, reference, search, shopping, social_social_network, social_communication, social_communication_live`

Saved to: `../data/raw/reporting/system_web_cat_pivot_duration.parquet`

| | guid | dt | content_creation_photo_edit_creation | content_creation_video_audio_edit_creation | content_creation_web_design_development | education | entertainment_music_audio_streaming | entertainment_other | entertainment_video_streaming | finance | ... | productivity_project_management | productivity_spreadsheets | productivity_word_processing | recreation_travel | reference | search | shopping | social_social_network | social_communication | social_communication_live |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00075517414d434fb03f4e8027f0ab61 | 2022-07-14 | 0.0 | 0.0 | 0.0 | 52229.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 115092.0 | 1054675.0 | 421654.0 | 0.0 | 121848.0 | 0.0 |
| 1 | 000a6e3e9fab43a6a1e1b65fb63bdccd | 2021-08-27 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7695.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1098101.0 | 0.0 | 10553735.0 | 0.0 | 0.0 |
| 2 | 000a6e3e9fab43a6a1e1b65fb63bdccd | 2021-09-02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2986925.0 | 0.0 | 0.0 |


---
## Summary

In [23]:
# Summarize every reporting parquet file: size on disk, row count, distinct guids.
files = sorted(REPORTING.glob("*.parquet"))

lines = [
    "| Table | Size (MB) | Rows | Guids |",
    "|---|---|---|---|",
]

total_size = 0.0
for f in files:
    size_mb = f.stat().st_size / 1e6  # decimal megabytes
    total_size += size_mb
    # A single scan per file returns both counts.
    rows, guids = con.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT guid) FROM read_parquet('{f}')"
    ).fetchone()
    lines.append(f"| `{f.name}` | {size_mb:.1f} | {rows:,} | {guids:,} |")

lines.append(f"| TOTAL | {total_size:.1f} | | |")

display(Markdown("\n".join(lines)))
display(Markdown(f"{len(files)} reporting tables ready for benchmark queries."))


| Table | Size (MB) | Rows | Guids |
|---|---|---|---|
| `system_batt_dc_events.parquet` | 9.2 | 372,673 | 19,780 |
| `system_cpu_metadata.parquet` | 41.1 | 1,000,000 | 1,000,000 |
| `system_display_devices.parquet` | 7384.3 | 220,997,262 | 209,239 |
| `system_frgnd_apps_types.parquet` | 974.4 | 56,755,998 | 55,830 |
| `system_hw_pkg_power.parquet` | 0.7 | 45,133 | 800 |
| `system_memory_utilization.parquet` | 352.4 | 11,671,422 | 69,514 |
| `system_mods_power_consumption.parquet` | 0.1 | 10,000 | 1 |
| `system_mods_top_blocker_hist.parquet` | 1012.1 | 92,460,980 | 65,034 |
| `system_network_consumption.parquet` | 173.1 | 5,721,356 | 37,224 |
| `system_on_off_suspend_time_day.parquet` | 15.2 | 1,582,017 | 36,958 |
| `system_os_codename_history.parquet` | 18.8 | 639,223 | 299,099 |
| `system_pkg_C0.parquet` | 15.9 | 513,353 | 8,943 |
| `system_pkg_avg_freq_mhz.parquet` | 0.1 | 2,563 | 613 |
| `system_pkg_temp_centigrade.parquet` | 0.1 | 2,639 | 622 |
| `system_psys_rap_watts.parquet` | 0.1 | 2,609 | 611 |
| `system_sysinfo_unique_normalized.parquet` | 67.6 | 1,000,000 | 1,000,000 |
| `system_userwait.parquet` | 407.7 | 34,655,557 | 38,142 |
| `system_web_cat_pivot_duration.parquet` | 217.1 | 4,537,100 | 64,276 |
| `system_web_cat_usage.parquet` | 789.6 | 21,354,922 | 64,276 |
| TOTAL | 11479.8 | | |

19 reporting tables ready for benchmark queries.
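
A file count of 19 does not by itself show that the right 19 tables exist. The sketch below is one possible way to compare the directory contents against the table names listed in the summary. The helper `missing_tables` and the constant `EXPECTED_TABLES` are hypothetical names introduced here, and the demonstration runs against a temporary directory of empty placeholder files rather than the real `REPORTING` path.

```python
import tempfile
from pathlib import Path

# Table names taken from the summary table (hypothetical constant name).
EXPECTED_TABLES = [
    "system_batt_dc_events",
    "system_cpu_metadata",
    "system_display_devices",
    "system_frgnd_apps_types",
    "system_hw_pkg_power",
    "system_memory_utilization",
    "system_mods_power_consumption",
    "system_mods_top_blocker_hist",
    "system_network_consumption",
    "system_on_off_suspend_time_day",
    "system_os_codename_history",
    "system_pkg_C0",
    "system_pkg_avg_freq_mhz",
    "system_pkg_temp_centigrade",
    "system_psys_rap_watts",
    "system_sysinfo_unique_normalized",
    "system_userwait",
    "system_web_cat_pivot_duration",
    "system_web_cat_usage",
]


def missing_tables(reporting_dir, expected=EXPECTED_TABLES):
    """Return the expected table names that have no parquet file in reporting_dir."""
    present = {p.stem for p in Path(reporting_dir).glob("*.parquet")}
    return sorted(set(expected) - present)


# Demonstration on a temporary directory with one table deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_TABLES[:-1]:
        (Path(tmp) / f"{name}.parquet").touch()
    demo_missing = missing_tables(tmp)

print(demo_missing)
```

In the notebook itself, calling `missing_tables(REPORTING)` would be expected to return an empty list once all cells have run.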