# Notebook 02: Querying Waggle Data

This notebook demonstrates how to work with raw Array of Things (AoT) data.  
The research paper analyzed the **full AoT deployment** (~500 nodes, 30-second resolution).  

In this repository, we provide a **subset** (10 nodes, 6 sensors, 27 days) for reproducibility.  

We demonstrate:
- How to filter raw traces using the provided `wg_datatool.py` utility.  
- How to load metadata (`nodes.csv`, `sensors.csv`) from the `/data/metadata/` folder.  
- How this subset workflow connects to the full Waggle system.

> ⚠️ **Note**: The Waggle API allows direct querying of nodes, sensors, and time ranges.  
> Here, we use the sample raw trace and processing scripts for demonstration.

In [2]:
# Imports needed

import subprocess
import pandas as pd
import os


In [5]:
# Load metadata directly from the metadata folder
nodes = pd.read_csv("../data/metadata/nodes.csv")
sensors = pd.read_csv("../data/metadata/sensors.csv")

print("Nodes metadata shape:", nodes.shape)
print("Sensors metadata shape:", sensors.shape)

display(nodes.head(), sensors.head())

Nodes metadata shape: (126, 9)
Sensors metadata shape: (193, 8)


Unnamed: 0,node_id,project_id,vsn,address,lat,lon,description,start_timestamp,end_timestamp
0,001e0610ba46,AoT_Chicago,004,State St & Jackson Blvd Chicago IL,41.878377,-87.627678,AoT Chicago (S) [C],2017/10/09 00:00:00,
1,001e0610ba3b,AoT_Chicago,006,18th St & Lake Shore Dr Chicago IL,41.858136,-87.616055,AoT Chicago (S),2017/08/08 00:00:00,
2,001e0610f02f,AoT_Chicago,00A,Lake Shore Drive & Fullerton Ave Chicago IL,41.926261,-87.630758,AoT Chicago (S) [CA],2018/05/07 00:00:00,
3,001e0610ba8f,AoT_Chicago,00D,Cornell & 47th St Chicago IL,41.810342,-87.590228,AoT Chicago (S),2017/08/08 00:00:00,
4,001e0610ba16,AoT_Chicago,010,Homan Ave & Roosevelt Rd Chicago IL,41.866349,-87.710543,AoT Chicago (S) [C],2018/07/18 00:00:00,


Unnamed: 0,ontology,subsystem,sensor,parameter,hrf_unit,hrf_minval,hrf_maxval,datasheet
0,/sensing/air_quality/gases/co,chemsense,co,concentration,ppm,0.0,1000.0,https://github.com/waggle-sensor/sensors/raw/m...
1,/sensing/air_quality/gases/h2s,chemsense,h2s,concentration,ppm,0.0,50.0,https://github.com/waggle-sensor/sensors/raw/m...
2,/sensing/air_quality/gases/no2,chemsense,no2,concentration,ppm,0.0,20.0,https://github.com/waggle-sensor/sensors/raw/m...
3,/sensing/air_quality/gases/o3,chemsense,o3,concentration,ppm,0.0,20.0,https://github.com/waggle-sensor/sensors/raw/m...
4,/sensing/air_quality/gases/oxidizing_gases,chemsense,oxidizing_gases,concentration,ppm,0.0,100.0,https://github.com/waggle-sensor/sensors/blob/...
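
The `hrf_minval` and `hrf_maxval` columns in `sensors.csv` give each sensor's valid reporting range, so they can be joined to readings to flag out-of-range values. The sketch below is a minimal illustration, assuming the readings share the `sensor` and `parameter` columns with the metadata. The rows are hypothetical stand-ins built inline, not values taken from the repository files.

```python
import pandas as pd

# Hypothetical rows shaped like sensors.csv (only the columns needed here)
sensors_meta = pd.DataFrame({
    "sensor": ["no2", "co"],
    "parameter": ["concentration", "concentration"],
    "hrf_minval": [0.0, 0.0],
    "hrf_maxval": [20.0, 1000.0],
})

# Hypothetical readings shaped like the raw trace
readings = pd.DataFrame({
    "node_id": ["001e0610ee36", "001e0610ee43", "001e0610ee36"],
    "sensor": ["no2", "no2", "co"],
    "parameter": ["concentration", "concentration", "concentration"],
    "value_hrf": [0.0078, 25.0, 1200.0],
})

# Attach each reading's valid range, then flag values outside it
merged = readings.merge(sensors_meta, on=["sensor", "parameter"], how="left")
merged["in_range"] = merged["value_hrf"].between(
    merged["hrf_minval"], merged["hrf_maxval"]
)

print(merged[["node_id", "sensor", "value_hrf", "in_range"]])
```

A left merge keeps every reading even when its sensor is missing from the metadata. Those rows get `NaN` bounds and are flagged as out of range.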



In [9]:
# Use wg_datatool.py to filter the sample raw trace
input_file = "../data/raw/sample_raw_trace.csv"
output_file = "../data/raw/query_output.csv"

# Example: filter for "no2" records
subprocess.run([
    "python", "../scripts/query/wg_datatool.py",
    "-i", input_file,
    "-o", output_file,
    "-g", "no2"
])

[ INFO  ] Took 0.00 seconds for input file indexing
[ INFO  ] Took 0.31 seconds for the manipulation
[ INFO  ] Took 0.00 seconds for merging output
[ INFO  ] Manipulation is completed.


CompletedProcess(args=['python', '../scripts/query/wg_datatool.py', '-i', '../data/raw/sample_raw_trace.csv', '-o', '../data/raw/query_output.csv', '-g', 'no2'], returncode=0)
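
The `returncode=0` above indicates that the script exited normally. As a hedged sketch, the call can be wrapped so that a non-zero exit raises an error and the log text is captured for inspection. `run_tool` is a hypothetical helper, not part of the repository, and the command below is a stand-in so the example runs without `wg_datatool.py`.

```python
import subprocess
import sys

def run_tool(args):
    """Run a Python command with the current interpreter; raise if it exits non-zero."""
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,  # keep stdout/stderr instead of printing them
        text=True,            # decode the captured output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )

# Stand-in command; in the notebook the arguments would be the
# wg_datatool.py path and the flags used in the cell above.
result = run_tool(["-c", "print('Manipulation is completed.')"])
print(result.returncode, result.stdout.strip())
```

Using `sys.executable` rather than the bare name `python` runs the script with the interpreter that backs the notebook kernel, which helps when several Python installations are present.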

⚠️ Note: `wg_datatool.py` looks for the metadata files (`nodes.csv`, `sensors.csv`) in the same 
folder as the input file. Because this repository stores them separately in `/data/metadata/`, 
the script may print warnings. They can be safely ignored: the filter is still applied.

In [10]:
if os.path.exists(output_file):
    df_query = pd.read_csv(output_file)
    print("Filtered data shape:", df_query.shape)
    display(df_query.head())
else:
    print("No output file created.")

Filtered data shape: (72, 5)


Unnamed: 0,timestamp,node_id,sensor,parameter,value_hrf
0,2020-01-12 00:00:00,001e0610ee36,no2,concentration,0.0078
1,2020-01-12 00:00:00,001e0610ee43,no2,concentration,0.041622
2,2020-01-12 00:00:00,001e06113107,no2,concentration,0.014561
3,2020-01-12 01:00:00,001e0610ee36,no2,concentration,0.009984
4,2020-01-12 01:00:00,001e0610ee43,no2,concentration,0.047444
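
For comparison, a similar selection can be written directly in pandas. This is a minimal sketch, assuming the `-g "no2"` filter keeps the rows whose `sensor` column is `no2`, which is what the output above suggests but is not confirmed here. The rows are hypothetical and only mirror the layout of `sample_raw_trace.csv`.

```python
import pandas as pd

# Hypothetical rows in the same layout as the sample raw trace
trace = pd.DataFrame({
    "timestamp": ["2020-01-12 00:00:00"] * 3 + ["2020-01-12 01:00:00"],
    "node_id": ["001e0610ee36", "001e0610ee36", "001e0610ee43", "001e0610ee36"],
    "sensor": ["hih6130", "no2", "no2", "no2"],
    "parameter": ["humidity", "concentration", "concentration", "concentration"],
    "value_hrf": [100.0, 0.0078, 0.041622, 0.009984],
})
trace["timestamp"] = pd.to_datetime(trace["timestamp"])

# Keep only the NO2 rows, then summarize them per node
no2 = trace[trace["sensor"] == "no2"]
per_node = no2.groupby("node_id")["value_hrf"].agg(["count", "mean"])

print(no2.shape)
print(per_node)
```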


In [8]:
# Load the sample raw trace directly (without filtering)
df_sample = pd.read_csv(input_file)
print("Sample raw trace shape:", df_sample.shape)
display(df_sample.head())

Sample raw trace shape: (576, 5)


Unnamed: 0,timestamp,node_id,sensor,parameter,value_hrf
0,2020-01-12 00:00:00,001e0610ee36,hih6130,humidity,100.0
1,2020-01-12 00:00:00,001e0610ee36,hih6130,temperature,125.01
2,2020-01-12 00:00:00,001e0610ee36,htu21d,humidity,118.99
3,2020-01-12 00:00:00,001e0610ee36,htu21d,temperature,128.86
4,2020-01-12 00:10:00,001e0610ee36,hih6130,humidity,100.0


## Notes

- The **full AoT dataset** contains ~500 nodes and is accessed through the Waggle repository.  
- This repository provides only a **subset** and a **sample raw trace** for reproducibility.  
- `wg_datatool.py` demonstrates how to process raw CSVs once downloaded.  

### About the Warnings
When running `wg_datatool.py`, you may see warnings such as:

[WARNING] nodes.csv not exist under ../data/raw. Ignore "nodes." operations...
[WARNING] sensors.csv not exist under ../data/raw. Ignore "sensors." operations...


These occur because the script expects the metadata files (`nodes.csv`, `sensors.csv`)  
to be in the same folder as the input file. Because this repository keeps them in `/data/metadata/`,  
the warnings are harmless and can be safely ignored.  

Filtering and manipulation still run correctly (`Manipulation is completed`).
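
If the warnings are distracting, one option is to stage the trace and the metadata files in a single working folder before running the script. The sketch below is an assumption-laden illustration: `stage_inputs` is a hypothetical helper, and the only behavior relied on is what the warning text states, namely that the script looks for `nodes.csv` and `sensors.csv` next to the input file. Throwaway files stand in for the repository data so the example runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path

def stage_inputs(trace_path, metadata_dir, work_dir):
    """Copy the trace plus nodes.csv and sensors.csv into one working folder."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    staged = shutil.copy(trace_path, work_dir)  # returns the new file's path
    for name in ("nodes.csv", "sensors.csv"):
        shutil.copy(Path(metadata_dir) / name, work_dir)
    return Path(staged)

# Demonstration with placeholder files (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    metadata_dir = root / "metadata"
    raw_dir = root / "raw"
    metadata_dir.mkdir()
    raw_dir.mkdir()
    (metadata_dir / "nodes.csv").write_text("node_id,vsn\n001e0610ba46,004\n")
    (metadata_dir / "sensors.csv").write_text("sensor,parameter\nno2,concentration\n")
    (raw_dir / "sample_raw_trace.csv").write_text(
        "timestamp,node_id,sensor,parameter,value_hrf\n"
    )

    staged_trace = stage_inputs(
        raw_dir / "sample_raw_trace.csv", metadata_dir, root / "work"
    )
    staged_names = sorted(p.name for p in staged_trace.parent.iterdir())

print(staged_names)
```

The staged trace path would then be passed to the script as the `-i` argument. Copying rather than moving leaves `/data/metadata/` untouched.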