# Time Series Data Management with JSONL

This notebook demonstrates how to use the `jsonlfile` module of the `jsonldb` package to manage time series data. We'll simulate sensor data collection and work through operations such as:
- Generating sensor readings with realistic patterns
- Storing and retrieving time series data using datetime keys
- Analyzing data within specific time ranges
- Applying calibration updates
- Implementing data retention policies

## Setup
First, let's import the required libraries and set up our environment.

In [1]:
import sys
import os
import random
import math
from datetime import datetime, timedelta

# Add the directory two levels above the working directory to the path
# so that the jsonldb package can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.getcwd())))

from jsonldb.jsonlfile import save_jsonl, load_jsonl, select_jsonl, update_jsonl, delete_jsonl, lint_jsonl

## Generate Sensor Data

We'll create a function that generates realistic sensor data with the following characteristics:
- Temperature follows a 24-hour sine cycle (amplitude 5 around a baseline of 22)
- Humidity inversely correlates with temperature
- Random noise is added to make the data more realistic
- Each sensor has a slight bias to simulate real-world variations
- Data is keyed by datetime objects for efficient time-based operations

In [2]:
def generate_sensor_data(start_time, num_points, sensor_id):
    """Generate simulated sensor data with realistic patterns.
    
    Args:
        start_time (datetime): Starting time for the data series
        num_points (int): Number of data points to generate
        sensor_id (str): Identifier for the sensor (affects baseline values)
    
    Returns:
        dict: Dictionary of timestamped sensor readings with datetime keys
    """
    data = {}
    sensor_num = int(sensor_id.split('_')[1])
    
    # Add slight bias per sensor
    temp_bias = sensor_num * 0.5
    humidity_bias = sensor_num * -1.0
    
    for i in range(num_points):
        timestamp = start_time + timedelta(minutes=5*i)
        # Time of day in radians (0 to 2π)
        time_of_day = (timestamp.hour * 3600 + timestamp.minute * 60) * 2 * math.pi / 86400
        
        # Base temperature varies in a sine wave (24-hour cycle)
        base_temp = 22 + 5 * math.sin(time_of_day)
        temperature = base_temp + temp_bias + random.uniform(-0.5, 0.5)
        
        # Humidity inversely correlates with temperature
        base_humidity = 60 - 2 * (temperature - 22)
        humidity = base_humidity + humidity_bias + random.uniform(-2, 2)
        
        # Ensure values are in realistic ranges
        temperature = round(max(10, min(35, temperature)), 2)
        humidity = round(max(30, min(90, humidity)), 2)
        
        data[timestamp] = {
            "temperature": temperature,
            "humidity": humidity,
            "sensor_id": sensor_id,
            "status": "active"
        }
    
    return data

# Generate one hour of data per sensor (12 readings at 5-minute intervals)
start_time = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
points_per_sensor = 12

# NOTE: all three sensors share the same timestamps, so each update() call
# overwrites the previous sensor's readings; only sensor_02's readings remain.
all_data = {}
for sensor_id in [f"sensor_{i:02d}" for i in range(3)]:
    sensor_data = generate_sensor_data(start_time, points_per_sensor, sensor_id)
    all_data.update(sensor_data)

print(f"Generated {len(all_data)} readings across 3 sensors")
print("\nSample reading:")
sample_key = next(iter(all_data))
print(f"{sample_key}: {all_data[sample_key]}")

Generated 12 readings across 3 sensors

Sample reading:
2025-03-26 00:00:00: {'temperature': 22.53, 'humidity': 55.86, 'sensor_id': 'sensor_02', 'status': 'active'}
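
To see what the daily cycle in `generate_sensor_data` looks like without noise or sensor bias, the sketch below evaluates the same baseline expression, `22 + 5 * sin(angle)`, at a few times of day. The helper name `base_temperature` is introduced here only for illustration.

```python
import math

def base_temperature(hour, minute=0):
    """Noise-free baseline from the generator: 22 + 5 * sin(angle)."""
    seconds = hour * 3600 + minute * 60
    angle = seconds * 2 * math.pi / 86400  # time of day in radians (0 to 2*pi)
    return 22 + 5 * math.sin(angle)

# Evaluate the baseline at four evenly spaced times of day
for hour in (0, 6, 12, 18):
    print(f"{hour:02d}:00 -> {base_temperature(hour):.2f}")
```

With this phase the baseline peaks at 06:00 (27.00) and reaches its minimum at 18:00 (17.00). Subtracting `math.pi / 2` from the angle would move the peak to noon, if a midday maximum is preferred.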


In [3]:
len(all_data)

12
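
The count is 12 rather than 36 because the three sensors produce identical timestamps, and merging dictionaries keyed only by timestamp keeps one reading per key. One possible way to keep every reading, sketched below with the standard library only, is to hold a separate dictionary per sensor (which could then be saved to one file per sensor). The variable names and the fixed example date are assumptions made for this illustration.

```python
from datetime import datetime, timedelta

start = datetime(2025, 3, 26)  # arbitrary example date
timestamps = [start + timedelta(minutes=5 * i) for i in range(12)]
sensor_ids = [f"sensor_{i:02d}" for i in range(3)]

# Keyed by timestamp only: each update() overwrites the previous sensor.
by_time = {}
for sensor_id in sensor_ids:
    by_time.update({ts: {"sensor_id": sensor_id} for ts in timestamps})

# Keyed per sensor: one dictionary of readings for each sensor.
by_sensor = {
    sensor_id: {ts: {"sensor_id": sensor_id} for ts in timestamps}
    for sensor_id in sensor_ids
}

total = sum(len(readings) for readings in by_sensor.values())
print(len(by_time), total)  # 12 36
```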

## Save Time Series Data

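Now we'll save our sensor data to a JSONL file. `save_jsonl` handles the datetime keys during serialization and, as the check in the next cell shows, also writes an index file (`sensor_data.jsonl.idx`) next to the data file.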

In [4]:
print("Saving sensor data...")
save_jsonl("sensor_data.jsonl", all_data)

# Verify the files were created
print(f"\nJSONL file exists: {os.path.exists('sensor_data.jsonl')}")
print(f"Index file exists: {os.path.exists('sensor_data.jsonl.idx')}")

Saving sensor data...

JSONL file exists: True
Index file exists: True
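
The on-disk layout that `jsonldb` uses is not shown in this notebook. As a rough illustration of how datetime keys can survive a round trip through JSON, the sketch below writes one object per line with ISO 8601 string keys and parses them back. The functions `to_jsonl` and `from_jsonl` are hypothetical helpers, not part of `jsonldb`.

```python
import json
from datetime import datetime

def to_jsonl(records):
    """Serialize {datetime: dict} as one JSON object per line, sorted by key."""
    lines = []
    for key in sorted(records):
        # JSON object keys must be strings, so store the timestamp as ISO 8601
        lines.append(json.dumps({key.isoformat(): records[key]}))
    return "\n".join(lines)

def from_jsonl(text):
    """Parse the text produced by to_jsonl back into {datetime: dict}."""
    records = {}
    for line in text.splitlines():
        for key, value in json.loads(line).items():
            records[datetime.fromisoformat(key)] = value
    return records

original = {
    datetime(2025, 3, 26, 0, 5): {"temperature": 22.4, "humidity": 58.1},
    datetime(2025, 3, 26, 0, 0): {"temperature": 22.5, "humidity": 55.9},
}
text = to_jsonl(original)
restored = from_jsonl(text)
print(restored == original)  # True
```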


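## Select Time Series Data Within a Time Range

`select_jsonl` takes a lower and an upper key and returns the records whose keys fall in that range.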

In [7]:
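# Select the first 25 minutes of the generated data. The bounds are derived
# from start_time so that they always match the data generated above.
start_date = start_time
end_date = start_time + timedelta(minutes=25)

select_data = select_jsonl("sensor_data.jsonl", start_date, end_date)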
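
To illustrate why sorted datetime keys make range queries cheap, the sketch below selects a time window from a sorted key list with binary search. It is a standard-library illustration only and does not claim to reproduce how `select_jsonl` works internally. The helper `select_range` and the choice of inclusive bounds are assumptions of this sketch.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def select_range(sorted_keys, lower, upper):
    """Return the keys k with lower <= k <= upper from a sorted list."""
    lo = bisect_left(sorted_keys, lower)   # first index with key >= lower
    hi = bisect_right(sorted_keys, upper)  # first index with key > upper
    return sorted_keys[lo:hi]

start = datetime(2025, 3, 26)  # arbitrary example date
keys = [start + timedelta(minutes=5 * i) for i in range(12)]

hits = select_range(keys, start, start + timedelta(minutes=25))
print(len(hits))  # 6
```

Each lookup costs two binary searches plus the size of the slice, so the cost of a query depends on the window rather than on the total number of stored readings.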

## Lint Data

In [8]:
lint_jsonl("sensor_data.jsonl")
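
The calibration and retention steps listed in the introduction are not exercised in the cells of this notebook. The sketch below shows, on an in-memory dictionary, how the inputs for such steps might be prepared: a dictionary of corrected readings and a list of expired keys. `update_jsonl` and `delete_jsonl` are imported above, but their signatures are not shown here, so the sketch stops short of calling them. The helper names and the offset value are hypothetical.

```python
from datetime import datetime, timedelta

def apply_calibration(readings, temp_offset):
    """Return a copy of the readings with an offset added to each temperature."""
    return {
        ts: {**rec, "temperature": round(rec["temperature"] + temp_offset, 2)}
        for ts, rec in readings.items()
    }

def keys_older_than(readings, cutoff):
    """Return the timestamps that fall strictly before the cutoff."""
    return [ts for ts in readings if ts < cutoff]

start = datetime(2025, 3, 26)  # arbitrary example date
readings = {
    start + timedelta(minutes=5 * i): {"temperature": 22.0, "status": "active"}
    for i in range(12)
}

calibrated = apply_calibration(readings, temp_offset=-0.5)
expired = keys_older_than(readings, cutoff=start + timedelta(minutes=30))
print(calibrated[start]["temperature"], len(expired))  # 21.5 6
```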

## Cleanup

Finally, let's clean up our test files.

In [9]:
print("Cleaning up...")
os.remove("sensor_data.jsonl")
os.remove("sensor_data.jsonl.idx")
print("Done!")

Cleaning up...
Done!
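
`os.remove` raises `FileNotFoundError` when a file is already gone, for example if the cleanup cell is run twice. A more forgiving variant, sketched below with a hypothetical helper `remove_if_present`, skips missing files and reports what it actually removed. The demonstration runs in a temporary directory so that it does not touch real data.

```python
import tempfile
from pathlib import Path

def remove_if_present(*paths):
    """Delete each path that exists and return the names of the removed files."""
    removed = []
    for path in map(Path, paths):
        if path.exists():
            path.unlink()
            removed.append(path.name)
    return removed

with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "sensor_data.jsonl"
    data_file.write_text("{}\n")
    # The index file is deliberately not created, to show that it is skipped
    removed = remove_if_present(data_file, Path(tmp) / "sensor_data.jsonl.idx")

print(removed)  # ['sensor_data.jsonl']
```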
