# Time Series Data with FolderDB

This notebook demonstrates how to work with time series data using the FolderDB class. It covers:
- Generating time series data with datetime keys
- Storing and retrieving time-based records
- Performing range queries with timestamps
- Calculating statistics on time series data
- Applying a calibration, deleting old records, and cleaning up

## Setup and Imports

First, let's import the required libraries and set up our environment.

In [2]:
import os
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from folderdb import FolderDB

## Initialize Database

Let's create a folder for our database and initialize the FolderDB instance.

In [3]:
# Create a folder for our database
db_folder = "timeseries_db"
os.makedirs(db_folder, exist_ok=True)

# Initialize the database
db = FolderDB(db_folder)

## Generate Sample Data

Let's create a function to generate sample sensor data with temperature, humidity, and pressure readings.

In [4]:
def generate_sensor_data(start_time: datetime, duration_minutes: int, interval_minutes: int = 1) -> pd.DataFrame:
    """Generate sample sensor data.
    
    Args:
        start_time: Starting datetime
        duration_minutes: Duration in minutes
        interval_minutes: Time interval between readings in minutes
        
    Returns:
        DataFrame with sensor readings
    """
    # Generate timestamps
    timestamps = [start_time + timedelta(minutes=i) for i in range(0, duration_minutes, interval_minutes)]
    
    # Generate random sensor data
    data = {
        'temperature': np.random.normal(25, 2, len(timestamps)),
        'humidity': np.random.normal(60, 5, len(timestamps)),
        'pressure': np.random.normal(1013, 5, len(timestamps))
    }
    
    # Create DataFrame and round every reading to one decimal place
    df = pd.DataFrame(data, index=timestamps).round(1)
    
    return df

# Generate data for two sensors
start_time = datetime.now() - timedelta(hours=1)
sensor1_data = generate_sensor_data(start_time, 60)
sensor2_data = generate_sensor_data(start_time, 60)

print("Sensor 1 Data (first 5 records):")
display(sensor1_data.head())
print("\nSensor 2 Data (first 5 records):")
display(sensor2_data.head())

Sensor 1 Data (first 5 records):


| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26 10:44:35.183498 | 23.6 | 52.7 | 1005.9 |
| 2025-03-26 10:45:35.183498 | 23.4 | 60.6 | 1014.5 |
| 2025-03-26 10:46:35.183498 | 27.8 | 52.5 | 1017.1 |
| 2025-03-26 10:47:35.183498 | 24.1 | 61.7 | 1016.1 |
| 2025-03-26 10:48:35.183498 | 28.8 | 57.6 | 1010.0 |



Sensor 2 Data (first 5 records):


| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26 10:44:35.183498 | 24.2 | 65.2 | 1002.4 |
| 2025-03-26 10:45:35.183498 | 25.4 | 58.6 | 1015.1 |
| 2025-03-26 10:46:35.183498 | 24.9 | 63.0 | 1015.4 |
| 2025-03-26 10:47:35.183498 | 24.6 | 58.5 | 1011.1 |
| 2025-03-26 10:48:35.183498 | 27.0 | 68.2 | 1002.1 |
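
Because `generate_sensor_data` draws from NumPy's global random state, the readings shown above differ on every run. The sketch below is one way to make the sample data repeatable. The function name `generate_seeded_sensor_data` and its `seed` parameter are hypothetical additions for illustration, not part of the original notebook.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def generate_seeded_sensor_data(start_time, duration_minutes, seed, interval_minutes=1):
    """Hypothetical variant of generate_sensor_data using a seeded generator."""
    rng = np.random.default_rng(seed)  # the same seed yields the same readings
    timestamps = [
        start_time + timedelta(minutes=i)
        for i in range(0, duration_minutes, interval_minutes)
    ]
    n = len(timestamps)
    data = {
        "temperature": rng.normal(25, 2, n),
        "humidity": rng.normal(60, 5, n),
        "pressure": rng.normal(1013, 5, n),
    }
    # Round all readings to one decimal place, as in the original function
    return pd.DataFrame(data, index=timestamps).round(1)


# A fixed start time (instead of datetime.now()) keeps the index repeatable too
fixed_start = datetime(2025, 3, 26, 10, 44, 35)
run_a = generate_seeded_sensor_data(fixed_start, 60, seed=1)
run_b = generate_seeded_sensor_data(fixed_start, 60, seed=1)
print(run_a.equals(run_b))
```

Passing a different seed per sensor would keep the two series distinct while still being reproducible.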


## Save Data to Database

Now let's save our sensor data to the database using the `upsert_df` method.

In [5]:
# Save DataFrames to database
db.upsert_df("sensor1", sensor1_data)
db.upsert_df("sensor2", sensor2_data)

print("Database state after saving:")
print(str(db))

Database state after saving:
FolderDB at timeseries_db
--------------------------------------------------
Found 2 JSONL files:

sensor1.jsonl:
  Size: 5160 bytes
  Key range: 2025-03-26T10:44:35.183498 to 2025-03-26T11:43:35.183498
  Count: 60

sensor2.jsonl:
  Size: 5159 bytes
  Key range: 2025-03-26T10:44:35.183498 to 2025-03-26T11:43:35.183498
  Count: 60
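
The summary above reports one `.jsonl` file per sensor, with keys in ISO 8601 form. As a rough illustration of how DataFrame rows with a datetime index can be turned into JSON Lines, the sketch below writes one JSON object per row. The `{"key": ..., "value": ...}` record layout is an assumption made for this example and may not match what FolderDB actually writes.

```python
import io
import json
from datetime import datetime, timedelta

import pandas as pd

# Small frame with a datetime index, standing in for the sensor data
start = datetime(2025, 3, 26, 10, 44, 35, 183498)
frame = pd.DataFrame(
    {"temperature": [23.6, 23.4], "humidity": [52.7, 60.6]},
    index=[start, start + timedelta(minutes=1)],
)

# Hypothetical record layout: one JSON object per line, keyed by ISO timestamp
buffer = io.StringIO()
for timestamp, row in frame.iterrows():
    record = {"key": timestamp.isoformat(), "value": row.to_dict()}
    buffer.write(json.dumps(record) + "\n")

lines = buffer.getvalue().splitlines()
first = json.loads(lines[0])
print(first["key"], first["value"])
```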


## Query Recent Data

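Let's query the last 30 minutes of data from both sensors. `get_df` accepts a list of names together with `lower_key` and `upper_key` bounds, and the result is indexed by sensor name.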

In [6]:
# Get current time and calculate time range
end_time = datetime.now()
start_time = end_time - timedelta(minutes=30)

# Query recent data
recent_data = db.get_df(["sensor1", "sensor2"], lower_key=start_time, upper_key=end_time)

print("Recent Sensor 1 Data:")
display(recent_data["sensor1"].head())
print("\nRecent Sensor 2 Data:")
display(recent_data["sensor2"].head())

Recent Sensor 1 Data:


| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26T11:15:35.183498 | 24.5 | 64.5 | 1013.6 |
| 2025-03-26T11:16:35.183498 | 23.7 | 59.9 | 1016.1 |
| 2025-03-26T11:17:35.183498 | 29.7 | 60.1 | 1014.3 |
| 2025-03-26T11:18:35.183498 | 23.3 | 58.8 | 1005.7 |
| 2025-03-26T11:19:35.183498 | 23.4 | 45.9 | 1016.6 |



Recent Sensor 2 Data:


| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26T11:15:35.183498 | 28.1 | 46.9 | 1009.2 |
| 2025-03-26T11:16:35.183498 | 25.8 | 56.4 | 1016.4 |
| 2025-03-26T11:17:35.183498 | 24.8 | 51.4 | 1015.5 |
| 2025-03-26T11:18:35.183498 | 26.7 | 54.7 | 1010.6 |
| 2025-03-26T11:19:35.183498 | 24.1 | 58.7 | 1006.5 |
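
The keys in the query results above appear as ISO 8601 strings. Timestamps written in one consistent ISO 8601 format sort lexicographically in chronological order, so a range query over such keys can be expressed as plain string comparison. The sketch below illustrates that property with the standard library only. Whether FolderDB compares keys this way, and whether its bounds are inclusive, are assumptions here.

```python
from datetime import datetime, timedelta

# Sixty one-minute keys in the same format as the notebook output
start = datetime(2025, 3, 26, 10, 44, 35, 183498)
keys = [
    (start + timedelta(minutes=i)).isoformat(timespec="microseconds")
    for i in range(60)
]

# Bounds for the last 30 keys, also rendered as ISO strings
lower = (start + timedelta(minutes=30)).isoformat(timespec="microseconds")
upper = (start + timedelta(minutes=59)).isoformat(timespec="microseconds")

# Plain string comparison; inclusive bounds are an assumption for this sketch
in_range = [key for key in keys if lower <= key <= upper]
print(len(in_range), in_range[0], in_range[-1])
```

Using `timespec="microseconds"` keeps every key the same width, which avoids mixing strings with and without a fractional part.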


## Calculate Statistics

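Let's calculate some basic statistics on the sensor data. Calling `get_df` without key bounds retrieves every record, and pandas' `describe` then summarizes each column.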

In [7]:
# Get all data
all_data = db.get_df(["sensor1", "sensor2"])

print("Sensor 1 Statistics:")
display(all_data["sensor1"].describe())
print("\nSensor 2 Statistics:")
display(all_data["sensor2"].describe())

Sensor 1 Statistics:


| statistic | temperature | humidity | pressure |
|---|---|---|---|
| count | 60.0 | 60.0 | 60.0 |
| mean | 24.8 | 60.01 | 1013.113333 |
| std | 1.995418 | 5.399049 | 4.298698 |
| min | 20.1 | 45.6 | 1004.3 |
| 25% | 23.4 | 57.125 | 1010.9 |
| 50% | 24.45 | 60.0 | 1012.95 |
| 75% | 26.05 | 63.275 | 1015.8 |
| max | 29.7 | 71.8 | 1025.0 |



Sensor 2 Statistics:


| statistic | temperature | humidity | pressure |
|---|---|---|---|
| count | 60.0 | 60.0 | 60.0 |
| mean | 25.333333 | 60.818333 | 1012.588333 |
| std | 1.937994 | 4.89872 | 5.751583 |
| min | 21.7 | 46.9 | 993.2 |
| 25% | 23.9 | 58.25 | 1009.15 |
| 50% | 24.9 | 60.7 | 1014.15 |
| 75% | 26.325 | 64.7 | 1016.175 |
| max | 32.1 | 69.5 | 1023.5 |
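
`describe` summarizes the whole series at once. For time-based statistics, such as an average per 10-minute window, the string keys first need to become a `DatetimeIndex`. The sketch below shows this with pandas alone on synthetic readings. The frame `readings` is a stand-in built for the example, not data returned by FolderDB.

```python
import pandas as pd

# String keys in ISO form, standing in for the index of a queried frame
string_index = pd.date_range(
    "2025-03-26 10:40:00", periods=60, freq="min"
).strftime("%Y-%m-%dT%H:%M:%S")

# Synthetic readings that cycle through 20.0 .. 29.0 every ten minutes
readings = pd.DataFrame(
    {"temperature": [20.0 + (i % 10) for i in range(60)]},
    index=string_index,
)

# Parse the string keys so pandas can group by time
readings.index = pd.to_datetime(readings.index)

# Mean temperature per 10-minute window
ten_minute_mean = readings.resample("10min").mean()
print(ten_minute_mean)
```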


## Apply Calibration

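Let's apply a calibration factor to one of the sensors: we scale sensor 1's temperature readings by 1.1 on a copy of the data and write the result back with `upsert_df`.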

In [8]:
# Apply calibration to sensor1
calibration_factor = 1.1
sensor1_calibrated = all_data["sensor1"].copy()
sensor1_calibrated['temperature'] *= calibration_factor

# Save calibrated data
db.upsert_df("sensor1", sensor1_calibrated)

print("Calibrated Sensor 1 Data (first 5 records):")
display(sensor1_calibrated.head())

Calibrated Sensor 1 Data (first 5 records):


| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26T10:44:35.183498 | 25.96 | 52.7 | 1005.9 |
| 2025-03-26T10:45:35.183498 | 25.74 | 60.6 | 1014.5 |
| 2025-03-26T10:46:35.183498 | 30.58 | 52.5 | 1017.1 |
| 2025-03-26T10:47:35.183498 | 26.51 | 61.7 | 1016.1 |
| 2025-03-26T10:48:35.183498 | 31.68 | 57.6 | 1010.0 |
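
Multiplying floats in place can leave small binary rounding residue in the stored values, and real calibrations often include an offset as well as a gain. The sketch below wraps the step in a small helper that works on a copy and rounds the result. The helper `apply_linear_calibration` and its `offset` and `decimals` parameters are hypothetical and only illustrate the idea.

```python
import pandas as pd


def apply_linear_calibration(frame, column, gain, offset=0.0, decimals=2):
    """Return a copy of frame with column replaced by gain * value + offset."""
    calibrated = frame.copy()  # leave the caller's data untouched
    calibrated[column] = (calibrated[column] * gain + offset).round(decimals)
    return calibrated


# First three sensor 1 readings from the notebook output
raw = pd.DataFrame(
    {"temperature": [23.6, 23.4, 27.8], "humidity": [52.7, 60.6, 52.5]},
    index=pd.to_datetime([
        "2025-03-26T10:44:35.183498",
        "2025-03-26T10:45:35.183498",
        "2025-03-26T10:46:35.183498",
    ]),
)

calibrated = apply_linear_calibration(raw, "temperature", gain=1.1)
print(calibrated["temperature"].tolist())
```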


## Delete Old Data

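Let's delete data older than 30 minutes by calling `delete_file_range` with no lower bound and the cutoff time as the upper bound. Note that in the output recorded below, the key ranges and counts are not reduced by this call, and `sensor1.jsonl` reports 119 records rather than 60 following the calibration upsert, so it is worth checking the result against the expected 30-minute window.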

In [9]:
# Delete old data from both sensors
cutoff_time = datetime.now() - timedelta(minutes=30)

db.delete_file_range("sensor1", None, cutoff_time)
db.delete_file_range("sensor2", None, cutoff_time)

print("Database state after deletion:")
print(str(db))

Database state after deletion:
FolderDB at timeseries_db
--------------------------------------------------
Found 2 JSONL files:

sensor1.jsonl:
  Size: 10813 bytes
  Key range: 2025-03-26T10:44:35.183498 to 2025-03-26T11:43:35.183498
  Count: 119

sensor2.jsonl:
  Size: 5159 bytes
  Key range: 2025-03-26T10:44:35.183498 to 2025-03-26T11:43:35.183498
  Count: 60
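
For comparison, the intended end state (one record per timestamp, nothing older than the cutoff) can be expressed on an in-memory DataFrame with pandas alone. The sketch below uses a tiny hand-built frame with one duplicated timestamp. It does not call FolderDB and makes no claim about how `delete_file_range` or `upsert_df` behave internally.

```python
import pandas as pd

# Four records, two of which share a timestamp (an original and a recalibrated value)
index = pd.to_datetime([
    "2025-03-26T10:44:35",
    "2025-03-26T11:20:35",
    "2025-03-26T11:20:35",
    "2025-03-26T11:43:35",
])
frame = pd.DataFrame({"temperature": [23.6, 24.5, 26.95, 25.1]}, index=index)

cutoff = pd.Timestamp("2025-03-26T11:13:35")

# Keep only the last record written for each timestamp
deduplicated = frame[~frame.index.duplicated(keep="last")]

# Drop everything older than the cutoff
recent = deduplicated[deduplicated.index >= cutoff]
print(recent)
```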


## Cleanup

Finally, let's clean up by removing the database folder and its contents.

In [10]:
# Cleanup
for file in os.listdir(db_folder):
    os.remove(os.path.join(db_folder, file))
os.rmdir(db_folder)

print("Database folder has been cleaned up.")

Database folder has been cleaned up.
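
An alternative to manual cleanup is to let the standard library manage a scratch folder. The sketch below uses `tempfile.TemporaryDirectory`, which removes the folder and its contents when the `with` block exits. The sample file written inside is a placeholder, and passing the folder path to `FolderDB` is left as a comment since it is not exercised here.

```python
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="timeseries_db_") as scratch_folder:
    # A database instance could be created here, e.g. FolderDB(scratch_folder)
    sample_path = os.path.join(scratch_folder, "sensor1.jsonl")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write('{"placeholder": true}\n')
    existed_inside = os.path.exists(sample_path)

# The folder and everything in it are gone once the block exits
removed_after = not os.path.exists(scratch_folder)
print(existed_inside, removed_after)
```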
