# Time Series Data with FolderDB

This notebook demonstrates how to work with time series data using the FolderDB class. We'll show how to:
- Generate time series data with datetime keys
- Store and retrieve time-based records
- Perform range queries with timestamps
- Calculate statistics on time series data
- Update records in place, delete old records, and lint the database

## Setup and Imports

First, let's import the required libraries and set up our environment.

In [1]:
import os
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from IPython.display import display  # available automatically in Jupyter; needed in plain scripts
from folderdb import FolderDB

## Initialize Database

Let's create a folder for our database and initialize the FolderDB instance.

In [2]:
# Create a folder for our database
db_folder = "timeseries_db"
os.makedirs(db_folder, exist_ok=True)

# Initialize the database
db = FolderDB(db_folder)

## Generate Sample Data

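Let's create a function that generates sample sensor data: one reading per interval (every minute by default), with temperature, humidity, and pressure drawn from normal distributions centered at 25, 60, and 1013, respectively.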

In [3]:
def generate_sensor_data(start_time: datetime, duration_minutes: int, interval_minutes: int = 1) -> pd.DataFrame:
    """Generate sample sensor data.
    
    Args:
        start_time: Starting datetime
        duration_minutes: Duration in minutes
        interval_minutes: Time interval between readings in minutes
        
    Returns:
        DataFrame with sensor readings
    """
    # Generate timestamps
    timestamps = [start_time + timedelta(minutes=i) for i in range(0, duration_minutes, interval_minutes)]
    
    # Generate random sensor data
    data = {
        'temperature': np.random.normal(25, 2, len(timestamps)),
        'humidity': np.random.normal(60, 5, len(timestamps)),
        'pressure': np.random.normal(1013, 5, len(timestamps))
    }
    
    # Create DataFrame
    df = pd.DataFrame(data, index=timestamps)
    
    # Round all readings to one decimal place
    df = df.round(1)
    
    return df

# Generate data for two sensors
start_time = datetime.now() - timedelta(hours=1)
sensor1_data = generate_sensor_data(start_time, 60)
sensor2_data = generate_sensor_data(start_time, 60)

print("Sensor 1 Data (first 5 records):")
display(sensor1_data.head())
print("\nSensor 2 Data (first 5 records):")
display(sensor2_data.head())

Sensor 1 Data (first 5 records):

| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26 15:58:04.689184 | 22.7 | 61.0 | 1013.0 |
| 2025-03-26 15:59:04.689184 | 24.4 | 69.7 | 1011.2 |
| 2025-03-26 16:00:04.689184 | 27.5 | 66.2 | 1011.2 |
| 2025-03-26 16:01:04.689184 | 23.3 | 55.7 | 1015.1 |
| 2025-03-26 16:02:04.689184 | 27.6 | 57.0 | 1012.3 |

Sensor 2 Data (first 5 records):

| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26 15:58:04.689184 | 24.7 | 64.6 | 1017.2 |
| 2025-03-26 15:59:04.689184 | 24.8 | 69.5 | 1018.5 |
| 2025-03-26 16:00:04.689184 | 27.3 | 59.6 | 1013.4 |
| 2025-03-26 16:01:04.689184 | 24.9 | 57.6 | 1008.5 |
| 2025-03-26 16:02:04.689184 | 25.4 | 61.2 | 1010.1 |
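
Because `generate_sensor_data` draws from NumPy's global random state and takes its start time from `datetime.now()`, every run produces different readings and different keys. For reproducible experiments, a seeded generator and a fixed start time help. The sketch below mirrors the function with the standard library only; `generate_readings` and its `seed` parameter are hypothetical names, not part of FolderDB.

```python
import random
from datetime import datetime, timedelta


def generate_readings(start_time: datetime, duration_minutes: int,
                      interval_minutes: int = 1, seed: int = 0) -> list[dict]:
    """Hypothetical stdlib counterpart of generate_sensor_data with a fixed seed."""
    rng = random.Random(seed)  # private generator: same seed, same readings
    readings = []
    for minute in range(0, duration_minutes, interval_minutes):
        readings.append({
            "timestamp": start_time + timedelta(minutes=minute),
            # Same means and standard deviations as the NumPy version above
            "temperature": round(rng.gauss(25, 2), 1),
            "humidity": round(rng.gauss(60, 5), 1),
            "pressure": round(rng.gauss(1013, 5), 1),
        })
    return readings


start = datetime(2025, 3, 26, 15, 58)  # fixed start instead of datetime.now()
run_a = generate_readings(start, 60, seed=42)
run_b = generate_readings(start, 60, seed=42)
assert run_a == run_b  # identical across runs
print(len(run_a), run_a[0]["timestamp"].isoformat())
```

A fixed start time also keeps the keys stable, so re-running the notebook would update existing records instead of adding new ones.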


## Save Data to Database

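Now let's save our sensor data to the database using the `upsert_df` method. Note that the output below reports 180 records per file rather than the 60 generated above; it appears to have been captured after earlier runs had already written records with different timestamps to the same folder.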

In [4]:
# Save DataFrames to database
db.upsert_df("sensor1", sensor1_data)
db.upsert_df("sensor2", sensor2_data)

print("Database state after saving:")
print(str(db))

Database state after saving:
FolderDB at timeseries_db
--------------------------------------------------
sensor1.jsonl:
  Size: 26631 bytes
  Count: 180
  Key range: 2025-03-26T15:55:23.039563 to 2025-03-26T16:57:04.689184
  Linted: False
sensor2.jsonl:
  Size: 15478 bytes
  Count: 180
  Key range: 2025-03-26T15:55:23.039563 to 2025-03-26T16:57:04.689184
  Linted: False
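
An upsert inserts records whose keys are new and overwrites records whose keys already exist. Since every run derives its keys from `datetime.now()`, re-running the notebook would add records rather than replace them. Below is a minimal dictionary-based sketch of these semantics, assuming that is how `upsert_df` behaves; it is not FolderDB's implementation, and `upsert` and `make_run` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def upsert(store: dict, records: dict) -> None:
    """Insert new keys and overwrite existing ones (hypothetical helper)."""
    for key, value in records.items():
        store[key] = value


def make_run(start: datetime, n: int) -> dict:
    # One record per minute, keyed by ISO 8601 timestamp string
    return {(start + timedelta(minutes=i)).isoformat(): {"temperature": 25.0}
            for i in range(n)}


store: dict = {}
first = make_run(datetime(2025, 3, 26, 15, 55, 23), 60)
upsert(store, first)
upsert(store, make_run(datetime(2025, 3, 26, 15, 58, 4), 60))  # new start, new keys
print(len(store))

# Re-upserting existing keys updates values in place; the count is unchanged
upsert(store, {k: {"temperature": 27.5} for k in first})
print(len(store))
```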


## Query Recent Data

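Let's query the last 30 minutes of data from both sensors by passing the window bounds to `get_df` as `lower_key` and `upper_key`; the result is indexed by sensor name.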

In [5]:
# Get current time and calculate time range
end_time = datetime.now()
start_time = end_time - timedelta(minutes=30)

# Query recent data
recent_data = db.get_df(["sensor1", "sensor2"], lower_key=start_time, upper_key=end_time)

print("Recent Sensor 1 Data:")
display(recent_data["sensor1"].head())
print("\nRecent Sensor 2 Data:")
display(recent_data["sensor2"].head())

Recent Sensor 1 Data:

| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26T16:28:23.039563 | 25.96 | 56.2 | 1011.8 |
| 2025-03-26T16:28:32.690735 | 27.94 | 59.1 | 1009.1 |
| 2025-03-26T16:29:04.689184 | 24.2 | 58.7 | 1018.6 |
| 2025-03-26T16:29:23.039563 | 25.08 | 64.3 | 1008.1 |
| 2025-03-26T16:29:32.690735 | 28.27 | 58.0 | 1018.4 |

Recent Sensor 2 Data:

| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26T16:28:23.039563 | 21.1 | 57.6 | 1010.2 |
| 2025-03-26T16:28:32.690735 | 23.9 | 58.7 | 1021.7 |
| 2025-03-26T16:29:04.689184 | 22.3 | 55.0 | 1003.7 |
| 2025-03-26T16:29:23.039563 | 27.2 | 63.3 | 1018.0 |
| 2025-03-26T16:29:32.690735 | 21.9 | 54.4 | 1009.3 |
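
The key ranges printed by `str(db)` look like ISO 8601 strings. Assuming keys are stored that way, a time-window query reduces to a range scan over sorted strings, because ISO 8601 timestamps in the same format and without a time-zone suffix sort lexicographically in chronological order. The sketch below uses `bisect` to illustrate the idea; `to_key` and `range_query` are hypothetical helpers, and treating both bounds as inclusive is an assumption, not documented FolderDB behavior.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def to_key(ts: datetime) -> str:
    # Fixed-width ISO 8601 key so string order matches time order
    return ts.isoformat(timespec="microseconds")


def range_query(sorted_keys: list[str], lower: datetime, upper: datetime) -> list[str]:
    """Return keys with lower <= key <= upper (both bounds inclusive, by assumption)."""
    lo = bisect_left(sorted_keys, to_key(lower))
    hi = bisect_right(sorted_keys, to_key(upper))
    return sorted_keys[lo:hi]


start = datetime(2025, 3, 26, 15, 58, 4, 689184)
keys = sorted(to_key(start + timedelta(minutes=i)) for i in range(60))

end_time = start + timedelta(minutes=59)
window = range_query(keys, end_time - timedelta(minutes=30), end_time)
print(len(window), window[0], window[-1])
```

Both lookups are O(log n), so only the matching slice of keys is touched.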


## Calculate Statistics

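Let's load all records for both sensors and summarize each column with `describe()`, which reports the count, mean, standard deviation, minimum, quartiles, and maximum.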

In [6]:
# Get all data
all_data = db.get_df(["sensor1", "sensor2"])

print("Sensor 1 Statistics:")
display(all_data["sensor1"].describe())
print("\nSensor 2 Statistics:")
display(all_data["sensor2"].describe())

Sensor 1 Statistics:

| statistic | temperature | humidity | pressure |
|---|---|---|---|
| count | 180.0 | 180.0 | 180.0 |
| mean | 26.676889 | 60.21 | 1013.093889 |
| std | 2.461669 | 4.412971 | 4.668295 |
| min | 20.7 | 47.5 | 998.6 |
| 25% | 24.7 | 56.925 | 1009.875 |
| 50% | 26.895 | 60.25 | 1013.05 |
| 75% | 28.2775 | 63.2 | 1016.1 |
| max | 34.21 | 72.6 | 1025.8 |

Sensor 2 Statistics:

| statistic | temperature | humidity | pressure |
|---|---|---|---|
| count | 180.0 | 180.0 | 180.0 |
| mean | 25.160556 | 59.725556 | 1013.108333 |
| std | 2.06307 | 4.709198 | 5.20584 |
| min | 20.8 | 48.7 | 994.3 |
| 25% | 23.5 | 56.425 | 1009.8 |
| 50% | 25.2 | 59.65 | 1013.2 |
| 75% | 26.8 | 62.425 | 1016.825 |
| max | 29.9 | 74.7 | 1028.4 |
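
`describe()` reports the sample standard deviation (divisor n - 1) and computes quartiles by linear interpolation between ranks. The standard-library sketch below reproduces those definitions for a single column, which can help when checking a summary outside pandas. `summarize` is a hypothetical helper, and the input is the first five sensor 1 temperatures from the generated data above.

```python
import statistics


def summarize(values: list[float]) -> dict:
    """Mirror the rows of pandas' DataFrame.describe() for one numeric column."""
    # method="inclusive" matches the linear interpolation pandas uses by default
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample std (n - 1), like pandas
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


temperatures = [22.7, 24.4, 27.5, 23.3, 27.6]  # first five sensor 1 readings
for name, value in summarize(temperatures).items():
    print(f"{name:>5}: {value:.4f}")
```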


## Apply Calibration

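Let's apply a calibration factor of 1.1 to the temperature readings from sensor 1 and write the corrected records back with `upsert_df`. Because the calibrated values replace the stored ones, running this cell more than once compounds the correction.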

In [7]:
# Apply calibration to sensor1
calibration_factor = 1.1
sensor1_calibrated = all_data["sensor1"].copy()
sensor1_calibrated['temperature'] *= calibration_factor

# Save calibrated data
db.upsert_df("sensor1", sensor1_calibrated)

print("Calibrated Sensor 1 Data (first 5 records):")
display(sensor1_calibrated.head())

Calibrated Sensor 1 Data (first 5 records):

| timestamp | temperature | humidity | pressure |
|---|---|---|---|
| 2025-03-26T16:08:23.039563 | 25.41 | 60.5 | 1008.7 |
| 2025-03-26T16:26:32.690735 | 27.83 | 61.0 | 1013.1 |
| 2025-03-26T16:34:32.690735 | 31.46 | 58.6 | 1010.6 |
| 2025-03-26T15:55:23.039563 | 31.218 | 59.4 | 1015.6 |
| 2025-03-26T15:56:23.039563 | 30.734 | 56.5 | 1009.6 |
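
Because the calibrated values are written back over the stored ones, re-running the calibration multiplies the temperatures by 1.1 again. One way to keep calibration idempotent is to derive calibrated values from raw readings that are never overwritten. The sketch below contrasts the two approaches on a single reading; `calibrate` and `CALIBRATION_FACTOR` are hypothetical names.

```python
CALIBRATION_FACTOR = 1.1


def calibrate(raw_value: float, factor: float = CALIBRATION_FACTOR) -> float:
    """Apply a multiplicative calibration and round away float noise."""
    return round(raw_value * factor, 3)


raw = 24.7

# Overwriting in place: each re-run compounds the factor (1.1, then 1.21, ...)
stored = raw
for _ in range(2):
    stored = calibrate(stored)

# Deriving from the raw value: re-running always yields the same result
derived = [calibrate(raw) for _ in range(2)]

print(stored, derived)
```

Keeping raw and calibrated temperatures in separate columns (or separate files) is one way to apply this idea in FolderDB.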


## Delete Old Data

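Let's delete data older than 30 minutes from both sensors. Passing `None` as the lower bound of `delete_file_range` presumably leaves the start of the range open, so everything up to `cutoff_time` is removed. Note that the summary printed below still reports 180 records and the original key range for each file, so it does not appear to reflect the deletion.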

In [8]:
# Delete old data from both sensors
cutoff_time = datetime.now() - timedelta(minutes=30)

db.delete_file_range("sensor1", None, cutoff_time)
db.delete_file_range("sensor2", None, cutoff_time)

print("Database state after deletion:")
print(str(db))

Database state after deletion:
FolderDB at timeseries_db
--------------------------------------------------
sensor1.jsonl:
  Size: 36720 bytes
  Count: 180
  Key range: 2025-03-26T15:55:23.039563 to 2025-03-26T16:57:04.689184
  Linted: False
sensor2.jsonl:
  Size: 15478 bytes
  Count: 180
  Key range: 2025-03-26T15:55:23.039563 to 2025-03-26T16:57:04.689184
  Linted: False
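
Deleting everything before a cutoff is a common retention policy for time series. The sketch below models it on an in-memory mapping from ISO 8601 keys to records. `delete_range` is a hypothetical helper in which `None` for either bound leaves that side of the range open and both bounds are inclusive; this is our reading of the `None` lower bound passed to `delete_file_range` above, not documented FolderDB behavior.

```python
from datetime import datetime, timedelta
from typing import Optional


def delete_range(store: dict, lower: Optional[datetime], upper: Optional[datetime]) -> int:
    """Delete keys with lower <= key <= upper; None means unbounded. Returns count removed."""
    lo = lower.isoformat(timespec="microseconds") if lower is not None else None
    hi = upper.isoformat(timespec="microseconds") if upper is not None else None
    doomed = [k for k in store
              if (lo is None or k >= lo) and (hi is None or k <= hi)]
    for k in doomed:
        del store[k]
    return len(doomed)


now = datetime(2025, 3, 26, 16, 58)
store = {(now - timedelta(minutes=m)).isoformat(timespec="microseconds"): {"temperature": 25.0}
         for m in range(60)}

cutoff = now - timedelta(minutes=30)
removed = delete_range(store, None, cutoff)
print(removed, len(store))
```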


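## Lint the Database

Next, lint the database with `lint_db()`. As the log below shows, it processes each JSONL file in the folder and updates that file's metadata.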

In [9]:
db.lint_db()

Found 2 JSONL files to lint.
Linting file: sensor1.jsonl
Successfully linted and updated metadata for sensor1.jsonl.
Linting file: sensor2.jsonl
Successfully linted and updated metadata for sensor2.jsonl.
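
The log does not say exactly what linting changes on disk. The calibrated output above, whose keys are no longer in time order, hints that records can drift out of order as they are updated. As an illustration only, here is one plausible compaction pass for a JSONL file keyed by timestamp: keep the last version of each key and rewrite the records sorted by key. The record layout (one JSON object per line with a `key` field) and the helper `compact_jsonl` are assumptions, not FolderDB's file format or its lint implementation.

```python
import json
import os
import tempfile


def compact_jsonl(path: str) -> int:
    """Keep the last record per key, sort by key, and rewrite the file atomically."""
    latest = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                latest[record["key"]] = record  # later lines win
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for key in sorted(latest):
            f.write(json.dumps(latest[key]) + "\n")
    os.replace(tmp_path, path)  # atomic swap on the same filesystem
    return len(latest)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sensor1.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"key": "2025-03-26T16:08:23", "temperature": 23.1}) + "\n")
        f.write(json.dumps({"key": "2025-03-26T15:55:23", "temperature": 28.4}) + "\n")
        f.write(json.dumps({"key": "2025-03-26T16:08:23", "temperature": 25.41}) + "\n")
    count = compact_jsonl(path)
    with open(path, encoding="utf-8") as f:
        lines = [json.loads(line) for line in f]
print(count, lines)
```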


## Cleanup

Finally, let's clean up by removing the database folder and its contents.

In [10]:
# Remove the database folder and everything in it (including any subfolders)
import shutil
shutil.rmtree(db_folder)

print("Database folder has been cleaned up.")

Database folder has been cleaned up.
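
For throwaway experiments like this one, `tempfile.TemporaryDirectory` can replace manual cleanup: the folder and its contents are deleted when the `with` block exits, even if the code inside it raises an exception. The sketch below writes a stand-in file where FolderDB would write its JSONL files; passing the temporary path to `FolderDB(...)` is assumed to work the same way as passing `"timeseries_db"`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_folder:
    # In the notebook this is where you would call: db = FolderDB(tmp_folder)
    path = os.path.join(tmp_folder, "sensor1.jsonl")
    with open(path, "w", encoding="utf-8") as f:  # stand-in for FolderDB's files
        f.write('{"key": "2025-03-26T15:58:04.689184", "temperature": 22.7}\n')
    print(os.listdir(tmp_folder))

# Leaving the block deletes the folder and its contents
print(os.path.exists(tmp_folder))
```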
