# Time Series Data with FolderDB

This notebook demonstrates how to work with time series data using the `FolderDB` class from `jsonldb`. We'll show:
- Generating time series data with datetime keys
- Storing and retrieving time-based records
- Performing range queries with timestamps
- Calculating statistics on time series data
- Updating records and deleting old data

## Setup and Imports

First, let's import the required libraries and set up our environment.

In [11]:
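import os
import sys
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from IPython.display import display  # built into Jupyter; imported explicitly for clarity

# Add the directory three levels above the notebook's working directory
# to the Python path so the local jsonldb package can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.getcwd()))))

from jsonldb.folderdb import FolderDB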

## Initialize Database

Let's create a folder for our database and initialize the FolderDB instance.

In [12]:
# Create a folder for our database
db_folder = "timeseries_db"
os.makedirs(db_folder, exist_ok=True)

# Initialize the database
db = FolderDB(db_folder)

## Generate Sample Data

Let's write a helper function that generates synthetic sensor readings (temperature, humidity, and pressure) at a fixed interval, indexed by timestamp.

In [None]:
def generate_sensor_data(start_time: datetime, duration_minutes: int, interval_minutes: int = 1) -> pd.DataFrame:
    """Generate sample sensor data.
    
    Args:
        start_time: Starting datetime
        duration_minutes: Duration in minutes
        interval_minutes: Time interval between readings in minutes
        
    Returns:
        DataFrame with sensor readings
    """
    # Generate timestamps
    timestamps = [start_time + timedelta(minutes=i) for i in range(0, duration_minutes, interval_minutes)]
    
    # Generate random sensor data
    data = {
        'temperature': np.random.normal(25, 2, len(timestamps)),
        'humidity': np.random.normal(60, 5, len(timestamps)),
        'pressure': np.random.normal(1013, 5, len(timestamps))
    }
    
    # Create DataFrame
    df = pd.DataFrame(data, index=timestamps)
    
    # Round values
    df['temperature'] = df['temperature'].round(1)
    df['humidity'] = df['humidity'].round(1)
    df['pressure'] = df['pressure'].round(1)
    
    return df

# Generate data for two sensors
start_time = datetime.now() - timedelta(hours=1)
sensor1_data = generate_sensor_data(start_time, 60)
sensor2_data = generate_sensor_data(start_time, 60)

print("Sensor 1 Data (first 5 records):")
display(sensor1_data.head())
print("\nSensor 2 Data (first 5 records):")
display(sensor2_data.head())
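The cell above builds the timestamps with a list comprehension and draws unseeded random numbers, so every run produces different data. As a sketch of an alternative, the hypothetical `generate_sensor_data_vectorized` below builds the index with `pd.date_range` and draws readings from a seeded `np.random.default_rng` generator, which makes runs reproducible. It assumes the same means and standard deviations as the original function.

```python
from datetime import datetime
from typing import Optional

import numpy as np
import pandas as pd


def generate_sensor_data_vectorized(start_time: datetime, duration_minutes: int,
                                    interval_minutes: int = 1,
                                    seed: Optional[int] = None) -> pd.DataFrame:
    """Hypothetical variant of generate_sensor_data with a seeded random generator."""
    # Same number of readings as range(0, duration_minutes, interval_minutes)
    periods = len(range(0, duration_minutes, interval_minutes))
    # Evenly spaced timestamps starting at start_time
    index = pd.date_range(start=start_time, periods=periods, freq=f"{interval_minutes}min")
    # A seeded Generator makes the "random" readings identical on every run
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {
            "temperature": rng.normal(25, 2, periods),
            "humidity": rng.normal(60, 5, periods),
            "pressure": rng.normal(1013, 5, periods),
        },
        index=index,
    )
    # Round every column to one decimal place, as the original function does
    return df.round(1)


sample = generate_sensor_data_vectorized(datetime(2024, 1, 1), 60, seed=0)
print(sample.head())
```

With `seed=None` the output changes between runs, matching the behavior of the original function.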

## Save Data to Database

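Now let's save each sensor's DataFrame under its own name (`sensor1`, `sensor2`) using the `upsert_df` method. The DataFrame index (the timestamps) serves as the record key: rows with new keys are inserted, and rows whose keys already exist are updated.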

In [None]:
# Save DataFrames to database
db.upsert_df("sensor1", sensor1_data)
db.upsert_df("sensor2", sensor2_data)

print("Database state after saving:")
print(str(db))
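To make the upsert semantics concrete, here is a minimal in-memory sketch in plain pandas; it is not how `FolderDB` implements `upsert_df`, which this notebook does not show. The hypothetical `upsert` helper treats the index as the key, replaces any row whose key already exists, appends rows with new keys, and returns the result sorted by key. Whether `upsert_df` replaces whole records or merges individual fields is an assumption this sketch does not settle; it replaces whole rows.

```python
import pandas as pd


def upsert(existing: pd.DataFrame, updates: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical in-memory upsert keyed on the DataFrame index (not jsonldb code)."""
    # Stack the stored rows first and the incoming rows second
    combined = pd.concat([existing, updates])
    # For keys that appear twice, keep the incoming (last) row and drop the stored one
    combined = combined[~combined.index.duplicated(keep="last")]
    # Return the records in key order, as a time series should be
    return combined.sort_index()


old = pd.DataFrame(
    {"temperature": [24.0, 25.0]},
    index=pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:01"]),
)
new = pd.DataFrame(
    {"temperature": [26.0, 27.0]},
    index=pd.to_datetime(["2024-01-01 00:01", "2024-01-01 00:02"]),
)
merged = upsert(old, new)
print(merged["temperature"].tolist())  # [24.0, 26.0, 27.0]
```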

## Query Recent Data

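Let's query the last 30 minutes of data from both sensors by passing the time window to `get_df` as `lower_key` and `upper_key`. The result is a dictionary of DataFrames keyed by sensor name.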

In [None]:
# Get current time and calculate time range
end_time = datetime.now()
start_time = end_time - timedelta(minutes=30)

# Query recent data
recent_data = db.get_df(["sensor1", "sensor2"], lower_key=start_time, upper_key=end_time)

print("Recent Sensor 1 Data:")
display(recent_data["sensor1"].head())
print("\nRecent Sensor 2 Data:")
display(recent_data["sensor2"].head())
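The notebook does not say whether `get_df` includes rows exactly at `lower_key` and `upper_key`. For comparison, the pandas sketch below performs the same kind of range query on an in-memory DataFrame with `.loc` label slicing, which on a sorted `DatetimeIndex` includes both endpoints; passing `None` for either bound leaves that side open. The `query_range` helper is hypothetical.

```python
from datetime import datetime, timedelta

import pandas as pd


def query_range(df: pd.DataFrame, lower=None, upper=None) -> pd.DataFrame:
    """Hypothetical helper: rows whose timestamp lies in [lower, upper]."""
    # Label slicing with bounds that may not exist in the index requires a sorted index
    ordered = df.sort_index()
    # None in a slice means "no bound on this side"
    return ordered.loc[lower:upper]


index = pd.date_range("2024-01-01 00:00", periods=60, freq="min")
readings = pd.DataFrame({"temperature": range(60)}, index=index)

end = datetime(2024, 1, 1, 0, 59)
recent = query_range(readings, end - timedelta(minutes=30), end)
print(len(recent))  # 31: rows from 00:29 through 00:59, both endpoints included
```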

## Calculate Statistics

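Let's compute summary statistics for each sensor with pandas' `describe()`, which reports the count, mean, standard deviation, minimum, quartiles, and maximum of every numeric column.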

In [None]:
# Get all data
all_data = db.get_df(["sensor1", "sensor2"])

print("Sensor 1 Statistics:")
display(all_data["sensor1"].describe())
print("\nSensor 2 Statistics:")
display(all_data["sensor2"].describe())
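`describe()` summarizes each sensor over the whole hour. To see how readings change over time, one option not used in the original notebook is to average them over fixed windows with pandas `resample`. The hypothetical `interval_means` helper below converts the index with `pd.to_datetime` first, in case the keys come back from the database as strings; this notebook does not confirm the index type either way.

```python
import numpy as np
import pandas as pd


def interval_means(df: pd.DataFrame, rule: str = "10min") -> pd.DataFrame:
    """Hypothetical helper: mean of every column within each fixed time bucket."""
    out = df.copy()
    # resample() needs a DatetimeIndex; this is a no-op if the index already is one
    out.index = pd.to_datetime(out.index)
    return out.resample(rule).mean()


index = pd.date_range("2024-01-01 00:00", periods=60, freq="min")
rng = np.random.default_rng(0)
sensor_a = pd.DataFrame({"temperature": rng.normal(25, 2, 60)}, index=index)
sensor_b = pd.DataFrame({"temperature": rng.normal(25, 2, 60)}, index=index)

# Six 10-minute buckets per sensor, then the per-bucket difference between sensors
diff = interval_means(sensor_a)["temperature"] - interval_means(sensor_b)["temperature"]
print(diff.round(2))
```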

## Apply Calibration

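Let's calibrate sensor 1 by scaling its temperature readings by a factor of 1.1 and writing the result back with `upsert_df`. Because the records keep their original timestamp keys, the upsert overwrites the stored values instead of adding new rows.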

In [None]:
# Apply calibration to sensor1
calibration_factor = 1.1
sensor1_calibrated = all_data["sensor1"].copy()
sensor1_calibrated['temperature'] *= calibration_factor

# Save calibrated data
db.upsert_df("sensor1", sensor1_calibrated)

print("Calibrated Sensor 1 Data (first 5 records):")
display(sensor1_calibrated.head())
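Because the calibrated values overwrite the stored ones, re-running this cell would scale the temperatures again (by 1.1, then 1.21, and so on), and the products are no longer rounded to one decimal place like the generated data. A sketch of a safer pattern is a pure function that always starts from the raw readings and rounds the result. The hypothetical `calibrate` helper also accepts an additive `offset`, i.e. the linear correction `gain * x + offset`; the offset is an addition for illustration, since the notebook uses only a multiplicative factor.

```python
import pandas as pd


def calibrate(raw: pd.DataFrame, column: str, gain: float = 1.0,
              offset: float = 0.0, decimals: int = 1) -> pd.DataFrame:
    """Hypothetical helper: apply gain * x + offset to one column of a copy of `raw`."""
    calibrated = raw.copy()  # leave the raw readings untouched
    calibrated[column] = (calibrated[column] * gain + offset).round(decimals)
    return calibrated


raw = pd.DataFrame({"temperature": [20.0, 25.0], "humidity": [55.0, 60.0]})
scaled = calibrate(raw, "temperature", gain=1.1)
print(scaled["temperature"].tolist())  # [22.0, 27.5]
```

Calling `calibrate` again on the same raw data gives the same output, so the correction never compounds.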

## Delete Old Data

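Let's delete every record older than 30 minutes by calling `delete_file_range` with `None` as the lower bound and the cutoff time as the upper bound.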

In [None]:
# Delete old data from both sensors
cutoff_time = datetime.now() - timedelta(minutes=30)

db.delete_file_range("sensor1", None, cutoff_time)
db.delete_file_range("sensor2", None, cutoff_time)

print("Database state after deletion:")
print(str(db))
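The notebook does not state whether `delete_file_range` removes a record whose key equals the upper bound. The in-memory sketch below, using a hypothetical `drop_older_than` helper, shows the same retention rule in pandas and treats "older than the cutoff" strictly, so a reading stamped exactly at the cutoff is kept.

```python
from datetime import datetime, timedelta

import pandas as pd


def drop_older_than(df: pd.DataFrame, cutoff: datetime) -> pd.DataFrame:
    """Hypothetical helper: keep only rows stamped at or after `cutoff`."""
    # Boolean mask on the DatetimeIndex; rows strictly before the cutoff are dropped
    return df[df.index >= cutoff]


index = pd.date_range("2024-01-01 00:00", periods=60, freq="min")
readings = pd.DataFrame({"pressure": [1013.0] * 60}, index=index)

now = datetime(2024, 1, 1, 1, 0)
kept = drop_older_than(readings, now - timedelta(minutes=30))
print(len(kept))  # 30: rows from 00:30 through 00:59
```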

## Lint Database

In [None]:
db.lint_db()

## Cleanup

Finally, let's clean up by removing the database folder and its contents.

In [None]:
import shutil

# Remove the database folder and everything inside it, including any subfolders
shutil.rmtree(db_folder)

print("Database folder has been cleaned up.")
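Another way to guarantee cleanup, not used in the original notebook, is to create the database folder inside a `tempfile.TemporaryDirectory`, which deletes the directory and everything under it when the `with` block exits. In this sketch, `demo_folder`, the `example.txt` file, and the `subfolder` directory are hypothetical stand-ins for whatever `FolderDB` actually writes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_folder = os.path.join(tmp_dir, "timeseries_db")
    os.makedirs(demo_folder)
    # In the notebook, FolderDB(demo_folder) would be created and used here;
    # a placeholder file and subfolder stand in for the database's contents
    os.makedirs(os.path.join(demo_folder, "subfolder"))
    with open(os.path.join(demo_folder, "example.txt"), "w") as f:
        f.write("placeholder\n")
    print(sorted(os.listdir(demo_folder)))  # ['example.txt', 'subfolder']

# Leaving the with block deleted tmp_dir and everything inside it
print(os.path.exists(demo_folder))  # False
```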