### Time Series Changes and Versioning with TimeDB

This notebook demonstrates how to manually change values, tags, and annotations of a time series and view all versions of the changes.

#### What you'll learn:
1. **Creating and inserting a time series** - Insert initial time series data
2. **Reading and visualizing time series** - Read and plot the original data
3. **Updating time series** - Manually change values, tags, and annotations
4. **Reading all versions** - Query all versions of the time series using the `all_versions` flag
5. **Visualizing changes** - Plot original vs updated versions to see the differences
6. **Batch updates and version history** - Update multiple values and view complete version history with metadata

**Key Concepts:**
- TimeDB maintains a full version history of all changes
- Each update creates a new version while keeping the old version as an audit trail
- The `all_versions=True` flag allows you to see all historical versions with `changed_by` and `change_time` metadata
- The `tags_and_annotations=True` flag includes tags and annotations as DataFrame columns
- When using `all_versions=True` with `return_value_id=True`, the DataFrame uses a MultiIndex `(valid_time, value_id)` to preserve multiple versions
- Updates can modify values, annotations, and tags independently or together
- Using `value_id` from the initial read makes updates simple - no need to query the database again
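
To see why the MultiIndex is needed, here is a small pandas-only sketch. It does not call TimeDB: the frame is hand-built with hypothetical values and IDs, and only the names `valid_time`, `value_id`, and `changed_by` are taken from the concepts above.

```python
import pandas as pd

t2 = pd.Timestamp("2025-01-01 02:00", tz="UTC")
t3 = pd.Timestamp("2025-01-01 03:00", tz="UTC")

# Hypothetical rows: two versions at 02:00 (original + correction), one at 03:00
versions = pd.DataFrame(
    {
        "valid_time": [t2, t2, t3],
        "value_id": [3, 25, 4],
        "temperature": [22.8, 24.5, 24.3],
        "changed_by": [None, "analyst@example.com", None],
    }
).set_index(["valid_time", "value_id"])

# valid_time alone repeats, so it cannot identify a single row
valid_time_is_unique = versions.index.get_level_values("valid_time").is_unique
# The (valid_time, value_id) pair does identify a single row
pair_is_unique = versions.index.is_unique
print(valid_time_is_unique, pair_is_unique)

# All versions stored for 02:00, indexed by value_id
versions_at_t2 = versions.xs(t2, level="valid_time")
print(versions_at_t2)
```

With a plain `valid_time` index, a lookup at 02:00 would return two rows; keeping `value_id` as a second level makes each version addressable.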


In [None]:
from timedb import TimeDataClient
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timezone, timedelta
import numpy as np
from dotenv import load_dotenv
load_dotenv()

# Use the TimeDataClient to interact with TimeDB
td = TimeDataClient()

# Reset the database by deleting and recreating the schema
td.delete()
td.create()

## Part 1: Insert a Time Series

First, let's insert an initial time series.
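
The sample data in the next cell is a sine curve plus Gaussian noise. As a quick check on its shape, this sketch evaluates only the noise-free part, `base_temp + amplitude * sin(2πi/24)`, with the same constants. The names `clean`, `peak_hour`, and `trough_hour` are introduced here for illustration.

```python
import numpy as np

base_temp = 20.0
amplitude = 5.0
hours = np.arange(24)

# Noise-free daily pattern: one full sine period over 24 hourly steps
clean = base_temp + amplitude * np.sin(2 * np.pi * hours / 24)

peak_hour = int(np.argmax(clean))
trough_hour = int(np.argmin(clean))

print(f"start:  {clean[0]:.2f}")
print(f"peak:   {clean[peak_hour]:.2f} at hour {peak_hour}")
print(f"trough: {clean[trough_hour]:.2f} at hour {trough_hour}")
```

The curve starts at the baseline of 20, reaches its maximum a quarter of the way through the day, and its minimum three quarters of the way through; the noise term shifts each point slightly around these values.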


In [None]:
# Create a time series with hourly data for 24 hours
base_time = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
times = [base_time + timedelta(hours=i) for i in range(24)]

# Generate sample data: temperature values with a daily pattern
# Starts near 20, peaks near 25 at hour 6, dips to about 15 at hour 18, then recovers
np.random.seed(42)  # For reproducibility
base_temp = 20.0
amplitude = 5.0
temperature_values = [
    base_temp + amplitude * np.sin(2 * np.pi * i / 24) + np.random.normal(0, 0.5)
    for i in range(24)
]
temperature_values = [round(v, 2) for v in temperature_values]

# Create DataFrame
df = pd.DataFrame({
    'valid_time': times,
    'temperature': temperature_values
})

# Ensure the series exists, then insert the time series via TimeDataClient
# Use dimensionless units for this simple example (no pint quantities)
if td.series("temperature").count() == 0:
    td.create_series(name="temperature", unit="dimensionless")

result = td.series("temperature").insert_batch(df=df)

print(f"✓ Time series inserted successfully!")
print(f"  Batch ID: {result.batch_id}")
print(f"  Series ID: {result.series_ids['temperature']}")
print(f"  Time range: {times[0]} to {times[-1]}")
print(f"  Number of data points: {len(df)}")


# Store the series_id for later use
series_id = result.series_ids['temperature']

## Part 2: Read and Plot the Original Time Series

Now let's read back the time series and visualize it.

In [None]:
# Read the time series with return_value_id=True so we can use value_id later for updates
df_read = td.series("temperature").read(return_value_id=True)

print(f"✓ Read {len(df_read)} data points")
print(f"\nDataFrame shape: {df_read.shape}")
print(f"\nColumns: {list(df_read.columns)}")
print(f"\nFirst few rows:")
df_read.head(10)


In [None]:
# Plot the original time series
plt.figure(figsize=(12, 6))

# Use plain numeric values (dimensionless) for plotting
y_vals = df_read['temperature'].astype(float)
plt.plot(df_read.index, y_vals, marker='o', linewidth=2, markersize=6, 
         label='Original Temperature', color='blue', alpha=0.7)

plt.xlabel('Time', fontsize=12)
plt.ylabel('Temperature (dimensionless)', fontsize=12)
plt.title('Original Time Series - Temperature', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("✓ Original time series plotted successfully!")

## Part 3: Update the Time Series

Now we'll make a single update that changes a value, adds two tags, and adds an annotation. We'll update the value at hour 2 (early in the series) to correct what appears to be an anomalous reading.
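
The update cell below passes plain dictionaries to `update_records`. A small helper can keep those dictionaries consistent when only some fields change. This is a sketch, not part of TimeDB: `build_update` and `_UNSET` are hypothetical names, the keys mirror the ones used in this notebook, and it assumes that a key left out of a record means "leave that field unchanged", which should be confirmed against the TimeDB documentation.

```python
from typing import Any, Dict

_UNSET = object()  # sentinel: distinguishes "not provided" from an explicit None


def build_update(
    value_id: Any,
    changed_by: str,
    value: Any = _UNSET,
    annotation: Any = _UNSET,
    tags: Any = _UNSET,
) -> Dict[str, Any]:
    """Build one update record containing only the fields that were provided."""
    record: Dict[str, Any] = {"value_id": value_id, "changed_by": changed_by}
    optional = {"value": value, "annotation": annotation, "tags": tags}
    for key, field in optional.items():
        if field is not _UNSET:
            record[key] = field
    if len(record) == 2:
        # Only value_id and changed_by are present, so nothing would change
        raise ValueError("an update needs at least one of: value, annotation, tags")
    return record


# Value-only correction
print(build_update(3, "analyst@example.com", value=24.5))
# Tag-only change; no "value" key is emitted
print(build_update(3, "analyst@example.com", tags=["reviewed"]))
```

Using a sentinel instead of `None` as the default keeps `None` available as a real value, for example to clear an annotation if the backend supports that.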


In [None]:
# The time point we want to update (hour 2, early in the series for better visibility)
update_time = base_time + timedelta(hours=2)

# Get the original value for reference
original_value = float(df_read.loc[update_time, 'temperature'])
print(f"Original value at {update_time}: {original_value:.2f}")

# Get the value_id for the time point we want to update (from the first read)
value_id = df_read.loc[update_time, 'value_id']
print(f"Value ID for update: {value_id}")

# Create an update that:
# 1. Changes the value to a corrected temperature (e.g., 24.5 instead of the original)
# 2. Adds an annotation explaining the correction
# 3. Adds tags to mark this as reviewed and corrected
corrected_value = 24.5

print(f"\nUpdating to: {corrected_value:.2f}")
print(f"Adding annotation: 'Manual correction: sensor reading was anomalous'")
print(f"Adding tags: ['reviewed', 'corrected']")

# Create the update record using value_id (simplest approach - no need for run_id, tenant_id, etc.)
record_update = {
    "value_id": value_id,
    "value": corrected_value,  # New corrected value
    "annotation": "Manual correction: sensor reading was anomalous",  # Annotation explaining the change
    "tags": ["reviewed", "corrected"],  # Tags for quality tracking
    "changed_by": "analyst@example.com",  # Who made the change
}

# Execute the update using the SDK (handles connection internally)
result = td.series("temperature").update_records(updates=[record_update])

print(f"\n✓ Update completed!")
print(f"  Updated records: {len(result['updated'])}")
print(f"  Skipped (no-op): {len(result['skipped_no_ops'])}")
if result['updated']:
    print(f"  New value ID: {result['updated'][0]['value_id']}")

## Part 4: Read All Versions of the Time Series

Now let's read the time series again, but this time with `all_versions=True` to see both the original version and the updated version.
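
Once all versions are in one DataFrame, the current view can be rebuilt with ordinary pandas operations. The sketch below uses a hand-built frame with hypothetical values rather than a TimeDB query. It assumes every row carries a non-null `change_time` and that the row with the latest `change_time` at each `valid_time` is the current one.

```python
import pandas as pd

versions = pd.DataFrame(
    {
        "valid_time": pd.to_datetime(
            ["2025-01-01 02:00", "2025-01-01 02:00", "2025-01-01 03:00"], utc=True
        ),
        "value_id": [3, 25, 4],
        "temperature": [22.8, 24.5, 24.3],
        "change_time": pd.to_datetime(
            ["2025-01-02 09:00", "2025-01-02 10:00", "2025-01-02 09:00"], utc=True
        ),
    }
).set_index(["valid_time", "value_id"])

# Keep the most recently changed row for each valid_time
latest = (
    versions.sort_values("change_time")
    .groupby(level="valid_time")
    .tail(1)
    .sort_index()
)
print(latest)
```

In practice, reading with `all_versions=False` gives the current view directly; a reconstruction like this is mainly useful for checking how the history and the current view relate.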


In [None]:
# Read with all_versions=True to get all versions (original and updated)
# Also include tags_and_annotations=True to see tags and annotations
df_all_versions = td.series("temperature").read(
    start_valid=base_time,
    end_valid=base_time + timedelta(hours=24),
    all_versions=True,
    return_value_id=True,
    tags_and_annotations=True
)

print(f"✓ Read {len(df_all_versions)} data points (including all versions)")
print(f"\nDataFrame shape: {df_all_versions.shape}")
print(f"\nColumns: {list(df_all_versions.columns)}")
if isinstance(df_all_versions.index, pd.MultiIndex):
    print(f"\nIndex: MultiIndex with levels {df_all_versions.index.names}")
else:
    print(f"\nIndex: {df_all_versions.index.name}")
print(f"\nNote: With all_versions=True, we get both current and historical versions.")
print(f"      The DataFrame includes changed_by and change_time columns for audit trail.")
print(f"\nFirst few rows:")
df_all_versions.head(15)

## Part 5: Plot Original vs Updated Time Series

Now let's plot both the original and updated versions to visualize the change.
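
A plot shows the change visually; a small comparison table lists exactly which points differ. This sketch uses two hand-built Series with hypothetical values standing in for the original and current reads, both indexed by `valid_time`.

```python
import pandas as pd

idx = pd.to_datetime(
    ["2025-01-01 00:00", "2025-01-01 01:00", "2025-01-01 02:00", "2025-01-01 03:00"],
    utc=True,
)
idx.name = "valid_time"

original = pd.Series([20.2, 21.2, 22.8, 24.3], index=idx)
current = original.copy()
current.iloc[2] = 24.5  # the corrected point

# Align both versions side by side and keep only the rows that differ
comparison = pd.DataFrame({"original": original, "current": current})
changed = comparison[comparison["original"] != comparison["current"]].copy()
changed["delta"] = (changed["current"] - changed["original"]).round(2)
print(changed)
```

Note that `!=` treats two NaN values as different, so missing points would need separate handling if the series contains gaps.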


In [None]:
# Read current version (default behavior - only is_current=true)
df_current = td.series("temperature").read(
    start_valid=base_time,
    end_valid=base_time + timedelta(hours=24),
    all_versions=False  # Only current versions
)

# Create a DataFrame with original values for comparison (from our initial read)
df_original_plot = df_read.copy()

# Convert to numeric arrays for plotting (dimensionless, plain floats)
orig_y = df_original_plot['temperature'].astype(float)
current_y = df_current['temperature'].astype(float)

# Plot both versions
plt.figure(figsize=(14, 7))

# Plot original time series
plt.plot(df_original_plot.index, orig_y, 
         marker='o', linewidth=2, markersize=6, 
         label='Original Time Series', color='blue', alpha=0.6, linestyle='--')

# Plot current (updated) time series
plt.plot(df_current.index, current_y, 
         marker='s', linewidth=2, markersize=6, 
         label='Updated Time Series (Current)', color='red', alpha=0.8)

# Highlight the updated point
if update_time in df_current.index:
    updated_value = float(current_y.loc[update_time])
    original_value_at_update = float(orig_y.loc[update_time])
    
    # Draw a vertical line at the update point
    plt.axvline(x=update_time, color='green', linestyle=':', linewidth=2, alpha=0.7, label='Update Point')
    
    # Annotate the change
    plt.annotate(
        f'Updated: {original_value_at_update:.2f} → {updated_value:.2f}',
        xy=(update_time, updated_value),
        xytext=(10, 20),
        textcoords='offset points',
        bbox=dict(boxstyle='round,pad=0.5', facecolor='yellow', alpha=0.7),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'),
        fontsize=10,
        fontweight='bold'
    )

plt.xlabel('Time', fontsize=12)
plt.ylabel('Temperature (dimensionless)', fontsize=12)
plt.title('Original vs Updated Time Series - Showing the Change', fontsize=14, fontweight='bold')
plt.legend(loc='best', fontsize=10)
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("✓ Comparison plot created successfully!")
print(f"\nKey observation:")
print(f"  - The original time series (blue dashed line) shows the initial values")
print(f"  - The updated time series (red solid line) shows the corrected value at hour 2")
print(f"  - The green dotted line marks the point where the update was made")

# Detailed view of the updated time point
# Read with all_versions and return_value_id to see both versions
# Include tags_and_annotations to see tags and annotations
df_update_point = td.series("temperature").read(
    start_valid=update_time,
    end_valid=update_time + timedelta(hours=1),
    all_versions=True,
    return_value_id=True,
    tags_and_annotations=True
)

print("Detailed view of the updated time point:")
print("=" * 80)

# Show every stored version at this time point, with tags and annotations
print(df_update_point)

# Get current and original values
df_current_at_update = td.series("temperature").read(
    start_valid=update_time,
    end_valid=update_time + timedelta(hours=1),
    all_versions=False,
    return_value_id=True
)

if update_time in df_current_at_update.index:
    # Convert to a plain float for printing (dimensionless)
    current_val = float(df_current_at_update.loc[update_time, 'temperature'])

    print(f"\nCurrent Version:")
    print(f"  Valid Time: {update_time}")
    print(f"  Value: {current_val:.2f}")
    print(f"  Value ID: {df_current_at_update.loc[update_time, 'value_id']}")
    
    # Get original value from our initial read
    if update_time in df_original_plot.index:
        original_value_at_update = float(df_original_plot.loc[update_time, 'temperature'])
        print(f"\nOriginal Version (from initial data):")
        print(f"  Valid Time: {update_time}")
        print(f"  Value: {original_value_at_update:.2f}")
        print(f"  Note: Original version is marked is_current=false in database")

print("\n" + "=" * 80)
print(f"\nSummary:")
print(f"  - TimeDB maintains both versions in the database")
print(f"  - The original version is marked is_current=false")
print(f"  - The updated version is marked is_current=true")
print(f"  - Both versions are preserved for audit trail and historical analysis")
print(f"  - Using value_id makes updates simple - no need for run_id, tenant_id, etc.")

**Note:** The plotting cell above also prints a detailed view of the updated time point. To retrieve every version of a series, use `td.series("<name>").read(all_versions=True, return_value_id=True, tags_and_annotations=True)`.

## Part 6: Update Multiple Values and View All Versions

Now let's update multiple values (including the one we already updated) and then display all versions of these values in a pandas DataFrame.
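
Because hour 2 is updated a second time here, it should end up with three stored versions, while each newly updated hour has two. One way to make that explicit in an all-versions DataFrame is to number the versions per timestamp. The sketch below uses hand-built data with hypothetical values and IDs and assumes `change_time` orders the versions; the `version` column is added for illustration and is not something TimeDB returns.

```python
import pandas as pd


def ts(text: str) -> pd.Timestamp:
    """Shorthand for a UTC timestamp."""
    return pd.Timestamp(text, tz="UTC")


rows = [
    # (valid_time, value_id, temperature, change_time)
    (ts("2025-01-01 02:00"), 3, 22.8, ts("2025-01-02 09:00")),   # original
    (ts("2025-01-01 02:00"), 25, 24.5, ts("2025-01-02 10:00")),  # manual correction
    (ts("2025-01-01 02:00"), 26, 26.5, ts("2025-01-02 11:00")),  # batch +2.0
    (ts("2025-01-01 03:00"), 4, 24.3, ts("2025-01-02 09:00")),   # original
    (ts("2025-01-01 03:00"), 27, 26.3, ts("2025-01-02 11:00")),  # batch +2.0
]
history = pd.DataFrame(
    rows, columns=["valid_time", "value_id", "temperature", "change_time"]
)

# Order versions chronologically within each timestamp, then number them from 1
history = history.sort_values(["valid_time", "change_time"])
history["version"] = history.groupby("valid_time").cumcount() + 1
history = history.set_index(["valid_time", "value_id"])

# Number of stored versions per timestamp
versions_per_time = history.groupby(level="valid_time")["version"].max()
print(history)
print(versions_per_time)
```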


In [None]:
# Read the current time series to get value_ids for multiple updates
df_current = td.series("temperature").read(return_value_id=True)

# Convert temperature column to numeric magnitudes for arithmetic (dimensionless, inline)
current_temp_mag = df_current['temperature'].astype(float)

# Define multiple time points to update (including the one we already updated at hour 2)
update_times = [
    base_time + timedelta(hours=2),  # Already updated, will update again
    base_time + timedelta(hours=3),  # New update
    base_time + timedelta(hours=4),  # New update
]

# Prepare update records for all time points
updates = []
# Use `t` as the loop variable so `update_time` from Part 3 is not overwritten
for t in update_times:
    if t in df_current.index:
        value_id = df_current.loc[t, 'value_id']
        original_value = current_temp_mag.loc[t]
        
        # Create a new value (the same +2.0 offset for every point)
        new_value = float(original_value) + 2.0
        
        update_record = {
            "value_id": value_id,
            "value": new_value,
            "annotation": "Batch correction: adjusted temperature by +2.0",
            "tags": ["batch_corrected", "reviewed"],
            "changed_by": "batch_processor@example.com",
        }
        updates.append(update_record)
        print(f"Prepared update for {t}: {original_value:.2f} → {new_value:.2f}")

# Execute all updates
if updates:
    result = td.series("temperature").update_records(updates=updates)
    print(f"\n✓ Batch update completed!")
    print(f"  Updated records: {len(result['updated'])}")
    print(f"  Skipped (no-op): {len(result['skipped_no_ops'])}")
else:
    print("No updates to perform")

In [None]:
# Read all versions of the updated time points
# Use all_versions=True to see all historical versions, including change_time
df_all_versions, mapping = td.series("temperature").read(
    start_valid=base_time + timedelta(hours=1),  # Start from hour 1
    end_valid=base_time + timedelta(hours=6),    # End at hour 6 to show the updated values
    return_mapping=True,
    all_versions=True,
    return_value_id=True,
    tags_and_annotations=True
)

print("All versions of the updated time points:")
print("=" * 100)
print(f"\nDataFrame shape: {df_all_versions.shape}")
print(f"\nColumns: {list(df_all_versions.columns)}")
print(f"\nIndex type: {type(df_all_versions.index)}")
if isinstance(df_all_versions.index, pd.MultiIndex):
    print(f"Index levels: {df_all_versions.index.names}")
print("\n" + "=" * 100)

# Show the mapping returned alongside the DataFrame (requested with return_mapping=True)
print(mapping)
# Display the DataFrame
df_all_versions


## Summary

This notebook demonstrated:

1. **Creating and inserting a time series**: Using `td.series("temperature").insert_batch()` to create and store initial time series data

2. **Reading and visualizing time series**: Using `td.series("temperature").read()` with `return_value_id=True` to retrieve data and get value_ids for later updates

3. **Updating time series**: Using `td.series("temperature").update_records()` with dictionary inputs to manually change:
   - **Values**: Correcting erroneous data points
   - **Tags**: Adding quality flags (e.g., "reviewed", "corrected")
   - **Annotations**: Adding explanatory notes about the changes

4. **Reading all versions**: Using the `all_versions=True` flag to retrieve both current and historical versions of the data, including:
   - `changed_by`: Who made the change
   - `change_time`: When the change was made
   - `tags` and `annotation`: When `tags_and_annotations=True`

5. **Visualizing changes**: Plotting original vs updated versions to see the differences

6. **Batch updates and version history**: Updating multiple values and viewing complete version history with all metadata columns

**Key Takeaways:**
- TimeDB maintains a complete version history of all changes
- Each update creates a new version while preserving the old version
- The `all_versions=True` flag allows you to access the full audit trail with `changed_by` and `change_time` columns
- The `tags_and_annotations=True` flag includes tags and annotations as DataFrame columns
- When `all_versions=True` and `return_value_id=True`, the DataFrame uses a MultiIndex `(valid_time, value_id)` to preserve multiple versions
- Updates can modify values, annotations, and tags independently or together
- The `is_current` flag indicates which version is the active one
- All versions are preserved for compliance, auditing, and historical analysis
- Using `value_id` makes updates simple - no need for run_id, tenant_id, etc.