# BiVital time_index handling Example
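This notebook demonstrates how to use the BiVital library to build a uniform time index for a project, apply it to the data, interpolate the missing values, and filter the data down to the new index.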

## Import the processing library from bivital and set up logging

In [None]:
from bivital import processing as bvl
from tkinter import filedialog, Tk
import logging

# Set up logging
logging.basicConfig(encoding='utf-8', level=logging.INFO)

## Choose the project directory for loading

In [None]:
# Choose the example project data or your own directory
# 0 = example data
# 1 = your own path (edit the string below)
# else = choose a directory in a file dialog (check for a window opened in the background)

Data_to_use = 0

if Data_to_use == 0:
    project_directory = bvl.path_to_example_data()
elif Data_to_use == 1:
    project_directory = "Path/to/your/BiVital/Project"  # Replace with the path to your data
    # macOS/Linux example: "/home/you/Documents/BI_Vital_Projects/Bi_Vital_Tutorial"
    # Windows example:     "C:/Users/you/Documents/BI_Vital_Projects/Bi_Vital_Tutorial"
else:
    # Open a directory picker; hide the empty Tk root window and keep the dialog on top
    root = Tk()
    root.withdraw()
    root.attributes("-topmost", True)
    project_directory = filedialog.askdirectory(title="Choose the BiVital project directory")
    root.destroy()

## Load the data

In [None]:
project_data = bvl.ProjectData(project_directory)

## Creating a Uniform Time Index

There are different ways to create a suitable time index.

1. A time index can be created directly by specifying a custom start time, end time, and update interval, as shown in [Step 3](#Step-3:-Create-the-New-Time-Index).

2. The time index can also be generated dynamically based on the data, using functions demonstrated in the following cells. In this case, the time values from the project are used.


#### Step 1: Get the Time Range of the Data

The `.time_range()` function has one optional parameter for selecting series:

- `.time_range()` – returns the time ranges for all series in the project
- `.time_range([X])` – returns the time range only for the specified series `X`
    - `X` can be a single value or a list of `str` or `int` (series names or indices)

This function returns four key time values for each selected series:

- `overall_start_time` – the timestamp of the very first data point in the entire series
- `overall_end_time` – the timestamp of the very last data point in the entire series
- `valid_start_time` – the latest start time among all devices (e.g., BiVital units) in the series
- `valid_end_time` – the earliest end time among all devices in the series

In addition to the times, the dataset names from which the times were taken are also included:

- `overall_start_dataset`
- `overall_end_dataset`
- `valid_start_dataset`
- `valid_end_dataset`
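
The difference between the overall and the valid range can be illustrated with a small standalone sketch. It does not call BiVital; the device names and timestamps are invented, and `summarize_time_range` is a hypothetical helper that only mirrors the definitions listed above.

```python
from datetime import datetime

# Hypothetical recording windows (start, end) for three devices in one series.
device_windows = {
    "device_a": (datetime(2024, 1, 1, 10, 0, 0), datetime(2024, 1, 1, 10, 30, 0)),
    "device_b": (datetime(2024, 1, 1, 10, 0, 5), datetime(2024, 1, 1, 10, 29, 50)),
    "device_c": (datetime(2024, 1, 1, 9, 59, 58), datetime(2024, 1, 1, 10, 30, 2)),
}


def summarize_time_range(windows):
    """Return overall and valid time bounds plus the dataset each bound came from."""
    starts = {name: window[0] for name, window in windows.items()}
    ends = {name: window[1] for name, window in windows.items()}

    overall_start_dataset = min(starts, key=starts.get)  # very first data point
    overall_end_dataset = max(ends, key=ends.get)        # very last data point
    valid_start_dataset = max(starts, key=starts.get)    # latest start among devices
    valid_end_dataset = min(ends, key=ends.get)          # earliest end among devices

    return {
        "overall_start_time": starts[overall_start_dataset],
        "overall_end_time": ends[overall_end_dataset],
        "valid_start_time": starts[valid_start_dataset],
        "valid_end_time": ends[valid_end_dataset],
        "overall_start_dataset": overall_start_dataset,
        "overall_end_dataset": overall_end_dataset,
        "valid_start_dataset": valid_start_dataset,
        "valid_end_dataset": valid_end_dataset,
    }


summary = summarize_time_range(device_windows)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this sketch the valid range is the window in which every device has data, which is why it is the natural choice for a time index shared by all datasets of a series.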


In [None]:
ranges = project_data.time_range()
print(ranges)

#### Step 2: Get the Smallest Update Rates of the Data

This function returns the smallest update rate for each configuration (`config`) found in the selected series. It accepts the same parameter options as `.time_range()`:

- `.smallest_update_rates()` – returns the update rates for all series in the project  
- `.smallest_update_rates([X])` – returns the update rates only for the specified series `X`  
    - `X` can be a single value or a list of `str` or `int` (series names or indices)

The result is a dictionary keyed by series name. For each series, it maps every `config_label` to the smallest update rate (in ms) observed in the corresponding dataset, so a single value is read as `rates[series][config_label]`.
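
If a series contains several configurations, one reasonable choice for the step size of the new time index is the smallest rate among them. The sketch below assumes a nested layout (series name, then config label), inferred from how `rates` is indexed later in this notebook. The numbers and the helper `finest_rate` are invented for illustration.

```python
# Hypothetical result shape: series name -> config label -> smallest update rate in ms.
example_rates = {
    "Workout_series": {"config_A": 1000, "config_B": 20},
    "Rest_series": {"config_A": 1000},
}


def finest_rate(rates_by_series, series):
    """Return (config_label, rate_ms) for the smallest update rate in one series."""
    configs = rates_by_series[series]
    label = min(configs, key=configs.get)
    return label, configs[label]


label, rate_ms = finest_rate(example_rates, "Workout_series")
print(f"Finest configuration: {label} ({rate_ms} ms)")
```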


In [None]:
rates = project_data.smallest_update_rates()
print(rates)

#### Step 3: Create the New Time Index

After collecting information about the time ranges in the project, it's easy to create a uniform time index for the series based on those values.

However, it is also possible to define your own time index without relying on the project's internal time data.

The function `.time_index.generate_uniform()` takes three parameters:

- `start_time` – a `datetime.time` value that defines the start of the time index  
- `end_time` – a `datetime.time` value that defines the end of the time index  
- `update_interval` – an `int` value representing the step size in milliseconds
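
To make the three parameters concrete, here is a minimal standard-library sketch of what a uniform time index looks like. It is not BiVital's implementation: `uniform_index` is a hypothetical helper, and it uses `datetime.datetime` values because plain `datetime.time` objects cannot be subtracted from each other.

```python
from datetime import datetime, timedelta


def uniform_index(start_time, end_time, update_interval_ms):
    """Return evenly spaced timestamps from start_time up to at most end_time."""
    if update_interval_ms <= 0:
        raise ValueError("update_interval_ms must be positive")
    step = timedelta(milliseconds=update_interval_ms)
    # Dividing two timedeltas with // yields the number of whole steps that fit.
    n_steps = (end_time - start_time) // step
    return [start_time + i * step for i in range(n_steps + 1)]


index = uniform_index(
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 10, 0, 1),
    250,
)
for timestamp in index:
    print(timestamp.isoformat(timespec="milliseconds"))
```

The end time is included only when the span is an exact multiple of the step; otherwise the index stops at the last full step before it.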


In [None]:
start_time = ranges["Workout_series"]['valid_start_time']
end_time = ranges["Workout_series"]['valid_end_time']
update_interval = rates["Workout_series"]['config_B']
new_time_index = project_data.time_index.generate_uniform(start_time, end_time, update_interval)
n = 5
print(new_time_index[:n], '...', new_time_index[-n:])

## Integrate the New Time Index into the Project Data

The function `.time_index.apply()` can be used for this purpose.  
It has one required parameter and two optional ones:

- `new_time_index` *(required)* – the new time index generated by `.generate_uniform()`  
- `inplace` *(optional)* – if `True`, the data inside the current project will be replaced.  
  If `False` (default), a new `ProjectData` instance is returned.  
- `region` *(optional)* – specifies which part of the data should be processed.  
  - `[]` – the whole project (default)  
  - `[series]` – all datasets within a given series  
  - `[series, dataset]` – a specific dataset within a series
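
Conceptually, applying a time index can be pictured as merging the new timestamps into the existing rows. This is an assumption based on the interpolation and filtering steps that follow, not on the library's source. The pandas sketch below uses invented heart-rate samples to show the effect: the new timestamps appear as empty rows next to the original measurements.

```python
import pandas as pd

# Hypothetical, irregularly sampled measurements.
original = pd.DataFrame(
    {"heart_rate": [60.0, 64.0, 70.0]},
    index=pd.to_datetime(
        [
            "2024-01-01 10:00:00.000",
            "2024-01-01 10:00:00.400",
            "2024-01-01 10:00:01.000",
        ]
    ),
)

# Uniform target index with a 250 ms step.
new_index = pd.date_range("2024-01-01 10:00:00", periods=5, freq="250ms")

# Union of old and new timestamps; rows that exist only in new_index hold NaN.
merged = original.reindex(original.index.union(new_index))
print(merged)
```

The original sample at 10:00:00.400 is still present after the merge, so it remains available to a later interpolation step.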


In [None]:
new_data = project_data.time_index.apply(new_time_index, inplace=False)

## Interpolate the Data Before Filtering

Once the `new_time_index` has been applied to the selected data,  
the next step is to interpolate the values.  

This ensures that no information is lost when rows are later removed during filtering.

For this, use the `.interpolate()` and `.label_fill()` functions,  
which are explained in more detail in the `data_and_label_interpolate_example`.

In [None]:
new_data.interpolate(method='linear', inplace=True)
new_data.label_fill(method='ffill', inplace=True)

## Filter the Data

To remove rows that are not part of the new time index, use `.time_index.filter()`.

This function takes the same parameters as `.time_index.apply()`.


In [None]:
new_data.time_index.filter(new_time_index, inplace=True)

### Visualize the result

In [None]:
from IPython.display import display_html

def display_side_by_side(df1, df2, start_row=20, num_rows=50, names=('Original', 'Interpolated'), round_digits=3):
    # Get the slice of data from start_row to start_row + num_rows
    df1_slice = df1.iloc[start_row:start_row + num_rows]
    df2_slice = df2.iloc[start_row:start_row + num_rows]
    
    # Round numeric columns for display
    df1_slice = df1_slice.round(round_digits)
    df2_slice = df2_slice.round(round_digits)
    
    # iloc slices exclude the end position, so the last displayed row is start_row + num_rows - 1
    html_str = f'''<div style="display:flex">
                   <div style="flex:1">
                     <h3>{names[0]} (Rows {start_row}-{start_row + num_rows - 1})</h3>
                     {df1_slice.to_html()}
                   </div>
                   <div style="flex:1">
                     <h3>{names[1]} (Rows {start_row}-{start_row + num_rows - 1})</h3>
                     {df2_slice.to_html()}
                   </div>
                   </div>'''
    display_html(html_str, raw=True)

# Compare the first 25 rows of the original and the new data, rounded to 3 decimals
display_side_by_side(project_data[0,0], new_data[0,0], start_row=0, num_rows=25, round_digits=3)
