# 02 - Data Preparation and Transformation

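This notebook reshapes the raw battery data retrieved from InfluxDB from long format (one row per timestamp, battery, and field) into wide format (one row per timestamp and battery, with one column per field). In the wide layout each row is a complete feature vector, which is the shape most machine learning libraries expect for model training.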
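The sketch below makes the long-versus-wide distinction concrete. It pivots four long-format rows (two fields at two timestamps, values copied from the preview at the end of this notebook) laid out with the same `_time`/`batteryId`/`_field`/`_value` columns the raw export uses. `long_df` and `wide_df` are illustrative names only.

```python
import pandas as pd

# Four long-format rows: two fields recorded at two timestamps for battery 1
# (values copied from the preview table; layout mirrors the raw InfluxDB export).
long_df = pd.DataFrame({
    "_time": ["2026-01-04 19:52:28.338000+00:00"] * 2 + ["2026-01-04 19:52:31.344000+00:00"] * 2,
    "batteryId": [1, 1, 1, 1],
    "_field": ["batteryVoltage", "batteryCurrent", "batteryVoltage", "batteryCurrent"],
    "_value": [47.76, 5.2, 47.55, 9.69],
})

# Long -> wide: one row per (_time, batteryId), one column per _field.
wide_df = long_df.pivot(index=["_time", "batteryId"], columns="_field", values="_value").reset_index()
print(wide_df)
```

Four long rows collapse into two wide rows, and the field columns come out in alphabetical order, which matches the column order in the preview.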


## Load and Transform Raw Data

Load the raw battery data from CSV and pivot it so that each InfluxDB `_field` becomes its own column, indexed by timestamp (`_time`) and `batteryId`. The `_time` column is then renamed to `timestamp`.


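```python
import pandas as pd

# Raw InfluxDB export in long format: one row per (_time, batteryId, _field), reading in _value
df = pd.read_csv("../data/battery_raw.csv")

# Long -> wide: one row per (_time, batteryId), one column per _field
df_pivot = df.pivot(index=["_time", "batteryId"], columns="_field", values="_value").reset_index()
df_pivot.rename(columns={"_time": "timestamp"}, inplace=True)
```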
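`DataFrame.pivot` requires each (`_time`, `batteryId`, `_field`) combination to appear at most once and raises a `ValueError` otherwise. Whether the raw export ever contains such repeats is not established here. As a hedged sketch, the hypothetical helper `pivot_long_to_wide` reports duplicate keys and falls back to `pivot_table` with `aggfunc="mean"`, which averages them instead of failing. Averaging is an assumed policy; dropping repeats or keeping the latest reading are equally valid choices. The helper also clears the `_field` name that `pivot` leaves on the column axis.

```python
import pandas as pd


def pivot_long_to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pivot long InfluxDB data, averaging repeated readings.

    DataFrame.pivot raises ValueError if a (_time, batteryId, _field) key repeats;
    pivot_table aggregates the repeats instead (here with the mean, an assumed choice).
    """
    keys = ["_time", "batteryId", "_field"]
    dup_mask = df.duplicated(subset=keys, keep=False)
    if dup_mask.any():
        print(f"{int(dup_mask.sum())} rows share a (_time, batteryId, _field) key; averaging them")
    wide = df.pivot_table(
        index=["_time", "batteryId"], columns="_field", values="_value", aggfunc="mean"
    ).reset_index()
    wide.columns.name = None  # drop the "_field" label left on the column axis
    return wide.rename(columns={"_time": "timestamp"})


# Usage: the voltage at this timestamp was (hypothetically) recorded twice.
raw = pd.DataFrame({
    "_time": ["2026-01-04 19:52:28.338000+00:00"] * 3,
    "batteryId": [1, 1, 1],
    "_field": ["batteryVoltage", "batteryVoltage", "batteryCurrent"],
    "_value": [47.70, 47.80, 5.2],
})
wide = pivot_long_to_wide(raw)
print(wide)
```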

## Save Prepared Data

Save the transformed data to a new CSV file that will be used for model training.


```python
# Persist the wide table; index=False keeps the pandas row index out of the file
df_pivot.to_csv("../data/battery_data.csv", index=False)
```
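CSV stores the `timestamp` column as plain text, so whatever reads `battery_data.csv` later gets strings rather than datetimes unless it converts them. The following minimal sketch shows that round trip. It uses an in-memory buffer in place of the real file, and `pd.to_datetime(..., utc=True)` for the conversion is an assumed choice (`read_csv(parse_dates=...)` is an alternative).

```python
import io

import pandas as pd

# Two wide-format rows from the preview; an in-memory buffer stands in for ../data/battery_data.csv.
frame = pd.DataFrame({
    "timestamp": ["2026-01-04 19:52:28.338000+00:00", "2026-01-04 19:52:31.344000+00:00"],
    "batteryId": [1, 1],
    "batteryVoltage": [47.76, 47.55],
})
buf = io.StringIO()
frame.to_csv(buf, index=False)
buf.seek(0)

loaded = pd.read_csv(buf)
# read_csv does not parse dates by default, so timestamp comes back as text.
parsed_before = pd.api.types.is_datetime64_any_dtype(loaded["timestamp"])

# Convert explicitly; utc=True yields a timezone-aware UTC column.
loaded["timestamp"] = pd.to_datetime(loaded["timestamp"], utc=True)
# With real datetimes, the sampling interval between consecutive rows can be computed.
interval_s = loaded["timestamp"].diff().dt.total_seconds()
print(parsed_before, interval_s.iloc[1])
```

Once the column is parsed, time-based operations such as the roughly 3-second spacing between readings in the preview become straightforward to compute.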

## Preview Prepared Data

Print the dataset shape and the first 10 rows to check that the pivot produced one row per timestamp and battery, with one column per field.


```python
print(f"Prepared dataset shape: {df_pivot.shape}")
df_pivot.head(10)
```

Prepared dataset shape: (443, 11)


| | timestamp | batteryId | ambientTemp | batteryCurrent | batteryTemp | batteryVoltage | currentLoad | distance | kmh | stateOfCharge | stateOfHealth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2026-01-04 19:52:28.338000+00:00 | 1 | 19.47 | 5.2 | 25.01 | 47.76 | 100.0 | 0.04 | 3.68 | 0.9994 | 99.9992 |
| 1 | 2026-01-04 19:52:31.344000+00:00 | 1 | 19.37 | 9.69 | 25.01 | 47.55 | 100.0 | 0.05 | 7.61 | 0.9993 | 99.9991 |
| 2 | 2026-01-04 19:52:34.334000+00:00 | 1 | 18.98 | 12.32 | 25.01 | 47.43 | 100.0 | 0.06 | 10.34 | 0.9992 | 99.999 |
| 3 | 2026-01-04 19:52:37.337000+00:00 | 1 | 18.68 | 6.02 | 25.01 | 47.72 | 100.0 | 0.06 | 4.62 | 0.9991 | 99.9989 |
| 4 | 2026-01-04 19:52:40.339000+00:00 | 1 | 18.74 | 1.01 | 25.01 | 47.95 | 100.0 | 0.06 | 0.0 | 0.9991 | 99.9988 |
| 5 | 2026-01-04 19:52:43.339000+00:00 | 1 | 18.76 | 1.0 | 25.0 | 47.95 | 100.0 | 0.06 | 0.0 | 0.9991 | 99.9987 |
| 6 | 2026-01-04 19:52:46.341000+00:00 | 1 | 19.09 | 1.96 | 25.0 | 47.91 | 100.0 | 0.06 | 0.83 | 0.9991 | 99.9986 |
| 7 | 2026-01-04 19:52:49.345000+00:00 | 1 | 19.33 | 1.0 | 25.0 | 47.95 | 100.0 | 0.06 | 0.0 | 0.999 | 99.9985 |
| 8 | 2026-01-04 19:52:52.335000+00:00 | 1 | 18.96 | 2.68 | 25.0 | 47.88 | 100.0 | 0.06 | 1.44 | 0.999 | 99.9984 |
| 9 | 2026-01-04 19:52:55.335000+00:00 | 1 | 19.19 | 9.55 | 25.0 | 47.56 | 100.0 | 0.07 | 7.53 | 0.9989 | 99.9983 |
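Eyeballing `head(10)` only covers the first rows. The hypothetical `check_prepared` function below is a hedged sketch of programmatic checks over the whole table. It confirms that the expected columns are present, that each (`timestamp`, `batteryId`) pair occurs once, and that no field has gaps. Gaps matter here because `pivot` fills NaN wherever a field had no reading at a given timestamp. The column list is copied from the preview above and is an assumption about future exports.

```python
import numpy as np
import pandas as pd

# Column names taken from the preview above (assumed stable across exports).
KEY_COLUMNS = ["timestamp", "batteryId"]
FIELD_COLUMNS = ["ambientTemp", "batteryCurrent", "batteryTemp", "batteryVoltage",
                 "currentLoad", "distance", "kmh", "stateOfCharge", "stateOfHealth"]


def check_prepared(df: pd.DataFrame) -> list[str]:
    """Hypothetical validation: return a list of problems (empty means all checks passed)."""
    problems = []
    missing_cols = [c for c in KEY_COLUMNS + FIELD_COLUMNS if c not in df.columns]
    if missing_cols:
        problems.append(f"missing columns: {missing_cols}")
    # Each (timestamp, batteryId) pair should map to exactly one row after the pivot.
    if all(c in df.columns for c in KEY_COLUMNS) and df.duplicated(subset=KEY_COLUMNS).any():
        problems.append("duplicate (timestamp, batteryId) rows")
    # pivot leaves NaN wherever a field had no reading at that timestamp.
    nan_counts = df.isna().sum()
    gaps = {col: int(n) for col, n in nan_counts.items() if n > 0}
    if gaps:
        problems.append(f"missing values: {gaps}")
    return problems


# Usage on the first two preview rows, then on copies with an injected gap and a duplicate.
rows = pd.DataFrame(
    [["2026-01-04 19:52:28.338000+00:00", 1, 19.47, 5.2, 25.01, 47.76, 100.0, 0.04, 3.68, 0.9994, 99.9992],
     ["2026-01-04 19:52:31.344000+00:00", 1, 19.37, 9.69, 25.01, 47.55, 100.0, 0.05, 7.61, 0.9993, 99.9991]],
    columns=KEY_COLUMNS + FIELD_COLUMNS,
)
print(check_prepared(rows))  # []

gappy = rows.copy()
gappy.loc[1, "kmh"] = np.nan
print(check_prepared(gappy))  # ["missing values: {'kmh': 1}"]

duplicated = pd.concat([rows, rows.iloc[[0]]], ignore_index=True)
print(check_prepared(duplicated))  # ['duplicate (timestamp, batteryId) rows']
```

In the notebook itself, the check would be run as `check_prepared(df_pivot)` before saving.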
