# 02 - Data Preparation and Transformation

This notebook reshapes the raw battery data retrieved from InfluxDB from long format (one row per timestamp, battery, and field) to wide format (one row per timestamp and battery, with one column per field). The wide layout is more convenient for training machine learning models.


## Load and Transform Raw Data

Load the raw battery data from CSV and pivot it so that each distinct `_field` becomes its own column holding the corresponding `_value`. The `_time` column is then renamed to `timestamp`.


In [1]:
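import pandas as pd

# Raw export is in long format: one row per (_time, batteryId, _field) with its _value.
df = pd.read_csv("./data/battery_raw.csv")

# Pivot to wide format: one row per (_time, batteryId), one column per field.
df_pivot = df.pivot(index=["_time", "batteryId"], columns="_field", values="_value").reset_index()
df_pivot = df_pivot.rename(columns={"_time": "timestamp"})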
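
`DataFrame.pivot` raises a `ValueError` when the same (`_time`, `batteryId`, `_field`) combination appears more than once, which could happen if the raw export contains repeated points. The sketch below uses a small made-up frame (not the real data) to show one way of counting such duplicates and falling back to `pivot_table`. Keeping the first value via `aggfunc="first"` is an assumption; another aggregation may suit the actual dataset better.

```python
import pandas as pd

# Toy long-format data with hypothetical values; the t0 voltage point is duplicated.
toy = pd.DataFrame({
    "_time": ["t0", "t0", "t0", "t1", "t1"],
    "batteryId": [1, 1, 1, 1, 1],
    "_field": ["batteryTemp", "batteryVoltage", "batteryVoltage",
               "batteryTemp", "batteryVoltage"],
    "_value": [25.1, 36.6, 36.6, 25.2, 36.2],
})

# Count rows that repeat an earlier (_time, batteryId, _field) combination.
keys = ["_time", "batteryId", "_field"]
n_dupes = int(toy.duplicated(subset=keys).sum())

# pivot_table aggregates duplicates instead of raising (assumed policy: keep the first).
toy_wide = toy.pivot_table(
    index=["_time", "batteryId"], columns="_field", values="_value", aggfunc="first"
).reset_index()
toy_wide.columns.name = None  # drop the leftover "_field" label on the column axis

print(f"duplicate points: {n_dupes}")
print(toy_wide)
```

If `n_dupes` is zero on the real data, the plain `pivot` call above is sufficient and has the advantage of failing loudly on unexpected duplicates.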

## Save Prepared Data

Save the transformed data to a new CSV file that will be used for model training.


In [2]:
df_pivot.to_csv("./data/battery_data.csv", index=False)
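
CSV does not store column types, so the `timestamp` column comes back as plain strings unless it is parsed on load. The following sketch round-trips a tiny made-up frame through an in-memory buffer (standing in for `./data/battery_data.csv`) and restores the timestamps with `parse_dates`; the frame and its values are illustrative only.

```python
import io

import pandas as pd

# Tiny stand-in for the prepared data (hypothetical subset of columns).
wide = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-11-20 19:50:01.424000+00:00",
        "2025-11-20 19:50:04.424000+00:00",
    ]),
    "batteryId": [1, 1],
    "batteryVoltage": [36.62, 36.25],
})

# Write to an in-memory buffer instead of a file so the example has no side effects.
buffer = io.StringIO()
wide.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates converts the timestamp strings back to datetimes on load.
restored = pd.read_csv(buffer, parse_dates=["timestamp"])
print(restored.dtypes)
```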

## Preview Prepared Data

Display the shape and the first ten rows of the prepared data to verify that the transformation succeeded.


In [5]:
print(f"Prepared dataset shape: {df_pivot.shape}")
df_pivot.head(10)

Prepared dataset shape: (1200, 11)


| _field | timestamp | batteryId | ambientTemp | batteryCurrent | batteryTemp | batteryVoltage | currentLoad | distance | kmh | stateOfCharge | stateOfHealth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-11-20 19:50:01.424000+00:00 | 1 | 28.09 | 38.54 | 25.15 | 36.62 | 100.0 | 51.73 | 23.14 | 0.2261 | 99.52 |
| 1 | 2025-11-20 19:50:04.424000+00:00 | 1 | 28.02 | 44.57 | 25.17 | 36.25 | 100.0 | 51.75 | 25.2 | 0.2257 | 99.5199 |
| 2 | 2025-11-20 19:50:07.424000+00:00 | 1 | 27.8 | 45.03 | 25.18 | 36.22 | 100.0 | 51.77 | 25.2 | 0.2254 | 99.5198 |
| 3 | 2025-11-20 19:50:10.425000+00:00 | 1 | 28.27 | 45.06 | 25.19 | 36.22 | 100.0 | 51.79 | 25.2 | 0.2251 | 99.5197 |
| 4 | 2025-11-20 19:50:13.424000+00:00 | 1 | 28.65 | 32.61 | 25.18 | 36.99 | 100.0 | 51.81 | 21.33 | 0.2248 | 99.5196 |
| 5 | 2025-11-20 19:50:16.424000+00:00 | 1 | 28.64 | 37.94 | 25.18 | 36.66 | 100.0 | 51.83 | 23.18 | 0.2245 | 99.5195 |
| 6 | 2025-11-20 19:50:19.424000+00:00 | 1 | 28.95 | 30.73 | 25.16 | 37.1 | 100.0 | 51.85 | 20.29 | 0.2243 | 99.5194 |
| 7 | 2025-11-20 19:50:22.425000+00:00 | 1 | 29.32 | 43.99 | 25.17 | 36.29 | 100.0 | 51.87 | 25.2 | 0.2239 | 99.5193 |
| 8 | 2025-11-20 19:50:25.424000+00:00 | 1 | 29.58 | 44.98 | 25.19 | 36.22 | 100.0 | 51.89 | 25.2 | 0.2236 | 99.5192 |
| 9 | 2025-11-20 19:50:28.424000+00:00 | 1 | 29.81 | 45.06 | 25.2 | 36.22 | 100.0 | 51.91 | 25.2 | 0.2232 | 99.5191 |
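
Beyond eyeballing the preview, a few programmatic checks can help confirm the pivot behaved as expected. The helper below is a hypothetical sketch (the name `summarize_wide` and the choice of checks are assumptions, not part of the original notebook), demonstrated on a made-up frame with one missing reading.

```python
import pandas as pd


def summarize_wide(frame, key_cols=("timestamp", "batteryId")):
    """Return basic quality indicators for a pivoted (wide) frame."""
    key_cols = list(key_cols)
    return {
        "rows": len(frame),
        # Each (timestamp, batteryId) pair should appear exactly once after pivoting.
        "duplicate_keys": int(frame.duplicated(subset=key_cols).sum()),
        # A field missing at some timestamp shows up as NaN in its column.
        "missing_per_column": frame.drop(columns=key_cols).isna().sum().to_dict(),
    }


# Toy wide frame with hypothetical values; batteryTemp is missing in the second row.
toy_wide = pd.DataFrame({
    "timestamp": ["t0", "t1", "t2"],
    "batteryId": [1, 1, 1],
    "batteryTemp": [25.15, None, 25.18],
    "batteryVoltage": [36.62, 36.25, 36.22],
})

report = summarize_wide(toy_wide)
print(report)
```

In the notebook, the same helper could be called as `summarize_wide(df_pivot)` after the pivot step.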
