# Synthetic dataset for thermal control unit
This notebook generates a synthetic dataset that models the maximum temperature of a control unit as a function of ambient temperature, power load, and cooling type. Use this dataset to train and evaluate machine learning models.

## 1. Install dependencies
Run the cell below to ensure required packages (`pandas`, `numpy`) are installed. Restart the kernel if prompted.

In [27]:
%pip install --upgrade pip
%pip install pandas numpy

Note: you may need to restart the kernel to use updated packages.


## 2. Import required libraries
Import pandas and NumPy, the two libraries used in the data-generation pipeline, and print their versions so the environment is recorded alongside the data.

In [28]:
import pandas as pd
import numpy as np
print('pandas', pd.__version__)
print('numpy', np.__version__)

pandas 2.3.3
numpy 2.4.0


## 3. Reproducibility
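Set a fixed random seed so dataset generation is deterministic and results are reproducible across runs. The seed sets NumPy's global random state, so re-run this cell before regenerating the data; otherwise later draws continue from wherever the state was left and the dataset will differ.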

In [29]:
np.random.seed(42)
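
As an alternative to the global seed, NumPy also offers a local `Generator` object created with `np.random.default_rng`. The sketch below is not what this notebook uses; it only illustrates that two generators built from the same seed produce the same stream. The variable names are illustrative, and a `Generator` seeded with 42 draws different values than `np.random.seed(42)`, so swapping it in would change the dataset.

```python
import numpy as np

# Two independent generators built from the same seed.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Same seed, same call sequence, same numbers.
sample_a = rng_a.uniform(15, 45, 5)
sample_b = rng_b.uniform(15, 45, 5)

same_stream = np.array_equal(sample_a, sample_b)
print(same_stream)
```

Because the generator is a local object, running this cell does not disturb the global state set by `np.random.seed(42)`.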

## 4. Data generation parameters
Configure the number of samples and the ranges used to synthesize `AmbientTemp`, `PowerLoad`, and `CoolingType` (0=Passive, 1=Fan, 2=High-Conductivity).

In [30]:
n_samples = 500
ambient_temp = np.random.uniform(15, 45, n_samples)  # °C
power_load = np.random.uniform(5, 50, n_samples)     # W
cooling_type = np.random.choice([0, 1, 2], n_samples)  # 0=Passive, 1=Fan, 2=High-Conductivity

## 5. Thermal model (data generation)
We use a simplified linear model to synthesize the target variable:

T_max = T_ambient + 0.5 * P_load - 5 * C_eff + ε

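- C_eff: cooling efficiency index, taken directly from `CoolingType` (0 = Passive, 1 = Fan, 2 = High-Conductivity).
- ε: random noise sampled from a normal distribution with mean 0 and σ = 1.5 °C, to simulate measurement error and unit-to-unit variation.

The model is a deliberate simplification: every watt of power load raises the maximum temperature by 0.5 °C, and each step up in cooling type lowers it by 5 °C, independent of ambient temperature and load.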

In [31]:
max_temp = (
    ambient_temp
    + power_load * 0.5                       # 0.5 °C per watt of load
    - cooling_type * 5                       # 5 °C per cooling level
    + np.random.normal(0, 1.5, n_samples)    # noise ε with σ = 1.5 °C
)
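
To make the formula concrete, here is a minimal sketch that evaluates the noise-free part of the model for one operating point and then checks that ordinary least squares recovers coefficients close to 1, 0.5 and -5 from noisy samples. The helper name `expected_max_temp`, the seed 0 and the local generator are assumptions made for this illustration; they are not part of the notebook's pipeline.

```python
import numpy as np

def expected_max_temp(ambient_temp, power_load, cooling_type):
    """Noise-free part of the model: T_ambient + 0.5 * P_load - 5 * C_eff."""
    return ambient_temp + 0.5 * power_load - 5 * cooling_type

# Worked example: 25 °C ambient, 20 W load, fan cooling (C_eff = 1).
worked = expected_max_temp(25.0, 20.0, 1)
print(worked)  # 25 + 10 - 5 = 30.0

# Illustrative samples from a local generator (global seed is left untouched).
rng = np.random.default_rng(0)
n = 500
ambient = rng.uniform(15, 45, n)
load = rng.uniform(5, 50, n)
cooling = rng.choice([0, 1, 2], n)
noisy = expected_max_temp(ambient, load, cooling) + rng.normal(0, 1.5, n)

# Least-squares fit without an intercept, matching the form of the model.
X = np.column_stack([ambient, load, cooling])
coef, *_ = np.linalg.lstsq(X, noisy, rcond=None)
print(np.round(coef, 2))  # should land near [1.0, 0.5, -5.0]
```

The fitted values will not match the true coefficients exactly because of the noise term, but with 500 samples they should be close. This gives a rough baseline for what a linear model trained on the dataset can achieve.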

## 6. Build DataFrame
Assemble the generated arrays into a pandas `DataFrame` with these columns: `AmbientTemp`, `PowerLoad`, `CoolingType`, `MaxTemp`.

In [32]:
df = pd.DataFrame({
    'AmbientTemp': ambient_temp,
    'PowerLoad': power_load,
    'CoolingType': cooling_type,
    'MaxTemp': max_temp
})
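
A quick structural check can catch mistakes before the frame is saved. The sketch below is one possible way to do that; the function `check_thermal_frame` and the stand-in frame `demo` are hypothetical names introduced here, and the bounds simply restate the ranges configured in section 4.

```python
import numpy as np
import pandas as pd

EXPECTED_COLUMNS = ["AmbientTemp", "PowerLoad", "CoolingType", "MaxTemp"]

def check_thermal_frame(df):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {list(df.columns)}")
        return problems  # remaining checks rely on the expected columns
    if df.isna().any().any():
        problems.append("missing values present")
    if not df["AmbientTemp"].between(15, 45).all():
        problems.append("AmbientTemp outside 15-45 °C")
    if not df["PowerLoad"].between(5, 50).all():
        problems.append("PowerLoad outside 5-50 W")
    if not df["CoolingType"].isin([0, 1, 2]).all():
        problems.append("CoolingType not in {0, 1, 2}")
    return problems

# Stand-in frame drawn from a local generator (assumed seed 0), so the
# notebook's global random state is not disturbed.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "AmbientTemp": rng.uniform(15, 45, n),
    "PowerLoad": rng.uniform(5, 50, n),
    "CoolingType": rng.choice([0, 1, 2], n),
})
demo["MaxTemp"] = (demo["AmbientTemp"] + 0.5 * demo["PowerLoad"]
                   - 5 * demo["CoolingType"] + rng.normal(0, 1.5, n))

print(check_thermal_frame(demo))  # [] when every check passes
```

In the notebook itself, calling `check_thermal_frame(df)` on the frame built above would apply the same checks to the real data.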

## 7. Save dataset
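Write the `DataFrame` to `thermal_dataset.csv` so it can be reused by other notebooks and scripts; `index=False` keeps the row index out of the file. Before using the file for training, verify it, for example by reloading it and confirming that the row count and column names match the `DataFrame`.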

In [33]:
df.to_csv("thermal_dataset.csv", index=False)
print("Synthetic dataset saved as thermal_dataset.csv")

Synthetic dataset saved as thermal_dataset.csv
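
One way to verify a saved file is a round trip: write the frame, read it back, and compare. The sketch below does this on a small stand-in frame in a temporary directory, so it does not overwrite `thermal_dataset.csv`. The frame `original`, the seed 0 and the sample count are assumptions made for this illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in frame with the same columns as the dataset.
rng = np.random.default_rng(0)
n = 100
original = pd.DataFrame({
    "AmbientTemp": rng.uniform(15, 45, n),
    "PowerLoad": rng.uniform(5, 50, n),
    "CoolingType": rng.choice([0, 1, 2], n),
})
original["MaxTemp"] = (original["AmbientTemp"] + 0.5 * original["PowerLoad"]
                       - 5 * original["CoolingType"] + rng.normal(0, 1.5, n))

# Write to a temporary location and read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "thermal_dataset.csv")
    original.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

same_shape = reloaded.shape == original.shape
same_columns = list(reloaded.columns) == list(original.columns)
# Compare with a tolerance: float text parsing may differ in the last digits.
values_match = np.allclose(reloaded.to_numpy(dtype=float),
                           original.to_numpy(dtype=float))
print(same_shape, same_columns, values_match)
```

For the real file, the same comparison would be made between `df` and `pd.read_csv("thermal_dataset.csv")`.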
