# 4.2 Transfer Learning

This notebook reproduces the example from Section 4.2 of the paper 'Linux Kernel Configurations at Scale: A Dataset for Performance and Evolution Analysis' (EASE 2025). It trains a linear regression model on TuxKConfig data from Linux kernel versions 5.4 and 5.7 (the source) and evaluates how well it predicts kernel binary size (in MB) for version 5.8 (the target).

## Steps:
1. **Load Datasets**: Fetch versions 5.4 (ID: 46742), 5.7 (ID: 46743), and 5.8 (ID: 46744) from OpenML.
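2. **Combine Source Data**: Concatenate the 5.4 and 5.7 rows into a single training set.
3. **Align Features**: Keep only the configuration options (columns) present in both source and target.
4. **Train and Predict**: Fit the model on the source data, predict on 5.8, and evaluate with mean absolute error (MAE).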
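
Steps 2 and 3 can be illustrated on toy data. The sketch below is only an illustration: the frames and option names (`OPT_A`, `OPT_OLD`, and so on) are hypothetical stand-ins for kernel versions, not TuxKConfig columns. It shows that a default `pd.concat` fills options missing from one version with NaN, and one way to keep only shared columns, assuming dropping non-shared options is acceptable.

```python
import pandas as pd

# Hypothetical toy frames standing in for two source versions and one target.
v_a = pd.DataFrame({"OPT_A": [1, 0], "OPT_B": [0, 1], "OPT_OLD": [1, 1]})
v_b = pd.DataFrame({"OPT_A": [0, 1], "OPT_B": [1, 1], "OPT_C": [0, 0]})
target = pd.DataFrame({"OPT_A": [1], "OPT_B": [1], "OPT_C": [1], "OPT_NEW": [0]})

# Default concat is an outer join on columns: options missing from one
# version are filled with NaN.
outer = pd.concat([v_a, v_b], axis=0, ignore_index=True)

# join="inner" keeps only the columns the two source versions share.
source = pd.concat([v_a, v_b], axis=0, join="inner", ignore_index=True)

# Restrict source and target to the options they have in common.
common = source.columns.intersection(target.columns)
source_aligned = source[common]
target_aligned = target[common]

print("NaN cells after outer concat:", int(outer.isna().sum().sum()))
print("Shared options:", sorted(common))
```

After alignment, `source_aligned` and `target_aligned` have identical columns in the same order, which is what the model needs at fit and predict time.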

In [None]:
import openml
from sklearn.linear_model import LinearRegression
import pandas as pd
import numpy as np

# Step 1: Load datasets
source_504 = openml.datasets.get_dataset(46742)
source_507 = openml.datasets.get_dataset(46743)
target_508 = openml.datasets.get_dataset(46744)
# get_data returns (X, y, categorical_indicator, attribute_names)
X_504, y_504, _, _ = source_504.get_data(target='Binary_Size')
X_507, y_507, _, _ = source_507.get_data(target='Binary_Size')
X_target, y_target, _, _ = target_508.get_data(target='Binary_Size')

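# Step 2: Combine source data (5.4 and 5.7)
# join='inner' keeps only options present in both versions, so the concat
# introduces no NaN-filled columns; ignore_index avoids duplicate row labels
X_source = pd.concat([X_504, X_507], axis=0, join='inner', ignore_index=True)
y_source = pd.concat([y_504, y_507], axis=0, ignore_index=True)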

# Step 3: Align features
common_features = X_source.columns.intersection(X_target.columns)
X_source = X_source[common_features]
X_target = X_target[common_features]

# Step 4: Train and predict
model = LinearRegression()
model.fit(X_source, y_source)
predictions = model.predict(X_target)
mae = np.mean(np.abs(predictions - y_target))
print(f'MAE: {mae:.2f} MB')
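
An MAE value is easier to interpret next to a naive baseline. The sketch below is not part of the paper's example: it uses synthetic data (hypothetical binary options and sizes, not TuxKConfig) and compares a linear model against scikit-learn's `DummyRegressor`, which always predicts the mean of the training target. The small offset added to the target sizes is an assumption meant to mimic drift between versions.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-in data: 5 binary options, size is a linear function plus noise.
weights = np.array([3.0, 1.5, 0.0, 2.0, 0.5])
X_src = rng.integers(0, 2, size=(200, 5)).astype(float)
y_src = 10.0 + X_src @ weights + rng.normal(0.0, 0.1, size=200)

# Target "version": same options, sizes shifted by a constant (assumed drift).
X_tgt = rng.integers(0, 2, size=(50, 5)).astype(float)
y_tgt = 10.5 + X_tgt @ weights + rng.normal(0.0, 0.1, size=50)

# Fit both models on the source and evaluate on the target.
linear = LinearRegression().fit(X_src, y_src)
baseline = DummyRegressor(strategy="mean").fit(X_src, y_src)

mae_model = mean_absolute_error(y_tgt, linear.predict(X_tgt))
mae_baseline = mean_absolute_error(y_tgt, baseline.predict(X_tgt))

print(f"Linear model MAE: {mae_model:.2f}")
print(f"Mean baseline MAE: {mae_baseline:.2f}")
```

If the transferred model's MAE is not clearly below the baseline's, the source versions carry little usable signal for the target.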