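In this script, the IForest and XGBOD models from the PyOD library are fitted on the standardized data. Feature importance scores are then read from each fitted model: IForest exposes them through its feature_importances_ attribute, while for XGBOD they are taken from the underlying XGBoost classifier. Because XGBOD is a supervised detector, fitting it requires outlier labels in addition to the features. The final importance score of each feature is the average of its two scores. The indices of the N highest-scoring features are then used to select the corresponding columns from the original dataframe.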

In [None]:
from pyod.models.iforest import IForest
from pyod.models.xgbod import XGBOD
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Scaling the data
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Fit the Isolation Forest model
iso = IForest()
iso.fit(df_scaled)

# Get feature importances from Isolation Forest
# (exposed directly on PyOD's IForest; the wrapped scikit-learn model is iso.detector_)
iso_importances = iso.feature_importances_

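# Fit the XGBOD model
# XGBOD is supervised: y must hold the outlier labels (0 = inlier, 1 = outlier)
# and is assumed to be defined earlier in the notebook
xgb = XGBOD()
xgb.fit(df_scaled, y)

# Get feature importances from XGBOD's underlying XGBoost classifier.
# The classifier is trained on the original features followed by the outlier
# scores of the base detectors, so keep only the first n_features entries.
n_features = df_scaled.shape[1]
xgb_importances = xgb.clf_.feature_importances_[:n_features]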

# Combine importances
average_importances = (iso_importances + xgb_importances) / 2

# Define the number of top features to select
N = 10

# Get the top N features, most important first (argsort sorts in ascending order)
top_N_features = df.columns[average_importances.argsort()[::-1][:N]]

# Select these features from the original dataframe
df_reduced = df[top_N_features]
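
The averaging and selection steps can also be wrapped in a small helper. The following is only a sketch, not part of the original script: `select_top_features`, the toy dataframe, and the two importance vectors are hypothetical and exist purely for illustration. It rescales each importance vector to sum to 1 before averaging, on the assumption that scores from different models are not necessarily on the same scale.

```python
import numpy as np
import pandas as pd


def select_top_features(df, importance_sets, n_top):
    """Average several importance vectors and return the n_top best columns.

    Each vector is rescaled to sum to 1 before averaging so that no model
    dominates the result purely because of the scale of its scores.
    """
    n_features = df.shape[1]
    normalized = []
    for scores in importance_sets:
        scores = np.asarray(scores, dtype=float)
        if scores.shape != (n_features,):
            raise ValueError("each importance vector needs one score per column")
        total = scores.sum()
        # Leave an all-zero vector untouched to avoid dividing by zero.
        normalized.append(scores / total if total > 0 else scores)
    average = np.mean(normalized, axis=0)
    # argsort is ascending, so reverse it to list the most important first.
    order = np.argsort(average)[::-1][:n_top]
    return df.columns[order], average


# Toy data: four columns and two made-up importance vectors.
toy_df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["a", "b", "c", "d"])
imp_a = [0.1, 0.4, 0.2, 0.3]
imp_b = [2.0, 6.0, 1.0, 1.0]  # deliberately on a different scale

top_cols, avg = select_top_features(toy_df, [imp_a, imp_b], n_top=2)
toy_reduced = toy_df[top_cols]  # keeps columns "b" and "d", in that order
```

In the script above, the same helper could be called with `[iso_importances, xgb_importances]` and `N`, provided both vectors have one entry per column of `df`.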
