**Model**
Based on the research question, we are interested in understanding the factors that contribute to the perception and evaluation of weather-based energy management strategies. This is a multivariate problem, so it can be approached with a variety of machine learning models; comparing results across different models also gives a better understanding of the data.

We will use a decision tree classifier. This model can handle both categorical and numerical data, and it can provide insights into which features are most important in predicting the target variable. 
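
As an illustration of how a fitted tree exposes feature importance, here is a minimal sketch on synthetic data. The column names (`informative`, `noise_a`, `noise_b`) and the `*_demo` variables are assumptions made up for the example and are not part of the survey dataset.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data (hypothetical column names).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "informative": rng.integers(0, 2, size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})
# The target depends only on the "informative" column.
y_demo = X_demo["informative"]

tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# feature_importances_ sums to 1; a higher value means the feature
# accounted for more of the impurity reduction across the tree's splits.
importances = pd.Series(tree.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

On real data the importances are rarely this clear-cut, and with very few samples per class they can change noticeably between runs.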

In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Load data
df = pd.read_csv('survey_data.csv')

In [11]:
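# Convert categorical (non-numeric) columns to integer codes.
# A separate encoder is kept per column so codes can be mapped back to labels.
encoders = {}
for column in df.columns:
    if not pd.api.types.is_numeric_dtype(df[column]):
        encoders[column] = LabelEncoder()
        df[column] = encoders[column].fit_transform(df[column])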
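
The class numbers that appear in the reports further down are encoded codes, not the original survey answers. The sketch below shows, on a made-up frame, how `LabelEncoder` assigns codes and how they can be mapped back with `inverse_transform`. The column names and values (`region`, `adoption`, `age`) are hypothetical, since the contents of `survey_data.csv` are not shown here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical survey answers; the real columns and values are not known.
df_demo = pd.DataFrame({
    "region": ["north", "south", "north", "east"],
    "adoption": ["yes", "no", "maybe", "yes"],
    "age": [34, 51, 27, 45],
})

# One encoder per categorical column, so each mapping can be inverted later.
encoders_demo = {}
for column in ["region", "adoption"]:
    encoders_demo[column] = LabelEncoder()
    df_demo[column] = encoders_demo[column].fit_transform(df_demo[column])

# Codes follow the sorted order of the original strings in each column.
print(df_demo)

# Map codes (for example, predicted classes) back to the original answers.
print(encoders_demo["adoption"].inverse_transform([0, 1, 2]))
```

Note that the integer codes imply an ordering that the original categories may not have, which matters more for distance-based models such as KNN and SVM than for trees.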

In [12]:
# Define the features and target variable
X = df.drop('weather-based_energy_management_strategies_adoption', axis=1)
y = df['weather-based_energy_management_strategies_adoption']

In [13]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [14]:
# Initialize and fit the model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

In [15]:
# Make predictions and evaluate model
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.40      0.40      0.40         5
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         1
           5       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         1
           7       0.00      0.00      0.00         2
           9       0.00      0.00      0.00         0
          11       0.00      0.00      0.00         0
          12       0.43      0.75      0.55         4
          15       0.50      0.25      0.33         8
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         2
          24       0.00      0.00      0.00         0

    accuracy                           0.24        29
   macro avg       0.09      0.09      0.09        29
weighted avg       0.27   



The code above trains a decision tree classifier on survey_data.csv and evaluates its performance using precision, recall, and F1 score. As the report shows, the results are poor (accuracy of 0.24 on 29 test samples), so let's try different ML algorithms.
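
Before comparing algorithms, it helps to know what a trivial baseline scores on a target with this many classes. The sketch below is an illustration on made-up labels: the `*_demo` arrays are assumptions and not the survey data. It uses scikit-learn's `DummyClassifier` to always predict the most frequent training class, and passes `zero_division=0` to `classification_report` so that classes which are never predicted are reported as 0.0 without a warning for each one.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical labels standing in for y_train / y_test: many classes, few samples.
y_train_demo = np.array([15] * 30 + [0] * 20 + [12] * 15 + [1, 2, 5, 6, 7] * 3)
y_test_demo = np.array([15] * 8 + [0] * 5 + [12] * 4 + [1, 2, 5, 6, 7])

# DummyClassifier ignores the features, so a placeholder matrix is enough.
X_train_demo = np.zeros((len(y_train_demo), 1))
X_test_demo = np.zeros((len(y_test_demo), 1))

# Always predict the most frequent class seen during training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train_demo, y_train_demo)
y_pred_base = baseline.predict(X_test_demo)

baseline_acc = accuracy_score(y_test_demo, y_pred_base)
print("Baseline accuracy:", baseline_acc)

# zero_division=0 sets undefined precision/recall values to 0.0 silently.
print(classification_report(y_test_demo, y_pred_base, zero_division=0))
```

A model that does not beat this baseline has learned little beyond the class frequencies.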

1 - Random Forest

In [16]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Random Forest
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print(classification_report(y_test, y_pred_rf))

              precision    recall  f1-score   support

           0       0.29      0.40      0.33         5
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         1
           5       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         1
           7       0.00      0.00      0.00         2
           9       0.00      0.00      0.00         0
          12       0.22      0.50      0.31         4
          15       0.43      0.38      0.40         8
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         2
          24       0.00      0.00      0.00         0

    accuracy                           0.24        29
   macro avg       0.07      0.09      0.07        29
weighted avg       0.20      0.24      0.21        29





2 - SVM

In [17]:
# SVM
svm = SVC()
svm.fit(X_train, y_train)
y_pred_svm = svm.predict(X_test)
print(classification_report(y_test, y_pred_svm))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         5
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         1
           5       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         1
           7       0.00      0.00      0.00         2
          12       0.00      0.00      0.00         4
          15       0.28      1.00      0.43         8
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         2

    accuracy                           0.28        29
   macro avg       0.02      0.08      0.04        29
weighted avg       0.08      0.28      0.12        29





3 - KNN

In [18]:
# KNN
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
print(classification_report(y_test, y_pred_knn))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         5
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         1
           5       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         1
           7       0.00      0.00      0.00         2
           9       0.00      0.00      0.00         0
          12       0.17      0.25      0.20         4
          15       0.40      0.25      0.31         8
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         2

    accuracy                           0.10        29
   macro avg       0.04      0.04      0.04        29
weighted avg       0.13      0.10      0.11        29





Switching to different algorithms did not meaningfully improve the results, so let's try **Hyperparameter Tuning**.

In [22]:
from sklearn.model_selection import GridSearchCV

# Search over forest size and tree depth with 5-fold cross-validation on the training set
parameters = {'n_estimators': [100, 200, 300, 400, 500], 'max_depth': [None, 10, 20, 30, 40, 50]}
grid_search = GridSearchCV(estimator=RandomForestClassifier(), param_grid=parameters, cv=5)
grid_search.fit(X_train, y_train)
best_parameters = grid_search.best_params_
print("Best Parameters:", best_parameters)



Best Parameters: {'max_depth': 50, 'n_estimators': 200}
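
The grid search reports the best parameters, but the tuned model also has to be scored on the held-out test set to see whether tuning helped. A minimal sketch of that step follows, run on synthetic data from `make_classification` with a deliberately small grid so it finishes quickly. The `*_demo`, `X_tr`/`X_te`, and `search` names are assumptions for the example, not objects from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the survey data.
X_demo, y_demo = make_classification(
    n_samples=150, n_features=8, n_informative=4, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# A deliberately small grid so the sketch runs quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Mean cross-validated accuracy:", search.best_score_)

# With the default refit=True, best_estimator_ is refitted on all of X_tr.
y_pred_tuned = search.best_estimator_.predict(X_te)
print(classification_report(y_te, y_pred_tuned, zero_division=0))
```

In this notebook the equivalent call would be `grid_search.best_estimator_.predict(X_test)`.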


**Handling Imbalanced Data**

In [23]:
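from imblearn.over_sampling import RandomOverSampler

# Oversample minority classes in the training set only; the test set is left untouched
ros = RandomOverSampler()
X_train_ros, y_train_ros = ros.fit_resample(X_train, y_train)
# Refit the decision tree from In [14] on the resampled data
model.fit(X_train_ros, y_train_ros)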
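
To make clear what random oversampling does to the training labels, here is a hand-rolled sketch using only NumPy. It is not imbalanced-learn's implementation; the `random_oversample` function and the `*_demo` arrays are hypothetical and only illustrate the idea of duplicating minority-class rows until every class matches the largest one.

```python
from collections import Counter

import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate rows at random until every class matches the largest one."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    target = max(Counter(y.tolist()).values())
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        # Keep every original row, then draw the shortfall with replacement.
        extra = rng.choice(idx, size=target - len(idx), replace=True)
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


# Hypothetical imbalanced training labels.
y_demo = np.array([15] * 6 + [0] * 3 + [7] * 1)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

X_bal, y_bal = random_oversample(X_demo, y_demo)
print(Counter(y_demo.tolist()), "->", Counter(y_bal.tolist()))
```

Because the added rows are exact copies, oversampling adds no new information about rare classes; a class with a single training example still contributes only that one distinct example.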

In [24]:
from sklearn.ensemble import RandomForestClassifier

# Random forest with default hyperparameters, trained on the oversampled data
model = RandomForestClassifier()
model.fit(X_train_ros, y_train_ros)

In [25]:
# Make predictions
y_pred = model.predict(X_test)

In [26]:
# Import necessary metrics from sklearn
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

In [27]:
# Print classification report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         5
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         1
           3       0.00      0.00      0.00         0
           5       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         1
           7       0.00      0.00      0.00         2
          12       0.12      0.25      0.17         4
          15       0.67      0.50      0.57         8
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         0
          19       0.00      0.00      0.00         0
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         2
          24       0.00      0.00      0.00         0

    accuracy                           0.17        29
   macro avg       0.05   



In [28]:
# Print confusion matrix
print(confusion_matrix(y_test, y_pred))

[[0 1 0 0 1 0 0 2 0 1 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0]
 [1 0 0 0 0 0 0 3 4 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
 [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]


In [29]:
# Print accuracy score
print("Accuracy: ", accuracy_score(y_test, y_pred))

Accuracy:  0.1724137931034483


The above gives the classification report, confusion matrix, and accuracy score of the model's predictions. These metrics help us understand how well the model is performing. In this case, the model's accuracy is approximately 17.24%, which is quite low and shows that the model is not performing well on the given data.
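
The accuracy figure can be checked directly against the confusion matrix: correct predictions lie on the diagonal, so accuracy is the trace divided by the total count. The sketch below copies the matrix printed in `In [28]` into a NumPy array; the variable names are chosen for the example.

```python
import numpy as np

# Confusion matrix copied from the output of In [28]
# (rows: true class, columns: predicted class).
cm = np.array([
    [0, 1, 0, 0, 1, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
])

correct = int(np.trace(cm))  # predictions on the diagonal are correct
total = int(cm.sum())        # every test sample appears exactly once
accuracy = correct / total
print(f"{correct} correct out of {total}: accuracy = {accuracy:.4f}")
```

The diagonal holds 5 of the 29 test samples (one for class 12 and four for class 15), which matches the reported 17.24%.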

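The model may not be complex enough to capture the patterns in the data, or perhaps the factors considered in the model do not have a strong influence on attitudes towards energy management. The support column also shows that most classes have only one or two of the 29 test samples, so the per-class scores are very noisy.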