# ML with GPT

This notebook is based on code generated by ChatGPT. It is organized around the prompts we gave to the OpenAI model. The model returns code snippets, so in places we had to add code of our own to make them run.

# Prompt 1

Can you give me a snippet of python code that trains a classification model on the publicly available German Credit loan default dataset?

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data'
columns = ['checking_account', 'duration', 'credit_history', 'purpose', 'credit_amount', 'savings_account', 'employment', 'installment_rate', 'personal_status', 'other_debtors', 'residence_since', 'property', 'age', 'other_installment_plans', 'housing', 'existing_credits', 'job', 'num_dependents', 'phone', 'foreign_worker', 'class']
data = pd.read_csv(url, sep=' ', header=None, names=columns)

# Encode categorical features
categorical_features = ['checking_account', 'credit_history', 'purpose', 'savings_account', 'employment', 'personal_status', 'other_debtors', 'property', 'other_installment_plans', 'housing', 'job', 'phone', 'foreign_worker']
le = LabelEncoder()
for feature in categorical_features:
   data[feature] = le.fit_transform(data[feature])

# Split the data into training and testing sets
X = data.drop('class', axis=1)
y = data['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a decision tree classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the testing set and calculate accuracy
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

Accuracy: 0.73


It correctly split the dataset into training and test sets, chose a decision tree classifier as the model, and printed its accuracy on the test set.

# Prompt 2

Can you improve the performance of the model?

In [2]:
# those imports are missing
# ->inserted code from user
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix
# ->end inserted code

# Create new feature combining duration and installment_rate
data['total_paid'] = data['duration'] * data['installment_rate']

# Split the data into training and testing sets
# ->inserted code from user
X = data.drop('class', axis=1)
y = data['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# ->end inserted code

# Scale the numerical features
scaler = StandardScaler()
numerical_features = ['duration', 'credit_amount', 'installment_rate', 'residence_since', 'age', 'existing_credits','num_dependents', 'total_paid']
X_train[numerical_features] = scaler.fit_transform(X_train[numerical_features])
X_test[numerical_features] = scaler.transform(X_test[numerical_features])

# Hyperparameter tuning using GridSearchCV
param_grid = {
    'n_estimators': [50, 100, 200, 500],
    'max_depth': [None, 5, 10, 20]
}
clf = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(clf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
clf = grid_search.best_estimator_

# Make predictions on the testing set and calculate accuracy and confusion matrix
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_mat = confusion_matrix(y_test, y_pred)
print('Accuracy:', accuracy)
print('Confusion matrix:\n', conf_mat)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Accuracy: 0.79
Confusion matrix:
 [[131  10]
 [ 32  27]]


ChatGPT offered many suggestions on how to improve model performance, such as hyperparameter tuning, ensemble methods, data preprocessing, addressing class imbalance and using more advanced models. The code above applies several of them: feature scaling, a random forest ensemble, and a grid search over its hyperparameters.

This shows it has knowledge of the general approach of training a classifier.
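The SettingWithCopyWarning messages in the output above come from writing the scaled columns back into the frames returned by train_test_split. A minimal sketch of one way to avoid them is to take explicit copies before assigning. It uses a small made-up frame, and `toy`, `X_tr`, `X_te` and `num_cols` are illustrative names that are not part of the ChatGPT response.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the credit data; the values are made up for illustration.
toy = pd.DataFrame({
    "duration": [6.0, 12.0, 24.0, 36.0, 48.0, 60.0, 18.0, 30.0],
    "credit_amount": [1000.0, 2500.0, 4000.0, 8000.0, 12000.0, 15000.0, 3000.0, 6000.0],
    "class": [1, 1, 2, 1, 2, 2, 1, 1],
})
X_toy = toy.drop("class", axis=1)
y_toy = toy["class"]
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=42)

# Take explicit copies so the assignments below write to independent frames.
X_tr = X_tr.copy()
X_te = X_te.copy()

num_cols = ["duration", "credit_amount"]
scaler = StandardScaler()
# Fit the scaler on the training rows only, then reuse it for the test rows.
X_tr[num_cols] = scaler.fit_transform(X_tr[num_cols])
X_te[num_cols] = scaler.transform(X_te[num_cols])
```

Doing the scaling inside a pipeline, as in Prompt 3, sidesteps the issue because the frames are never modified in place.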

# Prompt 3

Without doing a grid search, can you give me a snippet of code that does all the necessary preprocessing steps for this specific dataset and trains a model for classification? Also, can you do the preprocessing and the inference inside of a pipeline?

In [3]:
# ->inserted code from user
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
# ->end inserted code

categorical_transformer = OneHotEncoder(handle_unknown='ignore')
numerical_transformer = StandardScaler()
preprocessor = ColumnTransformer(
   transformers=[
       ('cat', categorical_transformer, categorical_features),
       ('num', numerical_transformer, numerical_features)
   ])

# Train the classifier using a pipeline
clf = Pipeline(steps=[
   ('preprocessor', preprocessor),
   ('classifier', RandomForestClassifier(random_state=42))
])

clf.fit(X_train, y_train)

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('cat',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['checking_account',
                                                   'credit_history', 'purpose',
                                                   'savings_account',
                                                   'employment',
                                                   'personal_status',
                                                   'other_debtors', 'property',
                                                   'other_installment_plans',
                                                   'housing', 'job', 'phone',
                                                   'foreign_worker']),
                                                 ('num', StandardScaler(),
                                                  ['duration', 'credit_amount',
  

In [4]:
# ->inserted code from user
from sklearn.metrics import classification_report
y_pred = clf.predict(X_test)
a = classification_report(y_test, y_pred, output_dict=True)
print('Class 1')
print(a['1'])
print('\n Class 2')
print(a['2'])
print('\nAccuracy')
print(a['accuracy'])
print('\nMacro average')
print(a['macro avg'])
print('\nWeighted average')
print(a['weighted avg'])
# ->end inserted code

Class 1
{'precision': 0.7784431137724551, 'recall': 0.9219858156028369, 'f1-score': 0.8441558441558443, 'support': 141}

 Class 2
{'precision': 0.6666666666666666, 'recall': 0.3728813559322034, 'f1-score': 0.47826086956521735, 'support': 59}

Accuracy
0.76

Macro average
{'precision': 0.7225548902195609, 'recall': 0.6474335857675202, 'f1-score': 0.6612083568605308, 'support': 200}

Weighted average
{'precision': 0.7454690618762476, 'recall': 0.76, 'f1-score': 0.7362168266516094, 'support': 200}


Comparing the f1-score for each class shows lower performance on the positive class (class 2), so we asked the chatbot to improve performance for that class to see what approach it would take.

# Prompt 4

The performance for the positive class is lower than the performance for the negative class. How can the code be modified to mitigate that?

In [5]:
clf = Pipeline(steps=[
   ('preprocessor', preprocessor),
   ('classifier', RandomForestClassifier(random_state=42, class_weight='balanced'))
])

# ->inserted code
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
a = classification_report(y_test, y_pred, output_dict=True)
print('Class 1')
print(a['1'])
print('\n Class 2')
print(a['2'])
print('\nAccuracy')
print(a['accuracy'])
print('\nMacro average')
print(a['macro avg'])
print('\nWeighted average')
print(a['weighted avg'])
# ->end inserted code

Class 1
{'precision': 0.7976190476190477, 'recall': 0.950354609929078, 'f1-score': 0.8673139158576053, 'support': 141}

 Class 2
{'precision': 0.78125, 'recall': 0.423728813559322, 'f1-score': 0.5494505494505494, 'support': 59}

Accuracy
0.795

Macro average
{'precision': 0.7894345238095238, 'recall': 0.6870417117442, 'f1-score': 0.7083822326540774, 'support': 200}

Weighted average
{'precision': 0.7927901785714286, 'recall': 0.795, 'f1-score': 0.7735442227675239, 'support': 200}


For this task, the only change to the previously provided code was setting the RandomForestClassifier parameter class_weight to “balanced”, which weights each class inversely to its frequency in the training set and so gives a higher weight to the minority (positive) class.

This is a good strategy for an imbalanced dataset like the one we used, and the f1-score for the positive class did rise from about 0.48 to 0.55, but its recall stayed low at about 0.42, well behind the negative class.

Hence, we tried asking for other ways of increasing performance without any oversampling techniques.
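To make the “balanced” setting concrete, here is a small sketch of the weights scikit-learn derives from the label counts, n_samples / (n_classes * count_per_class). The label vector `y_demo` is made up for illustration (700 samples of class 1 and 300 of class 2) and is not the notebook's actual training split.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative labels: 700 of class 1 and 300 of class 2 (an assumption).
y_demo = np.array([1] * 700 + [2] * 300)
classes = np.array([1, 2])

# Weights computed by scikit-learn for class_weight="balanced".
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)

# The same weights computed by hand from the formula.
counts = np.array([(y_demo == c).sum() for c in classes])
manual = len(y_demo) / (len(classes) * counts)

for c, w in zip(classes, weights):
    print(f"class {c}: weight {w:.3f}")
```

The rarer class receives the larger weight, so misclassifying one of its samples costs more during training.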

# Prompt 5

The performance difference between the two classes is still high. What else can be done to solve that other than oversampling?

In [6]:
# ->inserted code
import numpy as np
# ->end inserted code

y_pred_prob = clf.predict_proba(X_test)[:, 1]
y_pred = (y_pred_prob >= 0.4).astype(int) # Adjust threshold to 0.4

# ->inserted code
# y pred has 0 and 1 but our classes are 1 and 2
# if y == 0 then 1 else 2
y_pred = np.where(y_pred == 0, 1, 2)
a = classification_report(y_test, y_pred, output_dict=True)
print('Class 1')
print(a['1'])
print('\n Class 2')
print(a['2'])
print('\nAccuracy')
print(a['accuracy'])
print('\nMacro average')
print(a['macro avg'])
print('\nWeighted average')
print(a['weighted avg'])
# ->end inserted code

Class 1
{'precision': 0.841726618705036, 'recall': 0.8297872340425532, 'f1-score': 0.8357142857142857, 'support': 141}

 Class 2
{'precision': 0.6065573770491803, 'recall': 0.6271186440677966, 'f1-score': 0.6166666666666666, 'support': 59}

Accuracy
0.77

Macro average
{'precision': 0.7241419978771082, 'recall': 0.7284529390551748, 'f1-score': 0.7261904761904762, 'support': 200}

Weighted average
{'precision': 0.7723516924165585, 'recall': 0.77, 'f1-score': 0.7710952380952381, 'support': 200}


The code was modified so that after training, the predicted probabilities given by the model for the test set are used together with a threshold of 0.4 to obtain new predictions for the data. The response also contained a correct explanation of how the value of the threshold can provide a trade-off between precision and recall.
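The trade-off can be seen on a tiny example. The sketch below uses made-up labels and probabilities (`y_true` and `y_prob` are illustrative, not model output from this notebook) and reports precision and recall for a few thresholds.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up ground truth (1 = positive) and predicted positive-class probabilities.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.35, 0.38, 0.45, 0.55, 0.60, 0.70, 0.90])

results = []
for threshold in (0.3, 0.4, 0.5, 0.6):
    # Predict positive whenever the probability reaches the threshold.
    y_hat = (y_prob >= threshold).astype(int)
    precision = precision_score(y_true, y_hat)
    recall = recall_score(y_true, y_hat)
    results.append((threshold, precision, recall))
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold can only add predicted positives, so recall never drops as the threshold falls, while precision may move in either direction.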

Although this is correct, we wanted the threshold adjustment to happen inside the prediction pipeline, and this is where ChatGPT began having problems producing code that would run without errors and achieve the desired outcome.

# Prompt 6 - One that fails!

Can you modify the code so that the threshold adjustment is inside of the pipeline?

In [7]:
from sklearn.base import BaseEstimator, TransformerMixin

class ThresholdAdjuster(BaseEstimator, TransformerMixin):
   def __init__(self, threshold):
       self.threshold = threshold

   def fit(self, X, y):
       return self

   def transform(self, X):
       y_pred_prob = self.clf.predict_proba(X)[:, 1]
       y_pred = (y_pred_prob >= self.threshold).astype(int)
       return y_pred.reshape(-1, 1)
    
clf = Pipeline(steps=[
   ('preprocessor', preprocessor),
   ('classifier', RandomForestClassifier(random_state=42, class_weight='balanced')),
   ('threshold_adjuster', ThresholdAdjuster(threshold=0.4))
])

clf.fit(X_train, y_train)

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'RandomForestClassifier(class_weight='balanced', random_state=42)' (type <class 'sklearn.ensemble._forest.RandomForestClassifier'>) doesn't

This time, an instance of a custom class was added to the pipeline after the RandomForestClassifier, but the code failed: every intermediate pipeline step must implement both fit and transform, and RandomForestClassifier has no transform method, so no other component can follow it in the pipeline.
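The requirement behind the TypeError can be checked directly. This short sketch (the `has_methods` dictionary is an illustrative name) lists which of the two estimators expose the methods an intermediate pipeline step needs.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Record whether each estimator offers fit and transform.
has_methods = {}
for estimator in (StandardScaler(), RandomForestClassifier()):
    name = type(estimator).__name__
    has_methods[name] = {
        "fit": hasattr(estimator, "fit"),
        "transform": hasattr(estimator, "transform"),
    }
    print(name, has_methods[name])
```

A classifier can therefore only sit at the end of a pipeline, where predict is expected instead of transform.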

However, it’s worth mentioning that although the provided code had an error, it’s still pretty impressive that it knew how to properly define a custom class by extending scikit-learn classes and overriding some of their methods in the attempt to achieve the desired result.

After several tries to obtain a different output from ChatGPT by regenerating the response, this error would still persist, so we tried to explain what the problem was in the hope of getting a working solution.

# Prompt 7 - Another fail!

The ThresholdAdjuster can’t be added in the pipeline after the RandomForestClassifier because the RandomForestClassifier does not have the methods fit and transform.

In [8]:
# Classification pipeline with threshold adjustment
class ThresholdAdjuster(BaseEstimator, TransformerMixin):
   def __init__(self, threshold):
       self.threshold = threshold

   def fit(self, X, y=None):
       return self

   def predict(self, X):
       y_pred_prob = self.clf.predict_proba(X)[:, 1]
       y_pred = (y_pred_prob >= self.threshold).astype(int)
       return y_pred

clf = Pipeline(steps=[
   ('preprocessor', preprocessor),
   ('classifier', RandomForestClassifier(random_state=42, class_weight='balanced')),
   ('threshold_adjuster', ThresholdAdjuster(threshold=0.4))
])

# Train the classifier
clf.fit(X_train, y_train)

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'RandomForestClassifier(class_weight='balanced', random_state=42)' (type <class 'sklearn.ensemble._forest.RandomForestClassifier'>) doesn't

After acknowledging the mistake, the response continued by specifying that the ThresholdAdjuster will still be added in the pipeline after the RandomForestClassifier.

This time, the only difference between the initial code and the current one was renaming the transform method to predict, but the provided code still produced the same error as before.

Instead of just explaining why the code fails, we tried adding the returned error to see if the bot would get a better understanding of the problem and how to fix it.

Did it do better this time around?

# Prompt 8

When I tried running the code I got the following error: TypeError: “All intermediate steps should be transformers and implement fit and transform or be the string ‘passthrough’ ‘RandomForestClassifier(class_weight=‘balanced’, random_state=42)’ (type <class ‘sklearn.ensemble._forest.RandomForestClassifier’>) doesn’t”. How can the code be modified to fix this?

In [9]:
from sklearn.preprocessing import FunctionTransformer

clf = Pipeline(steps=[
   ('preprocessor', preprocessor),
   ('classifier', FunctionTransformer(RandomForestClassifier(random_state=42, 
    class_weight='balanced').predict)),
   ('threshold_adjuster', ThresholdAdjuster(threshold=0.4))
])

clf.fit(X_train, y_train)

NotFittedError: This RandomForestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

This time around, the generated code produced a different error: “NotFittedError: This RandomForestClassifier instance is not fitted yet. Call ‘fit’ with appropriate arguments before using this estimator.”

This happened because the RandomForestClassifier in the pipeline was replaced with a FunctionTransformer wrapping the predict method of a RandomForestClassifier instance that is never fitted: the pipeline only calls the wrapped function, not the classifier’s fit method.
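The failure can be reproduced without a pipeline. In this sketch, `X_demo`, `y_demo`, `forest` and `bound_predict` are illustrative names on made-up data: a predict method taken from an unfitted forest raises NotFittedError, and starts working only once that same instance has been fitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

# Tiny made-up dataset with the notebook's 1/2 label convention.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1, 1, 2, 2])

forest = RandomForestClassifier(n_estimators=10, random_state=42)
bound_predict = forest.predict  # method of an estimator that is not fitted yet

try:
    bound_predict(X_demo)
    raised = False
except NotFittedError:
    raised = True
print("NotFittedError raised before fit:", raised)

# Fitting the same instance makes the bound method usable.
forest.fit(X_demo, y_demo)
predictions = bound_predict(X_demo)
print("Predictions after fit:", predictions)
```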

Once again, we gave the chatbot the returned error to see if it could find a working solution.

# Prompt 9 - Finally, it works!

When I run the code now I get a different error: “NotFittedError: This RandomForestClassifier instance is not fitted yet. Call ‘fit’ with appropriate arguments before using this estimator.". How can the code be modified to fix this?

In [10]:
# Custom estimator for RandomForestClassifier with predict_proba method
class CustomRandomForestClassifier(RandomForestClassifier):
   def predict_proba(self, X):
       return super().predict_proba(X)[:, 1]

# Classification pipeline with threshold adjustment
class ThresholdAdjuster(BaseEstimator, TransformerMixin):
   def __init__(self, threshold):
       self.threshold = threshold

   def fit(self, X, y=None):
       self.clf = CustomRandomForestClassifier(random_state=42, class_weight='balanced')
       self.clf.fit(X, y)
       return self

   def predict(self, X):
       y_pred_prob = self.clf.predict_proba(X)
       y_pred = (y_pred_prob >= self.threshold).astype(int)
       return y_pred

clf = Pipeline(steps=[
   ('preprocessor', preprocessor),
   ('classifier', ThresholdAdjuster(threshold=0.4))
])

clf.fit(X_train, y_train)

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('cat',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['checking_account',
                                                   'credit_history', 'purpose',
                                                   'savings_account',
                                                   'employment',
                                                   'personal_status',
                                                   'other_debtors', 'property',
                                                   'other_installment_plans',
                                                   'housing', 'job', 'phone',
                                                   'foreign_worker']),
                                                 ('num', StandardScaler(),
                                                  ['duration', 'credit_amount',
  

In [11]:
# ->inserted code
y_pred = clf.predict(X_test)
# y pred has 0 and 1 but our classes are 1 and 2
# if y == 0 then 1 else 2
y_pred = np.where(y_pred == 0, 1, 2)
a = classification_report(y_test, y_pred, output_dict=True)
print('Class 1')
print(a['1'])
print('\n Class 2')
print(a['2'])
print('\nAccuracy')
print(a['accuracy'])
print('\nMacro average')
print(a['macro avg'])
print('\nWeighted average')
print(a['weighted avg'])
# ->end inserted code

Class 1
{'precision': 0.841726618705036, 'recall': 0.8297872340425532, 'f1-score': 0.8357142857142857, 'support': 141}

 Class 2
{'precision': 0.6065573770491803, 'recall': 0.6271186440677966, 'f1-score': 0.6166666666666666, 'support': 59}

Accuracy
0.77

Macro average
{'precision': 0.7241419978771082, 'recall': 0.7284529390551748, 'f1-score': 0.7261904761904762, 'support': 200}

Weighted average
{'precision': 0.7723516924165585, 'recall': 0.77, 'f1-score': 0.7710952380952381, 'support': 200}
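
The working solution hard-codes the classifier inside ThresholdAdjuster and returns 0/1 values that have to be mapped back to the labels 1 and 2. As a possible alternative, here is a minimal sketch of a generic wrapper. `ThresholdClassifier` is a hypothetical class written for this example (not something ChatGPT produced), and the data is synthetic rather than the German Credit set.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ThresholdClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: a binary classifier with a custom threshold."""

    def __init__(self, estimator, threshold=0.5):
        self.estimator = estimator
        self.threshold = threshold

    def fit(self, X, y):
        # Fit a fresh copy so the estimator passed in is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        self.classes_ = self.estimator_.classes_
        return self

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)

    def predict(self, X):
        # Column 1 is the probability of classes_[1], the larger label.
        positive = self.predict_proba(X)[:, 1] >= self.threshold
        # Return the original labels, so no 0/1 remapping is needed.
        return np.where(positive, self.classes_[1], self.classes_[0])


# Synthetic binary data, relabelled to 1/2 to mimic the notebook's classes.
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=0)
y_syn = y_syn + 1

model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", ThresholdClassifier(
        RandomForestClassifier(n_estimators=20, random_state=42,
                               class_weight="balanced"),
        threshold=0.4,
    )),
])
model.fit(X_syn, y_syn)
labels = model.predict(X_syn)
print(np.unique(labels, return_counts=True))
```

Because the wrapper is the final pipeline step, it only needs fit and predict, and the threshold becomes an ordinary parameter that could be changed with set_params.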
