# Assignment 1 - Predictive Modeling on Tabular Data

In this report, we present our approach, the pipelines we used, and the results of the predictive analysis. We started with exploratory analysis: we checked the distribution of the data and looked for outliers. We then preprocessed the data (handling missing values, encoding categorical features, and scaling numeric features) and trained several machine learning models.

First, we imported the necessary libraries.

In [5]:
import tensorflow as tf 
from tensorflow import keras
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import skew
import numpy as np
from sklearn import metrics
from scipy.stats import zscore
from sklearn.linear_model import LogisticRegression, Ridge, LassoCV, Lasso
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report, mean_squared_error
from sklearn.decomposition import PCA
from imblearn.over_sampling import SMOTE, RandomOverSampler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
import xgboost as xgb
from xgboost import XGBClassifier



### Pre-processing

We loaded the CSV file into a pandas DataFrame. We dropped the “id” column since it has no explanatory power, and we split the “Connect_Date” column into “year”, “month” and “day”. The dataset has 5040 rows and only 4 missing values across 3 features, so we dropped the affected rows rather than imputing them. Finally, we checked the training data for duplicate rows.

In [8]:
train_df = pd.read_csv("/Users/umutkurt/Desktop/train.csv")

train_df = train_df.drop("id", axis = 1)

train_df['Connect_Date'] = pd.to_datetime(train_df['Connect_Date'], format='%d/%m/%y')
train_df["day"] = train_df["Connect_Date"].dt.day
train_df["month"] = train_df["Connect_Date"].dt.month
train_df["year"] = train_df["Connect_Date"].dt.year

train_df = train_df.drop("Connect_Date", axis = 1)
train_df.dropna(inplace=True)

#Checking duplicates
duplicate_rows = train_df.duplicated()
print("Number of duplicate rows:", duplicate_rows.sum())


if duplicate_rows.sum() > 0:
    print(train_df[duplicate_rows])

Number of duplicate rows: 0
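To make the date handling and the missing-value check concrete, here is a minimal sketch on a made-up three-row frame. Only the column names come from our dataset; the values are hypothetical. Note that with the `%y` directive, two-digit years 69 to 99 are read as 1969 to 1999 and 00 to 68 as 2000 to 2068, which is worth keeping in mind for `Connect_Date`.

```python
import pandas as pd

# Hypothetical toy rows; only the column names mirror our dataset.
toy = pd.DataFrame({
    "Connect_Date": ["15/03/99", "02/11/05", "28/07/98"],
    "Usage_Band": ["Med", None, "High"],
})

# Parse day/month/two-digit-year strings, then split into numeric parts.
toy["Connect_Date"] = pd.to_datetime(toy["Connect_Date"], format="%d/%m/%y")
toy["day"] = toy["Connect_Date"].dt.day
toy["month"] = toy["Connect_Date"].dt.month
toy["year"] = toy["Connect_Date"].dt.year

# Count missing values per column before deciding whether to drop or impute.
missing_per_column = toy.isna().sum()
rows_after_drop = len(toy.dropna())

print(toy["year"].tolist())
print(missing_per_column["Usage_Band"], rows_after_drop)
```

Counting missing values per column first makes it easy to judge whether dropping rows, as we did for the training data, costs a meaningful share of the observations.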


After cleaning the training data, we defined X and y. Since the target variable is already coded as 0 or 1, no conversion was needed. We then used train_test_split to create X_train, X_valid, y_train and y_valid.

Next, we turned to the categorical variables. Given the limited number of categories and the lack of ordinality, we opted for one-hot encoding: Oyebamiji (2023) states that one-hot encoding is appropriate when there is no ordinality and that it works with many machine learning models, so we considered it the safer choice for our models. After encoding we have only 55 features, which is manageable. To avoid multicollinearity, we dropped one column per categorical variable during encoding.

We then standardized the numeric variables with StandardScaler, because they are measured on very different scales and features with large ranges would otherwise dominate. For both the encoder and the scaler, we called ".fit" only on the training data and ".transform" on the validation and test data, so that all parameters come from the training set and we are protected against data leakage.

In [13]:
X = train_df.drop("target", axis = 1)
y = train_df["target"]

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=30)

encoder = OneHotEncoder(handle_unknown="ignore", drop="first")
categorical_columns = ['Gender', 'tariff', 'Handset', 'high Dropped calls', 'No Usage', 'Tariff_OK', 'Usage_Band']
X_train_encoded = encoder.fit_transform(X_train[categorical_columns]).toarray()
X_valid_encoded = encoder.transform(X_valid[categorical_columns]).toarray()
other_cols_train = X_train.drop(columns=categorical_columns)
other_cols_valid = X_valid.drop(columns=categorical_columns)

X_train_encoded_df = pd.DataFrame(X_train_encoded, index=other_cols_train.index, columns=encoder.get_feature_names_out(categorical_columns))
X_valid_encoded_df = pd.DataFrame(X_valid_encoded, index=other_cols_valid.index, columns=encoder.get_feature_names_out(categorical_columns))

X_train = pd.concat([other_cols_train, X_train_encoded_df], axis=1)
X_valid = pd.concat([other_cols_valid, X_valid_encoded_df], axis=1)

columns_to_scale = ['Age', 'L_O_S', 'Dropped_Calls', 'Peak_calls_Sum', 'Peak_mins_Sum',
                        'OffPeak_calls_Sum', 'OffPeak_mins_Sum', 'Weekend_calls_Sum',
                        'Weekend_mins_Sum', 'International_mins_Sum', 'Nat_call_cost_Sum',
                        'AvePeak', 'AveOffPeak', 'AveWeekend', 'National_calls', 'National mins',
                        'AveNational', 'All_calls_mins', 'Dropped_calls_ratio', 'Mins_charge',
                        'call_cost_per_min', 'actual call cost', 'Total_call_cost', 'Total_Cost',
                        'average cost min', 'Peak ratio', 'OffPeak ratio', 'Weekend ratio', 'Nat-InterNat Ratio', 'day', 'month', 'year']

scaler = StandardScaler()
X_train[columns_to_scale] = scaler.fit_transform(X_train[columns_to_scale])
X_valid[columns_to_scale] = scaler.transform(X_valid[columns_to_scale])




Our training set now consists of 55 features. We began by examining the pairwise correlations between the features. Knowing which features are highly correlated helps us decide on the number of components if we apply principal component analysis (PCA). Furthermore, to deal with outliers, we decided to cap extreme values at the 1st and 99th percentiles.

In [20]:
#Checking correlations
correlation_matrix = X_train.corr()
high_corr = (correlation_matrix.abs() > 0.9) & (correlation_matrix.abs() < 1)
correlated_features = correlation_matrix[high_corr].stack()
print(correlated_features)

L_O_S                   year                     -0.968874
Peak_calls_Sum          National_calls            0.924453
Peak_mins_Sum           National mins             0.927996
                        All_calls_mins            0.910999
International_mins_Sum  Total_call_cost           0.920791
Nat_call_cost_Sum       actual call cost          0.998969
National_calls          Peak_calls_Sum            0.924453
National mins           Peak_mins_Sum             0.927996
                        All_calls_mins            0.983445
All_calls_mins          Peak_mins_Sum             0.910999
                        National mins             0.983445
                        Total_Cost                0.935193
actual call cost        Nat_call_cost_Sum         0.998969
Total_call_cost         International_mins_Sum    0.920791
                        Total_Cost                0.921048
Total_Cost              All_calls_mins            0.935193
                        Total_call_cost           0.921048
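The output above lists every correlated pair twice, once in each direction, because the correlation matrix is symmetric. A minimal sketch of one way to list each pair only once is to keep the strict upper triangle of the matrix before filtering. The data here is synthetic and the column names `a`, `b` and `c` are placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.05, size=200),  # nearly a copy of "a"
    "c": rng.normal(size=200),                     # independent noise
})

corr = toy.corr().abs()
# Keep only the strict upper triangle so that each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
# Comparing against the threshold also discards any remaining NaN entries.
high_pairs = pairs[pairs > 0.9]
print(high_pairs)
```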


In [None]:
#Capping outliers at the 1st and 99th percentiles
def cap_outliers(df, feature, lower_percentile=0.01, upper_percentile=0.99):
    # Thresholds are computed from the data frame that is passed in
    lower_threshold = df[feature].quantile(lower_percentile)
    upper_threshold = df[feature].quantile(upper_percentile)

    # Cap values below and above thresholds
    df[feature] = df[feature].clip(lower=lower_threshold, upper=upper_threshold)

    return df

# Note: this caps train_df, so it only affects X_train if it runs before X and y are defined
for feature in columns_to_scale:
    train_df = cap_outliers(train_df, feature)
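If capping is kept in the pipeline, the thresholds should, like the encoder and the scaler, be learned on the training data and then reused for the validation and test data. The sketch below illustrates that idea; the helper names `fit_caps` and `apply_caps` are hypothetical and the data is made up.

```python
import pandas as pd

def fit_caps(df, columns, lower=0.01, upper=0.99):
    """Learn per-column percentile thresholds from the training frame."""
    return {col: (df[col].quantile(lower), df[col].quantile(upper)) for col in columns}

def apply_caps(df, caps):
    """Return a copy of df with each column clipped to its learned thresholds."""
    out = df.copy()
    for col, (low, high) in caps.items():
        out[col] = out[col].clip(lower=low, upper=high)
    return out

# Hypothetical values; only the column name comes from our dataset.
train = pd.DataFrame({"Total_Cost": [float(v) for v in range(1, 101)]})
valid = pd.DataFrame({"Total_Cost": [-50.0, 42.0, 500.0]})

caps = fit_caps(train, ["Total_Cost"])
valid_capped = apply_caps(valid, caps)
print(caps)
print(valid_capped["Total_Cost"].tolist())
```

Returning a copy instead of modifying the input keeps the original frame available for comparison, which is useful while we are still deciding whether capping helps.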
    


### Class Imbalance

Our target classes are imbalanced, which can be a real challenge for machine learning algorithms. Bockel-Rickermann, Verdonck, and Verbeke (2023) discuss that class imbalance can lead to overfitting and should be addressed, and that there must be enough data to train on the minority class (p. 6). To address this, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to our training data. Because SMOTE adds synthetic minority-class rows instead of discarding majority-class rows, no information is lost. We applied SMOTE only to the training data, so that the validation set keeps the original class distribution and we can judge how well the model performs on it.

In [21]:
smote = SMOTE()
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)
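To illustrate the idea behind SMOTE, the sketch below creates synthetic minority rows by interpolating between a minority sample and one of its nearest minority neighbours. This is a simplified NumPy illustration and not the imbalanced-learn implementation that we actually used; the function name `smote_like_samples` and the toy points are hypothetical.

```python
import numpy as np

def smote_like_samples(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base point and one of its k nearest neighbours.
    base_idx = rng.integers(0, len(X_min), size=n_new)
    nb_idx = neighbours[base_idx, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base_idx] + gap * (X_min[nb_idx] - X_min[base_idx])

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_samples(X_minority, n_new=6)
print(synthetic)
```

Every synthetic row lies on a line segment between two real minority rows, which is why SMOTE enlarges the minority class without simply duplicating observations.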

### Feature Selection

For feature selection, we first tried Lasso, but we switched to principal component analysis (PCA) after inspecting the correlation table. Several features are highly correlated, which points to possible multicollinearity; PCA resolves this by replacing the correlated features with uncorrelated components. In addition, Jain (2024) states that Lasso regression can lead to information loss when variables are highly correlated. With 55 features, PCA is also a practical way to reduce dimensionality, which helps to limit overfitting and to improve our predictive model. Strictly speaking, PCA constructs new features rather than selecting existing ones. After some inspection, we settled on 40 to 45 components; the code below uses 42.


In [None]:
lasso = LassoCV(cv=5).fit(X_train, y_train)
selected_features = lasso.coef_ != 0 # or lower than 0.01
print("Selected features via Lasso:", X_train.columns[selected_features])
#According to printed columns, one can put new columns into SMOTE to generate synthetic rows.

In [24]:
pca = PCA(n_components=42)
X_train_pca = pca.fit_transform(X_train_smote)
X_valid_pca = pca.transform(X_valid)
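One common way to support the choice of the number of components is to look at the cumulative explained variance. The sketch below shows the idea on synthetic data; the 95% threshold and the data-generating setup are assumptions for illustration, not values taken from our analysis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 10 observed features driven by 3 latent factors plus small noise.
latent = rng.normal(size=(300, 3))
X_toy = latent @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(300, 10))

pca_full = PCA().fit(X_toy)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
# Smallest number of components whose cumulative share reaches 95%.
n_components_95 = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_components_95, cumulative[:4].round(3))
```

scikit-learn also accepts a fraction such as `PCA(n_components=0.95)`, which selects the number of components in a similar way.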

### Pre-Processing Test Data

After preprocessing, addressing the class imbalance, and reducing dimensionality, we transformed the test data so that it can be passed to our machine learning models. We chose not to drop rows with missing values, since the test data is relatively small compared to the training data. Instead, we imputed the median for continuous variables and the mode for categorical variables, both computed on the training data. We used the median because it is more robust to outliers than the mean. We then applied the pipeline fitted on the training data: first the one-hot encoder, then StandardScaler, then PCA. In each step we used only ".transform", so that the parameters come from the training dataset and no data leaks.

In [33]:
test_df_1 = pd.read_csv("/Users/umutkurt/Desktop/test.csv")

test_df = test_df_1.drop("id", axis = 1)
test_df['Connect_Date'] = pd.to_datetime(test_df['Connect_Date'], format='%d/%m/%y')
test_df["day"] = test_df["Connect_Date"].dt.day
test_df["month"] = test_df["Connect_Date"].dt.month
test_df["year"] = test_df["Connect_Date"].dt.year


test_df = test_df.drop("Connect_Date", axis = 1)

# Impute with statistics from the training data; assign back instead of using inplace on a column
median_dropped_calls_ratio = train_df['Dropped_calls_ratio'].median()
test_df['Dropped_calls_ratio'] = test_df['Dropped_calls_ratio'].fillna(median_dropped_calls_ratio)

median_call_cost_per_min = train_df['call_cost_per_min'].median()
test_df['call_cost_per_min'] = test_df['call_cost_per_min'].fillna(median_call_cost_per_min)

mode_usage_band = train_df['Usage_Band'].mode()[0]
test_df['Usage_Band'] = test_df['Usage_Band'].fillna(mode_usage_band)


X_test_encoded = encoder.transform(test_df[categorical_columns]).toarray()
X_test_encoded_df = pd.DataFrame(X_test_encoded, columns=encoder.get_feature_names_out(categorical_columns))

non_categorical_data = test_df.drop(columns=categorical_columns)

non_categorical_data.reset_index(drop=True, inplace=True)
X_test_encoded_df.reset_index(drop=True, inplace=True)
test_df = pd.concat([non_categorical_data, X_test_encoded_df], axis=1)

test_df[columns_to_scale] = scaler.transform(test_df[columns_to_scale])

test_df_pca = pca.transform(test_df)
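Since the scaler and PCA were fitted on the training columns, the test frame must contain the same columns in the same order before ".transform" is called. A small check like the following sketch can catch mismatches early. The column list and the values are hypothetical; in our notebook the reference would be the columns of the training frame.

```python
import pandas as pd

# Hypothetical column order seen when the transformers were fitted.
train_cols = ["Age", "Gender_M", "Usage_Band_Med"]
test_toy = pd.DataFrame({
    "Gender_M": [1.0, 0.0],
    "Age": [0.3, -1.2],
    "Usage_Band_Med": [0.0, 1.0],
})

# Report columns that are absent from, or unexpected in, the test frame.
missing = [c for c in train_cols if c not in test_toy.columns]
extra = [c for c in test_toy.columns if c not in train_cols]
assert not missing and not extra, (missing, extra)

# Reorder the test columns to match the training order.
test_aligned = test_toy[train_cols]
print(list(test_aligned.columns))
```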





### Implementation of ML Models and Comparing Metrics

#### Logistic Regression

In [46]:
log_reg = LogisticRegression(max_iter = 1000, class_weight="balanced")

log_reg.fit(X_train_pca, y_train_smote)

y_pred_lg = log_reg.predict(X_valid_pca)

accuracy_lg = accuracy_score(y_valid, y_pred_lg)
precision_lg = precision_score(y_valid, y_pred_lg)
recall_lg = recall_score(y_valid, y_pred_lg)

print(f"Accuracy for Logistic Regression on Validation Set: {accuracy_lg}")
print(f"Precision for Logistic Regression on Validation Set: {precision_lg}")
print(f"Recall for Logistic Regression on Validation Set: {recall_lg}")

Accuracy for Logistic Regression on Validation Set: 0.9136904761904762
Precision for Logistic Regression on Validation Set: 0.6721311475409836
Recall for Logistic Regression on Validation Set: 0.82


#### Gradient Boosting Classifier

In [42]:
gb = GradientBoostingClassifier()
gb.fit(X_train_pca, y_train_smote)
#y_pred = gb.predict_proba(test_df)

y_pred_gb = gb.predict(X_valid_pca)

accuracy_gb = accuracy_score(y_valid, y_pred_gb)
precision_gb = precision_score(y_valid, y_pred_gb)
recall_gb = recall_score(y_valid, y_pred_gb)

print(f"Accuracy for GB Classifier on Validation Set: {accuracy_gb}")
print(f"Precision for GB Classifier on Validation Set: {precision_gb}")
print(f"Recall for GB Classifier on Validation Set: {recall_gb}")


Accuracy for GB Classifier on Validation Set: 0.9037698412698413
Precision for GB Classifier on Validation Set: 0.6432432432432432
Recall for GB Classifier on Validation Set: 0.7933333333333333


#### RandomForest Classifier

In [41]:
rf_classifier = RandomForestClassifier()
rf_classifier.fit(X_train_pca, y_train_smote)
#y_test_pred = rf_classifier.predict_proba(test_df)
y_pred_rf = rf_classifier.predict(X_valid_pca)

accuracy_rf = accuracy_score(y_valid, y_pred_rf)
precision_rf = precision_score(y_valid, y_pred_rf)
recall_rf = recall_score(y_valid, y_pred_rf)

print(f"Accuracy for RF Classifier on Validation Set: {accuracy_rf}")
print(f"Precision for RF Classifier on Validation Set: {precision_rf}")
print(f"Recall for RF Classifier on Validation Set: {recall_rf}")

Accuracy for RF Classifier on Validation Set: 0.9236111111111112
Precision for RF Classifier on Validation Set: 0.7417218543046358
Recall for RF Classifier on Validation Set: 0.7466666666666667


#### XGBoost Classifier

In [44]:
xgb_classifier = XGBClassifier(objective='binary:logistic')

# Train the model
xgb_classifier.fit(X_train_pca, y_train_smote)

y_pred_xgb = xgb_classifier.predict(X_valid_pca)

accuracy_xgb = accuracy_score(y_valid, y_pred_xgb)
precision_xgb = precision_score(y_valid, y_pred_xgb)
recall_xgb = recall_score(y_valid, y_pred_xgb)

print(f"Accuracy for XGBoost on Validation Set: {accuracy_xgb}")
print(f"Precision for XGBoost on Validation Set: {precision_xgb}")
print(f"Recall for XGBoost on Validation Set: {recall_xgb}")

Accuracy for XGBoost on Validation Set: 0.9216269841269841
Precision for XGBoost on Validation Set: 0.7290322580645161
Recall for XGBoost on Validation Set: 0.7533333333333333


As the performance metrics show, all models are similar in terms of precision, recall and accuracy. However, when we uploaded our predictions to the system, we found that the AUC was also similar across models but the profit metric remained a problem. Therefore, we tried another approach: a neural network trained with a custom loss function, with the aim of getting a better result on the profit metric.
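Because the leaderboard reports AUC, it can be useful to compute the AUC on the validation set locally from predicted probabilities rather than from hard class labels. The sketch below shows this with scikit-learn's `roc_auc_score` on synthetic data; the data and the simple split are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(400, 2))
# Imbalanced synthetic target that depends on the first feature.
y_toy = (X_toy[:, 0] + 0.5 * rng.normal(size=400) > 1.0).astype(int)

X_tr, X_va = X_toy[:300], X_toy[300:]
y_tr, y_va = y_toy[:300], y_toy[300:]

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# AUC is computed from the positive-class probabilities, not from predict().
proba_va = clf.predict_proba(X_va)[:, 1]
auc_va = roc_auc_score(y_va, proba_va)
print(round(auc_va, 3))
```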

#### Applying Custom Function with Multiple Layers

First, we separated the feature that is used in the profit calculation, “average cost min”, and passed it to the loss function alongside the target. We built our custom loss function on binary cross-entropy. Ibrahim (2023) states that binary cross-entropy can be advantageous for imbalanced data because of its high penalties, and that it is useful for tasks such as fraud detection. In the version shown here, the loss is plain binary cross-entropy; weighting it by the financial metric is left as a commented-out option. The network uses a ReLU activation in the hidden layer and a sigmoid activation in the output layer. We avoided a very high number of epochs, since that can cause overfitting.

In [50]:
financial_metric = X_train_smote['average cost min'].values
# Stack the target and the financial metric so that both reach the loss function
y_combined = np.column_stack((y_train_smote, financial_metric))
print("Shape of y_train:", y_train_smote.shape)
print("Shape of financial_metric:", financial_metric.shape)
print("Shape of y_combined:", y_combined.shape)

Shape of y_train: (6878,)
Shape of financial_metric: (6878,)
Shape of y_combined: (6878, 2)


In [55]:
def custom_loss(y_true, y_pred):
    # y_true[:, 0] holds the target, y_true[:, 1] holds the financial metric
    y_pred = tf.squeeze(y_pred, axis=1)
    bce = keras.losses.binary_crossentropy(y_true[:, 0], y_pred)

    # Optional: weight the loss by the financial metric
    # weighted_loss = bce * (y_true[:, 1] + 1)

    return tf.reduce_mean(bce)


X_train_reduced = X_train_smote.drop('average cost min', axis=1)

model = Sequential([
    keras.layers.Input(shape=(X_train_reduced.shape[1],)),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss=custom_loss, optimizer="adam")

model.fit(X_train_reduced, y_combined, epochs=100, batch_size=16)

Epoch 1/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.6528
Epoch 2/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 927us/step - loss: 0.4705
Epoch 3/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 920us/step - loss: 0.3362
Epoch 4/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 904us/step - loss: 0.2924
Epoch 5/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1000us/step - loss: 0.2863
Epoch 6/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 926us/step - loss: 0.2591
Epoch 7/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.2440
Epoch 8/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 920us/step - loss: 0.2364
Epoch 9/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 923us/step - loss: 0.2290
Epoch 10/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m

[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 987us/step - loss: 0.1356
Epoch 81/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 970us/step - loss: 0.1482
Epoch 82/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.1315
Epoch 83/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.1336  
Epoch 84/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 998us/step - loss: 0.1257
Epoch 85/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 978us/step - loss: 0.1402
Epoch 86/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.1303
Epoch 87/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 915us/step - loss: 0.1259
Epoch 88/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 915us/step - loss: 0.1386
Epoch 89/100
[1m430/430[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m

<keras.src.callbacks.history.History at 0x7fedc28954c0>
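The commented-out line in the loss function hints at weighting each observation by the financial metric. To show what such a weighting would compute, here is a plain NumPy sketch of a per-sample weighted binary cross-entropy. It is an illustration under assumptions and not the loss we trained with: the function name `weighted_bce` and all values are hypothetical. Note also that “average cost min” is standardized in our pipeline, so `value + 1` can be negative there and the weights would need to be shifted or clipped in practice.

```python
import numpy as np

def weighted_bce(y_true, y_prob, weights, eps=1e-7):
    """Mean binary cross-entropy with a per-sample weight on each term."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(weights * per_sample))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.9, 0.2, 0.3, 0.1])
financial = np.array([0.5, 0.1, 2.0, 0.2])  # hypothetical, non-negative values

# With unit weights the function reduces to ordinary binary cross-entropy.
plain = weighted_bce(y_true, y_prob, np.ones_like(y_true))
# Weighting by (financial + 1) makes errors on high-value rows cost more.
weighted = weighted_bce(y_true, y_prob, financial + 1.0)
print(plain, weighted)
```

In this toy example the third row is both badly predicted and has the largest financial value, so it dominates the weighted loss; this is the behaviour such a weighting is meant to produce.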

In [56]:
test_df_reduced = test_df.drop('average cost min', axis=1)
y_test_pred_proba = model.predict(test_df_reduced)
y_test_pred_proba

53/53 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step


array([[0.00059315],
       [0.00520602],
       [0.00079747],
       ...,
       [0.54738694],
       [0.01663261],
       [0.00147558]], dtype=float32)

After implementing the neural network with the custom loss function, our results improved. Initially, our AUC rose to 94% and the profit metric improved from around 2 to 4.1, which put us in 15th place. Currently, we are in 25th place with a profit metric of 2.96 and an AUC of 94%. We managed to keep the AUC high, but the profit metric settled at around 3.


#### Generating CSV files to upload

In [12]:
probabilities_df = pd.DataFrame(y_test_pred_proba, columns=['PRED'])

combined_df = pd.concat([test_df_1['id'], probabilities_df], axis=1)

combined_df.to_csv('file.csv', index=False)

In [None]:
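# Alternative export for the scikit-learn models (assumed here: the logistic regression on PCA features)
# predict_proba returns one column per class; column 1 is the positive-class probability
y_test_pred = log_reg.predict_proba(test_df_pca)
a = pd.DataFrame(y_test_pred[:, 1], columns=['PRED'])

test_df_f = pd.read_csv("/Users/umutkurt/Desktop/test.csv")
test_df_f = test_df_f["id"]

combined_df = pd.concat([test_df_f, a], axis=1)

combined_df.to_csv('logistic_reg.csv', index=False)
#Name of the file can change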



### References


Oyebamiji, M. (2023). A comprehensive comparison between one-hot and ordinal encoding. Medium. Retrieved from https://medium.com/@oyebamijimicheal10/a-comprehensive-comparison-between-one-hot-and-ordinal-encoding-6f899c4f08b3 

Jain, S. (2024). Lasso & Ridge Regression | A Comprehensive Guide in Python & R. AnalyticsVidhya. Retrieved from https://www.analyticsvidhya.com/blog/2017/06/a-comprehensive-guide-for-linear-ridge-and-lasso-regression/#:~:text=The%20main%20problem%20with%20lasso,lower%20accuracy%20in%20our%20model. 

Bockel-Rickermann, C., Verdonck, T., & Verbeke, W. (2023). Fraud Analytics: A Decade of Research – Organizing Challenges and Solutions in the Field. Expert Systems with Applications. Elsevier. https://doi.org/10.1016/j.eswa.2023.120605. 

Ibrahim, M. (2023). Understanding the difference in performance between binary cross-entropy and categorical cross-entropy. Weights & Biases. Retrieved from https://wandb.ai/mostafaibrahim17/ml-articles/reports/Understanding-the-Difference-in-Performance-Between-Binary-Cross-Entropy-and-Categorical-Cross-Entropy--Vmlldzo0Nzk4NDI2#:~:text=Binary%20Cross%2DEntropy%3A%20Use%20Cases%20in%20Neural%20Network,-Binary%20cross%2Dentropy&text=It%20is%20commonly%20employed%20for,probabilities%20for%20the%20positive%20class. 
