# Mining Big Datasets - Assignment 1

## Portuguese Bank's Marketing Campaign

> Konstantinos Ninas, f2822108 <br />
> Stamatis Sideris, f2822113 <br />
> MSc in Business Analytics <br />
> Department of Management Science and Technology <br />
> Athens University of Economics and Business <br />

In [1]:
#import libraries
import pandas as pd
import numpy as np

## Part 1 - Import of the .csv file and missing values management

In [2]:
#read the csv file using read_csv command
marketing_campaign = pd.read_csv("bank.csv",  sep = ";")

In [3]:
#view the top 5 observations using the head function
marketing_campaign.head(5)

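|   | Age | Job | Marital | Education | Default | Balance | Housing | Loan | Rating | Products |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33.0 | entrepreneur | married | secondary | no | 2 | yes | yes | poor | 1,3,16,17,19 |
| 1 | 35.0 | management | married | tertiary | no | 231 | yes | no | good | 4,8,16 |
| 2 | NaN | management | single | tertiary | no | 447 | yes | yes | fair | 7,16 |
| 3 | 42.0 | entrepreneur | divorced | tertiary | yes | 2 | yes | no | fair | 1,3,8,10,11,12,18,19 |
| 4 | 58.0 | retired | married | primary | no | 121 | yes | no | good | 4,5,6,7,11,18,19 |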


In [4]:
#identify the datatypes of each column
marketing_campaign.dtypes

Age          float64
Job           object
Marital       object
Education     object
Default       object
Balance        int64
Housing       object
Loan          object
Rating        object
Products      object
dtype: object

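* There are missing values in the Age variable (row 2 above shows `NaN`, and the column is stored as `float64`). Balance is stored as `int64`, which cannot hold `NaN`, so the imputation applied to it further down acts only as a safeguard.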

In [5]:
#calculation of the average age
average_age = round(np.mean(marketing_campaign["Age"]),0)
average_age

41.0

In [6]:
#calculation of the average balance
average_balance = round(np.mean(marketing_campaign["Balance"]),0)
average_balance

1354.0

In [7]:
#replace of the missing values in the age variable with the average age
marketing_campaign.loc[np.isnan(marketing_campaign.Age), 'Age'] = average_age

In [8]:
#replace of the missing values in the balance variable with the average balance
marketing_campaign.loc[np.isnan(marketing_campaign.Balance), 'Balance'] = average_balance

In [9]:
print("The Age variable has",sum(np.isnan(marketing_campaign.Age)), "missing values")
print("The Balance variable has",sum(np.isnan(marketing_campaign.Balance)), "missing values")

The Age variable has 0 missing values
The Balance variable has 0 missing values


In [10]:
#update the data type of age from float to integer
marketing_campaign['Age'] = marketing_campaign['Age'].astype(np.int64)

In [11]:
#check if the data type conversion was successful
marketing_campaign.dtypes

Age           int64
Job          object
Marital      object
Education    object
Default      object
Balance       int64
Housing      object
Loan         object
Rating       object
Products     object
dtype: object

In [12]:
marketing_campaign.head(5)

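|   | Age | Job | Marital | Education | Default | Balance | Housing | Loan | Rating | Products |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | poor | 1,3,16,17,19 |
| 1 | 35 | management | married | tertiary | no | 231 | yes | no | good | 4,8,16 |
| 2 | 41 | management | single | tertiary | no | 447 | yes | yes | fair | 7,16 |
| 3 | 42 | entrepreneur | divorced | tertiary | yes | 2 | yes | no | fair | 1,3,8,10,11,12,18,19 |
| 4 | 58 | retired | married | primary | no | 121 | yes | no | good | 4,5,6,7,11,18,19 |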


## Part 2 - Calculation of dissimilarity between customers in the dataset
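
As a compact summary of what the code in this part computes, the overall dissimilarity is the unweighted mean of ten per-attribute dissimilarities. The notation ($x_{ik}$ for the value of attribute $k$ for customer $i$, $P_i$ for the product set of customer $i$) is introduced here for illustration and does not appear in the notebook.

$$
d(i, j) = \frac{1}{10} \sum_{k=1}^{10} d_k(i, j), \qquad s(i, j) = 1 - d(i, j)
$$

$$
d_k(i, j) =
\begin{cases}
\dfrac{|x_{ik} - x_{jk}|}{\max_m x_{mk} - \min_m x_{mk}} & \text{numerical or numerically coded ordinal attribute} \\[2ex]
0 \text{ if } x_{ik} = x_{jk}, \text{ else } 1 & \text{nominal attribute} \\[1ex]
1 - \dfrac{|P_i \cap P_j|}{|P_i \cup P_j|} & \text{product sets}
\end{cases}
$$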

In [13]:
import itertools

#define jaccard similarity (size of intersection over size of union)
def jaccard_sim(list1, list2):
    intersection = len(list(set(list1).intersection(list2)))
    union = (len(list1) + len(list2)) - intersection
    return float(intersection) / union
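
For a concrete feel of the measure, here is a small set-based sketch applied to the product lists of the first two customers shown in this notebook. The function name `jaccard_similarity` and the convention of returning 1.0 for two empty collections are assumptions of this sketch, not part of the original code.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two collections, treating each one as a set."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Both collections are empty: treat them as identical (assumed convention).
        return 1.0
    return len(set_a & set_b) / len(union)

# Product lists of customers 0 and 1 after splitting on commas.
products_0 = ["1", "3", "16", "17", "19"]
products_1 = ["4", "8", "16"]

sim = jaccard_similarity(products_0, products_1)
print(sim)      # one shared product out of seven distinct ones
print(1 - sim)  # the corresponding Jaccard dissimilarity
```

Converting both inputs to sets first keeps the result correct even if a list contains a repeated product, which the length-based union in `jaccard_sim` does not guard against.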

In [14]:
#update the products column into lists of products
marketing_campaign.iloc[:,9] = marketing_campaign.iloc[:,9].str.split(',')
marketing_campaign.head(5)

|   | Age | Job | Marital | Education | Default | Balance | Housing | Loan | Rating | Products |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | poor | [1, 3, 16, 17, 19] |
| 1 | 35 | management | married | tertiary | no | 231 | yes | no | good | [4, 8, 16] |
| 2 | 41 | management | single | tertiary | no | 447 | yes | yes | fair | [7, 16] |
| 3 | 42 | entrepreneur | divorced | tertiary | yes | 2 | yes | no | fair | [1, 3, 8, 10, 11, 12, 18, 19] |
| 4 | 58 | retired | married | primary | no | 121 | yes | no | good | [4, 5, 6, 7, 11, 18, 19] |


In [15]:
educ_mapping = {
    'secondary' : 1, 
    'tertiary' : 2,
    'primary' : 3, 
}

# If no mapping provided, return x
f = lambda x: educ_mapping.get(x, x) 
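#update the values of education with their corresponding numeric value
#note: marketing_sim refers to the same DataFrame as marketing_campaign (no copy is made),
#so the numeric codes assigned below are visible through both names
marketing_sim = marketing_campaign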
marketing_sim.loc[:, 'Education'] = marketing_sim.loc[:, 'Education'].map(f)

In [16]:
rating_mapping = {
    'poor' : 1, 
    'good' : 2,
    'fair' : 3, 
    'very_good':4,
    'excelent':5
}

# If no mapping provided, return x
f2 = lambda x: rating_mapping.get(x, x) 
#update the values of rating with their corresponding numeric value
marketing_sim.loc[:, 'Rating'] = marketing_sim.loc[:, 'Rating'].map(f2)
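
An ordinal attribute is usually compared through the rank of each level, scaled to the interval [0, 1]. The sketch below shows that idea for education; the level order `primary < secondary < tertiary` and the names `EDUCATION_LEVELS`, `ordinal_rank` and `ordinal_dissimilarity` are assumptions of this sketch, and that order is not the one encoded in `educ_mapping` above.

```python
# Hypothetical ordered levels, lowest first; the order is an assumption of this sketch.
EDUCATION_LEVELS = ["primary", "secondary", "tertiary"]

def ordinal_rank(value, levels):
    """Map a level to its 0-based position, scaled to the interval [0, 1]."""
    return levels.index(value) / (len(levels) - 1)

def ordinal_dissimilarity(a, b, levels):
    """Absolute difference of the scaled ranks of two levels."""
    return abs(ordinal_rank(a, levels) - ordinal_rank(b, levels))

print(ordinal_dissimilarity("primary", "secondary", EDUCATION_LEVELS))
print(ordinal_dissimilarity("primary", "tertiary", EDUCATION_LEVELS))
```

With this scaling, adjacent levels are always closer to each other than the two extremes, which only holds when the numeric codes follow the natural order of the levels.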

In [17]:
#estimate the number of observations in the dataset
n = len(marketing_campaign)

#empty n x n dissimilarity matrix (only the entries with i < j are filled below)
cust_dissimilarity = [[0.0 for x in range(n)] for y in range(n)]
age_range = max(marketing_sim.Age) - min(marketing_sim.Age)
balance_range = max(marketing_sim.Balance) - min(marketing_sim.Balance)
education_range = max(marketing_sim.Education) - min(marketing_sim.Education)
rating_range = max(marketing_sim.Rating) - min(marketing_sim.Rating)
def similarity_estimation(campaign, customer_index=None):
    best = {}
    #if no specific customer is given, a dissimilarity matrix will be formed
    if customer_index is None:
        for i in range(0,n):
            for j in range(i+1,n):
                #calculating the age dissimilarity - numerical
                age_dis = (abs(campaign.iloc[i,0] - campaign.iloc[j,0]))/age_range

                #calculating the balance dissimilarity - numerical
                balance_dis = (abs(campaign.iloc[i,5] - campaign.iloc[j,5]))/balance_range

                #calculating the job dissimilarity - nominal (categorical)
                if campaign.iloc[i,1] == campaign.iloc[j,1]:
                    job_dis = 0
                else:
                    job_dis = 1

                #calculating the marital status dissimilarity - nominal (categorical)
                if campaign.iloc[i,2] == campaign.iloc[j,2]:
                    marital_dis = 0
                else:
                    marital_dis = 1

                #calculating the loan default index dissimilarity - nominal (binary - categorical)
                if campaign.iloc[i,4] == campaign.iloc[j,4]:
                    default_dis = 0
                else:
                    default_dis = 1

                #calculating the housing loan index dissimilarity - nominal (binary - categorical)
                if campaign.iloc[i,6] == campaign.iloc[j,6]:
                    housing_dis = 0
                else:
                    housing_dis = 1

                #calculating the personal loan index dissimilarity - nominal (binary - categorical)
                if campaign.iloc[i,7] == campaign.iloc[j,7]:
                    loan_dis = 0
                else:
                    loan_dis = 1

                #calculating the education dissimilarity - ordinal
                educ_dis = abs(campaign.iloc[i,3] - campaign.iloc[j,3])/education_range

                #calculating the rating dissimilarity - ordinal
                rating_dis = abs(campaign.iloc[i,8] - campaign.iloc[j,8])/rating_range

                #calculating the jaccard dissimilarity of the products - sets of products
                products_dis =  1 - jaccard_sim(campaign.iloc[i,9], campaign.iloc[j,9])

                #calculating the customers dissimilarity and assigning it in the dissimilarity matrix
                cust_dissimilarity[i][j] = (age_dis + balance_dis + job_dis + marital_dis + default_dis + housing_dis \
                                                  + loan_dis + educ_dis + rating_dis + products_dis)/10
    #if a specific customer is given, his 10 nearest customers will be identified along with their similarity scores
    #(similarity)
    else:
        i = customer_index
        sp_cust_similarity = None
        for j in range(0,n):
            if i==j:
                continue
            else:
                #calculating the age similarity - numerical
                age_sim = 1 - (abs(campaign.iloc[i,0] - campaign.iloc[j,0]))/age_range

                #calculating the balance similarity - numerical
                balance_sim = 1 - (abs(campaign.iloc[i,5] - campaign.iloc[j,5]))/balance_range

                #calculating the job similarity - nominal (categorical)
                if campaign.iloc[i,1] == campaign.iloc[j,1]:
                    job_sim = 1
                else:
                    job_sim = 0
                    
                #calculating the marital status similarity - nominal (categorical)
                if campaign.iloc[i,2] == campaign.iloc[j,2]:
                    marital_sim = 1
                else:
                    marital_sim = 0

                #calculating the loan default index similarity - nominal (binary - categorical)
                if campaign.iloc[i,4] == campaign.iloc[j,4]:
                    default_sim = 1
                else:
                    default_sim = 0

                #calculating the housing loan index similarity - nominal (binary - categorical)
                if campaign.iloc[i,6] == campaign.iloc[j,6]:
                    housing_sim = 1
                else:
                    housing_sim = 0

                #calculating the personal loan index similarity - nominal (binary - categorical)
                if campaign.iloc[i,7] == campaign.iloc[j,7]:
                    loan_sim = 1
                else:
                    loan_sim = 0

                #calculating the education similarity - ordinal
                educ_sim = 1 - abs(campaign.iloc[i,3] - campaign.iloc[j,3])/education_range

                #calculating the rating similarity - ordinal
                rating_sim = 1 - abs(campaign.iloc[i,8] - campaign.iloc[j,8])/rating_range

                #calculating the jaccard similarity of the products - sets of products
                products_sim = jaccard_sim(campaign.iloc[i,9], campaign.iloc[j,9])

                #calculating the overall similarity between customers i and j (mean of the 10 components)
                sp_cust_similarity = (age_sim + balance_sim + job_sim + marital_sim + default_sim + housing_sim \
                                                  + loan_sim + educ_sim + rating_sim + products_sim)/10
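                #collect the first 10 candidates; testing len(best) also works when customer_index is below 10
                if len(best) < 10: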
                    best[j] = sp_cust_similarity
                else:
                    if sp_cust_similarity > min(best.values()):
                        min_key = min(best, key=best.get)
                        del best[min_key]
                        best[j] = sp_cust_similarity
        best = {k: v for k, v in sorted(best.items(), key=lambda item: item[1])} 
        return best
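
The row-by-row loop above can be expressed with whole-column operations, which is typically much faster in pandas. The following is a rough sketch on a made-up toy frame, covering only numerical and nominal attributes; the function `similarity_to_customer`, its parameters and the toy values are assumptions for illustration, and it assumes every numerical column has a non-zero range.

```python
import pandas as pd

def similarity_to_customer(df, i, numeric, nominal):
    """Mean per-attribute similarity between row i and every other row.

    numeric: columns compared as 1 - |x_i - x_j| / range
    nominal: columns compared as 1 if equal, else 0
    """
    parts = []
    for col in numeric:
        col_range = df[col].max() - df[col].min()  # assumed to be non-zero
        parts.append(1 - (df[col] - df[col].iloc[i]).abs() / col_range)
    for col in nominal:
        parts.append((df[col] == df[col].iloc[i]).astype(float))
    sim = sum(parts) / len(parts)
    # A customer is not counted as his/her own neighbor.
    return sim.drop(df.index[i])

# Made-up toy data, not the bank dataset.
toy = pd.DataFrame({
    "Age": [30, 40, 50, 30],
    "Balance": [0, 100, 200, 0],
    "Job": ["management", "retired", "management", "management"],
})

sim = similarity_to_customer(toy, 0, numeric=["Age", "Balance"], nominal=["Job"])
nearest = sim.nlargest(2)  # the 2 most similar customers, best first
print(nearest)
```

`Series.nlargest` returns the best matches in descending order, so no manual bookkeeping of the current minimum is needed.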

    

In [18]:
def printing_fun(cust_set, customer_index):
    print('─' * 39)
    print("| 10 Nearest Neighbors for Customer",customer_index, "|")
    print('─' * 39)
    print("|  Customer ID", " "*1, "|", " "*1,"Similarity Score |")
    print('─' * 39)
    # print each data item.
    for key, value in cust_set.items():
        print("|      ",key,"     ","|"\
              , "       ", round(value,2), "      |")
    print('─' * 39)
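
Fixed-width format specifiers keep the columns aligned regardless of how many digits a customer ID has. The sketch below is one possible way to do it; the function name `format_neighbors`, the column widths and the sample values are assumptions for illustration.

```python
def format_neighbors(neighbors, customer_index):
    """Return an aligned text table of neighbor IDs and similarity scores."""
    header = f"| {'Customer ID':>11} | {'Similarity Score':>16} |"
    rule = "-" * len(header)
    lines = [f"10 Nearest Neighbors for Customer {customer_index}", rule, header, rule]
    for cust_id, score in neighbors.items():
        # Right-align both columns; always show two decimals.
        lines.append(f"| {cust_id:>11} | {score:>16.2f} |")
    lines.append(rule)
    return "\n".join(lines)

# Hypothetical IDs and scores, chosen only to show the alignment.
table = format_neighbors({11: 0.9, 2048: 0.934}, 1200)
print(table)
```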
       

## Part 3 - Identification of the 10 nearest neighbors for some specific customers

In [19]:
#identification of the 10 nearest neighbors for the customer 1200
cust_index = 1200
nn_1200 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_1200,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 1200 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       313       |         0.93       |
|       36452       |         0.93       |
|       13730       |         0.93       |
|       8604       |         0.93       |
|       7034       |         0.93       |
|       34503       |         0.93       |
|       14912       |         0.93       |
|       1660       |         0.93       |
|       7448       |         0.93       |
|       24897       |         0.94       |
───────────────────────────────────────


In [20]:
#identification of the 10 nearest neighbors for the customer 3650
cust_index = 3650
nn_3650 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_3650,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 3650 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       38320       |         0.93       |
|       25016       |         0.94       |
|       33540       |         0.94       |
|       36416       |         0.94       |
|       4472       |         0.94       |
|       26915       |         0.94       |
|       16558       |         0.94       |
|       30584       |         0.95       |
|       8783       |         0.95       |
|       5964       |         0.96       |
───────────────────────────────────────


In [21]:
#identification of the 10 nearest neighbors for the customer 10400
cust_index = 10400
nn_10400 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_10400,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 10400 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       41843       |         0.92       |
|       22547       |         0.92       |
|       20525       |         0.92       |
|       20404       |         0.93       |
|       17420       |         0.93       |
|       27072       |         0.93       |
|       9331       |         0.94       |
|       34689       |         0.94       |
|       19547       |         0.94       |
|       42394       |         0.94       |
───────────────────────────────────────


In [22]:
#identification of the 10 nearest neighbors for the customer 14930
cust_index = 14930
nn_14930 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_14930,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 14930 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       10890       |         0.89       |
|       10729       |         0.9       |
|       13072       |         0.9       |
|       29092       |         0.9       |
|       15797       |         0.9       |
|       12628       |         0.9       |
|       16439       |         0.9       |
|       11778       |         0.91       |
|       7583       |         0.91       |
|       17418       |         0.92       |
───────────────────────────────────────


In [23]:
#identification of the 10 nearest neighbors for the customer 22330
cust_index = 22330
nn_22330 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_22330,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 22330 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       34613       |         0.9       |
|       13285       |         0.9       |
|       4884       |         0.9       |
|       14698       |         0.91       |
|       23394       |         0.91       |
|       16406       |         0.91       |
|       23250       |         0.91       |
|       16042       |         0.93       |
|       30118       |         0.94       |
|       15717       |         0.99       |
───────────────────────────────────────


In [24]:
#identification of the 10 nearest neighbors for the customer 25671
cust_index = 25671
nn_25671 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_25671,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 25671 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       41282       |         0.93       |
|       9606       |         0.93       |
|       23519       |         0.93       |
|       11102       |         0.93       |
|       32030       |         0.93       |
|       19547       |         0.93       |
|       27373       |         0.94       |
|       23602       |         0.94       |
|       40717       |         0.94       |
|       21113       |         0.94       |
───────────────────────────────────────


In [25]:
#identification of the 10 nearest neighbors for the customer 29311
cust_index = 29311
nn_29311 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_29311,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 29311 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       35247       |         0.91       |
|       33851       |         0.91       |
|       7053       |         0.91       |
|       34959       |         0.91       |
|       7929       |         0.92       |
|       28454       |         0.92       |
|       6596       |         0.92       |
|       26497       |         0.93       |
|       234       |         0.93       |
|       4542       |         0.93       |
───────────────────────────────────────


In [26]:
#identification of the 10 nearest neighbors for the customer 34650
cust_index = 34650
nn_34650 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_34650,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 34650 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       8169       |         0.92       |
|       652       |         0.92       |
|       1351       |         0.93       |
|       2968       |         0.93       |
|       34618       |         0.93       |
|       8462       |         0.93       |
|       35282       |         0.93       |
|       24793       |         0.93       |
|       32087       |         0.94       |
|       2258       |         0.94       |
───────────────────────────────────────


In [27]:
#identification of the 10 nearest neighbors for the customer 39200
cust_index = 39200
nn_39200 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_39200,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 39200 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       39631       |         0.94       |
|       27681       |         0.94       |
|       29726       |         0.94       |
|       38913       |         0.94       |
|       41136       |         0.94       |
|       32422       |         0.95       |
|       40901       |         0.95       |
|       41662       |         0.95       |
|       40086       |         0.95       |
|       39233       |         0.95       |
───────────────────────────────────────


In [28]:
#identification of the 10 nearest neighbors for the customer 42000
cust_index = 42000
nn_42000 = similarity_estimation(marketing_sim,cust_index)
printing_fun(nn_42000,cust_index)

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 42000 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       32895       |         0.94       |
|       8078       |         0.95       |
|       29573       |         0.95       |
|       32523       |         0.95       |
|       29886       |         0.95       |
|       40642       |         0.95       |
|       26216       |         0.95       |
|       26435       |         0.95       |
|       27907       |         0.96       |
|       39897       |         0.97       |
───────────────────────────────────────


## Part 4 - Customer rating prediction
* Next, we predict each customer's rating from the ratings of his/her nearest neighbors, using two methods: a plain average and a similarity-weighted average.

In [29]:
#estimate the number of observations in the dataset
n = len(marketing_campaign)

def rating_prediction(campaign, customer_index=None):
    best = {}
    sp_cust_similarity = None
    i=customer_index
    pred_ranks_avg = []
    for j in range(0,n):
        if i==j:
            continue
        else:
            #calculating the age similarity - numerical
            age_sim = 1 - (abs(campaign.iloc[i,0] - campaign.iloc[j,0]))/age_range

            #calculating the balance similarity - numerical
            balance_sim = 1 - (abs(campaign.iloc[i,5] - campaign.iloc[j,5]))/balance_range

            #calculating the job similarity - nominal (categorical)
            if campaign.iloc[i,1] == campaign.iloc[j,1]:
                job_sim = 1
            else:
                job_sim = 0

            #calculating the marital status similarity - nominal (categorical)
            if campaign.iloc[i,2] == campaign.iloc[j,2]:
                marital_sim = 1
            else:
                marital_sim = 0

            #calculating the loan default index similarity - nominal (binary - categorical)
            if campaign.iloc[i,4] == campaign.iloc[j,4]:
                default_sim = 1
            else:
                default_sim = 0

            #calculating the housing loan index similarity - nominal (binary - categorical)
            if campaign.iloc[i,6] == campaign.iloc[j,6]:
                housing_sim = 1
            else:
                housing_sim = 0

            #calculating the personal loan index similarity - nominal (binary - categorical)
            if campaign.iloc[i,7] == campaign.iloc[j,7]:
                loan_sim = 1
            else:
                loan_sim = 0

            #calculating the education similarity - ordinal
            educ_sim = 1 - abs(campaign.iloc[i,3] - campaign.iloc[j,3])/education_range

            #calculating the jaccard similarity of the products - sets of products
            products_sim =  jaccard_sim(campaign.iloc[i,9], campaign.iloc[j,9])

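            #calculating the overall similarity between customers i and j; the rating is excluded, so only
            #9 components are summed, and dividing by 10 rescales every score by the same constant
            #without changing the ranking of the neighbors
            sp_cust_similarity = (age_sim + balance_sim + job_sim + marital_sim + default_sim + housing_sim \
                                              + loan_sim + educ_sim + products_sim)/10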
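            #collect the first 10 candidates; testing len(best) also works when customer_index is below 10
            if len(best) < 10: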
                best[j] = sp_cust_similarity
            else:
                if sp_cust_similarity > min(best.values()):
                    min_key = min(best, key=best.get)
                    del best[min_key]
                    best[j] = sp_cust_similarity
    best = {k: v for k, v in sorted(best.items(), key=lambda item: item[1])} 
    
    #predict the customer's rating as the plain average of the ratings of his/her 10 nearest neighbors
    rank_sum = 0
    #for loop to find the sum of the ranks of his/her 10 nearest neighbors
    for keys in best:
        rank_sum += campaign.iloc[keys,8]
    
    #divide the sum of the neighbors' rating with the number of the nearest neighbors
    avg_rank_of_nn = round(rank_sum/len(best.keys()),0)
    
    
    #predict the customer's rating as the similarity-weighted average of the ratings of his/her 10 nearest neighbors
    rank_weighted_sum = 0
    similarity_sum = 0

    #for loop to find the sum of the ranks multiplied by their similarity scores of his/her 10 nearest neighbors
    for keys, values in best.items():
        rank_weighted_sum += values * campaign.iloc[keys,8]
        similarity_sum += values
    
    #divide the weighted sum of the neighbors' rating with the sum of their similarity scores
    w_avg_rank_of_nn = round(rank_weighted_sum/similarity_sum,0)
    
    
    return best, avg_rank_of_nn, w_avg_rank_of_nn
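
The two prediction rules at the end of the function can be isolated in a few lines. This is a minimal sketch with made-up inputs; the function name `predict_rating` and the sample numbers are assumptions for illustration, and the final rounding step of the notebook is left out.

```python
def predict_rating(neighbor_ratings, neighbor_similarities):
    """Return (plain mean, similarity-weighted mean) of the neighbors' ratings."""
    if not neighbor_ratings or len(neighbor_ratings) != len(neighbor_similarities):
        raise ValueError("need two non-empty sequences of equal length")
    plain = sum(neighbor_ratings) / len(neighbor_ratings)
    weighted = (
        sum(s * r for r, s in zip(neighbor_ratings, neighbor_similarities))
        / sum(neighbor_similarities)
    )
    return plain, weighted

# Two hypothetical neighbors: the more similar one has the higher rating.
ratings = [1, 3]
similarities = [0.25, 0.75]

plain, weighted = predict_rating(ratings, similarities)
print(plain, weighted)  # the weighted mean is pulled toward the closer neighbor
```

When all similarity scores are nearly equal, as in the neighbor tables of this notebook, the two estimates are almost identical before rounding.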


* Next, we identify the 10 nearest neighbors of customer 100 using the function above, which leaves the rating out of the similarity computation.

In [30]:
#we assign the customer's (100) index
cust_index = 100

#we find the 10 nearest neighbors of the customer along with their similarity scores
nn_100, avg_rating, w_avg_rating = rating_prediction(marketing_campaign,cust_index)
printing_fun(nn_100,cust_index)
print("The predicted rating of the customer", cust_index, "according to the average rating and the weighted average rating\
 of his/her 10 nearest neighbors is equal to",avg_rating,"and", w_avg_rating, "respectively.")

───────────────────────────────────────
| 10 Nearest Neighbors for Customer 100 |
───────────────────────────────────────
|  Customer ID   |   Similarity Score |
───────────────────────────────────────
|       25494       |         0.77       |
|       5021       |         0.77       |
|       25939       |         0.77       |
|       32458       |         0.78       |
|       11795       |         0.79       |
|       33494       |         0.8       |
|       8064       |         0.8       |
|       11217       |         0.81       |
|       10475       |         0.81       |
|       34482       |         0.82       |
───────────────────────────────────────
The predicted rating of the customer 100 according to the average rating and the weighted average rating of his/her 10 nearest neighbors is equal to 2.0 and 2.0 respectively.


* Finally, we predict the ratings of the first 50 customers in the dataset with both prediction methods (average and weighted average over the 10 nearest neighbors) and evaluate them with the Mean Prediction Error, computed as the mean absolute difference between the predicted and the true ratings.

In [31]:
#we create two empty lists that will hold the predicted ratings from each prediction method
#(average, weighted average) 
list_of_predicted_rating_avg = []
list_of_predicted_rating_w_avg = []

#for loop that will iterate through the first 50 customers
for i in range(0,50):
    #we assign the results of the function in three variables
    nn_dict, avg, w_avg = rating_prediction(marketing_campaign,i)
    #we append each predicted rating to its corresponding list
    list_of_predicted_rating_avg.append(avg)
    list_of_predicted_rating_w_avg.append(w_avg)


In [34]:
#we find the true rating scores of the first 50 customers
true_ratings_of_first_50_cust = marketing_campaign.iloc[0:50,8]

#calculation of the mean prediction error for the predictions based on the average ratings of the 10-nn
mpe_score_avg = sum(abs(list_of_predicted_rating_avg- true_ratings_of_first_50_cust))/len(true_ratings_of_first_50_cust)

#calculation of the mean prediction error for the predictions based on the weighted average ratings of the 10-nn
mpe_score_w_avg = sum(abs(list_of_predicted_rating_w_avg- true_ratings_of_first_50_cust))/len(true_ratings_of_first_50_cust)

In [35]:
print("Mean Prediction Error for the predictions based on nn-average rating\
 is equal to", mpe_score_avg)

print("Mean Prediction Error for the predictions based on nn-weighted average rating\
 is equal to", mpe_score_w_avg)

Mean Prediction Error for the predictions based on nn-average rating is equal to 0.46
Mean Prediction Error for the predictions based on nn-weighted average rating is equal to 0.48
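
The metric computed above is the mean of the absolute differences between predicted and true ratings, often called the mean absolute error. A minimal sketch with made-up numbers follows; the function name `mean_absolute_error` and the sample values are assumptions for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Mean of |predicted - actual| over paired observations."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predicted and true ratings for four customers.
predicted = [2, 3, 1, 2]
actual = [2, 1, 1, 3]

error = mean_absolute_error(predicted, actual)
print(error)  # (0 + 2 + 0 + 1) / 4
```

Because the ratings are integer codes, an error of 0.46 means the predictions are off by a little less than half a rating step on average.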


#### The weighted average method gives slightly worse predictions of the customers' ratings on these 50 customers (Mean Prediction Error of 0.48 versus 0.46). In other words, the plain average rating of the nearest neighbors is a slightly better predictor of a customer's true rating here. Since the difference is small, this suggests, rather than establishes, that the plain average over the nearest neighbors is the more accurate way to predict a customer's rating.