# **Customer Classification for Bank Marketing Campaign - Part 1** 🏢

**Created by: Fathimah Yasmin**

**Table of contents:**

1. Business Problem Understanding
2. Data Understanding
****

## **Business Understanding**

### **Context:**
Banks need to maintain their liquidity ratio to ensure that cash needs are continuously met. This ratio indicates a bank's ability to handle short-term financial obligations, which include providing loans to consumers and businesses. The larger the cash reserves a bank maintains, the greater its ability to provide loans and generate profits.

One way for them to keep this ratio in check is by offering term deposit investments to customers. **Term deposits are a type of investment where customers commit a certain amount of money to a bank or financial institution that cannot be withdrawn for a set period.** In return, customers receive a fixed interest rate on the amount they deposit. 

Portuguese banks conduct telemarketing campaigns to promote term deposit investments, contacting customers believed to be potential investors. To improve the effectiveness of these campaigns, the banks aim to predict which customers are likely to open term deposits. Accurate predictions enable the banks to focus on promising customers, thereby reducing the financial losses from misdirected campaign efforts.

### **Problem Statement:**
The Mass Funding Division plays a crucial role in securing the funds needed for the bank's everyday operations and its larger financial strategy. One way they do this is through term deposits. These are a reliable source of funds, offering low risk and steady returns. However, there's a challenge: term deposits require a long-term commitment and offer fixed returns, which makes them less appealing in a market where many people want investments with higher returns.

To tackle this, the division uses targeted telemarketing campaigns, involving repetitive calls to gauge customer interest. But this method can be time-consuming and costly if not done effectively. If the division just relies on manual methods to identify potential customers, it could lead to biased and inaccurate predictions. This means they could end up targeting the wrong people, wasting time and money.

To improve the effectiveness of its marketing, the division needs a more strategic approach. A classification model that prioritizes accurate predictions while also taking profitability into account would let the Mass Funding Division focus on customers who are more likely to open term deposits. This allows the division to run more efficient campaigns, allocate resources better and, ultimately, increase profitability.

To develop a solution, we need to consider these key questions:

**`Key Questions:`**
- Which factors or variables should the Mass Funding Division focus on to increase term deposit sign-ups?
- How can the Mass Funding Division forecast customer likelihood to commit to term deposits?


### **Goals:**

The main goal is to **create an effective classification model** that predicts how likely clients are to choose term deposits. The model must be good at predicting both the clients who will say yes and those who will say no.

Also, we want to **understand what makes customers decide to sign up for term deposits**. Knowing these key factors will help banks use their resources better, make their interactions with clients more effective, and ultimately, get more clients to agree to term deposits through their marketing.


### **Analytical Approach:**
We analyze the characteristics of potential customers who are likely to open a term deposit account and then build a supervised machine learning model, specifically a classification model, to predict which prospects are inclined to open a term deposit account. This way, these potential customers can be prioritized in the campaign, making the process more efficient.

### **Metric Evaluation:**

This analysis will focus on customers interested in investing in term deposits. The targets are defined as follows:

- **0:** indicates a customer who does not invest in a term deposit.
- **1:** represents a customer who invests in a term deposit.

Ensuring the accuracy of our model is crucial to avoid the financial repercussions of misclassification, which could come in the form of false positives or false negatives.

| **Error Type**     |**Explanation** | **Consequences** | 
|-----------------|------------|----------------|
| **False Positive / Type 1 Error**  |This occurs when the model incorrectly predicts that a customer will invest in a term deposit when they actually will not. In other words, the model's prediction is positive (1), but the true value is negative (0)| The bank will incur unnecessary marketing expenses by targeting customers who are not interested in investing in term deposits.|
| **False Negative / Type 2 Error**  |This happens when the model incorrectly predicts that a customer will not invest in a term deposit when they actually will. Here, the model's prediction is negative (0), but the true value is positive (1)| The bank will miss out on potential profits from customers who are likely to invest in term deposits.| 
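
As an illustration of the two error types, the sketch below counts true negatives, false positives (Type 1), false negatives (Type 2) and true positives from a pair of label lists. The labels are made up, and `error_counts` is a hypothetical helper; scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` returns the same four counts in the order TN, FP, FN, TP.

```python
from collections import Counter

def error_counts(y_true, y_pred):
    """Count TN, FP, FN, TP for binary labels (1 = opens a term deposit)."""
    counts = Counter(zip(y_true, y_pred))  # keys are (actual, predicted) pairs
    return {
        "TN": counts[(0, 0)],
        "FP": counts[(0, 1)],  # Type 1 error: predicted 1, actually 0
        "FN": counts[(1, 0)],  # Type 2 error: predicted 0, actually 1
        "TP": counts[(1, 1)],
    }

# Hypothetical labels for eight customers, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 1, 1]
print(error_counts(y_true, y_pred))  # {'TN': 2, 'FP': 2, 'FN': 1, 'TP': 3}
```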

----------------
#### **`Simulation`**
---------------------
We will conduct a simulation to get an idea of the consequences of each type of error using the following data: 
- Based on the dataset, the total number of customers contacted is 7805, of which 4075 did not open a term-deposit account and 3730 opened a term-deposit account.
- According to [callcentrehelper.com](https://www.callcentrehelper.com/how-do-i-calculate-cost-per-call-113990.htm), the average cost per call is GBP 3.64 (roughly EUR 4.15).
- According to [Brian William, PhD](https://blog.thebrevetgroup.com/21-mind-blowing-sales-stats), it takes an average of 8 call attempts to reach a prospect.
- In 2014 when the data was generated, the [deposit interest rate](https://tradingeconomics.com/portugal/interest-rate) in Portugal was 0.25% while the [lending interest rate](https://tradingeconomics.com/portugal/bank-lending-rate) at the same time was around 5.59%.
- The minimum deposit account opening amount according to [withportugal.com](https://withportugal.com/en/immigration/depositos-portugal) is EUR 100 and the maximum is EUR 250,000.
- The average annual [wages](https://www.statista.com/statistics/419498/average-annual-wages-portugal-y-on-y/) in Portugal in 2014 was EUR 18,806. The average annual [living cost](https://www.numbeo.com/cost-of-living/historical-data?itemId=105&city_id=6029&name_city_id=Lisbon%2C+Portugal) in Portugal in 2014 was EUR 10,422.24 (excluding education costs etc.), leaving a balance of EUR 8,384. If calculated using the [moderate conservative](https://www.investopedia.com/managing-wealth/achieve-optimal-asset-allocation/) investment assumption where the allocation to fixed income investment (deposit) is 55% - 60%, then each client invests at least 0.55 x EUR 8,384 = EUR 4611.2 (≈ EUR 4611).
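
The simulation inputs above can be re-checked with a few lines of Python. This sketch only restates the quoted figures; the constant names are illustrative.

```python
# Figures quoted in the simulation inputs above (EUR, Portugal, 2014).
N_NO_DEPOSIT, N_DEPOSIT = 4075, 3730
AVG_ANNUAL_WAGE = 18_806
AVG_ANNUAL_LIVING_COST = 10_422.24
FIXED_INCOME_SHARE = 0.55  # lower bound of the 55%-60% moderate-conservative allocation

n_contacted = N_NO_DEPOSIT + N_DEPOSIT    # total customers contacted
positive_share = N_DEPOSIT / n_contacted  # share who opened a term deposit

disposable = AVG_ANNUAL_WAGE - AVG_ANNUAL_LIVING_COST        # about EUR 8,383.76
deposit_per_client = FIXED_INCOME_SHARE * round(disposable)  # 0.55 x 8,384

print(f"Customers contacted: {n_contacted}, share opening a deposit: {positive_share:.1%}")
print(f"Remaining income: EUR {disposable:,.2f}")
print(f"Estimated deposit per client: EUR {deposit_per_client:,.1f}")
```

The positive class makes up roughly 48% of contacted customers, so the two classes are close to balanced.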

-----------------------------------------------------
- **`Type 1 Error Simulation`**
-----------------------------------------------------
This simulation gives an overview of the marketing budget lost to Type 1 errors. It uses an average cost per call of EUR 4.15.

Assuming an average of 8 telemarketing calls per customer, the cost of contacting each customer in one year is:

$$\text{Call cost per customer} = \text{EUR } 4.15 \times 8 = \text{EUR } 33.20$$

Using this calculation, the bank incurs an average call cost of EUR 33.2 per customer per year. 

**Note**:
The average cost per call used here may be higher or lower than the actual cost. The calculation above is only a rough estimate that does not consider other cost components.
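
A minimal sketch of the Type 1 cost, assuming the EUR 4.15 per call and 8 calls per customer stated above. The `type1_loss` helper and the count of 100 false positives are hypothetical, chosen only to show how the per-customer cost scales with the number of misdirected contacts.

```python
COST_PER_CALL_EUR = 4.15  # average cost of one call used in this simulation
CALLS_PER_CUSTOMER = 8    # average call attempts needed to reach a prospect

# Cost of contacting one customer, i.e. the budget lost on each false positive.
call_cost_per_customer = COST_PER_CALL_EUR * CALLS_PER_CUSTOMER

def type1_loss(false_positives: int) -> float:
    """Hypothetical helper: marketing spend wasted on customers wrongly predicted as 1."""
    return false_positives * call_cost_per_customer

print(f"Call cost per customer: EUR {call_cost_per_customer:.2f}")  # EUR 33.20
print(f"Loss from 100 false positives: EUR {type1_loss(100):,.2f}")
```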

------------------------------------
- **`Type 2 Error Simulation`**
-----------------------
This simulation illustrates the profit lost through a Type 2 error. The bank can lend out funds raised through term deposits and earn interest on those loans. The following calculation of profit per customer uses the estimated deposit per client of EUR 4,611:

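$$
\begin{aligned}
\text{Profit per customer} &= (\text{lending rate} - \text{deposit rate}) \times \text{deposit amount} \\
&= (5.59\% - 0.25\%) \times \text{EUR } 4{,}611 \\
&\approx \text{EUR } 246 \text{ per year}
\end{aligned}
$$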
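
The profit-per-customer formula can be evaluated directly from the quoted rates. This sketch assumes the 2014 lending and deposit rates and the EUR 4,611 deposit estimate from the simulation inputs, and it ignores compounding, taxes and the bank's other costs.

```python
LENDING_RATE = 0.0559  # Portuguese bank lending rate, 2014
DEPOSIT_RATE = 0.0025  # Portuguese deposit interest rate, 2014
DEPOSIT_AMOUNT = 4611  # estimated deposit per client, EUR

spread = LENDING_RATE - DEPOSIT_RATE           # interest margin earned by the bank
profit_per_customer = spread * DEPOSIT_AMOUNT  # EUR per year, before other costs

print(f"Spread: {spread:.2%}")                                # Spread: 5.34%
print(f"Profit per customer: EUR {profit_per_customer:.2f}")  # Profit per customer: EUR 246.23
```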

Because the goal is a model that enhances profitability, we will use a business metric called **Profit**, calculated from the profit and loss figures used in the simulation above. To ensure that the model also separates the positive and negative classes well, we will additionally use **AUC** (Area Under the Curve), the area under the ROC (Receiver Operating Characteristic) curve. The ROC curve plots the True Positive Rate (sensitivity) against the False Positive Rate (1 - specificity) at various classification thresholds.

This combined approach allows us to strike a balance between maximizing profit, which is a crucial business objective, and assessing the model's ability to effectively separate the two classes. The ROC AUC metric provides valuable insights into the model's overall performance in classification tasks. It's important to note that adjusting the classification threshold can influence the trade-off between true positives and false positives, and this adjustment should be made in accordance with business priorities and cost considerations.
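
One way to turn the simulation figures into a **Profit** metric, and to pick a classification threshold with it, is sketched below. This is a hypothetical definition, not necessarily the one used later in the project: every customer predicted as 1 is called at EUR 33.20, true positives additionally earn (5.59% - 0.25%) × EUR 4,611 per year, and customers predicted as 0 are not called, so a false negative appears as profit not earned rather than as a cost. The labels and scores are made up; AUC itself would come from a probability-based metric such as scikit-learn's `roc_auc_score(y_true, y_score)`.

```python
# Figures from the simulation above.
CALL_COST = 4.15 * 8                           # EUR per targeted customer (8 calls)
PROFIT_PER_DEPOSIT = (0.0559 - 0.0025) * 4611  # EUR per year from one term deposit

def campaign_profit(y_true, y_pred):
    """Hypothetical Profit metric: net result of calling every customer predicted as 1.

    True positives earn the deposit profit minus the call cost, false positives
    only cost the call, and customers predicted as 0 are not called at all.
    """
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            total += (PROFIT_PER_DEPOSIT if actual == 1 else 0.0) - CALL_COST
    return total

def best_threshold(y_true, scores, thresholds):
    """Return the (threshold, profit) pair with the highest campaign_profit."""
    results = []
    for t in thresholds:
        y_pred = [1 if s >= t else 0 for s in scores]  # call customers at or above t
        results.append((t, campaign_profit(y_true, y_pred)))
    return max(results, key=lambda pair: pair[1])

# Made-up labels and predicted probabilities for eight customers.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.35, 0.80, 0.45, 0.20, 0.65, 0.55, 0.90]
thresholds = [i / 10 for i in range(1, 10)]

t, p = best_threshold(y_true, scores, thresholds)
print(f"Best threshold: {t:.1f}, profit: EUR {p:.2f}")
```

Under these figures, calling a customer has a positive expected profit whenever the (well-calibrated) probability of a deposit exceeds 33.20 / 246.23 ≈ 0.13, which is why a profit-driven threshold can sit below the default of 0.5.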

## **Data Understanding**

### **About the Dataset**
The [Bank Marketing Campaign](https://drive.google.com/drive/folders/13lrEDlKfnTPNREfGLBaYGYf8dSjHBzfW) dataset relates to the direct marketing campaigns (phone calls) of a Portuguese banking institution. It is adapted from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/dataset/222/bank+marketing) and was taken in 2014. This version has been tailored to specific needs by reducing the number of records.

The Bank Marketing Campaign dataset comprises 7,813 rows and 11 columns. It includes information on **customer profiles**, **customer finances** and the **bank's marketing efforts**.

### **Features**
#### **Customer profile**
| Variable      | Explanation| 
|------------|-----------|
| age  | A numeric variable representing the client's age, ranging from 18 to 95.    | 
|job| The categorization of the client's occupation|

The following data elements support the prediction of whether a customer will invest in a term deposit or not:

- `age` : 

    Age plays a crucial role in investment decisions [1], and because term deposits are a fairly segmented investment product, age can be an influential predictor. People in their productive years tend to choose investment products with higher returns, even if those products carry greater risk, and they also tend to prefer liquidity, which a term deposit does not offer. Older customers are generally assumed to prefer low-risk investments with fixed interest rates, as research on investment behavior across age groups indicates.
- `job` :  

    An individual's occupation can serve as a predictor because different types of employment, such as those with fixed versus variable income, exhibit distinct financial behaviors [2]. Studies have shown that job security and income stability can influence investment decisions.

Research by Geetha and Ramesh [3] shows that demographic factors such as gender, age, education, occupation, income, savings and family size also affect investment decisions. Age and job alone do not capture this range of demographic information. Because demographic data can be an important feature for classification, more of it should be added, especially data related to spending and income, such as family size, marital status and education.

#### **Customer financial**
| Variable      | Explanation| 
|------------|-----------|
| balance| A numeric variable representing the client's average yearly balance amount   |
|housing| A binary value denoting whether the client has a housing loan or not   |
|loan| A binary value indicating the presence or absence of client's personal loan |

- `balance`: 

    The average yearly balance can be a predictor because individuals who invest are assumed to maintain a stable balance throughout the year. This assumption rests on the idea that financial stability makes long-term investment possible.
- `housing` & `loan` : 

    Existing loans can affect investment behavior. People with consumer loans are usually less inclined to make long-term investments, especially illiquid ones. This is supported by studies linking debt obligations to reduced investment activity.

Individual financial management is closely related to investment decisions [4]. Because the data above records which loans a client holds, it is also worth adding information on the client's ability to meet debt payment obligations, since this can affect investment decisions.

#### **Marketing data**
| Variable      | Explanation| 
|------------|-----------|
|contact  | Contact communication type. | 
|month| Last contact month of the year.|
|campaign|Number of contacts performed during this campaign and for this client. |
|pdays| Number of days that passed after the client was last contacted in a previous campaign |
|poutcome| Outcome of the previous marketing campaign.|
|deposit| Whether the customer deposits or not.|

Data on the campaigns carried out and their outcomes is needed to identify which aspects of a campaign significantly increase the number of customers investing in term deposits.

Because telemarketing is personalized marketing in which marketers are expected to build good relationships with clients, indicators of that relationship should be added, for example when the marketer last communicated with the customer and how long the conversation lasted. These indicators can predict campaign success and help estimate whether a customer will invest in a term deposit.

#### **Adding New Data**

To meet the data needs of the classification task, columns were added from the original dataset, which contains information missing here, such as marital status and education. The columns were added by merging, while ensuring that each added row matches the same record in this dataset. The final result of the data addition is as follows:

| Variable      | Explanation| 
|------------|-----------|
| age  | A numeric variable representing the client's age, ranging from 18 to 95.    | 
|job| The categorization of the client's occupation|
| balance| A numeric variable representing the client's average yearly balance amount   |
|housing| A binary value denoting whether the client has a housing loan or not   |
|loan| A binary value indicating the presence or absence of client's personal loan |
|contact  | Contact communication type. | 
|month| Last contact month of the year.|
|campaign|Number of contacts performed during this campaign and for this client. |
|pdays| Number of days that passed after the client was last contacted in a previous campaign |
|poutcome| Outcome of the previous marketing campaign.|
|deposit| Whether the customer deposits or not.|
| marital* | Marital status (categorical: 'divorced','married','single') |
| education* | Education level of the customer (categorical) |
| default* | Has credit in default? (categorical: 'no','yes') |
| day* | Last contact day (numeric) |
| duration* | Last contact duration, in seconds (numeric) |
| previous* | Number of contacts performed before this campaign for this client (numeric) |

With these added columns, the dataset is considered fit for answering the problem.

**Note:** Added columns are marked with an asterisk (*).
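
The merge itself is not shown here. A minimal pandas sketch of the idea follows, assuming the shared columns have identical names and values in both files (rename columns such as the target first if they differ). `add_original_columns` is a hypothetical helper: it matches on every shared column, treats key combinations that appear more than once in the original as unmatchable, and raises an error if any row fails to match, so no added value can come from a different record.

```python
import pandas as pd

def add_original_columns(reduced: pd.DataFrame, original: pd.DataFrame,
                         extra_cols: list) -> pd.DataFrame:
    """Attach extra_cols from `original` to `reduced`, matching on all shared columns."""
    keys = [c for c in reduced.columns if c in original.columns]
    # Drop ambiguous key combinations so each reduced row matches at most one original row.
    lookup = original[keys + extra_cols].drop_duplicates(subset=keys, keep=False)
    merged = reduced.merge(lookup, on=keys, how="left",
                           validate="many_to_one", indicator=True)
    unmatched = int((merged["_merge"] != "both").sum())
    if unmatched:
        raise ValueError(f"{unmatched} rows could not be matched to the original dataset")
    return merged.drop(columns="_merge")

# Tiny made-up example: two reduced rows, three original rows.
reduced = pd.DataFrame({"age": [30, 45], "balance": [100, 2500], "deposit": ["yes", "no"]})
original = pd.DataFrame({"age": [30, 45, 52], "balance": [100, 2500, 0],
                         "deposit": ["yes", "no", "no"],
                         "marital": ["single", "married", "divorced"]})
result = add_original_columns(reduced, original, ["marital"])
print(result)
```

A left merge keeps the row order and count of the reduced dataset, and the `indicator` column makes unmatched rows easy to detect before they silently become missing values.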

#### **Data Types**

Before further processing, it is essential to understand the data type of each column. The table below lists the data types and their corresponding columns:

| Type       | Category  | Columns                                                                                         |
|------------|-----------|------------------------------------------------------------------------------------------------|
| Numerical  | Discrete  | age, campaign, pdays, day, previous |
|            | Continuous| balance, duration                                                                        |
| Categorical| Ordinal   | poutcome, month, education                                                                                    |
|            | Nominal   | job, contact, marital|
|            | Nominal(Binary) | housing, loan, default, deposit|
 
 
##### **Sources:**
[1] Charles, A., & Kasilingam, R. (2013). Does the investor's age influence their investment behaviour? Paradigm, 17(1–2), 11–24. https://doi.org/10.1177/0971890720130103

[2] Bhola, S. S., Shah, V., & Zanvar, P. (2013). A study of relationship between occupation and individual investment. Sinhgad International Business Review, 5(1), July 2011–January 2012. Available at SSRN: https://ssrn.com/abstract=2290192

[3] Geetha, N., & Ramesh, M. (2012). A study on relevance of demographic factors in investment decisions. Perspectives of Innovations, Economics and Business, 14–27. https://doi.org/10.15208/pieb.2012.02

[4] Oppong, C., Atchulo, A. S., Akwaa-Sekyi, E. K., Grant, D., & Kpegba, S. A. (2023). Financial literacy, investment and personal financial management nexus: Empirical evidence on private sector employees. Cogent Business & Management, 10(2). https://doi.org/10.1080/23311975.2023.2229106