# MINDD TP1

## I. Business Understanding


### Objective

The objective of the project is to build data-driven models to predict the success of telemarketing calls aimed at selling long-term deposits.

By applying data mining, we expect to identify the customer-related and campaign-related factors that influence the outcome of a call and that can significantly improve business decisions.

Marketing campaigns target specific customer segments. The purpose of this project is to develop a model, following the CRISP-DM methodology, that can identify and prioritize the customers in those segments.


### The data

The data is related to direct marketing campaigns of a banking institution.
The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required to assess whether or not the product (a bank term deposit) would be subscribed.

It consists of:
-  Bank client attributes
-  Social and economic context attributes
-  Other attributes (related to the contacts made with the client before and during the campaigns)

### What is a term deposit

> *"A term deposit is a savings' tool where money is deposited into an account at a financial institution. Term deposit investments usually have short-term maturities ranging from one month to a few years, have varying levels of required minimum deposits, and pay a fixed interest rate to the investor."*
>
> Quoted from [`Term Deposit: Definition, How It's Used, Rates, and How to Invest`](https://www.investopedia.com/terms/t/termdeposit.asp), James Chen.




### Determining factors for individual customers to opt for term deposits
> *"... there are 5 groups of factors that greatly affect the decisions of individual customers to choose a savings' bank: Service quality, Safety, Related effects, Benefits financial benefits, convenience"*
>
>  Tuan, L. A., Nhu, M. T. Q., & Nhan, N. le. (2021). Factors Affecting the Decision of Selecting Banking to Save Money of Individual Customers – Experimental in Da Nang City. Advances in Science, Technology and Engineering Systems Journal, 6(3), 409–417. https://doi.org/10.25046/aj060345

In other words, we can expect higher conversion rates if these factors are present. It would therefore be useful to analyze the data to determine whether these elements can be identified and quantified.



### Determining factors for telemarketing success

> *"High-quality, accurate customer data and well-targeted segments are key drivers of telemarketing success."*
>
> ICTSD. (2021). Factors affecting telemarketing productivity. International Centre for Trade and Sustainable Development. https://www.ictsd.org/unraveling-productivity-challenges-in-the-telemarketing-department

> *"Timing calls appropriately and implementing consistent follow-ups significantly improve customer engagement and conversion rates."*
>
> Tuan, L. A., Nhu, M. T. Q., & Nhan, N. L. (2018). Factors influencing customer purchasing behavior in telemarketing. So09.tci-thaijo.org. https://so09.tci-thaijo.org/index.php/PMR/article/view/5570

We can also expect higher conversion rates if these factors are present. It would be useful to analyze the data and determine whether these elements can be identified and quantified.


### Plan

#### I. Collect data

#### II. Clean Data 
As noted in bank-information.txt, missing values in the categorical client attributes are encoded as "unknown".

If these "unknown" values affect only a small part of the data, the clients with missing data should be excluded. Kept as a separate "unknown" group, they would give unreliable information about that group; absorbed into other groups, they would make those groups unreliable as well.

We must also check for abnormal values and convert the data to proper numeric types and units.
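The check described above could be sketched as follows. This is a minimal illustration on toy data, not the project's actual cleaning step: the sample values and the `MAX_UNKNOWN_ROW_SHARE` cut-off are assumptions chosen only for the example, and the right threshold would have to be decided after inspecting the real data.

```python
import pandas as pd

# Toy data standing in for the bank dataset; the values are illustrative only.
df = pd.DataFrame({
    "job": ["admin.", "unknown", "technician", "services", "admin."],
    "marital": ["married", "single", "unknown", "married", "single"],
    "age": [30, 41, 52, 38, 27],
})

# Categorical columns where missing values are encoded as "unknown".
categorical = ["job", "marital"]

# Hypothetical cut-off: drop affected rows only if they are a small share of the data.
MAX_UNKNOWN_ROW_SHARE = 0.5

# Share of "unknown" entries in each categorical column.
unknown_share = (df[categorical] == "unknown").mean()

# Boolean mask of rows that contain at least one "unknown" value.
rows_with_unknown = (df[categorical] == "unknown").any(axis=1)

if rows_with_unknown.mean() <= MAX_UNKNOWN_ROW_SHARE:
    # Few enough rows are affected, so exclude them.
    cleaned = df[~rows_with_unknown].reset_index(drop=True)
else:
    # Too many rows are affected; keep them and handle "unknown" another way.
    cleaned = df.copy()

print(unknown_share)
print(cleaned)
```

Looking at the share per column before dropping anything makes it easier to spot a single attribute that accounts for most of the missing values.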

#### III. Exploratory Data Analysis
-  Understand the relations between attributes. Try to uncover groups of customers and how they behave.

-  Investigate the relation between conversion rates and the various factors.

-  Determine the socioeconomic factors and how they sway the conversion rates.

-  Determine the quality of the telemarketing campaign and how it affects conversion rates.
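As a rough sketch of how conversion rates could be related to a single factor, the example below groups toy data by contact type and computes the share of successful calls in each group. The outcome column name `y` and its "yes"/"no" encoding are assumptions made for illustration, and the rows are invented.

```python
import pandas as pd

# Toy data; the outcome column `y` and its "yes"/"no" values are assumptions.
df = pd.DataFrame({
    "contact": ["cellular", "cellular", "telephone",
                "telephone", "cellular", "telephone"],
    "y": ["yes", "no", "no", "no", "yes", "yes"],
})

# Encode the outcome as 1 (subscribed) or 0 (not subscribed).
df["subscribed"] = (df["y"] == "yes").astype(int)

# Number of calls and conversion rate per contact type, best group first.
conversion = (
    df.groupby("contact")["subscribed"]
      .agg(calls="count", conversion_rate="mean")
      .sort_values("conversion_rate", ascending=False)
)

print(conversion)
```

Keeping the call count next to the rate helps avoid over-reading a high conversion rate that comes from only a handful of calls.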

#### IV. Create a classification machine learning model

Train classification models with several algorithms and select the one that best fits the data.
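One possible way to compare several algorithms is cross-validation on a common metric. The sketch below assumes scikit-learn, which is not imported elsewhere in this notebook, and uses synthetic data from `make_classification` in place of the bank dataset. The three candidate models and the ROC AUC metric are illustrative choices, not the project's final selection.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the prepared bank data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Candidate algorithms (an illustrative selection).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean ROC AUC over 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Candidate with the highest mean score.
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Best:", best_name)
```

Scoring every candidate on the same folds and metric keeps the comparison fair; the winner would still need to be checked on held-out data.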


## II. Data Understanding

### Imports

In [1]:
import pandas as pd # Pandas

# Math / Statistics
import scipy as sp

# Visualization Libraries
import matplotlib as mplt
import matplotlib.pyplot as plt
import seaborn as sns


### Load Data

In [2]:
bankDF = pd.read_csv("bank.csv", delimiter=";")

In [3]:
bankDF.shape

(41188, 21)

The initial dataset has 41188 rows and 21 columns.

In [4]:
print("Columns:")
for column in bankDF.columns:
    print(f"\n - {column}")

In [5]:
bankDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 21 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   age             41188 non-null  int64  
 1   job             41188 non-null  object 
 2   marital         41188 non-null  object 
 3   education       41188 non-null  object 
 4   default         41188 non-null  object 
 5   housing         41188 non-null  object 
 6   loan            41188 non-null  object 
 7   contact         41188 non-null  object 
 8   month           41188 non-null  object 
 9   day_of_week     41188 non-null  object 
 10  duration        41188 non-null  int64  
 11  campaign        41188 non-null  int64  
 12  pdays           41188 non-null  int64  
 13  previous        41188 non-null  int64  
 14  poutcome        41188 non-null  object 
 15  emp.var.rate    41188 non-null  float64
 16  cons.price.idx  41188 non-null  float64
 17  cons.conf.idx   41188 non-null 