<h1>Credit Card Routing for Online Purchase via Predictive Modelling</h1>

<h1>1. Introduction</h1>

"Online retail business" implies a business process in which customers have a plethora of options in searching for, selecting, and purchasing products, information, and services over the internet.

Most retailers now sell their goods online. Online retail is a form of electronic commerce that lets consumers buy goods or services directly from their phones, tablets, laptops, or other internet-connected devices anywhere in the world.

<h2>1.1 Methodology</h2>

This case study follows the cross-industry standard process for data mining (CRISP-DM). The process consists of six phases, each of which is broken down into generic tasks:
<ul>
<li>Business Understanding</li>
<li>Data Understanding</li>
<li>Data Preparation</li>
<li>Modeling</li>
<li>Evaluation</li>
<li>Deployment</li>
</ul>

The generic tasks describe what needs to be done in a phase before moving on to the next one. 

<img width="450" src="https://miro.medium.com/max/640/1*ezhOVnxDcwudVc3eJYe5hQ.jpeg"/>

<h2>1.2 Business Understanding</h2>

Over the past decade, more and more payment options have become available to customers, making online payments more convenient than ever. As a result, a business can benefit considerably from offering multiple payment options: doing so helps it acquire new customers and retain current ones. However, maintaining a growing number of payment options adds cost, labor, and security risk. This is where Payment Service Providers (PSPs) can help.

PSPs (also called Merchant Service Providers) are third-party companies that help business owners accept a wide range of online payment methods, like online banking, credit cards, debit cards, e-wallets, cash cards, and more. Basically, they make sure your transactions make it from point A to point B, safely and securely.

PSPs see to it that transactions are completed, from the moment a customer enters their details and initiates a payment to the moment the merchant receives it.

<h3>1.2.1 Problem definition</h3>

Every day, businesses face more and more pressure to reduce costs, reduce process times and increase profit. 

Online credit card payments are performed via so-called payment service providers, referred to as “PSPs”. The current routing logic is manual and rule-based. The company has contracts with four different PSPs and pays transaction fees for every single payment.

Over the past year, the online retailer has encountered a high failure rate for online credit card payments. The company loses a lot of money to failed transactions, and customers are becoming increasingly dissatisfied with the online shop.

<h3>1.2.2 Main Goal and Specific Objectives</h3>

Business decision-makers hope that predictive modelling can replace the manual rules with a smarter way of routing each transaction to a PSP. The specific objectives are:

1. Analyze the root cause of the problem: how does the success rate change over time?
2. Build a model that increases the payment success rate by finding the best possible PSP for each transaction while keeping the transaction fees low.
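
The second objective is a trade-off between the chance that a payment succeeds and the fee charged for it. The sketch below shows one possible way to express such a routing rule; it is not the company's routing logic. The helper `choose_psp`, the fee values, and the success probabilities are all hypothetical placeholders: in practice the probabilities would come from a trained model and the fees from the service-fee list. The rule also assumes that every attempt is charged a flat fee and that failed attempts are retried independently, so that `fee / probability` approximates the expected fee paid per successful payment.

```python
from typing import Dict

# Hypothetical flat fee per attempt (euro). Placeholder values only;
# the real fees would be read from the service-fee list.
HYPOTHETICAL_FEES: Dict[str, float] = {
    "Moneycard": 5.0,
    "Goldcard": 10.0,
    "UK_Card": 3.0,
    "Simplecard": 1.0,
}


def choose_psp(success_proba: Dict[str, float], fees: Dict[str, float]) -> str:
    """Return the PSP with the lowest expected fee per successful payment.

    Assumption of this sketch: expected cost = fee / success probability.
    """
    best_psp = None
    best_cost = float("inf")
    for psp, proba in success_proba.items():
        if proba <= 0:
            # A PSP that is never expected to succeed cannot be chosen.
            continue
        cost = fees[psp] / proba
        if cost < best_cost:
            best_psp, best_cost = psp, cost
    if best_psp is None:
        raise ValueError("no PSP with a positive success probability")
    return best_psp


# Hypothetical model output for one transaction.
example_proba = {"Moneycard": 0.6, "Goldcard": 0.9, "UK_Card": 0.5, "Simplecard": 0.1}
best = choose_psp(example_proba, HYPOTHETICAL_FEES)
print(best)
```

With these placeholder numbers the cheapest PSP (Simplecard) is not selected, because its low success probability makes the expected cost per successful payment higher than that of UK_Card.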

<h3>1.2.3 Tools</h3>

The tools used for the project are: Jupyter Notebook, Python, pandas, NumPy, matplotlib, seaborn, scikit-learn.

<h1>2. Data Understanding</h1>

The Data Understanding phase focuses on understanding the available data so that it can support the business understanding and help solve the business problem.

<h2>2.1 Data Sources</h2>

There are two data sources: a list of credit card transactions for the DACH countries (Germany, Switzerland, Austria) for January and February 2019 ("PSP_Jan_Feb_2019.csv"), and a list of payment service providers (PSPs) with their service fees ("Service_fees.csv").

The data were collected from online purchases.

In [32]:
import pandas as pd
transactions = pd.read_csv('datasets/PSP_Jan_Feb_2019.csv')

<h2>2.2 Describe data</h2>

We start with a general overview of the dataset, beginning with its first five rows.

In [33]:
transactions.head()

|   | id | tmsp | country | amount | success | PSP | 3D_secured | card |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 0 | 2019-01-01 00:01:11 | Germany | 89 | 0 | UK_Card | 0 | Visa |
| 1 | 1 | 2019-01-01 00:01:17 | Germany | 89 | 1 | UK_Card | 0 | Visa |
| 2 | 2 | 2019-01-01 00:02:49 | Germany | 238 | 0 | UK_Card | 1 | Diners |
| 3 | 3 | 2019-01-01 00:03:13 | Germany | 238 | 1 | UK_Card | 1 | Diners |
| 4 | 4 | 2019-01-01 00:04:33 | Austria | 124 | 0 | Simplecard | 0 | Diners |


In [34]:
# Number of rows and columns
print(transactions.shape)
# info() prints its report itself and returns None, so it is not wrapped in print()
transactions.info()

(50410, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50410 entries, 0 to 50409
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          50410 non-null  int64 
 1   tmsp        50410 non-null  object
 2   country     50410 non-null  object
 3   amount      50410 non-null  int64 
 4   success     50410 non-null  int64 
 5   PSP         50410 non-null  object
 6   3D_secured  50410 non-null  int64 
 7   card        50410 non-null  object
dtypes: int64(4), object(4)
memory usage: 3.1+ MB


The dataset has 50410 entries and 8 columns:

Feature | Variable Type | Data Type | Description
:--------: | ------- | :-------: | -------
`id` | identifier | int | row identifier (index)
`tmsp` | continuous | timestamp (stored as string) | timestamp of transaction
`country` | categorical | string | country of transaction (Germany, Switzerland, Austria)
`amount` | continuous | int | transaction amount (euro)
`success` | categorical | boolean (stored as 0/1) | 1 - payment is successful, 0 - payment has failed
`PSP` | categorical | string | name of payment service provider (Moneycard, Goldcard, UK_Card, Simplecard)
`3D_secured` | categorical | boolean (stored as 0/1) | 1 - customer is 3D identified, 0 - customer is not 3D identified
`card` | categorical | string | credit card provider (Master, Visa, Diners)
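
The `info()` output lists `tmsp` as `object` and the two flags as `int64`, so the logical types in the table only apply after a conversion. The sketch below shows one possible conversion on the five rows printed by `head()`; the variable name `sample` and the choice of the `category` and `bool` dtypes are assumptions of this example, not steps taken from the notebook.

```python
import pandas as pd

# The five rows shown by head(); the full data set would come from the CSV file.
sample = pd.DataFrame({
    "tmsp": [
        "2019-01-01 00:01:11", "2019-01-01 00:01:17", "2019-01-01 00:02:49",
        "2019-01-01 00:03:13", "2019-01-01 00:04:33",
    ],
    "country": ["Germany", "Germany", "Germany", "Germany", "Austria"],
    "amount": [89, 89, 238, 238, 124],
    "success": [0, 1, 0, 1, 0],
    "PSP": ["UK_Card", "UK_Card", "UK_Card", "UK_Card", "Simplecard"],
    "3D_secured": [0, 0, 1, 1, 0],
    "card": ["Visa", "Visa", "Diners", "Diners", "Diners"],
})

# Parse the timestamp strings into datetime64 values.
sample["tmsp"] = pd.to_datetime(sample["tmsp"], format="%Y-%m-%d %H:%M:%S")

# Low-cardinality string columns become categoricals.
for col in ["country", "PSP", "card"]:
    sample[col] = sample[col].astype("category")

# The 0/1 flags become booleans.
for col in ["success", "3D_secured"]:
    sample[col] = sample[col].astype(bool)

print(sample.dtypes)
```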
    

<h2>2.3 Verify data quality</h2>

Are there any missing or duplicate values? There are none: every column has 50410 non-null entries in the `info()` output, and the duplicate check below finds no repeated rows.

In [31]:
# duplicated() returns a boolean mask; summing it counts the fully duplicated rows
n_duplicates = transactions.duplicated().sum()
print(f'There are {n_duplicates} duplicate rows.')

There are 0 duplicate rows.
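
The duplicate check above covers one half of the claim; the missing-value half can be read from the non-null counts in the `info()` output or checked explicitly. The sketch below runs both checks on a small hypothetical frame named `toy`, which stands in for `transactions` and contains one deliberately injected missing value and one repeated row so that each check has something to report.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `transactions`: the NaN and the repeated row
# are injected on purpose to demonstrate the checks.
toy = pd.DataFrame({
    "country": ["Germany", "Germany", "Austria", "Germany"],
    "amount": [89.0, 238.0, np.nan, 89.0],
    "success": [0, 1, 0, 0],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Number of rows that repeat an earlier row in every column.
n_toy_duplicates = int(toy.duplicated().sum())

print(missing_per_column)
print("duplicate rows:", n_toy_duplicates)
```

Applied to the real data, both counts would be expected to be zero, in line with the statement above.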


<h2>2.4 Explore data</h2>
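
A natural starting point for the exploration is the first objective, the success rate over time. The sketch below only illustrates the mechanics on the five rows printed by `head()`; the variable names and the grouping choices (per day and per PSP) are assumptions of this example, and the resulting numbers say nothing about the full data set.

```python
import pandas as pd

# The five rows shown by head(), restricted to the columns needed here.
sample = pd.DataFrame({
    "tmsp": [
        "2019-01-01 00:01:11", "2019-01-01 00:01:17", "2019-01-01 00:02:49",
        "2019-01-01 00:03:13", "2019-01-01 00:04:33",
    ],
    "success": [0, 1, 0, 1, 0],
    "PSP": ["UK_Card", "UK_Card", "UK_Card", "UK_Card", "Simplecard"],
})
sample["tmsp"] = pd.to_datetime(sample["tmsp"])

# The mean of a 0/1 flag is the share of successful payments.
daily_rate = sample.set_index("tmsp")["success"].resample("D").mean()
rate_by_psp = sample.groupby("PSP")["success"].mean()

print(daily_rate)
print(rate_by_psp)
```

On the full data, the same resampling with a weekly or hourly frequency would show whether the success rate drifts over the two months or depends on the time of day.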

<h1>3. Data Preparation</h1>

Once the data is understood, it is time for Data Preparation. This phase covers the steps taken to prepare the data for the modeling phase.

<h2>3.1 Data Selection</h2>

<h2>3.2 Data Cleaning</h2>

<h2>3.3 Feature Engineering</h2>

<h2>3.4 Data Integration</h2>

<h2>3.5 Data Formatting</h2>