# Problem Statement

- The Challenge is for you to develop a machine learning model that predicts suspected elder fraud in the digital payments space, as described in Rule 4 of the attached Campus Analytics 2021 Challenge Rules document. Your machine learning model (“Solution”) must (a) meet the Challenge Criteria, (b) follow the Challenge Instructions and Requirements, and (c) incorporate the Key Deliverables, each described in detail below.

Deliverables

- Download the attached trainset dataset.
- Download the attached testset dataset.
- Download and read the attached Campus Analytics 2021 Challenge Rules document. It contains critical information about challenge deliverables, instructions, suggestions, judging criteria, and winner eligibility.

To Complete a Submission:

# Build a classification model for predicting elder fraud in the digital payments space as described in Rule 4, which:

- Handles missing values
- Maximizes the F1 score
- Uses the given dataset
- Includes suitable encoding schemes
- Uses the smallest possible set of feature variables
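
The requirements above map naturally onto a scikit-learn `Pipeline`. The following is a minimal sketch, not the author's method: it assumes scikit-learn is installed, treats fraud as the positive class for F1 (the text here does not say which class the F1 score is computed on), uses a random forest purely as an example classifier, and runs on a few made-up toy rows that only reuse column names from the training data. The helper name `build_pipeline` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_pipeline(numeric_cols, categorical_cols):
    """Impute missing values, one-hot encode categoricals, then classify."""
    # Numeric gaps are filled with the median of the data the pipeline is fitted on.
    numeric = SimpleImputer(strategy="median")
    categorical = Pipeline([
        # Missing categories become an explicit "missing" level.
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        # Categories unseen during fit are encoded as all zeros instead of raising.
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    # Example classifier only; class_weight="balanced" up-weights the rarer class.
    model = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Made-up toy rows that reuse column names from the training data.
toy = pd.DataFrame({
    "TRAN_AMT": [5.38, 65.19, 54.84, 0.01, 497.08, 488.55, np.nan, 468.40],
    "CUST_AGE": [47, 45, 36, 62, 81, 45, 55, np.nan],
    "CARR_NAME": ["cox", np.nan, "utah broadband", "t-mobile",
                  "cogent", np.nan, "cox", "t-mobile"],
    "ALERT_TRGR_CD": ["MOBL", "MOBL", "ONLN", "MOBL", "MOBL", "ONLN", "MOBL", "ONLN"],
    "FRAUD_NONFRAUD": ["Non-Fraud", "Non-Fraud", "Fraud", "Non-Fraud",
                       "Fraud", "Fraud", "Fraud", "Non-Fraud"],
})
# 1 = fraud, treated here as the positive class for F1.
y = (toy.pop("FRAUD_NONFRAUD") == "Fraud").astype(int)

pipe = build_pipeline(["TRAN_AMT", "CUST_AGE"], ["CARR_NAME", "ALERT_TRGR_CD"])
# Cross-validated F1: imputation and encoding are refitted inside each training fold.
scores = cross_val_score(pipe, toy, y, cv=2, scoring="f1")
print("fold F1 scores:", scores)
pipe.fit(toy, y)
pred = pipe.predict(toy)
```

Keeping imputation and encoding inside the pipeline means cross-validation refits them on each training fold, so statistics from the held-out fold do not leak into preprocessing. On the real data, the same object would be fitted on the trainset and then used to predict testset_for_participants.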
________________________________________________________________________________________________________________________________________

The dataset provided on the Challenge page is synthetic. It was generated with CTGAN, a conditional generative adversarial network (GAN) designed for tabular data. CTGAN learns the distribution of the original dataset and generates synthetic records that follow a similar distribution. It can model both continuous and categorical features.

________________________________________________________________________________________________________________________________________

Challenge Instructions and Requirements:

When creating your Solution, you may use a novel combination of existing machine learning and/or statistical methods, or develop your own novel method, to extract and/or represent useful information from the data file.

The output must include a prediction of the target variable. In addition, your Solution must meet the following requirements:

- You must use Python 3.
- You must provide citations and sources.

________________________________________________________________________________________________________________________________________


Challenge Suggestions: You may use any clustering, dimensionality reduction, or other algorithm families. Note that, among other criteria, you will be evaluated on whether your choice of methods is appropriate for structured data.
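
Because the requirements also ask for the smallest set of feature variables, one simple option for structured data is filter-then-evaluate feature selection: rank features by mutual information with the target, then keep the smallest top-k prefix whose cross-validated F1 is within a tolerance of the best prefix. This is a sketch under stated assumptions, not a prescribed method: it uses scikit-learn, synthetic data from `make_classification` instead of the challenge data, features that are already numeric (encoded), and an arbitrary tolerance of 0.01. The function name `smallest_feature_set` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score


def smallest_feature_set(X, y, tolerance=0.01, cv=5, random_state=0):
    """Return (selected column indices, mean CV F1 for each prefix size k)."""
    # Rank columns by estimated mutual information with the target, highest first.
    mi = mutual_info_classif(X, y, random_state=random_state)
    order = np.argsort(mi)[::-1]
    model = RandomForestClassifier(n_estimators=50, random_state=random_state)
    scores = []
    for k in range(1, X.shape[1] + 1):
        # Mean F1 over CV folds using only the k top-ranked columns.
        f1 = cross_val_score(model, X[:, order[:k]], y, cv=cv, scoring="f1").mean()
        scores.append(f1)
    best = max(scores)
    # Smallest k whose score is within `tolerance` of the best score.
    k_min = next(k for k, s in enumerate(scores, start=1) if s >= best - tolerance)
    return order[:k_min], scores


# Synthetic stand-in data: 6 features, only 3 of them informative.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
selected, scores = smallest_feature_set(X, y)
print("selected feature indices:", selected)
print("mean CV F1 per k:", np.round(scores, 3))
```

Ranking and choosing k on the same data makes these F1 scores somewhat optimistic; a held-out split gives a fairer estimate of the final model.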

________________________________________________________________________________________________________________________________________

Key Deliverables to Submit:

Deliverable 1: Your results

A table of your results assigning a predicted class (FRAUD or NONFRAUD) to each dataset_id in the test set, in the format described below.

Very important note: Every submission must include the dataset_id in every record, with records arranged numerically in the same order as the testset_for_participants.csv file. The submission must also contain a column named ‘FRAUD_NONFRAUD’ holding the predicted class (FRAUD or NONFRAUD), encoded as 0 for FRAUD and 1 for NONFRAUD. Submissions that do not meet these requirements will be automatically disqualified.
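
These output rules can be checked mechanically before uploading. A minimal sketch, assuming predictions come out as the "Fraud" / "Non-Fraud" strings used in the training labels and that ids are passed in the same order as testset_for_participants.csv; the function name `build_submission`, the example ids, and the file name `submission.csv` are hypothetical.

```python
import pandas as pd

# Training labels are "Fraud" / "Non-Fraud"; the challenge wants 0 = FRAUD, 1 = NONFRAUD.
LABEL_TO_CODE = {"Fraud": 0, "Non-Fraud": 1}


def build_submission(dataset_ids, predicted_labels):
    """Pair each dataset_id with its encoded prediction, keeping the input order."""
    sub = pd.DataFrame({
        "dataset_id": list(dataset_ids),
        "FRAUD_NONFRAUD": pd.Series(list(predicted_labels)).map(LABEL_TO_CODE).to_numpy(),
    })
    if sub["FRAUD_NONFRAUD"].isna().any():
        raise ValueError("unexpected label; expected 'Fraud' or 'Non-Fraud'")
    if sub["dataset_id"].isna().any() or sub["dataset_id"].duplicated().any():
        raise ValueError("every record needs a unique, non-missing dataset_id")
    # The note asks for numeric order matching the test file, so flag anything else.
    if not sub["dataset_id"].is_monotonic_increasing:
        raise ValueError("dataset_id values are not in ascending numeric order")
    sub["FRAUD_NONFRAUD"] = sub["FRAUD_NONFRAUD"].astype(int)
    return sub


# Hypothetical ids and predictions, listed in test-file order.
sub = build_submission([101, 102, 103], ["Non-Fraud", "Fraud", "Non-Fraud"])
print(sub)
# sub.to_csv("submission.csv", index=False)  # hypothetical output file name
```

Writing with `index=False` keeps pandas from adding its row index as an extra column; an `Unnamed: 0` column like the one in the training preview is the typical symptom of a saved index.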






Deliverable 2: Your method

A description of your approach delivered as:

- A visual description (flow chart or similar) of the path of the data through your pipeline. Note the areas where your approach is novel.
- A few paragraphs describing the rationale behind your method.

Deliverable 3: Your code

- Well-commented, working code that runs on the provided data and generates the output of your approach.
- An environment configuration file that lists the names and versions of the libraries you used.

**NOTE:** Use the attached "trainset" dataset to build your model. Then run the model on the attached "testset_for_participants" dataset, produce the scores, and submit them.
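
For the environment configuration file, one lightweight approach is to pin the installed versions of the libraries the notebook uses. The sketch below relies only on the standard library (`importlib.metadata`, Python 3.8+). The package list is an assumption: the imports in this notebook plus scikit-learn (if used) and openpyxl, which pandas needs to read `.xlsx` files. The file name `requirements.txt` and the helper names are hypothetical. Running `pip freeze > requirements.txt` is a coarser alternative that pins every installed package.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed pip distribution names; adjust to what the notebook actually imports.
PACKAGES = ["pandas", "numpy", "seaborn", "matplotlib", "scikit-learn", "openpyxl"]


def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages; skip ones not installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            print(f"warning: {name} is not installed, skipping it")
    return lines


def write_requirements(path="requirements.txt", packages=PACKAGES):
    """Write the pinned lines to `path`, one per line."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(pinned_requirements(packages)) + "\n")


lines = pinned_requirements(PACKAGES)
print("\n".join(lines))
```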



In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os 
%matplotlib inline

In [None]:
train = pd.read_excel("Train Dataset.xlsx")  # load the training dataset

In [6]:
train.head(20)

| Unnamed: 0 | TRAN_AMT | ACCT_PRE_TRAN_AVAIL_BAL | CUST_AGE | OPEN_ACCT_CT | WF_dvc_age | PWD_UPDT_TS | CARR_NAME | RGN_NAME | STATE_PRVNC_TXT | ALERT_TRGR_CD | ... | CUST_STATE | PH_NUM_UPDT_TS | CUST_SINCE_DT | TRAN_TS | TRAN_DT | ACTN_CD | ACTN_INTNL_TXT | TRAN_TYPE_CD | ACTVY_DT | FRAUD_NONFRAUD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.38 | 23619.91 | 47 | 4 | 2777 | 1/16/2018 11:3:58 | cox communications inc. | southwest | nevada | MOBL | ... | NV | 2/24/2021 15:55:10 | 1993-01-06 | 5/3/2021 18:3:58 | 5/3/2021 | SCHPMT | P2P_COMMIT | P2P | 5/3/2021 | Non-Fraud |
| 1 | 65.19 | 0.0 | 45 | 5 | 2721 |  | charter communications | southwest | california | MOBL | ... | CA |  | 1971-01-07 | 1/13/2021 19:19:37 | 1/13/2021 | SCHPMT | P2P_COMMIT | P2P | 1/13/2021 | Non-Fraud |
| 2 | 54.84 | 34570.63 | 36 | 8 | 1531 | 12/22/2021 10:42:51 | utah broadband llc | mountain | utah | ONLN | ... | MD | 5/5/2019 1:8:39 | 1994-02-01 | 4/8/2021 9:42:51 | 4/8/2021 | SCHPMT | P2P_COMMIT | P2P | 4/8/2021 | Fraud |
| 3 | 0.01 | 0.0 | 62 | 3 | 835 | 2/8/2020 7:28:31 | t-mobile usa inc. | southwest | california | MOBL | ... | NV | 2/16/2019 6:45:37 | 2001-11-01 | 8/10/2021 15:28:31 | 8/10/2021 | SCHPMT | P2P_COMMIT | P2P | 8/10/2021 | Non-Fraud |
| 4 | 497.08 | 12725.18 | 81 | 2 | 1095 | 12/28/2020 12:12:44 | cogent communications | south central | texas | MOBL | ... | UT | 5/8/2020 10:27:6 | 1987-02-07 | 6/27/2021 11:12:44 | 6/27/2021 | SCHPMT | P2P_COMMIT | P2P | 6/27/2021 | Fraud |
| 5 | 488.55 | 2851.44 | 45 | 8 | 1 | 3/15/2021 15:36:36 | ultimate internet access, inc | southwest | california | ONLN | ... | CO | 5/18/2021 9:50:5 | 2011-06-13 | 5/18/2021 14:36:36 | 5/18/2021 | SCHPMT | P2P_COMMIT | P2P | 5/18/2021 | Fraud |
| 6 | 490.6 | 3018.98 | 55 | 7 | 531 | 4/30/2021 19:16:2 | cox communications inc. | southwest | california | MOBL | ... | CA | 3/16/2018 16:50:5 | 1971-10-02 | 1/8/2021 12:16:2 | 1/8/2021 | SCHPMT | P2P_COMMIT | P2P | 1/8/2021 | Fraud |
| 7 | 468.4 | 0.0 | 56 | 6 | 47 | 5/22/2021 18:34:33 | t-mobile usa inc. | southwest | california | ONLN | ... | CA | 7/28/2019 12:4:47 | 1991-10-30 | 6/14/2021 12:34:33 | 6/14/2021 | SCHPMT | P2P_COMMIT | P2P | 6/14/2021 | Non-Fraud |
| 8 | 0.01 | 0.0 | 36 | 6 | 1182 | 1/27/2021 16:7:20 | cox communications inc. | southwest | california | ONLN | ... | TX | 12/6/2019 6:4:6 | 2020-07-08 | 7/4/2021 12:0:51 | 7/4/2021 | SCHPMT | P2P_COMMIT | P2P | 7/4/2021 | Non-Fraud |
| 9 | 14.23 | 1890.65 | 72 | 4 | 276 | 4/22/2020 9:56:55 |  |  |  | ONLN | ... | VA | 5/12/2017 10:54:10 | 1976-12-23 | 3/3/2021 7:14:46 | 3/3/2021 | SCHPMT | P2P_COMMIT | P2P | 3/3/2021 | Fraud |
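
The preview already shows two things the model must cope with: blank (missing) cells, such as PWD_UPDT_TS in row 1 and CARR_NAME, RGN_NAME, and STATE_PRVNC_TXT in row 9, and timestamps without zero padding, such as `1/16/2018 11:3:58`. Below is a minimal sketch of a per-column missing-value report and of turning two such timestamps into a numeric gap in days. The demo rows are toy values in the preview's format, and the derived `DAYS_SINCE_PWD_UPDT` column and helper names are hypothetical, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Month/day/year with 24-hour time; strptime-style %H/%M/%S also match single digits.
TS_FORMAT = "%m/%d/%Y %H:%M:%S"


def missing_summary(df):
    """Per-column count and percentage of missing values, for columns with any gaps."""
    counts = df.isna().sum()
    report = pd.DataFrame({"n_missing": counts, "pct_missing": 100 * counts / len(df)})
    return report[report["n_missing"] > 0].sort_values("n_missing", ascending=False)


def days_between(later, earlier, fmt=TS_FORMAT):
    """Elapsed days from `earlier` to `later`; missing or unparseable values give NaN."""
    t_later = pd.to_datetime(later, format=fmt, errors="coerce")
    t_earlier = pd.to_datetime(earlier, format=fmt, errors="coerce")
    return (t_later - t_earlier).dt.total_seconds() / 86400


# Toy rows in the same string format as the preview.
demo = pd.DataFrame({
    "PWD_UPDT_TS": ["1/16/2018 11:3:58", np.nan, "2/8/2020 7:28:31"],
    "TRAN_TS": ["5/3/2021 18:3:58", "1/13/2021 19:19:37", "8/10/2021 15:28:31"],
    "CARR_NAME": ["cox communications inc.", "charter communications", np.nan],
})
report = missing_summary(demo)
print(report)
demo["DAYS_SINCE_PWD_UPDT"] = days_between(demo["TRAN_TS"], demo["PWD_UPDT_TS"])
print(demo["DAYS_SINCE_PWD_UPDT"])
```

Because the data is synthetic, derived gaps can come out negative (row 2 of the preview has a password update dated after its transaction), so such features deserve a sanity check before use.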
