**PREDICTING CHURN SCORE FOR A WEBSITE**

**INTRODUCTION**

In customer-centric industries, understanding and reducing churn is critical to sustaining a business. Churn, the rate at which customers disengage from a service, is a significant challenge for businesses that aim to build long-term relationships with their customers. This project uses predictive analytics to build a model that estimates a churn risk score for each customer.



**BUSINESS UNDERSTANDING**

The churn rate, a key performance indicator in marketing, measures how many customers leave within a specified timeframe. Using several data sources, our predictive model assigns each user a churn risk score that estimates how likely they are to stop engaging with the business. The score is derived from factors such as user demographics, browsing behavior, and historical purchase data.

Our dataset of customer information is the foundation for a predictive model that goes beyond simply identifying churn. By incorporating user attributes such as age, membership category, referral details, and transaction patterns, we aim to give businesses actionable insight into customer behavior. Timing also matters: the churn risk score is updated daily for every user who has had at least one conversion, so the scores stay current.

The dataset, organized into training and testing subsets, encapsulates a diverse range of features, including customer profiles, interaction history, and feedback. As we navigate through the data, our objective is to develop a machine learning model capable of discerning patterns, relationships, and trends that contribute to a deeper understanding of customer churn. By doing so, we aim to equip businesses with the foresight to implement targeted retention strategies, ultimately bolstering customer satisfaction and long-term loyalty.

In essence, our project aligns with the imperative for businesses to transition from reactive to proactive churn management. Through the fusion of advanced analytics and a rich dataset, we aspire to provide actionable insights that empower businesses to not only predict customer churn but also to strategically intervene and cultivate enduring customer relationships.


**PROBLEM STATEMENT**

The churn rate, a key marketing metric, measures the number of customers who disengage from a business within a defined time frame. Our objective is to develop a predictive model that assigns a churn risk score to each user, estimating how likely they are to discontinue engagement. This score is derived from several data sources, including user demographic information, browsing behavior, historical purchase data, and other relevant factors.

The predictive model incorporates our proprietary algorithms, which estimate how long a user is expected to remain a customer. The churn risk score is updated daily for users who have had at least one conversion. Scores range from 1 to 5, giving a graded assessment of each customer's propensity to churn.


**DATA UNDERSTANDING**

The dataset folder contains the following files:

train.csv: 36992 rows x 25 columns

test.csv: 19919 rows x 24 columns

sample_submission.csv: 5 rows x 2 columns

The columns provided in the dataset are as follows:

**customer_id**:	Represents the unique identification number of a customer

**Name**:	Represents the name of a customer

**age**:	Represents the age of a customer

**gender**:	Represents the gender of a customer

**security_no**:	Represents a unique security number that is used to identify a person

**region_category**:	Represents the region that a customer belongs to 

**membership_category**:	Represents the category of the membership that a customer is using

**joining_date**:	Represents the date when a customer became a member

**joined_through_referral**:	Represents whether a customer joined using any referral code or ID

**referral_id**:	Represents a referral ID

**preferred_offer_types**:	Represents the type of offer that a customer prefers

**medium_of_operation**:	Represents the medium of operation that a customer uses for transactions

**internet_option**:	Represents the type of internet service a customer uses

**last_visit_time**:	Represents the last time a customer visited the website

**days_since_last_login**:	Represents the number of days since a customer last logged into the website

**avg_time_spent**:	Represents the average time spent by a customer on the website

**avg_transaction_value**:	Represents the average transaction value of a customer

**avg_frequency_login_days**:	Represents the number of times a customer has logged in to the website

**points_in_wallet**:	Represents the points awarded to a customer on each transaction

**used_special_discount**:	Represents whether a customer uses special discounts offered

**offer_application_preference**:	Represents whether a customer prefers offers 

**past_complaint**:	Represents whether a customer has raised any complaints

**complaint_status**:	Represents whether the complaints raised by a customer were resolved

**feedback**:	Represents the feedback provided by a customer

**churn_risk_score** (TARGET):	Represents the churn risk score that ranges from 1 to 5
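The column descriptions above mix identifiers, dates, categories, and numeric fields, so it can help to declare some types explicitly when loading. The following is a minimal sketch, not the notebook's own loading code: it reads two rows (a subset of columns, with values taken from the first rows of train.csv shown later in this notebook) from an in-memory string, and the choice of which columns to treat as categories is an assumption.

```python
import io
import pandas as pd

# Two rows with a subset of columns; values come from the head of train.csv
csv_text = """customer_id,age,gender,membership_category,joining_date,churn_risk_score
fffe4300490044003600300030003800,18,F,Platinum Membership,2017-08-17,2
fffe43004900440032003100300035003700,32,F,Premium Membership,2017-08-28,1
"""

typed_df = pd.read_csv(
    io.StringIO(csv_text),
    # Parse the membership start date into a datetime column
    parse_dates=["joining_date"],
    # Assumption: IDs are kept as strings, low-cardinality fields as categories
    dtype={
        "customer_id": "string",
        "gender": "category",
        "membership_category": "category",
    },
)

print(typed_df.dtypes)
```

The same `parse_dates` and `dtype` arguments could be passed when reading the real files, provided the column names match.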

**EVALUATION METRIC**

score = 100 × metrics.f1_score(actual, predicted, average="macro"), where `metrics` is `sklearn.metrics`
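With `average="macro"`, the F1 score is computed separately for each class and the per-class values are averaged without weighting, so rare risk scores count as much as common ones. The sketch below spells out that calculation in plain Python to make the metric easy to inspect. The function name `macro_f1_score` and the toy labels are illustrative assumptions; in practice the scikit-learn function named in the formula would be used.

```python
def macro_f1_score(actual, predicted):
    """Unweighted mean of per-class F1, scaled to 0-100.

    Classes are taken from the labels present in either list.
    """
    labels = sorted(set(actual) | set(predicted))
    pairs = list(zip(actual, predicted))
    f1_values = []
    for label in labels:
        tp = sum(1 for a, p in pairs if a == label and p == label)
        fp = sum(1 for a, p in pairs if a != label and p == label)
        fn = sum(1 for a, p in pairs if a == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        f1_values.append(2 * tp / denom if denom else 0.0)
    return 100 * sum(f1_values) / len(f1_values)


# Hypothetical labels in the 1-5 range: one score-5 customer is predicted as 4
actual = [1, 2, 3, 4, 5, 5]
predicted = [1, 2, 3, 4, 5, 4]
score = macro_f1_score(actual, predicted)
print(round(score, 2))  # 86.67
```

Here classes 1 to 3 have F1 = 1, while classes 4 and 5 each have F1 = 2/3, so the macro average is (3 + 4/3) / 5 ≈ 0.8667.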


In [1]:
import pandas as pd

In [2]:
# Specify the path to the folder containing the datasets
folder_path = 'dataset/'

# Read the training dataset (train.csv)
train_file_path = folder_path + 'train.csv'
train_df = pd.read_csv(train_file_path)
train_df

| | customer_id | Name | age | gender | security_no | region_category | membership_category | joining_date | joined_through_referral | referral_id | ... | avg_time_spent | avg_transaction_value | avg_frequency_login_days | points_in_wallet | used_special_discount | offer_application_preference | past_complaint | complaint_status | feedback | churn_risk_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fffe4300490044003600300030003800 | Pattie Morrisey | 18 | F | XW0DQ7H | Village | Platinum Membership | 2017-08-17 | No | xxxxxxxx | ... | 300.630000 | 53005.25 | 17.0 | 781.750000 | Yes | Yes | No | Not Applicable | Products always in Stock | 2 |
| 1 | fffe43004900440032003100300035003700 | Traci Peery | 32 | F | 5K0N3X1 | City | Premium Membership | 2017-08-28 | ? | CID21329 | ... | 306.340000 | 12838.38 | 10.0 | NaN | Yes | No | Yes | Solved | Quality Customer Care | 1 |
| 2 | fffe4300490044003100390032003600 | Merideth Mcmeen | 44 | F | 1F2TCL3 | Town | No Membership | 2016-11-11 | Yes | CID12313 | ... | 516.160000 | 21027.00 | 22.0 | 500.690000 | No | Yes | Yes | Solved in Follow-up | Poor Website | 5 |
| 3 | fffe43004900440036003000330031003600 | Eufemia Cardwell | 37 | M | VJGJ33N | City | No Membership | 2016-10-29 | Yes | CID3793 | ... | 53.270000 | 25239.56 | 6.0 | 567.660000 | No | Yes | Yes | Unsolved | Poor Website | 5 |
| 4 | fffe43004900440031003900350030003600 | Meghan Kosak | 31 | F | SVZXCWB | City | No Membership | 2017-09-12 | No | xxxxxxxx | ... | 113.130000 | 24483.66 | 16.0 | 663.060000 | No | Yes | Yes | Solved | Poor Website | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36987 | fffe43004900440035003500390036003100 | Cuc Tarr | 46 | F | 6F51HFO | NaN | Basic Membership | 2017-09-21 | No | xxxxxxxx | ... | -650.682759 | 27277.68 | 6.0 | 639.510000 | No | Yes | Yes | No Information Available | No reason specified | 4 |
| 36988 | fffe43004900440033003500380036003600 | Jenni Stronach | 29 | F | 21KSM8Y | Town | Basic Membership | 2016-06-27 | No | xxxxxxxx | ... | -638.123421 | 11069.71 | 28.0 | 527.990000 | Yes | No | No | Not Applicable | Poor Customer Service | 5 |
| 36989 | fffe4300490044003500330034003100 | Luciana Kinch | 23 | F | XK1IM9H | NaN | Basic Membership | 2016-09-11 | Yes | CID3838 | ... | 154.940000 | 38127.56 | Error | 680.470000 | No | Yes | Yes | Unsolved | Poor Website | 4 |
| 36990 | fffe43004900440031003200390039003000 | Tawana Ardoin | 53 | M | K6VTP1Z | Village | Platinum Membership | 2017-06-15 | No | xxxxxxxx | ... | 482.610000 | 2378.86 | 20.0 | 197.264414 | Yes | Yes | No | Not Applicable | No reason specified | 3 |
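The preview above shows several placeholder values that are not real measurements: `?` in `joined_through_referral`, `xxxxxxxx` in `referral_id`, the string `Error` in `avg_frequency_login_days`, and negative values in `avg_time_spent`. A minimal sketch of one way to turn them into missing values follows. The helper name `clean_placeholders` is hypothetical, and treating a negative time as missing is an assumption rather than a rule stated by the dataset.

```python
import pandas as pd


def clean_placeholders(df):
    """Return a copy with the placeholder values seen in the preview set to missing."""
    out = df.copy()
    # '?' and 'xxxxxxxx' appear where no referral information is recorded
    out["joined_through_referral"] = out["joined_through_referral"].replace("?", pd.NA)
    out["referral_id"] = out["referral_id"].replace("xxxxxxxx", pd.NA)
    # 'Error' strings keep the column non-numeric; coerce them to NaN
    out["avg_frequency_login_days"] = pd.to_numeric(
        out["avg_frequency_login_days"], errors="coerce"
    )
    # Assumption: a negative average time spent is invalid and treated as missing
    out["avg_time_spent"] = out["avg_time_spent"].where(out["avg_time_spent"] >= 0)
    return out


# Small frame built from individual values in the preview (not complete rows)
sample = pd.DataFrame({
    "joined_through_referral": ["No", "?", "Yes"],
    "referral_id": ["xxxxxxxx", "CID21329", "CID3838"],
    "avg_frequency_login_days": ["17.0", "10.0", "Error"],
    "avg_time_spent": [300.63, 306.34, -650.682759],
})
cleaned = clean_placeholders(sample)
print(cleaned.isna().sum())
```

Whether these values should later be imputed, dropped, or kept as a separate "unknown" category is a modeling decision this sketch does not make.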


In [3]:
# Read the testing dataset (test.csv)
test_file_path = folder_path + 'test.csv'
test_df = pd.read_csv(test_file_path)
test_df

| | customer_id | Name | age | gender | security_no | region_category | membership_category | joining_date | joined_through_referral | referral_id | ... | days_since_last_login | avg_time_spent | avg_transaction_value | avg_frequency_login_days | points_in_wallet | used_special_discount | offer_application_preference | past_complaint | complaint_status | feedback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fffe43004900440031003700300030003400 | Alethia Meints | 50 | F | OQJ1XAY | Village | Premium Membership | 2015-11-02 | No | xxxxxxxx | ... | 12 | 386.26 | 40721.44 | 7.0 | 733.830000 | Yes | No | No | Not Applicable | Poor Product Quality |
| 1 | fffe43004900440031003900370037003300 | Ming Lopez | 41 | M | OUQRPKO | Village | Gold Membership | 2016-03-01 | No | xxxxxxxx | ... | 11 | 37.80 | 9644.40 | 9.0 | 726.000000 | Yes | No | No | Not Applicable | Poor Website |
| 2 | fffe43004900440034003800360037003000 | Carina Flannigan | 31 | F | 02J2RE7 | Town | Silver Membership | 2017-03-03 | No | xxxxxxxx | ... | 18 | 215.36 | 3693.25 | 21.0 | 713.780000 | Yes | No | Yes | Solved in Follow-up | No reason specified |
| 3 | fffe43004900440036003200370033003400 | Kyung Wanner | 64 | M | 5YEQIF1 | Town | Silver Membership | 2017-08-18 | Yes | CID8941 | ... | -999 | 44.57 | 36809.56 | 11.0 | 744.970000 | Yes | No | Yes | No Information Available | Too many ads |
| 4 | fffe43004900440035003000370031003900 | Enola Gatto | 16 | F | 100RYB5 | Town | No Membership | 2015-05-05 | Yes | CID5690 | ... | 6 | 349.88 | 40675.86 | 8.0 | 299.048351 | No | Yes | Yes | Solved in Follow-up | Poor Website |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19914 | fffe43004900440035003600330037003800 | Kraig Peele | 12 | M | 2V0HA0O | NaN | Gold Membership | 2015-01-25 | No | xxxxxxxx | ... | 16 | 103.57 | 46279.35 | 18.0 | 708.120000 | No | Yes | No | Not Applicable | Poor Product Quality |
| 19915 | fffe43004900440032003900370037003100 | Damaris Sabol | 40 | F | VJGQD6Q | Village | No Membership | 2017-12-31 | Yes | CID45490 | ... | 21 | 63.19 | 23466.26 | Error | 574.340000 | No | Yes | No | Not Applicable | No reason specified |
| 19916 | fffe43004900440036003100310036003700 | Loura Huckstep | 55 | M | ADE7LWA | Town | No Membership | 2015-09-09 | No | xxxxxxxx | ... | 18 | 68.72 | 17903.75 | 24.0 | 564.300000 | No | Yes | Yes | Unsolved | No reason specified |
| 19917 | fffe43004900440034003200330033003600 | Sharita Clubb | 17 | F | A35KUBS | City | Silver Membership | 2016-04-17 | Yes | CID37167 | ... | 3 | 119.54 | 14057.09 | 22.0 | 606.340000 | No | Yes | No | Not Applicable | Poor Website |
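Two things stand out in the test preview: its header has no `churn_risk_score` column, and `days_since_last_login` contains `-999`, which looks like a sentinel for a missing value rather than a real day count. The sketch below assumes that reading of `-999`. It marks the sentinel as missing and lists the training columns absent from the test frame. The small frame and the shortened column list are stand-ins for `test_df` and `train_df.columns`.

```python
import pandas as pd

# Stand-in for train_df.columns (shortened for the example)
train_columns = ["customer_id", "age", "days_since_last_login", "churn_risk_score"]

# Stand-in for test_df, using the first four rows of the preview
test_small = pd.DataFrame({
    "customer_id": [
        "fffe43004900440031003700300030003400",
        "fffe43004900440031003900370037003300",
        "fffe43004900440034003800360037003000",
        "fffe43004900440036003200370033003400",
    ],
    "age": [50, 41, 31, 64],
    "days_since_last_login": [12, 11, 18, -999],
})

# Columns that exist in training but not in test (expected: only the target)
missing_in_test = [c for c in train_columns if c not in test_small.columns]

# Assumption: -999 encodes "unknown", so replace it with NaN
days = test_small["days_since_last_login"]
test_small["days_since_last_login"] = days.mask(days == -999)

print(missing_in_test)
print(test_small["days_since_last_login"].isna().sum())
```

Applying the same sentinel handling to both training and test data keeps the features consistent between fitting and prediction.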
