# Feature Exploration

Given the [Kaggle Dataset on Lead Scoring](https://www.kaggle.com/datasets/amritachatterjee09/lead-scoring-dataset?select=Lead+Scoring.csv), I will first run some general exploratory data analysis (EDA) to uncover trends in the data and decide on next steps.

According to the Kaggle page, X Education is an education company that sells online courses to industry professionals. Its typical lead conversion rate is about 30%, and the CEO wants to raise it to roughly 80%. Our task is to assign a lead score *h* to each lead so that leads with a higher score have a higher chance of converting (i.e., to build a propensity score).
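To make the scoring goal concrete, here is a minimal sketch. It assumes that some model will eventually output a conversion probability per lead, and that the lead score is simply that probability scaled to 0-100; both the scaling and the helper names `to_lead_score` and `conversion_rate_at_or_above` are my own illustrative choices, not part of the dataset or the Kaggle brief. The probabilities and outcomes are made up. The second helper shows how an 80%-style target could be checked: by measuring the conversion rate only among leads above a score cutoff.

```python
# Hypothetical illustration: `probabilities` stand in for model-predicted
# conversion probabilities and `converted` for observed 0/1 outcomes.
# None of these numbers come from the real dataset.

def to_lead_score(probability: float) -> int:
    """Map a conversion probability in [0, 1] to a 0-100 lead score (assumed scaling)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(100 * probability)


def conversion_rate_at_or_above(scores: list[int], converted: list[int], cutoff: int) -> float:
    """Share of leads with score >= cutoff that actually converted."""
    selected = [c for s, c in zip(scores, converted) if s >= cutoff]
    if not selected:
        raise ValueError("no leads at or above the cutoff")
    return sum(selected) / len(selected)


probabilities = [0.90, 0.82, 0.75, 0.40, 0.35, 0.20, 0.15, 0.10, 0.05, 0.55]
converted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

scores = [to_lead_score(p) for p in probabilities]
overall_rate = sum(converted) / len(converted)      # baseline across all leads
hot_rate = conversion_rate_at_or_above(scores, converted, 70)  # only "hot" leads

print(scores)        # [90, 82, 75, 40, 35, 20, 15, 10, 5, 55]
print(overall_rate)  # 0.3
print(hot_rate)      # 1.0
```

On this toy data the overall rate sits at 30% while the leads scored 70 or above all convert, which is the kind of separation a useful propensity score should produce.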

In [1]:
# Modules
import numpy as np
import pandas as pd
from pathlib import Path
import os
from ydata_profiling import ProfileReport

In [2]:
# Pull in data from the ../data folder relative to the notebook's working directory
raw_data = pd.read_csv(Path.cwd().parent / "data" / "Lead Scoring.csv")

# General EDA

Since the goal of this exercise is to increase the conversion rate, the `Converted` column is our target variable.
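A quick first check on the target is its class balance. The sketch below uses a small made-up frame in place of `raw_data`; only the `Converted` column name comes from the dataset. Because `Converted` is coded 0/1, its mean is the conversion rate, so on the real data `raw_data["Converted"].mean()` should land near the ~30% quoted on Kaggle if that figure holds for this file.

```python
import pandas as pd

# Hypothetical toy frame standing in for `raw_data`; values are invented.
toy = pd.DataFrame({"Converted": [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]})

# Share of each class (0 = not converted, 1 = converted).
class_balance = toy["Converted"].value_counts(normalize=True)

# With a 0/1 target, the mean equals the conversion rate.
baseline_rate = toy["Converted"].mean()

print(class_balance)
print(f"Baseline conversion rate: {baseline_rate:.1%}")  # 30.0%
```

Knowing this baseline matters later: a model is only useful if its high-score leads convert noticeably more often than this rate.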

To get a clear picture of how the variables relate to one another, I use the `ProfileReport` class from `ydata_profiling` to generate an HTML page summarizing distributions, correlations, and other measures.
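The profiling report covers a lot of ground, but two of its summaries are easy to reproduce in plain pandas for a quick in-notebook look: the share of missing values per column, and the correlation of each numeric column with `Converted`. This is a minimal sketch on an invented frame; the column names are borrowed from the dataset, but every value is made up and says nothing about the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: real column names, invented values.
toy = pd.DataFrame({
    "Converted": [1, 0, 0, 1, 0, 1],
    "TotalVisits": [5, 1, np.nan, 7, 2, 4],
    "Total Time Spent on Website": [900, 60, 0, 1200, 150, 700],
    "City": ["Mumbai", None, None, "Mumbai", "Mumbai", None],
})

# Fraction of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)

# Pearson correlation of each numeric column with the target.
# DataFrame.corr uses pairwise-complete observations, so the NaN in
# TotalVisits only drops that one row from its correlation.
numeric = toy.select_dtypes(include="number")
target_corr = (
    numeric.corr()["Converted"]
    .drop("Converted")
    .sort_values(ascending=False)
)

print(missing_share)
print(target_corr)
```

Pearson correlation only captures linear association with a 0/1 target, so it is a screening tool here rather than evidence that a feature drives conversion.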

In [None]:
# Use ProfileReport to create HTML doc of general summary
profile = ProfileReport(raw_data, title="Profiling Report")
profile.to_file("feature_eda.html")

In [5]:
print(raw_data.columns)

Index(['Prospect ID', 'Lead Number', 'Lead Origin', 'Lead Source',
       'Do Not Email', 'Do Not Call', 'Converted', 'TotalVisits',
       'Total Time Spent on Website', 'Page Views Per Visit', 'Last Activity',
       'Country', 'Specialization', 'How did you hear about X Education',
       'What is your current occupation',
       'What matters most to you in choosing a course', 'Search', 'Magazine',
       'Newspaper Article', 'X Education Forums', 'Newspaper',
       'Digital Advertisement', 'Through Recommendations',
       'Receive More Updates About Our Courses', 'Tags', 'Lead Quality',
       'Update me on Supply Chain Content', 'Get updates on DM Content',
       'Lead Profile', 'City', 'Asymmetrique Activity Index',
       'Asymmetrique Profile Index', 'Asymmetrique Activity Score',
       'Asymmetrique Profile Score',
       'I agree to pay the amount through cheque',
       'A free copy of Mastering The Interview', 'Last Notable Activity'],
      dtype='object')
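Judging only by the names in the column list, `Prospect ID` and `Lead Number` look like identifiers rather than features, and many columns (for example `Search`, `Magazine`, `Newspaper`) look like Yes/No flags that could turn out to be nearly constant. The sketch below flags both kinds of column. The helper names `id_like_columns` and `near_constant_columns` and the 0.99 threshold are my own assumptions, and the toy values are invented, so they do not reflect what the real columns contain.

```python
import pandas as pd


def id_like_columns(df: pd.DataFrame) -> list[str]:
    """Columns where every row has a distinct value (likely identifiers)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) == len(df)]


def near_constant_columns(df: pd.DataFrame, threshold: float = 0.99) -> list[str]:
    """Columns whose most frequent value (NaN counted as a value) covers >= threshold of rows."""
    flagged = []
    for col in df.columns:
        # value_counts sorts by frequency, so the first entry is the top share.
        top_share = df[col].value_counts(normalize=True, dropna=False).iloc[0]
        if top_share >= threshold:
            flagged.append(col)
    return flagged


# Hypothetical toy frame: real column names, invented values.
toy = pd.DataFrame({
    "Prospect ID": ["a1", "b2", "c3", "d4"],
    "Lead Number": [101, 102, 103, 104],
    "Magazine": ["No", "No", "No", "No"],
    "Do Not Email": ["No", "Yes", "No", "No"],
    "Converted": [0, 1, 0, 1],
})

print(id_like_columns(toy))        # ['Prospect ID', 'Lead Number']
print(near_constant_columns(toy))  # ['Magazine']
```

Running these helpers on `raw_data` would give a short list of columns to inspect before deciding whether to drop them from modeling.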
<bound meth