Health Insurance Lead Prediction - Kaggle Competition

Job-A-Thon - Analytics Vidhya, Health Insurance

Author: Miguel Santana

Photo by Lukas Blazek on Unsplash

Thank you for reviewing this repository. The author's contact info, blog post, sources and social media profiles are listed below under Further Information.

Project Methodology

FinMan Company is looking to leverage its client base by cross-selling insurance products to existing customers. Insurance policies are offered to prospective and existing clients when they land on the website and elect to fill out additional information forms. FinMan would like to use the information it has collected to classify positive leads for outreach programs with machine learning classifiers.

Data and Analytical Structure

The project dataset is provided by Analytics Vidhya via Kaggle. The data includes demographic features, policy features (for current customers) and labeled positive responses used for model validation and interpretation. The source can be found here. The project analysis followed the OSEMN framework: Obtain, Scrub, Explore, Model and Interpret.

Data Processing and Modeling

Once data processing (filling nulls, feature engineering, etc.) was complete, a single column was dropped to reduce multicollinearity. Categorical variables were encoded with a helper function that groups each category by the target variable, orders the categories by their mean target value and enumerates the resulting labels, treating them as ordinal values. This converts each feature to numeric labels that reflect its relationship with the target in a single step. This method is credited to Dr. Soledad Galli.
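A minimal sketch of the ordered (target-guided) ordinal encoding described above, assuming a pandas DataFrame with a binary target column. The function name `ordered_ordinal_encode`, the column names and the toy data are hypothetical; the helper in the notebook may differ in detail (for example, in how ties or unseen categories are handled).

```python
import pandas as pd


def ordered_ordinal_encode(df: pd.DataFrame, column: str, target: str) -> dict:
    """Map each category of `column` to an integer rank ordered by mean target.

    Hypothetical helper: categories with a lower positive rate get smaller labels.
    """
    # Mean of the binary target per category, sorted from lowest to highest
    ordered = df.groupby(column)[target].mean().sort_values().index
    # Enumerate the sorted categories to obtain ordinal labels 0, 1, 2, ...
    return {category: rank for rank, category in enumerate(ordered)}


# Toy data with hypothetical values
data = pd.DataFrame({
    "City_Code": ["C1", "C2", "C3", "C1", "C2", "C3", "C1"],
    "Response": [1, 0, 0, 1, 1, 0, 0],
})

mapping = ordered_ordinal_encode(data, "City_Code", "Response")
data["City_Code_enc"] = data["City_Code"].map(mapping)
print(mapping)
```

Because the ordering is derived from the target, the mapping should be learned on the training split only and then applied to validation and test data; otherwise target information leaks into the evaluation.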

Pycaret Modeling

Target Imbalance

Balanced Target | SMOTE
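SMOTE (Synthetic Minority Over-sampling Technique) balances the target by creating synthetic minority-class rows instead of duplicating existing ones. The NumPy-only sketch below illustrates the core idea: each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the code used in this project. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package is preferable; the function `smote_sketch` and its parameters are hypothetical, and it assumes at least two minority samples.

```python
import numpy as np


def smote_sketch(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until it matches the majority count.

    Hypothetical helper illustrating SMOTE interpolation between minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label) - len(X_min))
    if n_needed <= 0:
        return X, y

    # Pairwise Euclidean distances between minority samples
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # For each new point, pick a random base sample and one of its k neighbours
    base = rng.integers(0, len(X_min), size=n_needed)
    nbr = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[nbr] - X_min[base])

    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_res, y_res


# Toy imbalanced data: 10 positives, 90 negatives (hypothetical)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([1] * 10 + [0] * 90)
X_res, y_res = smote_sketch(X, y)
print(np.bincount(y_res))  # [90 90]
```

Oversampling should be applied to the training data only, after the train/test split, so that synthetic rows never appear in the data used for validation.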

Scikit-learn Modeling

A grid search (GridSearchCV) was performed to tune a Gradient Boosting Classifier.
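A minimal sketch of a scikit-learn grid search over `GradientBoostingClassifier`, using synthetic data as a stand-in for the encoded lead features. The parameter grid, the ROC AUC scoring metric and the 3-fold cross-validation are assumptions for illustration; the grid and metric used in the notebook may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the encoded lead data (hypothetical)
X, y = make_classification(n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)  # refits the best configuration on all training data

# Score the refit best estimator on the held-out split
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_)
print(f"held-out ROC AUC: {test_auc:.3f}")
```

Stratifying the split keeps the positive rate similar in the training and held-out sets, which matters when the target is imbalanced.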

Model Validation

Scores

Interpreting Results | Feature Importance
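One common way to rank features for a gradient boosting model is the impurity-based `feature_importances_` attribute in scikit-learn; whether the notebook used this or another method (for example, permutation importance or a Pycaret plot) is not stated here. The sketch below uses synthetic data with hypothetical column names, built so that `feature_0` drives the target most strongly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data: the target depends mostly on feature_0, less on feature_1,
# and not at all on feature_2 or feature_3 (hypothetical columns)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)), columns=[f"feature_{i}" for i in range(4)])
noise = rng.normal(scale=0.5, size=500)
y = (2.0 * X["feature_0"] + X["feature_1"] + noise > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances (normalised to sum to 1), ranked high to low
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(3))
```

Impurity-based importances can favour high-cardinality features, so permutation importance on held-out data is a useful cross-check.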

Reco Policy Category

Reco Policy Premium

City Code

Recommendations

The model's top three features were Reco Policy Category, Reco Policy Premium and City Code. Within each of these features, the subcategories below had the highest ratios of positive responses to total responses. It is recommended to focus on clients with the following:

City Codes: C1, C2, C13, C23

Reco Policy Categories: 15, 22

Reco Policy Premiums: 15,000 to 19,999
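The positive-to-total response ratio behind these recommendations is the mean of a 0/1 response column within each subcategory. A minimal pandas sketch, assuming hypothetical column names `City_Code`, `Reco_Policy_Premium` and `Response` and toy data; the 5,000-wide premium bands are an assumption chosen to line up with the 15,000 to 19,999 range above.

```python
import pandas as pd

# Toy lead data (hypothetical values and column names)
leads = pd.DataFrame({
    "City_Code": ["C1", "C1", "C2", "C2", "C3", "C3", "C3", "C1"],
    "Reco_Policy_Premium": [16000, 18500, 9000, 15500, 21000, 12000, 19999, 30000],
    "Response": [1, 1, 0, 1, 0, 0, 1, 0],
})

# Positive-to-total response ratio for each City Code, highest first
city_ratio = leads.groupby("City_Code")["Response"].mean().sort_values(ascending=False)

# Bin premiums into half-open 5,000-wide bands, e.g. [15000, 20000), then compute the ratio
bands = pd.cut(leads["Reco_Policy_Premium"], bins=list(range(0, 35001, 5000)), right=False)
premium_ratio = leads.groupby(bands, observed=True)["Response"].mean()

print(city_ratio)
print(premium_ratio)
```

Small subcategories can produce extreme ratios from only a handful of leads, so it is worth reporting the count per group alongside the ratio before prioritizing outreach.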

Limitations

The project was limited by the anonymity of the data, specifically the anonymized geographic data, which could otherwise have been used for additional feature engineering and potentially higher scores.

Future Work

Future models could use more complex feature engineering and analysis, such as clustering the geographic features. For this project, doing so would have complicated the output and made the model harder to implement in a real workplace.

Further Information

Please review the narrative of our analysis in our Jupyter notebook or review our presentation.

For any additional questions, please reach out via email at santana2.miguel@gmail.com, on LinkedIn or on Twitter.

Repository Structure:


├── README.md                  <- The top-level README for reviewers of this project.
├── insurance_notebook.ipynb   <- Narrative documentation of the analysis in a Jupyter notebook.
├── presentation.pdf           <- PDF version of the project presentation.
