Author: Miguel Santana
Photo by Lukas Blazek on Unsplash
Thank you for reviewing this repository. The author's contact info, blog post, sources and social media profiles are listed below under further information.
FinMan Company is looking to leverage its client base by cross-selling insurance products to existing customers. Insurance policies are offered to prospective and existing clients who land on the website and elect to fill out additional information forms. FinMan Company would like to use the information it has acquired to classify positive leads for outreach programs using machine learning classifiers.
The project dataset is provided by Analytics Vidhya via Kaggle. Data includes demographic features, policy features (for current customers) and example positive classifications for ML model validation and interpretation. The source can be found here. The project analysis followed the OSEMN framework: Obtain, Scrub, Explore, Model and Interpret.
Once data processing (filling nulls, feature engineering, etc.) was complete, a single column was dropped to reduce multicollinearity. Categorical variables were encoded using a helper function that groups each category by the target variable, sorts the categories by their target values and enumerates them as labels (treating them as ordinal values). This converted the features and placed them on a scale that reflects the target in a single step. This method is credited to Dr. Soledad Galli.
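The encoding step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual helper: the function name `encode_by_target_mean`, the use of the target mean as the sort key, and the toy column names are all assumptions.

```python
import pandas as pd


def encode_by_target_mean(df, column, target):
    """Rank the categories of `column` by their mean `target` value.

    Returns the encoded Series and the category -> rank mapping.
    Hypothetical helper; the notebook's own function may differ.
    """
    # Group by category, average the target, sort ascending.
    ordered = df.groupby(column)[target].mean().sort_values().index
    # Enumerate the sorted categories so each gets an ordinal label.
    mapping = {category: rank for rank, category in enumerate(ordered)}
    return df[column].map(mapping), mapping


# Toy data for illustration only (not taken from the project dataset).
toy = pd.DataFrame({
    "category": ["A", "A", "B", "B", "C", "C"],
    "response": [1, 1, 0, 1, 0, 0],
})
encoded, mapping = encode_by_target_mean(toy, "category", "response")
```

Because the mapping is derived from the target, it is generally safer to learn it on the training split only and then apply it to the test split, so that test labels do not leak into the features.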
Hyperparameters were tuned with GridSearchCV using a Gradient Boosting Classifier.
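A grid search of this kind might look like the sketch below. The synthetic data, the parameter grid, the `roc_auc` scoring metric and the 3-fold setting are assumptions chosen for illustration; the values used in the notebook are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the processed insurance data (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.75, 0.25], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Illustrative grid; the notebook's actual grid may differ.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # assumed metric
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
# score() uses the same scorer as the search, here ROC AUC on held-out data.
test_auc = search.score(X_test, y_test)
```

After fitting, `search.best_estimator_` holds the refitted model, and its `feature_importances_` attribute is one way to rank features.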
Scores
The model's top three features were Reco Policy Category, Reco Policy Premium and City Code. Within those three features, the subcategories listed below yielded the highest positive-to-total response ratios. It is recommended to focus on clients in/with:
City Codes: C1, C2, C13, C23
Reco Policy Categories: 15, 22
Reco Policy Premiums: between 15,000 and 19,999
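The positive-to-total response ratio behind these recommendations can be computed per subcategory along the lines of the sketch below. The function name `response_ratio` and the toy data are hypothetical and do not reproduce the project's figures.

```python
import pandas as pd


def response_ratio(df, column, target):
    """Share of positive responses within each subcategory of `column`.

    Hypothetical helper for illustration; assumes `target` is 0/1.
    """
    summary = df.groupby(column)[target].agg(positives="sum", total="count")
    summary["ratio"] = summary["positives"] / summary["total"]
    # Highest-ratio subcategories first.
    return summary.sort_values("ratio", ascending=False)


# Toy data for illustration only (not taken from the project dataset).
toy = pd.DataFrame({
    "subcategory": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "response":    [1, 1, 0, 1, 0, 0, 0, 1, 0],
})
ratios = response_ratio(toy, "subcategory", "response")
```

Keeping the `total` column next to the ratio makes it easier to spot subcategories whose high ratio rests on only a handful of clients.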
The project was limited by the anonymity of the data, specifically the geographic data, which could have been used for additional feature engineering and might have led to higher scores.
Future models can be created using more complicated feature engineering and analysis such as clustering of the geographic features. For the purposes of this project, doing so would have complicated the output and made it difficult to implement within a real workplace.
Please review the narrative of our analysis in our Jupyter notebook or review our presentation.
For any additional questions, please reach out via email at santana2.miguel@gmail.com, on LinkedIn or on Twitter.
├── README.md <- The top-level README for reviewers of this project.
├── insurance_notebook.ipynb <- narrative documentation of analysis in jupyter notebook
├── presentation.pdf <- pdf version of project presentation