For this lab, we will be using the Marketing Customer Value Analysis dataset from before (marketing_customer_analysis.csv). An auto insurance company has collected data about its customers, including their demographic data, education, employment, policy details, vehicle information, and claim amounts.
This lab will focus on data cleaning and wrangling, which is a crucial step in the EDA process.
- Define a function that removes outliers using one of the methods we've discussed, and apply it to the dataframe.
- Create a copy of the dataframe for the data wrangling.
- Normalize the continuous variables.
- Encode the categorical variables.
- Transform the time variables (day, week and month) to integers.
- Since the model will only accept numerical data, check that every column is numerical. If any are not, convert them using encoding.
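
The steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: it assumes the IQR rule for outliers, min-max scaling for normalization, and one-hot encoding for categoricals, and any other method we've discussed would work just as well. The tiny dataframe and its column names (`income`, `total_claim_amount`, `gender`, `effective_to_date`) are hypothetical stand-ins so the example runs on its own. In the lab you would load `marketing_customer_analysis.csv` instead and adapt the names and date format to the real columns.

```python
import numpy as np
import pandas as pd


def remove_outliers_iqr(df, columns, k=1.5):
    """Keep rows whose values in `columns` all lie within [Q1 - k*IQR, Q3 + k*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


def wrangle(df, date_col):
    """Return an all-numeric copy of `df` (scaled numbers, integer dates, one-hot categories)."""
    out = df.copy()  # work on a copy so the cleaned dataframe stays untouched

    # Continuous variables: min-max normalization to the range [0, 1].
    num_cols = out.select_dtypes(include=np.number).columns
    col_min, col_max = out[num_cols].min(), out[num_cols].max()
    span = (col_max - col_min).replace(0, 1)  # avoid dividing by zero on constant columns
    out[num_cols] = (out[num_cols] - col_min) / span

    # Time variable: replace the date column with integer day, week and month.
    dates = pd.to_datetime(out.pop(date_col))
    out["day"] = dates.dt.day
    out["week"] = dates.dt.isocalendar().week.astype(int)
    out["month"] = dates.dt.month

    # Categorical variables: one-hot encode every remaining non-numeric column.
    cat_cols = list(out.select_dtypes(exclude=np.number).columns)
    return pd.get_dummies(out, columns=cat_cols, dtype=int)


# Hypothetical sample data standing in for the real CSV.
raw = pd.DataFrame({
    "income": [33000.0, 42000.0, 38000.0, 45000.0, 41000.0, 990000.0],
    "total_claim_amount": [300.0, 420.0, 390.0, 510.0, 450.0, 480.0],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "effective_to_date": ["2011-01-05", "2011-01-19", "2011-02-02",
                          "2011-02-14", "2011-02-20", "2011-01-30"],
})

clean = remove_outliers_iqr(raw, ["income", "total_claim_amount"])
model_df = wrangle(clean, date_col="effective_to_date")

# Final check: list any columns that are still not numerical.
non_numeric = list(model_df.select_dtypes(exclude=np.number).columns)
print(model_df.dtypes)
print("Non-numeric columns left:", non_numeric)
```

Here the day, week and month columns are created after scaling, so they stay as plain integers. Whether to scale them too, or to use label encoding rather than one-hot encoding for ordered categories such as education level, is a modelling choice left to you.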