# Customer Personality Analysis
## Joshua Hess


## Introduction

This project examines customer data to perform a customer personality analysis. The analysis can reveal the common groups that customers fall into, which lets us perform a customer segmentation. **Customer segmentation** is a method by which a business divides its customers into a small number of distinct subgroups. This is useful because the business can then tailor its marketing to each subgroup and so maximize its overall profit. We will perform this segmentation using a machine learning model called *K-Means Clustering*.

## How K-Means Clustering Works

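*K-Means Clustering* is an unsupervised model. This means it looks for patterns in data that has no predefined labels. It represents each customer as a point whose coordinates are that customer's attributes, such as income and spending. It then groups these points into *k* "clusters" so that each customer belongs to the cluster whose center (centroid) is nearest. To do this, the algorithm repeats two steps until the assignments stop changing: it assigns each point to its nearest centroid, and then it moves each centroid to the mean of the points assigned to it. Each resulting cluster is treated as a subgroup of our customer population. Once we have identified the subgroups, we can analyze each one in more detail.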
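K-Means is usually computed with Lloyd's algorithm. The algorithm alternates two steps. First, it assigns every point to its nearest centroid. Second, it moves each centroid to the mean of its assigned points. The NumPy sketch below shows only that loop. The function names `kmeans` and `_assign`, the random initialization from data points, and the toy income/spending values are all assumptions made for illustration. None of them is code from this project. Library implementations such as scikit-learn's `KMeans` add smarter initialization and multiple restarts on top of this loop.

```python
import numpy as np


def _assign(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    # Pairwise Euclidean distances, shape (n_points, k)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm sketch; hypothetical helper, not a library API."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct, randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid's cluster
        labels = _assign(X, centroids)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final labels are computed against the final centroids
    return _assign(X, centroids), centroids


# Hypothetical toy customers: [income in $1000s, yearly spending in $100s]
X = np.array([[20, 1.0], [22, 2.0], [21, 1.5],
              [80, 12.0], [85, 11.0], [78, 13.0]])
labels, centroids = kmeans(X, k=2)
print(labels)     # first three rows share one label, last three share the other
print(centroids)
```

The label numbers (0 or 1) depend on the random initialization, but the grouping does not. Each cluster is described by its centroid, the average customer in that group.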

This project will have the following structure:

1. Loading in and inspecting data.
2. Cleaning data (handling missing values, checking for outliers).
3. Feature engineering.
4. Data preprocessing for machine learning.
5. Creating the K-Means Clustering model.
6. Comparing each customer subgroup defined by our K-Means model.
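Steps 4 and 5 depend on each other. K-Means measures similarity with Euclidean distance. As a result, a feature measured in large units (such as `Income`) can swamp small counts (such as `Kidhome`) unless all features share a common scale.

The sketch below illustrates one common preprocessing approach in two parts:

- One-hot encoding a categorical column with `pandas.get_dummies`.
- Standardizing every column to z-scores.

It runs on four made-up rows that borrow column names from this dataset. The row values and the helper `dist` are hypothetical. This is not necessarily the preprocessing applied later in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; column names come from the dataset, values are made up
customers = pd.DataFrame({
    "Income": [30000.0, 32000.0, 85000.0, 90000.0],
    "Kidhome": [2, 0, 2, 0],
    "Education": ["Graduation", "PhD", "Graduation", "PhD"],
})

# One-hot encode the categorical column so every feature is numeric
encoded = pd.get_dummies(customers, columns=["Education"], dtype=float)

# Standardize each column to mean 0 and standard deviation 1 (z-scores);
# a constant column would have std 0 and should be dropped first
scaled = (encoded - encoded.mean()) / encoded.std(ddof=0)


def dist(df, i, j):
    """Hypothetical helper: Euclidean distance between rows i and j of df."""
    a = df.iloc[i].to_numpy(dtype=float)
    b = df.iloc[j].to_numpy(dtype=float)
    return float(np.linalg.norm(a - b))


# Raw units: distances are driven almost entirely by the Income gap
print(dist(encoded, 0, 1), dist(encoded, 0, 2))
# Scaled: Kidhome and Education differences now carry comparable weight
print(dist(scaled, 0, 1), dist(scaled, 0, 2))
```

In raw units, customer 0 looks closest to customer 1 only because their incomes are similar. After scaling, customer 0 is closer to customer 2, who has the same number of children and the same education level. Scaling therefore changes which customers K-Means treats as "similar."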

# 1) Loading in and Inspecting Data

Let's first load in our data and take a look at the given variables:

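```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# The dataset is tab-separated, so pass sep="\t" explicitly
data = pd.read_csv("/kaggle/input/customer-personality-analysis/marketing_campaign.csv", sep="\t")

# Summarize column names, non-null counts, and dtypes
data.info()
```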

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 29 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   2240 non-null   int64  
 1   Year_Birth           2240 non-null   int64  
 2   Education            2240 non-null   object 
 3   Marital_Status       2240 non-null   object 
 4   Income               2216 non-null   float64
 5   Kidhome              2240 non-null   int64  
 6   Teenhome             2240 non-null   int64  
 7   Dt_Customer          2240 non-null   object 
 8   Recency              2240 non-null   int64  
 9   MntWines             2240 non-null   int64  
 10  MntFruits            2240 non-null   int64  
 11  MntMeatProducts      2240 non-null   int64  
 12  MntFishProducts      2240 non-null   int64  
 13  MntSweetProducts     2240 non-null   int64  
 14  MntGoldProds         2240 non-null   int64  
 15  NumDealsPurchases    2240 non-null   i