# <center>Customer Segmentation using R</center>
![](https://img.freepik.com/premium-vector/customer-segmentation-target-audience-analysis-vector-isometric-illustration-audience-segmentation-i_103044-1952.jpg?w=2000)

### What is Customer Segmentation?
Customer segmentation is the process of dividing customers into groups based on common characteristics such as demographics or behaviours, so you can market to those customers more effectively and appropriately.

### Types of customer segmentation:
Several segmentation factors deserve careful consideration. None of them is one-size-fits-all, so choose the ones that suit your business.

Customer segmentation can be broken down into two types:

#### 1) Segmenting customers based on who they are:
Understanding who customers are typically comes down to demographics, including factors such as:
- Age
- Geography
- Urbanisation – are they city or rural?
- Income
- Relationship status
- Family
- Job type

#### 2) Segmenting customers based on what they do:
You can also segment customers by how much they spend (share of wallet), how often they buy, and which products they buy. This shows how much room there is to increase their spend, and it is more behaviour focused.

Behaviour itself varies, so you may want to break it down further as follows:
- Basket size
- Share of wallet
- Tenure (how long they stay with you)
- Long-term loyalty (a function of share of wallet and tenure)

### Why Segment Customers?

- Customer segmentation is popular because it helps you market and sell more effectively: you develop a better understanding of your customers’ needs and desires.
- The business impact matters even more. Effective customer segmentation helps you increase customer lifetime value, meaning customers stay longer and spend more.


### Using Customer Segmentation

With customer segmentation, a company can:
- Create and communicate targeted marketing messages that will resonate with specific groups of customers, but not with others (who will receive messages tailored to their needs and interests, instead).
- Select the best communication channel for the segment, which might be email, social media posts, radio advertising, or another approach, depending on the segment.
- Identify ways to improve existing products, or spot new product or service opportunities.
- Establish better customer relationships.
- Test pricing options.
- Focus on the most profitable customers.
- Improve customer service.
- Upsell and cross-sell other products and services.

## About the dataset:
The dataset contains customer information such as Sex, Marital status, Age, Education, Income, Occupation and Settlement size. The categorical columns are stored as numeric codes:

**Sex:**

- 0 = Male
- 1 = Female

**Marital status:**

- 0 = Single
- 1 = Non-single (divorced / separated / married / widowed)

**Education:**

- 0 = Unknown
- 1 = High school
- 2 = University
- 3 = Graduate

**Occupation:**

- 0 = Unemployed / unskilled
- 1 = Skilled employee / official
- 2 = Management / self-employed / highly qualified employee / officer

**Settlement size:**

- 0 = Small Cities
- 1 = Medium Cities
- 2 = Large Cities

### Importing data

In [None]:
data <- read.csv("../input/customer-clustering/segmentation data.csv")
str(data)

In [None]:
head(data)

In [None]:
summary(data)

In [None]:
# Making a copy of the data so categorical values can be shown as labels in the plots
df <- data.frame(data)  
head(df)

### Replacing numeric codes with their labels:

In [None]:
df['Sex'][df['Sex'] == 0] <- 'Male'
df['Sex'][df['Sex'] == 1] <- 'Female'

df['Marital.status'][df['Marital.status'] == 0 ] <- 'Single'
df['Marital.status'][df['Marital.status'] == 1 ] <- 'Non-Single'

df['Education'][df['Education'] == 0] <- 'Unknown'
df['Education'][df['Education'] == 1] <- 'High-School'
df['Education'][df['Education'] == 2] <- 'University'
df['Education'][df['Education'] == 3] <- 'Graduate'

df['Occupation'][df['Occupation'] == 0] <-  'Unemployed'
df['Occupation'][df['Occupation'] == 1] <-  'Skilled Employed'
df['Occupation'][df['Occupation'] == 2] <-  'Highly qualified Employed'

df['Settlement.size'][df['Settlement.size'] == 0] <- 'Small cities'
df['Settlement.size'][df['Settlement.size'] == 1] <- 'Medium cities'
df['Settlement.size'][df['Settlement.size'] == 2] <- 'Large cities'
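
The recoding above assigns one label at a time. As an alternative, here is a minimal sketch that does the same mapping with `factor()`. It runs on a small made-up data frame (`toy` and its values are hypothetical, not part of the dataset) and reuses the notebook's labels.

```r
# Toy stand-in for the customer data (hypothetical values, same coding scheme)
toy <- data.frame(
  Sex = c(0, 1, 1, 0),
  Education = c(1, 2, 0, 3)
)

# factor() maps every numeric code to its label in a single step
toy$Sex <- factor(toy$Sex, levels = c(0, 1), labels = c("Male", "Female"))
toy$Education <- factor(
  toy$Education,
  levels = 0:3,
  labels = c("Unknown", "High-School", "University", "Graduate")
)

str(toy)
table(toy$Education)
```

One practical difference: a factor remembers the order of its levels, so `table()` and `barplot()` list the categories in coding order instead of alphabetically.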

In [None]:
head(df)

In [None]:
# Bar plot of the Sex column
a <- table(df$Sex)
barplot(a,main="Barplot for Gender Comparison",
       ylab="Count",
       xlab="Sex",
       col=rainbow(6),
       legend=rownames(a))

In [None]:
#histogram to see age distribution of customers
hist(df$Age,
    col="purple",
    main="Histogram for value counts of Age",
    xlab="Age",
    ylab="Frequency",
    labels=TRUE)

- More than 50% of customers are under the age of 35.
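
A share like this can be computed directly instead of being read off the histogram. The sketch below uses a hypothetical `ages` vector so that it runs on its own. With the notebook's data you would pass `df$Age` instead.

```r
# Hypothetical ages, used only so the snippet is self-contained
ages <- c(22, 25, 27, 31, 33, 34, 36, 41, 52, 67)

# mean() of a logical vector gives the proportion of TRUE values
share_under_35 <- mean(ages < 35)

# The median is the age that splits the customers into two equal halves
median_age <- median(ages)

share_under_35
median_age
```

If the share is above 0.5, the median age is necessarily below 35, so either number supports the statement.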

In [None]:
#histogram to see income distribution of customers
hist(df$Income,
    col="blue",
    main="Income of customers",
    xlab="Income",
    ylab="Count",
    labels=TRUE)

- Most customers' incomes fall in the 100000-150000 range.

In [None]:
# Customers by settlement size
a <- table(df$Settlement.size)
barplot(a,main="City groups of Customers",
       ylab="Count",
       xlab="Type of city",
       col=rainbow(4),
       legend=rownames(a))

- Most of the customers are from small cities.
- The numbers of customers in large cities and medium cities are almost the same.

## Creating Clusters of Customer groups
- Here we'll use a machine learning technique called **K-Means Clustering**.
- K-Means is a distance-based clustering algorithm: the closer two points are, the more similar they are considered to be.
- It builds clusters by assigning each point to the cluster whose centre is nearest, then recomputing the centres and repeating until the assignments stop changing.
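
To make the assign-and-update idea concrete, here is a minimal hand-written sketch of the loop on made-up 2-D points. It is only an illustration under simplifying assumptions (two clusters, fixed starting centres, no empty clusters) and is not how `kmeans()` is implemented internally. The names `pts`, `centers` and `cluster` are hypothetical.

```r
# Two well-separated groups of 2-D points (made-up data, 20 points each)
set.seed(42)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)

# Start from one point of each group so the toy run is deterministic
centers <- pts[c(1, 21), , drop = FALSE]

for (iter in 1:10) {
  # Assignment step: squared Euclidean distance from every point to every centre
  d <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums((pts - matrix(centers[k, ], nrow(pts), 2, byrow = TRUE))^2)
  })
  cluster <- max.col(-d, ties.method = "first")  # index of the nearest centre

  # Update step: move each centre to the mean of the points assigned to it
  new_centers <- t(sapply(seq_len(nrow(centers)), function(k) {
    colMeans(pts[cluster == k, , drop = FALSE])
  }))

  if (all(abs(new_centers - centers) < 1e-9)) break  # centres stopped moving
  centers <- new_centers
}

table(cluster)
centers
```

In practice the built-in `kmeans()` handles the details this sketch skips, such as random restarts (`nstart`) and iteration limits (`iter.max`).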

In [None]:
head(data)

In [None]:
# Selecting the important columns for clustering
cdf <- data.frame(data$Age,data$Sex,data$Settlement.size,data$Income)
head(cdf)
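
Because K-Means relies on distances, a column with a much larger numeric range (here, Income) contributes most of the distance between two customers. One common option, which this notebook does not use, is to standardise the columns with `scale()` before clustering. The sketch below shows the effect on a few hypothetical rows. `toy` and its values are made up for illustration.

```r
# Hypothetical rows with the same kind of columns as cdf
toy <- data.frame(
  Age = c(25, 40, 60, 33),
  Income = c(90000, 150000, 120000, 200000)
)

# Raw distances are driven almost entirely by the Income differences
dist(toy)

# scale() centres each column to mean 0 and rescales it to standard deviation 1
toy_scaled <- scale(toy)
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)

# After scaling, Age and Income contribute on a comparable footing
dist(toy_scaled)
```

Whether to scale is a modelling choice: unscaled clusters mostly reflect income bands, while scaled clusters weigh all columns more evenly.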

In [None]:
# Determining the optimal number of clusters

library(cluster)

# Using the gap statistic method
set.seed(125)
stat_gap <- clusGap(cdf, FUNcluster = kmeans, nstart = 25,
            K.max = 10, B = 50)
#plot(stat_gap)

# Dividing into 6 clusters
k6<-kmeans(cdf,6,iter.max=100,nstart=50,algorithm="Lloyd")
k6
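
The gap statistic computed above is not inspected in the cell, so here is a small sketch of how its result can be read. It uses made-up data with three obvious groups (`toy`, `gap` and `best_k` are hypothetical names), together with `maxSE()` from the same `cluster` package, to turn the gap values into a suggested number of clusters.

```r
library(cluster)

# Made-up data with three clear groups, so the snippet runs on its own
set.seed(125)
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 8, sd = 0.3), ncol = 2)
)

gap <- clusGap(toy, FUNcluster = kmeans, nstart = 25, K.max = 6, B = 20)

# gap$Tab has one row per k, with columns logW, E.logW, gap and SE.sim
gap$Tab

# maxSE() applies a selection rule to the gap values and their standard errors
best_k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
best_k

# The plot shows the gap value for each k with error bars
plot(gap, main = "Gap statistic")
```

Running the same two steps on `stat_gap` would show whether the gap statistic supports the choice of 6 clusters for this dataset.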

In [None]:
# Plotting the clusters with clusplot
options(repr.plot.width = 12, repr.plot.height = 10)
clusplot(cdf, k6$cluster, color=TRUE, shade=TRUE, labels=0,lines=0)

### Principal Component Analysis

In [None]:
# Principal component analysis (columns are centred but not rescaled)
pcclust <- prcomp(cdf, scale. = FALSE)
summary(pcclust)
pcclust$rotation[,1:2]
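
As a reading aid for the output above, this sketch shows where the "Proportion of Variance" row of `summary()` comes from and what the rotation matrix holds. It uses a few hypothetical rows (`toy` is made up) instead of the real `cdf`.

```r
# Hypothetical rows standing in for cdf
toy <- data.frame(
  Age = c(25, 40, 60, 33, 48),
  Income = c(90000, 150000, 120000, 200000, 110000)
)

pca <- prcomp(toy, scale. = FALSE)

# Proportion of variance per component: squared sdev over the total
var_share <- pca$sdev^2 / sum(pca$sdev^2)
var_share

# Loadings: how strongly each original column contributes to each component
pca$rotation
```

With `scale. = FALSE`, the column with the largest raw variance (Income in this toy example) accounts for nearly all of the first component.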

### Visualizing Clusters using ggplot

In [None]:
library("ggplot2")
set.seed(1)
options(repr.plot.width = 12, repr.plot.height =8)
ggplot(cdf, aes(x = data.Income, y = data.Age)) + 
  geom_point(stat = "identity", aes(color = as.factor(k6$cluster))) +
  scale_color_discrete(name=" ",
              breaks=c("1", "2", "3", "4", "5","6"),
              labels=c("Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4", "Cluster 5","Cluster 6")) +
  ggtitle("Segments of Customers", subtitle = "Using K-means Clustering")

### 📌If you're also a Beginner like me in R - copy & edit this code and experiment with it💁‍✨
