The project aims to define martketing strategies to increase the revenues generated by the customers of the Hotel chain website through a better understanding of the customer segments.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
Data
Library
Marketing Strategy & Analysis.Rmd
Marketing Strategy & Analysis.Rproj
MarketingStrategyAnalytics.html
README.md

README.md

Marketing Analysis for Hotel Chain Website

The Business Questions

Define martketing strategies to increase the revenues generated by the customers of the website through a better understanding of the customer segments.

The Process

We followed a 3 steps process:

Part 1: Data set inspection and dimensionality reduction

Part 2: Data analysis and marketing insights

Part 3: Final recommendation and strategy

Part 1 - Data Inspection

Key Customer Characteristics

First step: we load the data to use:

The data refers to the users and their interaction with the website until that time. Overall the dataset contains information on almost 1 million customers grouped in 667 types of customers described by 19 number of variables.

Variables description:

  • Type of User: Returning or New Visitor
  • Source: Organic vs Paid Promotions (cpc, cpm, referral)
  • Users: Number of User Visits
  • Sessions: Number of User Sessions
  • No of Pages Visited: Number of total pages visited
  • No of Transactions: Number of monetary transactions
  • Revenue: Revenue generated
  • Dummy Variable: Created dummy variables categorical data such as 'user type', 'source' and 'device type'
factor_attributes_used = c(1:19)
factor_selectionciterion = 
minimum_variance_explained = 65
manual_numb_factors_used = 5
rotation_used = "varimax"
max_data_report = 5
factor_attributes_used <- intersect(factor_attributes_used, 1:ncol(myData))
ProjectDataFactor <- myData[,factor_attributes_used]
ProjectDataFactor <- as.matrix(ProjectDataFactor)

Check the Data

We start by doing a basic visual exploration of the first customer types:

**Data Visualization**

Customer01 Customer02 Customer03 Customer04 Customer05
Type.of.User Returning Visitor Returning Visitor New Visitor Returning Visitor Returning Visitor
Source google / organic google / organic (direct) / (none) google / organic google / cpc
Device desktop desktop desktop desktop desktop
Returning.Visitor 1 1 0 1 1
New.Visitor 0 0 1 0 0
Google.CPC 0 0 0 0 1
TripAdvisor 0 0 0 0 0
Direct 0 0 1 0 0
DFA 0 0 0 0 0
Email 0 0 0 0 0
Referral 0 0 0 0 0
Desktop 1 1 1 1 1
Mobile 0 0 0 0 0
Tablet 0 0 0 0 0
Users 7240 21442 2989 2637 3538
Sessions 10776 35015 2996 4112 5179
No.of.Pages.Visited 45328 173192 7989 16936 21534
No.of.Transactions 64 130 13 15 35
Revenue 456710.54 362368.88 341553.90 278226.12 216213.00

Check Correlations

We can then analyze the data in terms of correlation between different variables. The objetctive is to get a first idea of possible cross-relationships between variables that could lead to a dimensionality reduction as a first step for the analysis.

**Correlation Table**

Returning.Visitor New.Visitor Google.CPC TripAdvisor Direct DFA Email Referral Desktop Mobile Tablet Users Sessions No.of.Pages.Visited No.of.Transactions Revenue
Returning.Visitor 1.00 -1.00 -0.03 -0.01 -0.19 0.11 0.08 0.09 0.10 -0.06 -0.07 -0.25 -0.18 -0.16 0.05 0.03
New.Visitor -1.00 1.00 0.03 0.01 0.19 -0.11 -0.08 -0.09 -0.10 0.06 0.07 0.25 0.18 0.16 -0.05 -0.03
Google.CPC -0.03 0.03 1.00 -0.07 -0.13 -0.09 -0.07 -0.23 0.13 -0.09 -0.09 0.04 0.05 0.05 0.14 0.10
TripAdvisor -0.01 0.01 -0.07 1.00 -0.10 -0.07 -0.05 0.10 -0.09 0.01 0.10 -0.06 -0.07 -0.06 -0.06 -0.06
Direct -0.19 0.19 -0.13 -0.10 1.00 -0.13 -0.10 -0.31 -0.13 0.12 0.06 0.15 0.14 0.10 0.11 0.16
DFA 0.11 -0.11 -0.09 -0.07 -0.13 1.00 -0.07 -0.22 0.01 -0.02 0.01 -0.05 -0.03 -0.04 0.01 -0.01
Email 0.08 -0.08 -0.07 -0.05 -0.10 -0.07 1.00 -0.15 0.12 -0.07 -0.09 -0.07 -0.07 -0.07 -0.06 -0.06
Referral 0.09 -0.09 -0.23 0.10 -0.31 -0.22 -0.15 1.00 -0.05 0.04 0.03 -0.22 -0.24 -0.23 -0.19 -0.20
Desktop 0.10 -0.10 0.13 -0.09 -0.13 0.01 0.12 -0.05 1.00 -0.60 -0.74 -0.07 -0.10 -0.13 0.13 0.11
Mobile -0.06 0.06 -0.09 0.01 0.12 -0.02 -0.07 0.04 -0.60 1.00 -0.10 0.12 0.13 0.18 -0.07 -0.06
Tablet -0.07 0.07 -0.09 0.10 0.06 0.01 -0.09 0.03 -0.74 -0.10 1.00 -0.01 0.01 0.00 -0.10 -0.09
Users -0.25 0.25 0.04 -0.06 0.15 -0.05 -0.07 -0.22 -0.07 0.12 -0.01 1.00 0.98 0.92 0.46 0.39
Sessions -0.18 0.18 0.05 -0.07 0.14 -0.03 -0.07 -0.24 -0.10 0.13 0.01 0.98 1.00 0.95 0.53 0.45
No.of.Pages.Visited -0.16 0.16 0.05 -0.06 0.10 -0.04 -0.07 -0.23 -0.13 0.18 0.00 0.92 0.95 1.00 0.44 0.40
No.of.Transactions 0.05 -0.05 0.14 -0.06 0.11 0.01 -0.06 -0.19 0.13 -0.07 -0.10 0.46 0.53 0.44 1.00 0.74
Revenue 0.03 -0.03 0.10 -0.06 0.16 -0.01 -0.06 -0.20 0.11 -0.06 -0.09 0.39 0.45 0.40 0.74 1.00

From correlation matrix we can see that there is a good positive correlation between number of Sessions, number of pages visited, number of transactions and revenues. We can also see that Desktop device is the one that generates most of the traffic and revenues in absolute value.

Choose number of factors

To use Factor Analysis, we first need to adjust data to have only numeric data. We can then remove first columns and keep only binary values.

num_data <- myData[,4:ncol(myData)]
scaled_data <- apply(num_data,2, function(r) {if (sd(r)!=0) res=(r-mean(r))/sd(r) else res=0*r; res})

Now that we have only numeric data, we can proceed with Factor Analysis.

Variance_Explained_Table_results<-PCA(scaled_data, graph=FALSE)
Variance_Explained_Table<-Variance_Explained_Table_results$eig
Variance_Explained_Table<-as.data.frame(Variance_Explained_Table)
colnames(Variance_Explained_Table)<-c("Eigenvalue", "Percentage_of_explained_variance", "Cumulative_percentage_of_explained_variance")

eigenvalues  <- Variance_Explained_Table[,2]

Let's look at the variance explained as well as the eigenvalues

**Variance Explained**

Eigenvalue Percentage_of_explained_variance Cumulative_percentage_of_explained_variance
comp 1 3.84 24.01 24.01
comp 2 2.34 14.61 38.61
comp 3 1.81 11.29 49.91
comp 4 1.29 8.07 57.98
comp 5 1.17 7.33 65.31
comp 6 1.10 6.88 72.19
comp 7 1.05 6.55 78.74
comp 8 1.02 6.39 85.13
comp 9 0.92 5.74 90.87
comp 10 0.78 4.89 95.76
comp 11 0.34 2.15 97.91
comp 12 0.25 1.56 99.47
comp 13 0.07 0.45 99.93
comp 14 0.01 0.07 100.00
comp 15 0.00 0.00 100.00
comp 16 0.00 0.00 100.00

Interpret the Components

We can see from the chart above that we could use only 8 components. This can be easily explained looking at the variables. Some variables are binary values conveying the same meaning as the text variables. We can identify the following groups:

  • Device
  • Returning/New Visitor
  • Source
  • Revenues
  • Number of Sessions
  • Number of pages visited
  • Number of Users
  • Number of Transactions

In this case we suggest to keep all the original variables since they are just a way to provide the same information in a binary form.

Part 2: Marketing Strategy

To analyze different scenarios and identify possible strategies, we can focus on different behaviors across devices.

averageTransaction <- aggregate(myData$No.of.Transactions,list(myData$Device), FUN=sum)
colnames(averageTransaction) <- c("Device","No of Transactions")
knitr::kable(averageTransaction)
Device No of Transactions
desktop 3553
mobile 119
tablet 120

We can clearly see that desktop is the main channel used by customers to access the website, generating about 95% of total transactions. We can now focus on revenues to check if we find the same trend or a different behavios depending on the device.

averageRevenue <- aggregate(myData$Revenue,list(myData$Device), FUN=sum)
colnames(averageRevenue) <- c("Device","Revenues")
knitr::kable(averageRevenue)
Device Revenues
desktop 8459779.1
mobile 291844.2
tablet 310228.0
Even an analysis on revenues shows that desktop is the most used device, as expected from previous result.
We can then analyze revenues per transaction to check profitability for each device.
RevTransaction <- averageRevenue
RevTransaction[,2] <- averageRevenue[,2]/averageTransaction[,2] 
colnames(RevTransaction) <- c("Device","Revenue per Transaction")
knitr::kable(RevTransaction)
Device Revenue per Transaction
desktop 2381.024
mobile 2452.472
tablet 2585.233

We can see that revenues per transaction are quite stable cross-device, even though tablet is more profitable if compared to desktop and mobile devices.

Moving from devices to sources, we can then compare organic sources to payed sources (e.g. "cpc") and see if there is a strong relationship between transaction and revenues and different sources.

averageRevenueSource <- aggregate(myData$Revenue,list(myData$Source), FUN=sum)
colnames(averageRevenueSource) <- c("Source","Revenues")
averageRevenueSource <- averageRevenueSource[order(averageRevenueSource[,2],averageRevenueSource[,1],decreasing=TRUE),]
rownames(averageRevenueSource) <- c(1:nrow(averageRevenueSource))
knitr::kable(averageRevenueSource[1:10,])
Source Revenues
(direct) / (none) 2951215.28
google / organic 2793075.46
google / cpc 1466490.42
dfa / cpm 430017.88
dfa / cpc 226526.63
bing / organic 186316.63
email / email 109285.60
yahoo / organic 92137.69
olehotels.com / referral 69812.64
adquiramexico.com.mx / referral 65988.00

Focusing on top 10 sources, we can see that direct access and organic searches on google or bing provide more than 4 times the revenues of payed sources like "cpc" and "cpm".

Part 3: Recommendation

From previous analysis, we can suggest the following strategy:

  • reduce spending in payed sources;
  • provide app/mobile portal to facilitate the access from tablet;