-
Notifications
You must be signed in to change notification settings - Fork 1
/
project.Rmd
706 lines (493 loc) · 24.7 KB
/
project.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
---
title: "MKTG 6600 Final Project"
author: "Meag Tessmann"
date: "10/10/2020"
output:
html_document:
toc: true
toc_depth: 3
toc_float: true
---
# Description
A movie chain in the southwest region, MovieMagic is considering ways in which it can increase spending on concessions. It has collected information of 2000 of its customers, some of whom are part of their loyalty program and some who are not. They have information on the following 8 variables, which they plan to use as predictors. They plan to use amount_spent (i.e. the amount spent on concessions) as the outcome variable since they have learnt from observations that much of the profit is derived from concession sales.
## Predictors
* *age* = age of the customer
* *job* = type of job e.g. management, technician, entrepreneur,
* *marital* = married, single, divorced
* *education* = primary, secondary, tertiary
* *seen_alone* = whether the movie was seen alone or with some others (yes/no)
* *discount* = whether they were sent a discount coupon (yes/no)
* *days_member* = days since member of MovieMagic
* *movies_seen* = number of movies seen last time period
## Outcome
amount_spent = amount spent on concessions
Their objective is to find out what factors can increase concession sales and how they can improve their prediction of the outcome variable so that they can plan better.
Along with amount_spent, MovieMagic was also able to collect information form about 150 of its existing customers in the form of reviews. They feel that this text data can provide a different insight into what customers think and feel about MovieMagic.
They realize that their objective has two components: interpretation and prediction. Hence, they decide to run 3 different types of analysis. - 1. Linear regression - 2. Support Vector Regression (SVR) - 3. Text analysis
When the project, henceforth, mentions 3 analysis, the above-mentioned would be the 3 analysis.
Consider that you have been asked to run the analysis and answer the questions MovieMagic wants answered.
---
---
---
# Analysis
## Libraries + Data
```{r libraries, warning=F, message=F}
# EDA + general
library(skimr)
library(tidyverse)
library(psych)
library(scales)
library(knitr)
library(kableExtra) # fancy tables
library(rminer)
library(effects)
library(car)
# model training
library(caret)
library(elasticnet)
library(glmnet)
library(ROCR)
library(e1071)#to run svm
# nlp
library(quanteda)
library(seededlda)
library(topicmodels)
library(RTextTools)
library(wordcloud)
library(tm)
# Read in the data
purchases <- read.csv("http://data.mishra.us/files/project_data.csv")
reviews <- read.csv("http://data.mishra.us/files/project_reviews.csv")
```
## Descriptive Analytics
### EDA: Purchasing dataset
This dataset has five categorical variables and 4 numeric veriables, with no missing data in any observation. The majority of movie viewers:
* are married
* have a secondary education
* are not seen along
* are using a discount
Ages range from 21 to 61, with a slight left skewing. On average, a person is seeing 1.9 movies and median spend rate is $216.
At first glance of a pairs panel, it appears 'experienced' movie goers, such as those who've seen a lot of movies at the theatre or who've been members for awhile, are spending less on concessions. 'Expereinced' movie-goers may be developing traditions or habits such as going to dinner beforehand or 'tricks' like bringing in their own food. When looking at just movies seen ~ spending, we find the outliers are mostly coming from those who are seeing 3 or fewer movies. There is a slight bump in
Those who were seen alone appear to spend less, though this may be misleading as we don't know the size of their group. I can only imagine those who are spending over 10k on consessions are hosting parties or large groups at the theatre. Might be worth looking at removing if they're outliers as this group would require a whole different marketing strategy catering towards coordinating groups.
``` {r eda}
skim(purchases)
pairs.panels(purchases)
hist(log(purchases$amount_spent))
ggplot(purchases, aes(job, amount_spent)) +
geom_boxplot() +
scale_y_continuous(labels=dollar)
ggplot(purchases, aes(education, amount_spent)) +
geom_boxplot() +
scale_y_continuous(labels=dollar)
ggplot(purchases, aes(factor(movies_seen), amount_spent)) +
geom_boxplot() +
scale_y_continuous(labels=dollar)
```
### Linear Regression
Linear regression model has an R-squared value of 0.02532, which seems very low for basing judegements on whether variables are correlated or not with the amount spent. There's two variables which are significant.
``` {r model-lm}
set.seed(123)
model.lm.fullset <- lm(amount_spent ~., purchases)
summary(model.lm.fullset)
# normally distributed residuals?
mean(model.lm.fullset$residuals)
# homoscedasticity?
plot(model.lm.fullset)
# multicollinearity?
car::vif(model.lm.fullset)
# autocorrelation
lmtest::dwtest(model.lm.fullset)
```
### Modifications to LM
I have a hypothesis that single individuals are hosting groups and spending a lot larger amount of money than normal. If true, I think these two groups would require different marketing strategies. For this, I predict large spending over \$1,000 separately from those which are under \$100, something a family 5 could spend on dinner and drinks at a theatre. Separating out these two groups results in two models both with higher R^2 values, niether of which would be high enough to feel comfortable basing heavy decisions on.
I also look at a log function of amount_spent since it's highly skewed and zero-bounded.
``` {r lm-alternatives}
purchases_eng <- purchases %>%
mutate(
isMember = factor(ifelse(days_member ==0, 0,1)),
movies_seen = factor(movies_seen)
)
purchases_high <- purchases %>%
filter(amount_spent>1000)
purchases_low <- purchases %>%
filter(amount_spent<100)
## I was going to look into predicting if someone spent at all, but will reserve for another analysis.
# purchases_zero <- purchases_eng %>%
# mutate(
# didPurchase = factor(ifelse(amount_spent==0, 0,1)),
# ) %>%
# select(-amount_spent)
model_low <- train(amount_spent ~., purchases_low, method='lm')
summary(model_low)
model_high <- train(amount_spent ~., purchases_high, method='lm')
summary(model_high)
model_log <- train(log(amount_spent+1) ~., purchases, method='lm')
summary(model_log)
```
## Predictive Analytics
### One-hot and test/train
Splitting the test and train sets in preparation for comparing prediction models.
```{r test-train-split}
one.hot <- as.data.frame(model.matrix(~. -1, purchases))
set.seed(345)
trainIndex <- sample(nrow(one.hot), (nrow(one.hot)*.7))
train <- one.hot[trainIndex,]
test <- one.hot[-trainIndex,]
test_outcome <- test[,"amount_spent"]
train_outcome <- train[,"amount_spent"]
test <- test %>% select(-amount_spent)
```
### Reusable compare function
A short function to quickly compare test and train RMSE calues across multiple models.
``` {r predict-func}
compare_models <- function(test, train, expected, model) {
predict_test <- predict(model, test)
predict_train <- predict(model, train)
stats_svm <- as.matrix(rbind(
mmetric(train$amount_spent, predict_train,c("MAE","MSE","RMSE","RAE")),
mmetric(expected,predict_test,c("MAE","MSE","RMSE","RAE"))
))
rownames(stats_svm) <- c("Train Set", "Test Set")
knitr::kable(stats_svm, digits=2, caption = deparse(substitute(model))) %>%
kable_styling(bootstrap_options = c("hover"))
}
```
### SVR Tuning
Taking a couple tuning rounds using the e1071 tuning method.
``` {r svr-tuning}
set.seed(123)
model.svm.radial <- svm(amount_spent~., data= train, kernal='radial', cost=10, scale=FALSE)
compare_models(test, train, test_outcome, model.svm.radial)
set.seed(123)
# perform grid search
tuneResult <- tune(svm, amount_spent~., data= train,
ranges = list(epsilon = seq(0,1,0.1), cost = 2^(2:9))
)
print(tuneResult)
# Draw the tuning graph
plot(tuneResult)
set.seed(123)
tuneResult2 <- tune(svm, amount_spent~., data= train,
ranges = list(epsilon = seq(.42,.55,0.01), cost = seq(2,8,1))
)
print(tuneResult2)
plot(tuneResult2)
set.seed(123)
tuneResult3 <- tune(svm, amount_spent~., data= train,
ranges = list(epsilon = seq(.47,.49,0.005), cost = seq(.1,3.1,.5))
)
print(tuneResult3)
plot(tuneResult3)
set.seed(123)
tuneResult4 <- tune(svm, amount_spent~., data= train,
ranges = list(epsilon = seq(.485,.51,0.005), cost = seq(.05,.15,.1))
)
print(tuneResult4)
plot(tuneResult4)
set.seed(500)
tuneResult5 <- tune(svm, amount_spent~., data= train,
ranges = list(epsilon = seq(.5,.52,0.002), cost = seq(.14,.2,.01))
)
print(tuneResult5)
plot(tuneResult5)
model.svm.bestTune <- tuneResult5$best.model
compare_models(test, train, test_outcome, model.svm.bestTune)
````
``` {r svr-tuning-log}
# # perform a grid search
# set.seed(123)
# tuneResult.log <- tune(svm, log(amount_spent+1)~., data= train,
# ranges = list(epsilon = seq(0,1,0.1), cost = 2^(2:9))
# )
# print(tuneResult.log)
# # Draw the tuning graph
# plot(tuneResult.log)
#
# set.seed(123)
# tuneResult2.log <- tune(svm, log(amount_spent+1)~., data= train,
# ranges = list(epsilon = seq(.42,.55,0.01), cost = seq(2,8,1))
# )
# print(tuneResult2.log)
# plot(tuneResult2.log)
#
# set.seed(123)
# tuneResult3.log <- tune(svm, log(amount_spent+1)~., data= train,
# ranges = list(epsilon = seq(.48,.53,0.01), cost = seq(.1,3.1,.5))
# )
# print(tuneResult3.log)
# plot(tuneResult3.log)
#
# set.seed(123)
# tuneResult4.log <- tune(svm, log(amount_spent+1)~., data= train,
# ranges = list(epsilon = seq(.495,.51,0.005), cost = seq(.05,.15,.1))
# )
# print(tuneResult4.log)
# plot(tuneResult4.log)
#
# set.seed(123)
# tuneResult5.log <- tune(svm, log(amount_spent+1)~., data= train,
# ranges = list(epsilon = seq(.505,.52,0.002), cost = seq(.01,.06,.01))
# )
# print(tuneResult5.log)
# plot(tuneResult5.log)
set.seed(123)
tuneResult6.log <- tune(svm, log(amount_spent+1)~., data= train,
ranges = list(epsilon = seq(.528,.552,0.003), cost = seq(.01,.03,.01))
)
print(tuneResult6.log)
plot(tuneResult6.log)
model.svm.log.bestTune <- tuneResult6.log$best.model
compare_models(test, train, test_outcome, model.svm.log.bestTune)
````
### Model Training
``` {r model-training}
## since heavily positive-skewed and bounded at 0, trying log of amount spent
set.seed(123)
model.log <- train(log(amount_spent+1) ~., train, method='lm')
set.seed(123)
model.svm.tuned <- svm(amount_spent ~., data=train, epsilon=.51, cost=.16)
set.seed(123)
model.svm.log.tuned <- svm(log(amount_spent+1) ~., data=train, epsilon=.549, cost=.01)
set.seed(123)
model.svm.simple <- svm(amount_spent~., data = train)
set.seed(123)
model.lm <- train(amount_spent ~., train, method='lm')
```
### Model Comparison
``` {r model-comparison}
compare_models(test, train, test_outcome, model.svm.simple)
compare_models(test, train, test_outcome, model.svm.tuned)
compare_models(test, train, test_outcome, model.svm.log.tuned)
compare_models(test, train, test_outcome, model.lm)
compare_models(test, train, test_outcome, model.log)
```
## Review Text Analysis
``` {r eda-text}
## check for null review text
which(!complete.cases(reviews$text))
reviews <- reviews %>%
mutate(
text = as.character(text),
valence = factor(ifelse(reviews$star<3, "Negative", "Positive"))
)
summary(reviews)
ggplot(reviews, aes(str_length(text), fill=factor(star))) +
geom_boxplot() +
ggtitle("review length ~ star rating")
```
### Data prep
``` {r dfm-tfidf}
## create corpus and include star rating for segmenting
reviews.corpus <- corpus(reviews$text)
docvars(reviews.corpus, "star") <- reviews$star
docvars(reviews.corpus, "valence") <- reviews$valence
reviews.dfm <- dfm(reviews.corpus,
remove=stopwords('english'),
remove_punct=TRUE,
remove_symbols=TRUE,
remove_separators=TRUE,
)
reviews.dfm.valence <- dfm(reviews.corpus,
remove=stopwords('english'),
remove_punct=TRUE,
remove_symbols=TRUE,
remove_separators=TRUE,
groups = 'valence'
)
reviews.dfm.star <- dfm(reviews.corpus,
remove=stopwords('english'),
remove_punct=TRUE,
remove_symbols=TRUE,
remove_separators=TRUE,
groups = 'star'
)
reviews.tfidf <- dfm_tfidf(reviews.dfm)
```
### Word Clouds
``` {r word-clouds, warning=F, message=F}
set.seed(100)
textplot_wordcloud(reviews.dfm,
min_count=3,
random_order = FALSE,
rotation=.25,
color = RColorBrewer::brewer.pal(8,"Dark2")
)
textplot_wordcloud(reviews.dfm.valence,
comparison=TRUE,
min_count=3
)
textplot_wordcloud(reviews.dfm.star,
comparison=TRUE,
min_count=3
)
```
### Keyness
``` {r keyness}
reviews.keyness <- textstat_keyness(reviews.dfm, target=reviews$valence=="Positive")
textplot_keyness(reviews.keyness, margin=.1, n=13)
```
### Topic Modeling: Quanteda
I used Quanteda and SeededLDA to practice a different LDA implementation. A knitting error is preventing me from including the code. The output in RStudio is:
| topic1 | topic2 | topic3 |
|:------:|:------:|:------:|
| movie | time | food |
| great | back | just |
| place | moviemagic | go |
| food | cinema | like |
| fun | movies | popcorn |
| good | also | get |
| love | really | beer |
| can | going | ordered |
| pizza | good | really |
| theater | awesome | got |
``` {r lda-1, eval=FALSE}
model_lda <- textmodel_lda(reviews.dfm, k=3)
# as.data.frame(terms(model_lda, 10))
```
### Topic Modeling: e1071
```{r lda-2}
# perform a Latent Dirichlet Analysis (several lines of code to get you started)
# first remove stop words
corpus <- VCorpus(VectorSource(reviews$text))
# a function to clean /,@,\\,|
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
corpus <- tm_map(corpus, toSpace, "/|@|\\|")
corpus<- tm_map(corpus, stripWhitespace) # remove white space
# covert all to lower case else same word as lower and uppercase will classified as different
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers) # remove numbers
corpus <- tm_map(corpus, removePunctuation) # remove punctuations
corpus <- tm_map(corpus, removeWords, stopwords("en"))
dtm <- DocumentTermMatrix(corpus)
set.seed(234)
rowTotals <- apply(dtm , 1, sum)
dtm <- dtm[rowTotals> 0, ]
lda <- LDA(dtm, k = 3, method = "Gibbs", control = NULL)
topics <- tidytext::tidy(lda, matrix = "beta") # beta is the topic-word density
top_terms <- topics %>%
group_by(topic) %>%
top_n(10, beta) %>% # top_n picks 10 topics.
ungroup() %>%
arrange(topic, -beta)
top_terms %>%
mutate(term = reorder(term, beta)) %>%
ggplot(aes(term, beta, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
coord_flip()
# perplexity calculation - change k = values
lda.def <- LDA(dtm, k = 4, control = NULL)
perplexity(lda.def)
```
---
# Questions
* For questions 1, 2, 3, and 4, the first dataset project_data would be used.
* For Questions 1, 2 and 3 use the complete dataset (do not split it into train and test sets)
* For question 4 split the dataset into train and test. You can try different splits; indicate
what proportion you split the data when you compared SVR and linear regression.
* For questions 5 and 6, you would be using the second dataset, the project_reviews
dataset.
---
## Q1-3 Model
``` {r q1-q2-q3}
summary(model.lm.fullset)
```
### Question 1
**Of the 8 predictors, which predictors have a significant influence on amount spent on concessions?**
Running a simple linear regression including all the original variables and no pre-processing yields a model showing 2 variables are highly correlated with concession purchasing amount (p<.01):
* having a job as wait-staff
* having a tiertiary education degree
In my opinion, these results should not be acted upon, as the model is only able to explain 2.5% of the variability of amount someone spends on consessions. I used the model *model.lm.fullset* to answer this question.
### Question 2
**Which predictors have a positive influence and which predictors have a negative influence on the amount spent on concessions? Which analysis, regression or SVR, helped you answer this question?**
These variables have a *positive* correlation or influence with the amount spent on consessions:
* Age (age)
* Having a job as an entrepreneur, management, services,unemployed, unknown, or as wait-staff (jobentrepreneur, jobmanagement, jobservices, jobunemployed, jobunknown, jobwait-staff)
* unknown, secondar, or tertiary education status (educationsecondary, educationtertiary, educationunknown)
* Received a discount coupon (discountyes)
* number of days they've been a member (days_member)
These variables have a *negative* correlation or influence with the amount spent on consessions:
* Having a blue collar job, being self employed or retired, or being a student or technician (jobblue-collar, jobretired, jobself-employed, jobstudent, jobtechnician)
* Being married or single (maritalmarried, maritalsingle)
* They were seen along (seen_aloneyes)
* The number of movies they saw (movies_seen)
Since the intercept is positive, you can say holding all variables at 0, the default state has a positive influence. Since we one-hot encoded, these variables would be:
* Being divorced
* Having a primary education
* Having a job as an admin
* Did not receive a discount coupon
* Were not seen alone
I used a white-box, linear regression model to answer this question. SVR is a black box model and would not show coefficient values.
### Question 3
**Given the significant predictors, what strategies can MovieMagic come up with to increase amount spent on concessions?**
* Create a marketing campaign aimed at those who've been a member for awhile. This campaign could center around a special incentive, like a discount coupon.
* Create a marketing campaign geared towards entrepreneurs and management who are hosting large groups of people. You can even look into special catering services if this proves to be a prolific spending group.
* Offer a coupon or discount for groups buying at the consession stand.
* Create various sized 'family deals' geared towards groups of 2-10 which are a pre-determained set of food + drink combos offered at a light discount
* Offer a slight discount for first time members buying movie entrances on multiple days, with the intention of them buying more on consessions since they're more likely to come back for the 5th movie they pre-paid for.
---
## Q4 Model Comparisons
``` {r q4}
compare_models(test, train, test_outcome, model.svm.simple)
compare_models(test, train, test_outcome, model.svm.tuned)
compare_models(test, train, test_outcome, model.svm.log.tuned)
compare_models(test, train, test_outcome, model.lm)
compare_models(test, train, test_outcome, model.log)
```
### Question 4
**Which analysis, linear regression or SVR, provides better prediction? Which metric would you focus on to support your answer?**
I looked at both MAE and RMSE when comparing models. The simple SVR model, model.svm.simple, which used default epsilon and cost, had the lowest MAE of all models - \$476.60. The linear regression, model.lm, however, had the lowest RMSE - \$1133.40. What this tells me is model.svm.simple had greater variance of error - meaning when it was wrong, it was wrong by more than when model.lm was wrong. Which metric (and hence which model) to use will depend on the application of the prediction and how undesirable residual variance is.
While not shown here, RMSLE might be a better metric to compare models since it might account better for the curve of amount spent - if we predict someone will spend \$50 when they spent \$5, that's a different situatin than if we predicted someone would spend \$3050 when they actually spent \$3000. RMSLE might be more appropriate if we're looking at the whole range of spending.
---
## Q5 Wordcloud
``` {r q5, warning=F, message=F}
textplot_wordcloud(reviews.dfm.valence,
comparison=TRUE,
min_count=3
)
textplot_wordcloud(reviews.dfm.star,
comparison=TRUE,
min_count=3
)
textplot_keyness(reviews.keyness, margin=.1, n=13)
```
### Question 5
**MovieMagic wants to visualize the reviews through a wordcloud and wants to find out which words are used most commonly in the reviews that customers write for MovieMagic. Create 2 wordclouds - one for reviews that received 3, 4, or 5 star ratings and another with reviews that received 1 or 2 stars ratings. Knowing the prominent words in each of the wordclouds, what strategies can be developed in messaging customers? Would the strategies differ?**
It appears waiting for food may be one factor in a customer's review of the theatre. I would suggest looking into measuring duratin between order submission and food delivery. This might help determine how to reduce poor reviews around consessions purchases. Such an analysis might focus on if certain items take longer to prepare, potentially causing people to miss their movie. Other factors might also influence these poor experiences such as number of employees working consessions at the time, time of day, number of movies starting within 15 minutes, or availability of certain items.
Popcorn was used in positive reviews. You might explore whether a separate line for popcorn only might increase these particular sales. I would suggest looking at whether other item sales decreae when this popcorn-only line is available. While this may gain sales for those who would be otherwise deterred from long lines, it may also decrease sales of items which would have otherwise been purchased concurrently.
---
## Q6 Topic Models
``` {r q6}
### from quanteda package
# as.data.frame(terms(model_lda, 10))
### from topic models pkg
top_terms %>%
mutate(term = reorder(term, beta)) %>%
ggplot(aes(term, beta, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
coord_flip()
```
_Additional SeededLDA topic output:_
| topic1 | topic2 | topic3 |
|:------:|:------:|:------:|
| movie | time | food |
| great | back | just |
| place | moviemagic | go |
| food | cinema | like |
| fun | movies | popcorn |
| good | also | get |
| love | really | beer |
| can | going | ordered |
| pizza | good | really |
| theater | awesome | got |
### Question 6
**MovieMagic also wants to use topic modeling to find out whether the content in the reviews could be categorized into specific topics. If you used LDA to create 3 topic groups (k = 3) what would be the titles you would assign to these 3 topics. MovieMagic wants you to use the words within the topic to infer topic title. Given the topics you inferred what strategies would you suggest are possible for MovieMagic if it wants to increase concession sales. Would you recommend promotions or advertising or loyalty program; justify your choice of strategy?**
*Topic 1 - Classic Date Night*
Need to get back to basics with your beau? We have a full menu for your and yours to indulge over while snuggled up in front of the big screen.
Suggestions: When the alternative for date night includes finding and paying for both dinner and the movie and rushing in traffic to make sure you arrive on time, make it an easy choice for couples to dine at the teatre instead. This might include offering higher end meals paired with wines or a quiet, more upscale seating area where kids are not allowed for couples to reconnect before the movie. Everyone loves a good wine pairing.
*Topic 2 - The Magic of Movies -or- Wholesome Family Tradition*
Grab some popcorn and snacks and enjoy a fun flick with the whole family.
Suggestions: Make it easy for parents bringing a kid or two -or even their 4 or 5 friends! Put together 'Family Bundles' with the most frequently co-purchased items catering to families or easy low-choice selections - each kid chooses 1 snack and 1 drink from a list. Take out the stress of coordinating food selections.
*Topic 3 - Kick Back*
Powerful sound system, full range high density graphics, and comfortable recliners to enjoy some cold beer and hot pizza with friends - this sounds like a place to relieve some stress. If you haven't yet, look to expand your beer offerings! If there's a better selection than competitors, you might be able to charge more for simply having a better selection available. Look to offer a discount when purchasing beer with food, especially in bulk - free 6th beer when bought with 2 pizzas, for example.