In [1]:
library(tm)
library(nnet)
library(e1071)
library(text2vec)
library(data.table)
library(glmnet)
library(magrittr)  # %>% pipe used in the test-set cells

Loading required package: NLP
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-5



In [56]:
head(review.train$review_id)
head(review.train$business_id)

In [2]:
review_train = read.csv('yelp_academic_dataset_review_train.csv')
stopWords = c(stopwords("en"), "") 
review_train = review_train[,c("business_id", "review_id", "text", "stars")]
review_train$stars <- as.integer(review_train$stars)  # star rating (1-5), used as the class label
review_train$text <- as.character(review_train$text)
review_train$review_id <- as.character(review_train$review_id)

In [4]:
setDT(review_train)  
setkey(review_train, review_id)
set.seed(2017L)  
all_ids = review_train$review_id
train_ids = sample(all_ids, 80000)  
test_ids = setdiff(all_ids, train_ids)  
train = review_train[J(train_ids)]  
test = review_train[J(test_ids)]  
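A quick sanity check on the split logic, sketched on hypothetical toy ids (`toy_ids`, `toy_train` and `toy_test` are illustrative names, not objects from the notebook). The held-out ids are whatever `setdiff` leaves, so the two sets should be disjoint and together cover every id. Because `setdiff` also drops duplicates, this only holds if `review_id` is unique.

```r
# Hypothetical toy ids standing in for review_train$review_id.
set.seed(2017L)
toy_ids   <- sprintf("r%02d", 1:10)
toy_train <- sample(toy_ids, 8)           # 80% of the ids go to training
toy_test  <- setdiff(toy_ids, toy_train)  # the rest are held out

# The split should be disjoint and exhaustive.
stopifnot(length(intersect(toy_train, toy_test)) == 0,
          setequal(c(toy_train, toy_test), toy_ids))
```

The same two checks can be run on `train_ids`, `test_ids` and `all_ids`.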

In [5]:
prep_fun = tolower  
tok_fun = word_tokenizer  

In [6]:
it_train = itoken(train$text, preprocessor = prep_fun, tokenizer = tok_fun, ids = train$review_id, progressbar = FALSE)
vocab = create_vocabulary(it_train, stopwords = stopWords)
vectorizer = vocab_vectorizer(vocab)

In [7]:
dtm_train = create_dtm(it_train, vectorizer)
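Roughly what `create_vocabulary` and `create_dtm` produce, sketched in base R on a hypothetical two-review corpus: text is lowercased, split into words, stopwords are dropped, and each review becomes a row of term counts. Splitting on `\\W+` is an assumption that only approximates `word_tokenizer`, and `create_dtm` returns a sparse `dgCMatrix` rather than the dense matrix built here. Appending `""` to the stopword list (as `stopWords` does) presumably removes the empty tokens a split can produce when a text starts with punctuation.

```r
# Hypothetical two-review corpus standing in for train$text.
docs <- c(r1 = "The food was GOOD and the staff was friendly",
          r2 = "Bad food, bad service")
stop_words <- c("the", "and", "was", "")  # small stand-in for stopWords

# Lowercase, split on runs of non-word characters, drop stopwords.
tokens <- strsplit(tolower(docs), "\\W+")
tokens <- lapply(tokens, function(tok) tok[!tok %in% stop_words])

# Vocabulary: every distinct remaining term, sorted.
vocab_terms <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per review, one column per term, cells are counts.
dtm <- t(vapply(tokens,
                function(tok) as.integer(table(factor(tok, levels = vocab_terms))),
                integer(length(vocab_terms))))
dimnames(dtm) <- list(names(docs), vocab_terms)
dtm
```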

In [8]:
identical(rownames(dtm_train), train$review_id)  

In [9]:
identical(rownames(dtm_test), test$review_id)  

ERROR: Error in rownames(dtm_test): object 'dtm_test' not found


In [None]:
NFOLDS = 4
glmnet_classifier = cv.glmnet(x = dtm_train, y = train[['stars']],
                              family = 'multinomial',
                              # L1 penalty (lasso)
                              alpha = 1,
                              # misclassification error; "auc" is only available for two-class models
                              type.measure = "class",
                              # NFOLDS-fold (here 4-fold) cross-validation
                              nfolds = NFOLDS,
                              # a higher convergence threshold is less accurate but trains faster
                              thresh = 1e-3,
                              # cap the number of iterations for faster training
                              maxit = 1e3)
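In `cv.glmnet`, `type.measure = "auc"` is only defined for two-class logistic regression; for `family = 'multinomial'` the supported measures are `"deviance"`, `"class"`, `"mse"` and `"mae"`. The sketch below shows, on hypothetical toy values, the two pieces behind a `"class"` run: randomly assigning observations to near-equal folds (one common scheme; `cv.glmnet` also accepts an explicit `foldid` vector) and the misclassification error computed on a held-out fold. `misclass_error` is an illustrative helper, not a glmnet function.

```r
set.seed(1)
n_obs  <- 10
nfolds <- 4
# Each observation gets a fold id in 1..nfolds; fold sizes differ by at most one.
foldid <- sample(rep(seq_len(nfolds), length.out = n_obs))

# Misclassification error: share of held-out reviews whose predicted class is wrong.
misclass_error <- function(y_true, y_pred) mean(y_true != y_pred)

# Toy held-out labels and predictions for one fold (hypothetical values).
y_true <- c(5, 4, 1, 3, 5)
y_pred <- c(5, 4, 2, 3, 4)
misclass_error(y_true, y_pred)  # 2 of 5 wrong -> 0.4
```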

In [None]:
plot(glmnet_classifier)

In [None]:
it_test = test$text %>%
  prep_fun %>%
  tok_fun %>%
  itoken(ids = test$review_id,
         # turn off the progress bar (it renders poorly in notebook output)
         progressbar = FALSE)
dtm_test = create_dtm(it_test, vectorizer)

In [None]:
preds = predict(glmnet_classifier, dtm_test, type = 'response')

In [None]:
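# Expected star rating: sum over k of k * P(stars = k) from the first (only) lambda slice,
# rounded to the nearest whole star. Assumes the 5 probability columns are stars 1..5 in order.
make.prediction <- function(pred) {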
    round(pred[, 1, 1] + 2 * pred[, 2, 1] + 3 * pred[, 3, 1] + 4 * pred[, 4, 1] + 5 * pred[, 5, 1])
}
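`make.prediction` turns the class probabilities into an expected rating, sum over k of k * P(stars = k), and rounds it. A sketch on a hypothetical probability array shaped like the `predict(..., type = 'response')` output (reviews x classes x lambda slices), comparing it with simply taking the most probable class; `probs_matrix`, `expected_stars` and `argmax_stars` are illustrative helpers and assume the five columns are stars 1 to 5 in order.

```r
# Hypothetical probabilities for 2 reviews over 5 star classes, one lambda slice.
# array() fills column-major, so each pair of values is (review 1, review 2) for one class.
probs <- array(c(0.4, 0.0,   # P(1 star)
                 0.3, 0.0,   # P(2 stars)
                 0.1, 0.1,   # P(3 stars)
                 0.1, 0.2,   # P(4 stars)
                 0.1, 0.7),  # P(5 stars)
               dim = c(2, 5, 1))

# n x K matrix of class probabilities from the first lambda slice (works for n = 1 too).
probs_matrix <- function(pred) matrix(pred[, , 1], nrow = dim(pred)[1])

# Expected rating sum_k k * P(k), rounded to the nearest whole star.
expected_stars <- function(pred) {
  round(as.vector(probs_matrix(pred) %*% seq_len(dim(pred)[2])))
}

# Most probable class instead of the expected value.
argmax_stars <- function(pred) max.col(probs_matrix(pred), ties.method = "first")

expected_stars(probs)  # 2 5
argmax_stars(probs)    # 1 5
```

The two rules can disagree: review 1 gets 2 stars from the expected value although 1 star is its single most likely class. Before rounding, the expected value is the prediction that minimizes expected squared error, while the argmax is the natural choice when exact-match accuracy is what gets scored.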

In [None]:
pred.result <- make.prediction(preds)

In [80]:
head(review_test$review_id)

In [46]:
head(pred.result)
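The notebook prints `pred.result` but does not score it. Since `test` is a held-out split with known ratings, a hypothetical helper such as `evaluate_stars` could report exact-match accuracy and RMSE. The toy vectors below are illustrative only, not results from this model.

```r
# Hypothetical helper: score rounded star predictions against true ratings.
evaluate_stars <- function(truth, predicted) {
  truth     <- as.numeric(as.character(truth))  # also handles factor labels
  predicted <- as.numeric(predicted)            # drops names carried over from predict()
  stopifnot(length(truth) == length(predicted))
  c(accuracy = mean(truth == predicted),
    rmse     = sqrt(mean((truth - predicted)^2)))
}

# Toy values for illustration only.
truth     <- c(5, 4, 1, 3, 2)
predicted <- c(5, 3, 1, 3, 4)
evaluate_stars(truth, predicted)  # accuracy 0.6, rmse 1
```

On the real data this would be `evaluate_stars(test$stars, pred.result)`, which relies on the rows of `dtm_test` being in the same order as `test`; that is what the `identical(rownames(dtm_test), test$review_id)` check verifies.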

In [73]:
review_test = read.csv("yelp_academic_dataset_review_test.csv")
review_test = review_test[,c("business_id", "review_id", "text")]

In [74]:
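it_test = review_test$text %>%
  prep_fun %>%
  tok_fun %>%
  # ids must come from review_test (not the held-out split `test`) so they match the texts
  itoken(ids = as.character(review_test$review_id),
         # turn off the progress bar (it renders poorly in notebook output)
         progressbar = FALSE)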

dtm_test = create_dtm(it_test, vectorizer)
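The test matrix has to be built with the training `vectorizer` so that it has the same columns, in the same order, as `dtm_train`; words that never appeared in training are simply not counted. A base-R sketch of that projection, with a hypothetical `train_vocab` and an illustrative `vectorize` helper (the `\\W+` split again only approximates `word_tokenizer`).

```r
# Hypothetical vocabulary learned from the training reviews.
train_vocab <- c("bad", "food", "friendly", "good", "service")

# Count only terms in `vocab`: factor() maps unseen words to NA and table() skips NA,
# so every output row has exactly the training columns, in the training order.
vectorize <- function(texts, vocab) {
  tokens <- strsplit(tolower(texts), "\\W+")
  m <- t(vapply(tokens,
                function(tok) as.integer(table(factor(tok, levels = vocab))),
                integer(length(vocab))))
  dimnames(m) <- list(names(texts), vocab)
  m
}

new_docs <- c(t1 = "Great food, friendly staff", t2 = "bad bad SERVICE")
vectorize(new_docs, train_vocab)  # "great" and "staff" are dropped
```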

In [3]:
preds = predict(glmnet_classifier, dtm_test, type = 'response')

ERROR: Error in predict(glmnet_classifier, dtm_test, type = "response"): object 'glmnet_classifier' not found


In [76]:
pred.result <- make.prediction(preds)

In [84]:
dim(test)

In [80]:
head(review_test)

review_id,text
HNt7ueGUxOARNvh-TEObkw,"I have to say, I agree with Cher on this one. I really liked this place. The food was very tasty, and the staff was very friendly. I moved from the area shortly after finding it, but I would probably stop back by if I was in the area again."
TFJf4AvqmshlDffKyBuLiA,"I have a confession to make my fellow yelpers... I am SO sorry to say this, but I've been hiding a lil gem all to myself. China Buffet is my all-time favorite chinese food buffet. We're always greeted with a smile and a nice autmosphere, I like the sing-songy contempo chinese pop in the background, it's not too loud so it's good. The dining area is vast and is always kept clean. The uniformed staff are always on the ball - clearing plates as soon as you take your last bite and always offer to refil drinks. The food is very very good. I love the bbq pork. The sesame chicken & beef n broccoli are favorites of mine as well. Their wonton soup is super yummy! I always start every meal a nice bowl of a couple of 'tons. lol. But the real attraction for me is the Sushi Bar!! Their sushi is real hit or miss... but is more hit than miss... and since it comes with the meal - it's worth atleast taking a gander at. Their salmon nigiri is always good when in stock, as is their shrimp nigir - but I have had a couple of pieces of bad tuna nigiri, which disappoints me but I still go back and give it a go. When tipped well - the sushi chef will make things to order. The sushi isn't the best in the city - by far, but it is good. I've taken a lot of people to China Buffet and everyone liked it except for one friend who thought their abolone was gross... but abolone IS gross... regardless of where it comes from. And so i say unto you, my fellow yelpers please forgive me for keeping this place a secret. I hope you try China Buffet and enjoy your visit. It's one of my favorite places to dine anywhere and I constantly use this place a measuring rod for the new chinese places I visit and they rarely compare."
yRe4b9YUQck10QnMF_R6Mg,This place has good food and friendly service but hardly any business. I was there on a weekday at 7 pm and I was the only person there. Not sure if that's do to panda express being next door.
x-JL9R9fp9qg6moS3q2-kQ,"This place is down right, nasty since the new management took over, I got my favorite dish that I have been getting for yrs ( beef w/broccoli), they changed the beef or something it was totally inedible didn't taste like or had the texture of beef, it was gummy mushy I threw the whole dinner away after 3 bites! I will never ever go to china kitchen 2 again, why not leave well enough alone this place was fantastic!!HOW sad they really don't deserve the one star that I gave."
bbJsXf--iqACxbS-PQpI5w,"This used to be the best Chinese take food in northeastern Ohio. I am getting the feeling that they either sold or changed management because the quality of the food has declined greatly. The pot stickers, which used to be amazing, are now the same that you would get frozen from the grocery store. The portion sizes are still huge, but the taste is different. Its disappointing that this place has changed so much. It changed everything that set it apart from every other Chinese take out place in the area."
mJsh2q-NtbANcxxyrQgPIA,"The food is okay, it's the same old Chinese food. I ordered the generals chicken and the outside crust of each piece fell off. It was soggy, and it wasn't tossed in the sauce. Just placed on top of it. The chicken was great but the lack of effort to put it together messed the whole dish up. I will only order fried rice from now on."


In [18]:
review.train = read.csv('yelp_academic_dataset_review_train.csv')
head(review.train)

X,funny,user_id,review_id,text,business_id,stars,date,useful,type,cool
2546,1,_5FF5NN5kHZmGTNuJwpnhg,egN-7YtD9vAfOsSqdsGwlQ,"I almost got sick on all of the chocolate coffee beans in this place, they are so good. I sat outside and had a great conversation about the direction music is taking in this day in age. If you don't follow a formula it seems that you can't get a record contract, even in the indie scene. As you can tell, I thoroughly enjoyed the crowd here. Inside there is a bar opposite the coffee line with Creative Loafing and other local newspapers to read, as well as chocolate coffee beans in little quarter machines. Don't mix these with a strong cup of coffee or you will be jumping off the walls.",F1tOtPzcsQk8PqNOatVsCg,3,2008-09-11,0,review,0
2547,0,lhWPrEBzorygXA4TpimQ3g,1rvre2ib6ahlx9JMXMakzA,"I wrote a book at Coffeeworks, or at least a good chunk of one. It's a great place to hang out and work. I usually hang out here on Saturdays before a standing 6:00 date, so I know the place well. Sometimes, there are lots of angsty, purple-haired teens hanging around the place, and occasionally there have been old farty-types arguing loudly about politics...but other than the sometimes unpleasant noise, it tends to be a pretty laid-back shop. Their smoothies are delicious, and the coffee is good. I agree with Garrett G that the chocolate coffee beans are indeed delicious and sickening. The music is just loud enough to hear, but not so loud that you can't think. They have a decent selection of things to eat. It's not a place for lunch or anything, but they have the standard coffee-shop style foods you'd expect...biscotti (Try the grasshopper mint!) and muffins and scones and other pastries.",F1tOtPzcsQk8PqNOatVsCg,4,2008-09-20,1,review,0
2548,0,aDYNz8cujkDdmbiOh95ANA,T1ZI-H9KV9A-9pKfZei2JA,"We LOVE CoffeeWorks - they roast their own beans at a warehouse in Matthews and have a terrific selection of different roasts. They also have a frequent buyers club so you can get a free lb of coffee after making x number of purchases. Most of the people who work here are pretty educated about the various roasts and most are very helpful. 2011 Update -- Sadly, this is now closed. I understand they are still roasting coffeebeans at their warehouse in Matthews. I have heard that the Ben & Jerry's in the Arbo is selling CoffeeWorks coffee.",F1tOtPzcsQk8PqNOatVsCg,4,2009-10-06,0,review,0
2549,0,_uqIpl5tzucKuIlZZPBZRw,d3nYIMDDtSbdRfIXpxxPPQ,"We popped into CoffeeWorks on a dreary, overcast day and were greeted by a friendly woman behind the counter. I was pleasantly surprised to find that they had flavored coffee! Other than my favorite coffee chain (Barnie's Coffee, which is slowly closing each and every store), nobody seems to offer flavored coffee. Unless you go to a convenience store attached to a gas station (what's up with that anyway?). Starbucks and Caribou only offer different coffees like dark or light roast. Nothing else. I only wished that CoffeeWorks had more than one flavored coffee available for the day. I saw a wall of glass jars full of many, many flavors for coffee that they sell by the bag. The coffee sizes were reasonably priced, too. My only suggestions for them would be to (1) create a more relaxed seating area indoors other than basic tables and (2) switch to something besides the environment unfriendly styrofoam cups. Otherwise, keep up the great work (and flavored coffees!); we'll be back!",F1tOtPzcsQk8PqNOatVsCg,4,2008-12-11,2,review,1
2550,0,nYfu8osX5V1xEg-YeDewzg,3-XTiQXGZ5lZd4Tahl72RQ,"Really good coffee--get it in a mug to drink there. Small enough to listen to other people's conversations, if you like that sort of thing. Lovely aroma. Decent atmosphere. Bit drafty on a cold day (door would kind of blow open, if I recall correctly), so I usually kept my jacket on when I went. Lots of people meeting friends or working on laptops. Wifi available. Here's my tiny beef with them: I went there as someone freshly moved into the city, after a grueling four-day trip through the snow over Christmas break with 4 kids, 3 pets, and 2 vehicles. I really would have loved a friendly face who remembered me and made me feel like the frequent customer I was for a week or two. I mean, I was there on almost a daily basis, since our moving truck (with my coffee maker hidden WAY in the back under EVERYTHING) had not arrived. They just weren't feeling chatty, I guess. Still, the coffee was great. Parking sometimes hard to find nearby. So, go have some wonderful coffee in a warm mug. Bring a friend. :-)",F1tOtPzcsQk8PqNOatVsCg,3,2010-02-21,0,review,0
2551,0,VCJkYFP_OKQeoESHM6CvcA,O6WevGqmzfBRbfqf0TDzrQ,Small area. Drab atmosphere. Very dark inside. Hard place to have a private conversation. The coffee is great though. So get the coffee and go elsewhere.,F1tOtPzcsQk8PqNOatVsCg,4,2010-07-27,0,review,0


In [28]:
stopWords