### Background

- The purpose of text classification is to automatically classify texts or documents into one or more predefined categories. 

- From a business perspective, text classification can help understand the sentiment of social media users, identify spam and normal mail, automatically mark users' queries, classify news by existing topics, and so on. 

- From the programming view, text classification is focusing on training a model to predict the new unlabeled texts/documents.


### Project Overview

- This assignment provides consumer reviews of 5 categories, they are Automotive (1455 reviews), Bars (1460 reviews), Health and medical (1450 reviews), Hotels and travel (1430 reviews), and Restaurants (1440 reviews), which are available at http://mlg.ucd.ie/modules/yalp/.

- Each review has a star rating. For this assignment, assuming that 1-star to 3-star reviews are “negative”, and 4-star to 5-star reviews as “positive”.

- The objective of this assignment is to scrape consumer reviews from a set of web pages and evaluate the performance of text classification on the data.


### Task Decomposition
#### There are three main tasks for this assignment:

***Task 1:  Scrape all reviews and store them***
 - Scrape reviews and parse the html page.
 - Use the number of stars to label the reviews. 
 - Store the review text and the class label.

***Task 2:  Create a numeric representation of data and build the classifier***
 - Apply appropriate preprocessing steps and find proper numeric representations.
 - Build a classification model.
 - Evaluate the model.

***Task 3:  Evaluate the model's ability of transfering between two categories***
 - Train a classification model on the data from “Category A”, and evaluate its performance on the data from “Category B”.
 - Train a classification model on the data from “Category B”, and evaluate its performance on the data from “Category A”.
 
***Read me before executing the code:***
 - In order to avoid unnecessary errors when the program runs, please execute the code in sequence, it is best not to skip a code segment to execute.
 - Training models are involved in this project, so it may take some time to get results when executing, please be patient.

### Task 1

In this project, we are trying to compare the performance of different classifiers on a dataset, then training to get the best model to apply this model on another dataset to show its performance.

My hypothesis is that a classifier model has similar performance on a similar dataset, so I choose two similar categories of reviews: **Bars and Restaurants**, which may both include reviews of food, location, services,  environment and so on.

I will create 5 classification models to predict class labels by applying KNN classifier, Decision Tree classifier, Naive Bayes classifier, SVM classifier, and Logistic Regression classifier to find the best model among them. 

First, import all packages needed and scrape all reviews.

In [34]:
import numpy as np
import requests
from bs4 import BeautifulSoup
import pandas as pd
from nltk.tokenize import WordPunctTokenizer
import re
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold
from sklearn import linear_model
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn import svm
from sklearn import tree
from sklearn.naive_bayes import GaussianNB
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Create a class reviewSpider to scrape reviews.
class reviewSpider(object):

    def __init__(self, start_url, category, csv_path='data.csv'):
        self.start_url = start_url  # start_url is http://mlg.ucd.ie/modules/yalp/
        self.category = category  # one category that been chosen
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'
        }
        self.text = []  # store all reviews under this category
        self.labels = []  # label the data by the number of stars
        self.csv_path = csv_path  # the path to store csv file

    # 1. Return the object processed by BeautifulSoup,
    # and grab the hyperlink of each category (getCategoryURL) based on the parsed content.
    # Beautiful Soup provides functions for handling navigation, searching, modifying parse trees, and so on. 
    # It's a toolbox that provides users with the data they need to crawl by parsing the document and it's simple.
    def parseHTML(self, url):
        '''
        :param url: input the url which needs to be parsed
        '''
        r = requests.get(url, headers=self.headers)
        html = r.content.decode('utf-8')
        soup = BeautifulSoup(html, "html.parser")
        return soup

    # 2. According to the hyperlink of each category, get into the hyperlink, parse the webpage, 
    # and fetch the hyperlink (generateBusinessURL) of each business(bar/restaurant) based on the parsed content.
    def getCategoryURL(self):
        '''
        get the category start url
        :return: the categorie's start URL
        '''
        self.category_dict = {}
        soup = self.parseHTML(self.start_url)
        # By checking the source code, each category is stored in the h4 tag.
        for h in soup.find_all('h4'): 
            self.category_dict[h.a.text.split(': ')[1]] = self.start_url + h.a['href']
        return self.category_dict[self.category]

    # 3. According to the hyperlink of each business, get into the hyperlink, parse the webpage, 
    # grab and store(store_csv) each of the comments and number of stars 
    # in the business(bar/restaurant) according to the parsed content.
    def generateBusinessURL(self, categoryURL):
        '''
        from the Category URL, generate Business URL
        :return: yield Business URLs
        '''
        soup = self.parseHTML(categoryURL)
        # Each business(bar/restaurant) is stored in the h6 tag.
        for h in soup.find_all('h6'): 
            # for check and debug, check the progress of crawling reviews.
            print(h.text)  
            yield self.start_url + h.a['href']

    def store_csv(self):
        # As DataFrame is "a 2-dimensional labelled data structure with columns of data that can be of different types".
        # So I store the data as the DataFrame structure for the purpose of Task 2,
        # displaying data in a numerical way.
        df = pd.DataFrame()
        df['review'] = self.text
        df['label'] = self.labels
        df.to_csv(self.csv_path, index=False)

    def run(self):
        categoryURL = self.getCategoryURL()
        for url in self.generateBusinessURL(categoryURL):
            soup = self.parseHTML(url)
            # Find the content in the review div.
            for d in soup.find_all('div', {'class': "review"}):
                # Get the image name under the alt tag. (1-star, 2-star...)
                star_num = int(d.p.next_sibling.img['alt'][0]) 
                # Specify what the label represents.
                self.labels.append(1 if star_num > 3 else 0)
                # 1-star to 3-star are “negative”, and 4-star to 5-star reviews as “positive”. 
                # Determine that "positive" is stored as 1, "negative" is 0.
                text = d.p.next_sibling.next_sibling.text
                self.text.append(text)
        self.store_csv()

In [3]:
 # Create a reviewSpider object. 
 # Instantiate a reviewSpider object for each category selected, crawl and store the reviews and their labels. 
 # Print names of the bar/restaurant under the category.
start_url = 'http://mlg.ucd.ie/modules/yalp/'
categories = ['Bars', 'Restaurants']
for category in categories:
    print(category)
    # Store review and their lables whith a file name "category(bars/restaurants).csv"
    rs = reviewSpider(start_url, category, csv_path = category + '.csv') 
    rs.run()

Bars
1. Applebee's Neighborhood Grill & Bar
2. Bar George
3. Barrel Grill & Modern Saloon
4. Blaqcat Ultra Hookah Lounge
5. Blu Burger Grille
6. Boca Taqueria
7. Boondocks Patio & Grill
8. Boston Pizza
9. Buck & Badger
10. Buffalo Wild Wings
11. Cabin Fever
12. California Pizza Kitchen at Summerlin
13. Chalker's Pub Billiards & Bistro
14. Chili's
15. Condado Tacos
16. Dark Horse Sports Bar & Grill
17. Ellis Island Hotel, Casino & Brewery
18. Fly 2.0
19. Furco
20. Game Time Sports Grill
21. George & Dragon II English Restaurant Pub
22. Goldwater Brewing
23. Gray's Tied House
24. JangBang Bar&Grill
25. Jodi B's Restaurant
26. Kirks Korner
27. La Piñata
28. Latitude 360
29. Laziza Hookah Lounge & Restaurant
30. Le Diner Frunchroom
31. Let's Be Frank
32. Mac's Speed Shop
33. Midtown
34. Outback Steakhouse
35. Paloma Family Restaurant
36. Park Place Pub
37. Pepper's Cafe
38. Piazza Lounge
39. Pireas
40. Primanti Bros
41. Primo Tuscan Grille
42. Roma Italian Restaurant and Pizzeria
43. Saltl

### Task 2

#### From the reviews in this category, apply appropriate preprocessing steps to create a numeric representation of the data, suitable for classification.

Printing the reviews that scraped in Task1 and found that all the content of a piece review is stored in a cell.

As we want to classify the labels of reviews, a good way to represent the text is to build word vector, "make words with similar context occupy close spatial positions. Mathematically, the cosine of the angle between such vectors should be close to 1, i.e. angle close to 0"(Dhruvil, 2018). 

The word vector provides a mathematical method of transforming symbolic information in natural language into digital information in vector form. This turns the problem of natural language understanding into a problem of data science.

Word2vec is a computationally efficient predictive model for learning word embedding in raw text. It is divided into two types: the continuous word bag model (CBOW) and the Skip-Gram model.

Another tool to build word vector is Doc2Vec, which is similar to Word2Vec, it adds a paragraph vector under the basis of word2vec.

### Bars
***In order to distinguish Bars and Restaurants, I separated the implementation of them.***

In [4]:
# Print the first five pieces of reviews and their labels, all the content of a piece review is stored in a cell.
data_bars = pd.read_csv("Bars.csv")
data_bars.head()

Unnamed: 0,review,label
0,It would help if the front girl don't just sit...,0
1,One star because that's the least amount you c...,0
2,"Bad time today. Dirty windows, table sticky, h...",0
3,My second visit in the last year. Both experie...,0
4,"Absolutely awful! Took forever to get food, fo...",0


- ***Pre-processing steps:***

1) Tokenize: Split each sentence into a series of words.

2) Normalization: Remove all non-alphanumeric letters and convert uppercase letters to lowercase letters.

From a computer perspective, it cannot distinguish the different meanings between 'Car', ‘car’, and 'CAR', so we generally convert all the letters in the text to lowercase or uppercase (usually lowercase).


In [26]:
# Tokenize the reviews and build a corpus.
tokenizer_bars = WordPunctTokenizer()
corpus = []
for rev in data_bars.review:
    cor = []
    for ele in tokenizer_bars.tokenize(rev):
        # Normalization: Use the sub regex of the re module to match all non-alphanumeric letters,
        # and replace them with spaces.
        ele = re.sub(r'[^a-zA-Z0-9]', " ", ele)
        # Normalization: Convert uppercase letters to lowercase letters.
        cor.append(ele.lower())
    corpus.append(cor)
# Check the results of tokenization.
print(corpus[0: 1])

[['it', 'would', 'help', 'if', 'the', 'front', 'girl', 'don', ' ', 't', 'just', 'sit', 'us', 'down', 'and', 'not', 'ask', 'us', 'for', 'drinks', 'or', 'put', 'us', 'with', 'a', 'waitress', ' ', 'instead', ' ', 'let', 'us', 'just', 'sit', 'here', 'for', 'almost', 'an', 'hour', 'unattended', ' ', 'if', 'it', ' ', 's', 'time', 'for', 'you', 'to', 'clock', 'out', 'then', 'it', ' ', 's', 'not', 'our', 'problem', ' ', 'but', 'if', 'you', ' ', 're', 'gonna', 'seat', 'us', 'be', 'more', 'professional', 'about', 'it', 'at', 'least', 'then', 'leave', ' ']]


The reason why I didn't remove stop words, Stem and Lemmatize the text is that it might get a high accuracy but we don't know the reason is the text itself or the classification we build caused the overfitting.

- ***Numeric representation***

Here I use gensim toolkit in Doc2Vec to build word vectors, instead of Word2Doc. Doc2vec is an unsupervised learning algorithm, which is used to predict a vector to represent different documents and doesn't require a sentence with a fixed length. The structure of this model potentially overcomes the disadvantages of the word bag model.

"In a plain Word2Vec model the word would have exactly the same representation in both sentences, in Doc2Vec it will not."

In [27]:
model_bars_path = 'review_bars.model'
documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(corpus)]
model = Doc2Vec(documents, vector_size=64, window=3, min_count=1, workers=4)
# The explanation of these parameters in the official document:
# documents: Input corpus
# vector_size: Dimensionality of the feature vectors.
# window: The maximum distance between the current and predicted word within a sentence.
# min_count: Ignores all words with total frequency lower than this.
# workers: Use these many worker threads to train the model (=faster training with multicore machines).
# I use the default settings here. But it is necessary to test the best parameter settings.
model.save(model_bars_path)

- ***Build classification models***

I choose 5 classifiers, and applying GridSearchCV to find the best classifier with the best parameters. 

The simple grid search process is: after the original data set is divided into a training set and a test set, the test set is used to measure the quality of the model in addition to being used as an adjustment parameter; this results in a final scoring result that is better than the actual result. Because the test set is sent to the model during the tuning process, and our goal is to apply the training model to unseen data; cross-validation can be used to handle this problem, so in this project, I use an approach combing cross-validation and grid search here, GridSearchCV.

In [30]:
X_bars = []
for review in corpus:
    X_bars.append(model.infer_vector(review))
X_bars = np.array(X_bars)
y_bars = data_bars.label

***1) KNN***

In [31]:
knn = KNeighborsClassifier()
# The n_neighbors of KNN generally take odd numbers, 
# in order to avoid the case where the classification result of the neighboring node 
# is half and half when the even number is selected, and then the classification result needs to be randomly selected.
params = {'n_neighbors': [2 * i + 1 for i in range(8)]}  
# The optimal hyperparameter combination is obtained by 
# calculating the average accuracy of the 5-fold cross-validation under different hyperparameter combinations.
clf_knn_bars = GridSearchCV(knn, params, cv=5)  

In [32]:
clf_knn_bars.fit(X_bars, y_bars)
# Print the average accuracy of cross-validation of the optimal model.
print(clf_knn_bars.best_score_)  
# Print the hyperparameter of the optimal model.
print(clf_knn_bars.best_params_)

0.6678082191780822
{'n_neighbors': 9}


In [77]:
# Cross Validation
# kf = KFold(n_splits=5)
# X_train = X_bars;
# y_train = y_bars;
# for train_index, test_index in kf.split(X_train):
#   X_t, X_s = X_train[train_index], X_train[test_index]
#   y_t, y_s = y_train[train_index], y_train[test_index]

***2) Decision Tree***

In [42]:
dtc = tree.DecisionTreeClassifier()
params = {'max_depth': [2 * i + 1 for i in range(8)]}
clf_dtc_bars = GridSearchCV(dtc, params, cv=5)

In [43]:
clf_dtc_bars.fit(X_bars, y_bars)
print(clf_dtc_bars.best_score_)
print(clf_dtc_bars.best_params_)

0.6719178082191781
{'max_depth': 3}


***3) Naive Bayes***

In [46]:
clf_gnb = GaussianNB()
scores = cross_val_score(clf_gnb, X_bars, y_bars, cv=5)
print(np.mean(scores))

0.6424657534246576


***4) SVM***

In [47]:
svc = svm.SVC(gamma='auto')
parameters = {'kernel':('linear', 'rbf', 'poly'), 'C': [1, 10, 100, 150], 'degree': [2, 3, 4]}
# 'linear', 'rbf' and 'poly' are most frequently used.
# C indicates the penalty coefficient of the regular term, and degree only works when the kernel is poly, 
# indicating several polynomials.
clf_svm_bars = GridSearchCV(svc, parameters, cv=5)

In [48]:
clf_svm_bars.fit(X_bars, y_bars)
print(clf_svm_bars.best_score_)
print(clf_svm_bars.best_params_)

0.7280821917808219
{'C': 150, 'degree': 2, 'kernel': 'linear'}


***5) Logistic Regression***

In [50]:
lr_clf = linear_model.LogisticRegression()
parameters = {'penalty': ['l1', 'l2'], 'C': [1, 10, 100, 150]}
# Penalty indicates the regular term calculation method (L1 normal form or L2 normal form), 
# C still indicates the regular term penalty coefficient.
clf_lr_bars = GridSearchCV(lr_clf, parameters, cv=5)

In [51]:
clf_lr_bars.fit(X_bars, y_bars)
print(clf_lr_bars.best_score_)
print(clf_lr_bars.best_params_)

0.7294520547945206
{'C': 10, 'penalty': 'l1'}


***Choose the best model***

In [55]:
X_train_bars, X_test_bars, y_train_bars, y_test_bars = train_test_split(X_bars, y_bars, random_state=1)
clf_lr_bars = linear_model.LogisticRegression(penalty='l2', C=150)
clf_lr_bars.fit(X_train_bars, y_train_bars)

LogisticRegression(C=150, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)

***Evaluate the model***

In [56]:
y_pred_bars = clf_lr_bars.predict(X_test_bars)
accuracy = accuracy_score(y_test_bars, y_pred_bars)
print('accuracy:', accuracy)
cm = confusion_matrix(y_test_bars, y_pred_bars)
print(cm)

accuracy: 0.7232876712328767
[[ 68  70]
 [ 31 196]]


### Restaurants

***Same steps with Bars.***

In [57]:
data_restaurants = pd.read_csv("Restaurants.csv")
data_restaurants.head()

Unnamed: 0,review,label
0,Too expensive for what they had... i had an eg...,0
1,"Very rustic place. Mismatched furniture, off K...",1
2,I highly recommend Au Festin de Babette for th...,1
3,Amazing soup and dauphinoise. BUT the wait for...,0
4,I went here by recommendation of a friend. Tho...,0


- ***Pre-processing steps: Build corpus***

In [59]:
tokenizer_restaurants = WordPunctTokenizer()
corpus_res = []
for rev in data_restaurants.review:
    cor = []
    for ele in tokenizer_restaurants.tokenize(rev):
        ele = re.sub(r'[^a-zA-Z0-9]', " ", ele)
        cor.append(ele.lower())
    corpus_res.append(cor)
print(corpus_res[0: 1])

[['too', 'expensive', 'for', 'what', 'they', 'had', '   ', 'i', 'had', 'an', 'egg', 'benedict', 'plate', 'called', 'la', 'drolet', 'but', 'it', 'had', 'a', 'weird', 'side', 'soup', 'with', 'a', 'desert', 'that', 'i', 'dont', 'personally', 'like', ' ', 'i', 'dont', 'think', 'i', 'will', 'go', 'back', 'there', 'again', '    ']]


- ***Numeric representation***

In [60]:
model_res_path = 'review_restaurants.model'
documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(corpus)]
model = Doc2Vec(documents, vector_size=64, window=3, min_count=1, workers=4)
model.save(model_res_path)

- ***Build classification models***

In [61]:
X_res = []
for review in corpus_res:
    X_res.append(model.infer_vector(review))
X_res = np.array(X_res)
y_res = data_restaurants.label

***1) KNN***

In [62]:
knn = KNeighborsClassifier()
params = {'n_neighbors': [2 * i + 1 for i in range(8)]}
clf_knn_res = GridSearchCV(knn, params, cv=5)
clf_knn_res.fit(X_res, y_res)
print(clf_knn_res.best_score_)
print(clf_knn_res.best_params_)

0.6805555555555556
{'n_neighbors': 13}


***2) Decision Tree***

In [64]:
dtc = tree.DecisionTreeClassifier()
params = {'max_depth': [2 * i + 1 for i in range(8)]}
clf_dtc_res = GridSearchCV(dtc, params, cv=5)
clf_dtc_res.fit(X_res, y_res)
print(clf_dtc_res.best_score_)
print(clf_dtc_res.best_params_)

0.6590277777777778
{'max_depth': 3}


***3) Naive Bayes***

In [65]:
clf_gnb = GaussianNB()
scores = cross_val_score(clf_gnb, X_res, y_res, cv=5)
print(np.mean(scores))

0.6451427821248059


***4) SVM***

In [66]:
svc = svm.SVC(gamma='auto')
parameters = {'kernel':('linear', 'rbf', 'poly'), 'C': [1, 10, 100, 150], 'degree': [2, 3, 4]}
clf_svm_res = GridSearchCV(svc, parameters, cv=5)
clf_svm_res.fit(X_res, y_res)
print(clf_svm_res.best_score_)
print(clf_svm_res.best_params_)

0.7125
{'C': 100, 'degree': 2, 'kernel': 'linear'}


***5) Logistic Regression***

In [67]:
lr_clf = linear_model.LogisticRegression()
parameters = {'penalty': ['l1', 'l2'], 'C': [1, 10, 100, 150]}
clf_lr_res = GridSearchCV(lr_clf, parameters, cv=5)
clf_lr_res.fit(X_res, y_res)
print(clf_lr_res.best_score_)
print(clf_lr_res.best_params_)

0.7180555555555556
{'C': 100, 'penalty': 'l2'}


***Choose the best model***

In [68]:
X_train_res, X_test_res, y_train_res, y_test_res = train_test_split(X_res, y_res, random_state=1)
clf_lr_res = linear_model.LogisticRegression(penalty='l2', C=150)
clf_lr_res.fit(X_train_res, y_train_res)

LogisticRegression(C=150, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)

***Evaluate the model***

In [69]:
y_pred_res = clf_lr_res.predict(X_test_res)
accuracy = accuracy_score(y_test_res, y_pred_res)
print('accuracy:', accuracy)
cm = confusion_matrix(y_test_res, y_pred_res)
print(cm)

accuracy: 0.725
[[ 51  76]
 [ 23 210]]


### Task 3

#### Evaluate the model's ability of transfering between two categories.

It is shown that Logistic Regression worked best on the Bars dataset, with accuracy: 0.723, parameters {'C': 10, 'penalty': 'l1'}. And Logistic Regression with paremeters {'C': 100, 'penalty': 'l2'} worked best on the Restaurants data set with accuracy: 0.725.

- ***Transer the classifier***

***1) Train a classification model on the data from “Bars”, and evaluate its performance on the data from “Restaurants”***

In [70]:
y_pred_res = clf_lr_bars.predict(X_test_res)
accuracy = accuracy_score(y_test_res, y_pred_res)
print('accuracy:', accuracy)
cm = confusion_matrix(y_test_res, y_pred_res)
print(cm)

accuracy: 0.65
[[ 99  28]
 [ 98 135]]


***<font color=black size=2.5 bold>2) Train a classification model on the data from “Restaurants”, and evaluate its performance on the data from “Bars” </font>***

In [71]:
y_pred_bars = clf_lr_res.predict(X_test_bars)
accuracy = accuracy_score(y_test_bars, y_pred_bars)
print('accuracy:', accuracy)
cm = confusion_matrix(y_test_bars, y_pred_bars)
print(cm)

accuracy: 0.6931506849315069
[[ 57  81]
 [ 31 196]]


From the experiments, both the model works not as good as with their original training dataset. I guess the reason is that only models are transfered, they still have different word vectors, so I did another experiment:

- ***Transer both the doc2vec model and the classifier***

In [72]:
model_bars = Doc2Vec.load(model_bars_path)
model_res = Doc2Vec.load(model_res_path)

***1) Train a classification model on the data from “Bars”, and evaluate its performance on the data from “Restaurants”***

In [73]:
X_res2 = []
for review in corpus_res:
    X_res2.append(model_bars.infer_vector(review))
X_res2 = np.array(X_res2)
y_res2 = data_restaurants.label

In [74]:
X_train_res2, X_test_res2, y_train_res2, y_test_res2 = train_test_split(X_res2, y_res2, random_state=1)
y_pred_res2 = clf_lr_bars.predict(X_test_res2)
accuracy = accuracy_score(y_test_res2, y_pred_res2)
print('accuracy:', accuracy)
cm = confusion_matrix(y_test_res2, y_pred_res2)
print(cm)

accuracy: 0.7416666666666667
[[ 67  60]
 [ 33 200]]


***2) Train a classification model on the data from “Restaurants”, and evaluate its performance on the data from “Bars”***

In [75]:
X_bars2 = []
for review in corpus_bars:
    X_bars2.append(model_res.infer_vector(review))
X_bars2 = np.array(X_bars2)
y_bars2 = data_bars.label

In [76]:
X_train_bars2, X_test_bars2, y_train_bars2, y_test_bars2 = train_test_split(X_bars2, y_bars2, random_state=1)
y_pred_bars2 = clf_lr_res.predict(X_test_bars2)
accuracy = accuracy_score(y_test_bars2, y_pred_bars2)
print('accuracy:', accuracy)
cm = confusion_matrix(y_test_bars2, y_pred_bars2)
print(cm)

accuracy: 0.6904109589041096
[[ 46  92]
 [ 21 206]]


The result is still not as good as I expected. Although the accuracy of the Bars data set is improved when both the model and the word vector are transferred, while the accuracy on the Restaurants data set is reduced.
I haven't figure out the reason until I submitted the project.
I should also try the grid search and cross validation strategy.