# Introductory Example

Using the same example as we did for logistic regression, we will predict whether a review rates a restaurant as good or bad: 1- and 2-star ratings count as bad, and 3- to 5-star ratings count as good. We will use the following features:

* Average rating of a given business (`business_avg_stars`)
* Average rating given by a user (`user_avg_stars`)
* Number of reviews written by a user (`user_review_count`)
* Number of reviews of a business (`business_review_count`)
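The labeling rule above is a simple threshold on the star rating, the same one the code applies with `data['is_good'] = data['stars'] >= 3`. A minimal plain-Python sketch of that binarization, independent of GraphLab (the helper name `label_is_good` is hypothetical):

```python
def label_is_good(stars):
    """Return 1 for a good rating (3-5 stars) and 0 for a bad one (1-2 stars)."""
    if not 1 <= stars <= 5:
        raise ValueError("star ratings must lie between 1 and 5")
    return int(stars >= 3)

ratings = [1, 2, 3, 4, 5]
labels = [label_is_good(s) for s in ratings]
print(labels)  # [0, 0, 1, 1, 1]
```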

The API is similar to that of the logistic regression module; see the GraphLab Create user guide for details:

https://turi.com/learn/userguide/supervised-learning/svm.html

In [2]:
import graphlab as gl

# Keep data visualizations within the notebook
gl.canvas.set_target('ipynb')

# Load the data; the CSV is downloaded directly from this URL
data = gl.SFrame('https://static.turi.com/datasets/regression/yelp-data.csv')

# Ratings of 3 stars or more are labeled good
data['is_good'] = data['stars'] >= 3

# Make a train-test split
train_data, test_data = data.random_split(0.8)

# Create a model.
model = gl.svm_classifier.create(train_data,
                                 target='is_good',
                                 features=['user_avg_stars',
                                           'business_avg_stars',
                                           'user_review_count',
                                           'business_review_count'])

# Save predictions (class labels only) to an SArray
predictions = model.predict(test_data)

# Evaluate the model and save the results into a dictionary
results = model.evaluate(test_data)

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,str,str,long,str,str,str,dict,long,long,long,list,str,str,float,float,str,long,long,float,str,str,float,str,long,str,long,long,long,dict]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.
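The row counts reported later are consistent with a two-stage split: `random_split(0.8)` sends roughly 80% of the rows to training, and `create` then holds out about 5% of those for validation. Evaluating on the full data later covers 35856 + 180023 = 215879 rows; the test set has 43241 rows, leaving 172638 for training, and the summary reports 164004 training examples, about 95% of 172638. The sketch below mimics that two-stage split in plain Python, assuming each row is assigned independently at random, so the split sizes are approximate rather than exact. The function name `bernoulli_split` and the seeds are illustrative, not GraphLab's internals.

```python
import random

def bernoulli_split(rows, fraction, seed=0):
    """Assign each row to the first split with probability `fraction`.

    Mimics a per-row random split; split sizes are only approximately
    `fraction` of the input.
    """
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(215879))                               # stand-in for the 215,879 reviews
train, test = bernoulli_split(rows, 0.8, seed=1)         # ~80% train / ~20% test
fit, validation = bernoulli_split(train, 0.95, seed=2)   # ~5% of train held out
print("train=%d test=%d fit=%d validation=%d"
      % (len(train), len(test), len(fit), len(validation)))
```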



In [3]:
predictions = model.classify(test_data)
print predictions

+-------+
| class |
+-------+
|   1   |
|   1   |
|   1   |
|   1   |
|   1   |
|   1   |
|   1   |
|   1   |
|   1   |
|   1   |
+-------+
[43241 rows x 1 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


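Unlike logistic regression, the SVM classifier does not currently output probability estimates. Instead, `predict` can return either the predicted class (`output_type="class"`) or the raw margin (`output_type="margin"`).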
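For a linear SVM, the margin is the signed score `w . x + b`, and the predicted class follows from its sign. The plain-Python sketch below shows that relationship; it assumes a linear model with an intercept and that a margin of exactly 0 maps to class 0 (GraphLab's tie handling may differ). The helpers `margin` and `predict_class` and the toy weights are hypothetical.

```python
def margin(weights, intercept, x):
    """Score of a linear SVM: w . x + b."""
    return intercept + sum(w * xi for w, xi in zip(weights, x))

def predict_class(weights, intercept, x):
    """Class 1 for a positive margin, 0 otherwise (tie at 0 -> 0, by assumption)."""
    return int(margin(weights, intercept, x) > 0)

w, b = [2.0, -1.0], 0.5
for x in ([1.0, 1.0], [0.0, 3.0]):
    print("x=%s margin=%.2f class=%d" % (x, margin(w, b, x), predict_class(w, b, x)))
```

The margin is unbounded, so although its magnitude reflects how far a point lies from the decision boundary, it cannot be read directly as a probability.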

In [5]:
pred_class = model.predict(test_data, output_type="class")    # Class
pred_margin = model.predict(test_data, output_type="margin")  # Margins

In [6]:
model = gl.svm_classifier.create(train_data,
                                 target='is_good',
                                 penalty=100,
                                 features=['user_avg_stars',
                                           'business_avg_stars',
                                           'user_review_count',
                                           'business_review_count'])

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



In [7]:
model.summary()

Class                          : SVMClassifier

Schema
------
Number of coefficients         : 5
Number of examples             : 164004
Number of classes              : 2
Number of feature columns      : 4
Number of unpacked features    : 4

Hyperparameters
---------------
Mis-classification penalty     : 100.0

Training Summary
----------------
Solver                         : lbfgs
Solver iterations              : 10
Solver status                  : TERMINATED: Iteration limit reached.
Training time (sec)            : 0.4349

Settings
--------
Train Loss                     : 55413.1266

Highest Positive Coefficients
-----------------------------
(intercept)                    : 0.2199
user_avg_stars                 : 0.127
business_avg_stars             : 0.1051
user_review_count              : 0.0

Lowest Negative Coefficients
----------------------------
business_review_count          : -0.0003
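The misclassification penalty trades off a wide margin against training errors. As a point of reference, the textbook soft-margin SVM minimizes `0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))` with labels `y_i` in {-1, +1}. The sketch below evaluates that objective in plain Python. It is the standard formulation, not necessarily GraphLab's exact objective, and treating `penalty` as `C` is an assumption, so its value is not expected to match the reported Train Loss. The function name `hinge_objective` and the toy data are hypothetical.

```python
def hinge_objective(weights, intercept, X, y, C):
    """Soft-margin SVM objective: 0.5*||w||^2 + C * sum of hinge losses.

    Labels in y must be -1 or +1; the intercept is not regularized.
    """
    reg = 0.5 * sum(w * w for w in weights)
    hinge = 0.0
    for x, label in zip(X, y):
        score = intercept + sum(w * xi for w, xi in zip(weights, x))
        hinge += max(0.0, 1.0 - label * score)
    return reg + C * hinge

X = [[2.0], [0.5], [-1.0]]
y = [1, 1, -1]
for C in (1.0, 100.0):
    print("C=%g objective=%g" % (C, hinge_objective([1.0], 0.0, X, y, C)))
```

Only the second point sits inside the margin (hinge loss 0.5), so raising `C` from 1 to 100 multiplies its contribution a hundredfold. In this textbook form, a larger `C` makes each margin violation costlier; whether GraphLab's `penalty` plays exactly that role is worth checking against its documentation.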



In [13]:
coefficients = model['coefficients']     # an SFrame
coefficients

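+-----------------------+-------+-------+--------------------+
| name                  | index | class | value              |
+-----------------------+-------+-------+--------------------+
| (intercept)           |       |   1   | 0.219906034822     |
| user_avg_stars        |       |   1   | 0.127041491896     |
| business_avg_stars    |       |   1   | 0.105149903919     |
| user_review_count     |       |   1   | 1.04164533024e-05  |
| business_review_count |       |   1   | -0.000313648153717 |
+-----------------------+-------+-------+--------------------+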
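Because the model is linear, these coefficients determine every margin directly. The sketch below plugs them into `b + sum_i w_i * x_i` for two hypothetical reviews (not rows from the dataset), assuming the reported coefficients apply to raw, unscaled feature values. Even a 1-star user reviewing a 1-star business yields a positive margin, which suggests why the model labels nearly every review as good.

```python
# Coefficients copied from the model['coefficients'] output above (penalty=100 model).
COEFFS = {
    "user_avg_stars": 0.127041491896,
    "business_avg_stars": 0.105149903919,
    "user_review_count": 1.04164533024e-05,
    "business_review_count": -0.000313648153717,
}
INTERCEPT = 0.219906034822

def margin(review):
    """Linear margin b + sum_i w_i * x_i, assuming raw (unscaled) features."""
    return INTERCEPT + sum(COEFFS[name] * value for name, value in review.items())

# Hypothetical reviews, not rows from the dataset.
worst_case = {"user_avg_stars": 1.0, "business_avg_stars": 1.0,
              "user_review_count": 1, "business_review_count": 10}
typical = {"user_avg_stars": 4.0, "business_avg_stars": 4.0,
           "user_review_count": 50, "business_review_count": 200}

for review in (worst_case, typical):
    m = margin(review)
    print("margin=%.4f -> class %d" % (m, int(m > 0)))

# Business review count at which a 1-star user / 1-star business pair
# (with zero user reviews) would reach a margin of exactly 0.
base = INTERCEPT + COEFFS["user_avg_stars"] + COEFFS["business_avg_stars"]
break_even = base / -COEFFS["business_review_count"]
print("break-even business_review_count: %.1f" % break_even)
```

With these coefficients, a 1-star pair only crosses the boundary once the business has roughly 1,441 reviews, so a positive prediction is almost guaranteed. That is consistent with the all-`1` predictions printed earlier.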


In [17]:
model.show()

In [14]:
# Make predictions (as classes or as margins)
predictions = model.predict(data)    # Predicts 0/1
predictions = model.predict(data, output_type='margin')

In [16]:
# Evaluate the model on the full dataset (note: this includes the rows used for training)
results = model.evaluate(data)               # a dictionary
results

{'accuracy': 0.8339069571380264, 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 2
 
 Data:
 +--------------+-----------------+--------+
 | target_label | predicted_label | count  |
 +--------------+-----------------+--------+
 |      0       |        1        | 35856  |
 |      1       |        1        | 180023 |
 +--------------+-----------------+--------+
 [2 rows x 3 columns], 'f1_score': 0.9094321321943309, 'precision': 0.8339069571380264, 'recall': 1.0}
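The confusion matrix contains only `predicted_label = 1`, so the model predicts the positive class for every row. Recomputing the metrics from those two counts shows that accuracy and precision both reduce to the share of good reviews, which is the accuracy of a majority-class baseline, and that recall is trivially 1.0. A plain-Python check (written with `float()` so it also divides correctly under Python 2):

```python
# Counts from the confusion matrix above: (target_label, predicted_label) -> count
counts = {(0, 1): 35856, (1, 1): 180023}

tp = counts.get((1, 1), 0)
fp = counts.get((0, 1), 0)
fn = counts.get((1, 0), 0)
tn = counts.get((0, 0), 0)
total = tp + fp + fn + tn

accuracy = float(tp + tn) / total
precision = float(tp) / (tp + fp)
recall = float(tp) / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
baseline = float(tp + fn) / total   # accuracy of always predicting "good"

print("accuracy=%.4f precision=%.4f recall=%.4f f1=%.4f baseline=%.4f"
      % (accuracy, precision, recall, f1, baseline))
```

Because the SVM does no better than always predicting "good", its accuracy of about 0.834 should be read against that baseline rather than as evidence that the features separate the classes.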