# Classification, Regression and Other Prediction Models

## Dataset

We'll use "201707-citibike-tripdata.csv.zip" (after preprocessing it in HW0).

## Schema

- Every station’s information
    - id, name, lat, lng
- Every station’s flow data
    - id, time, in-flow, out-flow

### Import packages

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py
import os
from time import time
from plotly.graph_objs import *
from mpl_toolkits.mplot3d import Axes3D
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC, SVR, LinearSVC, LinearSVR
from sklearn.tree import DecisionTreeRegressor, ExtraTreeClassifier
from sklearn.linear_model import BayesianRidge
from statsmodels.tsa.arima_model import ARIMA
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was deprecated in 0.18 and removed in 0.20
from sklearn.metrics import confusion_matrix
%matplotlib inline

### Read csv to dataframe
Use pandas to read the data.

In [3]:
# preprocessed dataset
df = pd.read_csv('./201707-citibike-tripdata-preprocessed.csv')
df.head()

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,birth year,gender
0,364,2017-07-01 00:00:00,2017-07-01 00:06:05,539,Metropolitan Ave & Bedford Ave,40.715348,-73.960241,3107,Bedford Ave & Nassau Ave,40.723117,-73.952123,14744,Subscriber,1986.0,1
1,2142,2017-07-01 00:00:03,2017-07-01 00:35:46,293,Lafayette St & E 8 St,40.730207,-73.991026,3425,2 Ave & E 104 St,40.78921,-73.943708,19587,Subscriber,1981.0,1
2,328,2017-07-01 00:00:08,2017-07-01 00:05:37,3242,Schermerhorn St & Court St,40.691029,-73.991834,3397,Court St & Nelson St,40.676395,-73.998699,27937,Subscriber,1984.0,2
3,2530,2017-07-01 00:00:11,2017-07-01 00:42:22,2002,Wythe Ave & Metropolitan Ave,40.716887,-73.963198,398,Atlantic Ave & Furman St,40.691652,-73.999979,26066,Subscriber,1985.0,1
4,2534,2017-07-01 00:00:15,2017-07-01 00:42:29,2002,Wythe Ave & Metropolitan Ave,40.716887,-73.963198,398,Atlantic Ave & Furman St,40.691652,-73.999979,29408,Subscriber,1982.0,2


In [4]:
# every station's information
station_info = pd.read_csv('./station_info.csv')
station_info.head()

Unnamed: 0,station id,station name,station latitude,station logitude
0,539,Metropolitan Ave & Bedford Ave,40.715348,-73.960241
1,293,Lafayette St & E 8 St,40.730207,-73.991026
2,3242,Schermerhorn St & Court St,40.691029,-73.991834
3,2002,Wythe Ave & Metropolitan Ave,40.716887,-73.963198
4,361,Allen St & Hester St,40.716059,-73.991908


In [5]:
# every station's in-flow data
station_in_flow = pd.read_csv('./in_flow.csv')
station_in_flow.head()

Unnamed: 0,72,79,82,83,116,119,120,127,128,143,...,2003,2005,2006,2008,2009,2010,2012,2021,2022,2023
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,3.0,1.0,0.0,1.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,...,1.0,0.0,0.0,2.0,0.0,1.0,0.0,2.0,0.0,1.0
2,2.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0
3,0.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,...,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [6]:
# every station's out-flow data
station_out_flow = pd.read_csv('./out_flow.csv')
station_out_flow.head()

Unnamed: 0,72,79,82,83,116,119,120,127,128,143,...,2003,2005,2006,2008,2009,2010,2012,2021,2022,2023
0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,3.0,0.0,...,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0
1,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,4.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2,1.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,2.0,0.0,...,0.0,0.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0
3,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,2.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
4,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
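
The flow tables above come from the HW0 preprocessing, which is not shown here. As a rough sketch of how such an out-flow table could be derived from the trip table, the snippet below counts departures per 30-minute slot and start station. The toy `trips` frame and its values are made up for illustration; only the column names `starttime` and `start station id` are taken from the trip table above.

```python
import pandas as pd

# Toy trips using the column names of the trip table (values are invented).
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2017-07-01 00:00:00", "2017-07-01 00:10:00",
        "2017-07-01 00:40:00", "2017-07-01 00:45:00",
    ]),
    "start station id": [72, 72, 79, 72],
})

# Floor each start time to its 30-minute slot, then count departures
# per (slot, station) and spread the stations across the columns.
slot = trips["starttime"].dt.floor("30min")
out_flow = (
    trips.groupby([slot, "start station id"])
    .size()
    .unstack(fill_value=0)
)
print(out_flow)
```

The in-flow table would follow the same pattern with the stop time and end station columns.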


## Using historical (14 days) data to predict every station's outflow tomorrow (1 day)

### Extract following values

- station_id
- outflow (this is what we want to predict)

In [7]:
station_out_flow.head()

Unnamed: 0,72,79,82,83,116,119,120,127,128,143,...,2003,2005,2006,2008,2009,2010,2012,2021,2022,2023
0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,3.0,0.0,...,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0
1,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,4.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2,1.0,1.0,0.0,2.0,0.0,0.0,2.0,0.0,2.0,0.0,...,0.0,0.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0
3,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,2.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
4,0.0,2.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


### Discretize outflow
- Discretize the outflow by dividing it by a bin width (for example each station's outflow standard deviation) and rounding to an integer
- After this step the task can be solved as a classification problem

From the previous homework's results, dividing by 5 is a good way to discretize, so we apply it and round the value to an integer.

In [8]:
station_out_dis = (station_out_flow / 5).round(0)
station_out_dis.head()

Unnamed: 0,72,79,82,83,116,119,120,127,128,143,...,2003,2005,2006,2008,2009,2010,2012,2021,2022,2023
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
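
To make the discretization rule concrete, here is a small sketch on made-up counts (the `flow` frame is hypothetical, not taken from the dataset). It applies the same divide-by-5-and-round step, so each class covers a band of about five departures.

```python
import pandas as pd

# Made-up outflow counts for two stations (columns) over four time slots.
flow = pd.DataFrame({"72": [0.0, 3.0, 7.0, 12.0], "79": [2.0, 4.0, 8.0, 23.0]})

# Same rule as in the cell above: divide by 5 and round to the nearest integer class.
dis = (flow / 5).round(0)
print(dis)
```

For instance, a count of 3 becomes class 1 (3 / 5 = 0.6) and a count of 23 becomes class 5 (23 / 5 = 4.6).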


### Use previous (14 days) data to estimate the next day's outflow
- Use a sliding window to increase the amount of data (shift by k days each time, with k = 1)

In [9]:
def get_data(isdis, idx, st):
    # 15-day window (14 days of features + 1 day of targets) starting at day `st`
    src = station_out_dis if isdis else station_out_flow
    df = pd.DataFrame(src.iloc[st * 48 : (15 + st) * 48, idx]).T
    df.columns = range(df.shape[1])
    return df

def get_station(isdis, idx):
    # stack the 16 windows (start days 0..15) of station column `idx`, one per row
    return pd.concat([get_data(isdis, idx, i) for i in range(16)])

- We can use ```get_station(is_discrete, index)``` to get one station's outflow data, where each row is a 15-day window, from 7/01 - 7/15 in the first row to 7/16 - 7/30 in the last row

In [10]:
get_station(True, 1).head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,710,711,712,713,714,715,716,717,718,719
79,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0
79,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0
79,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
79,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
79,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
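
The window arithmetic can be checked with a short NumPy sketch. The `series` array below is a synthetic stand-in for one station's half-hourly counts, and the constant names are introduced here only for readability.

```python
import numpy as np

SLOTS_PER_DAY = 48   # 30-minute slots per day
WINDOW_DAYS = 15     # 14 days of features + 1 day of targets
N_WINDOWS = 16       # start days 0..15, shifted by k = 1 day

# Synthetic stand-in for one station: 31 days of half-hourly values.
series = np.arange(31 * SLOTS_PER_DAY, dtype=float)

# One row per window, each covering 15 consecutive days.
windows = np.stack([
    series[st * SLOTS_PER_DAY:(st + WINDOW_DAYS) * SLOTS_PER_DAY]
    for st in range(N_WINDOWS)
])

X = windows[:, :14 * SLOTS_PER_DAY]   # first 14 days: features
y = windows[:, 14 * SLOTS_PER_DAY:]   # 15th day: one target column per slot
print(windows.shape, X.shape, y.shape)
```

Each station therefore contributes only 16 samples with 672 features each, which is worth keeping in mind when reading the scores below.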


### Evaluate each model

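Calculate the mean score over the 48 time slots and the elapsed time. Note that `score` returns the mean accuracy for a classifier (including `OneVsRestClassifier`) and R² for a regressor, so the value printed as "Mean square error" for the un-discretized data is a score where higher is better, not a true mean squared error.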

In [11]:
def eval_model(isdis, clf):
    ans = 0
    t = time()
    for idx in range(634):  # one column per station
        data = get_station(isdis, idx)
        # the first 14 days are features, the 15th day (48 slots) holds the targets
        train_x, test_x, train_y, test_y = train_test_split(data.iloc[:, :14 * 48], data.iloc[:, 14 * 48:], test_size = 0.3)
        for i in range(48):
            # score() of the fitted estimator on the held-out windows for slot i
            ans += clf.fit(train_x, train_y.iloc[:, i]).score(test_x, test_y.iloc[:, i])
    if isdis:
        print('Average accuracy for 48 timeslot: {:.4f}'.format(ans / 634.0 / 48.0))
    else:
        print('Mean square error for 48 timeslot: {:.4f}'.format(ans / 634.0 / 48.0))
    print('Time: {:.2f} sec'.format(time() - t))
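
Since `score` does not return a mean squared error (it is accuracy for classifiers and R² for regressors), a true MSE has to be computed from the predictions. The following is a minimal sketch on synthetic data, assuming a bare `BayesianRidge` without the `OneVsRestClassifier` wrapper. The arrays `X` and `y` are made up and the numbers it prints are unrelated to the results in this notebook.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem: a linear signal plus a little noise.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.randn(200)

train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=0)
reg = BayesianRidge().fit(train_x, train_y)

r2 = reg.score(test_x, test_y)                          # R^2: higher is better
mse = mean_squared_error(test_y, reg.predict(test_x))   # MSE: lower is better
print('R^2: {:.4f}  MSE: {:.4f}'.format(r2, mse))
```

Swapping the `score` call in `eval_model` for `mean_squared_error` on the predictions would give a number that matches the printed label.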

## Try following models (as classification problem)

Compare the computation time and the result (average accuracy over the 48 time slots).

### K-Nearest-Neighbor

Classifier implementing the k-nearest neighbors vote.

In the previous homework, the results of KMeans and of PCA followed by Agglomerative Clustering suggested that the data can be divided into 3 to 4 groups, so we choose k = 3 or 4.

[package](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)

In [12]:
clf = OneVsRestClassifier(KNeighborsClassifier(n_neighbors = 3))
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7744
Time: 113.93 sec


In [13]:
clf = OneVsRestClassifier(KNeighborsClassifier(n_neighbors = 4))
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7927
Time: 111.33 sec


### Naive Bayes

Try multinomial and Gaussian to predict the data.

- Naive Bayes classifier for multinomial models

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

- Gaussian Naive Bayes (GaussianNB)

Can perform online updates to model parameters via partial_fit method. For details on algorithm used to update feature means and variance online, see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque:
http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf

[package](http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html)

In [14]:
clf = OneVsRestClassifier(MultinomialNB())
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7914
Time: 100.05 sec


In [15]:
clf = OneVsRestClassifier(GaussianNB())
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7692
Time: 104.89 sec
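
The online update mentioned for GaussianNB can be illustrated with `partial_fit`. This is only a sketch on synthetic two-class data (the arrays and the batch size of 50 are arbitrary choices), not part of the evaluation above.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two well-separated synthetic classes, shuffled so each batch holds both.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(4.0, 1.0, (100, 3))])
y = np.array([0] * 100 + [1] * 100)
order = rng.permutation(200)
X, y = X[order], y[order]

clf = GaussianNB()
for start in range(0, 200, 50):
    batch = slice(start, start + 50)
    # `classes` must list every label on the first call; repeating it later is allowed.
    clf.partial_fit(X[batch], y[batch], classes=[0, 1])

acc = clf.score(X, y)
print('accuracy after 4 batches: {:.4f}'.format(acc))
```

Each call updates the per-class feature means and variances without revisiting earlier batches.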


### Random Forest

This is a random forest classifier. We set max_depth because a tree that is too deep overfits and wastes time.

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default).

[package](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

In [16]:
clf = OneVsRestClassifier(RandomForestClassifier(max_depth = 2))
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7858
Time: 812.33 sec


In [17]:
clf = OneVsRestClassifier(RandomForestClassifier(max_depth = 5))
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7853
Time: 791.89 sec


### Support vector machine (SVC)

C-Support Vector Classification; the implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a few tens of thousands of samples.

Try different kernels and compare the accuracy and the run time.

[package](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)

In [18]:
ker = ['linear', 'poly', 'rbf', 'sigmoid']

for item in ker:
    print('kernel: {}'.format(item))
    clf = OneVsRestClassifier(SVC(kernel = item))
    eval_model(True, clf)

kernel: linear
Average accuracy for 48 timeslot: 0.7938
Time: 106.20 sec
kernel: poly
Average accuracy for 48 timeslot: 0.7913
Time: 104.60 sec
kernel: rbf
Average accuracy for 48 timeslot: 0.7944
Time: 109.78 sec
kernel: sigmoid
Average accuracy for 48 timeslot: 0.7915
Time: 109.39 sec


### Other

Use extremely randomized tree classifier to predict the data.

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.

We again set max_depth, because a tree that is too deep overfits and wastes time.

[package](http://scikit-learn.org/stable/modules/generated/sklearn.tree.ExtraTreeClassifier.html)

In [19]:
clf = OneVsRestClassifier(ExtraTreeClassifier(max_depth = 2))
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7402
Time: 95.40 sec


In [20]:
clf = OneVsRestClassifier(ExtraTreeClassifier(max_depth = 5))
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7291
Time: 96.90 sec


### Comparison and Observations

- K-Nearest-Neighbor

Setting n_neighbors to 4 gives higher accuracy than 3 (0.7927 vs 0.7744). Both runs take about 110 seconds and the accuracy is close to 80%.

- Naive Bayes

Multinomial naive Bayes is the better choice because the outflows, on weekdays and weekends alike, match the multinomial model better than the Gaussian one. Both variants take less time than KNN, and the multinomial accuracy is also close to 80%.

- Random Forest

Increasing max_depth from 2 to 5 does not improve the accuracy. The run time (about 800 seconds) is much longer than for the other models because the forest builds many trees to decide the predicted result.

- Support vector machine (SVC)

All kernels give almost the same results, but the linear kernel is sometimes slightly more accurate and slightly faster.

- Extremely Randomized Tree

The extremely randomized tree classifier is less accurate than the other models, and it saves little run time.


## Calculate the confusion matrix

Generate the label list by collecting the target values, then construct the confusion matrix.

In [21]:
label = set()

# note: iloc[:, -i] selects station columns (column 0, then the last 47), not time slots
for i in range(48):
    for item in pd.unique(station_out_dis.iloc[:, -i]):
        label.add(item)
label = list(label)
num_l = len(label)

clf = OneVsRestClassifier(MultinomialNB())
mat = np.zeros([num_l, num_l], dtype = int)  # np.int was removed in NumPy 1.24
for idx in range(634):
    data = get_station(True, idx)
    train_x, test_x, train_y, test_y = train_test_split(data.iloc[:, :14 * 48], data.iloc[:, 14 * 48:], test_size = 0.3)
    # predictions are passed first, so rows are predicted classes and columns are true classes
    mat += confusion_matrix(clf.fit(train_x, train_y.iloc[:, 0]).predict(test_x), test_y.iloc[:, 0], labels = label)

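Print the Naive Bayes confusion matrix for the first 30-minute time slot of the day (column 0 of the targets). Because the predictions are passed as the first argument to `confusion_matrix`, the rows correspond to predicted classes and the columns to true classes.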

In [22]:
mat

array([[2977,  156,    3,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [  24,   10,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0],
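
The effect of the argument order in `confusion_matrix` can be seen on a tiny example. The label lists below are made up. scikit-learn's convention is `confusion_matrix(y_true, y_pred)`, with rows indexed by the true label, and swapping the two arguments transposes the matrix.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted classes for six samples.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
labels = [0, 1, 2]

cm_true_rows = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true classes
cm_pred_rows = confusion_matrix(y_pred, y_true, labels=labels)  # rows = predicted classes

print(cm_true_rows)
print(cm_pred_rows)
```

Here the single class-0 sample that was predicted as class 1 appears at position `[0, 1]` in the first matrix and at `[1, 0]` in the second.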

## Performance with different parameters in SVM

Test the following parameters

- kernel
    - linear
    - poly
    - rbf
    - sigmoid

The kernels have almost the same run time, between about 105 and 110 seconds in the run above, and the random train/test split makes the results differ slightly from run to run.

The linear and poly kernels differ in the degree (power) they assume, and their accuracy is sometimes higher than rbf when the testing data happens to match that shape.

The accuracy of the rbf kernel is sometimes higher than the others. I think the reason is that rbf can approximate any non-linear function with high precision, so it can match the data better than the others; from previous work we know the data is closer to a multinomial distribution.

The sigmoid kernel is similar to a logistic regression model: it defines curves according to where the logistic value is greater than some value (modeling probability). In the run above it takes slightly longer than the linear and poly kernels.

Based on the models above, I consider rbf the best kernel because it can fit the training data whether the underlying distribution is linear, polynomial or higher dimensional, and it does not take much time.
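
Because the random split alone changes the scores a little, one way to compare kernels on equal terms is to fix `random_state` so that every kernel sees the same train/test split. The sketch below uses a synthetic dataset from `make_classification` (an assumption for illustration, not the Citi Bike data), so its scores say nothing about the results above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; the sizes are arbitrary.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

# Fixing random_state gives every kernel the same train/test split.
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    scores[kernel] = SVC(kernel=kernel).fit(train_x, train_y).score(test_x, test_y)
    print('kernel: {}  accuracy: {:.4f}'.format(kernel, scores[kernel]))
```

Repeating this over several seeds and averaging would separate real kernel differences from split noise.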

## Try following models (as regression problem)

Compare the computation time and the result (mean square error).

### ARIMA

In [23]:
rng = list(pd.date_range("2017-07-01 00:00:00", "2017-07-31 23:30:00", freq = "30min"))
ts = pd.DataFrame(station_out_flow.iloc[:, 0].values, index = rng)

ts.columns = ['outflow']
mod = ARIMA(ts, order = (1,1,1))

res = mod.fit(disp = False)
print(res.summary())

                             ARIMA Model Results                              
Dep. Variable:              D.outflow   No. Observations:                 1487
Model:                 ARIMA(1, 1, 1)   Log Likelihood               -3537.951
Method:                       css-mle   S.D. of innovations              2.612
Date:                Fri, 29 Dec 2017   AIC                           7083.902
Time:                        13:25:40   BIC                           7105.120
Sample:                    07-01-2017   HQIC                          7091.810
                         - 07-31-2017                                         
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
const               0.0010      0.031      0.031      0.975      -0.060       0.061
ar.L1.D.outflow     0.0221      0.059      0.374      0.708      -0.094       0.138
ma.L1.D.outflow    -0.5549      
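
The summary reports 1487 observations and a dependent variable `D.outflow` because the order (1,1,1) uses d = 1, so the model is fitted to the first difference of the series. A short sketch of that step, with synthetic Poisson counts standing in for the real outflow column:

```python
import numpy as np
import pandas as pd

# Same half-hourly index as in the cell above: 31 days x 48 slots.
rng = pd.date_range("2017-07-01 00:00:00", "2017-07-31 23:30:00", freq="30min")

# Synthetic counts standing in for one station's outflow.
outflow = np.random.RandomState(0).poisson(2.0, size=len(rng)).astype(float)

# d = 1: difference once, y'_t = y_t - y_{t-1}, which drops one observation.
diffed = np.diff(outflow)
print(len(rng), len(diffed))
```

The AR(1) and MA(1) terms in the table are then estimated on this differenced series.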

### Bayesian regression

Use Bayesian ridge regression and vary n_iter to see how much time it takes and how much the accuracy improves.

Fit a Bayesian ridge model and optimize the regularization parameters lambda (precision of the weights) and alpha (precision of the noise).

[package](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html)

In [24]:
clf = OneVsRestClassifier(BayesianRidge(n_iter = 300))
eval_model(False, clf)

Mean square error for 48 timeslot: 0.4735
Time: 439.15 sec


In [25]:
clf = OneVsRestClassifier(BayesianRidge(n_iter = 500))
eval_model(False, clf)

Mean square error for 48 timeslot: 0.4758
Time: 453.75 sec


### Decision tree regression

A decision tree used as a regressor. We again set max_depth, because a tree that is too deep overfits and wastes time, and see how much time it takes and how much the accuracy improves.

[package](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html)

In [26]:
clf = OneVsRestClassifier(DecisionTreeRegressor(max_depth = 2))
eval_model(False, clf)

Mean square error for 48 timeslot: 0.3789
Time: 206.85 sec


In [27]:
clf = OneVsRestClassifier(DecisionTreeRegressor(max_depth = 5))
eval_model(False, clf)

Mean square error for 48 timeslot: 0.3765
Time: 205.92 sec


### Support vector machine (SVR)

Use Epsilon-Support Vector Regression; the implementation is based on libsvm.

Also try several kernels to compare the results and the time taken.

[package](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html)

In [28]:
ker = ['linear', 'poly', 'rbf', 'sigmoid']

for item in ker:
    print('kernel: {}'.format(item))
    clf = OneVsRestClassifier(SVR(kernel = item))
    eval_model(False, clf)

kernel: linear
Mean square error for 48 timeslot: 0.4746
Time: 193.11 sec
kernel: poly
Mean square error for 48 timeslot: 0.4774
Time: 204.27 sec
kernel: rbf
Mean square error for 48 timeslot: 0.4807
Time: 209.20 sec
kernel: sigmoid
Mean square error for 48 timeslot: 0.4633
Time: 192.35 sec


### Other


Regression based on k-nearest neighbors.

The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.

In [29]:
clf = OneVsRestClassifier(KNeighborsRegressor(n_neighbors = 3))
eval_model(False, clf)

Mean square error for 48 timeslot: 0.4308
Time: 207.16 sec


In [30]:
clf = OneVsRestClassifier(KNeighborsRegressor(n_neighbors = 4))
eval_model(False, clf)

Mean square error for 48 timeslot: 0.4502
Time: 208.90 sec


### Comparison and Observations

- ARIMA

The Autoregressive Integrated Moving Average model ARIMA(p,d,q) takes an order (p,d,q) that gives the number of AR parameters, differences, and MA parameters to use. The model's error rate is sometimes lower than the others.

- Bayesian regression

Bayesian ridge regression fits the model and also optimizes the regularization parameters lambda (precision of the weights) and alpha (precision of the noise), so it takes much more time.

- Decision tree regression

The split at each node is chosen with the “best” strategy, which picks the best split. The criterion is “mse”, the mean squared error, which is equal to variance reduction as a feature selection criterion and minimizes the L2 loss using the mean of each terminal node. So the mean square error is low.

- Support vector machine (SVR)

Compared to SVC, the results show a lower error rate but take more time. Compared to the other regression models, the error rate is higher.

- Regression k-nearest neighbors

The KNN regressor weights the neighbors and then applies KNN, so it takes about as long as SVR and the decision tree, or slightly longer. It fits better than the KNN classifier but also takes much more time.

The regression models take longer than the classification models, but their results seem more accurate, with a low error rate.

## Other

Try another method to solve this prediction problem, and give a result and some explanation.

Using linear SVC and linear SVR.

Linear Support Vector Classification.

Similar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

This class supports both dense and sparse input and the multiclass support is handled according to a one-vs-the-rest scheme.

[package](http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html)

Linear Support Vector Regression.

Similar to SVR with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

This class supports both dense and sparse input.

[package](http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVR.html)
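
Since LinearSVC already handles multiclass targets with a one-vs-the-rest scheme, wrapping it in `OneVsRestClassifier` fits the same number of binary problems. The sketch below shows this on a synthetic three-class dataset; the data and the `max_iter` value are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic three-class problem with 8 features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

plain = LinearSVC(random_state=0, max_iter=5000).fit(X, y)
wrapped = OneVsRestClassifier(LinearSVC(random_state=0, max_iter=5000)).fit(X, y)

print(plain.coef_.shape)          # one weight vector per class
print(len(wrapped.estimators_))   # one binary LinearSVC per class
```

Both variants end up with one linear separator per class, so the explicit wrapper mainly keeps the calls consistent with the other models in this notebook.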

In [31]:
clf = OneVsRestClassifier(LinearSVC())
eval_model(True, clf)

Average accuracy for 48 timeslot: 0.7940
Time: 97.47 sec


In [32]:
clf = OneVsRestClassifier(LinearSVR())
eval_model(False, clf)

Mean square error for 48 timeslot: 0.4750
Time: 226.71 sec


### Comparison and Observations

The Linear SVC model's accuracy is sometimes lower than the others, but it takes less time. This could be because it is implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

The outcomes of Linear SVR are much the same as those of SVR with a linear kernel (0.4750 vs 0.4746 in the runs above), so any difference in mean square error is small. I think its flexibility in the choice of penalties and loss functions could help, and it should scale better to large numbers of samples.

We can also expect that on a large dataset, linear SVC or SVR would still predict accurately.