# Housing Days On Market - GridSearchCV Supervised Model Selection

## Information

Housing-related data sources were combined in the project SQLite database; the resulting output CSV file is analyzed here.

### Environment Information:

The environment used for coding is as follows:

Oracle VM VirtualBox running Ubuntu (guest) on Windows 10 (host).

Current conda install:

               platform : linux-64
          conda version : 4.2.13
       conda is private : False
      conda-env version : 4.2.13
    conda-build version : 1.20.0
         python version : 2.7.11.final.0
       requests version : 2.9.1
    default environment : /home/jonathan/anaconda2/envs/py35
       
       Python 3.5.2 :: Anaconda 4.1.1 (64-bit)

Package requirements:

dill : 0.2.5, numpy : 1.11.3, pandas : 0.18.1, matplotlib : 1.5.1, scipy : 0.18.1, seaborn : 0.7.1, scikit-image : 0.12.3, scikit-learn : 0.18.1

## Python Package(s) Used

In [1]:
import dill
import numpy as np
import pandas as pd
import time

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV, SGDRegressor, HuberRegressor, PassiveAggressiveRegressor, TheilSenRegressor
from sklearn.metrics import explained_variance_score, mean_absolute_error, mean_squared_error, r2_score   
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

In [4]:
%matplotlib inline

In [5]:
plt.style.use('seaborn-whitegrid')

## Data and Methods

### Data Fetching

In [6]:
# Import data csv into dataframe
df = pd.read_csv('df_RFECV_feature_selection_LCV_eps_01_qcut_1,1_target_DOMP_output.csv')
df = df.drop('Unnamed: 0', axis = 1)
df.head()

Unnamed: 0,PctOfHomesDecreasingInValues,distance_grocery_km,distance_public_school_arts_center_km,distance_public_school_ye_km,ListPrice2_delta,FreddieMac15yr,PriceIndex,distance_cap_gain_school_km,SaleCount,TaxTotalLivingArea,count_metro_station_km,YearBuilt,Acres,Bedrooms,BasementY/N,Fireplaces,TotalTaxes2,DOMP,qcut_DOMP
0,83.26,1.680077,2.074353,1.890599,-0.118838,4.33,167.140141,2.189428,2,1024.0,1,1950,0.082,3,1,0,2967.0,10,0
1,16.48,1.700697,1.779916,1.043636,-0.153386,3.03,171.196163,2.674176,1,1260.0,0,1987,0.0,2,0,1,2890.0,41,0
2,5.53,0.285119,2.958269,2.260644,-0.161353,3.35,188.939072,2.958276,1,682.0,13,1941,0.0,1,0,0,2500.0,12,0
3,5.03,0.285119,2.958269,2.260644,-0.310248,3.48,188.160922,2.958276,1,589.0,13,1941,0.0,1,0,0,2021.0,16,0
4,28.8,0.285119,2.958269,2.260644,-0.300308,3.25,195.086312,2.958276,1,532.0,13,1891,0.0,1,0,0,1944.0,5,0
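
The `Unnamed: 0` column is the row index written out by an earlier `to_csv` call. As a minimal sketch, using a small in-memory CSV as a hypothetical stand-in for the project file, the index can be restored at read time with `index_col=0`, which makes the separate `drop` unnecessary:

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the project CSV. The empty leading
# header is what pandas names 'Unnamed: 0' when the file is read back.
csv_text = ",Bedrooms,DOMP\n0,3,10\n1,2,41\n"

# Default read: the saved index shows up as an extra column.
df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))

# index_col=0 restores the saved index instead of creating a new column.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))
```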


## Regression Modelling

### Feature Selection

In [7]:
df_2 = df.copy()

In [8]:
# Performing feature selection on the full dataset resulted in best_score ~ 0.1.
# pd.qcut splits the target into quantile bins so that a subset can be modelled.
# With a single bin (qcut value = 1) the dataset is used as is.

# Define target for process
target_col_str = 'DOMP'

# Qcut target data
df_2['qcut_'+target_col_str] = pd.qcut(df_2[target_col_str], 1, labels = False)

# Print out total row counts for each group
print(df_2['qcut_'+target_col_str].value_counts())

# Select specific range
df_2 = df_2[df_2['qcut_'+target_col_str] == 0]

# Define qcut for process
qcut_str = '1,1'

# Save dataframe to disk
df_2.to_csv('df_GridSearchCV_model_selection_qcut_'+qcut_str+'_target_'+target_col_str+'_save_point.csv')

# Copy dataframe for dropping columns and determining categorical columns
df_3 = df_2.copy()

# Drop target column and qcut column for test-train-split
df_3 = df_3.drop(target_col_str, axis=1)
df_3 = df_3.drop('qcut_'+target_col_str, axis=1)

0    13725
Name: qcut_DOMP, dtype: int64
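
The single row in the output above follows from how `pd.qcut` assigns labels. The sketch below uses made-up day counts, not project data, to show that one bin puts every row in group 0 while four bins split the rows into quartiles labelled 0 to 3:

```python
import pandas as pd

# Made-up days-on-market values for illustration only.
days = pd.Series([5, 10, 12, 16, 41, 60, 90, 120])

# One bin: every row gets label 0, so filtering on `== 0` keeps everything.
one_bin = pd.qcut(days, 1, labels=False)

# Four bins: rows are split into quartiles labelled 0..3.
four_bins = pd.qcut(days, 4, labels=False)

print(one_bin.value_counts())
print(four_bins.value_counts().sort_index())
```

With a larger bin count, changing the `== 0` filter selects a different quantile range of the target.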


In [9]:
# Checking for grouped categorical columns
#df_3.columns[0:]

In [10]:
# Columns that are not scaled since they are categorical
# cat_df = df_3[['MS_IsTitleI','ES_IsTitleI','BasementY/N']]
# cat_df_2 = df_3.ix[:,39:43]
# cat_df_3 = df_3.ix[:,48:52]
# cat_df_4 = df_3.ix[:,57:61]
# cat_df_5 = df_3.ix[:,93:178]
# cat_df_6 = pd.concat([cat_df,cat_df_2,cat_df_3,cat_df_4,cat_df_5],axis=1)
# CATEGORICAL = [x for x in cat_df_6.columns]

In [11]:
CATEGORICAL = ['BasementY/N','ES_IsCharter','ES_IsMagnet','ES_IsTitleI','ES_IsVirtual',
               'HS_IsCharter','HS_IsMagnet','HS_IsTitleI','HS_IsVirtual','MS_IsCharter',
               'MS_IsMagnet','MS_IsTitleI','MS_IsVirtual','zip_20001','zip_20002','zip_20004',
               'zip_20005','zip_20007','zip_20008','zip_20009','zip_20010','zip_20011',
               'zip_20012','zip_20015','zip_20017','zip_20018','zip_20019','zip_20020',
               'zip_20032','zip_20036','zip_20037','ldmonth_1','ldmonth_2','ldmonth_3',
               'ldmonth_4','ldmonth_5','ldmonth_6','ldmonth_7','ldmonth_8','ldmonth_9',
               'ldmonth_10','ldmonth_11','ldmonth_12','ldday_1','ldday_2','ldday_3',
               'ldday_4','ldday_5','ldday_6','ldday_7','ldday_8','ldday_9','ldday_10',
               'ldday_11','ldday_12','ldday_13','ldday_14','ldday_15','ldday_16','ldday_17',
               'ldday_18','ldday_19','ldday_20','ldday_21','ldday_22','ldday_23','ldday_24',
               'ldday_25','ldday_26','ldday_27','ldday_28','ldday_29','ldday_30','ldday_31',
               'ESSR_0.0','ESSR_1.0','ESSR_2.0','ESSR_3.0','ESSR_4.0','ESSR_5.0','HSSR_0.0',
               'HSSR_1.0','HSSR_2.0','HSSR_3.0','HSSR_4.0','MSSR_0.0','MSSR_1.0','MSSR_2.0',
               'MSSR_3.0','MSSR_4.0','MSSR_5.0']
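
A hand-written list like this can drift out of date when columns change. One possible alternative, shown here as a heuristic sketch on a hypothetical miniature frame and not the rule used to build `CATEGORICAL`, is to flag every column whose values are all 0 or 1:

```python
import pandas as pd

# Hypothetical miniature frame mixing continuous and one-hot columns.
toy = pd.DataFrame({
    'Acres': [0.082, 0.0, 0.5],
    'Bedrooms': [3, 2, 1],
    'BasementY/N': [1, 0, 0],
    'zip_20001': [0, 1, 0],
})

# Treat a column as categorical when every value is 0 or 1.
binary_cols = [c for c in toy.columns if toy[c].isin([0, 1]).all()]
print(binary_cols)
```

The heuristic would misclassify a count column that happens to contain only 0 and 1, so the result is worth checking by eye.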

In [12]:
# Parameters for estimator GridSearch

# The MAE criterion is not included for rfr, etr, or gbr. Adding it to the grid
# alongside MSE caused the program to crash or hit memory limitations.
rfr_parameters = {'n_estimators':[100], 
              'criterion':['mse'],
              'max_features':['auto'],
              'min_samples_leaf':[1,2,5],
              'random_state':[1]}

etr_parameters = {'n_estimators':[100], 
              'criterion':['mse'],
              'max_features':['auto'],
              'min_samples_leaf':[1,2,5],
              'random_state':[1]}

gbr_parameters = {'loss':['ls','lad','huber'], 
              'learning_rate':[0.1],
              'n_estimators':[100],
              'criterion':['friedman_mse','mse'],
              'min_samples_leaf':[1,2,5],
              'max_features':['auto'],    
              'random_state':[1]}

sgdr_parameters = {'loss':['squared_loss','huber','epsilon_insensitive'], 
              'alpha':[0.0001,0.001,0.01,0.1,1.0],
              'fit_intercept':[True,False],
              'n_iter':[5,10],     
              'random_state':[1]}

tsr_parameters = {'fit_intercept':[True,False],
              'max_iter':[400],     
              'random_state':[1]}

par_parameters = {'loss':['epsilon_insensitive','squared_epsilon_insensitive'],
              'C':[0.001,0.01,0.1,1.0,10.0],
              'fit_intercept':[True,False],
              'n_iter':[5,10],     
              'random_state':[1]}

hr_parameters = {'alpha':[0.0001,0.001,0.01,0.1,1.0],
              'fit_intercept':[True,False],
              'max_iter':[200]}

lcv_parameters = {'eps':[0.001,0.01,0.1],
              'fit_intercept':[True,False],
              'cv':[4],
              'max_iter':[25000],    
              'random_state':[1]}

rcv_parameters = {'alphas':[np.array([0.1,1.0,10.0])],
              'fit_intercept':[True,False],
              'cv':[4]}

encv_parameters = {'l1_ratio':[0.2,0.4,0.6,0.8],
              'eps':[0.001,0.01,0.1],
              'fit_intercept':[True,False],
              'cv':[4],
              'max_iter':[25000],    
              'random_state':[1]}

knnr_parameters = {'n_neighbors':[2,4,6,8,10,12,14,16,18,20], 
              'weights':['uniform','distance'],
              'algorithm':['ball_tree','kd_tree']}

svr_parameters = {'C':[0.01,0.1,1.0], 
              'kernel':['poly','rbf'],
              'degree':[1,2,3]}
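
Each grid expands to the Cartesian product of its value lists, which determines how many fits a search has to run. As a quick check, `ParameterGrid` from scikit-learn can count the candidates for the SGDRegressor grid; the fold count of 12 is the `cv` value used by the search function in this notebook:

```python
from sklearn.model_selection import ParameterGrid

# Copy of the SGDRegressor grid defined above.
sgdr_parameters = {'loss': ['squared_loss', 'huber', 'epsilon_insensitive'],
                   'alpha': [0.0001, 0.001, 0.01, 0.1, 1.0],
                   'fit_intercept': [True, False],
                   'n_iter': [5, 10],
                   'random_state': [1]}

# 3 losses * 5 alphas * 2 intercept options * 2 n_iter values * 1 seed
n_candidates = len(ParameterGrid(sgdr_parameters))
n_folds = 12  # cv value passed to GridSearchCV in this notebook
print(n_candidates, n_candidates * n_folds)
```

This gives 60 candidates and 720 fits, so widening any one list multiplies the run time accordingly.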

In [13]:
def gridsearch_model_selection(df,model_estimator,model_estimator_str,grid_params,eps_str,CATEGORICAL,qcut_str,target_str):

    """
    Perform GridSearchCV on different estimators using the dataframe from feature selection.
    The function test-train-splits the dataset, and performs GridSearchCV. Individual models
    are saved as dills, and the regression metrics are saved to csv.
    """
    
    # Start clock for run time
    start  = time.time()
    
    # Copy dataframe
    df_2 = df.copy()
    
    # Drop target column and qcut column for test-train-split
    df_2 = df_2.drop(target_str, axis=1)
    df_2 = df_2.drop('qcut_'+target_str, axis=1)
    
    # Test-train split. Using 70/30% split.
    X_train, X_test, y_train, y_test = train_test_split(df_2, df[target_str], train_size=0.70,
                                                    random_state=1)    
    
    # Standardizing training and testing data. The scaler is fit on the training set only
    # and then applied to the testing set, so no information leaks from the testing set
    # into the model. Categorical data not scaled.
    for i in X_train.columns.difference(CATEGORICAL):
        scaler = StandardScaler().fit(X_train[i].values.reshape(-1,1))
        X_train[i] = scaler.transform(X_train[i].values.reshape(-1,1)).ravel()
        X_test[i] = scaler.transform(X_test[i].values.reshape(-1,1)).ravel()
        
    # GridSearchCV for estimator
    grid = GridSearchCV(model_estimator, grid_params, cv = 12, n_jobs = -1, verbose=True)
    grid.fit(X_train, y_train)
    
    # Print out best score and respective parameters
    print('Best score from GridSearchCV for estimator '+model_estimator_str+' is ',grid.best_score_)
    #print("Best model parameters from GridSearch are ", grid.best_estimator_.get_params())
    
    # Save the full grid search and the best estimator to disk; the files are closed on exit
    with open('GridSearchCV_model_selection_'+model_estimator_str+'_eps_'+eps_str+'_qcut_'+qcut_str+'_target_'+target_str, 'wb') as f:
        dill.dump(grid, f)
    with open('GridSearchCV_model_selection_best_'+model_estimator_str+'_eps_'+eps_str+'_qcut_'+qcut_str+'_target_'+target_str, 'wb') as f:
        dill.dump(grid.best_estimator_, f)

    # Predicted target values
    y_pred = grid.predict(X_test)
   
    # Store regression metrics
    exp_var_score = explained_variance_score(y_test, y_pred)
    #r2 = r2_score(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    mse = mean_squared_error(y_test, y_pred)    

    # Create dataframe to save regression metrics. Columns are given as a list, since a
    # set has no defined order.
    df_combination = pd.DataFrame(columns = ['estimator','r2_score','exp_var_score',
                                             'mae','mse','process_time'])
    estimator_list_lst = []
    exp_var_score_lst = []
    r2_score_lst = []
    mae_lst = []
    mse_lst = []
    process_time_lst = []
    
    # Save metrics to separate lists for inclusion in dataframe. Saving directly to dataframe
    # resulted in typeerrors being flagged.
    estimator_list_lst.append(model_estimator_str)
    exp_var_score_lst.append(exp_var_score)
    # best_score_ is the mean cross-validated R^2 on the training data, since GridSearchCV
    # uses the default score metric from the estimator. It is not computed on y_test.
    r2_score_lst.append(grid.best_score_)
    mae_lst.append(mae)
    mse_lst.append(mse)    
    process_time_lst.append(time.time()-start)
    
    # Add lists as series to dataframe, and save file to disk.
    df_combination['estimator'] = estimator_list_lst
    df_combination['exp_var_score'] = exp_var_score_lst
    df_combination['r2_score'] = r2_score_lst
    df_combination['mae'] = mae_lst
    df_combination['mse'] = mse_lst
    df_combination['process_time'] = process_time_lst
    df_combination.to_csv('GridSearchCV_model_selection_'+model_estimator_str+'_eps_'+eps_str+'_qcut_'+qcut_str+'_target_'+target_str+'_regression_metrics.csv')
    
    # Print run time
    print("\nBuild and Validation took {:0.3f} seconds\n".format(time.time()-start))   
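
Note that the saved `r2_score` column holds the cross-validated R² from the training data, while the explained variance, MAE, and MSE come from the held-out test set. The sketch below, on toy values rather than project data, shows how the four scikit-learn metrics relate when computed on the same predictions:

```python
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, r2_score)

# Toy values for illustration, not project data.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
exp_var = explained_variance_score(y_true, y_pred)

# Explained variance ignores a constant offset in the residuals while R^2
# penalises it, so exp_var is never smaller than r2 on the same data.
print(mae, mse, round(r2, 4), round(exp_var, 4))
```

Uncommenting the `r2_score(y_test, y_pred)` line in the function would give a test-set R² that is directly comparable with the other three metrics.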

In [14]:
gridsearch_model_selection(df_2,SGDRegressor(),'SGDR',sgdr_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,TheilSenRegressor(),'TSR',tsr_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,PassiveAggressiveRegressor(),'PAR',par_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,HuberRegressor(),'HR',hr_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,LassoCV(),'LCV',lcv_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,RidgeCV(),'RCV',rcv_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,ElasticNetCV(),'ENCV',encv_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,KNeighborsRegressor(),'KNNR',knnr_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,RandomForestRegressor(),'RFR',rfr_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,ExtraTreesRegressor(),'ETR',etr_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,GradientBoostingRegressor(),'GBR',gbr_parameters,'01',CATEGORICAL,'1,1','DOMP')
gridsearch_model_selection(df_2,SVR(),'SVR',svr_parameters,'01',CATEGORICAL,'1,1','DOMP')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Fitting 12 folds for each of 60 candidates, totalling 720 fits


[Parallel(n_jobs=-1)]: Done  76 tasks      | elapsed:    3.5s
[Parallel(n_jobs=-1)]: Done 376 tasks      | elapsed:   16.0s
[Parallel(n_jobs=-1)]: Done 720 out of 720 | elapsed:   31.0s finished


Best score from GridSearchCV for estimator SGDR is  0.0989913480871

Build and Validation took 33.239 seconds





Fitting 12 folds for each of 2 candidates, totalling 24 fits


[Parallel(n_jobs=-1)]: Done  24 out of  24 | elapsed:  1.7min finished


Best score from GridSearchCV for estimator TSR is  0.0924763568806

Build and Validation took 107.388 seconds





Fitting 12 folds for each of 40 candidates, totalling 480 fits


[Parallel(n_jobs=-1)]: Done  76 tasks      | elapsed:    3.4s
[Parallel(n_jobs=-1)]: Done 376 tasks      | elapsed:   14.1s
[Parallel(n_jobs=-1)]: Done 480 out of 480 | elapsed:   17.5s finished


Best score from GridSearchCV for estimator PAR is  0.0811737652717

Build and Validation took 20.044 seconds





Fitting 12 folds for each of 10 candidates, totalling 120 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    5.2s
[Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed:   12.8s finished


Best score from GridSearchCV for estimator HR is  0.00663109202387

Build and Validation took 15.091 seconds





Fitting 12 folds for each of 6 candidates, totalling 72 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   11.2s
[Parallel(n_jobs=-1)]: Done  72 out of  72 | elapsed:   17.9s finished


Best score from GridSearchCV for estimator LCV is  0.10780770643

Build and Validation took 20.415 seconds





Fitting 12 folds for each of 2 candidates, totalling 24 fits


[Parallel(n_jobs=-1)]: Done  24 out of  24 | elapsed:    6.8s finished


Best score from GridSearchCV for estimator RCV is  0.107808716234

Build and Validation took 9.234 seconds





Fitting 12 folds for each of 24 candidates, totalling 288 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   11.6s
[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:   48.1s
[Parallel(n_jobs=-1)]: Done 288 out of 288 | elapsed:  1.1min finished


Best score from GridSearchCV for estimator ENCV is  0.108424140164

Build and Validation took 70.370 seconds





Fitting 12 folds for each of 40 candidates, totalling 480 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   58.6s
[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:  4.8min
[Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed:  9.2min
[Parallel(n_jobs=-1)]: Done 480 out of 480 | elapsed:  9.8min finished


Best score from GridSearchCV for estimator KNNR is  0.116250792895

Build and Validation took 592.425 seconds





Fitting 12 folds for each of 3 candidates, totalling 36 fits


[Parallel(n_jobs=-1)]: Done  36 out of  36 | elapsed:  1.8min finished


Best score from GridSearchCV for estimator RFR is  0.125658075903

Build and Validation took 119.405 seconds





Fitting 12 folds for each of 3 candidates, totalling 36 fits


[Parallel(n_jobs=-1)]: Done  36 out of  36 | elapsed:   37.3s finished


Best score from GridSearchCV for estimator ETR is  0.138451110804

Build and Validation took 42.085 seconds





Fitting 12 folds for each of 18 candidates, totalling 216 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   16.5s
[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done 216 out of 216 | elapsed:  1.4min finished


Best score from GridSearchCV for estimator GBR is  0.123413827241

Build and Validation took 87.779 seconds





Fitting 12 folds for each of 18 candidates, totalling 216 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  3.5min
[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed: 15.8min
[Parallel(n_jobs=-1)]: Done 216 out of 216 | elapsed: 17.7min finished


Best score from GridSearchCV for estimator SVR is  -0.0415979933107

Build and Validation took 1073.747 seconds
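
The best scores printed above can be collected into one table for comparison. This is a small sketch with the values copied by hand from the log, rather than read back from the saved metrics files:

```python
import pandas as pd

# Best cross-validated scores and run times copied from the output above.
results = pd.DataFrame({
    'estimator': ['SGDR', 'TSR', 'PAR', 'HR', 'LCV', 'RCV',
                  'ENCV', 'KNNR', 'RFR', 'ETR', 'GBR', 'SVR'],
    'best_score': [0.0989913480871, 0.0924763568806, 0.0811737652717,
                   0.00663109202387, 0.10780770643, 0.107808716234,
                   0.108424140164, 0.116250792895, 0.125658075903,
                   0.138451110804, 0.123413827241, -0.0415979933107],
    'seconds': [33.239, 107.388, 20.044, 15.091, 20.415, 9.234,
                70.370, 592.425, 119.405, 42.085, 87.779, 1073.747],
})

# Rank estimators from highest to lowest cross-validated score.
ranked = results.sort_values('best_score', ascending=False).reset_index(drop=True)
print(ranked)
```

On these numbers ETR ranks first at about 0.138 and SVR last with a negative score, so none of the estimators explains much of the variance in `DOMP`.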

