After exploring the data, we're going to find of much of it can be relevant for our decision tree. This is a critical point for every Data Science project, since too much train data can easily result in bad model generalisation (accuracy on test/real/unseen observations). 

In this Ipython notebook, I am going to cover Random Forest. Random Forest is a versatile machine learning method capable of performing both regression and classification tasks. It also undertakes dimensional reduction methods, treats missing values, outlier values and other essential steps of data exploration, and does a fairly good job. It is a type of ensemble learning method, where a group of weak models combine to form a powerful model.

Advantages of Random Forest:

1) This algorithm can solve both type of problems i.e. classification and regression and does a decent estimation at both fronts.
      
2) One of benefits of Random forest which excites me most is, <B> the power of handle large data set with higher dimensionality. It can handle thousands of input variables and identify most significant variables so it is considered as one of the dimensionality reduction methods. </B>  Further, the model outputs Importance of variable, which can be a very handy feature (on some random data set).

3) It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.

4) It has methods for balancing errors in data sets where classes are imbalanced.

5) The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection.

6) Random Forest involves sampling of the input data with replacement called as bootstrap sampling. Here one third of the data is not used for training and can be used to testing. These are called the out of bag samples. Error estimated on these out of bag samples is known as out of bag error. Study of error estimates by Out of bag, gives evidence to show that the out-of-bag estimate is as accurate as using a test set of the same size as the training set. Therefore, using the out-of-bag error estimate removes the need for a set aside test set.





In [9]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn import preprocessing
from sklearn.model_selection import cross_val_score
from sklearn import metrics
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import skew
from scipy.stats.stats import pearsonr


%config InlineBackend.figure_format = 'retina' #set 'png' here when working on notebook
%matplotlib inline

In [10]:
train_df = pd.read_csv('train_cleaned.csv')
test_df  = pd.read_csv('test_cleaned.csv')

id = test_df['Id']
train_df.drop('Id',axis = 1, inplace = True)
test_df.drop('Id',axis = 1 , inplace = True)
y_train = train_df['SalePrice']
x_train = train_df.drop('SalePrice', axis = 1)

test_df.head()


Unnamed: 0,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,...,Functional_Min2,BldgType_Twnhs,RoofStyle_Mansard,RoofMatl_CompShg,SaleCondition_Partial,GarageCond_Ex,Functional_Maj2,SaleType_New,GarageType_BuiltIn,Exterior2nd_CBlock
0,3.044522,80,9.360741,5,6,1961,1961,0.0,6.150603,4.976734,...,0,0,0,1,0,0,0,0,0,0
1,3.044522,81,9.565775,6,6,1958,1958,4.691348,6.828712,0.0,...,0,0,0,1,0,0,0,0,0,0
2,4.110874,74,9.534668,5,5,1997,1998,0.0,6.674561,0.0,...,0,0,0,1,0,0,0,0,0,0
3,4.110874,78,9.208238,6,6,1998,1998,3.044522,6.401917,0.0,...,0,0,0,1,0,0,0,0,0,0
4,4.795791,43,8.518392,8,5,1992,1992,0.0,5.575949,0.0,...,0,0,0,1,0,0,0,0,0,0


In [11]:
y_train.head()
feat_labels = train_df.columns.values
#feat_labels
x_train.isnull().values.any()

False

Since, Random Forest is good regression method which has an ability to handle a large amount of high-dimensional data ( here we have ~ 280 columns as features), we are going to construct a function which handles the features and list the features which are important in prediction.Also , I am going to construct a function which fits a given model to the train data and cross validates via K-Fold Cross Validation Scheme 

In [12]:
from sklearn.ensemble import RandomForestRegressor

def printFeature(model,x_train, y_train , performCV = True , cvFolds = 10):
    
    model.fit(x_train,y_train)
    feature_importances = model.feature_importances_
    feat_labels = [ x for x in x_train.columns.values]
    feat_imp = pd.Series(feature_importances,feat_labels).sort_values(ascending=False)
    #print(feat_imp.dtypes)
    #plt.bar(feat_labels[:],feat_imp[:])
    plt.ylabel('Feature Importances')
    #plt.show()
    
def modelFit(model,x_train,y_train,performCV = True , cvFolds = 10):
    
    model.fit(x_train,y_train)
    x_train_predictions = model.predict(x_train)
    
    if performCV == True :
        cv_score_meanSqError = cross_val_score(model,x_train,y_train,cv= cvFolds,scoring = 'mean_squared_error')
        cv_score_r2 = cross_val_score(model,x_train,y_train,cv= cvFolds,scoring = 'r2')
        
    print(" Model Report")
    print(" Mean Squared Error = %.4g" % metrics.mean_squared_error(y_train.values,x_train_predictions))
    print(" R2 score = %.4g" % metrics.r2_score(y_train.values,x_train_predictions))
          
    
    print(" CV Score (Mean Square Error) : Mean - %.7g | Std - %.7g | Min - %.7g | Max - %.7g" % (np.mean(cv_score_meanSqError),np.std(cv_score_meanSqError),np.min(cv_score_meanSqError),np.max(cv_score_meanSqError)))
    print(" CV Score ( R square        ) : Mean - %.7g | Std - %.7g | Min - %.7g | Max - %.7g" % (np.mean(cv_score_r2),np.std(cv_score_r2),np.min(cv_score_r2),np.max(cv_score_r2)))




In [13]:
#defining the model
n_estimators = 1000
random_state = 0
n_jobs = -1
rf = RandomForestRegressor(n_estimators = n_estimators,random_state = random_state,n_jobs = n_jobs) 
#printFeature(rf,x_train,y_train)
modelFit(rf,x_train,y_train,performCV = True, cvFolds = 5)

 Model Report
 Mean Squared Error = 0.002689
 R2 score = 0.9831
 CV Score (Mean Square Error) : Mean - -0.02019539 | Std - 0.002728923 | Min - -0.02373028 | Max - -0.01596594
 CV Score ( R square        ) : Mean - 0.8733335 | Std - 0.0115757 | Min - 0.8545422 | Max - 0.8896669


In [14]:
from sklearn.grid_search import GridSearchCV

<B> Parameter Tunign of the Random Forest Model </B> :

We will use grid search to identify the optimal parameters of our random forest model. Because our training dataset is quite small, we can get away with testing a wider range of hyperparameter values.I will discuss the results of this grid search. I will tune :

n_estimators = Signifies the number of trees used in random forest

and then tune tree specific paramters here :
1) min_samples_split
2) min_smaples_leaf
3) max_depth
4) min_leaf_nodes
5) max_features
6) loss function

In [128]:
#Tuning the no. of trees used in the regresor

param_test1 =  {'n_estimators':np.arange(1000,10000,500)}
rfr = RandomForestRegressor(min_samples_split=2,min_samples_leaf=1,max_depth=8,max_features='sqrt',random_state=10)
gridSearch1 =  GridSearchCV(estimator = rfr, 
                       param_grid = param_test1, scoring='r2',n_jobs=4,iid=False, cv=5)
gridSearch1.fit(x_train,y_train)

GridSearchCV(cv=5, error_score='raise',
       estimator=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=8,
           max_features='sqrt', max_leaf_nodes=None,
           min_impurity_split=1e-07, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           n_estimators=10, n_jobs=1, oob_score=False, random_state=10,
           verbose=0, warm_start=False),
       fit_params={}, iid=False, n_jobs=4,
       param_grid={'n_estimators': array([1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000,
       6500, 7000, 7500, 8000, 8500, 9000, 9500])},
       pre_dispatch='2*n_jobs', refit=True, scoring='r2', verbose=0)

In [127]:
gridSearch1.grid_scores_, gridSearch1.best_params_ , gridSearch1.best_score_

([mean: 0.48082, std: 0.01574, params: {'n_estimators': 1000},
  mean: 0.48570, std: 0.01560, params: {'n_estimators': 1500},
  mean: 0.48465, std: 0.01530, params: {'n_estimators': 2000},
  mean: 0.48475, std: 0.01519, params: {'n_estimators': 2500},
  mean: 0.48500, std: 0.01495, params: {'n_estimators': 3000},
  mean: 0.48415, std: 0.01530, params: {'n_estimators': 3500},
  mean: 0.48442, std: 0.01527, params: {'n_estimators': 4000},
  mean: 0.48381, std: 0.01536, params: {'n_estimators': 4500},
  mean: 0.48366, std: 0.01545, params: {'n_estimators': 5000},
  mean: 0.48314, std: 0.01547, params: {'n_estimators': 5500},
  mean: 0.48250, std: 0.01538, params: {'n_estimators': 6000},
  mean: 0.48232, std: 0.01545, params: {'n_estimators': 6500},
  mean: 0.48238, std: 0.01544, params: {'n_estimators': 7000},
  mean: 0.48216, std: 0.01550, params: {'n_estimators': 7500},
  mean: 0.48204, std: 0.01549, params: {'n_estimators': 8000},
  mean: 0.48163, std: 0.01548, params: {'n_estimators':

In [8]:
# Tuning max_depth and min_samples_split

param_test2 =  {'max_depth':np.arange(7,9,1), 'min_samples_split':np.arange(1,11,2)}
rfr = RandomForestRegressor(n_estimators=1500,max_features='sqrt', random_state=0)
gridSearch2 = GridSearchCV(estimator = rfr , param_grid = param_test2 , scoring = 'r2',n_jobs = -1 , iid= False , cv =5)
gridSearch2.fit(x_train,y_train)

JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/root/anaconda3/lib/python3.5/runpy.py in _run_module_as_main(mod_name='ipykernel.__main__', alter_argv=1)
    165         sys.exit(msg)
    166     main_globals = sys.modules["__main__"].__dict__
    167     if alter_argv:
    168         sys.argv[0] = mod_spec.origin
    169     return _run_code(code, main_globals, None,
--> 170                      "__main__", mod_spec)
        mod_spec = ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py')
    171 
    172 def run_module(mod_name, init_globals=None,
    173                run_name=None, alter_sys=False):
    174     """Execute a module's code without importing it

...........................................................................
/root/anaconda3/lib/python3.5/runpy.py in _run_code(code=<code object <module> at 0x7f9255bca5d0, file "/...3.5/site-packages/ipykernel/__main__.py", line 1>, run_globals={'__builtins__': <module 'builtins' (built-in)>, '__cached__': '/root/anaconda3/lib/python3.5/site-packages/ipykernel/__pycache__/__main__.cpython-35.pyc', '__doc__': None, '__file__': '/root/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': 'ipykernel', '__spec__': ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), 'app': <module 'ipykernel.kernelapp' from '/root/anacon.../python3.5/site-packages/ipykernel/kernelapp.py'>}, init_globals=None, mod_name='__main__', mod_spec=ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), pkg_name='ipykernel', script_name=None)
     80                        __cached__ = cached,
     81                        __doc__ = None,
     82                        __loader__ = loader,
     83                        __package__ = pkg_name,
     84                        __spec__ = mod_spec)
---> 85     exec(code, run_globals)
        code = <code object <module> at 0x7f9255bca5d0, file "/...3.5/site-packages/ipykernel/__main__.py", line 1>
        run_globals = {'__builtins__': <module 'builtins' (built-in)>, '__cached__': '/root/anaconda3/lib/python3.5/site-packages/ipykernel/__pycache__/__main__.cpython-35.pyc', '__doc__': None, '__file__': '/root/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': 'ipykernel', '__spec__': ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), 'app': <module 'ipykernel.kernelapp' from '/root/anacon.../python3.5/site-packages/ipykernel/kernelapp.py'>}
     86     return run_globals
     87 
     88 def _run_module_code(code, init_globals=None,
     89                     mod_name=None, mod_spec=None,

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py in <module>()
      1 
      2 
----> 3 
      4 if __name__ == '__main__':
      5     from ipykernel import kernelapp as app
      6     app.launch_new_instance()
      7 
      8 
      9 
     10 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    587         
    588         If a global instance already exists, this reinitializes and starts it
    589         """
    590         app = cls.instance(**kwargs)
    591         app.initialize(argv)
--> 592         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    593 
    594 #-----------------------------------------------------------------------------
    595 # utility functions, for convenience
    596 #-----------------------------------------------------------------------------

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    398         
    399         if self.poller is not None:
    400             self.poller.start()
    401         self.kernel.start()
    402         try:
--> 403             ioloop.IOLoop.instance().start()
    404         except KeyboardInterrupt:
    405             pass
    406 
    407 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    146             PollIOLoop.configure(ZMQIOLoop)
    147         return PollIOLoop.instance()
    148     
    149     def start(self):
    150         try:
--> 151             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    152         except ZMQError as e:
    153             if e.errno == ETERM:
    154                 # quietly return on ETERM
    155                 pass

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    861                 self._events.update(event_pairs)
    862                 while self._events:
    863                     fd, events = self._events.popitem()
    864                     try:
    865                         fd_obj, handler_func = self._handlers[fd]
--> 866                         handler_func(fd_obj, events)
        handler_func = <function wrap.<locals>.null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    867                     except (OSError, IOError) as e:
    868                         if errno_from_exception(e) == errno.EPIPE:
    869                             # Happens when the client closes the connection
    870                             pass

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    428             # dispatch events:
    429             if events & IOLoop.ERROR:
    430                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    431                 return
    432             if events & IOLoop.READ:
--> 433                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    434                 if not self.socket:
    435                     return
    436             if events & IOLoop.WRITE:
    437                 self._handle_send()

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    460                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    461         else:
    462             if self._recv_callback:
    463                 callback = self._recv_callback
    464                 # self._recv_callback = None
--> 465                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function wrap.<locals>.null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    466                 
    467         # self.update_state()
    468         
    469 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function wrap.<locals>.null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    402         close our socket."""
    403         try:
    404             # Use a NullContext to ensure that all StackContexts are run
    405             # inside our blanket exception handler rather than outside.
    406             with stack_context.NullContext():
--> 407                 callback(*args, **kwargs)
        callback = <function wrap.<locals>.null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    408         except:
    409             gen_log.error("Uncaught exception, closing connection.",
    410                           exc_info=True)
    411             # Close the socket on an uncaught exception from a user callback

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    255         if self.control_stream:
    256             self.control_stream.on_recv(self.dispatch_control, copy=False)
    257 
    258         def make_dispatcher(stream):
    259             def dispatcher(msg):
--> 260                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    261             return dispatcher
    262 
    263         for s in self.shell_streams:
    264             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {'allow_stdin': True, 'code': '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2017-03-09T20:06:48.087588', 'msg_id': '9634A8315E1449F288A2EE8E8B8AE923', 'msg_type': 'execute_request', 'session': 'CE7ADA908716473F9D6167D438276510', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '9634A8315E1449F288A2EE8E8B8AE923', 'msg_type': 'execute_request', 'parent_header': {}})
    207             self.log.error("UNKNOWN MESSAGE TYPE: %r", msg_type)
    208         else:
    209             self.log.debug("%s: %s", msg_type, msg)
    210             self.pre_handler_hook()
    211             try:
--> 212                 handler(stream, idents, msg)
        handler = <bound method Kernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = [b'CE7ADA908716473F9D6167D438276510']
        msg = {'buffers': [], 'content': {'allow_stdin': True, 'code': '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2017-03-09T20:06:48.087588', 'msg_id': '9634A8315E1449F288A2EE8E8B8AE923', 'msg_type': 'execute_request', 'session': 'CE7ADA908716473F9D6167D438276510', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '9634A8315E1449F288A2EE8E8B8AE923', 'msg_type': 'execute_request', 'parent_header': {}}
    213             except Exception:
    214                 self.log.error("Exception in message handler:", exc_info=True)
    215             finally:
    216                 self.post_handler_hook()

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=[b'CE7ADA908716473F9D6167D438276510'], parent={'buffers': [], 'content': {'allow_stdin': True, 'code': '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2017-03-09T20:06:48.087588', 'msg_id': '9634A8315E1449F288A2EE8E8B8AE923', 'msg_type': 'execute_request', 'session': 'CE7ADA908716473F9D6167D438276510', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '9634A8315E1449F288A2EE8E8B8AE923', 'msg_type': 'execute_request', 'parent_header': {}})
    365         if not silent:
    366             self.execution_count += 1
    367             self._publish_execute_input(code, parent, self.execution_count)
    368 
    369         reply_content = self.do_execute(code, silent, store_history,
--> 370                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    371 
    372         # Flush output before sending the reply.
    373         sys.stdout.flush()
    374         sys.stderr.flush()

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code='# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    170 
    171         reply_content = {}
    172         # FIXME: the shell calls the exception handler itself.
    173         shell._reply_content = None
    174         try:
--> 175             shell.run_cell(code, store_history=store_history, silent=silent)
        shell.run_cell = <bound method InteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)'
        store_history = True
        silent = False
    176         except:
    177             status = u'error'
    178             # FIXME: this code right now isn't being used yet by default,
    179             # because the run_cell() call above directly fires off exception

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell='# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', store_history=True, silent=False, shell_futures=True)
   2897                 self.displayhook.exec_result = result
   2898 
   2899                 # Execute the user code
   2900                 interactivity = "none" if silent else self.ast_node_interactivity
   2901                 self.run_ast_nodes(code_ast.body, cell_name,
-> 2902                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler object>
   2903 
   2904                 # Reset this so later displayed values do not modify the
   2905                 # ExecutionResult
   2906                 self.displayhook.exec_result = None

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Expr object>], cell_name='<ipython-input-8-ca3d5ac98d81>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler object>, result=<IPython.core.interactiveshell.ExecutionResult object>)
   3007                     return True
   3008 
   3009             for i, node in enumerate(to_run_interactive):
   3010                 mod = ast.Interactive([node])
   3011                 code = compiler(mod, cell_name, "single")
-> 3012                 if self.run_code(code, result):
        self.run_code = <bound method InteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x7f92176896f0, file "<ipython-input-8-ca3d5ac98d81>", line 6>
        result = <IPython.core.interactiveshell.ExecutionResult object>
   3013                     return True
   3014 
   3015             # Flush softspace
   3016             if softspace(sys.stdout, 0):

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x7f92176896f0, file "<ipython-input-8-ca3d5ac98d81>", line 6>, result=<IPython.core.interactiveshell.ExecutionResult object>)
   3061         outflag = 1  # happens in more places, so it's easier as default
   3062         try:
   3063             try:
   3064                 self.hooks.pre_run_code_hook()
   3065                 #rprint('Running code', repr(code_obj)) # dbg
-> 3066                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x7f92176896f0, file "<ipython-input-8-ca3d5ac98d81>", line 6>
        self.user_global_ns = {'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', 'import warnings\nwarnings.filterwarnings("ignore"...tebook")\nget_ipython().magic(\'matplotlib inline\')', "train_df = pd.read_csv('train_cleaned.csv')\ntest...in_df.drop('SalePrice', axis = 1)\n\ntest_df.head()", 'y_train.head()\nfeat_labels = train_df.columns.values\n#feat_labels\nx_train.isnull().values.any()', 'from sklearn.ensemble import RandomForestRegress...ore_r2),np.min(cv_score_r2),np.max(cv_score_r2)))', 'from sklearn.grid_search import GridSearchCV', '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)'], 'Out': {3:    MSSubClass  LotFrontage   LotArea  OverallQua...  0                   0  

[5 rows x 254 columns], 4: False}, 'RandomForestRegressor': <class 'sklearn.ensemble.forest.RandomForestRegressor'>, '_': False, '_3':    MSSubClass  LotFrontage   LotArea  OverallQua...  0                   0  

[5 rows x 254 columns], '_4': False, '__':    MSSubClass  LotFrontage   LotArea  OverallQua...  0                   0  

[5 rows x 254 columns], '___': '', '__builtin__': <module 'builtins' (built-in)>, ...}
        self.user_ns = {'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', 'import warnings\nwarnings.filterwarnings("ignore"...tebook")\nget_ipython().magic(\'matplotlib inline\')', "train_df = pd.read_csv('train_cleaned.csv')\ntest...in_df.drop('SalePrice', axis = 1)\n\ntest_df.head()", 'y_train.head()\nfeat_labels = train_df.columns.values\n#feat_labels\nx_train.isnull().values.any()', 'from sklearn.ensemble import RandomForestRegress...ore_r2),np.min(cv_score_r2),np.max(cv_score_r2)))', 'from sklearn.grid_search import GridSearchCV', '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)', '# Tuning max_depth and min_samples_split\n\nparam_...= False , cv =5)\ngridSearch2.fit(x_train,y_train)'], 'Out': {3:    MSSubClass  LotFrontage   LotArea  OverallQua...  0                   0  

[5 rows x 254 columns], 4: False}, 'RandomForestRegressor': <class 'sklearn.ensemble.forest.RandomForestRegressor'>, '_': False, '_3':    MSSubClass  LotFrontage   LotArea  OverallQua...  0                   0  

[5 rows x 254 columns], '_4': False, '__':    MSSubClass  LotFrontage   LotArea  OverallQua...  0                   0  

[5 rows x 254 columns], '___': '', '__builtin__': <module 'builtins' (built-in)>, ...}
   3067             finally:
   3068                 # Reset our crash handler in place
   3069                 sys.excepthook = old_excepthook
   3070         except SystemExit as e:

...........................................................................
/home/saksham/Machine Learning/Kaggle/Housing Prices Prediction/<ipython-input-8-ca3d5ac98d81> in <module>()
      1 # Tuning max_depth and min_samples_split
      2 
      3 param_test2 =  {'max_depth':np.arange(7,9,1), 'min_samples_split':np.arange(1,11,2)}
      4 rfr = RandomForestRegressor(n_estimators=1500,max_features='sqrt', random_state=0)
      5 gridSearch2 = GridSearchCV(estimator = rfr , param_grid = param_test2 , scoring = 'r2',n_jobs = -1 , iid= False , cv =5)
----> 6 gridSearch2.fit(x_train,y_train)
      7 
      8 
      9 
     10 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/grid_search.py in fit(self=GridSearchCV(cv=5, error_score='raise',
       e...='2*n_jobs', refit=True, scoring='r2', verbose=0), X=      MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], y=0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64)
    824         y : array-like, shape = [n_samples] or [n_samples, n_output], optional
    825             Target relative to X for classification or regression;
    826             None for unsupervised learning.
    827 
    828         """
--> 829         return self._fit(X, y, ParameterGrid(self.param_grid))
        self._fit = <bound method BaseSearchCV._fit of GridSearchCV(...'2*n_jobs', refit=True, scoring='r2', verbose=0)>
        X =       MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns]
        y = 0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64
        self.param_grid = {'max_depth': array([7, 8]), 'min_samples_split': array([1, 3, 5, 7, 9])}
    830 
    831 
    832 class RandomizedSearchCV(BaseSearchCV):
    833     """Randomized search on hyper parameters.

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/grid_search.py in _fit(self=GridSearchCV(cv=5, error_score='raise',
       e...='2*n_jobs', refit=True, scoring='r2', verbose=0), X=      MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], y=0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64, parameter_iterable=<sklearn.grid_search.ParameterGrid object>)
    568         )(
    569             delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    570                                     train, test, self.verbose, parameters,
    571                                     self.fit_params, return_parameters=True,
    572                                     error_score=self.error_score)
--> 573                 for parameters in parameter_iterable
        parameters = undefined
        parameter_iterable = <sklearn.grid_search.ParameterGrid object>
    574                 for train, test in cv)
    575 
    576         # Out is a list of triplet: score, estimator, n_test_samples
    577         n_fits = len(out)

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object BaseSearchCV._fit.<locals>.<genexpr>>)
    763             if pre_dispatch == "all" or n_jobs == 1:
    764                 # The iterable was consumed all at once by the above for loop.
    765                 # No need to wait for async callbacks to trigger to
    766                 # consumption.
    767                 self._iterating = False
--> 768             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
    769             # Make sure that we get a last message telling us we are done
    770             elapsed_time = time.time() - self._start_time
    771             self._print('Done %3i out of %3i | elapsed: %s finished',
    772                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Thu Mar  9 20:06:51 2017
PID: 3372                         Python 3.5.0: /root/anaconda3/bin/python3
...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _fit_and_score>, (RandomForestRegressor(bootstrap=True, criterion=..._state=0,
           verbose=0, warm_start=False),       MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], 0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64, make_scorer(r2_score), array([ 292,  293,  294, ..., 1457, 1458, 1459]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 284, 285,
       286, 287, 288, 289, 290, 291]), 0, {'max_depth': 7, 'min_samples_split': 1}, {}), {'error_score': 'raise', 'return_parameters': True})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (RandomForestRegressor(bootstrap=True, criterion=..._state=0,
           verbose=0, warm_start=False),       MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], 0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64, make_scorer(r2_score), array([ 292,  293,  294, ..., 1457, 1458, 1459]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 284, 285,
       286, 287, 288, 289, 290, 291]), 0, {'max_depth': 7, 'min_samples_split': 1}, {})
        kwargs = {'error_score': 'raise', 'return_parameters': True}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/cross_validation.py in _fit_and_score(estimator=RandomForestRegressor(bootstrap=True, criterion=..._state=0,
           verbose=0, warm_start=False), X=      MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], y=0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64, scorer=make_scorer(r2_score), train=array([ 292,  293,  294, ..., 1457, 1458, 1459]), test=array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 284, 285,
       286, 287, 288, 289, 290, 291]), verbose=0, parameters={'max_depth': 7, 'min_samples_split': 1}, fit_params={}, return_train_score=False, return_parameters=True, error_score='raise')
   1660 
   1661     try:
   1662         if y_train is None:
   1663             estimator.fit(X_train, **fit_params)
   1664         else:
-> 1665             estimator.fit(X_train, y_train, **fit_params)
        estimator.fit = <bound method BaseForest.fit of RandomForestRegr...state=0,
           verbose=0, warm_start=False)>
        X_train =       MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1168 rows x 254 columns]
        y_train = 292     11.782960
293     12.367345
294     12.0...1459    11.901590
Name: SalePrice, dtype: float64
        fit_params = {}
   1666 
   1667     except Exception as e:
   1668         if error_score == 'raise':
   1669             raise

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestRegressor(bootstrap=True, criterion=..._state=0,
           verbose=0, warm_start=False), X=array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), sample_weight=None)
    321             trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
    322                              backend="threading")(
    323                 delayed(_parallel_build_trees)(
    324                     t, self, X, y, sample_weight, i, len(trees),
    325                     verbose=self.verbose, class_weight=self.class_weight)
--> 326                 for i, t in enumerate(trees))
        i = 1499
    327 
    328             # Collect newly grown trees
    329             self.estimators_.extend(trees)
    330 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=1), iterable=<generator object BaseForest.fit.<locals>.<genexpr>>)
    753         self.n_completed_tasks = 0
    754         try:
    755             # Only set self._iterating to True if at least a batch
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
        self.dispatch_one_batch = <bound method Parallel.dispatch_one_batch of Parallel(n_jobs=1)>
        iterator = <generator object BaseForest.fit.<locals>.<genexpr>>
    759                 self._iterating = True
    760             else:
    761                 self._iterating = False
    762 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self=Parallel(n_jobs=1), iterator=<generator object BaseForest.fit.<locals>.<genexpr>>)
    603             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    604             if len(tasks) == 0:
    605                 # No more tasks available in the iterator: tell caller to stop.
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
        self._dispatch = <bound method Parallel._dispatch of Parallel(n_jobs=1)>
        tasks = <sklearn.externals.joblib.parallel.BatchedCalls object>
    609                 return True
    610 
    611     def _print(self, msg, msg_args):
    612         """Display the message on stout or stderr depending on verbosity"""

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self=Parallel(n_jobs=1), batch=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    566         self.n_dispatched_tasks += len(batch)
    567         self.n_dispatched_batches += 1
    568 
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
        job = undefined
        self._backend.apply_async = <bound method SequentialBackend.apply_async of <...lib._parallel_backends.SequentialBackend object>>
        batch = <sklearn.externals.joblib.parallel.BatchedCalls object>
        cb = <sklearn.externals.joblib.parallel.BatchCompletionCallBack object>
    572         self._jobs.append(job)
    573 
    574     def dispatch_next(self):
    575         """Dispatch more data for parallel processing

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self=<sklearn.externals.joblib._parallel_backends.SequentialBackend object>, func=<sklearn.externals.joblib.parallel.BatchedCalls object>, callback=<sklearn.externals.joblib.parallel.BatchCompletionCallBack object>)
    104             raise ValueError('n_jobs == 0 in Parallel has no meaning')
    105         return 1
    106 
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
        result = undefined
        func = <sklearn.externals.joblib.parallel.BatchedCalls object>
    110         if callback:
    111             callback(result)
    112         return result
    113 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self=<sklearn.externals.joblib._parallel_backends.ImmediateResult object>, batch=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    321 
    322 class ImmediateResult(object):
    323     def __init__(self, batch):
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
        self.results = undefined
        batch = <sklearn.externals.joblib.parallel.BatchedCalls object>
    327 
    328     def get(self):
    329         return self.results
    330 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _parallel_build_trees>, (DecisionTreeRegressor(criterion='mse', max_depth...         random_state=209652396, splitter='best'), RandomForestRegressor(bootstrap=True, criterion=..._state=0,
           verbose=0, warm_start=False), array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), None, 0, 1500), {'class_weight': None, 'verbose': 0})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _parallel_build_trees>
        args = (DecisionTreeRegressor(criterion='mse', max_depth...         random_state=209652396, splitter='best'), RandomForestRegressor(bootstrap=True, criterion=..._state=0,
           verbose=0, warm_start=False), array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), None, 0, 1500)
        kwargs = {'class_weight': None, 'verbose': 0}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py in _parallel_build_trees(tree=DecisionTreeRegressor(criterion='mse', max_depth...         random_state=209652396, splitter='best'), forest=RandomForestRegressor(bootstrap=True, criterion=..._state=0,
           verbose=0, warm_start=False), X=array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), sample_weight=None, tree_idx=0, n_trees=1500, verbose=0, class_weight=None)
    115                 warnings.simplefilter('ignore', DeprecationWarning)
    116                 curr_sample_weight *= compute_sample_weight('auto', y, indices)
    117         elif class_weight == 'balanced_subsample':
    118             curr_sample_weight *= compute_sample_weight('balanced', y, indices)
    119 
--> 120         tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
        tree.fit = <bound method DecisionTreeRegressor.fit of Decis...        random_state=209652396, splitter='best')>
        X = array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32)
        y = array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]])
        sample_weight = None
        curr_sample_weight = array([ 1.,  1.,  1., ...,  0.,  2.,  1.])
    121     else:
    122         tree.fit(X, y, sample_weight=sample_weight, check_input=False)
    123 
    124     return tree

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py in fit(self=DecisionTreeRegressor(criterion='mse', max_depth...         random_state=209652396, splitter='best'), X=array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), sample_weight=array([ 1.,  1.,  1., ...,  0.,  2.,  1.]), check_input=False, X_idx_sorted=None)
   1024 
   1025         super(DecisionTreeRegressor, self).fit(
   1026             X, y,
   1027             sample_weight=sample_weight,
   1028             check_input=check_input,
-> 1029             X_idx_sorted=X_idx_sorted)
        X_idx_sorted = None
   1030         return self
   1031 
   1032 
   1033 class ExtraTreeClassifier(DecisionTreeClassifier):

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py in fit(self=DecisionTreeRegressor(criterion='mse', max_depth...         random_state=209652396, splitter='best'), X=array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), sample_weight=array([ 1.,  1.,  1., ...,  0.,  2.,  1.]), check_input=False, X_idx_sorted=None)
    194 
    195         if isinstance(self.min_samples_split, (numbers.Integral, np.integer)):
    196             if not 2 <= self.min_samples_split:
    197                 raise ValueError("min_samples_split must be at least 2 "
    198                                  "or in (0, 1], got %s"
--> 199                                  % self.min_samples_split)
        self.min_samples_split = 1
    200             min_samples_split = self.min_samples_split
    201         else:  # float
    202             if not 0. < self.min_samples_split <= 1.:
    203                 raise ValueError("min_samples_split must be at least 2 "

ValueError: min_samples_split must be at least 2 or in (0, 1], got 1
___________________________________________________________________________

In [118]:
gridSearch2.grid_scores_, gridSearch2.best_params_ , gridSearch2.best_score_

([mean: 0.70907, std: 0.01400, params: {'max_depth': 5, 'min_samples_split': 200},
  mean: 0.54627, std: 0.01892, params: {'max_depth': 5, 'min_samples_split': 400},
  mean: 0.46493, std: 0.01550, params: {'max_depth': 5, 'min_samples_split': 600},
  mean: -0.00338, std: 0.00473, params: {'max_depth': 5, 'min_samples_split': 800},
  mean: -0.00338, std: 0.00473, params: {'max_depth': 5, 'min_samples_split': 1000},
  mean: 0.71050, std: 0.01405, params: {'max_depth': 7, 'min_samples_split': 200},
  mean: 0.54627, std: 0.01892, params: {'max_depth': 7, 'min_samples_split': 400},
  mean: 0.46493, std: 0.01550, params: {'max_depth': 7, 'min_samples_split': 600},
  mean: -0.00338, std: 0.00473, params: {'max_depth': 7, 'min_samples_split': 800},
  mean: -0.00338, std: 0.00473, params: {'max_depth': 7, 'min_samples_split': 1000},
  mean: 0.71055, std: 0.01413, params: {'max_depth': 9, 'min_samples_split': 200},
  mean: 0.54627, std: 0.01892, params: {'max_depth': 9, 'min_samples_split': 400}

We obtain a really bad score here of 0.71. We will try come back to optimisation of min_sample_split later on . We choose max_depth = 9 as optimum paramter.

In [122]:
param_test1 =  {'min_samples_leaf':np.arange(1,71,6), 'min_samples_split':np.arange(1,200,20)}
rfr = RandomForestRegressor(n_estimators=60,max_depth = 9,max_features='sqrt', random_state=10)
gridSearch2 = GridSearchCV(estimator = rfr , param_grid = param_test1 , scoring = 'r2',n_jobs = -1 , iid= False , cv =5)
gridSearch2.fit(x_train,y_train)

JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/root/anaconda3/lib/python3.5/runpy.py in _run_module_as_main(mod_name='ipykernel.__main__', alter_argv=1)
    165         sys.exit(msg)
    166     main_globals = sys.modules["__main__"].__dict__
    167     if alter_argv:
    168         sys.argv[0] = mod_spec.origin
    169     return _run_code(code, main_globals, None,
--> 170                      "__main__", mod_spec)
        mod_spec = ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py')
    171 
    172 def run_module(mod_name, init_globals=None,
    173                run_name=None, alter_sys=False):
    174     """Execute a module's code without importing it

...........................................................................
/root/anaconda3/lib/python3.5/runpy.py in _run_code(code=<code object <module> at 0x7fe15c64e5d0, file "/...3.5/site-packages/ipykernel/__main__.py", line 1>, run_globals={'__builtins__': <module 'builtins' (built-in)>, '__cached__': '/root/anaconda3/lib/python3.5/site-packages/ipykernel/__pycache__/__main__.cpython-35.pyc', '__doc__': None, '__file__': '/root/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': 'ipykernel', '__spec__': ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), 'app': <module 'ipykernel.kernelapp' from '/root/anacon.../python3.5/site-packages/ipykernel/kernelapp.py'>}, init_globals=None, mod_name='__main__', mod_spec=ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), pkg_name='ipykernel', script_name=None)
     80                        __cached__ = cached,
     81                        __doc__ = None,
     82                        __loader__ = loader,
     83                        __package__ = pkg_name,
     84                        __spec__ = mod_spec)
---> 85     exec(code, run_globals)
        code = <code object <module> at 0x7fe15c64e5d0, file "/...3.5/site-packages/ipykernel/__main__.py", line 1>
        run_globals = {'__builtins__': <module 'builtins' (built-in)>, '__cached__': '/root/anaconda3/lib/python3.5/site-packages/ipykernel/__pycache__/__main__.cpython-35.pyc', '__doc__': None, '__file__': '/root/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': 'ipykernel', '__spec__': ModuleSpec(name='ipykernel.__main__', loader=<_f...b/python3.5/site-packages/ipykernel/__main__.py'), 'app': <module 'ipykernel.kernelapp' from '/root/anacon.../python3.5/site-packages/ipykernel/kernelapp.py'>}
     86     return run_globals
     87 
     88 def _run_module_code(code, init_globals=None,
     89                     mod_name=None, mod_spec=None,

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py in <module>()
      1 
      2 
----> 3 
      4 if __name__ == '__main__':
      5     from ipykernel import kernelapp as app
      6     app.launch_new_instance()
      7 
      8 
      9 
     10 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    587         
    588         If a global instance already exists, this reinitializes and starts it
    589         """
    590         app = cls.instance(**kwargs)
    591         app.initialize(argv)
--> 592         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    593 
    594 #-----------------------------------------------------------------------------
    595 # utility functions, for convenience
    596 #-----------------------------------------------------------------------------

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    398         
    399         if self.poller is not None:
    400             self.poller.start()
    401         self.kernel.start()
    402         try:
--> 403             ioloop.IOLoop.instance().start()
    404         except KeyboardInterrupt:
    405             pass
    406 
    407 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    146             PollIOLoop.configure(ZMQIOLoop)
    147         return PollIOLoop.instance()
    148     
    149     def start(self):
    150         try:
--> 151             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    152         except ZMQError as e:
    153             if e.errno == ETERM:
    154                 # quietly return on ETERM
    155                 pass

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    861                 self._events.update(event_pairs)
    862                 while self._events:
    863                     fd, events = self._events.popitem()
    864                     try:
    865                         fd_obj, handler_func = self._handlers[fd]
--> 866                         handler_func(fd_obj, events)
        handler_func = <function wrap.<locals>.null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    867                     except (OSError, IOError) as e:
    868                         if errno_from_exception(e) == errno.EPIPE:
    869                             # Happens when the client closes the connection
    870                             pass

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    428             # dispatch events:
    429             if events & IOLoop.ERROR:
    430                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    431                 return
    432             if events & IOLoop.READ:
--> 433                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    434                 if not self.socket:
    435                     return
    436             if events & IOLoop.WRITE:
    437                 self._handle_send()

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    460                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    461         else:
    462             if self._recv_callback:
    463                 callback = self._recv_callback
    464                 # self._recv_callback = None
--> 465                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function wrap.<locals>.null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    466                 
    467         # self.update_state()
    468         
    469 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function wrap.<locals>.null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    402         close our socket."""
    403         try:
    404             # Use a NullContext to ensure that all StackContexts are run
    405             # inside our blanket exception handler rather than outside.
    406             with stack_context.NullContext():
--> 407                 callback(*args, **kwargs)
        callback = <function wrap.<locals>.null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    408         except:
    409             gen_log.error("Uncaught exception, closing connection.",
    410                           exc_info=True)
    411             # Close the socket on an uncaught exception from a user callback

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    255         if self.control_stream:
    256             self.control_stream.on_recv(self.dispatch_control, copy=False)
    257 
    258         def make_dispatcher(stream):
    259             def dispatcher(msg):
--> 260                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    261             return dispatcher
    262 
    263         for s in self.shell_streams:
    264             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {'allow_stdin': True, 'code': "param_test1 =  {'min_samples_leaf':np.arange(1,7...= False , cv =5)\ngridSearch2.fit(x_train,y_train)", 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2017-03-09T18:33:55.373574', 'msg_id': '01116CC4A91C44B6BDDF93064FB3914E', 'msg_type': 'execute_request', 'session': 'E6EEAAEE6A364ACD9ADEF28027FA3F37', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '01116CC4A91C44B6BDDF93064FB3914E', 'msg_type': 'execute_request', 'parent_header': {}})
    207             self.log.error("UNKNOWN MESSAGE TYPE: %r", msg_type)
    208         else:
    209             self.log.debug("%s: %s", msg_type, msg)
    210             self.pre_handler_hook()
    211             try:
--> 212                 handler(stream, idents, msg)
        handler = <bound method Kernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = [b'E6EEAAEE6A364ACD9ADEF28027FA3F37']
        msg = {'buffers': [], 'content': {'allow_stdin': True, 'code': "param_test1 =  {'min_samples_leaf':np.arange(1,7...= False , cv =5)\ngridSearch2.fit(x_train,y_train)", 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2017-03-09T18:33:55.373574', 'msg_id': '01116CC4A91C44B6BDDF93064FB3914E', 'msg_type': 'execute_request', 'session': 'E6EEAAEE6A364ACD9ADEF28027FA3F37', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '01116CC4A91C44B6BDDF93064FB3914E', 'msg_type': 'execute_request', 'parent_header': {}}
    213             except Exception:
    214                 self.log.error("Exception in message handler:", exc_info=True)
    215             finally:
    216                 self.post_handler_hook()

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=[b'E6EEAAEE6A364ACD9ADEF28027FA3F37'], parent={'buffers': [], 'content': {'allow_stdin': True, 'code': "param_test1 =  {'min_samples_leaf':np.arange(1,7...= False , cv =5)\ngridSearch2.fit(x_train,y_train)", 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': '2017-03-09T18:33:55.373574', 'msg_id': '01116CC4A91C44B6BDDF93064FB3914E', 'msg_type': 'execute_request', 'session': 'E6EEAAEE6A364ACD9ADEF28027FA3F37', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '01116CC4A91C44B6BDDF93064FB3914E', 'msg_type': 'execute_request', 'parent_header': {}})
    365         if not silent:
    366             self.execution_count += 1
    367             self._publish_execute_input(code, parent, self.execution_count)
    368 
    369         reply_content = self.do_execute(code, silent, store_history,
--> 370                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    371 
    372         # Flush output before sending the reply.
    373         sys.stdout.flush()
    374         sys.stderr.flush()

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code="param_test1 =  {'min_samples_leaf':np.arange(1,7...= False , cv =5)\ngridSearch2.fit(x_train,y_train)", silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    170 
    171         reply_content = {}
    172         # FIXME: the shell calls the exception handler itself.
    173         shell._reply_content = None
    174         try:
--> 175             shell.run_cell(code, store_history=store_history, silent=silent)
        shell.run_cell = <bound method InteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = "param_test1 =  {'min_samples_leaf':np.arange(1,7...= False , cv =5)\ngridSearch2.fit(x_train,y_train)"
        store_history = True
        silent = False
    176         except:
    177             status = u'error'
    178             # FIXME: this code right now isn't being used yet by default,
    179             # because the run_cell() call above directly fires off exception

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell="param_test1 =  {'min_samples_leaf':np.arange(1,7...= False , cv =5)\ngridSearch2.fit(x_train,y_train)", store_history=True, silent=False, shell_futures=True)
   2897                 self.displayhook.exec_result = result
   2898 
   2899                 # Execute the user code
   2900                 interactivity = "none" if silent else self.ast_node_interactivity
   2901                 self.run_ast_nodes(code_ast.body, cell_name,
-> 2902                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler object>
   2903 
   2904                 # Reset this so later displayed values do not modify the
   2905                 # ExecutionResult
   2906                 self.displayhook.exec_result = None

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Expr object>], cell_name='<ipython-input-122-60c5fb63bcef>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler object>, result=<IPython.core.interactiveshell.ExecutionResult object>)
   3007                     return True
   3008 
   3009             for i, node in enumerate(to_run_interactive):
   3010                 mod = ast.Interactive([node])
   3011                 code = compiler(mod, cell_name, "single")
-> 3012                 if self.run_code(code, result):
        self.run_code = <bound method InteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x7fe11e458f60, file "<ipython-input-122-60c5fb63bcef>", line 4>
        result = <IPython.core.interactiveshell.ExecutionResult object>
   3013                     return True
   3014 
   3015             # Flush softspace
   3016             if softspace(sys.stdout, 0):

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x7fe11e458f60, file "<ipython-input-122-60c5fb63bcef>", line 4>, result=<IPython.core.interactiveshell.ExecutionResult object>)
   3061         outflag = 1  # happens in more places, so it's easier as default
   3062         try:
   3063             try:
   3064                 self.hooks.pre_run_code_hook()
   3065                 #rprint('Running code', repr(code_obj)) # dbg
-> 3066                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x7fe11e458f60, file "<ipython-input-122-60c5fb63bcef>", line 4>
        self.user_global_ns = {'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', "import pandas as pd\nimport numpy as np\n\ntrain_df... = pd.read_csv('test_cleaned.csv')\n\ntest_df.shape", "import numpy as np # linear algebra\nimport panda... = pd.read_csv('test_cleaned.csv')\n\ntest_df.shape", "train_df = pd.read_csv('train_cleaned.csv')\ntest... = pd.read_csv('test_cleaned.csv')\n\ntest_df.shape", 'import numpy as np # linear algebra\nimport panda...tebook")\nget_ipython().magic(\'matplotlib inline\')', 'from sklearn.ensemble import RandomForestRegressor', "train_df = pd.read_csv('train_cleaned.csv')\ntest...test_cleaned.csv')\n\ntest_df.shape\ntest_df.columns", "train_df = pd.read_csv('train_cleaned.csv')\ntest....csv')\n\ntest_df.shape\ntest_df['SalePrice'].head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest....csv')\n\ntest_df.shape\ntest_df['SalePrice'].head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest...= pd.read_csv('test_cleaned.csv')\n\ntest_df.head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest...= pd.read_csv('test_cleaned.csv')\n\ntest_df.head()", 'import numpy as np # linear algebra\nimport panda...tebook")\nget_ipython().magic(\'matplotlib inline\')', "train_df = pd.read_csv('train_cleaned.csv')\ntest...= pd.read_csv('test_cleaned.csv')\n\ntest_df.head()", 'import numpy as np # linear algebra\nimport panda...tebook")\nget_ipython().magic(\'matplotlib inline\')', "train_df = pd.read_csv('train_cleaned.csv')\ntest...= pd.read_csv('test_cleaned.csv')\n\ntest_df.head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest... pd.read_csv('test_cleaned.csv')\n\ntrain_df.head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest...Id']\ntrain_df.drop('Id',axis = 1, inplace = True)", "train_df = pd.read_csv('train_cleaned.csv')\ntest...p('Id',axis = 1, inplace = True)\n\ntrain_df.head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest...\ny_train = train_df['SalePrice']\n\ntrain_df.head()", '\ntrain_df.head()', ...], 'Out': {1: (1459, 255), 2: (1459, 255), 3: (1459, 255), 6: Index(['Id', 'MSSubClass', 'LotFrontage', 'LotAr...or2nd_CBlock'],
      dtype='object', length=255), 9:      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], 10:      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], 12:      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], 14:      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], 15:    Id  MSSubClass  LotFrontage   LotArea  Overal... 
4                   0  

[5 rows x 256 columns], 17:    MSSubClass  LotFrontage   LotArea  OverallQua...  0                   0  

[5 rows x 255 columns], ...}, 'RandomForestRegressor': <class 'sklearn.ensemble.forest.RandomForestRegressor'>, '_': ([mean: 0.80172, std: 0.01473, params: {'min_samples_leaf': 3, 'min_samples_split': 60}, mean: 0.77277, std: 0.01373, params: {'min_samples_leaf': 3, 'min_samples_split': 100}, mean: 0.74661, std: 0.01298, params: {'min_samples_leaf': 3, 'min_samples_split': 140}, mean: 0.72263, std: 0.01311, params: {'min_samples_leaf': 3, 'min_samples_split': 180}, mean: 0.78965, std: 0.00935, params: {'min_samples_leaf': 13, 'min_samples_split': 60}, mean: 0.76161, std: 0.01268, params: {'min_samples_leaf': 13, 'min_samples_split': 100}, mean: 0.74197, std: 0.01123, params: {'min_samples_leaf': 13, 'min_samples_split': 140}, mean: 0.71896, std: 0.01309, params: {'min_samples_leaf': 13, 'min_samples_split': 180}, mean: 0.77485, std: 0.01285, params: {'min_samples_leaf': 23, 'min_samples_split': 60}, mean: 0.75107, std: 0.01385, params: {'min_samples_leaf': 23, 'min_samples_split': 100}, mean: 0.73340, std: 0.01352, params: {'min_samples_leaf': 23, 'min_samples_split': 140}, mean: 0.71338, std: 0.01406, params: {'min_samples_leaf': 23, 'min_samples_split': 180}, mean: 0.75257, std: 0.01011, params: {'min_samples_leaf': 33, 'min_samples_split': 60}, mean: 0.74229, std: 0.01220, params: {'min_samples_leaf': 33, 'min_samples_split': 100}, mean: 0.72337, std: 0.01178, params: {'min_samples_leaf': 33, 'min_samples_split': 140}, mean: 0.70336, std: 0.01211, params: {'min_samples_leaf': 33, 'min_samples_split': 180}, mean: 0.73271, std: 0.01226, params: {'min_samples_leaf': 43, 'min_samples_split': 60}, mean: 0.72552, std: 0.01338, params: {'min_samples_leaf': 43, 'min_samples_split': 100}, mean: 0.71348, std: 0.01316, params: {'min_samples_leaf': 43, 'min_samples_split': 140}, mean: 0.69496, std: 0.01239, params: {'min_samples_leaf': 43, 'min_samples_split': 180}, ...], {'min_samples_leaf': 3, 'min_samples_split': 60}, 0.8017249264607569), '_1': (1459, 255), '_10':      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], '_113': {'max_depth': range(5, 16, 2), 'min_samples_split': range(200, 1001, 200)}, '_114': range(5, 16, 2), '_115': array([ 5,  7,  9, 11, 13, 15]), ...}
        self.user_ns = {'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', "import pandas as pd\nimport numpy as np\n\ntrain_df... = pd.read_csv('test_cleaned.csv')\n\ntest_df.shape", "import numpy as np # linear algebra\nimport panda... = pd.read_csv('test_cleaned.csv')\n\ntest_df.shape", "train_df = pd.read_csv('train_cleaned.csv')\ntest... = pd.read_csv('test_cleaned.csv')\n\ntest_df.shape", 'import numpy as np # linear algebra\nimport panda...tebook")\nget_ipython().magic(\'matplotlib inline\')', 'from sklearn.ensemble import RandomForestRegressor', "train_df = pd.read_csv('train_cleaned.csv')\ntest...test_cleaned.csv')\n\ntest_df.shape\ntest_df.columns", "train_df = pd.read_csv('train_cleaned.csv')\ntest....csv')\n\ntest_df.shape\ntest_df['SalePrice'].head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest....csv')\n\ntest_df.shape\ntest_df['SalePrice'].head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest...= pd.read_csv('test_cleaned.csv')\n\ntest_df.head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest...= pd.read_csv('test_cleaned.csv')\n\ntest_df.head()", 'import numpy as np # linear algebra\nimport panda...tebook")\nget_ipython().magic(\'matplotlib inline\')', "train_df = pd.read_csv('train_cleaned.csv')\ntest...= pd.read_csv('test_cleaned.csv')\n\ntest_df.head()", 'import numpy as np # linear algebra\nimport panda...tebook")\nget_ipython().magic(\'matplotlib inline\')', "train_df = pd.read_csv('train_cleaned.csv')\ntest...= pd.read_csv('test_cleaned.csv')\n\ntest_df.head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest... pd.read_csv('test_cleaned.csv')\n\ntrain_df.head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest...Id']\ntrain_df.drop('Id',axis = 1, inplace = True)", "train_df = pd.read_csv('train_cleaned.csv')\ntest...p('Id',axis = 1, inplace = True)\n\ntrain_df.head()", "train_df = pd.read_csv('train_cleaned.csv')\ntest...\ny_train = train_df['SalePrice']\n\ntrain_df.head()", '\ntrain_df.head()', ...], 'Out': {1: (1459, 255), 2: (1459, 255), 3: (1459, 255), 6: Index(['Id', 'MSSubClass', 'LotFrontage', 'LotAr...or2nd_CBlock'],
      dtype='object', length=255), 9:      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], 10:      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], 12:      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], 14:      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], 15:    Id  MSSubClass  LotFrontage   LotArea  Overal... 
4                   0  

[5 rows x 256 columns], 17:    MSSubClass  LotFrontage   LotArea  OverallQua...  0                   0  

[5 rows x 255 columns], ...}, 'RandomForestRegressor': <class 'sklearn.ensemble.forest.RandomForestRegressor'>, '_': ([mean: 0.80172, std: 0.01473, params: {'min_samples_leaf': 3, 'min_samples_split': 60}, mean: 0.77277, std: 0.01373, params: {'min_samples_leaf': 3, 'min_samples_split': 100}, mean: 0.74661, std: 0.01298, params: {'min_samples_leaf': 3, 'min_samples_split': 140}, mean: 0.72263, std: 0.01311, params: {'min_samples_leaf': 3, 'min_samples_split': 180}, mean: 0.78965, std: 0.00935, params: {'min_samples_leaf': 13, 'min_samples_split': 60}, mean: 0.76161, std: 0.01268, params: {'min_samples_leaf': 13, 'min_samples_split': 100}, mean: 0.74197, std: 0.01123, params: {'min_samples_leaf': 13, 'min_samples_split': 140}, mean: 0.71896, std: 0.01309, params: {'min_samples_leaf': 13, 'min_samples_split': 180}, mean: 0.77485, std: 0.01285, params: {'min_samples_leaf': 23, 'min_samples_split': 60}, mean: 0.75107, std: 0.01385, params: {'min_samples_leaf': 23, 'min_samples_split': 100}, mean: 0.73340, std: 0.01352, params: {'min_samples_leaf': 23, 'min_samples_split': 140}, mean: 0.71338, std: 0.01406, params: {'min_samples_leaf': 23, 'min_samples_split': 180}, mean: 0.75257, std: 0.01011, params: {'min_samples_leaf': 33, 'min_samples_split': 60}, mean: 0.74229, std: 0.01220, params: {'min_samples_leaf': 33, 'min_samples_split': 100}, mean: 0.72337, std: 0.01178, params: {'min_samples_leaf': 33, 'min_samples_split': 140}, mean: 0.70336, std: 0.01211, params: {'min_samples_leaf': 33, 'min_samples_split': 180}, mean: 0.73271, std: 0.01226, params: {'min_samples_leaf': 43, 'min_samples_split': 60}, mean: 0.72552, std: 0.01338, params: {'min_samples_leaf': 43, 'min_samples_split': 100}, mean: 0.71348, std: 0.01316, params: {'min_samples_leaf': 43, 'min_samples_split': 140}, mean: 0.69496, std: 0.01239, params: {'min_samples_leaf': 43, 'min_samples_split': 180}, ...], {'min_samples_leaf': 3, 'min_samples_split': 60}, 0.8017249264607569), '_1': (1459, 255), '_10':      Id  MSSubClass  LotFrontage   LotArea  Over...  0                   0  

[5 rows x 255 columns], '_113': {'max_depth': range(5, 16, 2), 'min_samples_split': range(200, 1001, 200)}, '_114': range(5, 16, 2), '_115': array([ 5,  7,  9, 11, 13, 15]), ...}
   3067             finally:
   3068                 # Reset our crash handler in place
   3069                 sys.excepthook = old_excepthook
   3070         except SystemExit as e:

...........................................................................
/home/saksham/Machine Learning/Kaggle/Housing Prices Prediction/<ipython-input-122-60c5fb63bcef> in <module>()
      1 
      2 
      3 param_test1 =  {'min_samples_leaf':np.arange(1,71,6), 'min_samples_split':np.arange(1,200,20)}
----> 4 rfr = RandomForestRegressor(n_estimators=60,max_depth = 9,max_features='sqrt', random_state=10)
      5 gridSearch2 = GridSearchCV(estimator = rfr , param_grid = param_test1 , scoring = 'r2',n_jobs = -1 , iid= False , cv =5)
      6 gridSearch2.fit(x_train,y_train)
      7 
      8 
      9 
     10 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/grid_search.py in fit(self=GridSearchCV(cv=5, error_score='raise',
       e...='2*n_jobs', refit=True, scoring='r2', verbose=0), X=      MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], y=0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64)
    824         y : array-like, shape = [n_samples] or [n_samples, n_output], optional
    825             Target relative to X for classification or regression;
    826             None for unsupervised learning.
    827 
    828         """
--> 829         return self._fit(X, y, ParameterGrid(self.param_grid))
        self._fit = <bound method BaseSearchCV._fit of GridSearchCV(...'2*n_jobs', refit=True, scoring='r2', verbose=0)>
        X =       MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns]
        y = 0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64
        self.param_grid = {'min_samples_leaf': array([ 1,  7, 13, 19, 25, 31, 37, 43, 49, 55, 61, 67]), 'min_samples_split': array([  1,  21,  41,  61,  81, 101, 121, 141, 161, 181])}
    830 
    831 
    832 class RandomizedSearchCV(BaseSearchCV):
    833     """Randomized search on hyper parameters.

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/grid_search.py in _fit(self=GridSearchCV(cv=5, error_score='raise',
       e...='2*n_jobs', refit=True, scoring='r2', verbose=0), X=      MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], y=0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64, parameter_iterable=<sklearn.grid_search.ParameterGrid object>)
    568         )(
    569             delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    570                                     train, test, self.verbose, parameters,
    571                                     self.fit_params, return_parameters=True,
    572                                     error_score=self.error_score)
--> 573                 for parameters in parameter_iterable
        parameters = undefined
        parameter_iterable = <sklearn.grid_search.ParameterGrid object>
    574                 for train, test in cv)
    575 
    576         # Out is a list of triplet: score, estimator, n_test_samples
    577         n_fits = len(out)

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object BaseSearchCV._fit.<locals>.<genexpr>>)
    763             if pre_dispatch == "all" or n_jobs == 1:
    764                 # The iterable was consumed all at once by the above for loop.
    765                 # No need to wait for async callbacks to trigger to
    766                 # consumption.
    767                 self._iterating = False
--> 768             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
    769             # Make sure that we get a last message telling us we are done
    770             elapsed_time = time.time() - self._start_time
    771             self._print('Done %3i out of %3i | elapsed: %s finished',
    772                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Thu Mar  9 18:33:57 2017
PID: 1577                         Python 3.5.0: /root/anaconda3/bin/python3
...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _fit_and_score>, (RandomForestRegressor(bootstrap=True, criterion=...state=10,
           verbose=0, warm_start=False),       MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], 0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64, make_scorer(r2_score), array([ 292,  293,  294, ..., 1457, 1458, 1459]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 284, 285,
       286, 287, 288, 289, 290, 291]), 0, {'min_samples_leaf': 1, 'min_samples_split': 1}, {}), {'error_score': 'raise', 'return_parameters': True})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (RandomForestRegressor(bootstrap=True, criterion=...state=10,
           verbose=0, warm_start=False),       MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], 0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64, make_scorer(r2_score), array([ 292,  293,  294, ..., 1457, 1458, 1459]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 284, 285,
       286, 287, 288, 289, 290, 291]), 0, {'min_samples_leaf': 1, 'min_samples_split': 1}, {})
        kwargs = {'error_score': 'raise', 'return_parameters': True}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/cross_validation.py in _fit_and_score(estimator=RandomForestRegressor(bootstrap=True, criterion=...state=10,
           verbose=0, warm_start=False), X=      MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1460 rows x 254 columns], y=0       12.247699
1       12.109016
2       12.3...1459    11.901590
Name: SalePrice, dtype: float64, scorer=make_scorer(r2_score), train=array([ 292,  293,  294, ..., 1457, 1458, 1459]), test=array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 284, 285,
       286, 287, 288, 289, 290, 291]), verbose=0, parameters={'min_samples_leaf': 1, 'min_samples_split': 1}, fit_params={}, return_train_score=False, return_parameters=True, error_score='raise')
   1660 
   1661     try:
   1662         if y_train is None:
   1663             estimator.fit(X_train, **fit_params)
   1664         else:
-> 1665             estimator.fit(X_train, y_train, **fit_params)
        estimator.fit = <bound method BaseForest.fit of RandomForestRegr...tate=10,
           verbose=0, warm_start=False)>
        X_train =       MSSubClass  LotFrontage    LotArea  Overal...                   0  

[1168 rows x 254 columns]
        y_train = 292     11.782960
293     12.367345
294     12.0...1459    11.901590
Name: SalePrice, dtype: float64
        fit_params = {}
   1666 
   1667     except Exception as e:
   1668         if error_score == 'raise':
   1669             raise

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestRegressor(bootstrap=True, criterion=...state=10,
           verbose=0, warm_start=False), X=array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), sample_weight=None)
    321             trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
    322                              backend="threading")(
    323                 delayed(_parallel_build_trees)(
    324                     t, self, X, y, sample_weight, i, len(trees),
    325                     verbose=self.verbose, class_weight=self.class_weight)
--> 326                 for i, t in enumerate(trees))
        i = 59
    327 
    328             # Collect newly grown trees
    329             self.estimators_.extend(trees)
    330 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=1), iterable=<generator object BaseForest.fit.<locals>.<genexpr>>)
    753         self.n_completed_tasks = 0
    754         try:
    755             # Only set self._iterating to True if at least a batch
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
        self.dispatch_one_batch = <bound method Parallel.dispatch_one_batch of Parallel(n_jobs=1)>
        iterator = <generator object BaseForest.fit.<locals>.<genexpr>>
    759                 self._iterating = True
    760             else:
    761                 self._iterating = False
    762 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self=Parallel(n_jobs=1), iterator=<generator object BaseForest.fit.<locals>.<genexpr>>)
    603             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    604             if len(tasks) == 0:
    605                 # No more tasks available in the iterator: tell caller to stop.
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
        self._dispatch = <bound method Parallel._dispatch of Parallel(n_jobs=1)>
        tasks = <sklearn.externals.joblib.parallel.BatchedCalls object>
    609                 return True
    610 
    611     def _print(self, msg, msg_args):
    612         """Display the message on stout or stderr depending on verbosity"""

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self=Parallel(n_jobs=1), batch=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    566         self.n_dispatched_tasks += len(batch)
    567         self.n_dispatched_batches += 1
    568 
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
        job = undefined
        self._backend.apply_async = <bound method SequentialBackend.apply_async of <...lib._parallel_backends.SequentialBackend object>>
        batch = <sklearn.externals.joblib.parallel.BatchedCalls object>
        cb = <sklearn.externals.joblib.parallel.BatchCompletionCallBack object>
    572         self._jobs.append(job)
    573 
    574     def dispatch_next(self):
    575         """Dispatch more data for parallel processing

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self=<sklearn.externals.joblib._parallel_backends.SequentialBackend object>, func=<sklearn.externals.joblib.parallel.BatchedCalls object>, callback=<sklearn.externals.joblib.parallel.BatchCompletionCallBack object>)
    104             raise ValueError('n_jobs == 0 in Parallel has no meaning')
    105         return 1
    106 
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
        result = undefined
        func = <sklearn.externals.joblib.parallel.BatchedCalls object>
    110         if callback:
    111             callback(result)
    112         return result
    113 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self=<sklearn.externals.joblib._parallel_backends.ImmediateResult object>, batch=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    321 
    322 class ImmediateResult(object):
    323     def __init__(self, batch):
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
        self.results = undefined
        batch = <sklearn.externals.joblib.parallel.BatchedCalls object>
    327 
    328     def get(self):
    329         return self.results
    330 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _parallel_build_trees>, (DecisionTreeRegressor(criterion='mse', max_depth...        random_state=1165313289, splitter='best'), RandomForestRegressor(bootstrap=True, criterion=...state=10,
           verbose=0, warm_start=False), array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), None, 0, 60), {'class_weight': None, 'verbose': 0})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _parallel_build_trees>
        args = (DecisionTreeRegressor(criterion='mse', max_depth...        random_state=1165313289, splitter='best'), RandomForestRegressor(bootstrap=True, criterion=...state=10,
           verbose=0, warm_start=False), array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), None, 0, 60)
        kwargs = {'class_weight': None, 'verbose': 0}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py in _parallel_build_trees(tree=DecisionTreeRegressor(criterion='mse', max_depth...        random_state=1165313289, splitter='best'), forest=RandomForestRegressor(bootstrap=True, criterion=...state=10,
           verbose=0, warm_start=False), X=array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), sample_weight=None, tree_idx=0, n_trees=60, verbose=0, class_weight=None)
    115                 warnings.simplefilter('ignore', DeprecationWarning)
    116                 curr_sample_weight *= compute_sample_weight('auto', y, indices)
    117         elif class_weight == 'balanced_subsample':
    118             curr_sample_weight *= compute_sample_weight('balanced', y, indices)
    119 
--> 120         tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
        tree.fit = <bound method DecisionTreeRegressor.fit of Decis...       random_state=1165313289, splitter='best')>
        X = array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32)
        y = array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]])
        sample_weight = None
        curr_sample_weight = array([ 0.,  0.,  0., ...,  0.,  1.,  1.])
    121     else:
    122         tree.fit(X, y, sample_weight=sample_weight, check_input=False)
    123 
    124     return tree

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py in fit(self=DecisionTreeRegressor(criterion='mse', max_depth...        random_state=1165313289, splitter='best'), X=array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), sample_weight=array([ 0.,  0.,  0., ...,  0.,  1.,  1.]), check_input=False, X_idx_sorted=None)
   1024 
   1025         super(DecisionTreeRegressor, self).fit(
   1026             X, y,
   1027             sample_weight=sample_weight,
   1028             check_input=check_input,
-> 1029             X_idx_sorted=X_idx_sorted)
        X_idx_sorted = None
   1030         return self
   1031 
   1032 
   1033 class ExtraTreeClassifier(DecisionTreeClassifier):

...........................................................................
/root/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py in fit(self=DecisionTreeRegressor(criterion='mse', max_depth...        random_state=1165313289, splitter='best'), X=array([[ 3.93182564,  4.1108737 ,  9.3422451 , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 11.78296024],
       [ 12.36734505],
  ...],
       [ 11.86446927],
       [ 11.90159023]]), sample_weight=array([ 0.,  0.,  0., ...,  0.,  1.,  1.]), check_input=False, X_idx_sorted=None)
    194 
    195         if isinstance(self.min_samples_split, (numbers.Integral, np.integer)):
    196             if not 2 <= self.min_samples_split:
    197                 raise ValueError("min_samples_split must be at least 2 "
    198                                  "or in (0, 1], got %s"
--> 199                                  % self.min_samples_split)
        self.min_samples_split = 1
    200             min_samples_split = self.min_samples_split
    201         else:  # float
    202             if not 0. < self.min_samples_split <= 1.:
    203                 raise ValueError("min_samples_split must be at least 2 "

ValueError: min_samples_split must be at least 2 or in (0, 1], got 1
___________________________________________________________________________

In [120]:
gridSearch2.grid_scores_, gridSearch2.best_params_ , gridSearch2.best_score_

([mean: 0.80172, std: 0.01473, params: {'min_samples_leaf': 3, 'min_samples_split': 60},
  mean: 0.77277, std: 0.01373, params: {'min_samples_leaf': 3, 'min_samples_split': 100},
  mean: 0.74661, std: 0.01298, params: {'min_samples_leaf': 3, 'min_samples_split': 140},
  mean: 0.72263, std: 0.01311, params: {'min_samples_leaf': 3, 'min_samples_split': 180},
  mean: 0.78965, std: 0.00935, params: {'min_samples_leaf': 13, 'min_samples_split': 60},
  mean: 0.76161, std: 0.01268, params: {'min_samples_leaf': 13, 'min_samples_split': 100},
  mean: 0.74197, std: 0.01123, params: {'min_samples_leaf': 13, 'min_samples_split': 140},
  mean: 0.71896, std: 0.01309, params: {'min_samples_leaf': 13, 'min_samples_split': 180},
  mean: 0.77485, std: 0.01285, params: {'min_samples_leaf': 23, 'min_samples_split': 60},
  mean: 0.75107, std: 0.01385, params: {'min_samples_leaf': 23, 'min_samples_split': 100},
  mean: 0.73340, std: 0.01352, params: {'min_samples_leaf': 23, 'min_samples_split': 140},
  mean