In this chapter we
will discuss a simple, nonlinear model for classification and regression tasks: the
decision tree. We'll use decision trees to build an ad blocker that can learn to classify
images on a web page as banner advertisements or page content. Finally, we will
introduce ensemble learning methods, which combine a set of models to produce an
estimator with better predictive performance than any of its component estimators.

# Decision Trees with scikit-learn

Let's use decision trees to create software that can block banner ads on web pages.
This program will predict whether each of the images on a web page is an
advertisement or article content. Images that are classified as being advertisements
could then be hidden using Cascading Style Sheets. We will train a decision tree
classifier using the Internet Advertisements Data Set from http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements, which contains data for 3,279 images.
The proportions of the classes are skewed; 459 of the images are advertisements and
2,820 are content. Decision tree learning algorithms can produce biased trees from data
with unbalanced class proportions; we will evaluate a model on the unaltered data set
before deciding if it is worth balancing the training data by over- or under-sampling
instances. The explanatory variables are the dimensions of the image, words from the
containing page's URL, words from the image's URL, the image's alt text, the image's
anchor text, and a window of words surrounding the image tag. The response variable
is the image's class. The explanatory variables have already been transformed into
feature representations. The first three features are real numbers that encode the width,
height, and aspect ratio of the images. The remaining features encode binary term
frequencies for the text variables. In the following sample, we will grid search for the
hyperparameter values that produce the decision tree with the greatest accuracy,
and then evaluate the tree's performance on a test set:

In [3]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV

if __name__ == '__main__':
    df = pd.read_csv('data/ad.data', header=None)
    explanatory_variable_columns = set(df.columns.values)
    response_variable_column = df[len(df.columns.values)-1]
    
    # The last column describes the targets
    explanatory_variable_columns.remove(len(df.columns.values)-1)
    
    # Encode advertisment as positive class and content as negative class
    y = [1 if e == 'ad.' else 0 for e in response_variable_column]
    X = df[list(explanatory_variable_columns)]
    
    '''
    More than one quarter of the instances are missing at least one of the values
    for the image's dimensions. These missing values are marked by whitespace and a
    question mark. We replaced the missing values with negative one, but we could have
    imputed the missing values; for instance, we could have replaced the missing height
    values with the average height value:
    '''
    X.replace(to_replace=' *\?', value=-1, regex=True, inplace=True)
    
    # Split data into training and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    
    # set the criterion keyword argument to entropy to build the tree 
    # using the informationgain heuristic
    pipeline = Pipeline([('clf', DecisionTreeClassifier(criterion='entropy'))])
    
    # specify hyperparameters for grid search
    parameters = {
            'clf__max_depth': (150, 155, 160),
            'clf__min_samples_split': (1, 2, 3),
            'clf__min_samples_leaf': (1, 2, 3)
            }
    
    # set GridSearchCV to maximise the models f1 score
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='f1')
    grid_search.fit(X_train, y_train)
    print 'Best score: %0.3f' % grid_search.best_score_
    print 'Best parameters set:'
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print '\t%s: %r' % (param_name, best_parameters[param_name])
        
    predictions = grid_search.predict(X_test)
    print classification_report(y_test, predictions)
    
    

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Fitting 3 folds for each of 27 candidates, totalling 81 fits


JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
C:\ProgramData\Anaconda2\lib\runpy.py in _run_module_as_main(mod_name='ipykernel_launcher', alter_argv=1)
    169     pkg_name = mod_name.rpartition('.')[0]
    170     main_globals = sys.modules["__main__"].__dict__
    171     if alter_argv:
    172         sys.argv[0] = fname
    173     return _run_code(code, main_globals, None,
--> 174                      "__main__", fname, loader, pkg_name)
        fname = r'C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py'
        loader = <pkgutil.ImpLoader instance>
        pkg_name = ''
    175 
    176 def run_module(mod_name, init_globals=None,
    177                run_name=None, alter_sys=False):
    178     """Execute a module's code without importing it

...........................................................................
C:\ProgramData\Anaconda2\lib\runpy.py in _run_code(code=<code object <module> at 00000000026BECB0, file ...lib\site-packages\ipykernel_launcher.py", line 5>, run_globals={'__builtins__': <module '__builtin__' (built-in)>, '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': r'C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': '', 'app': <module 'ipykernel.kernelapp' from 'C:\ProgramDa...conda2\lib\site-packages\ipykernel\kernelapp.py'>, 'sys': <module 'sys' (built-in)>}, init_globals=None, mod_name='__main__', mod_fname=r'C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py', mod_loader=<pkgutil.ImpLoader instance>, pkg_name='')
     67         run_globals.update(init_globals)
     68     run_globals.update(__name__ = mod_name,
     69                        __file__ = mod_fname,
     70                        __loader__ = mod_loader,
     71                        __package__ = pkg_name)
---> 72     exec code in run_globals
        code = <code object <module> at 00000000026BECB0, file ...lib\site-packages\ipykernel_launcher.py", line 5>
        run_globals = {'__builtins__': <module '__builtin__' (built-in)>, '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': r'C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': '', 'app': <module 'ipykernel.kernelapp' from 'C:\ProgramDa...conda2\lib\site-packages\ipykernel\kernelapp.py'>, 'sys': <module 'sys' (built-in)>}
     73     return run_globals
     74 
     75 def _run_module_code(code, init_globals=None,
     76                     mod_name=None, mod_fname=None,

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py in <module>()
     11     # This is added back by InteractiveShellApp.init_path()
     12     if sys.path[0] == '':
     13         del sys.path[0]
     14 
     15     from ipykernel import kernelapp as app
---> 16     app.launch_new_instance()
     17 
     18 
     19 
     20 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\traitlets\config\application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    653 
    654         If a global instance already exists, this reinitializes and starts it
    655         """
    656         app = cls.instance(**kwargs)
    657         app.initialize(argv)
--> 658         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    659 
    660 #-----------------------------------------------------------------------------
    661 # utility functions, for convenience
    662 #-----------------------------------------------------------------------------

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    472             return self.subapp.start()
    473         if self.poller is not None:
    474             self.poller.start()
    475         self.kernel.start()
    476         try:
--> 477             ioloop.IOLoop.instance().start()
    478         except KeyboardInterrupt:
    479             pass
    480 
    481 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\zmq\eventloop\ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    172             )
    173         return loop
    174     
    175     def start(self):
    176         try:
--> 177             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    178         except ZMQError as e:
    179             if e.errno == ETERM:
    180                 # quietly return on ETERM
    181                 pass

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\tornado\ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    883                 self._events.update(event_pairs)
    884                 while self._events:
    885                     fd, events = self._events.popitem()
    886                     try:
    887                         fd_obj, handler_func = self._handlers[fd]
--> 888                         handler_func(fd_obj, events)
        handler_func = <function null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    889                     except (OSError, IOError) as e:
    890                         if errno_from_exception(e) == errno.EPIPE:
    891                             # Happens when the client closes the connection
    892                             pass

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\tornado\stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\zmq\eventloop\zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\zmq\eventloop\zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\zmq\eventloop\zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\tornado\stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    278         if self.control_stream:
    279             self.control_stream.on_recv(self.dispatch_control, copy=False)
    280 
    281         def make_dispatcher(stream):
    282             def dispatcher(msg):
--> 283                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    284             return dispatcher
    285 
    286         for s in self.shell_streams:
    287             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {u'allow_stdin': True, u'code': u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 7, 1, 22, 8, 51, 814000, tzinfo=tzutc()), u'msg_id': u'4F5A6B4786D64796B5E2A6A6237AE4D4', u'msg_type': u'execute_request', u'session': u'99A14FBC9F5F453C90E5BD8018ADA22D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'4F5A6B4786D64796B5E2A6A6237AE4D4', 'msg_type': u'execute_request', 'parent_header': {}})
    230             self.log.warn("Unknown message type: %r", msg_type)
    231         else:
    232             self.log.debug("%s: %s", msg_type, msg)
    233             self.pre_handler_hook()
    234             try:
--> 235                 handler(stream, idents, msg)
        handler = <bound method IPythonKernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = ['99A14FBC9F5F453C90E5BD8018ADA22D']
        msg = {'buffers': [], 'content': {u'allow_stdin': True, u'code': u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 7, 1, 22, 8, 51, 814000, tzinfo=tzutc()), u'msg_id': u'4F5A6B4786D64796B5E2A6A6237AE4D4', u'msg_type': u'execute_request', u'session': u'99A14FBC9F5F453C90E5BD8018ADA22D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'4F5A6B4786D64796B5E2A6A6237AE4D4', 'msg_type': u'execute_request', 'parent_header': {}}
    236             except Exception:
    237                 self.log.error("Exception in message handler:", exc_info=True)
    238             finally:
    239                 self.post_handler_hook()

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=['99A14FBC9F5F453C90E5BD8018ADA22D'], parent={'buffers': [], 'content': {u'allow_stdin': True, u'code': u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 7, 1, 22, 8, 51, 814000, tzinfo=tzutc()), u'msg_id': u'4F5A6B4786D64796B5E2A6A6237AE4D4', u'msg_type': u'execute_request', u'session': u'99A14FBC9F5F453C90E5BD8018ADA22D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'4F5A6B4786D64796B5E2A6A6237AE4D4', 'msg_type': u'execute_request', 'parent_header': {}})
    394         if not silent:
    395             self.execution_count += 1
    396             self._publish_execute_input(code, parent, self.execution_count)
    397 
    398         reply_content = self.do_execute(code, silent, store_history,
--> 399                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    400 
    401         # Flush output before sending the reply.
    402         sys.stdout.flush()
    403         sys.stderr.flush()

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code=u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    "
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=(u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ",), **kwargs={'silent': False, 'store_history': True})
    528             )
    529         self.payload_manager.write_payload(payload)
    530 
    531     def run_cell(self, *args, **kwargs):
    532         self._last_traceback = None
--> 533         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = (u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ",)
        kwargs = {'silent': False, 'store_history': True}
    534 
    535     def _showtraceback(self, etype, evalue, stb):
    536         # try to preserve ordering of tracebacks and print statements
    537         sys.stdout.flush()

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell=u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", store_history=True, silent=False, shell_futures=True)
   2712                 self.displayhook.exec_result = result
   2713 
   2714                 # Execute the user code
   2715                 interactivity = "none" if silent else self.ast_node_interactivity
   2716                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2717                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler instance>
   2718                 
   2719                 self.last_execution_succeeded = not has_raised
   2720 
   2721                 # Reset this so later displayed values do not modify the

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Import object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.If object>], cell_name='<ipython-input-3-628a811f7bea>', interactivity='none', compiler=<IPython.core.compilerop.CachingCompiler instance>, result=<ExecutionResult object at 105e2fd0, execution_c..._before_exec=None error_in_exec=None result=None>)
   2816 
   2817         try:
   2818             for i, node in enumerate(to_run_exec):
   2819                 mod = ast.Module([node])
   2820                 code = compiler(mod, cell_name, "exec")
-> 2821                 if self.run_code(code, result):
        self.run_code = <bound method ZMQInteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 000000000FB66330, file "<ipython-input-3-628a811f7bea>", line 8>
        result = <ExecutionResult object at 105e2fd0, execution_c..._before_exec=None error_in_exec=None result=None>
   2822                     return True
   2823 
   2824             for i, node in enumerate(to_run_interactive):
   2825                 mod = ast.Interactive([node])

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 000000000FB66330, file "<ipython-input-3-628a811f7bea>", line 8>, result=<ExecutionResult object at 105e2fd0, execution_c..._before_exec=None error_in_exec=None result=None>)
   2876         outflag = 1  # happens in more places, so it's easier as default
   2877         try:
   2878             try:
   2879                 self.hooks.pre_run_code_hook()
   2880                 #rprint('Running code', repr(code_obj)) # dbg
-> 2881                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 000000000FB66330, file "<ipython-input-3-628a811f7bea>", line 8>
        self.user_global_ns = {'DecisionTreeClassifier': <class 'sklearn.tree.tree.DecisionTreeClassifier'>, 'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', u'print "g"', u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    "], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'X':       0     1       2    3     4     5     6    ...0     0     0     0  

[3279 rows x 1558 columns], 'X_test':       0     1       2    3     4     5     6    ... 0     0     0     0  

[820 rows x 1558 columns], 'X_train':       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], '_': '', '__': '', ...}
        self.user_ns = {'DecisionTreeClassifier': <class 'sklearn.tree.tree.DecisionTreeClassifier'>, 'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', u'print "g"', u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    "], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'X':       0     1       2    3     4     5     6    ...0     0     0     0  

[3279 rows x 1558 columns], 'X_test':       0     1       2    3     4     5     6    ... 0     0     0     0  

[820 rows x 1558 columns], 'X_train':       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], '_': '', '__': '', ...}
   2882             finally:
   2883                 # Reset our crash handler in place
   2884                 sys.excepthook = old_excepthook
   2885         except SystemExit as e:

...........................................................................
C:\Users\Satvik Sachdev\Documents\GitHub\Machine Learning\Mastering Machine Learning with Scikit Learn\Chapter 5 Nonlinear Classification and Regression with Decision Trees\<ipython-input-3-628a811f7bea> in <module>()
     40             'clf__min_samples_leaf': (1, 2, 3)
     41             }
     42     
     43     # set GridSearchCV to maximise the models f1 score
     44     grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='f1')
---> 45     grid_search.fit(X_train, y_train)
     46     print 'Best score: %0.3f' % grid_search.best_score_
     47     print 'Best parameters set:'
     48     best_parameters = grid_search.best_estimator_.get_params()
     49     for param_name in sorted(parameters.keys()):

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\grid_search.py in fit(self=GridSearchCV(cv=None, error_score='raise',
     ...='2*n_jobs', refit=True, scoring='f1', verbose=1), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], y=[0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...])
    824         y : array-like, shape = [n_samples] or [n_samples, n_output], optional
    825             Target relative to X for classification or regression;
    826             None for unsupervised learning.
    827 
    828         """
--> 829         return self._fit(X, y, ParameterGrid(self.param_grid))
        self._fit = <bound method GridSearchCV._fit of GridSearchCV(...'2*n_jobs', refit=True, scoring='f1', verbose=1)>
        X =       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns]
        y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...]
        self.param_grid = {'clf__max_depth': (150, 155, 160), 'clf__min_samples_leaf': (1, 2, 3), 'clf__min_samples_split': (1, 2, 3)}
    830 
    831 
    832 class RandomizedSearchCV(BaseSearchCV):
    833     """Randomized search on hyper parameters.

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\grid_search.py in _fit(self=GridSearchCV(cv=None, error_score='raise',
     ...='2*n_jobs', refit=True, scoring='f1', verbose=1), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], y=[0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...], parameter_iterable=<sklearn.grid_search.ParameterGrid object>)
    568         )(
    569             delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    570                                     train, test, self.verbose, parameters,
    571                                     self.fit_params, return_parameters=True,
    572                                     error_score=self.error_score)
--> 573                 for parameters in parameter_iterable
        parameters = undefined
        parameter_iterable = <sklearn.grid_search.ParameterGrid object>
    574                 for train, test in cv)
    575 
    576         # Out is a list of triplet: score, estimator, n_test_samples
    577         n_fits = len(out)

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object <genexpr>>)
    763             if pre_dispatch == "all" or n_jobs == 1:
    764                 # The iterable was consumed all at once by the above for loop.
    765                 # No need to wait for async callbacks to trigger to
    766                 # consumption.
    767                 self._iterating = False
--> 768             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
    769             # Make sure that we get a last message telling us we are done
    770             elapsed_time = time.time() - self._start_time
    771             self._print('Done %3i out of %3i | elapsed: %s finished',
    772                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Sat Jul 01 15:09:02 2017
PID: 17600               Python 2.7.13: C:\ProgramData\Anaconda2\python.exe
...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (Pipeline(steps=[('clf', DecisionTreeClassifier(c...ort=False, random_state=None, splitter='best'))]),       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...], make_scorer(f1_score), array([ 806,  807,  808, ..., 2456, 2457, 2458]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ...32, 840, 846, 847, 852, 859, 865, 874, 887, 890]), 1, {'clf__max_depth': 150, 'clf__min_samples_leaf': 1, 'clf__min_samples_split': 1}, {})
        kwargs = {'error_score': 'raise', 'return_parameters': True}
        self.items = [(<function _fit_and_score>, (Pipeline(steps=[('clf', DecisionTreeClassifier(c...ort=False, random_state=None, splitter='best'))]),       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...], make_scorer(f1_score), array([ 806,  807,  808, ..., 2456, 2457, 2458]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ...32, 840, 846, 847, 852, 859, 865, 874, 887, 890]), 1, {'clf__max_depth': 150, 'clf__min_samples_leaf': 1, 'clf__min_samples_split': 1}, {}), {'error_score': 'raise', 'return_parameters': True})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\cross_validation.py in _fit_and_score(estimator=Pipeline(steps=[('clf', DecisionTreeClassifier(c...ort=False, random_state=None, splitter='best'))]), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], y=[0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...], scorer=make_scorer(f1_score), train=array([ 806,  807,  808, ..., 2456, 2457, 2458]), test=array([  0,   1,   2,   3,   4,   5,   6,   7,  ...32, 840, 846, 847, 852, 859, 865, 874, 887, 890]), verbose=1, parameters={'clf__max_depth': 150, 'clf__min_samples_leaf': 1, 'clf__min_samples_split': 1}, fit_params={}, return_train_score=False, return_parameters=True, error_score='raise')
   1660 
   1661     try:
   1662         if y_train is None:
   1663             estimator.fit(X_train, **fit_params)
   1664         else:
-> 1665             estimator.fit(X_train, y_train, **fit_params)
        estimator.fit = <bound method Pipeline.fit of Pipeline(steps=[('...rt=False, random_state=None, splitter='best'))])>
        X_train =       0     1       2    3     4     5     6    ...0     0     0     0  

[1639 rows x 1558 columns]
        y_train = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
        fit_params = {}
   1666 
   1667     except Exception as e:
   1668         if error_score == 'raise':
   1669             raise

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\pipeline.py in fit(self=Pipeline(steps=[('clf', DecisionTreeClassifier(c...ort=False, random_state=None, splitter='best'))]), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[1639 rows x 1558 columns], y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], **fit_params={})
    265         self : Pipeline
    266             This estimator
    267         """
    268         Xt, fit_params = self._fit(X, y, **fit_params)
    269         if self._final_estimator is not None:
--> 270             self._final_estimator.fit(Xt, y, **fit_params)
        self._final_estimator.fit = <bound method DecisionTreeClassifier.fit of Deci...esort=False, random_state=None, splitter='best')>
        Xt =       0     1       2    3     4     5     6    ...0     0     0     0  

[1639 rows x 1558 columns]
        y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
        fit_params = {}
    271         return self
    272 
    273     def fit_transform(self, X, y=None, **fit_params):
    274         """Fit the model and transform with the final estimator

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\tree\tree.py in fit(self=DecisionTreeClassifier(class_weight=None, criter...resort=False, random_state=None, splitter='best'), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[1639 rows x 1558 columns], y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], sample_weight=None, check_input=True, X_idx_sorted=None)
    734 
    735         super(DecisionTreeClassifier, self).fit(
    736             X, y,
    737             sample_weight=sample_weight,
    738             check_input=check_input,
--> 739             X_idx_sorted=X_idx_sorted)
        X_idx_sorted = None
    740         return self
    741 
    742 
    743     def predict_proba(self, X, check_input=True):

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\tree\tree.py in fit(self=DecisionTreeClassifier(class_weight=None, criter...resort=False, random_state=None, splitter='best'), X=array([[ 77.        ,  77.        ,   1.        ...       0.        ,   0.        ]], dtype=float32), y=array([[ 0.],
       [ 0.],
       [ 0.],
       ..., 
       [ 0.],
       [ 0.],
       [ 0.]]), sample_weight=None, check_input=True, X_idx_sorted=None)
    194 
    195         if isinstance(self.min_samples_split, (numbers.Integral, np.integer)):
    196             if not 2 <= self.min_samples_split:
    197                 raise ValueError("min_samples_split must be at least 2 "
    198                                  "or in (0, 1], got %s"
--> 199                                  % self.min_samples_split)
        self.min_samples_split = 1
    200             min_samples_split = self.min_samples_split
    201         else:  # float
    202             if not 0. < self.min_samples_split <= 1.:
    203                 raise ValueError("min_samples_split must be at least 2 "

ValueError: min_samples_split must be at least 2 or in (0, 1], got 1
___________________________________________________________________________

# Tree ensembles

In [4]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

if __name__ == '__main__':
    df = pd.read_csv('data/ad.data', header=None)
    explanatory_variable_columns = set(df.columns.values)
    response_variable_column = df[len(df.columns.values)-1]
    
    # The last column describes the targets
    explanatory_variable_columns.remove(len(df.columns.values)-1)
    
    # Encode advertisment as positive class and content as negative class
    y = [1 if e == 'ad.' else 0 for e in response_variable_column]
    X = df[list(explanatory_variable_columns)]
    
    '''
    More than one quarter of the instances are missing at least one of the values
    for the image's dimensions. These missing values are marked by whitespace and a
    question mark. We replaced the missing values with negative one, but we could have
    imputed the missing values; for instance, we could have replaced the missing height
    values with the average height value:
    '''
    X.replace(to_replace=' *\?', value=-1, regex=True, inplace=True)
    
    # Split data into training and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    
    # set the criterion keyword argument to entropy to build the tree 
    # using the informationgain heuristic
    pipeline = Pipeline([('clf', RandomForestClassifier(criterion='entropy'))])
    
    # specify hyperparameters for grid search
    parameters = {
        'clf__n_estimators': (5, 10, 20, 50),
        'clf__max_depth': (50, 150, 250),
        'clf__min_samples_split': (1, 2, 3),
        'clf__min_samples_leaf': (1, 2, 3)
        }
    # set GridSearchCV to maximise the models f1 score
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='f1')
    grid_search.fit(X_train, y_train)
    print 'Best score: %0.3f' % grid_search.best_score_
    print 'Best parameters set:'
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print '\t%s: %r' % (param_name, best_parameters[param_name])
        
    predictions = grid_search.predict(X_test)
    print classification_report(y_test, predictions)
    
    

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Fitting 3 folds for each of 108 candidates, totalling 324 fits


JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
C:\ProgramData\Anaconda2\lib\runpy.py in _run_module_as_main(mod_name='ipykernel_launcher', alter_argv=1)
    169     pkg_name = mod_name.rpartition('.')[0]
    170     main_globals = sys.modules["__main__"].__dict__
    171     if alter_argv:
    172         sys.argv[0] = fname
    173     return _run_code(code, main_globals, None,
--> 174                      "__main__", fname, loader, pkg_name)
        fname = r'C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py'
        loader = <pkgutil.ImpLoader instance>
        pkg_name = ''
    175 
    176 def run_module(mod_name, init_globals=None,
    177                run_name=None, alter_sys=False):
    178     """Execute a module's code without importing it

...........................................................................
C:\ProgramData\Anaconda2\lib\runpy.py in _run_code(code=<code object <module> at 00000000026BECB0, file ...lib\site-packages\ipykernel_launcher.py", line 5>, run_globals={'__builtins__': <module '__builtin__' (built-in)>, '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': r'C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': '', 'app': <module 'ipykernel.kernelapp' from 'C:\ProgramDa...conda2\lib\site-packages\ipykernel\kernelapp.py'>, 'sys': <module 'sys' (built-in)>}, init_globals=None, mod_name='__main__', mod_fname=r'C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py', mod_loader=<pkgutil.ImpLoader instance>, pkg_name='')
     67         run_globals.update(init_globals)
     68     run_globals.update(__name__ = mod_name,
     69                        __file__ = mod_fname,
     70                        __loader__ = mod_loader,
     71                        __package__ = pkg_name)
---> 72     exec code in run_globals
        code = <code object <module> at 00000000026BECB0, file ...lib\site-packages\ipykernel_launcher.py", line 5>
        run_globals = {'__builtins__': <module '__builtin__' (built-in)>, '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': r'C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': '', 'app': <module 'ipykernel.kernelapp' from 'C:\ProgramDa...conda2\lib\site-packages\ipykernel\kernelapp.py'>, 'sys': <module 'sys' (built-in)>}
     73     return run_globals
     74 
     75 def _run_module_code(code, init_globals=None,
     76                     mod_name=None, mod_fname=None,

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel_launcher.py in <module>()
     11     # This is added back by InteractiveShellApp.init_path()
     12     if sys.path[0] == '':
     13         del sys.path[0]
     14 
     15     from ipykernel import kernelapp as app
---> 16     app.launch_new_instance()
     17 
     18 
     19 
     20 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\traitlets\config\application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    653 
    654         If a global instance already exists, this reinitializes and starts it
    655         """
    656         app = cls.instance(**kwargs)
    657         app.initialize(argv)
--> 658         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    659 
    660 #-----------------------------------------------------------------------------
    661 # utility functions, for convenience
    662 #-----------------------------------------------------------------------------

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    472             return self.subapp.start()
    473         if self.poller is not None:
    474             self.poller.start()
    475         self.kernel.start()
    476         try:
--> 477             ioloop.IOLoop.instance().start()
    478         except KeyboardInterrupt:
    479             pass
    480 
    481 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\zmq\eventloop\ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    172             )
    173         return loop
    174     
    175     def start(self):
    176         try:
--> 177             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    178         except ZMQError as e:
    179             if e.errno == ETERM:
    180                 # quietly return on ETERM
    181                 pass

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\tornado\ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    883                 self._events.update(event_pairs)
    884                 while self._events:
    885                     fd, events = self._events.popitem()
    886                     try:
    887                         fd_obj, handler_func = self._handlers[fd]
--> 888                         handler_func(fd_obj, events)
        handler_func = <function null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    889                     except (OSError, IOError) as e:
    890                         if errno_from_exception(e) == errno.EPIPE:
    891                             # Happens when the client closes the connection
    892                             pass

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\tornado\stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\zmq\eventloop\zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\zmq\eventloop\zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\zmq\eventloop\zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\tornado\stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    278         if self.control_stream:
    279             self.control_stream.on_recv(self.dispatch_control, copy=False)
    280 
    281         def make_dispatcher(stream):
    282             def dispatcher(msg):
--> 283                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    284             return dispatcher
    285 
    286         for s in self.shell_streams:
    287             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {u'allow_stdin': True, u'code': u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 7, 1, 22, 11, 26, 173000, tzinfo=tzutc()), u'msg_id': u'560B17249AB04CB28F5D788CB6FEFC76', u'msg_type': u'execute_request', u'session': u'99A14FBC9F5F453C90E5BD8018ADA22D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'560B17249AB04CB28F5D788CB6FEFC76', 'msg_type': u'execute_request', 'parent_header': {}})
    230             self.log.warn("Unknown message type: %r", msg_type)
    231         else:
    232             self.log.debug("%s: %s", msg_type, msg)
    233             self.pre_handler_hook()
    234             try:
--> 235                 handler(stream, idents, msg)
        handler = <bound method IPythonKernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = ['99A14FBC9F5F453C90E5BD8018ADA22D']
        msg = {'buffers': [], 'content': {u'allow_stdin': True, u'code': u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 7, 1, 22, 11, 26, 173000, tzinfo=tzutc()), u'msg_id': u'560B17249AB04CB28F5D788CB6FEFC76', u'msg_type': u'execute_request', u'session': u'99A14FBC9F5F453C90E5BD8018ADA22D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'560B17249AB04CB28F5D788CB6FEFC76', 'msg_type': u'execute_request', 'parent_header': {}}
    236             except Exception:
    237                 self.log.error("Exception in message handler:", exc_info=True)
    238             finally:
    239                 self.post_handler_hook()

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=['99A14FBC9F5F453C90E5BD8018ADA22D'], parent={'buffers': [], 'content': {u'allow_stdin': True, u'code': u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 7, 1, 22, 11, 26, 173000, tzinfo=tzutc()), u'msg_id': u'560B17249AB04CB28F5D788CB6FEFC76', u'msg_type': u'execute_request', u'session': u'99A14FBC9F5F453C90E5BD8018ADA22D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'560B17249AB04CB28F5D788CB6FEFC76', 'msg_type': u'execute_request', 'parent_header': {}})
    394         if not silent:
    395             self.execution_count += 1
    396             self._publish_execute_input(code, parent, self.execution_count)
    397 
    398         reply_content = self.do_execute(code, silent, store_history,
--> 399                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    400 
    401         # Flush output before sending the reply.
    402         sys.stdout.flush()
    403         sys.stderr.flush()

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code=u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    "
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\ipykernel\zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=(u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ",), **kwargs={'silent': False, 'store_history': True})
    528             )
    529         self.payload_manager.write_payload(payload)
    530 
    531     def run_cell(self, *args, **kwargs):
    532         self._last_traceback = None
--> 533         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = (u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ",)
        kwargs = {'silent': False, 'store_history': True}
    534 
    535     def _showtraceback(self, etype, evalue, stb):
    536         # try to preserve ordering of tracebacks and print statements
    537         sys.stdout.flush()

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell=u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", store_history=True, silent=False, shell_futures=True)
   2712                 self.displayhook.exec_result = result
   2713 
   2714                 # Execute the user code
   2715                 interactivity = "none" if silent else self.ast_node_interactivity
   2716                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2717                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler instance>
   2718                 
   2719                 self.last_execution_succeeded = not has_raised
   2720 
   2721                 # Reset this so later displayed values do not modify the

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Import object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.ImportFrom object>, <_ast.If object>], cell_name='<ipython-input-4-567e9303e245>', interactivity='none', compiler=<IPython.core.compilerop.CachingCompiler instance>, result=<ExecutionResult object at 5d6dc18, execution_co..._before_exec=None error_in_exec=None result=None>)
   2816 
   2817         try:
   2818             for i, node in enumerate(to_run_exec):
   2819                 mod = ast.Module([node])
   2820                 code = compiler(mod, cell_name, "exec")
-> 2821                 if self.run_code(code, result):
        self.run_code = <bound method ZMQInteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 000000000FB66630, file "<ipython-input-4-567e9303e245>", line 9>
        result = <ExecutionResult object at 5d6dc18, execution_co..._before_exec=None error_in_exec=None result=None>
   2822                     return True
   2823 
   2824             for i, node in enumerate(to_run_interactive):
   2825                 mod = ast.Interactive([node])

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 000000000FB66630, file "<ipython-input-4-567e9303e245>", line 9>, result=<ExecutionResult object at 5d6dc18, execution_co..._before_exec=None error_in_exec=None result=None>)
   2876         outflag = 1  # happens in more places, so it's easier as default
   2877         try:
   2878             try:
   2879                 self.hooks.pre_run_code_hook()
   2880                 #rprint('Running code', repr(code_obj)) # dbg
-> 2881                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 000000000FB66630, file "<ipython-input-4-567e9303e245>", line 9>
        self.user_global_ns = {'DecisionTreeClassifier': <class 'sklearn.tree.tree.DecisionTreeClassifier'>, 'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', u'print "g"', u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    "], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'RandomForestClassifier': <class 'sklearn.ensemble.forest.RandomForestClassifier'>, 'X':       0     1       2    3     4     5     6    ...0     0     0     0  

[3279 rows x 1558 columns], 'X_test':       0     1        2    3     4     5     6   ... 0     0     0     0  

[820 rows x 1558 columns], 'X_train':       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], '_': '', ...}
        self.user_ns = {'DecisionTreeClassifier': <class 'sklearn.tree.tree.DecisionTreeClassifier'>, 'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', u'print "g"', u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    ", u"import pandas as pd\nfrom sklearn.tree import ...fication_report(y_test, predictions)\n    \n    "], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'RandomForestClassifier': <class 'sklearn.ensemble.forest.RandomForestClassifier'>, 'X':       0     1       2    3     4     5     6    ...0     0     0     0  

[3279 rows x 1558 columns], 'X_test':       0     1        2    3     4     5     6   ... 0     0     0     0  

[820 rows x 1558 columns], 'X_train':       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], '_': '', ...}
   2882             finally:
   2883                 # Reset our crash handler in place
   2884                 sys.excepthook = old_excepthook
   2885         except SystemExit as e:

...........................................................................
C:\Users\Satvik Sachdev\Documents\GitHub\Machine Learning\Mastering Machine Learning with Scikit Learn\Chapter 5 Nonlinear Classification and Regression with Decision Trees\<ipython-input-4-567e9303e245> in <module>()
     41         'clf__min_samples_split': (1, 2, 3),
     42         'clf__min_samples_leaf': (1, 2, 3)
     43         }
     44     # set GridSearchCV to maximise the models f1 score
     45     grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='f1')
---> 46     grid_search.fit(X_train, y_train)
     47     print 'Best score: %0.3f' % grid_search.best_score_
     48     print 'Best parameters set:'
     49     best_parameters = grid_search.best_estimator_.get_params()
     50     for param_name in sorted(parameters.keys()):

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\grid_search.py in fit(self=GridSearchCV(cv=None, error_score='raise',
     ...='2*n_jobs', refit=True, scoring='f1', verbose=1), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], y=[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...])
    824         y : array-like, shape = [n_samples] or [n_samples, n_output], optional
    825             Target relative to X for classification or regression;
    826             None for unsupervised learning.
    827 
    828         """
--> 829         return self._fit(X, y, ParameterGrid(self.param_grid))
        self._fit = <bound method GridSearchCV._fit of GridSearchCV(...'2*n_jobs', refit=True, scoring='f1', verbose=1)>
        X =       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns]
        y = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...]
        self.param_grid = {'clf__max_depth': (50, 150, 250), 'clf__min_samples_leaf': (1, 2, 3), 'clf__min_samples_split': (1, 2, 3), 'clf__n_estimators': (5, 10, 20, 50)}
    830 
    831 
    832 class RandomizedSearchCV(BaseSearchCV):
    833     """Randomized search on hyper parameters.

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\grid_search.py in _fit(self=GridSearchCV(cv=None, error_score='raise',
     ...='2*n_jobs', refit=True, scoring='f1', verbose=1), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], y=[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...], parameter_iterable=<sklearn.grid_search.ParameterGrid object>)
    568         )(
    569             delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    570                                     train, test, self.verbose, parameters,
    571                                     self.fit_params, return_parameters=True,
    572                                     error_score=self.error_score)
--> 573                 for parameters in parameter_iterable
        parameters = undefined
        parameter_iterable = <sklearn.grid_search.ParameterGrid object>
    574                 for train, test in cv)
    575 
    576         # Out is a list of triplet: score, estimator, n_test_samples
    577         n_fits = len(out)

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object <genexpr>>)
    763             if pre_dispatch == "all" or n_jobs == 1:
    764                 # The iterable was consumed all at once by the above for loop.
    765                 # No need to wait for async callbacks to trigger to
    766                 # consumption.
    767                 self._iterating = False
--> 768             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
    769             # Make sure that we get a last message telling us we are done
    770             elapsed_time = time.time() - self._start_time
    771             self._print('Done %3i out of %3i | elapsed: %s finished',
    772                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Sat Jul 01 15:11:37 2017
PID: 4688                Python 2.7.13: C:\ProgramData\Anaconda2\python.exe
...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (Pipeline(steps=[('clf', RandomForestClassifier(b...None,
            verbose=0, warm_start=False))]),       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...], make_scorer(f1_score), array([ 813,  814,  815, ..., 2456, 2457, 2458]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 834, 835, 841, 851, 859, 863,
       870, 873]), 1, {'clf__max_depth': 50, 'clf__min_samples_leaf': 1, 'clf__min_samples_split': 1, 'clf__n_estimators': 5}, {})
        kwargs = {'error_score': 'raise', 'return_parameters': True}
        self.items = [(<function _fit_and_score>, (Pipeline(steps=[('clf', RandomForestClassifier(b...None,
            verbose=0, warm_start=False))]),       0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...], make_scorer(f1_score), array([ 813,  814,  815, ..., 2456, 2457, 2458]), array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 834, 835, 841, 851, 859, 863,
       870, 873]), 1, {'clf__max_depth': 50, 'clf__min_samples_leaf': 1, 'clf__min_samples_split': 1, 'clf__n_estimators': 5}, {}), {'error_score': 'raise', 'return_parameters': True})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\cross_validation.py in _fit_and_score(estimator=Pipeline(steps=[('clf', RandomForestClassifier(b...None,
            verbose=0, warm_start=False))]), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[2459 rows x 1558 columns], y=[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...], scorer=make_scorer(f1_score), train=array([ 813,  814,  815, ..., 2456, 2457, 2458]), test=array([  0,   1,   2,   3,   4,   5,   6,   7,  ..., 834, 835, 841, 851, 859, 863,
       870, 873]), verbose=1, parameters={'clf__max_depth': 50, 'clf__min_samples_leaf': 1, 'clf__min_samples_split': 1, 'clf__n_estimators': 5}, fit_params={}, return_train_score=False, return_parameters=True, error_score='raise')
   1660 
   1661     try:
   1662         if y_train is None:
   1663             estimator.fit(X_train, **fit_params)
   1664         else:
-> 1665             estimator.fit(X_train, y_train, **fit_params)
        estimator.fit = <bound method Pipeline.fit of Pipeline(steps=[('...one,
            verbose=0, warm_start=False))])>
        X_train =       0     1       2    3     4     5     6    ...0     0     0     0  

[1638 rows x 1558 columns]
        y_train = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
        fit_params = {}
   1666 
   1667     except Exception as e:
   1668         if error_score == 'raise':
   1669             raise

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\pipeline.py in fit(self=Pipeline(steps=[('clf', RandomForestClassifier(b...None,
            verbose=0, warm_start=False))]), X=      0     1       2    3     4     5     6    ...0     0     0     0  

[1638 rows x 1558 columns], y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], **fit_params={})
    265         self : Pipeline
    266             This estimator
    267         """
    268         Xt, fit_params = self._fit(X, y, **fit_params)
    269         if self._final_estimator is not None:
--> 270             self._final_estimator.fit(Xt, y, **fit_params)
        self._final_estimator.fit = <bound method RandomForestClassifier.fit of Rand...e=None,
            verbose=0, warm_start=False)>
        Xt =       0     1       2    3     4     5     6    ...0     0     0     0  

[1638 rows x 1558 columns]
        y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
        fit_params = {}
    271         return self
    272 
    273     def fit_transform(self, X, y=None, **fit_params):
    274         """Fit the model and transform with the final estimator

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\ensemble\forest.py in fit(self=RandomForestClassifier(bootstrap=True, class_wei...te=None,
            verbose=0, warm_start=False), X=array([[  -1.        ,   -1.        ,   -1.     ...      0.        ,    0.        ]], dtype=float32), y=array([[ 0.],
       [ 0.],
       [ 0.],
       ..., 
       [ 1.],
       [ 0.],
       [ 0.]]), sample_weight=None)
    321             trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
    322                              backend="threading")(
    323                 delayed(_parallel_build_trees)(
    324                     t, self, X, y, sample_weight, i, len(trees),
    325                     verbose=self.verbose, class_weight=self.class_weight)
--> 326                 for i, t in enumerate(trees))
        i = 4
    327 
    328             # Collect newly grown trees
    329             self.estimators_.extend(trees)
    330 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=Parallel(n_jobs=1), iterable=<generator object <genexpr>>)
    753         self.n_completed_tasks = 0
    754         try:
    755             # Only set self._iterating to True if at least a batch
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
        self.dispatch_one_batch = <bound method Parallel.dispatch_one_batch of Parallel(n_jobs=1)>
        iterator = <generator object <genexpr>>
    759                 self._iterating = True
    760             else:
    761                 self._iterating = False
    762 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self=Parallel(n_jobs=1), iterator=<generator object <genexpr>>)
    603             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    604             if len(tasks) == 0:
    605                 # No more tasks available in the iterator: tell caller to stop.
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
        self._dispatch = <bound method Parallel._dispatch of Parallel(n_jobs=1)>
        tasks = <sklearn.externals.joblib.parallel.BatchedCalls object>
    609                 return True
    610 
    611     def _print(self, msg, msg_args):
    612         """Display the message on stout or stderr depending on verbosity"""

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self=Parallel(n_jobs=1), batch=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    566         self.n_dispatched_tasks += len(batch)
    567         self.n_dispatched_batches += 1
    568 
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
        job = undefined
        self._backend.apply_async = <bound method SequentialBackend.apply_async of <...lib._parallel_backends.SequentialBackend object>>
        batch = <sklearn.externals.joblib.parallel.BatchedCalls object>
        cb = <sklearn.externals.joblib.parallel.BatchCompletionCallBack object>
    572         self._jobs.append(job)
    573 
    574     def dispatch_next(self):
    575         """Dispatch more data for parallel processing

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in apply_async(self=<sklearn.externals.joblib._parallel_backends.SequentialBackend object>, func=<sklearn.externals.joblib.parallel.BatchedCalls object>, callback=<sklearn.externals.joblib.parallel.BatchCompletionCallBack object>)
    104             raise ValueError('n_jobs == 0 in Parallel has no meaning')
    105         return 1
    106 
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
        result = undefined
        func = <sklearn.externals.joblib.parallel.BatchedCalls object>
    110         if callback:
    111             callback(result)
    112         return result
    113 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in __init__(self=<sklearn.externals.joblib._parallel_backends.ImmediateResult object>, batch=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    321 
    322 class ImmediateResult(object):
    323     def __init__(self, batch):
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
        self.results = undefined
        batch = <sklearn.externals.joblib.parallel.BatchedCalls object>
    327 
    328     def get(self):
    329         return self.results
    330 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _parallel_build_trees>
        args = (DecisionTreeClassifier(class_weight=None, criter...=False, random_state=1124529390, splitter='best'), RandomForestClassifier(bootstrap=True, class_wei...te=None,
            verbose=0, warm_start=False), array([[  -1.        ,   -1.        ,   -1.     ...      0.        ,    0.        ]], dtype=float32), array([[ 0.],
       [ 0.],
       [ 0.],
       ..., 
       [ 1.],
       [ 0.],
       [ 0.]]), None, 0, 5)
        kwargs = {'class_weight': None, 'verbose': 0}
        self.items = [(<function _parallel_build_trees>, (DecisionTreeClassifier(class_weight=None, criter...=False, random_state=1124529390, splitter='best'), RandomForestClassifier(bootstrap=True, class_wei...te=None,
            verbose=0, warm_start=False), array([[  -1.        ,   -1.        ,   -1.     ...      0.        ,    0.        ]], dtype=float32), array([[ 0.],
       [ 0.],
       [ 0.],
       ..., 
       [ 1.],
       [ 0.],
       [ 0.]]), None, 0, 5), {'class_weight': None, 'verbose': 0})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\ensemble\forest.py in _parallel_build_trees(tree=DecisionTreeClassifier(class_weight=None, criter...=False, random_state=1124529390, splitter='best'), forest=RandomForestClassifier(bootstrap=True, class_wei...te=None,
            verbose=0, warm_start=False), X=array([[  -1.        ,   -1.        ,   -1.     ...      0.        ,    0.        ]], dtype=float32), y=array([[ 0.],
       [ 0.],
       [ 0.],
       ..., 
       [ 1.],
       [ 0.],
       [ 0.]]), sample_weight=None, tree_idx=0, n_trees=5, verbose=0, class_weight=None)
    115                 warnings.simplefilter('ignore', DeprecationWarning)
    116                 curr_sample_weight *= compute_sample_weight('auto', y, indices)
    117         elif class_weight == 'balanced_subsample':
    118             curr_sample_weight *= compute_sample_weight('balanced', y, indices)
    119 
--> 120         tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
        tree.fit = <bound method DecisionTreeClassifier.fit of Deci...False, random_state=1124529390, splitter='best')>
        X = array([[  -1.        ,   -1.        ,   -1.     ...      0.        ,    0.        ]], dtype=float32)
        y = array([[ 0.],
       [ 0.],
       [ 0.],
       ..., 
       [ 1.],
       [ 0.],
       [ 0.]])
        sample_weight = None
        curr_sample_weight = array([ 2.,  2.,  0., ...,  0.,  1.,  1.])
    121     else:
    122         tree.fit(X, y, sample_weight=sample_weight, check_input=False)
    123 
    124     return tree

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\tree\tree.py in fit(self=DecisionTreeClassifier(class_weight=None, criter...=False, random_state=1124529390, splitter='best'), X=array([[  -1.        ,   -1.        ,   -1.     ...      0.        ,    0.        ]], dtype=float32), y=array([[ 0.],
       [ 0.],
       [ 0.],
       ..., 
       [ 1.],
       [ 0.],
       [ 0.]]), sample_weight=array([ 2.,  2.,  0., ...,  0.,  1.,  1.]), check_input=False, X_idx_sorted=None)
    734 
    735         super(DecisionTreeClassifier, self).fit(
    736             X, y,
    737             sample_weight=sample_weight,
    738             check_input=check_input,
--> 739             X_idx_sorted=X_idx_sorted)
        X_idx_sorted = None
    740         return self
    741 
    742 
    743     def predict_proba(self, X, check_input=True):

...........................................................................
C:\ProgramData\Anaconda2\lib\site-packages\sklearn\tree\tree.py in fit(self=DecisionTreeClassifier(class_weight=None, criter...=False, random_state=1124529390, splitter='best'), X=array([[  -1.        ,   -1.        ,   -1.     ...      0.        ,    0.        ]], dtype=float32), y=array([[ 0.],
       [ 0.],
       [ 0.],
       ..., 
       [ 1.],
       [ 0.],
       [ 0.]]), sample_weight=array([ 2.,  2.,  0., ...,  0.,  1.,  1.]), check_input=False, X_idx_sorted=None)
    194 
    195         if isinstance(self.min_samples_split, (numbers.Integral, np.integer)):
    196             if not 2 <= self.min_samples_split:
    197                 raise ValueError("min_samples_split must be at least 2 "
    198                                  "or in (0, 1], got %s"
--> 199                                  % self.min_samples_split)
        self.min_samples_split = 1
    200             min_samples_split = self.min_samples_split
    201         else:  # float
    202             if not 0. < self.min_samples_split <= 1.:
    203                 raise ValueError("min_samples_split must be at least 2 "

ValueError: min_samples_split must be at least 2 or in (0, 1], got 1
___________________________________________________________________________