
This notebook documents a minimal working example of the "ValueError: Buffer source array is read-only" that I get when setting `n_jobs > 1` in `GridSearchCV`.  

The use case is the following: I have a dataframe with a mix of string and numeric columns. I build a Pipeline that first encodes the data properly and then passes the encoded data into a classifier. I encode the data using a custom class called `DataFrame_Encoder` to specify which columns should be one-hot encoded with DictVectorizer and which columns should be kept as is. I then pass that Pipeline into GridSearchCV to optimize over the hyperparameters of the classifier. 

This snippet crashes when `n_jobs > 1` in `GridSearchCV` and the dataset is large (it breaks somewhere between 100K and 200K rows, but I haven't found the exact breaking point).

The error occurs whenever a DataFrame that contains a column of dtype `object` is passed to `GridSearchCV.fit`. I initially believed that the error was caused by my custom `DataFrame_Encoder` class, but I no longer think that is the case: the code breaks before the `fit` or `transform` methods of `DataFrame_Encoder` are ever called. I think the error happens as soon as `GridSearchCV.fit` is called. Perhaps there's a check that works for non-object columns but fails for object columns.

- Example 1 shows the MWE that results in the ValueError
- Examples 2 and 3 show that the error is not caused by my custom `DataFrame_Encoder` class.
- Example 4 shows that things work if you drop the object column from the DataFrame.
- Example 5 shows that things work if you decrease the number of rows in the dataset.
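
As a rough sanity check on the row-count threshold, the sketch below estimates the size in bytes of the arrays that each worker receives for a 5-fold split. It rests on assumptions that this notebook does not verify: that `y` and the train/test indices are int64 arrays, and that joblib's automatic memory-mapping threshold is at its documented default of `max_nbytes='1M'`, above which arrays are handed to workers as read-only memmaps. `cv_array_sizes` is a hypothetical helper written only for this illustration.

```python
# Back-of-the-envelope sizes of the arrays handed to each worker.
# Assumptions (not verified in this notebook): int64 labels and indices,
# and joblib's documented default max_nbytes='1M' auto-memmapping threshold.
BYTES_PER_INT64 = 8
MAX_NBYTES = 1024 ** 2  # assumed meaning of '1M'


def cv_array_sizes(n_rows, n_folds=5):
    """Return approximate byte sizes of y, train indices, and test indices."""
    n_test = n_rows // n_folds
    n_train = n_rows - n_test
    return {
        "y": n_rows * BYTES_PER_INT64,
        "train_idx": n_train * BYTES_PER_INT64,
        "test_idx": n_test * BYTES_PER_INT64,
    }


for n_rows in (100_000, 200_000):
    sizes = cv_array_sizes(n_rows)
    # Names of the arrays that would exceed the assumed threshold.
    over = sorted(name for name, nbytes in sizes.items() if nbytes > MAX_NBYTES)
    print(n_rows, sizes, "over threshold:", over)
```

Under these assumptions, nothing crosses the threshold at 100K rows, while `y` and the train indices both do at 200K rows. That would be consistent with the breaking point lying somewhere in between, but it is only a plausibility check, not a diagnosis.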

## Related issues

This bug seems to be related to the following issues: 

- https://github.com/pandas-dev/pandas/issues/9928#issuecomment-97038944
- https://github.com/scikit-learn/scikit-learn/issues/4772
- https://github.com/scikit-learn/scikit-learn/pull/4775

It seems like it was supposed to be fixed in a recent version, so perhaps this is a separate issue.

In [1]:
import pandas as pd

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

import numpy as np


In [2]:
class DataFrame_Encoder(BaseEstimator, TransformerMixin):
    """One-hot encode the categorical columns of a DataFrame with DictVectorizer
    and pass the numeric columns through unchanged.

    The print calls are only there to trace when each method is invoked.
    """

    def __init__(self, categorical_cols_=None, numeric_cols_=None):
        print("__init__ called")
        self.categorical_cols_ = categorical_cols_
        self.numeric_cols_ = numeric_cols_

    def fit(self, df, y=None):
        print("Fit called")
        # df should be a DataFrame with a mix of categorical and numeric columns
        self.vec_ = DictVectorizer(sparse=False)
        # Cast to str so DictVectorizer one-hot encodes every categorical value
        temp_data = df[self.categorical_cols_].astype(str)
        self.vec_.fit(temp_data.to_dict('records'))
        self.feature_names_ = list(self.numeric_cols_) + list(self.vec_.feature_names_)
        return self

    def transform(self, df):
        # df should be a DataFrame with a mix of categorical and numeric columns
        print("Transform called")
        temp_data = df[self.categorical_cols_].astype(str)
        categorical_data = self.vec_.transform(temp_data.to_dict('records'))
        # Rebuild a DataFrame aligned on the original index before concatenating
        categorical_df = pd.DataFrame(categorical_data, columns=self.vec_.feature_names_, index=df.index)
        new_data = pd.concat([df[self.numeric_cols_], categorical_df],axis=1)
        return new_data


# Example 1: Fails

In [3]:
x,y = make_classification(n_samples=200000,n_features=5)

numeric_features = ['x1','x2','x3','x4','x5']
string_features = ['category']

df = pd.DataFrame(data=x,columns=numeric_features)
df['category'] = 'a'

base_clf = RandomForestClassifier(n_jobs=4)
param_grid = {'clf__n_estimators':[10,100]}

pipeline = Pipeline([
        ('feature_encoder',DataFrame_Encoder()),
        ('clf',base_clf)
])
pipeline.set_params(feature_encoder__categorical_cols_=string_features, feature_encoder__numeric_cols_=numeric_features)

clf = GridSearchCV(pipeline, param_grid,cv=5,n_jobs=2,verbose=1)

clf.fit(df,y)


__init__ called
Fitting 5 folds for each of 2 candidates, totalling 10 fits
__init__ called
__init__ called




__init__ called
__init__ called
__init__ called




JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/runpy.py in _run_module_as_main(mod_name='ipykernel_launcher', alter_argv=1)
    188         sys.exit(msg)
    189     main_globals = sys.modules["__main__"].__dict__
    190     if alter_argv:
    191         sys.argv[0] = mod_spec.origin
    192     return _run_code(code, main_globals, None,
--> 193                      "__main__", mod_spec)
        mod_spec = ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py')
    194 
    195 def run_module(mod_name, init_globals=None,
    196                run_name=None, alter_sys=False):
    197     """Execute a module's code without importing it

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/runpy.py in _run_code(code=<code object <module> at 0x10e8bd030, file "/Use...3.6/site-packages/ipykernel_launcher.py", line 5>, run_globals={'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/gstoddard/anaconda/envs/standard_py3_env/...ges/__pycache__/ipykernel_launcher.cpython-36.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/gstod.../python3.6/site-packages/ipykernel/kernelapp.py'>, ...}, init_globals=None, mod_name='__main__', mod_spec=ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), pkg_name='', script_name=None)
     80                        __cached__ = cached,
     81                        __doc__ = None,
     82                        __loader__ = loader,
     83                        __package__ = pkg_name,
     84                        __spec__ = mod_spec)
---> 85     exec(code, run_globals)
        code = <code object <module> at 0x10e8bd030, file "/Use...3.6/site-packages/ipykernel_launcher.py", line 5>
        run_globals = {'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/gstoddard/anaconda/envs/standard_py3_env/...ges/__pycache__/ipykernel_launcher.cpython-36.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/gstod.../python3.6/site-packages/ipykernel/kernelapp.py'>, ...}
     86     return run_globals
     87 
     88 def _run_module_code(code, init_globals=None,
     89                     mod_name=None, mod_spec=None,

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py in <module>()
     11     # This is added back by InteractiveShellApp.init_path()
     12     if sys.path[0] == '':
     13         del sys.path[0]
     14 
     15     from ipykernel import kernelapp as app
---> 16     app.launch_new_instance()
     17 
     18 
     19 
     20 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    653 
    654         If a global instance already exists, this reinitializes and starts it
    655         """
    656         app = cls.instance(**kwargs)
    657         app.initialize(argv)
--> 658         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    659 
    660 #-----------------------------------------------------------------------------
    661 # utility functions, for convenience
    662 #-----------------------------------------------------------------------------

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    472             return self.subapp.start()
    473         if self.poller is not None:
    474             self.poller.start()
    475         self.kernel.start()
    476         try:
--> 477             ioloop.IOLoop.instance().start()
    478         except KeyboardInterrupt:
    479             pass
    480 
    481 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    172             )
    173         return loop
    174     
    175     def start(self):
    176         try:
--> 177             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    178         except ZMQError as e:
    179             if e.errno == ETERM:
    180                 # quietly return on ETERM
    181                 pass

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    883                 self._events.update(event_pairs)
    884                 while self._events:
    885                     fd, events = self._events.popitem()
    886                     try:
    887                         fd_obj, handler_func = self._handlers[fd]
--> 888                         handler_func(fd_obj, events)
        handler_func = <function wrap.<locals>.null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 5
    889                     except (OSError, IOError) as e:
    890                         if errno_from_exception(e) == errno.EPIPE:
    891                             # Happens when the client closes the connection
    892                             pass

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 5), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 5)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=5)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function wrap.<locals>.null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function wrap.<locals>.null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function wrap.<locals>.null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    278         if self.control_stream:
    279             self.control_stream.on_recv(self.dispatch_control, copy=False)
    280 
    281         def make_dispatcher(stream):
    282             def dispatcher(msg):
--> 283                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    284             return dispatcher
    285 
    286         for s in self.shell_streams:
    287             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {'allow_stdin': True, 'code': 'x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 24, 59, 88163, tzinfo=tzutc()), 'msg_id': '81F1F7C9D6BE4AB79A8ED2592D8F62FE', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '81F1F7C9D6BE4AB79A8ED2592D8F62FE', 'msg_type': 'execute_request', 'parent_header': {}})
    230             self.log.warn("Unknown message type: %r", msg_type)
    231         else:
    232             self.log.debug("%s: %s", msg_type, msg)
    233             self.pre_handler_hook()
    234             try:
--> 235                 handler(stream, idents, msg)
        handler = <bound method Kernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = [b'A09FD675E3EA46CB837CBE1D3E75F32E']
        msg = {'buffers': [], 'content': {'allow_stdin': True, 'code': 'x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 24, 59, 88163, tzinfo=tzutc()), 'msg_id': '81F1F7C9D6BE4AB79A8ED2592D8F62FE', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '81F1F7C9D6BE4AB79A8ED2592D8F62FE', 'msg_type': 'execute_request', 'parent_header': {}}
    236             except Exception:
    237                 self.log.error("Exception in message handler:", exc_info=True)
    238             finally:
    239                 self.post_handler_hook()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=[b'A09FD675E3EA46CB837CBE1D3E75F32E'], parent={'buffers': [], 'content': {'allow_stdin': True, 'code': 'x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 24, 59, 88163, tzinfo=tzutc()), 'msg_id': '81F1F7C9D6BE4AB79A8ED2592D8F62FE', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '81F1F7C9D6BE4AB79A8ED2592D8F62FE', 'msg_type': 'execute_request', 'parent_header': {}})
    394         if not silent:
    395             self.execution_count += 1
    396             self._publish_execute_input(code, parent, self.execution_count)
    397 
    398         reply_content = self.do_execute(code, silent, store_history,
--> 399                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    400 
    401         # Flush output before sending the reply.
    402         sys.stdout.flush()
    403         sys.stderr.flush()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code='x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = 'x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n'
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=('x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n',), **kwargs={'silent': False, 'store_history': True})
    528             )
    529         self.payload_manager.write_payload(payload)
    530 
    531     def run_cell(self, *args, **kwargs):
    532         self._last_traceback = None
--> 533         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = ('x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n',)
        kwargs = {'silent': False, 'store_history': True}
    534 
    535     def _showtraceback(self, etype, evalue, stb):
    536         # try to preserve ordering of tracebacks and print statements
    537         sys.stdout.flush()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell='x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', store_history=True, silent=False, shell_futures=True)
   2678                 self.displayhook.exec_result = result
   2679 
   2680                 # Execute the user code
   2681                 interactivity = "none" if silent else self.ast_node_interactivity
   2682                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2683                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler object>
   2684                 
   2685                 self.last_execution_succeeded = not has_raised
   2686 
   2687                 # Reset this so later displayed values do not modify the

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Expr object>, <_ast.Assign object>, <_ast.Expr object>], cell_name='<ipython-input-3-b63d72744ba1>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler object>, result=<ExecutionResult object at 119404c50, execution_..._before_exec=None error_in_exec=None result=None>)
   2788                     return True
   2789 
   2790             for i, node in enumerate(to_run_interactive):
   2791                 mod = ast.Interactive([node])
   2792                 code = compiler(mod, cell_name, "single")
-> 2793                 if self.run_code(code, result):
        self.run_code = <bound method InteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x1193a4f60, file "<ipython-input-3-b63d72744ba1>", line 20>
        result = <ExecutionResult object at 119404c50, execution_..._before_exec=None error_in_exec=None result=None>
   2794                     return True
   2795 
   2796             # Flush softspace
   2797             if softspace(sys.stdout, 0):

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x1193a4f60, file "<ipython-input-3-b63d72744ba1>", line 20>, result=<ExecutionResult object at 119404c50, execution_..._before_exec=None error_in_exec=None result=None>)
   2842         outflag = True  # happens in more places, so it's easier as default
   2843         try:
   2844             try:
   2845                 self.hooks.pre_run_code_hook()
   2846                 #rprint('Running code', repr(code_obj)) # dbg
-> 2847                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x1193a4f60, file "<ipython-input-3-b63d72744ba1>", line 20>
        self.user_global_ns = {'BaseEstimator': <class 'sklearn.base.BaseEstimator'>, 'DataFrame_Encoder': <class '__main__.DataFrame_Encoder'>, 'DictVectorizer': <class 'sklearn.feature_extraction.dict_vectorizer.DictVectorizer'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', 'import pandas as pd\n\nfrom sklearn.model_selectio...raction import DictVectorizer\n\nimport numpy as np', 'class DataFrame_Encoder(BaseEstimator, Transform..., categorical_df],axis=1)\n        return new_data', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)'], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'RandomForestClassifier': <class 'sklearn.ensemble.forest.RandomForestClassifier'>, 'TransformerMixin': <class 'sklearn.base.TransformerMixin'>, '_': '', ...}
        self.user_ns = {'BaseEstimator': <class 'sklearn.base.BaseEstimator'>, 'DataFrame_Encoder': <class '__main__.DataFrame_Encoder'>, 'DictVectorizer': <class 'sklearn.feature_extraction.dict_vectorizer.DictVectorizer'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', 'import pandas as pd\n\nfrom sklearn.model_selectio...raction import DictVectorizer\n\nimport numpy as np', 'class DataFrame_Encoder(BaseEstimator, Transform..., categorical_df],axis=1)\n        return new_data', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)'], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'RandomForestClassifier': <class 'sklearn.ensemble.forest.RandomForestClassifier'>, 'TransformerMixin': <class 'sklearn.base.TransformerMixin'>, '_': '', ...}
   2848             finally:
   2849                 # Reset our crash handler in place
   2850                 sys.excepthook = old_excepthook
   2851         except SystemExit as e:

...........................................................................
/Users/gstoddard/sklearn_bug/<ipython-input-3-b63d72744ba1> in <module>()
     15 ])
     16 pipeline.set_params(feature_encoder__categorical_cols_=string_features, feature_encoder__numeric_cols_=numeric_features)
     17 
     18 clf = GridSearchCV(pipeline, param_grid,cv=5,n_jobs=2,verbose=1)
     19 
---> 20 clf.fit(df,y)
     21 
     22 
     23 
     24 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self=GridSearchCV(cv=5, error_score='raise',
       e...train_score=True,
       scoring=None, verbose=1), X=              x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns], y=array([0, 0, 1, ..., 1, 0, 1]), groups=None)
    940 
    941         groups : array-like, with shape (n_samples,), optional
    942             Group labels for the samples used while splitting the dataset into
    943             train/test set.
    944         """
--> 945         return self._fit(X, y, groups, ParameterGrid(self.param_grid))
        self._fit = <bound method BaseSearchCV._fit of GridSearchCV(...rain_score=True,
       scoring=None, verbose=1)>
        X =               x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns]
        y = array([0, 0, 1, ..., 1, 0, 1])
        groups = None
        self.param_grid = {'clf__n_estimators': [10, 100]}
    946 
    947 
    948 class RandomizedSearchCV(BaseSearchCV):
    949     """Randomized search on hyper parameters.

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in _fit(self=GridSearchCV(cv=5, error_score='raise',
       e...train_score=True,
       scoring=None, verbose=1), X=              x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns], y=array([0, 0, 1, ..., 1, 0, 1]), groups=None, parameter_iterable=<sklearn.model_selection._search.ParameterGrid object>)
    559                                   fit_params=self.fit_params,
    560                                   return_train_score=self.return_train_score,
    561                                   return_n_test_samples=True,
    562                                   return_times=True, return_parameters=True,
    563                                   error_score=self.error_score)
--> 564           for parameters in parameter_iterable
        parameters = undefined
        parameter_iterable = <sklearn.model_selection._search.ParameterGrid object>
    565           for train, test in cv_iter)
    566 
    567         # if one choose to see train score, "out" will contain train score info
    568         if self.return_train_score:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=2), iterable=<generator object BaseSearchCV._fit.<locals>.<genexpr>>)
    763             if pre_dispatch == "all" or n_jobs == 1:
    764                 # The iterable was consumed all at once by the above for loop.
    765                 # No need to wait for async callbacks to trigger to
    766                 # consumption.
    767                 self._iterating = False
--> 768             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=2)>
    769             # Make sure that we get a last message telling us we are done
    770             elapsed_time = time.time() - self._start_time
    771             self._print('Done %3i out of %3i | elapsed: %s finished',
    772                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Wed Aug  2 18:25:00 2017
PID: 35261Python 3.6.1: /Users/gstoddard/anaconda/envs/standard_py3_env/bin/python
...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _fit_and_score>, (Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))]),               x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns], memmap([0, 0, 1, ..., 1, 0, 1]), <function _passthrough_scorer>, memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), array([    0,     1,     2, ..., 40133, 40138, 40141]), 1, {'clf__n_estimators': 10}), {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': True, 'return_times': True, 'return_train_score': True})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))]),               x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns], memmap([0, 0, 1, ..., 1, 0, 1]), <function _passthrough_scorer>, memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), array([    0,     1,     2, ..., 40133, 40138, 40141]), 1, {'clf__n_estimators': 10})
        kwargs = {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': True, 'return_times': True, 'return_train_score': True}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator=Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))]), X=              x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns], y=memmap([0, 0, 1, ..., 1, 0, 1]), scorer=<function _passthrough_scorer>, train=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), test=array([    0,     1,     2, ..., 40133, 40138, 40141]), verbose=1, parameters={'clf__n_estimators': 10}, fit_params={}, return_train_score=True, return_parameters=True, return_n_test_samples=True, return_times=True, error_score='raise')
    226     if parameters is not None:
    227         estimator.set_params(**parameters)
    228 
    229     start_time = time.time()
    230 
--> 231     X_train, y_train = _safe_split(estimator, X, y, train)
        X_train = undefined
        y_train = undefined
        estimator = Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))])
        X =               x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns]
        y = memmap([0, 0, 1, ..., 1, 0, 1])
        train = memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999])
    232     X_test, y_test = _safe_split(estimator, X, y, test, train)
    233 
    234     try:
    235         if y_train is None:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/utils/metaestimators.py in _safe_split(estimator=Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))]), X=              x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns], y=memmap([0, 0, 1, ..., 1, 0, 1]), indices=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), train_indices=None)
    103             if train_indices is None:
    104                 X_subset = X[np.ix_(indices, indices)]
    105             else:
    106                 X_subset = X[np.ix_(indices, train_indices)]
    107         else:
--> 108             X_subset = safe_indexing(X, indices)
        X_subset = undefined
        X =               x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns]
        indices = memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999])
    109 
    110     if y is not None:
    111         y_subset = safe_indexing(y, indices)
    112     else:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/utils/__init__.py in safe_indexing(X=              x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns], indices=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]))
    100         except ValueError:
    101             # Cython typed memoryviews internally used in pandas do not support
    102             # readonly buffers.
    103             warnings.warn("Copying input dataframe for slicing.",
    104                           DataConversionWarning)
--> 105             return X.copy().iloc[indices]
        X.copy.iloc = undefined
        indices = memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999])
    106     elif hasattr(X, "shape"):
    107         if hasattr(X, 'take') and (hasattr(indices, 'dtype') and
    108                                    indices.dtype.kind == 'i'):
    109             # This is often substantially faster than X[indices]

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]))
   1323             except (KeyError, IndexError):
   1324                 pass
   1325             return self._getitem_tuple(key)
   1326         else:
   1327             key = com._apply_if_callable(key, self.obj)
-> 1328             return self._getitem_axis(key, axis=0)
        self._getitem_axis = <bound method _iLocIndexer._getitem_axis of <pandas.core.indexing._iLocIndexer object>>
        key = memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999])
   1329 
   1330     def _is_scalar_access(self, key):
   1331         raise NotImplementedError()
   1332 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), axis=0)
   1733             self._has_valid_type(key, axis)
   1734             return self._getbool_axis(key, axis=axis)
   1735 
   1736         # a list of integers
   1737         elif is_list_like_indexer(key):
-> 1738             return self._get_list_axis(key, axis=axis)
        self._get_list_axis = <bound method _iLocIndexer._get_list_axis of <pandas.core.indexing._iLocIndexer object>>
        key = memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999])
        axis = 0
   1739 
   1740         # a single integer
   1741         else:
   1742             key = self._convert_scalar_indexer(key, axis)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in _get_list_axis(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), axis=0)
   1710         Returns
   1711         -------
   1712         Series object
   1713         """
   1714         try:
-> 1715             return self.obj.take(key, axis=axis, convert=False)
        self.obj.take = <bound method NDFrame.take of               x1  ...49 -0.190363        a

[200000 rows x 6 columns]>
        key = memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999])
        axis = 0
   1716         except IndexError:
   1717             # re-raise with different error message
   1718             raise IndexError("positional indexers are out-of-bounds")
   1719 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/generic.py in take(self=              x1        x2        x3        x4  ...849 -0.190363        a

[200000 rows x 6 columns], indices=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), axis=0, convert=False, is_copy=True, **kwargs={})
   1923         """
   1924         nv.validate_take(tuple(), kwargs)
   1925         self._consolidate_inplace()
   1926         new_data = self._data.take(indices,
   1927                                    axis=self._get_block_manager_axis(axis),
-> 1928                                    convert=True, verify=True)
        convert = False
   1929         result = self._constructor(new_data).__finalize__(self)
   1930 
   1931         # maybe set copy if we didn't actually change the index
   1932         if is_copy:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in take(self=BlockManager
Items: Index(['x1', 'x2', 'x3', 'x4...tBlock: slice(5, 6, 1), 1 x 200000, dtype: object, indexer=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), axis=1, verify=True, convert=True)
   4006                 raise Exception('Indices must be nonzero and less than '
   4007                                 'the axis length')
   4008 
   4009         new_labels = self.axes[axis].take(indexer)
   4010         return self.reindex_indexer(new_axis=new_labels, indexer=indexer,
-> 4011                                     axis=axis, allow_dups=True)
        axis = 1
   4012 
   4013     def merge(self, other, lsuffix='', rsuffix=''):
   4014         if not self._is_indexed_like(other):
   4015             raise AssertionError('Must have same axes to merge managers')

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in reindex_indexer(self=BlockManager
Items: Index(['x1', 'x2', 'x3', 'x4...tBlock: slice(5, 6, 1), 1 x 200000, dtype: object, new_axis=Int64Index([ 39890,  39891,  39892,  39893,  398...199999],
           dtype='int64', length=159999), indexer=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), axis=1, fill_value=None, allow_dups=True, copy=True)
   3892             new_blocks = self._slice_take_blocks_ax0(indexer,
   3893                                                      fill_tuple=(fill_value,))
   3894         else:
   3895             new_blocks = [blk.take_nd(indexer, axis=axis, fill_tuple=(
   3896                 fill_value if fill_value is not None else blk.fill_value,))
-> 3897                 for blk in self.blocks]
        self.blocks = (FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64, ObjectBlock: slice(5, 6, 1), 1 x 200000, dtype: object)
   3898 
   3899         new_axes = list(self.axes)
   3900         new_axes[axis] = new_axis
   3901         return self.__class__(new_blocks, new_axes)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in <listcomp>(.0=<tuple_iterator object>)
   3892             new_blocks = self._slice_take_blocks_ax0(indexer,
   3893                                                      fill_tuple=(fill_value,))
   3894         else:
   3895             new_blocks = [blk.take_nd(indexer, axis=axis, fill_tuple=(
   3896                 fill_value if fill_value is not None else blk.fill_value,))
-> 3897                 for blk in self.blocks]
        blk = FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64
   3898 
   3899         new_axes = list(self.axes)
   3900         new_axes[axis] = new_axis
   3901         return self.__class__(new_blocks, new_axes)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in take_nd(self=FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64, indexer=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), axis=1, new_mgr_locs=None, fill_tuple=(nan,))
   1041             new_values = algos.take_nd(values, indexer, axis=axis,
   1042                                        allow_fill=False)
   1043         else:
   1044             fill_value = fill_tuple[0]
   1045             new_values = algos.take_nd(values, indexer, axis=axis,
-> 1046                                        allow_fill=True, fill_value=fill_value)
        fill_value = nan
   1047 
   1048         if new_mgr_locs is None:
   1049             if axis == 0:
   1050                 slc = lib.indexer_as_slice(indexer)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/algorithms.py in take_nd(arr=memmap([[-0.26666035, -0.23195333, -1.14554448, ... 0.064569  ,
         -0.07119485, -0.1903627 ]]), indexer=memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999]), axis=1, out=array([[-0.70387825,  0.01771368, -0.14068663, ....  1.11458733,
         1.00384929, -0.1903627 ]]), fill_value=nan, mask_info=None, allow_fill=True)
   1466         else:
   1467             out = np.empty(out_shape, dtype=dtype)
   1468 
   1469     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,
   1470                                  mask_info=mask_info)
-> 1471     func(arr, indexer, out, fill_value)
        func = <built-in function take_2d_axis1_float64_float64>
        arr = memmap([[-0.26666035, -0.23195333, -1.14554448, ... 0.064569  ,
         -0.07119485, -0.1903627 ]])
        indexer = memmap([ 39890,  39891,  39892, ..., 199997, 199998, 199999])
        out = array([[-0.70387825,  0.01771368, -0.14068663, ....  1.11458733,
         1.00384929, -0.1903627 ]])
        fill_value = nan
   1472 
   1473     if flip_order:
   1474         out = out.T
   1475     return out

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in pandas._libs.algos.take_2d_axis1_float64_float64 (pandas/_libs/algos.c:111160)()
   4629 
   4630 
   4631 
   4632 
   4633 
-> 4634 
   4635 
   4636 
   4637 
   4638 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in View.MemoryView.memoryview_cwrapper (pandas/_libs/algos.c:124730)()
    639 
    640 
    641 
    642 
    643 
--> 644 
    645 
    646 
    647 
    648 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in View.MemoryView.memoryview.__cinit__ (pandas/_libs/algos.c:120965)()
    340 
    341 
    342 
    343 
    344 
--> 345 
    346 
    347 
    348 
    349 

ValueError: buffer source array is read-only
___________________________________________________________________________
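
The last frames above show pandas receiving `memmap` objects for both the float block and the indexer, and the Cython memoryview constructor rejecting a read-only buffer. As a minimal sketch of that state (it does not go through joblib or pandas, and the file name `block.dat` is just a placeholder), the snippet below opens a NumPy memmap in read-only mode, shows that writes to it are refused, and shows that an in-memory copy is writable again. Whether a given pandas build accepts such a read-only buffer depends on the installed version, so this illustrates the precondition only, not the failure itself.

```python
import os
import tempfile

import numpy as np

# Write a small float64 array to disk, then reopen it as a read-only memmap.
# This only mimics what a worker process appears to receive in the traceback.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "block.dat")  # placeholder file name
np.arange(10, dtype=np.float64).tofile(path)

ro = np.memmap(path, dtype=np.float64, mode="r", shape=(10,))
print(type(ro).__name__, "writeable:", ro.flags.writeable)

# NumPy refuses in-place writes to a read-only buffer.
try:
    ro[0] = 1.0
    write_failed = False
except ValueError as exc:
    write_failed = True
    print("write rejected:", exc)

# An in-memory copy owns its data and is writable again.
rw = np.array(ro)
rw[0] = 1.0
print(type(rw).__name__, "writeable:", rw.flags.writeable)
```

Checking `arr.flags.writeable` on the arrays that reach a worker is a cheap way to tell whether the data was memory-mapped read-only before it was handed to pandas.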

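# Example 2: Fails

First, we tell the encoder to ignore the string column by setting the list `string_features` to be empty, to see whether that fixes anything. It doesn't. Note that the DataFrame itself still contains the Object column `category`; only the encoder's configuration changes. Furthermore, the output below shows only `__init__` calls, so `fit` and `transform` are never reached.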



In [4]:
x,y = make_classification(n_samples=200000,n_features=5)

numeric_features = ['x1','x2','x3','x4','x5']
string_features = []


df = pd.DataFrame(data=x,columns=numeric_features)
df['category'] = 'a'  # the Object column stays in the DataFrame even though the encoder ignores it

base_clf = RandomForestClassifier(n_jobs=4)
param_grid = {'clf__n_estimators':[10,100]}

pipeline = Pipeline([
        ('feature_encoder',DataFrame_Encoder()),
        ('clf',base_clf)
])
pipeline.set_params(feature_encoder__categorical_cols_=string_features, feature_encoder__numeric_cols_=numeric_features)

clf = GridSearchCV(pipeline, param_grid,cv=5,n_jobs=2,verbose=1)

clf.fit(df,y)


__init__ called
Fitting 5 folds for each of 2 candidates, totalling 10 fits
__init__ called
__init__ called
__init__ called
__init__ called
__init__ called




JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/runpy.py in _run_module_as_main(mod_name='ipykernel_launcher', alter_argv=1)
    188         sys.exit(msg)
    189     main_globals = sys.modules["__main__"].__dict__
    190     if alter_argv:
    191         sys.argv[0] = mod_spec.origin
    192     return _run_code(code, main_globals, None,
--> 193                      "__main__", mod_spec)
        mod_spec = ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py')
    194 
    195 def run_module(mod_name, init_globals=None,
    196                run_name=None, alter_sys=False):
    197     """Execute a module's code without importing it

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/runpy.py in _run_code(code=<code object <module> at 0x10e8bd030, file "/Use...3.6/site-packages/ipykernel_launcher.py", line 5>, run_globals={'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/gstoddard/anaconda/envs/standard_py3_env/...ges/__pycache__/ipykernel_launcher.cpython-36.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/gstod.../python3.6/site-packages/ipykernel/kernelapp.py'>, ...}, init_globals=None, mod_name='__main__', mod_spec=ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), pkg_name='', script_name=None)
     80                        __cached__ = cached,
     81                        __doc__ = None,
     82                        __loader__ = loader,
     83                        __package__ = pkg_name,
     84                        __spec__ = mod_spec)
---> 85     exec(code, run_globals)
        code = <code object <module> at 0x10e8bd030, file "/Use...3.6/site-packages/ipykernel_launcher.py", line 5>
        run_globals = {'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/gstoddard/anaconda/envs/standard_py3_env/...ges/__pycache__/ipykernel_launcher.cpython-36.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/gstod.../python3.6/site-packages/ipykernel/kernelapp.py'>, ...}
     86     return run_globals
     87 
     88 def _run_module_code(code, init_globals=None,
     89                     mod_name=None, mod_spec=None,

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py in <module>()
     11     # This is added back by InteractiveShellApp.init_path()
     12     if sys.path[0] == '':
     13         del sys.path[0]
     14 
     15     from ipykernel import kernelapp as app
---> 16     app.launch_new_instance()
     17 
     18 
     19 
     20 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    653 
    654         If a global instance already exists, this reinitializes and starts it
    655         """
    656         app = cls.instance(**kwargs)
    657         app.initialize(argv)
--> 658         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    659 
    660 #-----------------------------------------------------------------------------
    661 # utility functions, for convenience
    662 #-----------------------------------------------------------------------------

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    472             return self.subapp.start()
    473         if self.poller is not None:
    474             self.poller.start()
    475         self.kernel.start()
    476         try:
--> 477             ioloop.IOLoop.instance().start()
    478         except KeyboardInterrupt:
    479             pass
    480 
    481 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    172             )
    173         return loop
    174     
    175     def start(self):
    176         try:
--> 177             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    178         except ZMQError as e:
    179             if e.errno == ETERM:
    180                 # quietly return on ETERM
    181                 pass

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    883                 self._events.update(event_pairs)
    884                 while self._events:
    885                     fd, events = self._events.popitem()
    886                     try:
    887                         fd_obj, handler_func = self._handlers[fd]
--> 888                         handler_func(fd_obj, events)
        handler_func = <function wrap.<locals>.null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    889                     except (OSError, IOError) as e:
    890                         if errno_from_exception(e) == errno.EPIPE:
    891                             # Happens when the client closes the connection
    892                             pass

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function wrap.<locals>.null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function wrap.<locals>.null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function wrap.<locals>.null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    278         if self.control_stream:
    279             self.control_stream.on_recv(self.dispatch_control, copy=False)
    280 
    281         def make_dispatcher(stream):
    282             def dispatcher(msg):
--> 283                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    284             return dispatcher
    285 
    286         for s in self.shell_streams:
    287             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {'allow_stdin': True, 'code': 'x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 31, 52, 510190, tzinfo=tzutc()), 'msg_id': 'D98F82F152284DFFA8552ADFF17725A4', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': 'D98F82F152284DFFA8552ADFF17725A4', 'msg_type': 'execute_request', 'parent_header': {}})
    230             self.log.warn("Unknown message type: %r", msg_type)
    231         else:
    232             self.log.debug("%s: %s", msg_type, msg)
    233             self.pre_handler_hook()
    234             try:
--> 235                 handler(stream, idents, msg)
        handler = <bound method Kernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = [b'A09FD675E3EA46CB837CBE1D3E75F32E']
        msg = {'buffers': [], 'content': {'allow_stdin': True, 'code': 'x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 31, 52, 510190, tzinfo=tzutc()), 'msg_id': 'D98F82F152284DFFA8552ADFF17725A4', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': 'D98F82F152284DFFA8552ADFF17725A4', 'msg_type': 'execute_request', 'parent_header': {}}
    236             except Exception:
    237                 self.log.error("Exception in message handler:", exc_info=True)
    238             finally:
    239                 self.post_handler_hook()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=[b'A09FD675E3EA46CB837CBE1D3E75F32E'], parent={'buffers': [], 'content': {'allow_stdin': True, 'code': 'x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 31, 52, 510190, tzinfo=tzutc()), 'msg_id': 'D98F82F152284DFFA8552ADFF17725A4', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': 'D98F82F152284DFFA8552ADFF17725A4', 'msg_type': 'execute_request', 'parent_header': {}})
    394         if not silent:
    395             self.execution_count += 1
    396             self._publish_execute_input(code, parent, self.execution_count)
    397 
    398         reply_content = self.do_execute(code, silent, store_history,
--> 399                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    400 
    401         # Flush output before sending the reply.
    402         sys.stdout.flush()
    403         sys.stderr.flush()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code='x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = 'x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n'
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=('x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n',), **kwargs={'silent': False, 'store_history': True})
    528             )
    529         self.payload_manager.write_payload(payload)
    530 
    531     def run_cell(self, *args, **kwargs):
    532         self._last_traceback = None
--> 533         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = ('x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n',)
        kwargs = {'silent': False, 'store_history': True}
    534 
    535     def _showtraceback(self, etype, evalue, stb):
    536         # try to preserve ordering of tracebacks and print statements
    537         sys.stdout.flush()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell='x,y = make_classification(n_samples=200000,n_fea...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', store_history=True, silent=False, shell_futures=True)
   2678                 self.displayhook.exec_result = result
   2679 
   2680                 # Execute the user code
   2681                 interactivity = "none" if silent else self.ast_node_interactivity
   2682                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2683                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler object>
   2684                 
   2685                 self.last_execution_succeeded = not has_raised
   2686 
   2687                 # Reset this so later displayed values do not modify the

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Expr object>, <_ast.Assign object>, <_ast.Expr object>], cell_name='<ipython-input-4-a373d1acd0e6>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler object>, result=<ExecutionResult object at 1196c4390, execution_..._before_exec=None error_in_exec=None result=None>)
   2788                     return True
   2789 
   2790             for i, node in enumerate(to_run_interactive):
   2791                 mod = ast.Interactive([node])
   2792                 code = compiler(mod, cell_name, "single")
-> 2793                 if self.run_code(code, result):
        self.run_code = <bound method InteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x1195a45d0, file "<ipython-input-4-a373d1acd0e6>", line 21>
        result = <ExecutionResult object at 1196c4390, execution_..._before_exec=None error_in_exec=None result=None>
   2794                     return True
   2795 
   2796             # Flush softspace
   2797             if softspace(sys.stdout, 0):

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x1195a45d0, file "<ipython-input-4-a373d1acd0e6>", line 21>, result=<ExecutionResult object at 1196c4390, execution_..._before_exec=None error_in_exec=None result=None>)
   2842         outflag = True  # happens in more places, so it's easier as default
   2843         try:
   2844             try:
   2845                 self.hooks.pre_run_code_hook()
   2846                 #rprint('Running code', repr(code_obj)) # dbg
-> 2847                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x1195a45d0, file "<ipython-input-4-a373d1acd0e6>", line 21>
        self.user_global_ns = {'BaseEstimator': <class 'sklearn.base.BaseEstimator'>, 'DataFrame_Encoder': <class '__main__.DataFrame_Encoder'>, 'DictVectorizer': <class 'sklearn.feature_extraction.dict_vectorizer.DictVectorizer'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', 'import pandas as pd\n\nfrom sklearn.model_selectio...raction import DictVectorizer\n\nimport numpy as np', 'class DataFrame_Encoder(BaseEstimator, Transform..., categorical_df],axis=1)\n        return new_data', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)'], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'RandomForestClassifier': <class 'sklearn.ensemble.forest.RandomForestClassifier'>, 'TransformerMixin': <class 'sklearn.base.TransformerMixin'>, '_': '', ...}
        self.user_ns = {'BaseEstimator': <class 'sklearn.base.BaseEstimator'>, 'DataFrame_Encoder': <class '__main__.DataFrame_Encoder'>, 'DictVectorizer': <class 'sklearn.feature_extraction.dict_vectorizer.DictVectorizer'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', 'import pandas as pd\n\nfrom sklearn.model_selectio...raction import DictVectorizer\n\nimport numpy as np', 'class DataFrame_Encoder(BaseEstimator, Transform..., categorical_df],axis=1)\n        return new_data', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)'], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'RandomForestClassifier': <class 'sklearn.ensemble.forest.RandomForestClassifier'>, 'TransformerMixin': <class 'sklearn.base.TransformerMixin'>, '_': '', ...}
   2848             finally:
   2849                 # Reset our crash handler in place
   2850                 sys.excepthook = old_excepthook
   2851         except SystemExit as e:

...........................................................................
/Users/gstoddard/sklearn_bug/<ipython-input-4-a373d1acd0e6> in <module>()
     16 ])
     17 pipeline.set_params(feature_encoder__categorical_cols_=string_features, feature_encoder__numeric_cols_=numeric_features)
     18 
     19 clf = GridSearchCV(pipeline, param_grid,cv=5,n_jobs=2,verbose=1)
     20 
---> 21 clf.fit(df,y)
     22 
     23 
     24 
     25 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self=GridSearchCV(cv=5, error_score='raise',
       e...train_score=True,
       scoring=None, verbose=1), X=              x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns], y=array([0, 1, 0, ..., 1, 0, 1]), groups=None)
    940 
    941         groups : array-like, with shape (n_samples,), optional
    942             Group labels for the samples used while splitting the dataset into
    943             train/test set.
    944         """
--> 945         return self._fit(X, y, groups, ParameterGrid(self.param_grid))
        self._fit = <bound method BaseSearchCV._fit of GridSearchCV(...rain_score=True,
       scoring=None, verbose=1)>
        X =               x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns]
        y = array([0, 1, 0, ..., 1, 0, 1])
        groups = None
        self.param_grid = {'clf__n_estimators': [10, 100]}
    946 
    947 
    948 class RandomizedSearchCV(BaseSearchCV):
    949     """Randomized search on hyper parameters.

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in _fit(self=GridSearchCV(cv=5, error_score='raise',
       e...train_score=True,
       scoring=None, verbose=1), X=              x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns], y=array([0, 1, 0, ..., 1, 0, 1]), groups=None, parameter_iterable=<sklearn.model_selection._search.ParameterGrid object>)
    559                                   fit_params=self.fit_params,
    560                                   return_train_score=self.return_train_score,
    561                                   return_n_test_samples=True,
    562                                   return_times=True, return_parameters=True,
    563                                   error_score=self.error_score)
--> 564           for parameters in parameter_iterable
        parameters = undefined
        parameter_iterable = <sklearn.model_selection._search.ParameterGrid object>
    565           for train, test in cv_iter)
    566 
    567         # if one choose to see train score, "out" will contain train score info
    568         if self.return_train_score:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=2), iterable=<generator object BaseSearchCV._fit.<locals>.<genexpr>>)
    763             if pre_dispatch == "all" or n_jobs == 1:
    764                 # The iterable was consumed all at once by the above for loop.
    765                 # No need to wait for async callbacks to trigger to
    766                 # consumption.
    767                 self._iterating = False
--> 768             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=2)>
    769             # Make sure that we get a last message telling us we are done
    770             elapsed_time = time.time() - self._start_time
    771             self._print('Done %3i out of %3i | elapsed: %s finished',
    772                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Wed Aug  2 18:31:52 2017
PID: 35368Python 3.6.1: /Users/gstoddard/anaconda/envs/standard_py3_env/bin/python
...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _fit_and_score>, (Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))]),               x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns], memmap([0, 1, 0, ..., 1, 0, 1]), <function _passthrough_scorer>, memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), array([    0,     1,     2, ..., 40167, 40169, 40171]), 1, {'clf__n_estimators': 10}), {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': True, 'return_times': True, 'return_train_score': True})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))]),               x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns], memmap([0, 1, 0, ..., 1, 0, 1]), <function _passthrough_scorer>, memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), array([    0,     1,     2, ..., 40167, 40169, 40171]), 1, {'clf__n_estimators': 10})
        kwargs = {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': True, 'return_times': True, 'return_train_score': True}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator=Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))]), X=              x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns], y=memmap([0, 1, 0, ..., 1, 0, 1]), scorer=<function _passthrough_scorer>, train=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), test=array([    0,     1,     2, ..., 40167, 40169, 40171]), verbose=1, parameters={'clf__n_estimators': 10}, fit_params={}, return_train_score=True, return_parameters=True, return_n_test_samples=True, return_times=True, error_score='raise')
    226     if parameters is not None:
    227         estimator.set_params(**parameters)
    228 
    229     start_time = time.time()
    230 
--> 231     X_train, y_train = _safe_split(estimator, X, y, train)
        X_train = undefined
        y_train = undefined
        estimator = Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))])
        X =               x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns]
        y = memmap([0, 1, 0, ..., 1, 0, 1])
        train = memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999])
    232     X_test, y_test = _safe_split(estimator, X, y, test, train)
    233 
    234     try:
    235         if y_train is None:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/utils/metaestimators.py in _safe_split(estimator=Pipeline(steps=[('feature_encoder', DataFrame_En...None,
            verbose=0, warm_start=False))]), X=              x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns], y=memmap([0, 1, 0, ..., 1, 0, 1]), indices=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), train_indices=None)
    103             if train_indices is None:
    104                 X_subset = X[np.ix_(indices, indices)]
    105             else:
    106                 X_subset = X[np.ix_(indices, train_indices)]
    107         else:
--> 108             X_subset = safe_indexing(X, indices)
        X_subset = undefined
        X =               x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns]
        indices = memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999])
    109 
    110     if y is not None:
    111         y_subset = safe_indexing(y, indices)
    112     else:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/utils/__init__.py in safe_indexing(X=              x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns], indices=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]))
    100         except ValueError:
    101             # Cython typed memoryviews internally used in pandas do not support
    102             # readonly buffers.
    103             warnings.warn("Copying input dataframe for slicing.",
    104                           DataConversionWarning)
--> 105             return X.copy().iloc[indices]
        X.copy.iloc = undefined
        indices = memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999])
    106     elif hasattr(X, "shape"):
    107         if hasattr(X, 'take') and (hasattr(indices, 'dtype') and
    108                                    indices.dtype.kind == 'i'):
    109             # This is often substantially faster than X[indices]

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]))
   1323             except (KeyError, IndexError):
   1324                 pass
   1325             return self._getitem_tuple(key)
   1326         else:
   1327             key = com._apply_if_callable(key, self.obj)
-> 1328             return self._getitem_axis(key, axis=0)
        self._getitem_axis = <bound method _iLocIndexer._getitem_axis of <pandas.core.indexing._iLocIndexer object>>
        key = memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999])
   1329 
   1330     def _is_scalar_access(self, key):
   1331         raise NotImplementedError()
   1332 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), axis=0)
   1733             self._has_valid_type(key, axis)
   1734             return self._getbool_axis(key, axis=axis)
   1735 
   1736         # a list of integers
   1737         elif is_list_like_indexer(key):
-> 1738             return self._get_list_axis(key, axis=axis)
        self._get_list_axis = <bound method _iLocIndexer._get_list_axis of <pandas.core.indexing._iLocIndexer object>>
        key = memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999])
        axis = 0
   1739 
   1740         # a single integer
   1741         else:
   1742             key = self._convert_scalar_indexer(key, axis)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in _get_list_axis(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), axis=0)
   1710         Returns
   1711         -------
   1712         Series object
   1713         """
   1714         try:
-> 1715             return self.obj.take(key, axis=axis, convert=False)
        self.obj.take = <bound method NDFrame.take of               x1  ...54  1.403188        a

[200000 rows x 6 columns]>
        key = memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999])
        axis = 0
   1716         except IndexError:
   1717             # re-raise with different error message
   1718             raise IndexError("positional indexers are out-of-bounds")
   1719 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/generic.py in take(self=              x1        x2        x3        x4  ...254  1.403188        a

[200000 rows x 6 columns], indices=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), axis=0, convert=False, is_copy=True, **kwargs={})
   1923         """
   1924         nv.validate_take(tuple(), kwargs)
   1925         self._consolidate_inplace()
   1926         new_data = self._data.take(indices,
   1927                                    axis=self._get_block_manager_axis(axis),
-> 1928                                    convert=True, verify=True)
        convert = False
   1929         result = self._constructor(new_data).__finalize__(self)
   1930 
   1931         # maybe set copy if we didn't actually change the index
   1932         if is_copy:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in take(self=BlockManager
Items: Index(['x1', 'x2', 'x3', 'x4...tBlock: slice(5, 6, 1), 1 x 200000, dtype: object, indexer=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), axis=1, verify=True, convert=True)
   4006                 raise Exception('Indices must be nonzero and less than '
   4007                                 'the axis length')
   4008 
   4009         new_labels = self.axes[axis].take(indexer)
   4010         return self.reindex_indexer(new_axis=new_labels, indexer=indexer,
-> 4011                                     axis=axis, allow_dups=True)
        axis = 1
   4012 
   4013     def merge(self, other, lsuffix='', rsuffix=''):
   4014         if not self._is_indexed_like(other):
   4015             raise AssertionError('Must have same axes to merge managers')

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in reindex_indexer(self=BlockManager
Items: Index(['x1', 'x2', 'x3', 'x4...tBlock: slice(5, 6, 1), 1 x 200000, dtype: object, new_axis=Int64Index([ 39820,  39823,  39826,  39828,  398...199999],
           dtype='int64', length=159999), indexer=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), axis=1, fill_value=None, allow_dups=True, copy=True)
   3892             new_blocks = self._slice_take_blocks_ax0(indexer,
   3893                                                      fill_tuple=(fill_value,))
   3894         else:
   3895             new_blocks = [blk.take_nd(indexer, axis=axis, fill_tuple=(
   3896                 fill_value if fill_value is not None else blk.fill_value,))
-> 3897                 for blk in self.blocks]
        self.blocks = (FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64, ObjectBlock: slice(5, 6, 1), 1 x 200000, dtype: object)
   3898 
   3899         new_axes = list(self.axes)
   3900         new_axes[axis] = new_axis
   3901         return self.__class__(new_blocks, new_axes)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in <listcomp>(.0=<tuple_iterator object>)
   3892             new_blocks = self._slice_take_blocks_ax0(indexer,
   3893                                                      fill_tuple=(fill_value,))
   3894         else:
   3895             new_blocks = [blk.take_nd(indexer, axis=axis, fill_tuple=(
   3896                 fill_value if fill_value is not None else blk.fill_value,))
-> 3897                 for blk in self.blocks]
        blk = FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64
   3898 
   3899         new_axes = list(self.axes)
   3900         new_axes[axis] = new_axis
   3901         return self.__class__(new_blocks, new_axes)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in take_nd(self=FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64, indexer=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), axis=1, new_mgr_locs=None, fill_tuple=(nan,))
   1041             new_values = algos.take_nd(values, indexer, axis=axis,
   1042                                        allow_fill=False)
   1043         else:
   1044             fill_value = fill_tuple[0]
   1045             new_values = algos.take_nd(values, indexer, axis=axis,
-> 1046                                        allow_fill=True, fill_value=fill_value)
        fill_value = nan
   1047 
   1048         if new_mgr_locs is None:
   1049             if axis == 0:
   1050                 slc = lib.indexer_as_slice(indexer)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/algorithms.py in take_nd(arr=memmap([[-0.57082802, -1.01821463,  1.08032314, ... 1.87256242,
         -0.38265788,  1.40318761]]), indexer=memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999]), axis=1, out=array([[ 0.45680369, -1.04423349,  0.77484853, .... -0.1491669 ,
        -0.31025364,  1.40318761]]), fill_value=nan, mask_info=None, allow_fill=True)
   1466         else:
   1467             out = np.empty(out_shape, dtype=dtype)
   1468 
   1469     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,
   1470                                  mask_info=mask_info)
-> 1471     func(arr, indexer, out, fill_value)
        func = <built-in function take_2d_axis1_float64_float64>
        arr = memmap([[-0.57082802, -1.01821463,  1.08032314, ... 1.87256242,
         -0.38265788,  1.40318761]])
        indexer = memmap([ 39820,  39823,  39826, ..., 199997, 199998, 199999])
        out = array([[ 0.45680369, -1.04423349,  0.77484853, .... -0.1491669 ,
        -0.31025364,  1.40318761]])
        fill_value = nan
   1472 
   1473     if flip_order:
   1474         out = out.T
   1475     return out

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in pandas._libs.algos.take_2d_axis1_float64_float64 (pandas/_libs/algos.c:111160)()
   4629 
   4630 
   4631 
   4632 
   4633 
-> 4634 
   4635 
   4636 
   4637 
   4638 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in View.MemoryView.memoryview_cwrapper (pandas/_libs/algos.c:124730)()
    639 
    640 
    641 
    642 
    643 
--> 644 
    645 
    646 
    647 
    648 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in View.MemoryView.memoryview.__cinit__ (pandas/_libs/algos.c:120965)()
    340 
    341 
    342 
    343 
    344 
--> 345 
    346 
    347 
    348 
    349 

ValueError: buffer source array is read-only
___________________________________________________________________________
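
The frames above show `arr` and `indexer` reaching pandas' compiled `take_2d_axis1_float64_float64` as `memmap` objects, and the error is raised while a memoryview is being created over one of them. Below is a minimal sketch of that condition using only NumPy. It assumes the memmaps handed to the worker process are opened read-only (as far as I know this is joblib's default for large arrays, but I have not verified it for this version). The helper `make_readonly_memmap` is a hypothetical name used for illustration, not part of any library. The sketch does not reproduce the pandas failure itself, only the read-only state of the buffer.

```python
import os
import tempfile

import numpy as np


def make_readonly_memmap(values):
    """Hypothetical helper: dump `values` to a temp file, reopen it read-only."""
    data = np.asarray(values, dtype=np.float64)
    fd, path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)
    data.tofile(path)  # raw float64 bytes, C order
    # mode="r" is the assumption here: it mimics a read-only memmap
    # like the ones that appear in the traceback.
    return np.memmap(path, dtype=np.float64, mode="r", shape=data.shape)


arr = make_readonly_memmap([[0.1, 0.2], [0.3, 0.4]])

# The memmap cannot be written to, and a buffer taken over it is read-only.
print(type(arr).__name__, "writeable:", arr.flags.writeable)
print("memoryview readonly:", memoryview(arr).readonly)

# An in-memory copy is a plain, writable ndarray.
copy = np.array(arr)
print(type(copy).__name__, "writeable:", copy.flags.writeable)
```

If the read-only assumption holds, any compiled routine that requests a writable buffer from such an array would be expected to raise, while a plain in-memory copy would not. That would be consistent with the smaller datasets working, assuming they fall below the size at which arrays get memory-mapped.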

# Example 3: Fails

We now swap in a different encoder that simply drops every object-dtype column from the DataFrame. The fit still fails with this encoder, which suggests that the custom `DataFrame_Encoder` is not the cause.

In [5]:
class DropString_Encoder(BaseEstimator, TransformerMixin):
    
    def fit(self,df,y=None):
        return self
    
    def transform(self,df):
        non_string_cols = df.select_dtypes(exclude=[object]).columns.values
        return df[non_string_cols]


    
x,y = make_classification(n_samples=200000,n_features=5)

numeric_features = ['x1','x2','x3','x4','x5']
string_features = ['category']

df = pd.DataFrame(data=x,columns=numeric_features)
df['category'] = 'a'

base_clf = RandomForestClassifier(n_jobs=4)
param_grid = {'clf__n_estimators':[10,100]}

pipeline = Pipeline([
        ('feature_encoder',DropString_Encoder()),
        ('clf',base_clf)
])

clf = GridSearchCV(pipeline, param_grid,cv=5,n_jobs=2,verbose=1)

clf.fit(df,y)


Fitting 5 folds for each of 2 candidates, totalling 10 fits

JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/runpy.py in _run_module_as_main(mod_name='ipykernel_launcher', alter_argv=1)
    188         sys.exit(msg)
    189     main_globals = sys.modules["__main__"].__dict__
    190     if alter_argv:
    191         sys.argv[0] = mod_spec.origin
    192     return _run_code(code, main_globals, None,
--> 193                      "__main__", mod_spec)
        mod_spec = ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py')
    194 
    195 def run_module(mod_name, init_globals=None,
    196                run_name=None, alter_sys=False):
    197     """Execute a module's code without importing it

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/runpy.py in _run_code(code=<code object <module> at 0x10e8bd030, file "/Use...3.6/site-packages/ipykernel_launcher.py", line 5>, run_globals={'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/gstoddard/anaconda/envs/standard_py3_env/...ges/__pycache__/ipykernel_launcher.cpython-36.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/gstod.../python3.6/site-packages/ipykernel/kernelapp.py'>, ...}, init_globals=None, mod_name='__main__', mod_spec=ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), pkg_name='', script_name=None)
     80                        __cached__ = cached,
     81                        __doc__ = None,
     82                        __loader__ = loader,
     83                        __package__ = pkg_name,
     84                        __spec__ = mod_spec)
---> 85     exec(code, run_globals)
        code = <code object <module> at 0x10e8bd030, file "/Use...3.6/site-packages/ipykernel_launcher.py", line 5>
        run_globals = {'__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__cached__': '/Users/gstoddard/anaconda/envs/standard_py3_env/...ges/__pycache__/ipykernel_launcher.cpython-36.pyc', '__doc__': 'Entry point for launching an IPython kernel.\n\nTh...orts until\nafter removing the cwd from sys.path.\n', '__file__': '/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py', '__loader__': <_frozen_importlib_external.SourceFileLoader object>, '__name__': '__main__', '__package__': '', '__spec__': ModuleSpec(name='ipykernel_launcher', loader=<_f...b/python3.6/site-packages/ipykernel_launcher.py'), 'app': <module 'ipykernel.kernelapp' from '/Users/gstod.../python3.6/site-packages/ipykernel/kernelapp.py'>, ...}
     86     return run_globals
     87 
     88 def _run_module_code(code, init_globals=None,
     89                     mod_name=None, mod_spec=None,

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel_launcher.py in <module>()
     11     # This is added back by InteractiveShellApp.init_path()
     12     if sys.path[0] == '':
     13         del sys.path[0]
     14 
     15     from ipykernel import kernelapp as app
---> 16     app.launch_new_instance()
     17 
     18 
     19 
     20 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    653 
    654         If a global instance already exists, this reinitializes and starts it
    655         """
    656         app = cls.instance(**kwargs)
    657         app.initialize(argv)
--> 658         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    659 
    660 #-----------------------------------------------------------------------------
    661 # utility functions, for convenience
    662 #-----------------------------------------------------------------------------

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    472             return self.subapp.start()
    473         if self.poller is not None:
    474             self.poller.start()
    475         self.kernel.start()
    476         try:
--> 477             ioloop.IOLoop.instance().start()
    478         except KeyboardInterrupt:
    479             pass
    480 
    481 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    172             )
    173         return loop
    174     
    175     def start(self):
    176         try:
--> 177             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    178         except ZMQError as e:
    179             if e.errno == ETERM:
    180                 # quietly return on ETERM
    181                 pass

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    883                 self._events.update(event_pairs)
    884                 while self._events:
    885                     fd, events = self._events.popitem()
    886                     try:
    887                         fd_obj, handler_func = self._handlers[fd]
--> 888                         handler_func(fd_obj, events)
        handler_func = <function wrap.<locals>.null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    889                     except (OSError, IOError) as e:
    890                         if errno_from_exception(e) == errno.EPIPE:
    891                             # Happens when the client closes the connection
    892                             pass

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function wrap.<locals>.null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function wrap.<locals>.null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function wrap.<locals>.null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    272         # Fast path when there are no active contexts.
    273         def null_wrapper(*args, **kwargs):
    274             try:
    275                 current_state = _state.contexts
    276                 _state.contexts = cap_contexts[0]
--> 277                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    278             finally:
    279                 _state.contexts = current_state
    280         null_wrapper._wrapped = True
    281         return null_wrapper

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    278         if self.control_stream:
    279             self.control_stream.on_recv(self.dispatch_control, copy=False)
    280 
    281         def make_dispatcher(stream):
    282             def dispatcher(msg):
--> 283                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    284             return dispatcher
    285 
    286         for s in self.shell_streams:
    287             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {'allow_stdin': True, 'code': 'class DropString_Encoder(BaseEstimator, Transfor...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 31, 58, 246994, tzinfo=tzutc()), 'msg_id': '43497E3ACF714AAB98CB24992E51E772', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '43497E3ACF714AAB98CB24992E51E772', 'msg_type': 'execute_request', 'parent_header': {}})
    230             self.log.warn("Unknown message type: %r", msg_type)
    231         else:
    232             self.log.debug("%s: %s", msg_type, msg)
    233             self.pre_handler_hook()
    234             try:
--> 235                 handler(stream, idents, msg)
        handler = <bound method Kernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = [b'A09FD675E3EA46CB837CBE1D3E75F32E']
        msg = {'buffers': [], 'content': {'allow_stdin': True, 'code': 'class DropString_Encoder(BaseEstimator, Transfor...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 31, 58, 246994, tzinfo=tzutc()), 'msg_id': '43497E3ACF714AAB98CB24992E51E772', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '43497E3ACF714AAB98CB24992E51E772', 'msg_type': 'execute_request', 'parent_header': {}}
    236             except Exception:
    237                 self.log.error("Exception in message handler:", exc_info=True)
    238             finally:
    239                 self.post_handler_hook()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=[b'A09FD675E3EA46CB837CBE1D3E75F32E'], parent={'buffers': [], 'content': {'allow_stdin': True, 'code': 'class DropString_Encoder(BaseEstimator, Transfor...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', 'silent': False, 'stop_on_error': True, 'store_history': True, 'user_expressions': {}}, 'header': {'date': datetime.datetime(2017, 8, 2, 22, 31, 58, 246994, tzinfo=tzutc()), 'msg_id': '43497E3ACF714AAB98CB24992E51E772', 'msg_type': 'execute_request', 'session': 'A09FD675E3EA46CB837CBE1D3E75F32E', 'username': 'username', 'version': '5.0'}, 'metadata': {}, 'msg_id': '43497E3ACF714AAB98CB24992E51E772', 'msg_type': 'execute_request', 'parent_header': {}})
    394         if not silent:
    395             self.execution_count += 1
    396             self._publish_execute_input(code, parent, self.execution_count)
    397 
    398         reply_content = self.do_execute(code, silent, store_history,
--> 399                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    400 
    401         # Flush output before sending the reply.
    402         sys.stdout.flush()
    403         sys.stderr.flush()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code='class DropString_Encoder(BaseEstimator, Transfor...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = 'class DropString_Encoder(BaseEstimator, Transfor...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n'
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/ipykernel/zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=('class DropString_Encoder(BaseEstimator, Transfor...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n',), **kwargs={'silent': False, 'store_history': True})
    528             )
    529         self.payload_manager.write_payload(payload)
    530 
    531     def run_cell(self, *args, **kwargs):
    532         self._last_traceback = None
--> 533         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = ('class DropString_Encoder(BaseEstimator, Transfor...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n',)
        kwargs = {'silent': False, 'store_history': True}
    534 
    535     def _showtraceback(self, etype, evalue, stb):
    536         # try to preserve ordering of tracebacks and print statements
    537         sys.stdout.flush()

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell='class DropString_Encoder(BaseEstimator, Transfor...ram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)\n', store_history=True, silent=False, shell_futures=True)
   2678                 self.displayhook.exec_result = result
   2679 
   2680                 # Execute the user code
   2681                 interactivity = "none" if silent else self.ast_node_interactivity
   2682                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2683                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler object>
   2684                 
   2685                 self.last_execution_succeeded = not has_raised
   2686 
   2687                 # Reset this so later displayed values do not modify the

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.ClassDef object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Assign object>, <_ast.Expr object>], cell_name='<ipython-input-5-d5101a6a0bf9>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler object>, result=<ExecutionResult object at 119632668, execution_..._before_exec=None error_in_exec=None result=None>)
   2788                     return True
   2789 
   2790             for i, node in enumerate(to_run_interactive):
   2791                 mod = ast.Interactive([node])
   2792                 code = compiler(mod, cell_name, "single")
-> 2793                 if self.run_code(code, result):
        self.run_code = <bound method InteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x1196b6540, file "<ipython-input-5-d5101a6a0bf9>", line 30>
        result = <ExecutionResult object at 119632668, execution_..._before_exec=None error_in_exec=None result=None>
   2794                     return True
   2795 
   2796             # Flush softspace
   2797             if softspace(sys.stdout, 0):

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x1196b6540, file "<ipython-input-5-d5101a6a0bf9>", line 30>, result=<ExecutionResult object at 119632668, execution_..._before_exec=None error_in_exec=None result=None>)
   2842         outflag = True  # happens in more places, so it's easier as default
   2843         try:
   2844             try:
   2845                 self.hooks.pre_run_code_hook()
   2846                 #rprint('Running code', repr(code_obj)) # dbg
-> 2847                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x1196b6540, file "<ipython-input-5-d5101a6a0bf9>", line 30>
        self.user_global_ns = {'BaseEstimator': <class 'sklearn.base.BaseEstimator'>, 'DataFrame_Encoder': <class '__main__.DataFrame_Encoder'>, 'DictVectorizer': <class 'sklearn.feature_extraction.dict_vectorizer.DictVectorizer'>, 'DropString_Encoder': <class '__main__.DropString_Encoder'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', 'import pandas as pd\n\nfrom sklearn.model_selectio...raction import DictVectorizer\n\nimport numpy as np', 'class DataFrame_Encoder(BaseEstimator, Transform..., categorical_df],axis=1)\n        return new_data', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)', 'class DropString_Encoder(BaseEstimator, Transfor...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)'], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'RandomForestClassifier': <class 'sklearn.ensemble.forest.RandomForestClassifier'>, 'TransformerMixin': <class 'sklearn.base.TransformerMixin'>, ...}
        self.user_ns = {'BaseEstimator': <class 'sklearn.base.BaseEstimator'>, 'DataFrame_Encoder': <class '__main__.DataFrame_Encoder'>, 'DictVectorizer': <class 'sklearn.feature_extraction.dict_vectorizer.DictVectorizer'>, 'DropString_Encoder': <class '__main__.DropString_Encoder'>, 'GridSearchCV': <class 'sklearn.model_selection._search.GridSearchCV'>, 'In': ['', 'import pandas as pd\n\nfrom sklearn.model_selectio...raction import DictVectorizer\n\nimport numpy as np', 'class DataFrame_Encoder(BaseEstimator, Transform..., categorical_df],axis=1)\n        return new_data', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)', 'x,y = make_classification(n_samples=200000,n_fea...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)', 'class DropString_Encoder(BaseEstimator, Transfor...aram_grid,cv=5,n_jobs=2,verbose=1)\n\nclf.fit(df,y)'], 'Out': {}, 'Pipeline': <class 'sklearn.pipeline.Pipeline'>, 'RandomForestClassifier': <class 'sklearn.ensemble.forest.RandomForestClassifier'>, 'TransformerMixin': <class 'sklearn.base.TransformerMixin'>, ...}
   2848             finally:
   2849                 # Reset our crash handler in place
   2850                 sys.excepthook = old_excepthook
   2851         except SystemExit as e:

...........................................................................
/Users/gstoddard/sklearn_bug/<ipython-input-5-d5101a6a0bf9> in <module>()
     25         ('clf',base_clf)
     26 ])
     27 
     28 clf = GridSearchCV(pipeline, param_grid,cv=5,n_jobs=2,verbose=1)
     29 
---> 30 clf.fit(df,y)
     31 
     32 
     33 
     34 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self=GridSearchCV(cv=5, error_score='raise',
       e...train_score=True,
       scoring=None, verbose=1), X=              x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns], y=array([0, 1, 0, ..., 0, 1, 1]), groups=None)
    940 
    941         groups : array-like, with shape (n_samples,), optional
    942             Group labels for the samples used while splitting the dataset into
    943             train/test set.
    944         """
--> 945         return self._fit(X, y, groups, ParameterGrid(self.param_grid))
        self._fit = <bound method BaseSearchCV._fit of GridSearchCV(...rain_score=True,
       scoring=None, verbose=1)>
        X =               x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns]
        y = array([0, 1, 0, ..., 0, 1, 1])
        groups = None
        self.param_grid = {'clf__n_estimators': [10, 100]}
    946 
    947 
    948 class RandomizedSearchCV(BaseSearchCV):
    949     """Randomized search on hyper parameters.

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in _fit(self=GridSearchCV(cv=5, error_score='raise',
       e...train_score=True,
       scoring=None, verbose=1), X=              x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns], y=array([0, 1, 0, ..., 0, 1, 1]), groups=None, parameter_iterable=<sklearn.model_selection._search.ParameterGrid object>)
    559                                   fit_params=self.fit_params,
    560                                   return_train_score=self.return_train_score,
    561                                   return_n_test_samples=True,
    562                                   return_times=True, return_parameters=True,
    563                                   error_score=self.error_score)
--> 564           for parameters in parameter_iterable
        parameters = undefined
        parameter_iterable = <sklearn.model_selection._search.ParameterGrid object>
    565           for train, test in cv_iter)
    566 
    567         # if one choose to see train score, "out" will contain train score info
    568         if self.return_train_score:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=2), iterable=<generator object BaseSearchCV._fit.<locals>.<genexpr>>)
    763             if pre_dispatch == "all" or n_jobs == 1:
    764                 # The iterable was consumed all at once by the above for loop.
    765                 # No need to wait for async callbacks to trigger to
    766                 # consumption.
    767                 self._iterating = False
--> 768             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=2)>
    769             # Make sure that we get a last message telling us we are done
    770             elapsed_time = time.time() - self._start_time
    771             self._print('Done %3i out of %3i | elapsed: %s finished',
    772                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Wed Aug  2 18:31:58 2017
PID: 35370    Python 3.6.1: /Users/gstoddard/anaconda/envs/standard_py3_env/bin/python
...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function _fit_and_score>, (Pipeline(steps=[('feature_encoder', DropString_E...None,
            verbose=0, warm_start=False))]),               x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns], memmap([0, 1, 0, ..., 0, 1, 1]), <function _passthrough_scorer>, memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), array([    0,     1,     2, ..., 40168, 40171, 40174]), 1, {'clf__n_estimators': 10}), {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': True, 'return_times': True, 'return_train_score': True})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _fit_and_score>
        args = (Pipeline(steps=[('feature_encoder', DropString_E...None,
            verbose=0, warm_start=False))]),               x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns], memmap([0, 1, 0, ..., 0, 1, 1]), <function _passthrough_scorer>, memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), array([    0,     1,     2, ..., 40168, 40171, 40174]), 1, {'clf__n_estimators': 10})
        kwargs = {'error_score': 'raise', 'fit_params': {}, 'return_n_test_samples': True, 'return_parameters': True, 'return_times': True, 'return_train_score': True}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator=Pipeline(steps=[('feature_encoder', DropString_E...None,
            verbose=0, warm_start=False))]), X=              x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns], y=memmap([0, 1, 0, ..., 0, 1, 1]), scorer=<function _passthrough_scorer>, train=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), test=array([    0,     1,     2, ..., 40168, 40171, 40174]), verbose=1, parameters={'clf__n_estimators': 10}, fit_params={}, return_train_score=True, return_parameters=True, return_n_test_samples=True, return_times=True, error_score='raise')
    226     if parameters is not None:
    227         estimator.set_params(**parameters)
    228 
    229     start_time = time.time()
    230 
--> 231     X_train, y_train = _safe_split(estimator, X, y, train)
        X_train = undefined
        y_train = undefined
        estimator = Pipeline(steps=[('feature_encoder', DropString_E...None,
            verbose=0, warm_start=False))])
        X =               x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns]
        y = memmap([0, 1, 0, ..., 0, 1, 1])
        train = memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999])
    232     X_test, y_test = _safe_split(estimator, X, y, test, train)
    233 
    234     try:
    235         if y_train is None:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/utils/metaestimators.py in _safe_split(estimator=Pipeline(steps=[('feature_encoder', DropString_E...None,
            verbose=0, warm_start=False))]), X=              x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns], y=memmap([0, 1, 0, ..., 0, 1, 1]), indices=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), train_indices=None)
    103             if train_indices is None:
    104                 X_subset = X[np.ix_(indices, indices)]
    105             else:
    106                 X_subset = X[np.ix_(indices, train_indices)]
    107         else:
--> 108             X_subset = safe_indexing(X, indices)
        X_subset = undefined
        X =               x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns]
        indices = memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999])
    109 
    110     if y is not None:
    111         y_subset = safe_indexing(y, indices)
    112     else:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/sklearn/utils/__init__.py in safe_indexing(X=              x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns], indices=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]))
    100         except ValueError:
    101             # Cython typed memoryviews internally used in pandas do not support
    102             # readonly buffers.
    103             warnings.warn("Copying input dataframe for slicing.",
    104                           DataConversionWarning)
--> 105             return X.copy().iloc[indices]
        X.copy.iloc = undefined
        indices = memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999])
    106     elif hasattr(X, "shape"):
    107         if hasattr(X, 'take') and (hasattr(indices, 'dtype') and
    108                                    indices.dtype.kind == 'i'):
    109             # This is often substantially faster than X[indices]

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]))
   1323             except (KeyError, IndexError):
   1324                 pass
   1325             return self._getitem_tuple(key)
   1326         else:
   1327             key = com._apply_if_callable(key, self.obj)
-> 1328             return self._getitem_axis(key, axis=0)
        self._getitem_axis = <bound method _iLocIndexer._getitem_axis of <pandas.core.indexing._iLocIndexer object>>
        key = memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999])
   1329 
   1330     def _is_scalar_access(self, key):
   1331         raise NotImplementedError()
   1332 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), axis=0)
   1733             self._has_valid_type(key, axis)
   1734             return self._getbool_axis(key, axis=axis)
   1735 
   1736         # a list of integers
   1737         elif is_list_like_indexer(key):
-> 1738             return self._get_list_axis(key, axis=axis)
        self._get_list_axis = <bound method _iLocIndexer._get_list_axis of <pandas.core.indexing._iLocIndexer object>>
        key = memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999])
        axis = 0
   1739 
   1740         # a single integer
   1741         else:
   1742             key = self._convert_scalar_indexer(key, axis)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/indexing.py in _get_list_axis(self=<pandas.core.indexing._iLocIndexer object>, key=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), axis=0)
   1710         Returns
   1711         -------
   1712         Series object
   1713         """
   1714         try:
-> 1715             return self.obj.take(key, axis=axis, convert=False)
        self.obj.take = <bound method NDFrame.take of               x1  ...60 -2.339500        a

[200000 rows x 6 columns]>
        key = memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999])
        axis = 0
   1716         except IndexError:
   1717             # re-raise with different error message
   1718             raise IndexError("positional indexers are out-of-bounds")
   1719 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/generic.py in take(self=              x1        x2        x3        x4  ...160 -2.339500        a

[200000 rows x 6 columns], indices=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), axis=0, convert=False, is_copy=True, **kwargs={})
   1923         """
   1924         nv.validate_take(tuple(), kwargs)
   1925         self._consolidate_inplace()
   1926         new_data = self._data.take(indices,
   1927                                    axis=self._get_block_manager_axis(axis),
-> 1928                                    convert=True, verify=True)
        convert = False
   1929         result = self._constructor(new_data).__finalize__(self)
   1930 
   1931         # maybe set copy if we didn't actually change the index
   1932         if is_copy:

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in take(self=BlockManager
Items: Index(['x1', 'x2', 'x3', 'x4...tBlock: slice(5, 6, 1), 1 x 200000, dtype: object, indexer=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), axis=1, verify=True, convert=True)
   4006                 raise Exception('Indices must be nonzero and less than '
   4007                                 'the axis length')
   4008 
   4009         new_labels = self.axes[axis].take(indexer)
   4010         return self.reindex_indexer(new_axis=new_labels, indexer=indexer,
-> 4011                                     axis=axis, allow_dups=True)
        axis = 1
   4012 
   4013     def merge(self, other, lsuffix='', rsuffix=''):
   4014         if not self._is_indexed_like(other):
   4015             raise AssertionError('Must have same axes to merge managers')

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in reindex_indexer(self=BlockManager
Items: Index(['x1', 'x2', 'x3', 'x4...tBlock: slice(5, 6, 1), 1 x 200000, dtype: object, new_axis=Int64Index([ 39804,  39805,  39808,  39816,  398...199999],
           dtype='int64', length=159999), indexer=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), axis=1, fill_value=None, allow_dups=True, copy=True)
   3892             new_blocks = self._slice_take_blocks_ax0(indexer,
   3893                                                      fill_tuple=(fill_value,))
   3894         else:
   3895             new_blocks = [blk.take_nd(indexer, axis=axis, fill_tuple=(
   3896                 fill_value if fill_value is not None else blk.fill_value,))
-> 3897                 for blk in self.blocks]
        self.blocks = (FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64, ObjectBlock: slice(5, 6, 1), 1 x 200000, dtype: object)
   3898 
   3899         new_axes = list(self.axes)
   3900         new_axes[axis] = new_axis
   3901         return self.__class__(new_blocks, new_axes)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in <listcomp>(.0=<tuple_iterator object>)
   3892             new_blocks = self._slice_take_blocks_ax0(indexer,
   3893                                                      fill_tuple=(fill_value,))
   3894         else:
   3895             new_blocks = [blk.take_nd(indexer, axis=axis, fill_tuple=(
   3896                 fill_value if fill_value is not None else blk.fill_value,))
-> 3897                 for blk in self.blocks]
        blk = FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64
   3898 
   3899         new_axes = list(self.axes)
   3900         new_axes[axis] = new_axis
   3901         return self.__class__(new_blocks, new_axes)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/internals.py in take_nd(self=FloatBlock: slice(0, 5, 1), 5 x 200000, dtype: float64, indexer=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), axis=1, new_mgr_locs=None, fill_tuple=(nan,))
   1041             new_values = algos.take_nd(values, indexer, axis=axis,
   1042                                        allow_fill=False)
   1043         else:
   1044             fill_value = fill_tuple[0]
   1045             new_values = algos.take_nd(values, indexer, axis=axis,
-> 1046                                        allow_fill=True, fill_value=fill_value)
        fill_value = nan
   1047 
   1048         if new_mgr_locs is None:
   1049             if axis == 0:
   1050                 slc = lib.indexer_as_slice(indexer)

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/core/algorithms.py in take_nd(arr=memmap([[ 0.23953029, -0.50635937,  1.40387008, ... 1.38955039,
         -0.74704499, -2.33949991]]), indexer=memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999]), axis=1, out=array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
    ...0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]]), fill_value=nan, mask_info=None, allow_fill=True)
   1466         else:
   1467             out = np.empty(out_shape, dtype=dtype)
   1468 
   1469     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,
   1470                                  mask_info=mask_info)
-> 1471     func(arr, indexer, out, fill_value)
        func = <built-in function take_2d_axis1_float64_float64>
        arr = memmap([[ 0.23953029, -0.50635937,  1.40387008, ... 1.38955039,
         -0.74704499, -2.33949991]])
        indexer = memmap([ 39804,  39805,  39808, ..., 199997, 199998, 199999])
        out = array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
    ...0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
        fill_value = nan
   1472 
   1473     if flip_order:
   1474         out = out.T
   1475     return out

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in pandas._libs.algos.take_2d_axis1_float64_float64 (pandas/_libs/algos.c:111160)()
   4629 
   4630 
   4631 
   4632 
   4633 
-> 4634 
   4635 
   4636 
   4637 
   4638 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in View.MemoryView.memoryview_cwrapper (pandas/_libs/algos.c:124730)()
    639 
    640 
    641 
    642 
    643 
--> 644 
    645 
    646 
    647 
    648 

...........................................................................
/Users/gstoddard/anaconda/envs/standard_py3_env/lib/python3.6/site-packages/pandas/_libs/algos.cpython-36m-darwin.so in View.MemoryView.memoryview.__cinit__ (pandas/_libs/algos.c:120965)()
    340 
    341 
    342 
    343 
    344 
--> 345 
    346 
    347 
    348 
    349 

ValueError: buffer source array is read-only
___________________________________________________________________________
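
The deepest Python-level frame in the traceback above shows both `arr` and `indexer` as `memmap` objects. The following is a minimal sketch, not the code path scikit-learn or joblib actually runs. It assumes the worker receives arrays that were opened read-only, and shows that such a `memmap` refuses writes. The file name `block.npy` is made up for illustration.

```python
import os
import tempfile

import numpy as np

# Write a small float64 array to disk, then reopen it as a read-only memmap.
# Assumption: this mimics how the memmap values in the traceback were opened.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "block.npy")  # hypothetical file name
    np.save(path, np.arange(10, dtype=np.float64))

    arr = np.load(path, mmap_mode="r")
    is_memmap = isinstance(arr, np.memmap)
    is_writeable = arr.flags.writeable

    # Any attempt to write into the read-only buffer is rejected by NumPy.
    try:
        arr[0] = 1.0
        write_error = None
    except ValueError as exc:
        write_error = str(exc)

    # Release the mapping before the temporary directory is cleaned up.
    del arr

print("memmap:", is_memmap)
print("writeable:", is_writeable)
print("write error:", write_error)
```

Code that asks for a writable view of such an array, as the Cython memoryview frames at the bottom of the traceback appear to do, would be expected to fail in a similar way.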

# Example 4: Works
 
On line 20 of the cell (`temp_df = df[numeric_features]`), we drop the single object-dtype column and keep only the numeric ones. With that change, the fit completes without error.

In [7]:
x,y = make_classification(n_samples=200000,n_features=5)

numeric_features = ['x1','x2','x3','x4','x5']
string_features = []

df = pd.DataFrame(data=x,columns=numeric_features)
df['category'] = 'a'

base_clf = RandomForestClassifier(n_jobs=4)
param_grid = {'clf__n_estimators':[10]}

pipeline = Pipeline([
        ('feature_encoder',DataFrame_Encoder()),
        ('clf',base_clf)
])
pipeline.set_params(feature_encoder__categorical_cols_=string_features, feature_encoder__numeric_cols_=numeric_features)

clf = GridSearchCV(pipeline, param_grid,cv=2,n_jobs=2,verbose=1)

temp_df = df[numeric_features]  # keep only the numeric columns; the object-dtype 'category' column is dropped

clf.fit(temp_df,y)


__init__ called
Fitting 2 folds for each of 1 candidates, totalling 2 fits
__init__ called
__init__ called
__init__ called
Fit called
Fit called
Transform called
Transform called
Transform called
Transform called
Transform called
Transform called


[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    4.9s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    4.9s finished


__init__ called
Fit called
Transform called


GridSearchCV(cv=2, error_score='raise',
       estimator=Pipeline(steps=[('feature_encoder', DataFrame_Encoder(categorical_cols_=[],
         numeric_cols_=['x1', 'x2', 'x3', 'x4', 'x5'])), ('clf', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impuri...imators=10, n_jobs=4, oob_score=False, random_state=None,
            verbose=0, warm_start=False))]),
       fit_params={}, iid=True, n_jobs=2,
       param_grid={'clf__n_estimators': [10]}, pre_dispatch='2*n_jobs',
       refit=True, return_train_score=True, scoring=None, verbose=1)

# Example 5: Works
 
This is the full example, including the object-dtype `category` column, but with the dataset reduced to 2,000 rows. At this size, the fit completes without error.

In [8]:
x,y = make_classification(n_samples=2000,n_features=5)

numeric_features = ['x1','x2','x3','x4','x5']
string_features = ['category']

df = pd.DataFrame(data=x,columns=numeric_features)
df['category'] = 'a'

base_clf = RandomForestClassifier(n_jobs=4)
param_grid = {'clf__n_estimators':[10]}

pipeline = Pipeline([
        ('feature_encoder',DataFrame_Encoder()),
        ('clf',base_clf)
])
pipeline.set_params(feature_encoder__categorical_cols_=string_features, feature_encoder__numeric_cols_=numeric_features)

clf = GridSearchCV(pipeline, param_grid,cv=2,n_jobs=2,verbose=1)


clf.fit(df,y)


__init__ called
Fitting 2 folds for each of 1 candidates, totalling 2 fits
__init__ called
Fit called
Fit called
Transform called
Transform called
__init__ called
__init__ called
Transform called
Transform called
Transform called
Transform called


[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.4s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.4s finished


__init__ called
Fit called
Transform called


GridSearchCV(cv=2, error_score='raise',
       estimator=Pipeline(steps=[('feature_encoder', DataFrame_Encoder(categorical_cols_=['category'],
         numeric_cols_=['x1', 'x2', 'x3', 'x4', 'x5'])), ('clf', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            ...imators=10, n_jobs=4, oob_score=False, random_state=None,
            verbose=0, warm_start=False))]),
       fit_params={}, iid=True, n_jobs=2,
       param_grid={'clf__n_estimators': [10]}, pre_dispatch='2*n_jobs',
       refit=True, return_train_score=True, scoring=None, verbose=1)
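
One possible explanation for the size dependence, offered as a hypothesis rather than a confirmed diagnosis: joblib only memory-maps arrays above a size threshold (`max_nbytes`, documented default `'1M'`). The sketch below compares a few array sizes from this notebook against that threshold. The helper names and the reading of `'1M'` as `1024 ** 2` bytes are assumptions for illustration.

```python
# Assumption: joblib's documented default max_nbytes='1M' means 1024**2 bytes.
MAX_NBYTES = 1024 ** 2
ITEM_SIZE = 8  # bytes per float64 or int64 element


def nbytes(n_elements, item_size=ITEM_SIZE):
    """Size in bytes of a contiguous array with n_elements items."""
    return n_elements * item_size


def exceeds_threshold(n_elements):
    """True if an array of this many 8-byte items is larger than MAX_NBYTES."""
    return nbytes(n_elements) > MAX_NBYTES


# Element counts taken from the examples and from the traceback output.
sizes = {
    "float block, 2,000 rows x 5 cols": 2000 * 5,
    "float block, 200,000 rows x 5 cols": 200000 * 5,
    "train indices in the traceback (159,999)": 159999,
    "test indices in the traceback (40,001)": 200000 - 159999,
}

report = {label: exceeds_threshold(n) for label, n in sizes.items()}
for label, over in report.items():
    print(f"{label}: {nbytes(sizes[label]):>9,d} bytes, above threshold: {over}")
```

The last two entries are consistent with the traceback, where `train` is shown as a `memmap` and `test` as a plain `array`. This arithmetic does not pin down the exact row count at which the error starts to appear.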

# Versions

In [9]:
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)

Darwin-15.6.0-x86_64-i386-64bit
Python 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:04:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.18.2
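
The traceback ends inside pandas, but the cell above does not print the pandas version. A small sketch for collecting it alongside the others is shown here. The helper `report_versions` is hypothetical, and no output is included because the pandas version used for this notebook was not recorded.

```python
import importlib


def report_versions(names):
    """Return a {package: version} mapping, tolerating packages that fail to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # missing or broken installation
            versions[name] = "not available"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


versions = report_versions(["numpy", "scipy", "sklearn", "pandas"])
for name, version in versions.items():
    print(name, version)
```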
