# Decision Trees and Ensembles Lab

In this lab we will compare the performance of a simple Decision Tree classifier with a Bagging classifier. We will do so on a few datasets, starting with the ones offered by scikit-learn.

## 1. Breast Cancer Dataset
We will start our comparison on the breast cancer dataset.
You can load it directly from scikit-learn using the `load_breast_cancer` function.

### 1.a Simple comparison
1. Load the data and create X and y
- Initialize a Decision Tree Classifier and use cross_val_score to evaluate its performance. Set cross-validation to 5 folds.
- Wrap a Bagging Classifier around the Decision Tree Classifier and use cross_val_score to evaluate its performance. Set cross-validation to 5 folds.
- Which score is better? Are the scores significantly different? How can you judge that?
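
The steps above can be sketched as follows. This is a minimal sketch, assuming a recent scikit-learn in which `cross_val_score` lives in `sklearn.model_selection`; the `random_state=0` seeds and the variable names `dt_scores` and `bdt_scores` are illustrative choices, not part of the lab. Comparing the gap between the two mean scores with the fold-to-fold standard deviation is one rough way to judge whether the difference is meaningful.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Load the data and build X and y
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# random_state is fixed only for reproducibility (an assumption of this sketch)
dt = DecisionTreeClassifier(random_state=0)
bdt = BaggingClassifier(DecisionTreeClassifier(random_state=0), random_state=0)

# Explicit 5-fold cross-validation for both models
dt_scores = cross_val_score(dt, X, y, cv=5)
bdt_scores = cross_val_score(bdt, X, y, cv=5)

# Report the mean and the spread across folds for each model
for name, s in [("decision tree", dt_scores), ("bagging", bdt_scores)]:
    print("%s: mean=%.3f std=%.3f" % (name, s.mean(), s.std()))
```

If the difference between the means is smaller than the standard deviations, the two models are hard to tell apart on this dataset with only five folds.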

In [31]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_breast_cancer

In [32]:
data=load_breast_cancer()
X=pd.DataFrame(data.data, columns=data.feature_names)
X.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [33]:
y=pd.Series(data.target)

In [34]:
y.describe()

count    569.000000
mean       0.627417
std        0.483918
min        0.000000
25%        0.000000
50%        1.000000
75%        1.000000
max        1.000000
dtype: float64

In [35]:
y.value_counts()/y.count()

1    0.627417
0    0.372583
dtype: float64

In [99]:
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
# sklearn.cross_validation was removed in later scikit-learn releases;
# these helpers now live in sklearn.model_selection
from sklearn.model_selection import train_test_split, cross_val_score

In [37]:
dt=DecisionTreeClassifier()

In [38]:
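# cv is left at its default here (3 folds in the scikit-learn version that produced
# the score below); pass cv=5 to follow the 5-fold instruction
scores = cross_val_score(dt, X, y, n_jobs=-1)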
scores.mean()

0.91560382437575427

In [39]:
bdt = BaggingClassifier(dt)
# cv is again left at its default rather than cv=5
scores2 = cross_val_score(bdt, X, y, n_jobs=-1)
scores2.mean()

0.9402766174695999

### 1.b Scaled pipelines
As you may have noticed, the features are not normalized. Do the scores improve with normalization?
By now you should be very familiar with pipelines and scaling, so:

1. Create 2 pipelines, each with a scaling preprocessing step followed by either a decision tree or a bagging decision tree.
- Which score is better? Are the scores significantly different? How can you judge that?
- Are the scores different from the non-scaled data?
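
One way to answer these questions is to cross-validate each model with and without a scaler under the same settings. The sketch below is illustrative only: it assumes a recent scikit-learn, uses `StandardScaler` for both pipelines, and fixes `random_state=0` for reproducibility; the `models` and `results` names are hypothetical. Because tree splits are threshold comparisons on one feature at a time, rescaling a feature should leave the scores essentially unchanged.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Seeds are fixed only so that the raw and scaled runs are comparable
models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                 random_state=0),
}

results = {}
for name, model in models.items():
    # Same 5-fold evaluation on raw features and inside a scaling pipeline
    raw = cross_val_score(model, X, y, cv=5)
    scaled = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=5)
    results[name] = (raw.mean(), scaled.mean())
    print("%s: raw=%.3f scaled=%.3f" % (name, raw.mean(), scaled.mean()))
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into the preprocessing step.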

In [60]:
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn import tree

In [45]:
X_train,X_test,y_train,y_test=train_test_split(X,y)

In [46]:
#standardscaler with decision tree
pipe1= make_pipeline(StandardScaler(), DecisionTreeClassifier())
pipe1

Pipeline(steps=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('decisiontreeclassifier', DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'))])

In [55]:
#robust scaler with bagging
pipe2= make_pipeline(RobustScaler(), BaggingClassifier(tree.DecisionTreeClassifier()))
pipe2

Pipeline(steps=[('robustscaler', RobustScaler(copy=True, with_centering=True, with_scaling=True)), ('baggingclassifier', BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
       ...estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False))])

In [56]:
pipe1.fit(X_train,y_train).score(X_test,y_test)

0.91608391608391604

In [57]:
pipe2.fit(X_train,y_train).score(X_test,y_test)

0.95104895104895104

### 1.c Grid Search

Grid search is a great way to improve the performance of a classifier. Let's explore the parameter space of both models and see if we can improve their performance.

1. Initialize a GridSearchCV with 5-fold cross-validation for the Decision Tree Classifier
- Search over a few values of the parameters in order to improve the score of the classifier
- Use the whole X, y dataset for your test
- Check the best\_score\_ once you've trained it. Is it better than before?
- How does the score of the Grid-searched DT compare with the score of the Bagging DT?
- Initialize a GridSearchCV with 5-fold cross-validation for the Bagging Decision Tree Classifier
- Repeat the search
    - Note that you'll have to change parameter names for the base_estimator (called `estimator` in newer scikit-learn releases)
    - Note that there are also additional parameters to change
    - Note that you may end up with a grid space too large to search in a short time
    - Make use of the n_jobs parameter to speed up your grid search
- Does the score improve for the Grid-searched Bagging Classifier?
- Which score is better? Are the scores significantly different? How can you judge that?
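
The two searches described above might look like the sketch below. It is a minimal example under stated assumptions, not a tuned solution: it assumes a recent scikit-learn (`GridSearchCV` imported from `sklearn.model_selection`), and the grids `dt_grid` and `bag_grid`, their values, and the `prefix` helper are all illustrative. The wrapped tree's parameters are addressed with a double-underscore prefix, which is `estimator__` in newer releases and `base_estimator__` in older ones, so the sketch detects which name the installed version uses.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Plain decision tree: parameter names are used directly
dt_grid = {"max_depth": [2, 4, None], "min_samples_leaf": [1, 5]}
gs_dt = GridSearchCV(DecisionTreeClassifier(random_state=0), dt_grid,
                     cv=5, n_jobs=1)  # raise n_jobs (e.g. -1) to parallelize
gs_dt.fit(X, y)

# Bagging: tree parameters need the name of the wrapped-estimator argument
bag = BaggingClassifier(DecisionTreeClassifier(random_state=0), random_state=0)
prefix = "estimator" if "estimator" in bag.get_params() else "base_estimator"
bag_grid = {
    prefix + "__max_depth": [2, None],  # parameter of the wrapped tree
    "n_estimators": [5, 15],            # parameters of the bagging ensemble
    "max_samples": [0.5, 1.0],
}
gs_bag = GridSearchCV(bag, bag_grid, cv=5, n_jobs=1)
gs_bag.fit(X, y)

print(gs_dt.best_params_, gs_dt.best_score_)
print(gs_bag.best_params_, gs_bag.best_score_)
```

The grids are kept deliberately small here; each extra value multiplies the number of fits, which is why a large grid over both the tree and the ensemble parameters can quickly become too slow to search.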

In [58]:
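# sklearn.grid_search was removed in later scikit-learn releases;
# GridSearchCV now lives in sklearn.model_selection
from sklearn.model_selection import GridSearchCV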

In [69]:
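# NOTE: n_estimators and max_samples are BaggingClassifier parameters, not
# DecisionTreeClassifier ones, so this grid does not match the estimator passed
# to GridSearchCV; that mismatch is the likely cause of the error raised by gs.fit.
params={'n_estimators':np.arange(1,20,2),
       'max_samples':np.arange(0.1,1,.1)}
gs = GridSearchCV(tree.DecisionTreeClassifier(),params,n_jobs=-1,verbose=2)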

In [70]:
gs.fit(X_train,y_train)

Fitting 3 folds for each of 90 candidates, totalling 270 fits
[CV] n_estimators=1, max_samples=0.1 .................................
[CV] n_estimators=1, max_samples=0.1 .................................
[CV] n_estimators=1, max_samples=0.1 .................................
[CV] n_estimators=3, max_samples=0.1 .................................
[CV] n_estimators=3, max_samples=0.1 .................................
[CV] n_estimators=3, max_samples=0.1 .................................
[CV] n_estimators=5, max_samples=0.1 .................................
[CV] n_estimators=5, max_samples=0.1 .................................


JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
//anaconda/lib/python2.7/runpy.py in _run_module_as_main(mod_name='ipykernel.__main__', alter_argv=1)
    169     pkg_name = mod_name.rpartition('.')[0]
    170     main_globals = sys.modules["__main__"].__dict__
    171     if alter_argv:
    172         sys.argv[0] = fname
    173     return _run_code(code, main_globals, None,
--> 174                      "__main__", fname, loader, pkg_name)
        fname = '/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py'
        loader = <pkgutil.ImpLoader instance>
        pkg_name = 'ipykernel'
    175 
    176 def run_module(mod_name, init_globals=None,
    177                run_name=None, alter_sys=False):
    178     """Execute a module's code without importing it

...........................................................................
//anaconda/lib/python2.7/runpy.py in _run_code(code=<code object <module> at 0x1006ce1b0, file "/ana...2.7/site-packages/ipykernel/__main__.py", line 1>, run_globals={'__builtins__': <module '__builtin__' (built-in)>, '__doc__': None, '__file__': '/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': 'ipykernel', 'app': <module 'ipykernel.kernelapp' from '//anaconda/lib/python2.7/site-packages/ipykernel/kernelapp.pyc'>}, init_globals=None, mod_name='__main__', mod_fname='/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py', mod_loader=<pkgutil.ImpLoader instance>, pkg_name='ipykernel')
     67         run_globals.update(init_globals)
     68     run_globals.update(__name__ = mod_name,
     69                        __file__ = mod_fname,
     70                        __loader__ = mod_loader,
     71                        __package__ = pkg_name)
---> 72     exec code in run_globals
        code = <code object <module> at 0x1006ce1b0, file "/ana...2.7/site-packages/ipykernel/__main__.py", line 1>
        run_globals = {'__builtins__': <module '__builtin__' (built-in)>, '__doc__': None, '__file__': '/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': 'ipykernel', 'app': <module 'ipykernel.kernelapp' from '//anaconda/lib/python2.7/site-packages/ipykernel/kernelapp.pyc'>}
     73     return run_globals
     74 
     75 def _run_module_code(code, init_globals=None,
     76                     mod_name=None, mod_fname=None,

...........................................................................
/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py in <module>()
      1 
      2 
----> 3 
      4 if __name__ == '__main__':
      5     from ipykernel import kernelapp as app
      6     app.launch_new_instance()
      7 
      8 
      9 
     10 

...........................................................................
//anaconda/lib/python2.7/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    648 
    649         If a global instance already exists, this reinitializes and starts it
    650         """
    651         app = cls.instance(**kwargs)
    652         app.initialize(argv)
--> 653         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    654 
    655 #-----------------------------------------------------------------------------
    656 # utility functions, for convenience
    657 #-----------------------------------------------------------------------------

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    469             return self.subapp.start()
    470         if self.poller is not None:
    471             self.poller.start()
    472         self.kernel.start()
    473         try:
--> 474             ioloop.IOLoop.instance().start()
    475         except KeyboardInterrupt:
    476             pass
    477 
    478 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
//anaconda/lib/python2.7/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    157             PollIOLoop.configure(ZMQIOLoop)
    158         return PollIOLoop.current(*args, **kwargs)
    159     
    160     def start(self):
    161         try:
--> 162             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    163         except ZMQError as e:
    164             if e.errno == ETERM:
    165                 # quietly return on ETERM
    166                 pass

...........................................................................
//anaconda/lib/python2.7/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    882                 self._events.update(event_pairs)
    883                 while self._events:
    884                     fd, events = self._events.popitem()
    885                     try:
    886                         fd_obj, handler_func = self._handlers[fd]
--> 887                         handler_func(fd_obj, events)
        handler_func = <function null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    888                     except (OSError, IOError) as e:
    889                         if errno_from_exception(e) == errno.EPIPE:
    890                             # Happens when the client closes the connection
    891                             pass

...........................................................................
//anaconda/lib/python2.7/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
//anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
//anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
//anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
//anaconda/lib/python2.7/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    271         if self.control_stream:
    272             self.control_stream.on_recv(self.dispatch_control, copy=False)
    273 
    274         def make_dispatcher(stream):
    275             def dispatcher(msg):
--> 276                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    277             return dispatcher
    278 
    279         for s in self.shell_streams:
    280             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {u'allow_stdin': True, u'code': u'gs.fit(X_train,y_train)', u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-12-02T11:07:20.164987', u'msg_id': u'B5A9752BB37948148AA0755C2990ED24', u'msg_type': u'execute_request', u'session': u'E99BF56D528A499E9DC5EF095213BD9D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'B5A9752BB37948148AA0755C2990ED24', 'msg_type': u'execute_request', 'parent_header': {}})
    223             self.log.error("UNKNOWN MESSAGE TYPE: %r", msg_type)
    224         else:
    225             self.log.debug("%s: %s", msg_type, msg)
    226             self.pre_handler_hook()
    227             try:
--> 228                 handler(stream, idents, msg)
        handler = <bound method IPythonKernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = ['E99BF56D528A499E9DC5EF095213BD9D']
        msg = {'buffers': [], 'content': {u'allow_stdin': True, u'code': u'gs.fit(X_train,y_train)', u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-12-02T11:07:20.164987', u'msg_id': u'B5A9752BB37948148AA0755C2990ED24', u'msg_type': u'execute_request', u'session': u'E99BF56D528A499E9DC5EF095213BD9D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'B5A9752BB37948148AA0755C2990ED24', 'msg_type': u'execute_request', 'parent_header': {}}
    229             except Exception:
    230                 self.log.error("Exception in message handler:", exc_info=True)
    231             finally:
    232                 self.post_handler_hook()

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=['E99BF56D528A499E9DC5EF095213BD9D'], parent={'buffers': [], 'content': {u'allow_stdin': True, u'code': u'gs.fit(X_train,y_train)', u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-12-02T11:07:20.164987', u'msg_id': u'B5A9752BB37948148AA0755C2990ED24', u'msg_type': u'execute_request', u'session': u'E99BF56D528A499E9DC5EF095213BD9D', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'B5A9752BB37948148AA0755C2990ED24', 'msg_type': u'execute_request', 'parent_header': {}})
    385         if not silent:
    386             self.execution_count += 1
    387             self._publish_execute_input(code, parent, self.execution_count)
    388 
    389         reply_content = self.do_execute(code, silent, store_history,
--> 390                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    391 
    392         # Flush output before sending the reply.
    393         sys.stdout.flush()
    394         sys.stderr.flush()

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code=u'gs.fit(X_train,y_train)', silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    191 
    192         self._forward_input(allow_stdin)
    193 
    194         reply_content = {}
    195         try:
--> 196             res = shell.run_cell(code, store_history=store_history, silent=silent)
        res = undefined
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = u'gs.fit(X_train,y_train)'
        store_history = True
        silent = False
    197         finally:
    198             self._restore_input()
    199 
    200         if res.error_before_exec is not None:

...........................................................................
//anaconda/lib/python2.7/site-packages/ipykernel/zmqshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, *args=(u'gs.fit(X_train,y_train)',), **kwargs={'silent': False, 'store_history': True})
    496             )
    497         self.payload_manager.write_payload(payload)
    498 
    499     def run_cell(self, *args, **kwargs):
    500         self._last_traceback = None
--> 501         return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
        self.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        args = (u'gs.fit(X_train,y_train)',)
        kwargs = {'silent': False, 'store_history': True}
    502 
    503     def _showtraceback(self, etype, evalue, stb):
    504         # try to preserve ordering of tracebacks and print statements
    505         sys.stdout.flush()

...........................................................................
//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell=u'gs.fit(X_train,y_train)', store_history=True, silent=False, shell_futures=True)
   2712                 self.displayhook.exec_result = result
   2713 
   2714                 # Execute the user code
   2715                 interactivity = "none" if silent else self.ast_node_interactivity
   2716                 has_raised = self.run_ast_nodes(code_ast.body, cell_name,
-> 2717                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler instance>
   2718                 
   2719                 self.last_execution_succeeded = not has_raised
   2720 
   2721                 # Reset this so later displayed values do not modify the

...........................................................................
//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.Expr object>], cell_name='<ipython-input-70-a32b9d649c49>', interactivity='last', compiler=<IPython.core.compilerop.CachingCompiler instance>, result=<ExecutionResult object at 1156ba910, execution_..._before_exec=None error_in_exec=None result=None>)
   2822                     return True
   2823 
   2824             for i, node in enumerate(to_run_interactive):
   2825                 mod = ast.Interactive([node])
   2826                 code = compiler(mod, cell_name, "single")
-> 2827                 if self.run_code(code, result):
        self.run_code = <bound method ZMQInteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x1156a66b0, file "<ipython-input-70-a32b9d649c49>", line 1>
        result = <ExecutionResult object at 1156ba910, execution_..._before_exec=None error_in_exec=None result=None>
   2828                     return True
   2829 
   2830             # Flush softspace
   2831             if softspace(sys.stdout, 0):

...........................................................................
//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x1156a66b0, file "<ipython-input-70-a32b9d649c49>", line 1>, result=<ExecutionResult object at 1156ba910, execution_..._before_exec=None error_in_exec=None result=None>)
   2876         outflag = 1  # happens in more places, so it's easier as default
   2877         try:
   2878             try:
   2879                 self.hooks.pre_run_code_hook()
   2880                 #rprint('Running code', repr(code_obj)) # dbg
-> 2881                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x1156a66b0, file "<ipython-input-70-a32b9d649c49>", line 1>
        self.user_global_ns = {'BaggingClassifier': <class 'sklearn.ensemble.bagging.BaggingClassifier'>, 'DecisionTreeClassifier': <class 'sklearn.tree.tree.DecisionTreeClassifier'>, 'GridSearchCV': <class 'sklearn.grid_search.GridSearchCV'>, 'In': ['', u"import pandas as pd\nimport numpy as np\nimpor...s plt\nget_ipython().magic(u'matplotlib inline')", u'data=load_breast_cancer()', u"import pandas as pd\nimport numpy as np\nimpor...oom sklearn.datasets import load_breast_cancer()", u"import pandas as pd\nimport numpy as np\nimpor...rom sklearn.datasets import load_breast_cancer()", u"import pandas as pd\nimport numpy as np\nimpor...nfrom sklearn.datasets import load_breast_cancer", u'data=load_breast_cancer()', u'data=load_breast_cancer()\ndata.head()', u'data=load_breast_cancer()\ndata', u'y=pd.Series(data.target)', u'y.describe()', u'y.value.counts()/y.count()', u'(y.value.counts())/y.count()', u'y.value_counts()/y.count()', u'from sklearn.tree import DecisionTreeClassifie... sklearn.cross_validation import cross_val_score', u'dt=DecisionTreeClassifier()', u'scores=cross_val_score(dt, X,y, n_jobs=-1)', u'scores=cross_val_score(dt, x,y, n_jobs=-1)', u'from sklearn.preprocessing import RobustScaler...aler\nfrom sklearn.pipeline import make_pipeline', u'p1=make_pipeline()', ...], 'Out': {8: {'target_names': array(['malignant', 'benign'], ...'worst fractal dimension'], 
      dtype='|S23')}, 10: count    569.000000
mean       0.627417
std     ...      1.000000
max        1.000000
dtype: float64, 13: 1    0.627417
0    0.372583
dtype: float64, 20:    mean radius  mean texture  mean perimeter  me...                 0.07678  

[5 rows x 30 columns], 22: array([ 0.90526316,  0.94736842,  0.89417989]), 23: 0.91560382437575427, 24: 0.9402766174695999, 26: Pipeline(steps=[('standardscaler', StandardScale...ort=False, random_state=None, splitter='best'))]), 27: Pipeline(steps=[('standardscaler', StandardScale...te=None,
         verbose=0, warm_start=False))]), 30: Pipeline(steps=[('robustscaler', RobustScaler(co...andom_state=None, verbose=0, warm_start=False))]), ...}, 'RobustScaler': <class 'sklearn.preprocessing.data.RobustScaler'>, 'StandardScaler': <class 'sklearn.preprocessing.data.StandardScaler'>, 'X':      mean radius  mean texture  mean perimeter  ...               0.07039  

[569 rows x 30 columns], 'X_test':      mean radius  mean texture  mean perimeter  ...               0.10130  

[143 rows x 30 columns], 'X_train':      mean radius  mean texture  mean perimeter  ...               0.06688  

[426 rows x 30 columns], ...}
[Namespace dump and intermediate frames condensed; only the frames leading to the error are kept]

<ipython-input-70-a32b9d649c49> in <module>()
      6 gs.fit(X_train,y_train)

//anaconda/lib/python2.7/site-packages/sklearn/grid_search.py in fit(...)
--> 804         return self._fit(X, y, ParameterGrid(self.param_grid))
        self.param_grid = {'max_samples': array([ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9]), 'n_estimators': array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])}

//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object <genexpr>>)
--> 810             self.retrieve()

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Fri Dec  2 11:07:20 2016
PID: 48178                             Python 2.7.12: //anaconda/bin/python

//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py in _fit_and_score(...)
-> 1520         estimator.set_params(**parameters)
        parameters = {'max_samples': 0.10000000000000001, 'n_estimators': 1}

//anaconda/lib/python2.7/site-packages/sklearn/base.py in set_params(self=DecisionTreeClassifier(...), **params={'max_samples': 0.10000000000000001, 'n_estimators': 1})
    266                 if key not in valid_params:
    267                     raise ValueError('Invalid parameter %s for estimator %s. '
    268                                      'Check the list of available parameters '
    269                                      'with `estimator.get_params().keys()`.' %
--> 270                                      (key, self.__class__.__name__))
        key = 'n_estimators'
        self.__class__.__name__ = 'DecisionTreeClassifier'

ValueError: Invalid parameter n_estimators for estimator DecisionTreeClassifier. Check the list of available parameters with `estimator.get_params().keys()`.
___________________________________________________________________________

In [71]:
gs.get_params

<bound method GridSearchCV.get_params of GridSearchCV(cv=None, error_score='raise',
       estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_estimators': array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19]), 'max_samples': array([ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=2)>
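
The `ValueError` above arises because the grid contains `n_estimators` and `max_samples`, which are parameters of `BaggingClassifier`, while the estimator handed to `GridSearchCV` is a bare `DecisionTreeClassifier`. A minimal sketch of one possible fix follows. It assumes a recent scikit-learn (where `GridSearchCV` lives in `sklearn.model_selection` rather than `sklearn.grid_search`), uses a smaller grid than the original to keep it quick, and fixes `random_state` only for reproducibility; none of these choices come from the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The tree itself does not know about these parameters, which is what
# triggered the error.
tree_params = DecisionTreeClassifier().get_params()
print("n_estimators" in tree_params)

# Wrap the tree in a BaggingClassifier so the grid keys are valid.
bagged_tree = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                random_state=0)

# Reduced grid (assumption): a subset of the values in the original grid.
param_grid = {
    "n_estimators": [1, 5, 11, 19],
    "max_samples": [0.1, 0.5, 0.9],
}

gs = GridSearchCV(bagged_tree, param_grid, cv=5)
gs.fit(X_train, y_train)

print(gs.best_params_)
print(gs.best_score_)
```

Calling `gs.get_params()` (with parentheses) returns the parameter dictionary; without them, as in the cell above, only the bound method is displayed.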

## 2. Diabetes and Regression

Scikit Learn has a dataset of diabetic patients obtained from this study:

http://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf

442 diabetes patients were measured on 10 baseline variables: age, sex, body mass index, average blood pressure, and six blood serum measurements.

The target is a quantitative measure of disease progression one year after baseline.

Repeat the above comparison between a DecisionTreeRegressor and a Bagging version of the same.

### 2.a Simple comparison
1. Load the data and create X and y
- Initialize a Decision Tree Regressor and use cross_val_score to evaluate its performance. Set cross-validation to 5 folds. Which score will you use?
- Wrap a Bagging Regressor around the Decision Tree Regressor and use cross_val_score to evaluate its performance. Set cross-validation to 5 folds.
- Which score is better? Are the scores significantly different? How can you judge that?

In [80]:
from sklearn.datasets import load_diabetes
data=load_diabetes()

In [89]:
X=pd.DataFrame(data.data)

In [90]:
feature_names=['age','sex','bmi','avg_bp','bs_1','bs_2','bs_3','bs_4','bs_5','bs_6']
X=pd.DataFrame(data.data, columns=feature_names)
X.head()

Unnamed: 0,age,sex,bmi,avg_bp,bs_1,bs_2,bs_3,bs_4,bs_5,bs_6
0,0.038076,0.05068,0.061696,0.021872,-0.044223,-0.034821,-0.043401,-0.002592,0.019908,-0.017646
1,-0.001882,-0.044642,-0.051474,-0.026328,-0.008449,-0.019163,0.074412,-0.039493,-0.06833,-0.092204
2,0.085299,0.05068,0.044451,-0.005671,-0.045599,-0.034194,-0.032356,-0.002592,0.002864,-0.02593
3,-0.089063,-0.044642,-0.011595,-0.036656,0.012191,0.024991,-0.036038,0.034309,0.022692,-0.009362
4,0.005383,-0.044642,-0.036385,0.021872,0.003935,0.015596,0.008142,-0.002592,-0.031991,-0.046641


In [93]:
y=data.target

In [96]:
from sklearn.tree import DecisionTreeRegressor
dtr=DecisionTreeRegressor()

In [104]:
scores=cross_val_score(dtr, X,y, cv=5,n_jobs=-1)
print scores
scores.mean()

[-0.24508988 -0.06012957 -0.23846801  0.15198377 -0.15089586]


-0.10851991227550058

In [105]:
from sklearn.ensemble import BaggingRegressor
bdt = BaggingRegressor(dtr)
scores2=cross_val_score(bdt,X,y,cv=5,n_jobs=-1)
print scores2
scores2.mean()

[ 0.36371752  0.44872625  0.38740482  0.35560213  0.30846479]


0.37278310123202629
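
With no `scoring` argument, `cross_val_score` uses the regressor's own `score` method, which is R². A negative R² means the model did worse on that fold than always predicting the mean of the target. One rough way to judge whether the two models differ is to compare the fold means against the spread across folds and to look at the per-fold differences. The sketch below assumes a recent scikit-learn (`sklearn.model_selection`) and fixes `random_state` for reproducibility; with only five folds it is an informal check, not a formal significance test.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

tree = DecisionTreeRegressor(random_state=0)
bagged = BaggingRegressor(DecisionTreeRegressor(random_state=0), random_state=0)

# An integer cv for a regressor means unshuffled KFold, so both models
# are scored on the same five splits and the folds can be paired.
tree_scores = cross_val_score(tree, X, y, cv=5)
bag_scores = cross_val_score(bagged, X, y, cv=5)

for name, s in [("Decision tree", tree_scores), ("Bagged trees", bag_scores)]:
    print("{}: R^2 = {:.3f} +/- {:.3f}".format(name, s.mean(), s.std()))

# Per-fold improvement of bagging over the single tree.
diff = bag_scores - tree_scores
print("Per-fold improvement:", np.round(diff, 3))
print("Folds where bagging wins: {} of {}".format(int((diff > 0).sum()), len(diff)))
```

If the gap between the means is several times larger than the fold-to-fold standard deviation, and bagging wins on most folds, the difference is unlikely to be an artifact of the split.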

## Scaled pipelines

In [108]:
X_train,X_test,y_train,y_test=train_test_split(X,y)

In [109]:
#standardscaler with decision tree
pipe1= make_pipeline(StandardScaler(), DecisionTreeRegressor())
pipe1

Pipeline(steps=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('decisiontreeregressor', DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, presort=False, random_state=None,
           splitter='best'))])

In [111]:
pipe1.fit(X_train,y_train).score(X_test,y_test)

0.14520230937125966

In [116]:
#standardscaler with bagging
pipe2= make_pipeline(StandardScaler(), BaggingRegressor(tree.DecisionTreeRegressor()))
pipe2

Pipeline(steps=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('baggingregressor', BaggingRegressor(base_estimator=DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
           min_...estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False))])

In [117]:
pipe2.fit(X_train,y_train).score(X_test,y_test)

0.42441810068295438
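
Both pipelines above use the same `StandardScaler`, so the gap between their scores comes from bagging rather than from scaling. Tree splits depend only on the ordering of each feature's values, so standardizing is not expected to change a single tree's score much. The following sketch is one way to check that, assuming a recent scikit-learn and a fixed `random_state` (both choices are mine, not the notebook's). It also scores with 5-fold cross-validation, since a single train/test split gives a noisier estimate.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Same tree, with and without a scaling step in front of it.
plain_tree = DecisionTreeRegressor(random_state=0)
scaled_tree = make_pipeline(StandardScaler(),
                            DecisionTreeRegressor(random_state=0))

# The scaler is refit on each training fold inside the pipeline,
# so no information from the test fold leaks into the scaling.
plain_scores = cross_val_score(plain_tree, X, y, cv=5)
scaled_scores = cross_val_score(scaled_tree, X, y, cv=5)

print("Unscaled tree R^2: {:.3f}".format(plain_scores.mean()))
print("Scaled tree R^2:   {:.3f}".format(scaled_scores.mean()))
```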

### 2.b Grid Search

Repeat Grid search as above:

1. Initialize a GridSearchCV with 5-fold cross validation for the Decision Tree Regressor
- Search over a few values of the parameters in order to improve the score of the regressor
- Use the whole X, y dataset for your test
- Check the best\_score\_ once you've trained it. Is it better than before?
- How does the score of the Grid-searched DT compare with the score of the Bagging DT?
- Initialize a GridSearchCV with 5-fold cross validation for the Bagging Decision Tree Regressor
- Repeat the search
    - Note that you'll have to change parameter names for the base_estimator
    - Note that there are also additional parameters to change
    - Note that you may end up with a grid space too large to search in a short time
    - Make use of the n_jobs parameter to speed up your grid search
- Does the score improve for the Grid-searched Bagging Regressor?
- Which score is better? Are the scores significantly different? How can you judge that?
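
The steps above could be sketched as follows. This is only an illustration under several assumptions: a recent scikit-learn (`sklearn.model_selection`), small hand-picked grids, and fixed `random_state` values. Parameters of the wrapped tree are addressed with a `<prefix>__<name>` key; the prefix is `base_estimator` in older scikit-learn releases and `estimator` in newer ones, so the sketch looks it up at run time.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

N_JOBS = 1  # assumption: raise to -1 to use all cores on a larger grid

# 1. Grid search over the single tree.
tree_grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5, 20]},
    cv=5,
    n_jobs=N_JOBS,
)
tree_grid.fit(X, y)

# 2. Grid search over the bagged tree.
bag = BaggingRegressor(DecisionTreeRegressor(random_state=0), random_state=0)

# Name under which the wrapped tree is exposed (version dependent).
prefix = "estimator" if "estimator" in bag.get_params() else "base_estimator"

bag_grid = GridSearchCV(
    bag,
    param_grid={
        prefix + "__max_depth": [3, None],  # parameter of the inner tree
        "n_estimators": [10, 30],           # parameters of the ensemble
        "max_samples": [0.5, 1.0],
    },
    cv=5,
    n_jobs=N_JOBS,
)
bag_grid.fit(X, y)

print("Tree:   ", tree_grid.best_params_, round(tree_grid.best_score_, 3))
print("Bagging:", bag_grid.best_params_, round(bag_grid.best_score_, 3))
```

Keeping each grid to a handful of values per parameter keeps the number of fits (combinations times folds) manageable; the grid can be widened around the best values afterwards.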


In [122]:
#The cells below are for 6.3.1
import pandas as pd
df = pd.read_csv('car.csv')
df.head()

Unnamed: 0,buying,maint,doors,persons,lug_boot,safety,acceptability
0,vhigh,vhigh,2,2,small,low,unacc
1,vhigh,vhigh,2,2,small,med,unacc
2,vhigh,vhigh,2,2,small,high,unacc
3,vhigh,vhigh,2,2,med,low,unacc
4,vhigh,vhigh,2,2,med,med,unacc


In [123]:
from sklearn.preprocessing import LabelEncoder
y = LabelEncoder().fit_transform(df['acceptability'])
X = pd.get_dummies(df.drop('acceptability', axis=1))

In [124]:
from sklearn.cross_validation import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, BaggingClassifier

cv = StratifiedKFold(y, n_folds=3, shuffle=True, random_state=41)

In [125]:
dt = DecisionTreeClassifier(class_weight='balanced')
s = cross_val_score(dt, X, y, cv=cv, n_jobs=-1)
print "{} Score:\t{:0.3} ± {:0.3}".format("Decision Tree", s.mean().round(3), s.std().round(3))

Decision Tree Score:	0.965 ± 0.01


In [126]:
bdt = BaggingClassifier(DecisionTreeClassifier())
rf = RandomForestClassifier(class_weight='balanced', n_jobs=-1)
et = ExtraTreesClassifier(class_weight='balanced', n_jobs=-1)

def score(model, name):
    s = cross_val_score(model, X, y, cv=cv, n_jobs=-1)
    print "{} Score:\t{:0.3} ± {:0.3}".format(name, s.mean().round(3), s.std().round(3))

score(dt, "Decision Tree")
score(bdt, "Bagging DT")
score(rf, "Random Forest")
score(et, "Extra Trees")

Decision Tree Score:	0.965 ± 0.01
Bagging DT Score:	0.972 ± 0.008
Random Forest Score:	0.944 ± 0.012
Extra Trees Score:	0.951 ± 0.002


In [135]:
%%timeit
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
ab = AdaBoostClassifier()
gb = GradientBoostingClassifier()
score(ab, "AdaBoost")
score(gb, "Gradient Boosting Classifier")

AdaBoost Score:	0.811 ± 0.002
Gradient Boosting Classifier Score:	0.982 ± 0.006
1 loop, best of 3: 1.91 s per loop


## Bonus: Project 6 data

Repeat the analysis for the Project 6 Dataset