Sebastian Raschka, 2015  
`mlxtend`, a library of extension and helper modules for Python's data analysis and machine learning libraries

- GitHub repository: https://github.com/rasbt/mlxtend
- Documentation: http://rasbt.github.io/mlxtend/

View this page in [jupyter nbviewer](http://nbviewer.ipython.org/github/rasbt/mlxtend/blob/master/docs/sources/_ipynb_templates/_template.ipynb)

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -u -d -v -p matplotlib,numpy,scipy

Sebastian Raschka 
Last updated: 12/07/2015 

CPython 3.5.0
IPython 4.0.0

matplotlib 1.5.0
numpy 1.10.1
scipy 0.16.0


In [2]:
import sys
sys.path.insert(0, '../../../github_mlxtend/')

import mlxtend
mlxtend.__version__

'0.3.0dev'

# DenseTransformer

A simple transformer that converts a sparse matrix into a dense NumPy array. This is required, for example, in a scikit-learn `Pipeline` when a `CountVectorizer`, which produces sparse output, is combined with a `RandomForestClassifier`.

> from mlxtend.preprocessing import DenseTransformer
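
To make the sparse-to-dense conversion concrete, here is a minimal sketch that uses only NumPy and SciPy, not mlxtend itself. The toy `counts` matrix is an assumption made up for illustration; it stands in for the kind of sparse count matrix a vectorizer would produce.

```python
import numpy as np
from scipy import sparse

# Toy count matrix (illustrative data), stored in sparse CSR format.
counts = np.array([[1, 0, 0, 2],
                   [0, 0, 3, 0]])
X_sparse = sparse.csr_matrix(counts)

# Densifying materializes every entry, including the zeros,
# as a regular NumPy array.
X_dense = X_sparse.toarray()

print(sparse.issparse(X_sparse))  # True
print(type(X_dense).__name__)     # ndarray
print(X_dense.shape)              # (2, 4)
```

Note that a dense array stores all zeros explicitly, so memory use grows with the full matrix size rather than with the number of nonzero entries.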

### Related Topics

- [Standardize](./standardize.md)
- [MeanCenterer](./mean_centerer.md)
- [Min-Max Scaling](./minmax_scaling.md)
- [DenseTransformer](./scikit-learn_dense_transformer.md)

# Examples

## Example 1

In [23]:
from sklearn.pipeline import Pipeline
# Note: in scikit-learn >= 0.18, import GridSearchCV from sklearn.model_selection instead
from sklearn.grid_search import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from mlxtend.preprocessing import DenseTransformer
import numpy as np

X_train = np.array(['abc def ghi', 'this is a test',
                    'this is a test', 'this is a test'])
y_train = np.array([0, 0, 1, 1])

pipe_1 = Pipeline([
    ('vect', CountVectorizer()),
    ('to_dense', DenseTransformer()),
    ('clf', RandomForestClassifier())
])

parameters_1 = dict(
    clf__n_estimators=[50, 100, 200],
    clf__max_features=['sqrt', 'log2', None],)

grid_search_1 = GridSearchCV(pipe_1, 
                             parameters_1, 
                             n_jobs=1, 
                             verbose=1,
                             scoring='accuracy',
                             cv=2)


print("Performing grid search...")
print("pipeline:", [name for name, _ in pipe_1.steps])
print("parameters:")
grid_search_1.fit(X_train, y_train)
print("Best score: %0.3f" % grid_search_1.best_score_)
print("Best parameters set:")
best_parameters_1 = grid_search_1.best_estimator_.get_params()
for param_name in sorted(parameters_1.keys()):
    print("\t%s: %r" % (param_name, best_parameters_1[param_name]))

Performing grid search...
pipeline: ['vect', 'to_dense', 'clf']
parameters:
Fitting 2 folds for each of 9 candidates, totalling 18 fits
Best score: 0.500
Best parameters set:
	clf__max_features: 'sqrt'
	clf__n_estimators: 50


[Parallel(n_jobs=1)]: Done  18 out of  18 | elapsed:    2.3s finished


# API

In [24]:
from mlxtend.preprocessing import DenseTransformer
help(DenseTransformer)

Help on class DenseTransformer in module mlxtend.preprocessing.dense_transformer:

class DenseTransformer(builtins.object)
 |  A transformer for scikit-learn's Pipeline class that converts
 |  a sparse matrix into a dense matrix.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, some_param=True)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  fit(self, X, y=None)
 |  
 |  fit_transform(self, X, y=None)
 |  
 |  get_params(self, deep=True)
 |  
 |  transform(self, X, y=None)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
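
The method list above can be read as a small contract: `fit` learns nothing, `transform` densifies, and `fit_transform` chains the two. The following is a rough sketch of that contract using a hypothetical stand-in class, `SketchDenseTransformer`. It is not the mlxtend source code, and the method bodies are assumptions inferred from the method names and the class docstring.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


class SketchDenseTransformer(object):
    """Hypothetical stand-in mirroring the method names from help().

    This is an illustrative sketch, not the mlxtend implementation.
    """

    def __init__(self, some_param=True):
        # Placeholder parameter, named after the signature shown in help().
        self.some_param = some_param

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X, y=None):
        # Densify sparse input; pass dense input through as an array.
        if issparse(X):
            return X.toarray()
        return np.asarray(X)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def get_params(self, deep=True):
        return {'some_param': self.some_param}


X_sparse = csr_matrix(np.array([[0, 1], [2, 0]]))
X_dense = SketchDenseTransformer().fit_transform(X_sparse)
print(type(X_dense).__name__, X_dense.tolist())
```

Returning `self` from `fit` is the usual scikit-learn convention, and it is what allows `fit(X).transform(X)` to be written as one expression in this sketch.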

