changed readme to RST

scikit-learn-contrib · Apr 25, 2013 · d46db24
1 parent b10d6ee
commit d46db24

Showing 5 changed files with 85 additions and 25 deletions.
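
Note on the README's Tests section: since this commit renames the file to README.rst, the doctest run would presumably target that path. The examples should still be picked up after the conversion, because doctest looks for ``>>>`` prompts regardless of the surrounding markup. A minimal sketch of the same check through Python's ``doctest.testfile``, assuming a small stand-in file (the ``SAMPLE_RST`` text and the ``run_readme_doctests`` helper are hypothetical and not part of sklearn-pandas):

```python
import doctest
import os
import tempfile


def run_readme_doctests(path):
    """Run the ``>>>`` examples found in a text file.

    Returns a (failed, attempted) tuple of example counts.
    """
    # module_relative=False makes doctest treat `path` as an ordinary OS path
    # instead of a path relative to the calling module.
    results = doctest.testfile(path, module_relative=False)
    return results.failed, results.attempted


# Hypothetical stand-in for README.rst: an RST literal block (introduced
# with "::") that contains two doctest examples.
SAMPLE_RST = """\
Usage
-----

A literal block can hold doctest examples::

    >>> 1 + 1
    2
    >>> sorted(['dog', 'cat'])
    ['cat', 'dog']
"""

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.rst")
    with open(sample_path, "w") as fh:
        fh.write(SAMPLE_RST)
    failed, attempted = run_readme_doctests(sample_path)

print("failed:", failed, "attempted:", attempted)
```

Pointing ``run_readme_doctests`` at the real README.rst would additionally require pandas, scikit-learn and sklearn-pandas to be importable, since the README examples import them.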
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,48 @@
+sklearn-pandas -- bridge code for cross-validation of pandas data frames
+    with sklearn
+
+This software is provided 'as-is', without any express or implied
+warranty.  In no event will the authors be held liable for any damages
+arising from the use of this software.
+
+Permission is granted to anyone to use this software for any purpose,
+including commercial applications, and to alter it and redistribute it
+freely, subject to the following restrictions:
+
+1. The origin of this software must not be misrepresented; you must not
+ claim that you wrote the original software. If you use this software
+ in a product, an acknowledgment in the product documentation would be
+ appreciated but is not required.
+2. Altered source versions must be plainly marked as such, and must not be
+ misrepresented as being the original software.
+3. This notice may not be removed or altered from any source distribution.
+
+Paul Butler <paulgb@gmail.com>
+
+The source code of DataFrameMapper is derived from code originally written by
+Ben Hamner and released under the following license.
+
+Copyright (c) 2013, Ben Hamner
+Author: Ben Hamner (ben@benhamner.com)
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met: 
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer. 
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution. 
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -0,0 +1,2 @@
+include LICENSE
+include README.rst
diff --git a/README.md → README.rst b/README.md → README.rst
@@ -2,7 +2,7 @@
 Sklearn-pandas
 ==============
 
-This module provides a bridge between [Scikit-Learn](http://scikit-learn.org/stable/)'s machine learning methods and [pandas](http://pandas.pydata.org/)-style Data Frames.
+This module provides a bridge between `Scikit-Learn <http://scikit-learn.org/stable/>`__'s machine learning methods and `pandas <http://pandas.pydata.org/>`__-style Data Frames.
 
 In particular, it provides:
 
@@ -12,40 +12,42 @@ In particular, it provides:
 Installation
 ------------
 
-You can install `sklearn-pandas` with `pip`.
+You can install ``sklearn-pandas`` with ``pip``::
 
     # pip install sklearn-pandas
 
 Tests
 -----
 
-The examples in this file double as basic sanity tests. To run them, use `doctest`, which is included with python.
+The examples in this file double as basic sanity tests. To run them, use ``doctest``, which is included with python::
 
     # python -m doctest README.md
 
 Usage
 -----
 
-### Import
+Import
+******
 
-Import what you need from the `sklearn_pandas` package. The choices are:
+Import what you need from the ``sklearn_pandas`` package. The choices are:
 
-* `DataFrameMapper`, a class for mapping pandas data frame columns to different sklearn transformations
-* `cross_val_score`, similar to `sklearn.cross_validation.cross_val_score` but working on pandas DataFrames
+* ``DataFrameMapper``, a class for mapping pandas data frame columns to different sklearn transformations
+* ``cross_val_score``, similar to ``sklearn.cross_validation.cross_val_score`` but working on pandas DataFrames
 
-For this demonstration, we will import both.
+For this demonstration, we will import both::
 
     >>> from sklearn_pandas import DataFrameMapper, cross_val_score
 
-For these examples, we'll also use pandas and sklearn.
+For these examples, we'll also use pandas and sklearn::
 
     >>> import pandas as pd
     >>> import sklearn.preprocessing, sklearn.decomposition, \
     ...     sklearn.linear_model, sklearn.pipeline, sklearn.metrics
 
-### Load some Data
+Load some Data
+**************
 
-Normally you'll read the data from a file, but for demonstration purposes I'll create a data frame from a Python dict.
+Normally you'll read the data from a file, but for demonstration purposes I'll create a data frame from a Python dict::
 
     >>> data = pd.DataFrame({'pet':      ['cat', 'dog', 'dog', 'fish', 'cat', 'dog', 'cat', 'fish'],
     ...                      'children': [4., 6, 3, 3, 2, 3, 5, 4],
@@ -54,19 +56,21 @@ Normally you'll read the data from a file, but for demonstration purposes I'll c
 Transformation Mapping
 ----------------------
 
-### Map the Columns to Transformations
+Map the Columns to Transformations
+**********************************
 
-The mapper takes a list of pairs. The first is a column name from the pandas DataFrame (or a list of multiple columns, as we will see later). The second is an object which will perform the transformation which will be applied to that column.
+The mapper takes a list of pairs. The first is a column name from the pandas DataFrame (or a list of multiple columns, as we will see later). The second is an object which will perform the transformation which will be applied to that column::
 
     >>> mapper = DataFrameMapper([
     ...     ('pet', sklearn.preprocessing.LabelBinarizer()),
     ...     ('children', sklearn.preprocessing.StandardScaler())
     ... ])
 
 
-### Test the Transformation
+Test the Transformation
+***********************
 
-We can use the `fit_transform` shortcut to both fit the model and see what transformed data looks like.
+We can use the ``fit_transform`` shortcut to both fit the model and see what transformed data looks like::
 
     >>> mapper.fit_transform(data)
     array([[ 1.        ,  0.        ,  0.        ,  0.20851441],
@@ -78,22 +82,23 @@ We can use the `fit_transform` shortcut to both fit the model and see what trans
            [ 1.        ,  0.        ,  0.        ,  1.04257207],
            [ 0.        ,  0.        ,  1.        ,  0.20851441]])
 
-Note that the first three columns are the output of the `LabelBinarizer` (corresponding to _cat_, _dog_, and _fish_ respectively) and the fourth column is the standardized value for the number of children. In general, the columns are ordered according to the order given when the `DataFrameMapper` is constructed.
+Note that the first three columns are the output of the ``LabelBinarizer`` (corresponding to *cat*, *dog*, and *fish* respectively) and the fourth column is the standardized value for the number of children. In general, the columns are ordered according to the order given when the ``DataFrameMapper`` is constructed.
 
-Now that the transformation is trained, we confirm that it works on new data.
+Now that the transformation is trained, we confirm that it works on new data::
 
     >>> mapper.transform({'pet': ['cat'], 'children': [5.]})
     array([[ 1.        ,  0.        ,  0.        ,  1.04257207]])
 
-### Transform Multiple Columns
+Transform Multiple Columns
+**************************
 
-Transformations may require multiple input columns. In these cases, the column names can be specified in a list.
+Transformations may require multiple input columns. In these cases, the column names can be specified in a list::
 
     >>> mapper2 = DataFrameMapper([
     ...     (['children', 'salary'], sklearn.decomposition.PCA(1))
     ... ])
     
-Now running `fit_transform` will run PCA on the `children` and `salary` columns and return the first principal component.
+Now running ``fit_transform`` will run PCA on the ``children`` and ``salary`` columns and return the first principal component::
 
     >>> mapper2.fit_transform(data)
     array([[ 47.62288153],
@@ -108,14 +113,20 @@ Now running `fit_transform` will run PCA on the `children` and `salary` columns
 Cross-Validation
 ----------------
 
-Now that we can combine features from pandas DataFrames, we may want to use cross-validation to see whether our model works. Scikit-learn provides features for cross-validation, but they expect numpy data structures and won't work with `DataFrameMapper`.
+Now that we can combine features from pandas DataFrames, we may want to use cross-validation to see whether our model works. Scikit-learn provides features for cross-validation, but they expect numpy data structures and won't work with ``DataFrameMapper``.
 
-To get around this, sklearn-pandas provides a wrapper on sklearn's `cross_val_score` function which passes a pandas DataFrame to the estimator rather than a numpy array.
+To get around this, sklearn-pandas provides a wrapper on sklearn's ``cross_val_score`` function which passes a pandas DataFrame to the estimator rather than a numpy array::
 
     >>> pipe = sklearn.pipeline.Pipeline([
     ...     ('featurize', mapper),
     ...     ('lm', sklearn.linear_model.LinearRegression())])
     >>> cross_val_score(pipe, data, data.salary, sklearn.metrics.mean_squared_error)
     array([ 2018.185     ,     6.72033058,  1899.58333333])
 
-Sklearn-pandas' `cross_val_score` function provides exactly the same interface as sklearn's function of the same name.
+Sklearn-pandas' ``cross_val_score`` function provides exactly the same interface as sklearn's function of the same name.
+
+Credit
+------
+
+The code for ``DataFrameMapper`` is based on code originally written by `Ben Hamner <https://github.com/benhamner>`__.
+
diff --git a/setup.py b/setup.py
@@ -3,7 +3,7 @@
 from setuptools import setup
 
 setup(name='sklearn-pandas',
-      version='0.2',
+      version='0.0.1',
       description='Pandas integration with sklearn',
       author='Paul Butler',
       author_email='paulgb@gmail.com',

diff --git a/sklearn_pandas/__init__.py b/sklearn_pandas/__init__.py
@@ -2,7 +2,6 @@
 import numpy as np
 from sklearn.base import BaseEstimator, TransformerMixin
 from sklearn import cross_validation
-import pdb
 
 def cross_val_score(estimator, X, *args, **kwargs):
     class DataFrameWrapper(object):