pandas2sklearn

An integration of pandas DataFrames with scikit-learn.

The module contains:

  • DataSet, a wrapper for working with DataFrames in the style of scikit-learn datasets.
  • DataSetTransformer, a transformation mechanism that can be easily integrated into scikit-learn pipelines.

Installation

The module can be easily installed with pip:

> pip install pandas2sklearn

Tests

The module includes basic tests of the provided functionality. Run them with py.test:

> py.test

Usage

The module contains two classes:

DataSet

DataSet is a wrapper around a pandas DataFrame that you can use to select:

  • id
  • features
  • target

For example, suppose we have a DataFrame with the following columns:

df.columns = id, FN1, FN2, FN3, FN4, FN5, FC1, FC2, FC3, FC4, FC5, FC6, target

from pandas_sklearn import DataSet

dataset = DataSet(df, target_column='target', id_column='id')

dataset.has_target() == True
dataset.has_id() == True
dataset.target == df['target']
dataset.id == df['id']
dataset.target_names == ['FN1', 'FN2', 'FN3', 'FN4', 'FN5', 'FC1', 'FC2', 'FC3', 'FC4', 'FC5', 'FC6']
dataset.data == df[['FN1', 'FN2', 'FN3', 'FN4', 'FN5', 'FC1', 'FC2', 'FC3', 'FC4', 'FC5', 'FC6']]


# removing some features that are not needed FN4, FN5, FC1, FC5, FC6
dataset.set_feature_names(usage=DataSet.EXCLUDE, columns=['FN4', 'FN5', 'FC1', 'FC5', 'FC6'])
dataset.target_names == ['FN1', 'FN2', 'FN3', 'FC2', 'FC3', 'FC4']

# converting the dataset to a list of dictionaries, one per row
dataset.to_dict() == [
    {'FN1': 12, 'FN2': 23, 'FC2': 'coffee', 'FC3': 'xbox one', 'FC4': 'inch'},
    ...
]
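The column bookkeeping in the example above can be illustrated without the library. The following is a minimal sketch in plain Python, not the implementation used by pandas2sklearn; `split_columns` and its parameters are hypothetical names chosen for illustration. It assumes that every column other than the id and the target counts as a feature unless it is explicitly excluded.

```python
def split_columns(columns, target_column=None, id_column=None, exclude=()):
    """Return the feature column names, preserving their original order.

    Hypothetical helper: every column that is not the id, the target,
    or explicitly excluded is treated as a feature.
    """
    reserved = {target_column, id_column} | set(exclude)
    return [name for name in columns if name not in reserved]


columns = ['id', 'FN1', 'FN2', 'FN3', 'FN4', 'FN5',
           'FC1', 'FC2', 'FC3', 'FC4', 'FC5', 'FC6', 'target']

# All features: everything except the id and target columns.
all_features = split_columns(columns, target_column='target', id_column='id')

# Features left after excluding the columns that are not needed.
kept_features = split_columns(columns, target_column='target', id_column='id',
                              exclude=['FN4', 'FN5', 'FC1', 'FC5', 'FC6'])
print(kept_features)
```

The second call mirrors the exclusion step shown earlier and leaves FN1, FN2, FN3, FC2, FC3 and FC4 as features.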

DataSetTransformer

A feature-wise transformer that applies a scikit-learn transformer to one or more features, e.g.

DataSetTransformer([
    (['petal length (cm)', 'petal width (cm)'], StandardScaler()),
    ('sepal length (cm)', MinMaxScaler()),
    ('sepal width (cm)', None),
])
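To make the idea of a feature-wise mapping concrete, here is a rough, library-independent sketch in plain Python. It is not how DataSetTransformer is implemented. The helper names (`standardize`, `min_max`, `transform_features`) and the toy values are hypothetical, and it assumes that `None` means "pass the column through unchanged" and that unlisted columns are dropped.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def transform_features(table, mapping):
    """Apply one function per listed column; None copies the column as-is.

    `table` maps column names to lists of values. Columns not listed in
    `mapping` are dropped from the result (an assumption of this sketch).
    """
    result = {}
    for names, func in mapping:
        # Accept either a single column name or a list of names.
        for name in ([names] if isinstance(names, str) else names):
            column = table[name]
            result[name] = list(column) if func is None else func(column)
    return result


# Toy values for illustration only.
table = {
    'petal length (cm)': [1.0, 2.0, 3.0],
    'sepal length (cm)': [4.0, 6.0, 8.0],
    'sepal width (cm)': [3.0, 3.5, 2.5],
}
transformed = transform_features(table, [
    (['petal length (cm)'], standardize),
    ('sepal length (cm)', min_max),
    ('sepal width (cm)', None),
])
```

Each listed column is scaled independently, which is how the scikit-learn scalers treat the columns of their input. The sketch does not guard against constant columns, where both helpers would divide by zero.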

It can be used as a step in a scikit-learn Pipeline, e.g.

pipeline = Pipeline([
    ('preprocess', DataSetTransformer([
        (['petal length (cm)', 'petal width (cm)'], StandardScaler()),
        ('sepal length (cm)', MinMaxScaler()),
        ('sepal width (cm)', None),
    ])),
    ('classify', SVC(kernel='linear'))
])

Credit

The DataSetTransformer is based on the work of Ben Hamner and Paul Butler.
