epac

Embarrassingly Parallel Array Computing: EPAC is a machine learning workflow builder.

Given a dataset:

    from sklearn import datasets
    X, y = datasets.make_classification(n_samples=12,
                                        n_features=10,
                                        n_informative=2,
                                        random_state=1)
  • You can build a big machine learning workflow:

    Permutation (Perm) + Cross-validation (CV) of SVM(linear) and SVM(rbf)
    ----------------------------------------------------------------------
             Perms          Perm (Splitter)
        /     |       \
       0      1       2     Samples
              |
              CV            CV (Splitter)
          /   |   \
         0    1    2        Folds
              |
           Methods          Methods (Splitter)
       /           \
    SVM(linear)  SVM(rbf)   Classifiers (Estimator)

using very simple code:

    from sklearn.svm import SVC
    from epac import Perms, CV, Methods
    perms_cv_svm = Perms(CV(
                     Methods(*[SVC(kernel="linear"), SVC(kernel="rbf")]),
                     n_folds=3),
                     n_perms=3)
    perms_cv_svm.run(X=X, y=y) # Top-down process: computing recognition rates, etc.
    perms_cv_svm.reduce() # Bottom-up process: computing p-values, etc.

Then you can get results like:

    ResultSet(
    [{'key': SVC(kernel=linear), 'y/test/score_f1': [ 0.5  0.5], 'y/test/score_recall_mean/pval': [ 0.5], 'y/test/score_recall/pval': [ 0.5  0.5], 'y/test/score_accuracy/pval': [ 0.5], 'y/test/score_f1/pval': [ 0.5  0.5], 'y/test/score_precision/pval': [ 0.5  0.5], 'y/test/score_precision': [ 0.5  0.5], 'y/test/score_recall': [ 0.5  0.5], 'y/test/score_accuracy': 0.5, 'y/test/score_recall_mean': 0.5},
     {'key': SVC(kernel=rbf), 'y/test/score_f1': [ 0.5  0.5], 'y/test/score_recall_mean/pval': [ 1.], 'y/test/score_recall/pval': [ 0.  1.], 'y/test/score_accuracy/pval': [ 1.], 'y/test/score_f1/pval': [ 1.  1.], 'y/test/score_precision/pval': [ 1.  1.], 'y/test/score_precision': [ 0.5  0.5], 'y/test/score_recall': [ 0.5  0.5], 'y/test/score_accuracy': 0.5, 'y/test/score_recall_mean': 0.5}])
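
The reduced ResultSet can also be inspected programmatically. A minimal sketch, assuming the set is iterable and each result behaves like the dicts shown above (the same access pattern the custom reducer below relies on):

    results = perms_cv_svm.reduce()
    for res in results:
        # 'key' identifies the classifier; the other entries hold the scores
        print(res['key'], res['y/test/score_accuracy'], res['y/test/score_accuracy/pval'])
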
  • Run the EPAC tree in parallel on a local multi-core machine, or even on a distributed resource management system (DRMS) using soma-workflow (see the sketch after this code block).
    from epac import LocalEngine
    local_engine = LocalEngine(tree_root=perms_cv_svm, num_processes=2)
    perms_cv_svm = local_engine.run(X=X, y=y)
    perms_cv_svm.reduce()
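
    To submit the same tree to a DRMS via soma-workflow, a hedged sketch, assuming EPAC provides a SomaWorkflowEngine with the same tree_root/num_processes interface as LocalEngine (see the tutorials linked below for the exact signature and cluster-specific options):

    from epac import SomaWorkflowEngine
    swf_engine = SomaWorkflowEngine(tree_root=perms_cv_svm, num_processes=2)
    perms_cv_svm = swf_engine.run(X=X, y=y)
    perms_cv_svm.reduce()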
  • Design your own machine learning algorithm as a plug-in node in an EPAC tree.
    ## 1) Design your classifier
    ## =========================
    class MySVC:
        def __init__(self, C=1.0):
            self.C = C
        def transform(self, X, y):
            from sklearn.svm import SVC 
            svc = SVC(C=self.C)
            svc.fit(X, y)
            # "transform" should return a dictionary
            return {"y/pred": svc.predict(X), "y": y}

    ## 2) Design your reducer for recall rates
    ## ===========================================
    from epac.map_reduce.reducers import Reducer  
    class MyReducer(Reducer):
        def reduce(self, result):
            from sklearn.metrics import precision_recall_fscore_support 
            pred_list = []
            # iterate over the result of each classifier;
            # this is where you plug in your own reduction logic
            for res in result:
                precision, recall, f1_score, support = \
                        precision_recall_fscore_support(res['y'], res['y/pred'])
                pred_list.append({res['key']: recall})
            return pred_list

    ## 3) Build a tree, and then compute results 
    ## =========================================
    from epac import Methods 
    my_svc1 = MySVC(C=1.0)
    my_svc2 = MySVC(C=2.0)
    two_svc = Methods(my_svc1, my_svc2)
    two_svc.reducer = MyReducer()
    #           Methods
    #          /      \
    # MySVC(C=1.0)  MySVC(C=2.0) 
    # top-down process to call transform
    two_svc.run(X=X, y=y)
    # bottom-up process to compute scores
    two_svc.reduce()

You then get results like: [{'MySVC(C=1.0)': array([ 1., 1.])}, {'MySVC(C=2.0)': array([ 1., 1.])}]
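
Custom plug-in nodes run through the same engines as the built-in ones. A minimal sketch reusing the LocalEngine call shown above, under the assumption that any EPAC tree root, including one built from custom nodes, can be passed as tree_root:

    from epac import LocalEngine
    local_engine = LocalEngine(tree_root=two_svc, num_processes=2)
    two_svc = local_engine.run(X=X, y=y)
    two_svc.reduce()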

Important links

Installation http://neurospin.github.io/pylearn-epac/installation.html

Tutorials http://neurospin.github.io/pylearn-epac/tutorials.html

Documentation http://neurospin.github.io/pylearn-epac

Presentation Embarrassingly Parallel Array Computing
