jubakit: Jubatus Toolkit

jubakit is a Python module that provides easy access to Jubatus features. It can be used together with scikit-learn, which gives you powerful features such as cross validation and model evaluation. See the Jubakit Documentation for details.

Currently, jubakit supports the Classifier, Regression, Anomaly, Recommender, NearestNeighbor, Clustering, Burst, Bandit, and Weight engines.

Install

pip install jubakit

Requirements

  • Python 2.7, 3.3, 3.4 or 3.5.
  • Jubatus needs to be installed.
  • scikit-learn is optional, but it is required for some features such as K-fold cross validation.
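
If you are unsure whether the optional scikit-learn dependency is available, a small check like the one below may help. This is only a sketch and not part of jubakit: `has_module` and `check_optional_features` are hypothetical helper names, and `importlib.util.find_spec` assumes Python 3.4 or later, so it will not run on 2.7 or 3.3.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def check_optional_features():
    """Report which packages are importable in this environment.

    The keys are display names chosen for this sketch; scikit-learn is
    imported under the module name `sklearn`.
    """
    return {
        "jubakit": has_module("jubakit"),
        "scikit-learn": has_module("sklearn"),
    }


for package, available in sorted(check_optional_features().items()):
    print("{0}: {1}".format(package, "found" if available else "missing"))
```

If scikit-learn is reported as missing, the basic train and classify workflow should still be usable; only the features that depend on it are affected.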

Quick Start

The following example shows how to train a classifier and classify records using a CSV dataset.

from jubakit.classifier import Classifier, Schema, Dataset, Config
from jubakit.loader.csv import CSVLoader

# Load a CSV file.
loader = CSVLoader('iris.csv')

# Define types for each column in the CSV file.
schema = Schema({
  'Species': Schema.LABEL,
}, Schema.NUMBER)

# Get the shuffled dataset.
dataset = Dataset(loader, schema).shuffle()

# Run the classifier service (`jubaclassifier` process).
classifier = Classifier.run(Config())

# Train the classifier.
for _ in classifier.train(dataset): pass

# Classify using the trained classifier.
for (idx, label, result) in classifier.classify(dataset):
  print("true label: {0}, estimated label: {1}".format(label, result[0][0]))
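
The loop above prints one line per record. To summarize the results as a single number, the tuples can be tallied into an accuracy score. The sketch below is illustrative only: `top_label` and `accuracy` are hypothetical helpers, not jubakit APIs, and it assumes each `result` is a list of (label, score) pairs ordered best first, which matches the `result[0][0]` access in the example. The sample data is made up so that the snippet runs without a Jubatus server.

```python
def top_label(result):
    """Return the first label in a list of (label, score) pairs, or None if empty."""
    return result[0][0] if result else None


def accuracy(classified):
    """Compute the fraction of records whose top estimated label equals the true label.

    `classified` is any iterable of (idx, label, result) tuples.
    """
    total = 0
    correct = 0
    for _idx, label, result in classified:
        total += 1
        if top_label(result) == label:
            correct += 1
    return float(correct) / total if total else 0.0


# Stand-in data shaped like the tuples in the Quick Start loop (values are made up).
sample = [
    (0, "setosa", [("setosa", 0.9), ("virginica", 0.1)]),
    (1, "versicolor", [("virginica", 0.6), ("versicolor", 0.4)]),
    (2, "virginica", [("virginica", 0.8), ("setosa", 0.2)]),
    (3, "setosa", [("setosa", 0.7), ("versicolor", 0.3)]),
]

print("accuracy: {0:.2f}".format(accuracy(sample)))  # prints "accuracy: 0.75"
```

With a running classifier, `accuracy(classifier.classify(dataset))` would be the corresponding call, under the same assumption about the shape of `result`. Note that the Quick Start classifies the same records it trained on, so a score obtained this way is optimistic.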

Examples by Topics

See the example directory for working examples.

| Example | Topics | Requires scikit-learn |
|---|---|---|
| classifier_csv.py | Handling CSV file and numeric features | |
| classifier_shogun.py | Handling CSV file and string features | |
| classifier_digits.py | Handling toy dataset (digits) | ✓ |
| classifier_libsvm.py | Handling LIBSVM file | ✓ |
| classifier_kfold.py | K-fold cross validation and metrics | ✓ |
| classifier_parameter.py | Finding the best hyperparameter | ✓ |
| classifier_hyperopt_tuning.py | Finding the best hyperparameter using hyperopt | ✓ |
| classifier_bulk.py | Bulk Train-Test Classifier | |
| classifier_twitter.py | Handling Twitter Streams | |
| classifier_model_extract.py | Extract contents of Classifier model file | |
| classifier_sklearn_wrapper.py | Classification using scikit-learn wrapper | ✓ |
| classifier_sklearn_grid_search.py | Grid Search example using scikit-learn wrapper | ✓ |
| classifier_tensorboard.py | Visualize a training process using TensorBoard | ✓ |
| regression_boston.py | Regression with toy dataset (boston) | ✓ |
| regression_csv.py | Regression with CSV file | |
| regression_sklearn_wrapper.py | Regression using scikit-learn wrapper | ✓ |
| anomaly_auc.py | Anomaly detection and metrics | |
| recommender_npb.py | Recommend similar items | |
| nearest_neighbor_aaai.py | Search neighbor items | |
| clustering_2d.py | Clustering 2-dimensional dataset | |
| burst_dummy_stream.py | Burst detection with stream data | |
| bandit_slot.py | Multi-armed bandit with slot machine example | |
| weight_shogun.py | Tracing fv_converter behavior using Weight | |
| weight_model_extract.py | Extract contents of Weight model file | |

License

MIT License