Valentin Reis, latest commit 99b6517, May 24, 2019
WIP: Holt (Haskell online learning toolkit)

WIP: use at your own risk.

The goal of this library is to be a coherent set of tools for a user who wants to develop a custom approach to an online machine learning problem, experimentally validate their model, deploy it to production, and operate it. Some Haskell fluency is required. The library is very much a work in progress: it only supports a few supervised models as of now, but I'd like to extend it to limited-feedback and unsupervised models.

Package hierarchy:

  • Holt.Core
    Primitives for writing online machine learning algorithms, polymorphic with respect to the data representation via Data.Linear.

  • Holt.Extra
    Tools for experimentation, validation, deployment and monitoring of holt models.

  • Holt.Examples
    Example learner scripts. A detailed example is given below.

Design choices:

  • Value-level Haskell EDSL modeling. Learning algorithms are built much like an xmonad, xmobar, yi, or termonad application would be configured.

  • No autodiff, no neural nets.

  • Production tooling goals: batch/daemon mode, checkpointing, monitoring, alerts, reporting, CLI interface.

  • Include a zoo of losses, regularizers, and algorithms. These are value-level and overridable (a toy example below covers the simple case of overriding the loss function with custom Haskell code).
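To illustrate what "value-level and overridable" can mean, here is a minimal standalone sketch in plain Haskell: a loss is just an ordinary record of functions, so a user can swap in their own with inline code. The `Loss`, `hinge`, and `squared` names here are hypothetical and illustrative, not Holt's actual API.

```haskell
-- A pointwise loss as a plain value: the cost function paired with
-- its derivative in the prediction, as a gradient step would need.
-- (Illustrative sketch only; not Holt's actual types.)
data Loss = Loss
  { loss  :: Double -> Double -> Double  -- prediction -> label -> cost
  , dLoss :: Double -> Double -> Double  -- derivative w.r.t. prediction
  }

-- Hinge loss, max(0, 1 - y*p): the classic SVM loss.
hinge :: Loss
hinge = Loss
  { loss  = \p y -> max 0 (1 - y * p)
  , dLoss = \p y -> if y * p < 1 then negate y else 0
  }

-- "Overriding": a user-supplied squared loss, written inline.
squared :: Loss
squared = Loss
  { loss  = \p y -> let d = p - y in 0.5 * d * d
  , dLoss = \p y -> p - y
  }

main :: IO ()
main = do
  print (loss hinge 0.5 1)   -- 0.5
  print (dLoss hinge 0.5 1)  -- -1.0
  print (loss squared 2 1)   -- 0.5
```

Because a loss is a first-class value rather than a typeclass instance, overriding it is just passing a different record to the learner constructor.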

Building

Nix packaging is provided:

$ nix-shell -A validation

See this post for more information.

Dependencies (deduplicated across packages):

aeson attoparsec base bytestring cassava cassava-conduit conduit conduit-combinators conduit-extra containers data-default deepseq holt-core holt-extra lens linear MonadRandom mtl optparse-applicative pretty-show random resourcet text vector

Hacking

Provisioning:

$ nix-shell

or

$ direnv allow && lorri watch

Tools:

  • monolithic ghcid with no package boundaries: ./shake ghcid Monolithic
  • regenerate the readme: ./shake readme
  • in-place brittany formatting: ./shake brittany

Detailed Motivation

There is a distance between ML R&D and production code/deployments:

  • R&D code is not easily legible from the production engineering point of view. Engineers have to understand an R&D method and re-implement its algorithms in a production framework.

  • Production code is not legible from the R&D side. This means that the ML/data science practitioner sometimes does not have access to the real pre/post-processing layers of the production "pipeline".

Holt is designed to help reduce this distance for users who are knowledgeable across the two domains. It certainly will not be as easy to use as a Python notebook or a cvx model for an applied mathematician, but it is an attempt to move in the right direction.

Detailed example:

This snippet uses Holt.Core modeling primitives to express a sequential version of a support vector machine (SVM) via Online Gradient Descent (OGD). It can be executed with a Haskell interpreter or compiled via GHC into a specialized binary for dense data. Here, the instance memory representation is Data.Vector. The command-line building tools from Holt.Extra.Wrapper are used to make this learner actually useful.

{-# language RecordWildCards #-}
{-# language DeriveGeneric #-}
{-# language DeriveAnyClass #-}

module Main
  ( main
  )
where

import           Holt.Extra.Wrapper
                   ( cli )
import           Holt.Extra.Parsers
import           Holt.Extra.State

import           Holt.Core.Tasks
import           Holt.Core.Algorithms
import           Holt.Core.Losses
import           Holt.Core.Regularizers
import           Holt.Core.Learners
import           Holt.Core.Models
import           Holt.Core.Metrics
import           Data.Vector                                       as V

import           Options.Applicative                               as OA

data SvmArgs = SvmArgs { rate :: Double, lambda :: Double }
pSvm :: OA.Parser SvmArgs
pSvm =
  SvmArgs
    <$> option auto (metavar "RATE" <> short 'r' <> help "Learning Rate")
    <*> option auto (metavar "LAMBDA" <> short 'l' <> help "L2 Norm size")

main :: IO ()
main = cli pSvm desc supervisedMachine wrappedParseCsv (const False) mkLearner
 where
  mkLearner SvmArgs {..} = WrappedSupervisedLearner
    { learner = supervisedLearner algorithm
                                  (fixedRate rate)
                                  (l1 lambda)
                                  regression
                                  mse
    , metrics = basicRegMetricsConfig
    }
  algorithm :: ProjectionDescent FirstOrderState V.Vector Double
  algorithm = ogd
  desc      = "Online SVM using the vanilla OGD step."
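For readers unfamiliar with OGD, the step the `ogd` algorithm above performs can be sketched in a few lines of plain Haskell. This is a generic illustration, not Holt's implementation: it uses lists instead of Data.Vector for brevity, a hinge subgradient, and an L2 penalty folded into the update.

```haskell
-- One OGD update on weights w for a labeled example (x, y):
--   w <- w - rate * (subgradient of hinge at <w,x> + lambda * w)
-- Generic sketch for illustration; not Holt's actual `ogd`.
ogdStep :: Double -> Double -> [Double] -> ([Double], Double) -> [Double]
ogdStep rate lambda w (x, y) = zipWith step w x
 where
  p = sum (zipWith (*) w x)          -- prediction <w, x>
  g xi | y * p < 1 = negate y * xi   -- hinge subgradient coordinate
       | otherwise = 0
  step wi xi = wi - rate * (g xi + lambda * wi)

main :: IO ()
main = do
  let w0       = [0, 0]
      examples = [([1, 0], 1), ([0, 1], -1)]
      -- fold the step over a small stream of examples
      w        = foldl (ogdStep 0.1 0.01) w0 examples
  print w  -- first weight pulled positive, second negative
```

Sequential folding over a data stream is exactly what makes the approach "online": the model state after each example is a valid model, which is what the batch and daemon modes below exploit.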

General options:

Preprocessing executable 'monolithic' for holt-0.1.0.0..
Building executable 'monolithic' for holt-0.1.0.0..
Running monolithic...
Online SVM using the vanilla OGD step.

Usage: monolithic [--version] [--verbose] COMMAND
  hol-tool

Available options:
  -h,--help                Show this help text
  --version                Show version
  --verbose                Print debug output to stdout.

Available commands:
  batch                    Run the learner in Batch mode.
  daemon                   Run the learner in Daemon mode.

Batch mode:

Usage: monolithic batch INPUT_DATA_FILE OUTPUT_DATA_FILE
                        [-o|--write_model OUTPUT_MODEL_FILE]
                        [-i|--load_model INPUT_MODEL_FILE] -r RATE -l LAMBDA
  Run the learner in Batch mode.

Available options:
  INPUT_DATA_FILE          Input file, `-` for stdin
  OUTPUT_DATA_FILE         Output file, `-` for stdout
  -o,--write_model OUTPUT_MODEL_FILE
                           Output file for the learned model
                           state. (default: "")
  -i,--load_model INPUT_MODEL_FILE
                           Input file for the initial model state (default: "")
  -r RATE                  Learning Rate
  -l LAMBDA                L2 Norm size
  -h,--help                Show this help text

Daemon mode:

Usage: monolithic daemon ADDRESS [-o|--write_model OUTPUT_MODEL_FILE]
                         [-i|--load_model INPUT_MODEL_FILE] -r RATE -l LAMBDA
  Run the learner in Daemon mode.

Available options:
  ADDRESS                  ZMQ socket address
  -o,--write_model OUTPUT_MODEL_FILE
                           Output file for the learned model
                           state. (default: "")
  -i,--load_model INPUT_MODEL_FILE
                           Input file for the initial model state (default: "")
  -r RATE                  Learning Rate
  -l LAMBDA                L2 Norm size
  -h,--help                Show this help text

See also:

  • vw (Vowpal Wabbit)
  • Jubatus
  • some Scikit-Learn code
  • SOL
  • OLL
