FOS is a machine learning model training and scoring server written in Java.
There are many good machine learning training and scoring frameworks and libraries out there, but they don't provide the following benefits:
- Common API: fos provides a common abstraction for model attributes, model training and model scoring. Using a Weka-based classifier uses exactly the same API as using an R-based classifier.
- Scoring & Training as a remote service: training and scoring can be farmed out to dedicated servers in the network, enabling both vertical and horizontal scaling.
- Import and Export models: a model can be trained on a development box and imported seamlessly into a remote server.
- Scalable and low latency scoring: marshalling and unmarshalling scoring requests and responses can account for a significant share of the overhead. Along with the slower RMI-based interface, fos also supports scoring using Kryo.
You need:
- A Java SDK
- Maven
After both the Java SDK and Maven are installed, run the following command:
mvn clean install
This should compile fos-core, run all the tests and install all modules into your local Maven repository.
In order to start a FOS server, you need to create a bundle that contains both the core components and one or more fos backend implementations.
To create a bundle, run
make package
from the fos-core project root. This will:
- Build fos core.
- Copy all dependencies listed in non-core-dependencies.xml into the fos-server lib directory (this includes the Weka and R API implementations).
- Create a tar.gz bundle with all the necessary code plus shell scripts to bootstrap the process. The file will be available as fos-server/target/fos-server-bin.tar.gz.
- Untar the server bundle in your install location. Let's assume FOS will be installed in your home directory:
$ cd ~
$ tar xf <git clone dir>/fos-server/target/fos-server-bin.tar.gz
By now you should have a fos-server directory in your home directory.
$ cd fos-server
~/fos-server $ ls
bin conf lib LICENSE.txt log models README.txt
Inside there are a few directories:
- bin: startup scripts
- conf: FOS configuration files
- models: where trained models are stored
To start FOS, run the bundled startup script bin/startup.sh:
$ bin/startup.sh
03-Feb 04:28:31 INFO com.feedzai.fos.server.Runner Starting fos server using configuration from conf/fos.properties
03-Feb 04:28:31 INFO com.feedzai.fos.server.Runner FOS Server started in 272ms
We've just started a FOS server with Weka support, provided by the fos-weka module.
Currently FOS supports only one active module per runtime instance. The active module is set by the fos.factoryName configuration option in the conf/fos.properties file:
# the fos implementation to launch
fos.factoryName=com.feedzai.fos.impl.weka.WekaManagerFactory
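This document does not describe how the server turns the fos.factoryName setting into a running manager. The sketch below shows one plausible mechanism. It assumes the factory class is loaded by name through reflection and has a public no-argument constructor. It reads fos.factoryName with java.util.Properties and instantiates the configured class. ManagerFactory here is a simplified stand-in for the fos-core interface, and DemoFactory is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Simplified stand-in for fos-core's ManagerFactory (assumption, not the real interface).
interface ManagerFactory {
    String describe();
}

// Hypothetical factory used only to demonstrate loading by class name.
class DemoFactory implements ManagerFactory {
    public DemoFactory() {}

    @Override
    public String describe() {
        return "demo";
    }
}

class FactoryLoader {
    static final String FACTORY_KEY = "fos.factoryName";

    // Parses properties text, e.g. the contents of conf/fos.properties ('#' lines are comments).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Properties.load declares IOException even for an in-memory reader.
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Instantiates the class named by fos.factoryName through its no-arg constructor.
    static ManagerFactory load(Properties props) {
        String className = props.getProperty(FACTORY_KEY);
        if (className == null || className.trim().isEmpty()) {
            throw new IllegalArgumentException(FACTORY_KEY + " is not set");
        }
        try {
            Class<?> clazz = Class.forName(className.trim());
            if (!ManagerFactory.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(className + " does not implement ManagerFactory");
            }
            return (ManagerFactory) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

Under this approach, switching backends would only require pointing fos.factoryName at a different factory class and restarting the server. That matches the one-active-module-per-instance limitation.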
We've prepared a couple of FOS samples. Check them out.
fos-core does not provide any concrete implementation. However, a bundled fos server includes both the fos-weka (active by default) and fos-r implementations. Creating a new implementation is straightforward if you want to leverage existing code.
Your first step will be to understand the ManagerFactory interface. A ManagerFactory should perform all the necessary bootstrapping for a given Manager implementation, which must provide implementations for:
- Model training
- Model management (adding, removing, importing and exporting models)
- A Scorer implementation
Let's dissect a real example:
public class WekaManagerFactory implements ManagerFactory {
    @Override
    public Manager createManager(FosConfig configuration) {
        WekaManagerConfig wekaManagerConfig = new WekaManagerConfig(configuration);
        return new WekaManager(wekaManagerConfig);
    }
}
The first step is to parse the Manager-specific configuration parameters:
WekaManagerConfig wekaManagerConfig = new WekaManagerConfig(configuration);
Now that we have the Weka-specific configuration, we can create a new WekaManager:
return new WekaManager(wekaManagerConfig);
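A factory for a new backend can follow the same two steps as WekaManagerFactory: parse the backend-specific configuration, then build the manager. The sketch below makes some assumptions. FosConfig, Manager and ManagerFactory are minimal stand-ins for the fos-core types, whose real signatures are richer. MyBackendConfig, MyBackendManager and the mybackend.threads key are hypothetical.

```java
import java.util.Properties;

// Minimal stand-ins for fos-core types (assumptions; the real APIs differ).
class FosConfig {
    private final Properties properties;

    FosConfig(Properties properties) {
        this.properties = properties;
    }

    String get(String key, String defaultValue) {
        return properties.getProperty(key, defaultValue);
    }
}

interface Manager {
    int threads();
}

interface ManagerFactory {
    Manager createManager(FosConfig configuration);
}

// Hypothetical backend configuration: parses and validates its own keys.
class MyBackendConfig {
    static final String THREADS_KEY = "mybackend.threads";  // hypothetical key
    final int threads;

    MyBackendConfig(FosConfig configuration) {
        int parsed = Integer.parseInt(configuration.get(THREADS_KEY, "4"));
        if (parsed < 1) {
            throw new IllegalArgumentException(THREADS_KEY + " must be >= 1");
        }
        this.threads = parsed;
    }
}

// Hypothetical manager; a real one would load models, create a Scorer, etc.
class MyBackendManager implements Manager {
    private final MyBackendConfig config;

    MyBackendManager(MyBackendConfig config) {
        this.config = config;
    }

    @Override
    public int threads() {
        return config.threads;
    }
}

// Same shape as WekaManagerFactory: parse configuration, then build the manager.
class MyBackendManagerFactory implements ManagerFactory {
    @Override
    public Manager createManager(FosConfig configuration) {
        MyBackendConfig config = new MyBackendConfig(configuration);
        return new MyBackendManager(config);
    }
}
```

Keeping the parsing in a dedicated config class means an invalid setting fails when the manager is created, not on the first request.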
Most of the heavy lifting is done by the WekaManager implementation. Since most of it is Weka-specific, it is not worthwhile to go into implementation details here. The following operations are performed:
- Search for previously saved models and load their configuration.
- Create a fos-weka Scorer implementation.
- Start listening to requests via RMI and Kryo.
- Start a thread pool to allow parallel scoring.
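The thread pool in the last step can be sketched with a fixed-size ExecutorService that spreads a batch of scoring requests across worker threads. The InstanceScorer interface and ParallelScorer class are hypothetical stand-ins, not fos-weka's actual scorer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-instance scorer: returns class probabilities for one instance.
interface InstanceScorer {
    double[] score(Object[] instance);
}

class ParallelScorer implements AutoCloseable {
    private final ExecutorService pool;
    private final InstanceScorer scorer;

    ParallelScorer(InstanceScorer scorer, int threads) {
        this.scorer = scorer;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Submits one task per instance and collects the results in input order.
    List<double[]> scoreAll(List<Object[]> instances) {
        List<Future<double[]>> futures = new ArrayList<>();
        for (Object[] instance : instances) {
            futures.add(pool.submit(() -> scorer.score(instance)));
        }
        List<double[]> results = new ArrayList<>();
        for (Future<double[]> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while scoring", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Scoring failed", e.getCause());
            }
        }
        return results;
    }

    // Stops accepting new tasks; already submitted tasks still complete.
    @Override
    public void close() {
        pool.shutdown();
    }
}
```

A fixed pool bounds the number of concurrent scoring threads, so a burst of requests queues up instead of spawning unbounded threads.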
To implement model training, you need to supply the model configuration and the training instances. You can see a practical example in the fos training sample. A model configuration is composed of:
- A set of key-value properties relevant to each implementation. We recommend defining all configuration options in a dedicated class (see the Weka model configuration for an example).
- An attribute list. The number of attributes in the training data must match the configured attribute list. An attribute can be:
  - Numerical attribute: numeric attributes can be of real or integer type.
  - Categorical attribute: for attributes with a limited set of valid values.
You'll have to convert these FOS abstractions to a format your implementation understands.
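One possible conversion assumes a backend that consumes dense double[] vectors. Numeric values pass through unchanged, and categorical values map to the index of their allowed value. The Attribute, NumericAttribute and CategoricalAttribute types below are simplified stand-ins for the FOS abstractions. This index encoding is just one choice among many. The length check enforces the rule that an instance must match the configured attribute list.

```java
import java.util.List;

// Simplified stand-ins for the FOS attribute abstractions (assumptions).
interface Attribute {
    String name();

    // Encodes one raw value as a double for a hypothetical backend.
    double encode(Object value);
}

class NumericAttribute implements Attribute {
    private final String name;

    NumericAttribute(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    // Accepts any Number (Integer, Long, Double, ...) and widens it to double.
    public double encode(Object value) {
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException(name + ": expected a number, got " + value);
        }
        return ((Number) value).doubleValue();
    }
}

class CategoricalAttribute implements Attribute {
    private final String name;
    private final List<String> categories;

    CategoricalAttribute(String name, List<String> categories) {
        this.name = name;
        this.categories = categories;
    }

    public String name() {
        return name;
    }

    // Maps a value to its position in the list of valid categories.
    public double encode(Object value) {
        int index = categories.indexOf(String.valueOf(value));
        if (index < 0) {
            throw new IllegalArgumentException(name + ": invalid category " + value);
        }
        return index;
    }
}

class InstanceEncoder {
    // Converts one Object[] instance into a dense double[] vector.
    static double[] encode(List<Attribute> attributes, Object[] instance) {
        if (instance.length != attributes.size()) {
            throw new IllegalArgumentException("Expected " + attributes.size()
                    + " values but got " + instance.length);
        }
        double[] encoded = new double[instance.length];
        for (int i = 0; i < instance.length; i++) {
            encoded[i] = attributes.get(i).encode(instance[i]);
        }
        return encoded;
    }
}
```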
There are multiple training entry points:
UUID trainAndAdd(ModelConfig config, List<Object[]> instances) throws FOSException;
trainAndAdd must train a new classifier with the given configuration using the given instances. It should return the UUID of the new model and automatically make the model available for scoring.
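The trainAndAdd contract is: train, register the model under a fresh UUID, and make it immediately scorable. A toy in-memory manager can illustrate it. ToyManager, its majority-class "classifier" and its score method are hypothetical. To keep the sketch self-contained, ModelConfig is omitted and IllegalArgumentException stands in for FOSException.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class ToyManager {
    // Trained models available for scoring, keyed by model id.
    private final Map<UUID, String> models = new ConcurrentHashMap<>();

    // Trains a majority-class model on the last column and registers it.
    UUID trainAndAdd(List<Object[]> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("No training instances");
        }
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (Object[] instance : instances) {
            String label = String.valueOf(instance[instance.length - 1]);
            int count = counts.merge(label, 1, Integer::sum);
            if (best == null || count > counts.get(best)) {
                best = label;
            }
        }
        UUID id = UUID.randomUUID();
        models.put(id, best);  // scorable as soon as this method returns
        return id;
    }

    // Scores an instance with a registered model (this toy model ignores the features).
    String score(UUID modelId, Object[] features) {
        String prediction = models.get(modelId);
        if (prediction == null) {
            throw new IllegalArgumentException("Unknown model " + modelId);
        }
        return prediction;
    }
}
```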
UUID trainAndAddFile(ModelConfig config, String path) throws FOSException;
trainAndAddFile does the same as trainAndAdd, but the instances are read from a CSV file.
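A file-based entry point can parse the CSV into the same List<Object[]> shape and delegate to the in-memory variant. The sketch below assumes a header-less, comma-separated file without quoted fields. It converts every field that parses as a number into a Double. CsvInstances is hypothetical. A real implementation would need a proper CSV parser and would use the configured attribute list to decide column types.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CsvInstances {
    // Reads a naive CSV file (no quoting, no header) from the given path.
    static List<Object[]> read(String path) {
        try {
            return parse(Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Splits each non-blank line on commas; numeric fields become Double.
    static List<Object[]> parse(List<String> lines) {
        List<Object[]> instances = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(",", -1);
            Object[] instance = new Object[fields.length];
            for (int i = 0; i < fields.length; i++) {
                instance[i] = toValue(fields[i].trim());
            }
            instances.add(instance);
        }
        return instances;
    }

    private static Object toValue(String field) {
        try {
            return Double.valueOf(field);
        } catch (NumberFormatException e) {
            return field;  // keep non-numeric fields (e.g. categories) as strings
        }
    }
}
```

With a helper like this, trainAndAddFile(config, path) could reduce to trainAndAdd(config, CsvInstances.read(path)).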
Model train(ModelConfig config, List<Object[]> instances) throws FOSException;
train trains a model and returns a Model. The Model implementation can be either a ModelBinary, which contains the model's serialized representation, or a ModelPMML, which contains the model's PMML representation as a String. The model is not made available for scoring.
Model trainFile(ModelConfig config, String path) throws FOSException;
trainFile does the same as train, but the instances are read from a CSV file.
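train and trainFile only hand the model back rather than registering it. For a ModelBinary-style result, one way to produce the serialized representation is standard Java serialization into a byte array. The bytes can then be written to disk or sent to another server and deserialized there. ToyModel and BinaryCodec are hypothetical. The actual ModelBinary class and the format each backend uses are not covered in this document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical trained model; any Serializable classifier works the same way.
class ToyModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final String majorityLabel;

    ToyModel(String majorityLabel) {
        this.majorityLabel = majorityLabel;
    }
}

class BinaryCodec {
    // Serializes a model into bytes, e.g. for export to another server.
    static byte[] toBytes(Serializable model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Read the buffer only after the stream is closed, so all data is flushed.
        return buffer.toByteArray();
    }

    // Restores a model from bytes produced by toBytes.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Java serialization has two limits. The receiving JVM must have the same classes on its classpath, and deserializing untrusted bytes is unsafe. PMML, being a standard text format, does not depend on matching Java classes.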
Along with training models, the manager is also responsible for providing a Scorer implementation. There is a Weka scoring example for reference.
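At its core, a Scorer maps a model id and an instance to a score. The interface below is a hypothetical, simplified shape, and the actual fos-core Scorer signatures may differ. It adds a default method that scores one instance against several models. MapBackedScorer keeps loaded models in a ConcurrentHashMap so they can be added or removed while other threads are scoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, simplified scorer contract (not the fos-core interface).
interface SimpleScorer {
    double[] score(UUID modelId, Object[] instance);

    // Scores one instance against several models, one result per model id.
    default List<double[]> score(List<UUID> modelIds, Object[] instance) {
        List<double[]> results = new ArrayList<>(modelIds.size());
        for (UUID id : modelIds) {
            results.add(score(id, instance));
        }
        return results;
    }
}

// Scorer backed by loaded models; each model maps an instance to class probabilities.
class MapBackedScorer implements SimpleScorer {
    private final Map<UUID, Function<Object[], double[]>> models = new ConcurrentHashMap<>();

    void addModel(UUID id, Function<Object[], double[]> model) {
        models.put(id, model);
    }

    void removeModel(UUID id) {
        models.remove(id);
    }

    @Override
    public double[] score(UUID modelId, Object[] instance) {
        Function<Object[], double[]> model = models.get(modelId);
        if (model == null) {
            throw new IllegalArgumentException("Unknown model " + modelId);
        }
        return model.apply(instance);
    }
}
```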