# Scflex User Guide

# Introduction

Scflex is a task scheduler for command-line programs in a distributed cluster environment. It enables a command-line program to run in a distributed, scalable, online environment. The architecture relies on MongoDB to store program and task information, and on Apache Spark for resilient distributed computing.

## Elements of the system
The architecture currently has only two main components:
 1. A task scheduler
 1. A simple web-based task monitoring system

## Preparing a Scflex project

This guide walks through, by way of an example, the steps needed to set up an outlier detection algorithm on time-series data for a set of users. The sample program is a Python executable named "simple_outlier_detector".

For demonstration purposes, the examples in this guide use JavaScript (MongoDB shell commands). Authentication is not covered in this guide.

A (more or less) user-friendly Python interface has been implemented and can be found in the main repository.

The recommended directory structure includes the main executable ("simple_outlier_detector" in this example), a setup directory, a templates directory, and a "hook" directory that is soft-linked from the service directory (a separate directory outside the main package).

Example directory structure:
- scflex_hook/
- setup/
- templates/
- simple_outlier_detector

Example "scflex_hook" directory structure:

- start_app.sh
- setups/
 - setup_log_dir.sh
 - setup_task_db.js
 - setup.py
 - task_manager.py

where
- "start_app.sh" is a bash script that starts the application,
- "setup_log_dir.sh" is the script that sets up the logging directory,
- "setup_task_db.js" is the script that sets up the MongoDB databases and user privileges,
- "task_manager.py" is a task manager that is specific to each service,
- "setup.py" is the main script that initializes the task list.

Examples of some of these files can be found in the Appendix of this guide.

The setup/ directory is reserved for auxiliary setup processes (e.g. database or authentication setup), whereas the templates/ directory is useful when templating is used (see "Advanced usages"), and will be covered later.

### Setting up the task list
First, we choose "simple_outlier_detection" as the application name. It is recommended that the application name be the same as the name of the MongoDB collection.

A task list (i.e. the collection) will be built under the "Analytics" database.

In [None]:
// build the task list
use Analytics;
db.simple_outlier_detection.insert({"role":"placeholder"});
// create indexes 
db.simple_outlier_detection.createIndex({'user_key': 1});
db.simple_outlier_detection.createIndex({'role': 1});

An "engine configuration" then needs to be added to the task list. The configuration document has a {role:engine_conf} entry and consists, at minimum, of the entries shown below.

In [None]:
db.simple_outlier_detection.insert({ 
  "role"              : "engine_conf",
  "name"              : "simple_outlier_detection",
  "task_db_name"      : "Analytics",
  "task_coll_name"    : "simple_outlier_detection",
  "batch_size"        : 20,
  "max_timeout"       : 300,
  "refresh_interval"  : 10,
  "hibernation_period": 0,
  "max_failure_n"     : 10,
  "uuid"              : "somerandomuuid",
});
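As a rough illustration, the engine configuration document can also be generated programmatically. The sketch below is plain JavaScript; `makeEngineConf` and `REQUIRED_KEYS` are hypothetical names, not part of Scflex, and the numeric values are simply copied from the example above rather than being documented defaults.

```javascript
const { randomUUID } = require("crypto");

// Keys taken from the engine configuration example in this guide.
const REQUIRED_KEYS = [
  "role", "name", "task_db_name", "task_coll_name", "batch_size",
  "max_timeout", "refresh_interval", "hibernation_period",
  "max_failure_n", "uuid",
];

// Hypothetical helper (not a Scflex API): builds an engine_conf document.
// The numeric values mirror the example above; adjust them per project.
function makeEngineConf(appName, dbName, overrides = {}) {
  const conf = {
    role: "engine_conf",
    name: appName,
    task_db_name: dbName,
    task_coll_name: appName, // collection named after the application
    batch_size: 20,
    max_timeout: 300,
    refresh_interval: 10,
    hibernation_period: 0,
    max_failure_n: 10,
    uuid: randomUUID(),
    ...overrides,
  };
  // Reject documents in which an override removed a required entry.
  const missing = REQUIRED_KEYS.filter((k) => conf[k] === undefined);
  if (missing.length > 0) {
    throw new Error("engine_conf is missing: " + missing.join(", "));
  }
  return conf;
}

const engineConf = makeEngineConf("simple_outlier_detection", "Analytics", {
  batch_size: 50,
});
console.log(JSON.stringify(engineConf, null, 2));
```

The printed document can then be pasted into an insert call like the one shown above.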

### Adding tasks

One or more tasks can be added to the task list. Each task document has a {role:task} entry and consists of the entries shown below.

In [None]:
db.simple_outlier_detection.insert({
  "name"        : "uniqueNameEachTask",
  "role"        : "task",
  "uuid"        : "somerandomuuid",
  "date_created": "2017-03-09T10:56:00",
  "historicals" : {},
  "status": {
    "status"       : "active",
    "last_updated" : 0,
    "last_success" : 0,
    "last_failure" : 0,
    "failure_n"    : 0,
    "priority_r"   : 1.0,
    "last_response": {
      "status_code": -1,
      "info": {
        "duration": -1
      }
    }
  },
  "program_pars": {
    "working_dir" : "/home/analytic_bot/uCare/services/Analytics/simple_outlier_detection",
    "executable"  : "./simple_outlier_detector",
    "cmd_args" : {
      "f": 0.10,
      "l": 0.12,
      "infile" : "/path/to/input.csv",
      "outfile": "/path/to/output.pickle",
      "meta": ["/home/analytic_bot/template-A.yaml", "/home/analytic_bot/template-B.yaml"]
    }
  }
});
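When there is one task per user, writing each document by hand becomes tedious. The following is a minimal sketch, in plain JavaScript, of how such documents could be generated. The helpers `makeTask` and `makeTasksForUsers`, the "outlier_" name prefix, and the per-user input and output paths are assumptions made for illustration; only the document layout comes from the example above.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical helper (not a Scflex API): one task document with a fresh status block.
function makeTask(name, programPars) {
  return {
    name: name,
    role: "task",
    uuid: randomUUID(),
    // UTC timestamp without milliseconds, e.g. "2017-03-09T10:56:00"
    date_created: new Date().toISOString().slice(0, 19),
    historicals: {},
    status: {
      status: "active",
      last_updated: 0,
      last_success: 0,
      last_failure: 0,
      failure_n: 0,
      priority_r: 1.0,
      last_response: { status_code: -1, info: { duration: -1 } },
    },
    program_pars: programPars,
  };
}

// Hypothetical helper: one task per user, rejecting duplicate task names.
function makeTasksForUsers(userKeys) {
  const seen = new Set();
  return userKeys.map((userKey) => {
    const name = "outlier_" + userKey; // assumed naming scheme
    if (seen.has(name)) {
      throw new Error("duplicate task name: " + name);
    }
    seen.add(name);
    return makeTask(name, {
      working_dir: "/home/analytic_bot/uCare/services/Analytics/simple_outlier_detection",
      executable: "./simple_outlier_detector",
      cmd_args: {
        f: 0.10,
        l: 0.12,
        infile: "/path/to/" + userKey + "/input.csv",      // placeholder path
        outfile: "/path/to/" + userKey + "/output.pickle", // placeholder path
      },
    });
  });
}

const tasks = makeTasksForUsers(["user_001", "user_002"]);
console.log(tasks.map((t) => t.name));
```

In the MongoDB shell, an array built this way could be passed to `db.simple_outlier_detection.insertMany(...)` instead of inserting the documents one by one.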

### Starting the engine

A Scflex task list is essentially a Spark application.

The recommended way to start the engine is through the "start_app.sh" start-up file. An example of a typical start-up file is provided below:

In [None]:
$HOME/Scflex/Scflex-master/Scflex/bin/start_service \
 --app simple_outlier_detection_heart \
 --master spark://127.0.0.1:7077 \
 --conf setups/.keys/scflex_conf.yaml \
 --logdir $HOME/log/simple_outlier_detection_heart

where
 * "app" is the application name given to the task list (it's an arbitrary label)
 * "master" is the Spark master node URL
 * "logdir" is the logging directory for the application
 * "conf" is the key setting: it specifies the configuration file for the project. The configuration file should be in YAML/JSON format and contain at least the following content:

In [None]:
db_pars:
  username    : analytic_bot
  password    : aBotPassword
  cluster_url : 127.0.0.1:27017
  database    : Analytics
  authSource  : admin
task_db_name   : Analytics
task_coll_name : simple_outlier_detection_heart
match_dict     : {"role": "engine_conf"}

where "db_pars" specifies the URL and authentication details for the MongoDB database, and the remaining entries tell Scflex where to find the task engine configuration (engine_conf) document.
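Before starting the engine, it can help to confirm that the configuration file is complete. The sketch below assumes the configuration has already been parsed into a JavaScript object (for instance from the JSON variant of the file). `checkProjectConf` and `mongoUriFromDbPars` are hypothetical helpers, and the connection string follows the standard MongoDB URI format; how Scflex itself builds its connection is not documented here, so treat this as an illustration only.

```javascript
// Keys taken from the minimal configuration shown in this guide.
const REQUIRED_TOP = ["db_pars", "task_db_name", "task_coll_name", "match_dict"];
const REQUIRED_DB = ["username", "password", "cluster_url", "database", "authSource"];

// Hypothetical helper: returns the list of missing entries (empty if complete).
function checkProjectConf(conf) {
  const missing = REQUIRED_TOP.filter((k) => !(k in conf));
  const dbPars = conf.db_pars || {};
  for (const k of REQUIRED_DB) {
    if (!(k in dbPars)) {
      missing.push("db_pars." + k);
    }
  }
  return missing;
}

// Hypothetical helper: maps db_pars onto a standard MongoDB connection URI.
// ASSUMPTION: this mirrors the usual URI layout, not necessarily Scflex internals.
function mongoUriFromDbPars(dbPars) {
  const user = encodeURIComponent(dbPars.username);
  const pass = encodeURIComponent(dbPars.password);
  return "mongodb://" + user + ":" + pass + "@" + dbPars.cluster_url +
    "/" + dbPars.database + "?authSource=" + encodeURIComponent(dbPars.authSource);
}

const projectConf = {
  db_pars: {
    username: "analytic_bot",
    password: "aBotPassword",
    cluster_url: "127.0.0.1:27017",
    database: "Analytics",
    authSource: "admin",
  },
  task_db_name: "Analytics",
  task_coll_name: "simple_outlier_detection_heart",
  match_dict: { role: "engine_conf" },
};

console.log(checkProjectConf(projectConf));
console.log(mongoUriFromDbPars(projectConf.db_pars));
```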

Note that to start the application as a daemon (background process), one can either make start_app.sh a (Linux) service or run it with nohup. For example:

In [None]:
nohup ./start_app.sh > nohup.log 2>&1 &

### Advanced usages

#### A. Templating (development suspended)

It is possible to use a template instead of specifying the common attributes in each task. Templating is also useful when a change to the program needs to be propagated: one only needs to modify the template instead of modifying the entire task list.

Currently, Scflex supports only a MongoDB template file (i.e. a MongoDB document). A template file must have three entries: "name", "role:task_template", and "content". Scflex reads the entries under "content" and turns them into (flag, argument) pairs to be fed to the executable. User-provided flags supersede the template if the parameter names overlap.
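The override rule described above can be sketched in a few lines of plain JavaScript. `mergeArgs` and `toArgv` are hypothetical names, not Scflex functions. The flag spelling is an assumption as well: here one-letter names become "-f", longer names become "--infile", and array values repeat the flag; the actual convention used by Scflex may differ.

```javascript
// User-provided cmd_args supersede template content when names overlap.
function mergeArgs(templateContent, cmdArgs) {
  return { ...templateContent, ...cmdArgs };
}

// ASSUMPTION: one-letter names map to "-x", longer names to "--name",
// and array values repeat the flag once per element.
function toArgv(args) {
  const argv = [];
  for (const [name, value] of Object.entries(args)) {
    const flag = (name.length === 1 ? "-" : "--") + name;
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      argv.push(flag, String(v));
    }
  }
  return argv;
}

const templateContent = {
  mode: "interactive",
  infile: "/path/to/input.csv",
  outfile: "/path/to/output.pickle",
};
// The task overrides "infile" and adds a flag the template does not define.
const taskCmdArgs = { f: 0.1, infile: "/path/to/other.csv" };

console.log(toArgv(mergeArgs(templateContent, taskCmdArgs)));
```

In this sketch the task's "infile" replaces the template's value, while "mode" and "outfile" are inherited unchanged.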

To add an example task template:

In [2]:
db.simple_outlier_detection.insert({
  "name" : "default_simple_outlier_detection_heart",
  "role" : "task_template",
  "content": {
    "mode"    : "interactive",
    "infile"  : "/path/to/input.csv",
    "outfile" : "/path/to/output.pickle",
    "parfile" : "/path/to/parfile.yaml"
  }
});

For a task to use the template, the task's program_pars entry must contain the following two entries:
1. db_pars_path: which points to a YAML/JSON file specifying the database credentials.
1. conf_loc: which specifies the location of the template file in the database. The location must be in the format of "database_name:collection_name:template_name".

(see the "Adding tasks" section for the overall layout of a task document)
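To make the "database_name:collection_name:template_name" format concrete, the sketch below shows a hypothetical program_pars fragment and a small parser for conf_loc. The db_pars_path value is a placeholder, `parseConfLoc` is not a Scflex function, and the final query object is only an assumption about how the template document could be looked up.

```javascript
// Hypothetical program_pars fragment for a task that uses a template.
const templatedProgramPars = {
  working_dir: "/home/analytic_bot/uCare/services/Analytics/simple_outlier_detection",
  executable: "./simple_outlier_detector",
  db_pars_path: "/path/to/db_pars.yaml", // placeholder path
  conf_loc: "Analytics:simple_outlier_detection:default_simple_outlier_detection_heart",
  cmd_args: { f: 0.10 },
};

// Hypothetical helper: splits conf_loc into its three components.
function parseConfLoc(confLoc) {
  const parts = confLoc.split(":");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error(
      "conf_loc must be 'database_name:collection_name:template_name', got: " + confLoc
    );
  }
  const [database, collection, template] = parts;
  return { database, collection, template };
}

const templateLoc = parseConfLoc(templatedProgramPars.conf_loc);
// ASSUMPTION: a filter of this shape would select the template document.
const templateQuery = { role: "task_template", name: templateLoc.template };

console.log(templateLoc);
console.log(templateQuery);
```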

## Task status monitoring

Scflex's task monitoring support is currently minimal. To inspect the task list as a whole, one can use the web UI provided by Apache Spark, which is located at localhost:8080 by default (e.g. run "lynx localhost:8080" and locate the application).
To inspect the status of individual tasks, Scflex includes a minimalist web portal that displays the status of all tasks in a task list. By default it is located at localhost:9090.

## Appendix

\# example "setup_log_dir.sh"

In [None]:
#!/bin/sh
# create the log directory and make it writable by the uCare group
mkdir -p /home/analytic_bot/log/uCare_simple_outlier_detection
chgrp -R uCare /home/analytic_bot/log/uCare_simple_outlier_detection
chmod g+w /home/analytic_bot/log/uCare_simple_outlier_detection

\# example "setup_task_db.js"

In [None]:
// grant the existing analytic_bot user (defined in the admin database) readWrite access to Analytics
use admin;
db.system.users.updateOne(
        {"_id":"admin.analytic_bot"}, 
        {$push: {"roles": 
                {"role": "readWrite", 
                 "db": "Analytics"
                }}
        }
);

// create user in Analytics database
use Analytics;
db.createUser(
  {
    user: "analytic_bot",
    pwd: "analyticpass",
    roles: [{ role: "readWrite", "db" : "Analytics"}]
  }
);