[WIP] Proposal for modularizing loopback datasource juggler

The purpose of this proposal is to come up with a strategy to separate out loopback-datasource-juggler (Juggler) into individual components. The expected outcome of this document is to build a list of user-stories (tasks needed to deliver this epic) with input from involved parties (internal and community) to pragmatically refactor this project based on separation of concerns (SoC).

Please give feedback at https://github.com/strongloop/loopback/issues/1958

Development
Boot
Runtime
Maintenance

Proposal

Overview

tl;dr

I propose we start with the registries and then work bottom-up in the following order: registries, connector, data types, model, relation, transaction, and finally migration. Cross-cutting concerns such as validation, hooks, and mixins will need to be integrated as soon as any of the implementations listed requires them.

loopback-datasource-juggler
- dao
- datasource/connector registry
- datasource loader
- connectors
  - memory
  - transient
- attachment mixins
  - use find from mysql instead of mongo, etc
loopback-connector
- base connector
- sql connector
  - parameterized sql builder
- noSql connector
loopback-types
- geo
- base types
- type handlers
  - parsing
  - coercion
loopback-model
- model definition registry
- model definition loader
- model class registry
- model class builder
- model class loader (instantiator)
- templating
- scope/view
  - constraints
- promisifying
- inclusion
loopback-relation
loopback-transaction
loopback-migration
- introspection
- automigration

Cross-cutting concerns

loopback-validation
loopback-hook
loopback-mixin

Undecided?

loopback-registries
- registry loader

Details

This proposal is divided into the following aspects of the entire dev experience:

Dev
Boot
Runtime
Maintenance

There will be some overlap between these aspects, which will be described in the coming sections.

Development

Model definitions

This is when the developer creates the model definitions that will be used during boot and run time. They will either take existing definitions (external or LoopBack definitions) or write new ones, and then register them into the app.

Existing definitions

This will involve converting from whatever format is in use into a LoopBack model definition (model.json). We do not currently have tooling to convert other formats into loopback-types (except Swagger, I believe), so this is one aspect to consider.

Create tooling that will convert custom definitions into LB model defs.
- This will probably involve aspects of automigration already implemented
- Maybe move all related ETL concerns into a new module loopback-migration/etl
- Create loopback-migration/etl with this concern in mind (flexible enough to accommodate this in the future)

New definitions

End-users will either define them manually or use slc loopback:model to help define them. We should be flexible enough to allow both (which we already do).

Properties to include in the definition file

This will depend on what happens during the other aspects of the entire dev experience/lifecycle. I will update this section when the other sections are fleshed out.

Boot

This is divided into two separate lifecycles, one for loopback-boot (Boot) and the other for Juggler.

Boot's lifecycle

During boot, Juggler must be registered early in Boot's lifecycle to make Juggler's dependents available in subsequent phases of Boot's lifecycle. Boot's only interaction with Juggler should be to start Juggler's boot lifecycle, which may depend on other configs being loaded by Boot beforehand.
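
To make the ordering concrete, here is a minimal sketch of a phased boot in which Juggler is registered as the first task of the first phase. `BootLifecycle`, the phase names, and the `ctx.models` shape are all hypothetical and are not the loopback-boot API; tasks are synchronous only to keep the sketch short.

```javascript
// Hypothetical phase runner; not the loopback-boot API.
class BootLifecycle {
  constructor(phaseNames) {
    // A Map preserves insertion order, so phases run in the order given.
    this.phases = new Map(phaseNames.map((name) => [name, []]));
  }

  // Register a task to run in the named phase.
  register(phaseName, task) {
    const tasks = this.phases.get(phaseName);
    if (!tasks) throw new Error(`Unknown boot phase: ${phaseName}`);
    tasks.push(task);
  }

  // Run phases in declaration order and tasks in registration order.
  run(context) {
    for (const tasks of this.phases.values()) {
      for (const task of tasks) task(context);
    }
    return context;
  }
}

const boot = new BootLifecycle(['init', 'models', 'scripts']);
const bootLog = [];

// Juggler goes first, so everything it produces exists in later phases.
boot.register('init', (ctx) => {
  bootLog.push('juggler');
  ctx.models = { Customer: class Customer {} };
});

// A user boot script in a later phase can rely on complete models.
boot.register('scripts', (ctx) => {
  bootLog.push(`script sees ${Object.keys(ctx.models).join(',')}`);
});

boot.run({});
```

Boot only knows that it runs a task called "juggler"; what that task does internally stays inside Juggler's own lifecycle.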

Juggler's boot lifecycle

Once Boot initiates Juggler's boot lifecycle, Juggler must determine which data types are valid:

boot will load a data type definition manifest (datatypes.json?)
- create the data type definition collection
  - array containing data type definitions
- before loadDataTypes hook
- load datatypes
  - before hook/after hooks between each data type load
    - Determine which types to allow hooks on
      - Most flexible is all types, but built-in + user-defined is probably good enough
      - Allow provider style
        
        We pass the data type definition to you
      - Mixin style
        
        We copy props and behaviours directly to the model
        
        Default to throwing if overwriting existing props/behaviours
        
        Provide a setting to force overwriting
  - load built-ins
    - Primitives
      - String
      - Number
      - ...
  - Built-in LoopBack data types
    - Geopoint
    - ObjectId
  - User-defined (custom) data types
    - must follow a specific format (class definition) we define
      - probably write a set of standard unit tests to ensure this
      - if the tests pass, your type is "registerable"
        
        ensures the type is instantiable
    - throw for invalid data type definitions
  - Virtual properties/hidden
    - ones that are defined, but not actually persisted
- after loadDataTypes hook
- these types will be used for validation when reading in model definitions
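
The data type loading steps above could be sketched roughly as follows. `DataTypeRegistry`, its `on`/`register`/`coerce` methods, and the `{ name, coerce }` definition format are assumptions made for illustration; the actual format for "registerable" types is still to be defined.

```javascript
// Hypothetical registry; names are illustrative, not Juggler's API.
class DataTypeRegistry {
  constructor() {
    this.types = new Map();
    this.hooks = { before: [], after: [] };
  }

  // Provider-style hook: the data type definition is passed to the function.
  on(stage, fn) {
    this.hooks[stage].push(fn);
  }

  register(definition) {
    // Throw for invalid data type definitions.
    if (!definition || typeof definition.name !== 'string' ||
        typeof definition.coerce !== 'function') {
      throw new TypeError('Invalid data type definition');
    }
    this.hooks.before.forEach((fn) => fn(definition));
    this.types.set(definition.name, definition);
    this.hooks.after.forEach((fn) => fn(definition));
  }

  has(name) {
    return this.types.has(name);
  }

  coerce(name, value) {
    const type = this.types.get(name);
    if (!type) throw new Error(`Unknown data type: ${name}`);
    return type.coerce(value);
  }
}

const typeRegistry = new DataTypeRegistry();
const loadedTypes = [];
typeRegistry.on('after', (def) => loadedTypes.push(def.name));

// Primitives first, then a built-in LoopBack-style type.
typeRegistry.register({ name: 'String', coerce: (v) => String(v) });
typeRegistry.register({ name: 'Number', coerce: (v) => Number(v) });
typeRegistry.register({
  name: 'GeoPoint',
  coerce: (v) => ({ lat: Number(v.lat), lng: Number(v.lng) }),
});
```

A user-defined type would go through the same `register` call, so the same checks and hooks apply to built-in and custom types alike.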

Should this happen during Boot's lifecycle?

We then need to load all the connectors to determine which data types are available to ensure valid types in the data type registry:

read in connectors from datasources.json
- register new data types based on the types of connectors registered
- validators/validation may also need to be updated to reflect the current data type availability
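
As a rough sketch of this step, the snippet below walks a datasources.json-like object and adds the types each connector brings. The `connectorTypes` mapping (which connector contributes which types) and the function name are assumptions for illustration only.

```javascript
// Types known before any connector is loaded.
const availableTypes = new Set(['String', 'Number', 'Boolean', 'Date']);

// Shape loosely modelled on datasources.json.
const dataSources = {
  db: { connector: 'memory' },
  mongo: { connector: 'mongodb' },
};

// Assumed mapping from connector name to the extra types it makes available.
const connectorTypes = {
  memory: [],
  mongodb: ['ObjectId', 'GeoPoint'],
};

function registerConnectorTypes(sources, typesByConnector, registry) {
  for (const [sourceName, config] of Object.entries(sources)) {
    const extra = typesByConnector[config.connector];
    if (!extra) {
      throw new Error(
        `Unknown connector "${config.connector}" for data source "${sourceName}"`
      );
    }
    extra.forEach((typeName) => registry.add(typeName));
  }
  return registry;
}

registerConnectorTypes(dataSources, connectorTypes, availableTypes);
```

Validators would then read from the same set, so removing a connector at runtime could shrink the set of valid types in one place.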

Now that the data types and connectors are configured, Juggler must then read in and prepare the model definitions:

create the model registry
- array containing model definitions
before LoadModelDefsIntoRegistry hook
load model definition into registry
- load one model
- from preconfigured dirs
  - predefined in config.json
    - default to existing dirs
      - common/models
      - server/models
  - configurable via config.json
  - will be picked up during Boot's lifecycle
    - will involve some file reading module to read in definitions
      - file reader
        
        defaults to built-in reader
        
        defaults to json
        
        may be swapped out by user to support other formats
        
        swapped out at boot time and run time?
- before modelToLoad validation hook
- type handler
  - parsing
  - coercion
  - run validation on each model definition to ensure Juggler's next phase can proceed
    - validation will depend on many factors
      - are the data types used in the definition valid?
      - ...
    - if invalid
      - throw error and stop execution
    - if valid
      - load model into registry
      - there should be a config (maybe model-config property) to allow models to load in a specific order
  - after modelToLoad validation hook
  - repeat loading next model
after LoadModelDefsIntoRegistry hook
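
A minimal sketch of the loading loop described above, covering only the "are the data types used in the definition valid?" check. `loadModelDefinitions` and the `hooks.before`/`hooks.after` option names are hypothetical; file reading and per-model hooks are left out.

```javascript
// Hypothetical loader: validates each definition, then adds it to the registry.
function loadModelDefinitions(definitions, knownTypes, hooks = {}) {
  const registry = [];
  if (hooks.before) hooks.before(definitions);

  for (const definition of definitions) {
    const properties = definition.properties || {};
    for (const [propName, prop] of Object.entries(properties)) {
      // Accept both "name": "String" and "name": { "type": "String" }.
      const typeName = typeof prop === 'string' ? prop : prop.type;
      if (!knownTypes.has(typeName)) {
        // Invalid definition: throw and stop execution.
        throw new Error(
          `Model "${definition.name}": property "${propName}" uses unknown type "${typeName}"`
        );
      }
    }
    registry.push(definition);
  }

  if (hooks.after) hooks.after(registry);
  return registry;
}

const knownTypes = new Set(['String', 'Number']);
const hookCalls = [];

const modelDefinitions = loadModelDefinitions(
  [{ name: 'Customer', properties: { name: 'String', age: { type: 'Number' } } }],
  knownTypes,
  {
    before: () => hookCalls.push('before'),
    after: (registry) => hookCalls.push(`after:${registry.length}`),
  }
);
```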

Do we care about registering model definitions into a registry or do we only care that there is a registry of model classes?

Do we need a custom collection since we will have multiple collections with before and after hooks?

Once all the model definitions are loaded into the model definition registry, we need to convert each one into a model class based on the definition available in the registry:

Will probably involve the class loader to read in the defs and convert them into classes
- Users may want to use a custom class loader
  - Is there a benefit for end-users to have a custom class loader?
  - Not sure how to handle this as loopback will expect a specific format
  - Maybe just pass the loader to the user and allow users to use the one we include in Juggler
- create model class registry
- before hook
  - provider style
  - convert one definition into a model class
    - before hook
      - provider style
    - convert
    - after hook
      - provider style
- after hook
  - provider style
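
One possible shape for the "convert one definition into a model class" step is sketched below. `buildModelClass`, the `default`/`required` property settings, and the `isValid` method are assumptions for illustration, not the existing model builder.

```javascript
// Hypothetical builder: turns a model definition into a JS class.
function buildModelClass(definition) {
  const properties = definition.properties || {};

  const ModelClass = class {
    constructor(data = {}) {
      for (const [name, prop] of Object.entries(properties)) {
        if (name in data) this[name] = data[name];
        else if ('default' in prop) this[name] = prop.default;
      }
    }

    // Only checks "required"; real validation would be pluggable.
    isValid() {
      return Object.entries(properties).every(
        ([name, prop]) => !prop.required || this[name] !== undefined
      );
    }
  };

  // Give the generated class the model's name and keep the definition around.
  Object.defineProperty(ModelClass, 'name', { value: definition.name });
  ModelClass.definition = definition;
  return ModelClass;
}

// Model class registry keyed by model name.
const modelClasses = new Map();

const Customer = buildModelClass({
  name: 'Customer',
  properties: {
    name: { type: 'String', required: true },
    tier: { type: 'String', default: 'basic' },
  },
});
modelClasses.set(Customer.name, Customer);

const ann = new Customer({ name: 'Ann' });
```

Before and after hooks would wrap the call to `buildModelClass`, receiving the definition and the resulting class respectively.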

Do users want to modify model definitions at runtime?

Once the classes and connectors have been registered, we can then instantiate models at runtime at will. This also means model instantiation is now available during any subsequent Boot phase.

Should consider bulk instantiation use cases for performance

Runtime

With the model classes available in the registry, users will want to do the following at runtime:

add/remove data types
add/remove data sources/connectors
- which may cause new data types to be available/unavailable
add/remove models
- may involve adding/removing model definitions
import/export model definitions
instantiate models
query instances of model
- persisted models
  - crud
    - virtual properties (hidden/not actually persisted)
    - read models based on relations
    - scopes/views of resultsets
  - custom queries
  - relations
  - validation
  - mixins
  - hooks for each of the above
- transactions
  - atomic operations
- other types such as remote models
- promise-based
- templating?
  - what does this mean?
- constraints
get the list of items in each registry
- connector registry
- model def registry
- model class registry
- maybe make the registry info available via REST also
- registries should be available to the app object passed around LB
  - ie. app.registry.modelDefinitions, etc
  - the registry will live in juggler, but referenced from the LB app
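
The registry access described in the last few bullets might look something like this. The `Registry` class and the `juggler.registries` property are hypothetical; only the `app.registry.modelDefinitions` access path comes from the notes above.

```javascript
// Hypothetical generic registry with add/remove/list.
class Registry {
  constructor() {
    this.items = new Map();
  }
  add(name, item) {
    this.items.set(name, item);
    return this;
  }
  remove(name) {
    return this.items.delete(name);
  }
  list() {
    return [...this.items.keys()];
  }
}

// The registries live in Juggler...
const juggler = {
  registries: {
    connectors: new Registry(),
    modelDefinitions: new Registry(),
    modelClasses: new Registry(),
  },
};

// ...and the LB app holds a reference to them, not a copy.
const app = { registry: juggler.registries };

juggler.registries.connectors.add('memory', {});
app.registry.modelDefinitions.add('Customer', { name: 'Customer' });
app.registry.modelDefinitions.add('Order', { name: 'Order' });
app.registry.modelDefinitions.remove('Order');
```

Because the app only references the registries, anything added through Juggler at runtime is immediately visible through `app.registry`.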

Should consider bulk instantiation use cases for performance

Maintenance

migration
- import/export model definitions
- bulk
- import/export model instances
convert model defs from random formats to loopback model definitions
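
As one example of converting from another format, the sketch below infers a model definition from a sample JSON document. The function names and the capitalised type names are assumptions; a real converter would need to handle nesting, nulls, and conflicting samples.

```javascript
// Map a sample value to a type name (assumed names, for illustration).
function inferType(value) {
  if (Array.isArray(value)) return 'Array';
  if (value instanceof Date) return 'Date';
  switch (typeof value) {
    case 'string': return 'String';
    case 'number': return 'Number';
    case 'boolean': return 'Boolean';
    default: return 'Object'; // includes null and nested objects
  }
}

// Build a model definition whose properties mirror the sample's keys.
function inferModelDefinition(name, sample) {
  const properties = {};
  for (const [key, value] of Object.entries(sample)) {
    properties[key] = { type: inferType(value) };
  }
  return { name, properties };
}

const inferred = inferModelDefinition('Customer', {
  name: 'Ann',
  age: 30,
  active: true,
  tags: ['vip'],
});
```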

Notes from conversation with @raymondfeng

Lifecycle

loopback starts
boot starts

does boot start up juggler first?
- probably not since it would couple the two together
  - juggler registers things to happen at various points in the boot lifecycle
  - meaning juggler must load before boot since end users will depend on complete models with their boot scripts
  - that or juggler can register itself into a specific phase of the boot lifecycle, in which case it can only register tasks to happen at any point AFTER the current phase of the boot lifecycle (since the previous phases have already passed)
- if juggler is configured before boot, then it is free to register hooks into any phase of the boot lifecycle
- other option is to make juggler the FIRST item in the first phase of boot, so all items are ready before usage (internally and by end-user) in any phase, since no phase runs before it
boot starts juggler
juggler starts its boot lifecycle
- boot
loopback
- boot
  - aop
  - extension points
- juggler
  - aop
  - registers tasks to extension points
- debugging at various points in the lifecycle

Responsibility

Juggler
- Responsible for loading up all dependent modules (boot)
- Acts as the controller
  - Doesn't perform any tasks on its own, delegates all tasks to submodules
- Creates registries?
  - Contains registries?
  - Maybe move to loopback-registry
- Register connectors
  - Memory
  - Transient
DAO
- Facade to database type connectors
- May contain additional logic to trigger different events
- hooks
  - event emitting
  - normalization
    - (crud in json, try to type convert, validate)
  - validation
- templating
- promisifying
- scope
  - predefined criteria
    - custom model
      - customer type must be xyz
      - additional constraints on the model
      - basically "view" of the model
- mixin capabilities
- transactions
Datasource
- config of the backend connector
Datatypes
- Contains a list of data types
  - geo
  - array
  - objectId
  - etc
- should contain the registry
- get/set new datatypes at boot/run time
Models
- Construction
- Registry
  - base model (abstract)
    - runtime impl of relation
      - connects the models
    - scope define views on the model
    - dsl for js classes
      - no crud
      - bind model to some backend then have some behaviours
  - collections (abstract)
- Definition
  - introspection
    - tries to introspect json doc and infer the model
  - relation
    - connects the models
  - scope def
  - validation
    - prop or model level
    - extensible
      - email has to be unique
  - model builder
    - transform model def into js class
- Runtime
  - inclusions
  - AOP
    - hooks
    - change to observers
  - transaction abstraction

Module structure

loopback-datasource-juggler
- Data source registry
- Data type registry
- Model registry
  - Support for various types
    - Base types
      - Primitives
    - Geopoint
    - ObjectId
  - Type handler
    - Parsing
    - Coercion
    - Validation
loopback-model
- DSL to cover JS classes
- Builder/loader facility - extension points
- Instantiation in juggler
  - require lb-model
  - require builder
    - pass in model def
    - create a class def
    - get a customer class with properties
    - new customer (should contain default values)
    - perform validation
    - in memory
  - Create instances
    - require relations (mixin)
    - require validation (mixin)
  - Mixins (extension point)
    - predefined
      - copy props and behaviours to the model
    - provider
      - user provides a function, we pass the model def to you
loopback-relation
loopback-validation
loopback-migration
- Importer
- Exporter
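
The two mixin styles listed under loopback-model could be sketched as below. `applyMixin`, `applyProvider`, and the `force` option are hypothetical names; throwing by default on overwrite follows the data type hook notes earlier in this proposal.

```javascript
// Predefined style: copy props and behaviours onto the target.
function applyMixin(target, mixin, options = {}) {
  for (const [key, value] of Object.entries(mixin)) {
    const exists = Object.prototype.hasOwnProperty.call(target, key);
    if (exists && !options.force) {
      // Default to throwing instead of silently overwriting.
      throw new Error(`Mixin would overwrite existing member "${key}"`);
    }
    target[key] = value;
  }
  return target;
}

// Provider style: the user supplies a function and receives the model def.
function applyProvider(definition, provider) {
  provider(definition);
  return definition;
}

const timestamps = {
  createdAt: { type: 'Date' },
  updatedAt: { type: 'Date' },
};
const customerDef = {
  name: 'Customer',
  properties: { name: { type: 'String' } },
};

applyMixin(customerDef.properties, timestamps);
applyProvider(customerDef, (def) => {
  def.properties.name.required = true;
});
```

The predefined style is easier to reason about, while the provider style gives the user full control over the definition at the cost of fewer guarantees.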

Actual files

Files will move based on their roles in the list above.

Overview

Use cases

User wants to define a data source (non-built-in data source)
User wants to define a model (user-defined model)
- User wants to define a custom data type (user-defined data types)
- User wants to define a custom constraint (user-defined constraint)

Architecture

Based on modularity and extensibility.

Assembled from n modules
Each module responsible for its own concerns

Questions

How many modules to refactor?
What is the relationship between each module?
- How do the modules work together?
hardcoded, require, extension points, how do two modules work together, etc

Juggler

Interaction

Developer experience

We begin from the "developer experience" angle because, for the next iteration of Juggler to be successful, it must be "good enough" out of the box and provide extension points for end-users to build in their own capabilities when it is insufficient.

The process begins with defining the models up front from scratch. This info is captured in the form of the "model definition", which Juggler converts into the configs used to define "model classes". These classes will ultimately be used to generate actual model instances. To get a better picture of what this means, the following sections break down a model definition and give a simple example.

Model definition

Model definitions can be broken down into the following:

Model-level settings
- Index
- Relations
- ACLs
- Scopes
- Properties
  - Settings that apply to all properties?
- Options
- Strict
  - Specifies whether the model accepts only predefined properties
  - Should probably change this setting's name to predefined-properties-only
- Metadata
  - ie) name of column for PK, composite keys, etc
- Composition
  - Base
  - Mixins
  - Extend from multiple models?
Property-level settings
- Name
- Type
- Metadata
  - ie) if oracle, what is the column name that should be used
- Constraints
  - Required
  - ENUM
  - Regex
  - Max
  - Min
  - User-defined
    - Custom constraints set up by the user
      - Follows a predefined format understood by Juggler (like registering middleware)
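
A rough sketch of how the property-level constraints, including user-defined ones, might be registered and checked. The checker signature `(value, setting) => boolean` and the function names are assumptions; this is only one possible "predefined format understood by Juggler".

```javascript
// Built-in constraint checkers. Each returns true when the value passes.
// Missing values pass every check except "required".
const constraintCheckers = new Map([
  ['required', (value, enabled) => !enabled || (value !== undefined && value !== null)],
  ['enum', (value, allowed) => value == null || allowed.includes(value)],
  ['regex', (value, pattern) => value == null || new RegExp(pattern).test(String(value))],
  ['min', (value, limit) => value == null || value >= limit],
  ['max', (value, limit) => value == null || value <= limit],
]);

// User-defined constraints are registered much like middleware.
function registerConstraint(name, checker) {
  if (typeof checker !== 'function') {
    throw new TypeError('A constraint checker must be a function');
  }
  if (constraintCheckers.has(name)) {
    throw new Error(`Constraint "${name}" is already registered`);
  }
  constraintCheckers.set(name, checker);
}

// Returns the names of the constraints that the value fails.
function validateProperty(value, constraints) {
  const failures = [];
  for (const [name, setting] of Object.entries(constraints)) {
    const checker = constraintCheckers.get(name);
    if (!checker) throw new Error(`Unknown constraint: ${name}`);
    if (!checker(value, setting)) failures.push(name);
  }
  return failures;
}

// Hypothetical custom constraint supplied by an end-user.
registerConstraint(
  'evenOnly',
  (value, enabled) => !enabled || value == null || value % 2 === 0
);

const failures = validateProperty(7, { required: true, min: 10, evenOnly: true });
```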

model-definition.json

The following is an example of a Customer model.

{
  "name": "Customer", // top-level
  "description": "A Customer model representing our customers.",
  "base": "User",
  "idInjection": false,
  "strict": true,
  "options": {...},
  "properties": {...},
  "hidden": {...},
  "validations": [...],
  "relations": {...},
  "acls": {...},
  "scopes": {...},
  "indexes": {...},
  "methods": [...],
  "http": {"path": "/some/path"}
}

Lifecycle

This leads us to the high level overview of how Juggler is used in the overall architecture of LoopBack (LB):

LoopBack ---> Juggler ---> Data Source

LB asks Juggler to fetch some data

LB has no knowledge of how Juggler will perform the task
LB has no knowledge of which data source will be used
LB must use the API provided by Juggler
Juggler makes requests to various data sources on behalf of the client (in this case, LB)
- Juggler maintains the list of data sources registered
  - Data sources are registered during boot
  - Data sources may be added or removed during runtime
    - Programmatically
    - REST
  - Built-in data sources are automatically registered at boot
    - In-memory connector
      - Used by testing infrastructure in most LB-related projects
      - Has option to persist data to the filesystem
        
        Currently only supports JSON output
        
        Should be able to choose output format
        
        Could be useful for performing dumps/migrations
    - Transient connector
      - Works like the In-memory connector
      - Used for embedded relationships
- Juggler maintains the list of data types registered
  - Data types are registered during boot
  - Data types may be added or removed during runtime
