This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Enable users to control collections during merging #190

Closed
aajacobs opened this issue Oct 11, 2018 · 5 comments
Labels
enhancement (New feature or request), fixed in dev (Is fixed on the develop branch; will be closed with the next release)
Milestone
1.2.0

Comments

@aajacobs
Contributor

aajacobs commented Oct 11, 2018

Currently, a merged document inherits the collections from all source documents being merged, with the "mdm-merged" collection being added.

To support business workflows, smart mastering should let users choose how collections are handled during a merge, including which collections are added to and which are removed from a merged document. Without control over collections, batch processing requires more complex queries to isolate the documents to process. Putting merged documents into custom collections would also let users master multiple entity types in the same database, instead of having every entity type end up in the "mdm-merged" collection.

Ideally, this workflow should be supported:
(1) The user puts all documents to be mastered in a "toBeMastered" collection.
(2) A batch process runs against selected documents in the "toBeMastered" collection.
(3) New merged documents are put into a user-specified collection, such as "masterPerson", with the option of carrying forward the source documents' collections or not.
(4) Users can remove collections, such as "toBeMastered", from merged and original documents, so that the next time the batch process selects documents in the "toBeMastered" collection, it won't rerun against the same documents.
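The workflow steps above can be sketched in plain JavaScript. This is a minimal in-memory model, not the Smart Mastering API: documents are plain objects with a `collections` array, and `selectBatch`, `mergeCollections`, and the option names (`inputColl`, `targetColl`, `keepSourceCollections`) are hypothetical. It also assumes the group of matched source documents is already known, so matching itself is not shown.

```javascript
// Hypothetical model: each document is { uri, collections: string[] }.

// Step (2): select documents that are still waiting to be mastered.
function selectBatch(docs, inputColl) {
  return docs.filter((doc) => doc.collections.includes(inputColl));
}

// Steps (3) and (4) for one group of matched source documents.
// Returns the collections for the new merged document and strips
// inputColl from each source document (mutating it in place).
function mergeCollections(sources, { inputColl, targetColl, keepSourceCollections }) {
  const merged = new Set([targetColl]);
  if (keepSourceCollections) {
    // Optionally carry forward every source collection.
    for (const doc of sources) {
      doc.collections.forEach((c) => merged.add(c));
    }
  }
  // The merged document should not be picked up by the next batch run.
  merged.delete(inputColl);
  for (const doc of sources) {
    doc.collections = doc.collections.filter((c) => c !== inputColl);
  }
  return Array.from(merged).sort();
}

const docs = [
  { uri: "/person/1.json", collections: ["toBeMastered", "source-a"] },
  { uri: "/person/2.json", collections: ["toBeMastered", "source-b"] },
];
const batch = selectBatch(docs, "toBeMastered");
const mergedColls = mergeCollections(batch, {
  inputColl: "toBeMastered",
  targetColl: "masterPerson",
  keepSourceCollections: true,
});
console.log(mergedColls); // [ 'masterPerson', 'source-a', 'source-b' ]
console.log(selectBatch(docs, "toBeMastered").length); // 0
```

Because "toBeMastered" is removed from both the merged result and the originals, a second run of `selectBatch` finds nothing to reprocess, which is the point of step (4).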

@dmcassel
Contributor

dmcassel commented Oct 11, 2018

Proposal: allow a collectionStrategy entry under the algorithms part of merge config:

{
  "options": {
    "algorithms": {
      "collectionStrategy": {
        "function": "myCollectionBuilder",
        "at": "/some/dir/myCode.sjs"
      }
    }
  }
}

The collection strategy function could receive an object (SJS) or map (XQuery) whose keys are the URIs of the source documents and whose values are the collections each source document is in. The function would return an array (SJS) or sequence (XQuery) of strings naming the collections to add to the new document. The default strategy (the current behavior) would be the union of the source documents' collections plus $const:MERGED-COLL.
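A minimal SJS-style sketch of that contract, assuming plain JavaScript semantics: the strategy receives an object mapping each source URI to its collections and returns an array of collection names. `defaultCollectionStrategy` mirrors the current behavior (union plus $const:MERGED-COLL, which this issue gives as "mdm-merged"); `myCollectionBuilder` is a hypothetical custom strategy that borrows the name from the config example. How the merge code loads the module named in `at` is not shown.

```javascript
// Value of $const:MERGED-COLL as described in this issue.
const MERGED_COLL = "mdm-merged";

// Default: union of all source collections, plus the merged collection.
function defaultCollectionStrategy(collectionsByUri) {
  const result = new Set();
  for (const collections of Object.values(collectionsByUri)) {
    collections.forEach((c) => result.add(c));
  }
  result.add(MERGED_COLL);
  return Array.from(result);
}

// Hypothetical custom strategy: keep source collections except the
// "toBeMastered" work queue, and file the result under "masterPerson".
function myCollectionBuilder(collectionsByUri) {
  const result = new Set(["masterPerson"]);
  for (const collections of Object.values(collectionsByUri)) {
    collections
      .filter((c) => c !== "toBeMastered")
      .forEach((c) => result.add(c));
  }
  return Array.from(result);
}

const sources = {
  "/person/1.json": ["toBeMastered", "source-a"],
  "/person/2.json": ["toBeMastered", "source-b"],
};
console.log(defaultCollectionStrategy(sources));
// [ 'toBeMastered', 'source-a', 'source-b', 'mdm-merged' ]
console.log(myCollectionBuilder(sources));
// [ 'masterPerson', 'source-a', 'source-b' ]
```

In a real .sjs library module the custom function would presumably also need to be exported so the `at` path can resolve it by name.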

Matching already allows callers to specify a filter-query to narrow down what documents should be considered for matching. I think you're also saying we should just rely on the filter query and forget the requirement that docs be in the $const:CONTENT-COLL. Correct?

@dmcassel
Contributor

We should also allow the collection strategy to control collections when archiving. Currently, when merging, all source docs get put into the $const:ARCHIVED-COLL collection.

@dmcassel
Contributor

dmcassel commented Oct 15, 2018

Configuration will control the collections that get applied to documents at various times. Configuration will be part of the merge options.

  <algorithms xmlns="http://marklogic.com/smart-mastering/merging">
    <collections>
      <on-merge function="union" at="/some/dir/code.xqy" ns="some-namespace"/>
      <on-archive function="remove-content-coll" at="/some/dir/code.xqy" ns="some-namespace"/>
      <on-no-match function="add-content-coll" at="/some/dir/code.xqy" ns="some-namespace"/>
      <on-notification function="add-notification-coll" at="/some/dir/code.xqy" ns="some-namespace"/>
    </collections>
  </algorithms>

The on-merge strategy will determine what collections are applied to newly created merged documents. Default strategy: union of all collections on source documents, plus $const:CONTENT-COLL. Comment if there's interest in having an intersection plus $const:CONTENT-COLL strategy available out of the box.
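To compare the default on-merge strategy with the intersection variant floated here, a sketch in plain JavaScript. It assumes the on-merge function receives an object mapping source URIs to their collections (the shape proposed earlier in this thread) and that $const:CONTENT-COLL is "mdm-content"; both are assumptions, and the function names are hypothetical.

```javascript
// Assumed value of $const:CONTENT-COLL.
const CONTENT_COLL = "mdm-content";

// Default on-merge: every collection found on any source document.
function unionOnMerge(collectionsByUri) {
  const result = new Set();
  for (const collections of Object.values(collectionsByUri)) {
    collections.forEach((c) => result.add(c));
  }
  result.add(CONTENT_COLL);
  return Array.from(result);
}

// Alternative on-merge: only collections shared by every source document.
function intersectionOnMerge(collectionsByUri) {
  const lists = Object.values(collectionsByUri);
  let common = new Set(lists.length > 0 ? lists[0] : []);
  for (const collections of lists.slice(1)) {
    common = new Set(collections.filter((c) => common.has(c)));
  }
  common.add(CONTENT_COLL);
  return Array.from(common);
}

const sources = {
  "/person/1.json": ["crm", "toBeMastered"],
  "/person/2.json": ["erp", "toBeMastered"],
};
console.log(unionOnMerge(sources));
// [ 'crm', 'toBeMastered', 'erp', 'mdm-content' ]
console.log(intersectionOnMerge(sources));
// [ 'toBeMastered', 'mdm-content' ]
```

The intersection drops source-specific collections such as "crm" and "erp", but anything every source shares (here "toBeMastered") still survives, so a work-queue collection would need separate handling.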

The on-archive strategy will determine what collections are applied to documents that get archived (merged into other documents). Default strategy: add $const:ARCHIVED-COLL, remove $const:CONTENT-COLL.
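A sketch of the default on-archive behavior just described, applied to one archived document's collections. The collection values and the signature (current collections in, new collections out) are assumptions; `removeQueueOnArchive` is a hypothetical custom strategy that also drops the "toBeMastered" collection, as in step (4) of the original workflow.

```javascript
// Assumed values of $const:CONTENT-COLL and $const:ARCHIVED-COLL.
const CONTENT_COLL = "mdm-content";
const ARCHIVED_COLL = "mdm-archived";

// Default on-archive: drop the content collection, add the archived collection.
function defaultOnArchive(collections) {
  const result = collections.filter((c) => c !== CONTENT_COLL);
  if (!result.includes(ARCHIVED_COLL)) {
    result.push(ARCHIVED_COLL);
  }
  return result;
}

// Hypothetical custom strategy: the default, plus removing the work queue.
function removeQueueOnArchive(collections) {
  return defaultOnArchive(collections).filter((c) => c !== "toBeMastered");
}

console.log(defaultOnArchive(["mdm-content", "source-a"]));
// [ 'source-a', 'mdm-archived' ]
console.log(removeQueueOnArchive(["mdm-content", "toBeMastered", "source-a"]));
// [ 'source-a', 'mdm-archived' ]
```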

The on-no-match strategy will determine what collections are applied to documents that are passed to process:process-match-and-merge but do not find any matches. Default strategy: no change to the document's collections. There will be an out-of-the-box strategy to add the $const:CONTENT-COLL collection.
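The two on-no-match options can be sketched as JavaScript counterparts of the "add-content-coll" function named in the config example. The function names, the signature (current collections in, new collections out), and the "mdm-content" value for $const:CONTENT-COLL are assumptions.

```javascript
// Assumed value of $const:CONTENT-COLL.
const CONTENT_COLL = "mdm-content";

// Default on-no-match: leave the document's collections as they are.
function unchangedOnNoMatch(collections) {
  return collections.slice();
}

// Out-of-the-box alternative: mark the unmatched document as content.
function addContentCollOnNoMatch(collections) {
  return collections.includes(CONTENT_COLL)
    ? collections.slice()
    : collections.concat(CONTENT_COLL);
}

console.log(unchangedOnNoMatch(["toBeMastered"])); // [ 'toBeMastered' ]
console.log(addContentCollOnNoMatch(["toBeMastered"]));
// [ 'toBeMastered', 'mdm-content' ]
```

Both functions return a new array rather than mutating their input, so a caller can compare old and new collections before applying any change.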

The on-notification strategy will determine what collections are applied to newly created notification documents. Default strategy: notification documents will get the $const:NOTIFICATION-COLL collection.

For each type of collection strategy, we'll define an API that can be used to write custom strategies.
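One way such an API could fit together, sketched in JavaScript: defaults keyed by the element names from the `<collections>` config above, with any configured function taking precedence. The `DEFAULTS` registry, the `collectionsFor` helper, the single-argument signatures, and the collection values are all hypothetical; a real implementation would load custom functions from the `at`/`ns` locations rather than from an in-memory object.

```javascript
// Assumed values of the $const collection constants.
const CONTENT_COLL = "mdm-content";
const ARCHIVED_COLL = "mdm-archived";
const NOTIFICATION_COLL = "mdm-notification";

// Default strategy per event, keyed like the <collections> child elements.
const DEFAULTS = {
  // Input: object mapping source URIs to their collections.
  "on-merge": (collectionsByUri) =>
    Array.from(new Set([...Object.values(collectionsByUri).flat(), CONTENT_COLL])),
  // Input: the archived document's current collections.
  "on-archive": (collections) =>
    Array.from(new Set(collections.filter((c) => c !== CONTENT_COLL).concat(ARCHIVED_COLL))),
  // Input: the unmatched document's current collections.
  "on-no-match": (collections) => collections.slice(),
  // Notifications always get the notification collection.
  "on-notification": () => [NOTIFICATION_COLL],
};

// Pick the configured strategy for an event, falling back to the default.
function collectionsFor(event, input, customStrategies = {}) {
  const strategy = customStrategies[event] || DEFAULTS[event];
  if (!strategy) {
    throw new Error("Unknown collection event: " + event);
  }
  return strategy(input);
}

console.log(collectionsFor("on-merge", { "/a.json": ["crm"], "/b.json": ["erp", "crm"] }));
// [ 'crm', 'erp', 'mdm-content' ]
console.log(collectionsFor("on-notification", null));
// [ 'mdm-notification' ]
const custom = { "on-no-match": (c) => c.concat(CONTENT_COLL) };
console.log(collectionsFor("on-no-match", ["toBeMastered"], custom));
// [ 'toBeMastered', 'mdm-content' ]
```

Keeping one entry point per event means a custom strategy only has to match the input shape for its own event; everything not configured keeps the default behavior.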

@popzip

popzip commented Oct 16, 2018

To restate: The goal is for users to write merge algorithms/logic for metadata, including collections. Correct?

@dmcassel
Contributor

This story covers collections; there's another one for permissions. But yeah, that's the idea.

ryanjdew added the fixed in dev label (Is fixed on the develop branch; will be closed with the next release) Nov 9, 2018
ryanjdew added this to the 1.2.0 milestone Nov 12, 2018

5 participants