Feature Suggestion - additional info from metadata #133

Closed
DavidEnnis-CleverLlamas opened this issue Jul 1, 2022 · 7 comments
Labels
enhancement New feature or request

DavidEnnis-CleverLlamas commented Jul 1, 2022

When using QueryMarkLogic, you can set an option to return metadata with or without the content.

Under the hood, MarkLogic returns the entire rapi:metadata payload (metadata-values, collections, permissions, quality, properties).

However, the implementation appears to discard some of this metadata and only sets:

  • metadata-values as meta:xxx using DocumentMetadataHandle.getMetadataValues()
  • properties as property:xxx using DocumentMetadataHandle.getProperties().
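A minimal Java sketch of the mapping described above, for illustration only. Plain `Map`s stand in for `DocumentMetadataHandle.getMetadataValues()` and `getProperties()` (an assumption made so the example runs without the MarkLogic Java Client), and the class and method names are hypothetical. The `{namespace}localPart` property key matches the format visible in the log output later in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;

public class CurrentMetadataMapping {

    // Builds FlowFile-style attributes from metadata values and properties only.
    // Collections, permissions and quality are never read, which is the gap
    // this issue describes. (Plain maps are stand-ins for the client API types.)
    static Map<String, String> toAttributes(Map<String, String> metadataValues,
                                            Map<QName, Object> properties) {
        Map<String, String> attributes = new LinkedHashMap<>();
        // Each metadata value becomes "meta:<key>".
        metadataValues.forEach((key, value) -> attributes.put("meta:" + key, value));
        // Each property becomes "property:<name>"; QName.toString() renders a
        // namespaced name as "{namespaceURI}localPart".
        properties.forEach((name, value) ->
                attributes.put("property:" + name, String.valueOf(value)));
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> metadataValues = new LinkedHashMap<>();
        metadataValues.put("meta1", "hello1");
        Map<QName, Object> properties = new LinkedHashMap<>();
        properties.put(new QName("org:example", "hello"), "world");
        // Prints {meta:meta1=hello1, property:{org:example}hello=world}
        System.out.println(toAttributes(metadataValues, properties));
    }
}
```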

I need additional information from the rapi:metadata payload. For my first use case, I need collections. It would be a shame to make a second call for information that has already been returned.

I was thinking of one of the following:

  1. Add the missing items as attributes (permission:, collection, quality), just like meta: and property: (formats to be considered)
    AND/OR
  2. Add the entire rapi:metadata fragment as an attribute so that it is at least available.

I'm willing to work on this if there is value in contributing it back to the main project. -David Ennis

rjrudin commented Aug 2, 2022

Thanks @17llamas - approach 1 above seems like a simple and logical default. We'll get this into the next release.

rjrudin added this to the 1.16.3.1 milestone Aug 5, 2022

rjrudin commented Aug 23, 2022

@17llamas Let me know how this sounds for exposing collections, permissions, and document quality:

  1. Collections will be added as a "collections" attribute with all collections joined in a comma-delimited string
  2. For each unique role in the set of permissions, a "permission:(role-name)" attribute will be added with the list of capabilities for that role joined in a comma-delimited string - e.g. "permission:my-role" = "read,update"
  3. The document quality will be added to a "quality" attribute
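The three proposed rules can be sketched in Java as follows. This is a hypothetical illustration: plain collections stand in for the client's collection, permission, and quality accessors, and capabilities are modeled as lowercase strings such as "read" rather than the client's own capability type.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ProposedMetadataAttributes {

    // Hypothetical stand-in inputs: collection names, role -> capabilities, quality.
    static Map<String, String> toAttributes(Set<String> collections,
                                            Map<String, Set<String>> permissions,
                                            int quality) {
        Map<String, String> attributes = new LinkedHashMap<>();
        // 1. All collections joined into one comma-delimited attribute.
        attributes.put("collections", String.join(",", collections));
        // 2. One attribute per unique role, with its capabilities comma-delimited.
        permissions.forEach((role, capabilities) ->
                attributes.put("permission:" + role, String.join(",", capabilities)));
        // 3. The document quality as a string.
        attributes.put("quality", Integer.toString(quality));
        return attributes;
    }

    public static void main(String[] args) {
        Set<String> collections = new LinkedHashSet<>(List.of("QueryMarkLogicTest", "test1"));
        Map<String, Set<String>> permissions = new LinkedHashMap<>();
        permissions.put("my-role", new LinkedHashSet<>(List.of("read", "update")));
        // Prints {collections=QueryMarkLogicTest,test1, permission:my-role=read,update, quality=12}
        System.out.println(toAttributes(collections, permissions, 12));
    }
}
```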

We are considering adding an "ml-" prefix to each of these, though we initially won't touch the "meta:" and "property:" prefixes. That would help ensure uniqueness for these FlowFile attributes so that they don't collide with existing attributes.
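Why a prefix helps can be shown with a small hypothetical Java sketch: an unprefixed "quality" attribute would silently overwrite an upstream attribute of the same name, while the proposed "ml-" prefix keeps the keys distinct. The helper names and the upstream "quality" attribute are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PrefixedAttributes {

    // Returns a copy of the metadata attributes with every key prefixed.
    static Map<String, String> withPrefix(String prefix, Map<String, String> metadata) {
        Map<String, String> prefixed = new LinkedHashMap<>();
        metadata.forEach((key, value) -> prefixed.put(prefix + key, value));
        return prefixed;
    }

    // Keys that would overwrite an attribute already present on the FlowFile.
    static Set<String> collisions(Map<String, String> existing, Map<String, String> incoming) {
        Set<String> clashes = new TreeSet<>(existing.keySet());
        clashes.retainAll(incoming.keySet());
        return clashes;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("filename", "/PutMarkLogicTest/20.xml");
        existing.put("quality", "high"); // hypothetical upstream attribute
        Map<String, String> metadata = Map.of("quality", "12");

        System.out.println(collisions(existing, metadata));                     // [quality]
        System.out.println(collisions(existing, withPrefix("ml-", metadata)));  // []
    }
}
```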

rjrudin self-assigned this Aug 23, 2022
rjrudin added the enhancement New feature or request label Aug 23, 2022
rjrudin added a commit that referenced this issue Aug 25, 2022
rjrudin added a commit that referenced this issue Aug 26, 2022

rjrudin commented Aug 26, 2022

Some logging (via the LogAttribute processor) showing all the metadata for some test documents:

-------------------QUERY RESULT-------------------
FlowFile Attribute Map Content
Key: 'filename'
        Value: '/PutMarkLogicTest/20.xml'
Key: 'marklogic-collections'
        Value: 'QueryMarkLogicTest-2,QueryMarkLogicTest,test1'
Key: 'marklogic-permissions'
        Value: 'rest-writer,update,rest-reader,read,rest-reader,execute'
Key: 'marklogic-quality'
        Value: '12'
Key: 'meta:meta1'
        Value: 'hello1'
Key: 'meta:meta2'
        Value: 'hello2'
Key: 'meta:my-uri'
        Value: '/PutMarkLogicTest/20.xml'
Key: 'path'
        Value: './'
Key: 'property:{org:example}hello'
        Value: 'world'
Key: 'uuid'
        Value: '35eb577d-f996-4773-a16a-9c25c67666ac'
-------------------QUERY RESULT-------------------
<?xml version="1.0" encoding="UTF-8"?>
<root><sample>xmlcontent</sample><dateTime xmlns="namespace-test">2000-01-01T00:00:00.000000</dateTime></root>
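In the sample output, marklogic-permissions appears to be a flat list of alternating role and capability values (a role repeats once per capability) rather than the per-role "permission:(role-name)" attributes proposed earlier. Assuming that reading of the format, a downstream consumer could regroup it by role with a sketch like the following; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PermissionsAttributeParser {

    // Assumes the value alternates role and capability: role1,cap1,role2,cap2,...
    static Map<String, List<String>> parse(String value) {
        Map<String, List<String>> capabilitiesByRole = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return capabilitiesByRole;
        }
        String[] tokens = value.split(",");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Expected role,capability pairs: " + value);
        }
        for (int i = 0; i < tokens.length; i += 2) {
            // Group every capability under its role, keeping first-seen order.
            capabilitiesByRole.computeIfAbsent(tokens[i], role -> new ArrayList<>())
                    .add(tokens[i + 1]);
        }
        return capabilitiesByRole;
    }

    public static void main(String[] args) {
        // Prints {rest-writer=[update], rest-reader=[read, execute]}
        System.out.println(parse("rest-writer,update,rest-reader,read,rest-reader,execute"));
    }
}
```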

DavidEnnis-CleverLlamas (Author)

rjrudin commented Aug 27, 2022

@17llamas That's a good question - there are some areas between processors where behavior differs when it seems like it should be the same. For example, I would think that any processor that retrieves one to many items from ML would follow the same original/results pattern, where each FlowFile sent to "results" is a clone of the original FlowFile sent to "original".
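The original/results pattern can be sketched in plain Java. `SimpleFlowFile` is a hypothetical stand-in for a NiFi FlowFile (attributes plus content), not a NiFi type; the point is only that each "results" FlowFile starts from a copy of the original's attributes, with the retrieved item's own attributes layered on top, while the original itself would be routed to "original" unchanged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OriginalResultsPattern {

    // Hypothetical stand-in for a NiFi FlowFile: attributes plus content.
    static final class SimpleFlowFile {
        final Map<String, String> attributes;
        final String content;

        SimpleFlowFile(Map<String, String> attributes, String content) {
            this.attributes = new LinkedHashMap<>(attributes);
            this.content = content;
        }
    }

    // Each retrieved item becomes a "results" FlowFile whose attributes start
    // as a clone of the original's; the item's own attributes are added on top.
    static List<SimpleFlowFile> toResults(SimpleFlowFile original, List<SimpleFlowFile> retrieved) {
        List<SimpleFlowFile> results = new ArrayList<>();
        for (SimpleFlowFile item : retrieved) {
            Map<String, String> merged = new LinkedHashMap<>(original.attributes);
            merged.putAll(item.attributes); // retrieved metadata wins on key clashes
            results.add(new SimpleFlowFile(merged, item.content));
        }
        return results;
    }

    public static void main(String[] args) {
        SimpleFlowFile original = new SimpleFlowFile(Map.of("batch-id", "42"), "query input");
        List<SimpleFlowFile> retrieved = List.of(
                new SimpleFlowFile(Map.of("filename", "/a.xml"), "<a/>"),
                new SimpleFlowFile(Map.of("filename", "/b.xml"), "<b/>"));
        for (SimpleFlowFile result : toResults(original, retrieved)) {
            // Every result carries batch-id=42 alongside its own filename.
            System.out.println(result.attributes + " -> " + result.content);
        }
    }
}
```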

I am going to look into this further for 1.16.3.2 to firm up consistency between the processors. Going to get 1.16.3.1 out on Monday to address an SSL bug in RunFlowMarkLogic and then will get a plan together for 1.16.3.2.

DavidEnnis-CleverLlamas (Author)

Hi Rob,

A few notes:

  • The option of including all metadata is great. However, for a company that makes extensive use of the properties fragment, this could add memory overhead. It's not my use case; I'm just pointing it out. FlowFile attributes are held in memory, so every element parsed from the properties fragment ends up in memory. Perhaps add a warning about this or, in the future, allow users to choose which metadata items to keep.
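The "choose which items to keep" idea could be a simple allow-list applied to properties. The Java sketch below is hypothetical: the allow-list parameter and method name are not an existing processor setting. In a real processor the check would ideally run while the properties fragment is being read, so unwanted values never occupy attribute memory; here it filters an already-built map only to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PropertyAllowList {

    private static final String PROPERTY_PREFIX = "property:";

    // Keeps every non-property attribute, but only those "property:" attributes
    // whose name (e.g. "{org:example}hello") appears in the allow-list.
    static Map<String, String> keepSelected(Map<String, String> attributes,
                                            Set<String> allowedProperties) {
        Map<String, String> kept = new LinkedHashMap<>();
        attributes.forEach((key, value) -> {
            boolean isProperty = key.startsWith(PROPERTY_PREFIX);
            if (!isProperty
                    || allowedProperties.contains(key.substring(PROPERTY_PREFIX.length()))) {
                kept.put(key, value);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("meta:meta1", "hello1");
        attributes.put("property:{org:example}hello", "world");
        attributes.put("property:{org:example}large-blob", "lots of text");
        // Prints {meta:meta1=hello1, property:{org:example}hello=world}
        System.out.println(keepSelected(attributes, Set.of("{org:example}hello")));
    }
}
```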

Good that you will look at standardizing the Controllers a bit. I have gone through each one line by line, and it looks like they were created at different times by different people, in some cases for specific use cases. This is clear when you look at the rows endpoint, where very few of the API's options are configurable (so in my case, I use the eval endpoint and run the Optic query from there).

Regarding upstream FlowFile attributes not being passed through, as is the case with QueryML, I have opened a separate issue for that since it has its own problem statement.

rjrudin closed this as completed Aug 29, 2022

rjrudin commented Aug 29, 2022

Will be addressing the properties issue in the next release.
