Make Processor Plugins hierarchical #7

Closed
DiegoPino opened this issue Mar 17, 2020 · 11 comments
Labels
enhancement New feature or request

Comments

@DiegoPino
Member

What is this?

See #5 and #4 and #6

The idea is that Plugins, which are driven by config entities as defined here https://github.com/esmero/strawberry_runners/pull/5/files#diff-6cb3b61e72b132f4e76eaf33127a920e, should not only be sorted by weight but also be hierarchical. Why? Because we would like Post Processor Plugins to be able to work on the outputs of other Post Processors. E.g. one Post Processor extracts files from a PDF, then another uses those files to process HOCR.

How to accomplish this?

  1. Our configuration entity needs more logic. First step would be to add two properties for that
  • Parent
  • Depth

These would allow us to use something similar to this form in the Entity List Builder: https://api.drupal.org/api/drupal/core%21modules%21system%21tests%21modules%21tabledrag_test%21src%21Form%21TableDragTestForm.php/class/TableDragTestForm/8.7.x
so that people can move/drag Plugin instances into a hierarchy.

Parent can be NULL (a top-level Post Processor) or the UUID/ID of another Post Processor Config Entity.
Depth can be used to quickly find siblings, etc.
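
As a rough illustration of the two properties (plain Python, not Drupal code), the sketch below groups entries by parent, sorts siblings by weight, and derives depth from the parent chain. `ProcessorConfig`, `children_by_parent`, `assign_depths` and the ids are made-up names for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProcessorConfig:
    """Hypothetical stand-in for one Post Processor config entity."""
    id: str
    weight: int = 0
    parent: Optional[str] = None  # None means a top-level Post Processor
    depth: int = 0


def children_by_parent(
    configs: List[ProcessorConfig],
) -> Dict[Optional[str], List[ProcessorConfig]]:
    """Group configs by parent id; siblings end up sorted by weight."""
    grouped: Dict[Optional[str], List[ProcessorConfig]] = {}
    for config in sorted(configs, key=lambda c: (c.weight, c.id)):
        grouped.setdefault(config.parent, []).append(config)
    return grouped


def assign_depths(configs: List[ProcessorConfig]) -> None:
    """Recompute depth from the parent chain (0 for top-level entries)."""
    by_id = {c.id: c for c in configs}
    for config in configs:
        depth, current, seen = 0, config, {config.id}
        while current.parent is not None:
            current = by_id[current.parent]
            if current.id in seen:
                raise ValueError(f"cycle in parent chain at {current.id}")
            seen.add(current.id)
            depth += 1
        config.depth = depth


configs = [
    ProcessorConfig("hocr", weight=0, parent="pdf_extract"),
    ProcessorConfig("pdf_extract", weight=0),
    ProcessorConfig("exif", weight=1),
]
assign_depths(configs)
tree = children_by_parent(configs)
```

Here `tree[None]` holds the top-level entries and `tree["pdf_extract"]` holds its children, which is roughly the shape a drag-and-drop table would need to render.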

  2. Our Event Subscriber, which gets all the JSON events (or the EVENT itself), then needs better logic to be able to build a tree of execution. This means that if we push data into a QUEUE, ITEM B cannot be processed until ITEM A is done. That is quite complex and we can discuss how to deal with it. Options are (thinking out loud):
    A. ITEM A is the one that, as part of its own processing, adds a new QUEUE ITEM for ITEM B. This means each TOP Post Processor (parent/sibling) is responsible for generating its output and also for triggering the next processing step (please ask if this makes no sense)
    OR
    B. We have many QUEUES, one per Depth, and each QUEUE, once processed, triggers a new Event that sets a flag allowing the NEXT DEPTH to be processed. This can lead to unnecessary processing.
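
To make option B a bit more concrete, here is a minimal sketch in plain Python (not the Drupal Queue API) of one queue per depth, drained in depth order. The function names and processor ids are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical queue items: (processor id, depth).
ITEMS: List[Tuple[str, int]] = [("hocr", 1), ("pdf_extract", 0), ("exif", 0)]


def build_depth_queues(items: List[Tuple[str, int]]) -> Dict[int, Deque[str]]:
    """Create one queue per depth, as in option B."""
    queues: Dict[int, Deque[str]] = {}
    for processor_id, depth in items:
        queues.setdefault(depth, deque()).append(processor_id)
    return queues


def run_by_depth(queues: Dict[int, Deque[str]]) -> List[str]:
    """Drain each depth fully before the next depth is allowed to start."""
    order: List[str] = []
    for depth in sorted(queues):
        queue = queues[depth]
        while queue:
            order.append(queue.popleft())
        # In option B this is the point where an event would flag
        # the next depth as ready to be processed.
    return order


order = run_by_depth(build_depth_queues(ITEMS))
```

In this sketch `hocr` has to wait for every depth 0 item, including `exif`, even though it only depends on `pdf_extract`.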
@giancarlobi
Contributor

@DiegoPino I'm thinking about this. I need to make the workflow clearer in my head. Option A seems to be the right one, but I need to resolve other doubts, such as: do we need to enable plugins per ADO or per as: entry? What if I want to add EXIF data manually? And more. I'll use this afternoon to think about this.

@giancarlobi
Contributor

@DiegoPino Can we think of the hierarchy of Plugins as a workflow? Then a Plugin can participate in more than one workflow, right? Each workflow starts with a top Plugin, the first of the chain.
Then we have to pair a workflow with a specific as: , right? Or assign a workflow to an ADO type? Or ...
More afternoon thinking about this.

@DiegoPino
Member Author

@giancarlobi yes, a diagram of the workflow can help. Let me see if I can get something done today.
Option A, after sleeping on it, makes more sense to me. Imagine it like traversing a tree from the root to the leaves. Each processor triggers its next child after processing. So the event really just adds the top level processor to the queue. Each Processor is then responsible for checking whether there is a child processor that depends on it and, once ready, adding the corresponding element to a queue.
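
A minimal sketch of that traversal, in plain Python rather than the Drupal Queue API: the event only enqueues top level processors, and each processed item enqueues its own children with its output as their input. `CHILDREN`, `TOP_LEVEL`, `run_processor` and the ids are assumptions made up for the example.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical parent -> children map built from the config entities.
CHILDREN: Dict[str, List[str]] = {
    "pdf_extract": ["hocr"],
    "hocr": [],
    "exif": [],
}
TOP_LEVEL: List[str] = ["pdf_extract", "exif"]

Item = Tuple[str, str]  # (processor id, input the processor works on)


def run_processor(processor_id: str, source: str) -> str:
    """Placeholder for the real work; returns a label for the output."""
    return f"{processor_id}({source})"


def process_item(item: Item, queue: Deque[Item], log: List[Item]) -> None:
    """Process one item, then enqueue the children that depend on it."""
    processor_id, source = item
    output = run_processor(processor_id, source)
    log.append(item)
    for child in CHILDREN.get(processor_id, []):
        queue.append((child, output))


def handle_event(source: str) -> List[Item]:
    """The event only adds the top level processors to the queue."""
    queue: Deque[Item] = deque((pid, source) for pid in TOP_LEVEL)
    log: List[Item] = []
    while queue:
        process_item(queue.popleft(), queue, log)
    return log


log = handle_event("file.pdf")
```

The `hocr` item never exists in the queue until `pdf_extract` has finished, so the ordering problem is handled by construction.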

About adding data manually.

We might need a flag? I would say a signature produced by the queue worker, and then some conditional that says: if ALREADY present and (not sure how to detect this) manual, don't process again. I feel Islandora was not handling derivatives right, because there is no way to, for example, trigger a Thumbnail via the UI only if there is no Thumbnail yet. We have no thumbnails, of course. But how we decide when processing is needed can be a decision based on what is there (your own EXIF) plus a file change? I feel we could almost use some type of version control via checksums, like git does? Plus timestamps? Open to ideas, but we need to be consistent when we code and also make sure we document this.
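
One way the checksum idea could look, as a sketch only: the record layout (`manual`, `source_checksum`) and the function names are invented for illustration, and SHA-256 is just one possible choice of checksum.

```python
import hashlib
from typing import Optional


def checksum(data: bytes) -> str:
    """Content checksum of the source file (SHA-256 chosen for the example)."""
    return hashlib.sha256(data).hexdigest()


def needs_processing(file_bytes: bytes, record: Optional[dict]) -> bool:
    """Decide whether a processor should run for this file.

    `record` is a hypothetical signature left behind by a previous run
    (or by a user who entered the data manually).
    """
    if record is None:
        return True  # nothing has been produced yet
    if record.get("manual"):
        return False  # user-supplied data is never overwritten
    # Re-run only when the source file changed since the last run.
    return record.get("source_checksum") != checksum(file_bytes)


original = b"image bytes"
previous_run = {"manual": False, "source_checksum": checksum(original)}
```

A timestamp could be stored next to the checksum in the same record, but the decision itself only needs the checksum comparison.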

Thanks my friend

@giancarlobi
Contributor

> @giancarlobi yes, a diagram of the workflow can help. Let me see if i can get something done today.
> Option A, after sleeping, makes more sense to me. Imagine like traversing a tree from the root to the leaves. Each processor triggers its next child after processing. So the event really just adds the top level processor to the queue. Each Processor is then responsible for looking if there is a child processor that depends on it and adds the corresponding element to a queue, once ready.

@DiegoPino Yes, we really need a workflow diagram.
Perfect, the event just adds the top level processor of a specific workflow.
Each processor has to know which workflow it is a member of, so it can take the right decision when it ends. This also matters because each processor could be a member of more than one workflow.
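
A small sketch of what "knowing the workflow" could mean in practice: the queue item carries a workflow id, and the lookup of the next step is keyed by it. The workflow names, processor ids and `next_items` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical workflows: workflow id -> {processor id -> child processor ids}.
# "pdf_extract" takes part in both workflows.
WORKFLOWS: Dict[str, Dict[str, List[str]]] = {
    "pdf_books": {"pdf_extract": ["hocr"], "hocr": []},
    "pdf_thumbnails": {"pdf_extract": ["thumbnail"], "thumbnail": []},
}


def next_items(workflow_id: str, processor_id: str) -> List[Tuple[str, str]]:
    """Queue items (workflow, processor) to add once processor_id ends."""
    children = WORKFLOWS[workflow_id].get(processor_id, [])
    return [(workflow_id, child) for child in children]


books_next = next_items("pdf_books", "pdf_extract")
thumbs_next = next_items("pdf_thumbnails", "pdf_extract")
```

The same processor ends with a different next step depending on which workflow the queue item belongs to.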

@giancarlobi
Contributor

About the second question, I'm running my neurons and will answer later 😄

@giancarlobi
Contributor

giancarlobi commented Mar 18, 2020

> About adding data manually.
>
> We could need a flag?. I would say, a signature produced by the queue worker and then some conditional, that says, if ALREADY present, and, not sure how, manual, don't process again?

@DiegoPino I'd like something simple, in line with the Archipelago philosophy, and SBF JSON based.

  • Derivatives are generated only if the user needs them; the default is no derivatives
  • User can enable derivatives via UI, Webform, script, ...
  • User has to be able to finely select derivatives per single as:
  • User has to be able to set a default for all ADO as:
  • User has to be able to trigger derivative regeneration

So, what about a flag in the SBF JSON?
We can set a main flag for all as: at root level and/or a flag at as: level.
The flag stores the processor state (ToDO, DONE, ReDO, ERROR, ...).
The UI/Webform/script writes the flag into the SBF JSON, then the event reads the flags and executes the workflow when required.
When the workflow ends, the flag is set based on the workflow result.
When an ADO as: file is updated, we can suggest that the user regenerate derivatives, but the user has to decide; nothing automatic.
What do you think, is this too manual?
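
A sketch of how such flags could sit in the JSON and be resolved. The key name `processor_state`, the file keys and the rule that a flag at as: level wins over the root flag are assumptions for this example; the thread does not settle any of them.

```python
import json
from typing import List, Optional

# Hypothetical key name for the flag.
FLAG_KEY = "processor_state"

sbf_json = json.loads("""
{
  "processor_state": "ToDO",
  "as:image": {
    "file-1": {"name": "page1.tiff", "processor_state": "DONE"},
    "file-2": {"name": "page2.tiff"}
  }
}
""")


def effective_state(doc: dict, as_type: str, file_key: str) -> Optional[str]:
    """Assumed rule: a flag at as: level overrides the root-level flag."""
    entry = doc.get(as_type, {}).get(file_key, {})
    return entry.get(FLAG_KEY, doc.get(FLAG_KEY))


def to_process(doc: dict, as_type: str) -> List[str]:
    """Keys of the entries whose effective state asks for (re)processing."""
    return [
        key
        for key in doc.get(as_type, {})
        if effective_state(doc, as_type, key) in ("ToDO", "ReDO")
    ]


pending = to_process(sbf_json, "as:image")
```

With no flag anywhere, `effective_state` returns `None`, which matches the "default is no derivatives" rule above.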

@DiegoPino
Member Author

I like the idea of asking people. I feel automatic-and-invisible is not the right way. We can always add, if needed, some rule-based system that executes derivatives automatically for certain types of users who won't understand what is needed, and, as you say, that can be just a flag that we set as hidden on certain forms. Good!

So we need to decide on those states right? We need logs + info on JSON about the status.

About your question of workflows:

  • YES. So here is how plugins work (imagine Blocks, which are also plugins)
  • A plugin is just code that does stuff
  • Each Plugin, to be useful, needs to have some settings. So really the settings, provided by the Configuration Entity, are what people see/interact with. The same Plugin can be used by different "config entities".
  • A Workflow would then be a set of Config entities, all connected to each other, each one triggering the logic of one or more Plugins
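
The split between plugin and config entity can be sketched like this, in plain Python instead of the Drupal plugin system. `PLUGINS`, `ProcessorConfigEntity`, `extract_text` and the `dpi` setting are made-up names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def extract_text(source: str, settings: dict) -> str:
    """The plugin: just code that does stuff, driven by settings."""
    return f"text from {source} at {settings.get('dpi', 72)} dpi"


# Hypothetical plugin registry keyed by plugin id.
PLUGINS: Dict[str, Callable[[str, dict], str]] = {"text_extractor": extract_text}


@dataclass
class ProcessorConfigEntity:
    """What people see and edit: a plugin id plus the settings for it."""
    id: str
    plugin_id: str
    settings: dict = field(default_factory=dict)

    def run(self, source: str) -> str:
        return PLUGINS[self.plugin_id](source, self.settings)


# The same plugin used by two different config entities.
low = ProcessorConfigEntity("ocr_low", "text_extractor", {"dpi": 150})
high = ProcessorConfigEntity("ocr_high", "text_extractor", {"dpi": 300})
```

A workflow would then be a connected set of such entities rather than a set of plugins.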

That said, we don't have Multiple Workflows right now, so I would suggest one of two things. Either there is another Config Entity that groups all Single Plugin Config Entities into a Workflow (let's say it's named "PDF Processing for Books").

Or we start simple and have no Workflows yet, just a single Workflow. Once we get that running, we add the top wrapper to have named groups of Configurations that run Plugins/Processors.

Let me know if this makes sense?

@giancarlobi
Contributor

@DiegoPino that really makes sense.
We can start with small single steps, then add more pieces to our puzzle.

  • a single Plugin == single Workflow
  • a simple Flavour Status Flag: NULL (not present) = Do nothing; 0_FLName = To Process FLName; 1_FLName = OK, Done; 2_FLName = Error
    Do you like this?

@giancarlobi
Contributor

Obviously a Flavour Status Flag such as 0_FLName could be expanded into JSON syntax.
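
For example, a compact flag could be parsed into a small JSON-style structure like this. The state labels, the output keys and `parse_flag` are assumptions for the sketch; only the `0_`/`1_`/`2_` prefixes and the "not present means do nothing" rule come from the proposal above.

```python
from typing import Optional

# Prefix codes from the proposal; the labels on the right are made up.
STATES = {"0": "to_process", "1": "done", "2": "error"}


def parse_flag(flag: Optional[str]) -> Optional[dict]:
    """Expand a compact flag such as "0_FLName" into a dict."""
    if flag is None:
        return None  # not present: do nothing
    code, sep, flavour = flag.partition("_")
    if not sep or code not in STATES or not flavour:
        raise ValueError(f"unrecognised flavour status flag: {flag!r}")
    return {"flavour": flavour, "state": STATES[code]}


expanded = parse_flag("0_HOCR")
```

Only the first underscore is treated as the separator, so flavour names that contain underscores still parse.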

@DiegoPino
Member Author

Nice!

@DiegoPino
Member Author

Solved!


2 participants