The Collect, Extract and Integration chain #16
Comments
It's a broad topic; here are some broad thoughts. The original intent of extraction was always to perform serialisation only, and not to involve itself with locations or interaction with databases. In this light, extracting into a temporary directory and passing that directory on to integration is exactly aligned with the original intent. Integration is the complete opposite: it doesn't generate any data of its own, but merely "mediates" the data and aligns it with the overall pipeline. Where Collection represents the "input" of a processing graph, Integration is the "output". In between, data may "fan out" and become divided into smaller tasks, but in the end it must all pass through integration, i.e. "fan in", if the content is ever to see the light of day.
Canonically, no process should ever know about existing assets or their state until it comes to integration. In the case of versioning, which requires knowing the currently highest version in order to increment it, this would have to happen solely during integration. This means an integrator is free not only to produce final outputs, but also to communicate and gather information (unrelated to validation and extraction) in order to make its final decision. An integrator is always assumed to be right, so no validation is ever required here, nor serialisation, which in most cases should converge into plain file-copying and persistence of data within each Instance and/or Context.
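To make the three roles concrete, here is a minimal pseudocode-level sketch of what a Collector, Extractor and Integrator could look like as pyblish plug-ins. It assumes pyblish's ContextPlugin/InstancePlugin API; the stagingDir, stagedFile and integrateDir keys and the next_version helper are illustrative, not part of any existing plug-in.

```python
import os
import shutil
import tempfile

import pyblish.api


class CollectModel(pyblish.api.ContextPlugin):
    """Collection: the "input" of the graph; find work, create instances."""
    order = pyblish.api.CollectorOrder

    def process(self, context):
        instance = context.create_instance("ben")
        instance.data["family"] = "model"


class ExtractModel(pyblish.api.InstancePlugin):
    """Extraction: serialise into a temporary staging area only; no
    knowledge of final locations or of existing assets."""
    order = pyblish.api.ExtractorOrder
    families = ["model"]

    def process(self, instance):
        staging = instance.data.get("stagingDir")
        if not staging:
            staging = tempfile.mkdtemp(prefix="pyblish_")
            instance.data["stagingDir"] = staging
        path = os.path.join(staging, instance.name + ".mb")
        with open(path, "w") as f:  # stand-in for a real export call
            f.write("serialised content")
        instance.data["stagedFile"] = path


class IntegrateModel(pyblish.api.InstancePlugin):
    """Integration: the "output"; the only plug-in that may inspect
    existing assets, e.g. to compute the next version."""
    order = pyblish.api.IntegratorOrder
    families = ["model"]

    def process(self, instance):
        root = instance.data["integrateDir"]  # computed from instance data
        version_dir = os.path.join(root, "v%03d" % next_version(root))
        os.makedirs(version_dir)
        shutil.copy(instance.data["stagedFile"], version_dir)


def next_version(directory):
    """Hypothetical helper: highest version found on disk, plus one."""
    if not os.path.isdir(directory):
        return 1
    versions = [int(name[1:]) for name in os.listdir(directory)
                if name.startswith("v") and name[1:].isdigit()]
    return max(versions or [0]) + 1
```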
Not sure what you mean here, but if you mean that the first instance will create a temporary directory, whereas subsequent instances would be written to an already existing temporary directory, then that's perfectly fine and intended. The temporary directory is much like Git's "staging area" in that it holds an arbitrary amount of information, but does so only temporarily, until it is all converged, or integrated, with the rest of the data.
Why would it be up to the Integrator? This would also limit Validations (e.g. for versioning) like this one: https://github.com/mkolar/pyblish-kredenc/blob/master/plugins/common/validate_version_number.py

I feel it might be nice to have the …

Either way, I would love to see a simple pseudocode example of what the Collector does, what the Extractor does and what the Integrator does.
Because it isn't related to the quality of what you are outputting. If a version on disk is faulty, then that is a fault carried over from a previous publish.
Sure, I'll have a look at this.
I've mocked up an example for you here.
It would be nice and convenient, but it would also break encapsulation. Think about it: that data doesn't need validation, because it has already been saved to disk; the damage is already done. Furthermore, that data isn't part of what an artist has produced, it's part of what previous Integrators have produced. If anyone should be warned about an invalid version or a bad naming convention on already written files, it should be the developer who produced the integrator.
This isn't correct. The damage wouldn't have been done if the Validator catches it before Extraction. Plus, it wouldn't even be in the "damaging" position if it were validated after Extraction: it would only be stored in the temporary location. I think it's not that we're validating whether previous extractions went alright, but whether the version we are integrating now is up to par with our requirements.

Though, as you state, it's definitely not up to the artist to define where the output goes, unless there's user-defined data that influences what type of data it gets extracted as. A good example could be publishing shader variations (which we do a lot in our pipeline). For example, we build a red, blue and yellow bottle of wine. Each individual variation (for a single asset) could be validated for correct naming, whether it already exists, etc. The point being: when a user can interact with data that influences Integration, we want it to get validated, because it's prone to human error. But I think it's good to see where the ship leads us if we keep it purely implemented in Integration.
Some questions that come to mind: …
Are we talking about looking at existing files on disk, and validating whether those files are valid, during the publish of a new file? Here's what I'm hearing.

When we're about to publish MyAsset once more, it would then create … You would like to (1) include …
I guarantee you that there is a better way to solve this exact thing which doesn't involve validating integration. I invite you to produce this asset in the …
Yes, that's right, multiple extractors write to the same directory; that's what this is doing. The directory is a generic staging area. Each extractor could create its own little subdirectory if needed, but in general the data each extractor produces should be unique enough not to need that. The way I handled this in Napoleon was to create one subdirectory per family, and typically only extract a single family via a single extractor.
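As a minimal sketch of that get-or-create pattern (the stagingDir key and the subdir argument are illustrative, not an existing API):

```python
import os
import tempfile


def staging_dir(instance, subdir=None):
    """Return the instance's shared staging directory, creating it once.

    Every extractor for this instance funnels into the same directory;
    `subdir` optionally namespaces per family, e.g. "model" or "review".
    """
    path = instance.data.get("stagingDir")
    if not path:
        path = tempfile.mkdtemp(prefix="pyblish_")
        instance.data["stagingDir"] = path
    if subdir:
        path = os.path.join(path, subdir)
        if not os.path.isdir(path):
            os.makedirs(path)
    return path
```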
It depends on what file we're talking about. Let's take the model:

```
/tmp
└── ben.mb
```

In this case, an integrator with support for models knows what to do with it. In case a playblast and gif are also present:

```
/tmp
├── ben.mov
├── ben.gif
└── ben.mb
```

The integrator will now need to support gifs and playblasts to properly manage their final locations, and when it does it will know what to do with files in whichever format they are expected to reside in; for example, it could make the distinction based on their suffix. So you see, there needs to be an interplay between extractors and integrators; there needs to be an "API" or "contract" which they have both agreed to. Any extractor going rogue and producing things an integrator isn't expecting will simply not get integrated. No harm done.
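A minimal sketch of what such a contract could look like on the integrator side; the DESTINATIONS mapping and integrate function are illustrative, assuming files are simply copied into per-type areas:

```python
import os
import shutil

# Hypothetical contract: the suffixes this integrator has agreed to handle,
# and the area each one belongs to.
DESTINATIONS = {
    ".mb": "model",
    ".mov": "review",
    ".gif": "review",
}


def integrate(staging, target_root):
    """Copy each staged file into the area agreed upon for its suffix."""
    for name in os.listdir(staging):
        area = DESTINATIONS.get(os.path.splitext(name)[1])
        if area is None:
            continue  # not part of the contract: simply not integrated
        destination = os.path.join(target_root, area)
        if not os.path.isdir(destination):
            os.makedirs(destination)
        shutil.copy(os.path.join(staging, name), destination)
```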
So much has changed since this discussion that I'm not even sure how to "relate" it to the current state of Magenta. If this is still relevant, I think it would be great to briefly outline what exactly we need to fix or add; otherwise, let's close the discussion.
Goal
Decide upon the way forward for building new family types and how to implement their `Collector`, `Extractor` and possibly `Integrator`.

The main goal is to have a simple, consistent and robust solution that can be used throughout Magenta and also allows the plug-ins to be easily used in other packages.
Implementation
Our `Integrator` depends on knowing data about the extracted content. It needs to know:

- `extractDir`
- `integrateDir`
We also want to implement versioning, so that could mean additional required data.
The `extractDir` data is defined in the `Extractor` (see `plugin.py`) and is a temporary directory. This sets up the dependency that any instance has only one `extractDir`, and would require inheriting from `plugin.py`.
The `integrateDir` is computed using the project's schema and data from the instance. That data is: `root`, `container` and `asset`.
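A minimal sketch of that computation, assuming a simple template-style schema; the PUBLISH_TEMPLATE string and compute_integrate_dir helper are illustrative, with the real layout coming from the project's schema:

```python
import os

# Hypothetical template; the real layout is defined by the project's schema.
PUBLISH_TEMPLATE = "{root}/{container}/{asset}/publish"


def compute_integrate_dir(instance):
    """Resolve the final publish location purely from instance data."""
    return os.path.normpath(PUBLISH_TEMPLATE.format(
        root=instance.data["root"],
        container=instance.data["container"],
        asset=instance.data["asset"],
    ))
```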
Currently I've separated how we inject data into the Instance by taking it from the Context. This way, injecting this data does not need to be done from within each `Collector`. Have a look here: BigRoy/pyblish-magenta@b6e4d19

Though this will always override the data for any instance that has been Collected, which might be more annoying than what we gain from removing the duplicated code.
Thoughts?